Month: October 2012


  • Best Practices for Users to Organize Files in a Google Apps World

    We have completed the move to Google Apps at Casper College and are thus entering a new era of Digital Asset Creation. This is actually my second institution (the first was Western Oklahoma State College in 2007) to convert to Google Apps for Education, so I have a pretty good perspective on how quickly users can become overwhelmed when Google Doc files begin to accumulate.
    I have written some on this issue before and thought the time was right to re-post my set of "Best Practice" recommendations for our move to Google Apps. Most Google Apps training resources show the mechanics very well, but few provide resources for best-practice implementations. First I will repeat my disclaimer from the original post: "I am organizationally dysfunctional by nature. I have forced myself to make some adaptations to that natural state so that I could remain gainfully employed. Nobody wants an organizationally dysfunctional IT director…and who wants to be unwanted." The Best Practice topics to be discussed are:

    1. Sensitive Data
    2. Digital Asset Management and File Naming Conventions
    3. Sharing Folders vs. Sharing Documents
    4. eMail Subject Line
    5. Using Google Docs for Collaborative Projects
    6. Best Practices for Administrative Assistants
    7. Is All this Worth Doing?

    Be Careful with Sensitive Data

    You should never store social security or credit card numbers in a Google Doc!  However, you may have somewhat confidential data in your document.  If that is the case, it is good to double-check who that document is shared with.
    How? If you want to check who is able to access your document, open the document, click the Share menu on the top right, and select “Sharing Settings”. Whether your document is shared individually or via a folder, this will list the people who have been given access.

    Digital Asset Management and File Naming Conventions

    The importance of this discussion came together for me a couple of years ago while I was still employed in Oklahoma. On one particularly busy/hectic morning I received at least a half dozen emails with attachments all named "Kent." There were a few Word documents, a couple of PDFs, and maybe even a spreadsheet or two. I was in a hurry and I was frustrated that I had to open each and every doc to find out what it contained.
    Is this your file management solution?
    Normally, it isn't that big of a deal to just open the doc to see the contents, but on this particular day we were "getting hammered" and things weren't going as well as I would have liked. I was in a hurry and didn't need the hassle of opening each document to see what it contained. It was at that point I began researching a document naming scheme which would provide a means for communicating key document information to the user at a glance. It also helped that we were beginning to research document imaging systems; my original ideas for this really came from that world. If this were a recipe, I would also say add in a dash of Digital Asset Management. This makes a lot of sense, as on a day-to-day basis department or project staff are constantly sharing documents via online cloud-based storage, network storage, email, and portable media storage devices, and as a result it can be easy to lose track of what a document contains and which version is which.
    It probably sounds like I have spent a little too much time reading Dilbert, and just for the record, Dilbert has offered his own advice on implementing a file naming convention.

    A Brief Intro to DAM
    Digital Asset Management (DAM) is the management, organization, and distribution of digital assets from a central repository. Digital assets include all kinds of files: product images, stock photos, audio, video, presentations, you name it. If it's on a drive and can be useful, it's a digital asset. In practical terms, however, DAM almost always refers to the organization and retrieval of images and media, and almost all of the material discussing DAM implementation refers to images and media.
    Some geeks will tell you a file naming convention is so 2004. They will say you need to think about Digital Asset Management (DAM). Even with metadata, filenames can still be critical for differentiating things like colorspace or resolution. While a DAM can easily differentiate between these objects via metadata, humans have a little bit more difficulty. Humans name things; that's how we're built. While DAMs do reduce the necessity of encoding metadata in the filename/path (thankfully!), it is still useful to have some differentiation between similar objects. Also, some Mac users have a terrible habit of putting bullets, percent signs, and other punctuation in their filenames (Smith).
    File Naming Rule #1: You should be able to figure out what the file is about with a simple glance.
    Consider file names such as:

    • DSCN0619.jpg
    • C-1956.jpg
    • IMG0006.png
    • 819.eps

    These file names convey very little information about the images within and thus make them difficult to categorize. The impression made with the above example can be compared with this example below from Onison, a company which works in Digital Asset Management:
    070329_YVR_boardmeeting_onison_BW.jpg
    Their specific focus is on images, but this example contains two features which should be included in any file naming convention for any type of system and any type of environment.
    File Naming Rule #2: Always include the date.
    File Naming Rule #3: Always include a description.
    File Naming Rule #4: The only permitted characters in your file naming scheme are a-z, 0-9, underscore, dash, and period.
    Putting the key document information in the title has several benefits: (1) it will assist your project team members to quickly identify the project, department/function, document title, and version/revision number without having to open the document and scan for updates, and (2) this information will assist in the development, management, security, storage/retrieval, and eventual deliberate destruction of the document.
    Implementing a document naming convention in a project/department/organization goes a little further than just sending out an interoffice memo or 'All Staff' email. Project staff need to be trained (ideally as part of their induction into the working group), and a focal point (usually the project administrator) needs to be appointed, with a resource document for reference, to advise on how to implement the project filing when questions arise.
    File Naming Rule #5: Spaces in filenames are bad.
    File Naming Rule #6: Use initials for the department and the file creator.
    File Naming Rule #7: Keep in mind that local acronyms and abbreviations may not make sense to all users who access the system.
    File Naming Rule #8: Include all pertinent info, but not too much. If users need to refer to a manual just to name an asset, there's a good chance the convention will not be adopted. (Yes, this is basically a repeat of #1.)
    A Practical Example
    For example, if I created a Word file about creating a file naming policy on February 15, 2012, it would look something like this:
    20120215_planning_DoIT_filenaming_policy_creation_itkdb.doc
    Breaking that name into its components:
    A) 20120215 – reverse creation date (yyyymmdd)
    B) planning – project topical area
    C) DoIT – department
    D) filenaming_policy_creation – document name
    E) itkdb – department abbreviation + creator initials
    Possible add-ons, for further depth, if you're going to have multiple versions of a doc (you may want to rely on the versioning capabilities of tools such as Google Docs for this):
    F) v01 – version number
    G) 00 – revision number
    H) doc – file name extension
    as shown below:
    20120215_planning_DoIT_filenaming_policy_creation_itkdb-v01-00.doc
    When a document reaches its final version, I usually add the word FINAL to the name.
    One other thought on this issue:
    I read somewhere recently that in a file naming convention where you want to consider Search Engine Optimization (SEO) you might wish to substitute a period for an underscore.  I need to do some more reading on this, but the basic concept is:
    20120215.planning.DoIT.filenaming.policy.creation.itkdb.doc
    Sometimes I have a second date reference if the document references another date or document with a specific or important date as shown in the example below:
    20120215.planning.DoIT.20120115filenaming.policy.creation.itkdb.doc
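    To make the convention concrete, here is a minimal Python sketch that assembles a name from the components described above (using the underscore form); the helper function and its parameter names are my own illustration, not part of any existing tool.
    ```python
    from datetime import date

    def build_filename(topic, department, doc_name, creator, extension,
                       created=None, version=None, revision=None):
        """Assemble a name in the form yyyymmdd_topic_department_docname_creator[-vNN-NN].ext"""
        created = created or date.today()
        parts = [
            created.strftime("%Y%m%d"),            # reverse creation date (Rule #2)
            topic.lower(),                          # project topical area
            department,                             # department abbreviation
            doc_name.replace(" ", "_").lower(),     # description, no spaces (Rules #3 and #5)
            creator.lower(),                        # department abbreviation + creator initials (Rule #6)
        ]
        name = "_".join(parts)
        if version is not None and revision is not None:
            name += f"-v{version:02d}-{revision:02d}"   # optional version/revision add-on
        return f"{name}.{extension}"

    # Reproduces the examples above
    print(build_filename("planning", "DoIT", "filenaming policy creation",
                         "itkdb", "doc", created=date(2012, 2, 15)))
    # 20120215_planning_DoIT_filenaming_policy_creation_itkdb.doc
    print(build_filename("planning", "DoIT", "filenaming policy creation",
                         "itkdb", "doc", created=date(2012, 2, 15), version=1, revision=0))
    # 20120215_planning_DoIT_filenaming_policy_creation_itkdb-v01-00.doc
    ```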

    Notes on Some of the Components
    Reverse Creation Date
    Computer filing systems such as Windows XP sort numerically and alphabetically; as such, using the reverse creation format "yyyymmdd" will ensure the files automatically list in order of creation. Some people may not like to use the "yyyy" format, as in "2006", but I think it is easier to see the year in four characters, although some may say, "why add more characters to your file name than you have to?"
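    A quick illustration of why this works: a plain alphabetical sort of yyyymmdd-prefixed names is also a chronological sort. The file names in this sketch are made up for the example.
    ```python
    # Hypothetical file names following the convention; sorted() does a plain
    # string sort, but the yyyymmdd prefix makes the result chronological.
    files = [
        "20120215_planning_DoIT_filenaming_policy_creation_itkdb.doc",
        "20111103_budget_DoIT_fy13_draft_itkdb.xls",
        "20120102_training_DoIT_gmail_handout_itkdb.doc",
    ]
    for name in sorted(files):
        print(name)
    # Prints the 2011 file first, then the January and February 2012 files.
    ```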
    Project Topical Area Name
    Obviously there are millions of combinations and permutations for project name abbreviations, and I have read that a six-letter code has proven to be quite effective. The first 3 letters in this scheme are for the client organization and the second 3 are for the project abbreviation. However, I have decided to simply come up with a list of topical areas, and I usually spell the topic out, as again I want something I can reference at a glance without having to convert in my head what it means. However, if saving characters matters to you, then creating appropriate abbreviations such as those shown below may be important.
    Example: Project Topical Area or Category
    BU- Budget
    PL – Planning
    PM – Project Management
    TRG – Training
    SCRC – Screen capture (Note: This may not make sense for some, but I use it all the time)
    Example: Sample Department Acronyms
    Casper College Datatel Department Abbreviations
    Other Possibilities
    IT – Information Technology
    HR – Human Resources
    SEC – Security / Risk Management
    LEG – Legal
    VEH – Vehicle Fleet Mgt
    LOG – Logistics
    DOIT – Department of Information Technology
    PRO – Procurement
    FIN – Finance
    FAC – Facilities Management
    INV – Inventory / Material Management
    INF – Information Management
    Other stuff
    If you want to have some other options for identifying documents, you may look at something like the following suggested method for version and revision numbers. I don't use these, but many systems use this or a similar scheme.
    0.01 – 0.89 = DRAFT
    0.90 – 0.99 = REVIEW
    1.00 = FINAL (client version)
    1.01 – 1.89 = DRAFT (for second version)
    1.90 – 1.99 = REVIEW (for second version)
    2.00 = FINAL (re-released client version)
    There are obviously many ways of doing this; however, I've found this document naming convention to be quite useful in keeping track of what I am working on. When you have hundreds or thousands of documents you must sort through to find a specific single doc you created, you will appreciate having some sort of organizational system.
    Shared Collections vs. Shared Documents
    Share Collections, not Documents.  If it is likely that you will share documents in the future with the same group of people, it is best to create a collection and share it. All documents you put in that collection will be automatically shared with the same group of people and assigned permissions. Sharing individual documents is more time consuming and can lead to errors and inconsistencies. When sharing a collection, it is easier to keep track of who has access to the documents and give a new person the ability to access many files at once. Also, using a collection allows everyone in your group to add to that collection, creating an easy-to-find archive of group materials.
    Best Practices for email Subject Line
    The best email subject lines are short, descriptive and provide the reader with a reason to explore your message further. Trying to stand out in the inbox, by using ALL CAPS, splashy or cheesy phrases, will invariably result in your email being ignored.

    1.  Action-oriented

    Email subject lines must convey action and should start with an action verb. Having an action-oriented subject line makes the email stand out from the other emails in the recipient's inbox.
    Another tip is to use one keyword in the subject line that people will recognize. The keyword should be a non-branded keyword like "blogging best practices." Branded keywords are good to use in the email's "from" name and can be used in the subject line as well.

    2.  Compelling

    You only have a couple seconds to grab the attention of people skimming through their inbox. Make sure you include the email’s offer in the subject line, so people know the value the email will provide them. Creating a sense of urgency is a good tactic to use in conjunction with the compelling offer.
    One way to do this is to use brackets in the subject line. For example, you might be promoting an upcoming webinar and you want to make sure recipients realize this right away. Your subject line could be, "Learn to Become an Efficient Blogger [Webinar in Two Days]."

    3.  Not Spammy

    You can’t afford to have your subject line get caught in spam or firewall filters. Therefore, you should be very careful when choosing what words you put in the subject line. Words like free, act now, offer, or credit will almost always get flagged by spam filters.

    4.  Consistent

    The two things people will see before they even open your email is the subject line and the first sentence or two in the email. This is because most people use the preview function in their email client to determine if they want to spend the time reading the email.
    You want to keep the first couple sentences very consistent with the email’s subject line. They should reinforce and add to the compelling offer and should be action-oriented. I recommend including a link in the first or second sentence that sends them to the page you want them to take action on.

    5.  Short

    Email subject lines cannot be very long. I recommend you keep them under 45 characters or you run the risk of people not seeing the entire subject line. You also want to put the most important and compelling information in the beginning of every subject line.

    6.  Search and Retrieval of eMail

    Retrieving email is a pain at best.  Using subject lines which convey information may help you retrieve email more efficiently.

    Has this ever been you?
    Best Practices for Using Google Docs for Collaborative Projects
    The following best practices should be taken into consideration when using Google Docs for collaborative purposes.
    1. Require each participant/student to use an institutional Gmail account to reinforce that this account is used for official campus communication. If you do allow students to use private email accounts, make sure each includes some rendition of the student's first and last name (e.g. ronald1906wagner). This will make it easier for you to decipher which student made the edits to the documents.
    2. Be sure to require the student to add your email address to the list of collaborators. This will allow you to monitor collaborative activity.
    3. Require the students to use a pre-determined file name convention. For example: Creation/Duedate_CoursePrefixCourseNumber_projecttitle_instructorname_studentname
    Example:  20120716_cs1153_finalproject_kentbrooks_studentname
    This will make it much easier for you to sort your documents. I typically create a folder for each class and move the documents to the correct folder as they are shared with me
    Administrative Assistant Best Practices
    Administrative assistants Gmail

    Administrative assistants Calendar

    http://www.youtube.com/watch?v=uLDpkE0AO-0
    Google's Forum for Administrative Assistants Best Practices
    http://assistants.googleapps.com/
    Is all this worth doing?
    I know these methods save me time on a personal basis, but some people want a more detailed summary of the benefits. Ed Smith, in a Sept. 2011 post on the Edward Smith Digital Asset Management blog, shows a great way to determine ROI for implementing DAM. But even without purchasing a Digital Asset Management system, is there value in finding time-saving measures in all of your electronic messaging?
    For this example, I’m going to consider how much time and therefore money is saved by DAM. You can also do ROI calculations based on:

    • Spending less on stock photo purchases
    • Decreased licensing fees or fines
    • Selling or licensing asset collections
    • Avoiding print overruns
    • Spending less on desktop software and hardware upgrades

    Let’s say we have 5 people that each make around $50,000 each year and waste 1 hour each week searching for images. We’ll consider an investment of $3,000 into DAM ($2,000 for software and $1,000 for hardware).
    First, we figure out how many hours are wasted each year:
    5 people x 1 hour wasted searching each week x 52 weeks in the year = 260 hours wasted each year
    Next we determine how much money that time is worth:
    $50,000 average salary / 2,080 work hours in the year = about $24 an hour
    260 hours wasted each year x $24 an hour = $6,240
    Now we know that it “costs” $6,240 annually to find images. Let’s figure in that DAM cuts the time it takes to find images by 75%:
    $6,240 x 75% = $4,680
    In this case, we can save $4,680 a year with DAM. Now, let’s see how that compares to what we spent on DAM in the first year:
    $4,680 savings each year – $3,000 invested in DAM = $1,680 net savings in the first year.
    In this case DAM saves $1,680 in the first year, and potentially even more during the following years when little to no additional money is spent on the DAM.
    We’re almost done! We just need to turn these numbers into a percentage, which is the ROI. The ROI is calculated as the difference between the savings and cost of the investment, divided by the cost of the investment. If that last sentence hurts your brain when you read it, here’s what the calculation looks like:
    (Savings from DAM – Investment in DAM) / Investment in DAM = ROI
    Let’s plug in the numbers:
    ($4,680 – $3,000) / $3,000 = 56%
    In this scenario, our DAM system provides an ROI of 56% in the first year. If this DAM were a savings account, I'd put all my money in it (especially in this economy!).
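    The arithmetic above is easy to re-run with your own numbers. Here is a small Python sketch that reproduces Smith's example; the salary, hours, and investment figures are just the assumptions from that example, and the function name is my own.
    ```python
    def dam_roi(people, hours_wasted_per_week, avg_salary, investment,
                time_saved_pct=0.75, work_hours_per_year=2080, weeks_per_year=52):
        """Return (annual savings, first-year net savings, ROI) for a DAM purchase."""
        hourly_rate = avg_salary / work_hours_per_year                     # about $24/hour
        wasted_hours = people * hours_wasted_per_week * weeks_per_year     # 260 hours/year
        wasted_cost = wasted_hours * hourly_rate                           # roughly $6,250/year
        savings = wasted_cost * time_saved_pct                             # DAM recovers 75% of that
        net = savings - investment                                         # first-year net savings
        roi = net / investment
        return savings, net, roi

    savings, net, roi = dam_roi(people=5, hours_wasted_per_week=1,
                                avg_salary=50_000, investment=3_000)
    print(f"Savings: ${savings:,.0f}, net: ${net:,.0f}, ROI: {roi:.0%}")
    # Prints roughly $4,688 / $1,688 / 56%; the post rounds the hourly rate down to $24,
    # which is why its figures come out slightly lower ($4,680 / $1,680).
    ```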
    Of course there are other intangible benefits from DAM like brand consistency, improved customer service, and improved morale. Combining a solid ROI with the intangible benefits can help you make a good case for DAM in your organization.
    Background Reading and References
    http://www.damlearningcenter.com/street-smarts/are-you-a-hoarder-of-digital-assets-materials-from-the-rich-media-hoarders-seminar/
    http://www.christophermerrill.com/blog/?p=3954
    http://www.guidingtech.com/11384/transfer-files-between-google-drive-dropbox-skydrive-online/
    Google Shared Storage:
    http://www.guidingtech.com/11384/transfer-files-between-google-drive-dropbox-skydrive-online/
    http://otixo.com/index.html
    Subject Line:
    http://learning.hubspot.com/blog/bid/109618/5-Email-Subject-Line-Be…
    Collaborative Docs
    Wagner R. Educational technology: Using Google Doc as a collaboration tool. Athl Train Educ J. 2010;5(2):94-96


  • Are You Ready for the Digital Dark Age?

    Data Storage and the Digital Dark Age
    "Back when information was hard to copy people valued the copies and took care of them. Now, copies are so common as to be considered worthless, and very little attention is given to preserving them over the long term." (Brand, 2003)
    – Danny Hillis
    Whether there is enough storage space is a valid concern, but focusing only on space, and not on retrievability, let alone the problems of fathoming the relationships among the various data, can overshadow the implications, both good and bad, of having so much data to deal with.
    More and more people and groups such as the Internet Archive are discussing the potential vacuum of  historical photos and digital materials from this era  because of rapidly changing technologies and lack of process for saving these treasures, but on the individual level we would argue the gap is going to be even greater. The Internet Archive is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. This is great but is it enough?
    Personally, I have started retrieving and scanning photos which I found in my mom's basement, including photos such as the one below of my aunt and Frank "Pistol Pete" Eaton (the real-life character behind the Oklahoma State University mascot "Pistol Pete"). One of the treats in getting to sort through boxes and boxes of photos, letters, and cards is finding items such as this one.

    Experts in the Dark?
    "The very significant downside to this change in process is not something any of us will immediately recognize. We'll just notice, thirty years from now, that we won't have anything to casually leaf through and remember the important bits." (Grey, 2010)
    Digital preservation is defined as the maintenance of digitally stored information. This is different from digitization which is the process of creating digital information. Digital librarians, archivists, and related experts use the following methods to preserve digital content:
    Data migration: this is the transfer of data to newer systems. Through this process experts ensure continual access to digital content despite ever-changing technologies and formats (Breeding, 2012).
    Data refreshing:  it is a fact that digital data will degrade over time. One way to combat this phenomenon is to transfer or copy the data from one storage medium to another. Through this process the data lifespan is increased (Groenewald and Breytenbach, 2011).
    Data emulation: this process allows older digital artifacts to be accessed from newer computers. Emulation focuses on the application software as a solution and seeks to develop software that can still access older digital artifacts (Besser, 2007).
    Data replication: digital information that exists in only one location could be lost if there is a hardware or software failure. To guard against this threat of data loss, experts in digital preservation will typically back up digital content in several locations (Thomas, 2000).
    For at least a decade digital librarians and other archival specialists have struggled with digital preservation.  For example, Harvey (2012) identifies four challenges facing any digital preservation expert: changing from intermittent to continuous preservation practices, continuous professional development of experts, development of “best practices” that are applicable to anyone, and funding. The importance of this last point cannot be overstated.
    The sad reality for digital preservation experts is that we are not living in a Star Trek utopia. Educational institutions, organizations and experts typically do not provide services without accounting for cost. Infrastructure, staffing, and continual upgrades of both are not free. Digital preservation practices are not a one-time cost and require continuous funding. Unfortunately even the “experts” are often underfunded, lack training or lack the manpower to consistently preserve digital artifacts. If the expert faces these challenges then what can we realistically expect of the average person?
    Which brings us to a great paradox of the digital universe: as our ability to store digital bits increases, our ability to store them over time decreases. Think about this: the Dead Sea Scrolls, made of animal skins, papyrus, and in one case copper, are thousands of years old (25 Fascinating Facts). There are many instances of clay tablets thousands of years old, and photographs and microfilm a hundred years old. But can we read an 8-track tape from 35 years ago, a floppy disk from 20 years ago, or a VHS tape from 10 years ago? The life-span of digital recording media is nowhere near as long as stone or paper – the media degrades and the playback mechanisms become obsolete. The design life of a low-cost hard drive is 10 years, USB drives 10 years, the usable lifespan of magnetic tape has been estimated to be as little as 10 years, and the life expectancy of CDs and DVDs may be as little as 20 years, even though DVD technology is rated for 100. Keep in mind that DVDs may be worthless not too far into the future anyway. How many people still own record players? ZIP drives? (Grey, 2010)
    In short, the life of stored data follows two conflicting curves: one where capacities go up and one where longevity goes down.  For the moment the solution recommended to digital archivists by the National Media Lab is to transcribe digital records to new media every 10-20 years – a tough assignment for all but the well-organized.
    Due to the relentless obsolescence of digital formats and platforms, along with the ten-year life spans of digital storage media such as magnetic tape and CD-ROMs, there has never been a time of such drastic and irretrievable information loss as right now. I am very excited by some of the things that are happening in the digital world, but I also wonder what it means 10, 50, or 100 years from now. For example, one of my favorite companies in the educational realm is Flat World Knowledge. Their mission, per their web page, is: "We are a college textbook publishing company on a mission. By using technology and innovative business models to lower costs, we are increasing access and personalizing learning for college students and faculty worldwide." I know as a college administrator we have to do something about rising costs before we price ourselves out of the reach of students and become irrelevant; however, a part of me becomes nostalgic about those textbooks which are in museums, libraries, and even on my bookshelf. There is something comforting about holding a book, even a crummy ol' textbook, in your hand. The images below are from a well-hidden secret part of my academic life…I have about a dozen graduate hours in taxation. Yep, I agree, yuck!


    I kept this book, which I can hold in my hand, because I have never studied one topic (a graduate class on "Federal Taxation of Corporations") so hard and understood so little. Notice the soiled and marked pages. I really did peruse this book more than any other book I have ever had. I keep it on my shelf and pull it out to remind myself of why I did not go into accounting, and it makes me happy when I get a little whiny about not going that route, as I assume I would have made a whole lot more money over the course of my career doing that vs. going into education. Do you think I would have access to that reminder if I had had an eBook for this course? I don't know.
    The half-life of data is currently about five years. There is no improvement in sight because the attention span of the high-tech industry can only reach as far as next year’s upgrade, and its products reflect that.
    The loss is already considerable. For a long time I had a sign on my wall that said, "Things are really going to take off when everyone has dual floppies." Oh my, was that a long time ago, and a really pretty funny statement for those in the tech field. However, you may have noticed that any files you carefully recorded on 5 1/4″ floppy disks a few years ago are now unreadable. Not only have those disk drives disappeared, but so have the programs, operating systems, and machines that wrote the files (WordStar in CP/M on a Kaypro?). Your files may be intact, but they are as unrecoverable as if they never existed. The same is true of Landsat satellite data from the 1960s and early 1970s on countless reels of now-unreadable magnetic tape. All of the early pioneer computer work at labs such as the MIT Artificial Intelligence Lab is similarly lost, no matter how carefully it was recorded at the time (Wilson, 2005).
    The growing role of social networks as a repository for photos is an ever-increasing storage dilemma. Myspace used to be the biggest social networking space on the web. The question becomes: are there backups or archives of all the photos that people shoot and post immediately to their favorite social networking site? We are a "right now" people, and I am afraid the consequences of losing a piece of our corporate or individual selves and/or culture are closer to our doorstep than many may think.
    Background Reading and References
    Arthur, Charles. "What's a Zettabyte? By 2015, the Internet Will Know, Says Cisco." The Guardian. Guardian News and Media, 29 June 2011. Web. 1 Oct. 2012. http://www.guardian.co.uk/technology/blog/2011/jun/29/zettabyte-data-internet-cisco
    Brand, Stewart. "Escaping the Digital Dark Age." Rense.com, 20 June 2003. Originally published in Library Journal, Vol. 124, Issue 2, pp. 46-49. Web. 22 Jan. 2012. http://www.rense.com/general38/escap.htm
    Besser, H. (2007). Collaboration for electronic preservation. Library Trends, 56(1), 216-229.
    Breeding, M. (2012). From disaster recovery to digital preservation. Computers In Libraries, 32(4), 22-25.
    Grey, Tim. "Losing Memories to Digital." Tim Grey's Blog. Timgrey.com, 23 Mar. 2010. Web. 22 Jan. 2012. http://timgrey.com/blog/2010/losing-memories-to-digital/
    Groenewald, R., & Breytenbach, A. (2011). The use of metadata and preservation methods for continuous access to digital data. Electronic Library, 29(2), 236-248.
    Kozierok, Charles. "The TCP/IP Guide – Binary Information and Representation: Bits, Bytes, Nibbles, Octets and Characters." The TCP/IP Guide. Web. 22 Jan. 2012. http://www.tcpipguide.com/free/t_BinaryInformationandRepresentationBitsBytesNibbles-3.htm
    Harvey, R. (2012). Preserving digital materials, 2nd ed. Berlin: De Gruyter Saur.
    Krynsky, Mark. "Wired Article on Lifestreaming Pioneer Gordon Bell." Lifestream Blog, 24 Aug. 2009. Web. 22 Jan. 2012. http://lifestreamblog.com/wired-article-on-lifestreaming-pioneer-gordon-bell/
    Lawerence, Katerine. "Rethinking the LAMP Stack — Drupal Disruptive Open Source Part 2." PINGV Creative Blog. PINGV, 2 Dec. 2010. Web. 22 Jan. 2012. http://pingv.com/blog/rethinking-the-lamp-stack-disruptive-technology
    Melvin, Jasmin. "Mobile Device Boom Sparks U.S. Net Address Shortage." Reuters, 28 Sept. 2010. Web. 22 Jan. 2012. http://www.reuters.com/article/2010/09/28/us-usa-internet-upgrade-i…
    Sutter, John D. "Microsoft Researcher Building 'e-memory'." CNN.com. CNN, 24 Oct. 2009. Web. 22 Jan. 2012. http://www.cnn.com/2009/TECH/10/24/tech.total.recall.microsoft.bell/index.html?iref=allsearc
    Thomas, C. F. (2000). Replication: the forgotten component in digital library interoperability? Technicalities, 20(4), 3-5.
    Wilson, Carson. "Longevity of Film versus Digital Images." Apples & Oranges: How Digital and Film Cameras Differ. CarsonWilson.com, 13 Sept. 2005. Web. 22 Jan. 2012. http://carsonwilson.com/apples/index.php?/archives/10-Longevity-of-Film-versus-Digital-Images.html
    “25 Fascinating Facts About the Dead Sea Scrolls @ Century One Bookstore.” Archaeology | Biblical Studies | Dead Sea Scrolls | Religion | Century One Bookstore. Century One Bookstore. Web. 22 Jan. 2012. <http://www.centuryone.com/25dssfacts.html>
    "KB, Mb, GHz, and All of That Stuff." Coolnerds Home Page. Web. 22 Jan. 2012. http://www.coolnerds.com/Newbies/kBmBgB/SizeAndSpeed.htm
    "The Expanding Digital Universe." EMC. Web. 22 Jan. 2012. http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf


  • Are you Ready for the Zettabyte?

    NOTE: I have really enjoyed discussing techy stuff with our new Casper College librarian Brad Matthies since he arrived, and I asked him to add his 2 cents to this blog post. He contributed some really great stuff related to the Digital Dark Age section of this post and greatly reduced the number of Okieisms such as "uins," "yontoos," and "ya'lls" this post would have had otherwise. Thanks Brad.
    A Zettabyte is a billion Terabytes...there, I said it. Zettabyte (or zettabytes) is a term that the Microsoft Word spellchecker does not recognize. I suppose I have always loved the challenge of getting good and appropriate technology to the end user at the institutions where I have served, and data storage is certainly one of the biggest challenges. Cloud computing, SAN (Storage Area Networks), big data, data transfer, bandwidth, and data warehousing are all terms related to this conversation. At my previous position we worked on an ARRA grant that will eventually bring a 10Gbps connection to Western Oklahoma State College (that is 10,000Mbps). I always stated I was excited to get this much bandwidth because I would now be able to "fax a pizza" to all my friends. At my current job with Casper College we moved to a 100Mbps connection in the fall of 2011 and then to a 200Mbps connection in the spring of 2012, so for a little while I will have more bandwidth than we had at my previous stop. How exciting is that? Very exciting actually, but what does all of this mean?
    Well, one thing it means is that we are learning new numbers and sizes. In a recent Reuters article, the author tells us the IPv6 standard will allocate a "trillion, trillion, trillion" addresses. Wisely, the author did not use the word "unodecillion" (Melvin, 2010). Unodecillion? I had never heard of "unodecillion" prior to writing this post. Are you ready for the Zettabyte? As of 2011, no storage system has achieved one zettabyte of information (you will learn how much that is shortly). The combined space of all computer hard drives in the world was estimated at approximately 160 exabytes in 2006 (EMC). Interestingly, we are learning about these numbers without actually knowing or understanding what the old ones are. No matter how much storage space we think we need, we can put it into terms we understand. The naming of the little company we know as Google is a perfect example. Back in the day when their little company was founded, Larry and Sergey named their new search engine for the biggest number they could think of and…it wasn't big enough. Similarly, that is where we are at with data storage.
    One of the most common work-related discussions I have had in the past couple of years is about appropriate and available storage for the ever-increasing digital stuff we are always creating. You are also seeing a proliferation of professional development opportunities in the area of data warehousing. Many people I have spoken with toss around the term "Big Data." It all revolves around the rapidly expanding data inventory we are facing. Yes, I can really start to see the time of the Zettabyte. An email this past week from our director of distance learning, Ana Thompson, reinforced for me the challenges we have with storage:
    “At this time, I would like to ask all of you to please check your accounts and delete any recordings that you do not need.  You have the option to download any of the WebEx ARF files to your computer…”
    I often have thoughts such as "How does Google provide so much space for my Gmail accounts?" and "How do they provide enough space to put up all of those photos everyone posts to Facebook and all those videos to YouTube?" And think of this…those applications have only taken off in the past few years. In 2006, prior to the golden age of the previously mentioned media-rich applications, the amount of digital information created, captured, and replicated was 1,288 x 10^18 bits. In computer parlance, that's 161 exabytes or 161 billion gigabytes (keep reading for more on these terms). This is about 3 million times the information in all the books ever written (EMC, 2007).
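    That bits-to-exabytes conversion is easy to verify with a quick check (assuming 8 bits per byte and decimal units):
    ```python
    bits_2006 = 1_288 * 10 ** 18          # digital information created in 2006, in bits
    exabytes = bits_2006 / 8 / 10 ** 18   # 8 bits per byte, 10^18 bytes per exabyte
    print(f"{exabytes:.0f} exabytes")     # 161 exabytes, i.e. 161 billion gigabytes
    ```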
    At Casper College we have begun a rapid expansion of storage, which by the start of our next fiscal year will be about 100 times (see below) what it was only 4 years ago. This does not even count the space we are using for distributing data via sources such as YouTube and Vimeo.
    Life Logging
    Beyond the institutional need for storage, personal data storage is rapidly changing. For the past dozen years or so, Gordon Bell of Microsoft has been attempting to store all the information he creates and captures. The project originally stored encoded archival material, such as books he read, music he listened to, or documents he created on his PC. It then evolved to capturing audio recordings of conversations, phone calls, web pages accessed, medical information, and even pictures captured by a camera that automatically takes pictures when its sensors indicate that the user might want a photograph. The original plan was to test the hypothesis that an individual could store a lifetime's worth of information on a single terabyte drive, which, if compressed and excluding pre-recorded video (movies or TV shows he watched), still seems possible. By 2009 Bell had collected more than 350 gigabytes' worth, not including the streaming audio and video; Bell considers this collection a replica of his biological memory (Sutter). However, in one experiment where TV programs he watched were recorded, he quickly ran up 2 terabytes of storage. So the one-terabyte capacity is considered reasonable for text/audio recording at 20th-century resolutions, but not full video. In his experiment, Bell mimicked one of the trends we forecast for the digital universe. In 2000 he was shooting digital camera pictures at 2 MB per image; when he got a new camera in 2005 the images swelled to 5 MB. Along the way his email files got bigger as his attachments got bigger. So let's see: at one terabyte per person, if everyone on the planet recorded everything Gordon Bell did, that would mean we'd need 620 exabytes of storage – about 30 times what's available today (EMC 2007, Krynsky 2009, Sutter 2009).
    First, I think it may be time for a review of what we already know. Let's go with some basics (WOW, I feel like I am getting ready to teach my Introduction to Computers class from 20 years ago…CP101 I believe it was).
    The basic numbers

    Abbreviation  Stands for  Spoken as     Approximate #              Actual #
    K             Kilo        kay or killa  1,000 (a thousand)         1,024
    M             Mega        meg           1,000,000 (a million)     1,048,576
    G             Giga        gig or giga   1,000,000,000 (a billion)  1,073,741,824

    The pattern is fairly simple.  Each time you move up to a bigger number K to M to G, you stick another ,000 onto the end of the preceding number.
     
    Bits, Bytes, Kilobytes and beyond.
    A “bit” is the smallest unit of information that can be stored in a computer, and consists of either a 1 or 0 (or on/off state). All computer calculations are in bits.  It is pretty easy to picture a byte – it’s the equivalent of a character on a page – or even a megabyte, which contains about the same amount of information as a small novel.
    The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer, and for this reason it is the basic addressable element in many computer architectures. Formally, however, an octet is the correct term for exactly eight bits, while a byte is the smallest number of bits that can be accessed in a computer system, which may or may not equal eight. In practice, modern computers use 8-bit bytes, and the terms are used interchangeably (with byte being more common in North America, and octet often being preferred in Europe).
    Please note that all numbers are an approximation, but I have included actual numbers on KB, MB and GB for emphasis. Here is the progression:
     
    Old Familiar Data Terms
    Bit (b) 1 or 0
    Byte (B) 8 bits
    Kilobyte (KB)  approximately 1,000 bytes or a thousand bytes (Actual 1,024 bytes)
    Megabyte (MB)  approximately 1,000 KB or a million bytes (Actual 1,048,576 bytes)
    Gigabyte (GB)  approximately 1,000 MB or a billion bytes (Actual 1,073,741,824 bytes)
    Terabyte (TB)  1,000 GB
     
    New Data Terms
    Petabyte (PB) 1,000 TB
    Exabyte (EB) 1,000 PB
    Zettabyte (ZB) 1,000 EB
    In 2007, the digital universe was 281 exabytes. That is 281 billion gigabytes, and in that year, for the first time, the data generated exceeded storage capacity. Next year, one prediction says it will be 1,800 billion gigabytes. That is 1.8 zettabytes; again, this is a number so unfamiliar that the Microsoft Word spellchecker does not recognize it. A zettabyte is a billion terabytes (Lawerence, 2010). You can also say a zettabyte is roughly 1,000 exabytes. To place that amount of volume in more practical terms, "an exabyte alone has the capacity to hold over 36,000 years worth of HD quality video…or stream the entire Netflix catalog more than 3,000 times. A zettabyte is equivalent to about 250 billion DVDs" (Arthur, 2011).
    Arthur (2011) says, "Cisco sees the movement towards the exabyte as an inevitable endpoint of the growth in video traffic online. Its analysis suggests that we'll have shifted into the zettabyte age by 2015."
    How does this relate to my life? It depends, but if you participate in common Internet activities such as posting to Facebook, uploading YouTube videos, etc., then you are part of the challenge in providing enough storage. The point is, it takes a defined amount of "space" to store information outside of our brains. That's because the information which needs to be stored, such as words, numbers, or pictures, takes up space. In a computer, this basic "unit" of measure, as defined above, is the byte. This is basically the amount of space it takes to store one character, like the letter "A" or a punctuation mark such as the semicolon. So it takes about four bytes to store the word "Kent". It takes about 2,000 bytes to store one double-spaced page of typed text.
    When you see an uppercase letter "B", that stands for "byte". So instead of saying it takes "four bytes" to store the alphabetic representation of my name, "Kent", I would say it takes about 4B to store the word "Kent". To carry this example further, I could say it takes about 2,000B to store a typed page of text, or, with the understanding that 1,000B = 1 kilobyte or 1KB, I would probably say this document takes up about 2KB of storage space. If you had a 5 1/4″ DD (Double Density) floppy disk, which had a storage capacity of 360KB, then you could simply divide 2KB into 360KB and determine that it could hold approximately 180 typed pages of text (I vaguely remember giving this example every semester while teaching the introduction to microcomputers class many years ago). The formula is not quite so simple once you start adding images, highlighting, and complicated formatting to a document.
    Let's look at an example. If you already have files stored on your computer, and know how to get around in folders, you can see that every file has a size. You'll need to use the Details view (choose View > Details from the menu bar above the file icons). The figure below shows an example where you can see the sizes of some pictures in a folder on my computer.
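    As a sanity check of the back-of-the-envelope arithmetic above, here is a short Python sketch; the 2,000-byte page and 360KB floppy figures are the approximations used in this post, and decimal (1,000-based) units are used for simplicity.
    ```python
    BYTES_PER_PAGE = 2_000        # ~2KB per double-spaced typed page (approximation)
    FLOPPY_KB = 360               # 5 1/4" DD floppy capacity in KB

    print(len("Kent"), "bytes to store the word 'Kent'")             # 4 bytes
    print(FLOPPY_KB * 1_000 // BYTES_PER_PAGE, "pages per floppy")   # about 180 pages

    # Climbing the scale with the decimal approximations used in the post:
    for power, unit in enumerate(["KB", "MB", "GB", "TB", "PB", "EB", "ZB"], start=1):
        print(f"1 {unit} is roughly 10^{3 * power} bytes")
    # 1 ZB is roughly 10^21 bytes, i.e. a billion terabytes
    ```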

    Now let's take some common 2011 applications for data storage and see how this information can be applied.
    Examples of Gigabyte Sized Storage

    • One hour of SDTV video at 2.2 Mbit/s is approximately 1 GB.
    • Seven minutes of HDTV video at 19.39 Mbit/s is approximately 1 GB.
    • 114 minutes of uncompressed CD-quality audio at 1.4 Mbit/s is approximately 1 GB.
    • A DVD-R can hold about 4.7 GB.
    • A dual-layered Blu-ray disc can hold about 50 GB.
    • A Universal Media Disc can hold about 0.9 GB of data. (1.8 GB on dual-layered discs.)

    —http://en.wikipedia.org/wiki/Gigabyte
    These numbers are beginning to look small as shown in the following:
    Examples of Terabyte sized storage:

    • Library data – The U.S. Library of Congress Web Capture team claims that “As of April 2011, the Library has collected about 235 terabytes of data” and that it adds about 5 terabytes per month.[1]
    • Online databases – Ancestry.com claims approximately 600 TB of genealogical data with the inclusion of US Census data from 1790 to 1930.[2]
    • Computer hardware – Hitachi introduced the world’s first one terabyte hard disk drive in 2007.[3]
    • Historical Internet traffic – In 1993, total Internet traffic amounted to approximately 100 TB for the year.[4] As of June 2008, Cisco Systems estimated Internet traffic at 160 TB/s (which, assuming to be statistically constant, comes to 5 zettabytes for the year).[5] In other words, the amount of Internet used per second in 2008 exceeded all of the Internet used in 1993.
    • Social networks – As of May 2009, Yahoo! Groups had “40 terabytes of data to index”.[6]
    • Video – Released in 2009, the 3D animated film Monsters vs. Aliens used 100 TB of storage during development.[7]
    • Usenet messages – In October 2000, the Deja News Usenet archive had stored over 500 million Usenet messages which used 1.5 TB of storage.[8]
    • Encyclopedia – Wikipedia‘s January 2010 raw data uses a 5.87 terabyte dump.[9]
    • Climate science – In 2010, Germany’s Climate Research Centre (DKRZ) was generating 10,000 TB of data per year, from a supercomputer with a 20 TB memory and 7,000 TB disk space.[10]
    • Audio – One terabyte of audio recorded at CD quality will contain around 2,000 hours of audio. Additionally, one terabyte of compressed audio recorded at 128 kB/s will contain about 17,000 hours of audio.
    • The first 20 years worth of observations by the Hubble Space Telescope has amassed more than 45 terabytes of data. [11]
    • The IBM computer Watson, which Jeopardy! contestants competed against in February 2011, has 16 terabytes of RAM.[12]

    —http://en.wikipedia.org/wiki/Terabyte
    Examples of the use of the petabyte to describe data sizes in different fields are:

    • The world’s effective capacity to exchange information through two-way telecommunication networks was 281 petabytes of (optimally compressed) information in 1986, 471 petabytes in 1993, 2,200 petabytes in 2000, and 65,000 (optimally compressed) petabytes in 2007 (this is the informational equivalent to every person exchanging 6 newspapers per day). [4]
    • Computer hardware: Teradata Database 12 has a capacity of 50 petabytes of compressed data.[5][6]
    • Internet: Google processes about 24 petabytes of data per day.[7] The BBC’s iPlayer is reported to use 7 petabytes of bandwidth each month.[8]
    • Telecoms: AT&T transfers about 19 petabytes of data through their networks each day.[9]
    • Physics: The experiments in the Large Hadron Collider produce about 15 petabytes of data per year, which will be distributed over the LHC Computing Grid.[10]
    • Neurology: The adult human Brain has been estimated to store a limit of up to 2.5 petabytes of binary data equivalent.[11]
    • Climate science: The German Climate Computing Center (DKRZ) has a storage capacity of 60 petabytes of climate data.[12]
    • Archives: The Internet Archive contains about 5.8 petabytes of data as of December 2010.[13] It was growing at the rate of about 100 terabytes per month in March 2009.[14][15]
    • Games: World of Warcraft uses 1.3 petabytes of storage to maintain its game.[16] Steam, a digital gaming service developed by Valve, delivers over 30 petabytes of content monthly.[17]
    • Film: The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.[18][19]
    • In August 2011, IBM was reported to have built the largest storage array ever, with a capacity of 120 petabytes.[20]

    —http://en.wikipedia.org/wiki/Petabyte

    Every year the data mass increases 60-percent

     
    Let's backtrack for a moment: earlier we said that in 2007, the digital universe was 281 exabytes. That is 281 billion gigabytes, and in that year, for the first time, the data generated exceeded storage capacity. Next year, one prediction says it will be 1,800 billion gigabytes.
    Lawerence (2010) cites a 2008 IDC study which indicates the data universe will have increased 10-fold from 2006 to 2011. Taking the 5th root of 10 (fold) gives just under 60% compound annual growth.
    In other words, the 2008 IDC study states that from 2006 to 2011, five years, data will increase 10-fold.
    Digital Information Created, Captured, Replicated, Worldwide Exabytes Log
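    The 60% figure is just the fifth root of a 10-fold increase; a quick check:
    ```python
    growth_factor = 10 ** (1 / 5)          # 10-fold growth spread over five years
    annual_growth = growth_factor - 1
    print(f"{annual_growth:.1%} compound annual growth")   # ~58.5%, "just under 60%"
    print(f"{growth_factor ** 5:.1f}x over five years")    # sanity check: 10.0x
    ```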
    As I think about the past few years, we have been adding a terabyte here and a terabyte there to address the need to store data (actually, the very recent past few years). We have been offloading data to video services such as YouTube and Vimeo. The array of devices with which we are dealing is absolutely overwhelming but makes the data storage needs obvious. My incomplete list of data-guzzling tools includes: digital TV, digital movies, OCR, scanners, document imaging, digital HD video cameras, digital cameras, VoIP surveillance cameras, smart phones, Internet access in emerging countries, sensor-based applications, traditional PC activities such as email and IM, videoconferencing, gaming, GPS, datacenters supporting "cloud computing," and social networks.

    Moore’s Law

    In April 1965, Gordon Moore, then a relatively unknown physical chemist, wrote a three-and-a-half-page article in the journal Electronics on the accelerating increase in computing power provided by integrated circuits, which would ultimately lead to machines that can process data faster. He said integrated circuits would lead to such wonders as home computers (or at least terminals connected to a central computer), automatic controls for automobiles, and personal portable communications equipment. The timeline below sums up what we have seen in this area since 1965.
    —  http://download.intel.com/pressroom/kits/events/moores_law_40th/MLT…
    Machines capable of processing data faster will also generate data more quickly. Moore’s Law suggests computer power that grows geometrically will produce data geometrically and I believe that is exactly what we have seen.
     
    Background Reading and References
    Arthur, Charles. "What's a Zettabyte? By 2015, the Internet Will Know, Says Cisco." The Guardian. Guardian News and Media, 29 June 2011. Web. 1 Oct. 2012. http://www.guardian.co.uk/technology/blog/2011/jun/29/zettabyte-data-internet-cisco
    Brand, Stewart. "Escaping the Digital Dark Age." Rense.com, 20 June 2003. Originally published in Library Journal, Vol. 124, Issue 2, pp. 46-49. Web. 22 Jan. 2012. http://www.rense.com/general38/escap.htm
    Besser, H. (2007). Collaboration for electronic preservation. Library Trends, 56(1), 216-229.
    Breeding, M. (2012). From disaster recovery to digital preservation. Computers In Libraries, 32(4), 22-25.
    Grey, Tim. "Losing Memories to Digital." Tim Grey's Blog. Timgrey.com, 23 Mar. 2010. Web. 22 Jan. 2012. http://timgrey.com/blog/2010/losing-memories-to-digital/
    Groenewald, R., & Breytenbach, A. (2011). The use of metadata and preservation methods for continuous access to digital data. Electronic Library, 29(2), 236-248.
    Kozierok, Charles. "The TCP/IP Guide – Binary Information and Representation: Bits, Bytes, Nibbles, Octets and Characters." The TCP/IP Guide. Web. 22 Jan. 2012. http://www.tcpipguide.com/free/t_BinaryInformationandRepresentationBitsBytesNibbles-3.htm
    Harvey, R. (2012). Preserving digital materials, 2nd ed. Berlin: De Gruyter Saur.
    Krynsky, Mark. "Wired Article on Lifestreaming Pioneer Gordon Bell." Lifestream Blog, 24 Aug. 2009. Web. 22 Jan. 2012. http://lifestreamblog.com/wired-article-on-lifestreaming-pioneer-gordon-bell/
    Lawerence, Katerine. "Rethinking the LAMP Stack — Drupal Disruptive Open Source Part 2." PINGV Creative Blog. PINGV, 2 Dec. 2010. Web. 22 Jan. 2012. http://pingv.com/blog/rethinking-the-lamp-stack-disruptive-technology
    Melvin, Jasmin. "Mobile Device Boom Sparks U.S. Net Address Shortage." Reuters, 28 Sept. 2010. Web. 22 Jan. 2012. http://www.reuters.com/article/2010/09/28/us-usa-internet-upgrade-i…
    Sutter, John D. "Microsoft Researcher Building 'e-memory'." CNN.com. CNN, 24 Oct. 2009. Web. 22 Jan. 2012. http://www.cnn.com/2009/TECH/10/24/tech.total.recall.microsoft.bell/index.html?iref=allsearc
    Thomas, C. F. (2000). Replication: the forgotten component in digital library interoperability? Technicalities, 20(4), 3-5.
    Wilson, Carson. "Longevity of Film versus Digital Images." Apples & Oranges: How Digital and Film Cameras Differ. CarsonWilson.com, 13 Sept. 2005. Web. 22 Jan. 2012. http://carsonwilson.com/apples/index.php?/archives/10-Longevity-of-Film-versus-Digital-Images.html
    “25 Fascinating Facts About the Dead Sea Scrolls @ Century One Bookstore.” Archaeology | Biblical Studies | Dead Sea Scrolls | Religion | Century One Bookstore. Century One Bookstore. Web. 22 Jan. 2012. <http://www.centuryone.com/25dssfacts.html>
    "KB, Mb, GHz, and All of That Stuff." Coolnerds Home Page. Web. 22 Jan. 2012. http://www.coolnerds.com/Newbies/kBmBgB/SizeAndSpeed.htm
    "The Expanding Digital Universe." EMC. Web. 22 Jan. 2012. http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf