NOTE: I have really enjoyed discussing techy stuff with our new Casper College librarian Brad Matthies since he arrived at Casper College, so I asked him to add his 2 cents to this blog post. He contributed some really great stuff related to the Digital Dark Age section of this post and greatly reduced the number of Okieisms such as “uins,” “yontoos,” and “ya’lls” this post would have had otherwise. Thanks Brad.
A Zettabyte is a billion Terabytes...there I said it. “Zettabyte” is a term that the Microsoft Word spellchecker does not recognize. I suppose I have always loved the challenge of getting good and appropriate technology to the end user at the institutions where I have served, and data storage is certainly one of the biggest challenges. Cloud computing, SAN (Storage Area Networks), big data, data transfer, bandwidth, and data warehousing are all terms related to this conversation. At my previous position we worked on an ARRA grant that will eventually bring a 10Gbps connection to Western Oklahoma State College (that is 10,000Mbps). I always stated I was excited to get this much bandwidth because I would now be able to “fax a pizza” to all my friends. At my current job with Casper College we moved to a 100Mbps connection in the fall of 2011 and then to a 200Mbps connection in the spring of 2012, so for a little while I will have more bandwidth than we had at my previous stop. How exciting is that? Very exciting actually, but what does all of this mean?
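To put those connection speeds in perspective, here is a quick back-of-the-envelope sketch in Python (assuming, as is usual for network links, that the speeds are megabits per second) of how long it takes to move a DVD's worth of data:

```python
def transfer_seconds(size_bytes, link_mbps):
    """Seconds needed to move size_bytes over a link rated in megabits/s."""
    return size_bytes * 8 / (link_mbps * 1_000_000)

dvd = 4.7e9  # bytes on a single-layer DVD-R

print(round(transfer_seconds(dvd, 100)))     # 100 Mb link: 376 seconds
print(round(transfer_seconds(dvd, 10_000)))  # 10 Gb link: about 4 seconds
```

No pizza involved, but at ten gigabits the whole DVD moves in the time it takes to pick up the phone.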
Well, one thing it means is that we are learning new numbers and sizes. In a recent Reuters article, the author tells us the IPv6 standard will allocate a “trillion, trillion, trillion” addresses. Wisely, the author did not use the word “unodecillion” (Melvin, 2010). Unodecillion? I had never heard of “unodecillion” prior to writing this post. Are you ready for the Zettabyte? As of 2011, no storage system has achieved one zettabyte of information (you will learn how much that is shortly). The combined space of all computer hard drives in the world was estimated at approximately 160 exabytes in 2006 (EMC). Interestingly, we are learning about these numbers without actually knowing or understanding what the old ones are. No matter how much storage space we think we need, we can put it into terms we understand. The naming of the little company we know as Google is a perfect example. Back in the day when their little company was founded, Larry and Sergey named their new search engine for the biggest number they could think of and…it wasn’t big enough. Similarly, that is where we are at with data storage.
One of the most common work-related discussions I have had in the past couple of years is about appropriate and available storage for the ever-increasing digital stuff we are always creating. You are also seeing a proliferation of professional development opportunities in the area of Data Warehousing. Many people I have spoken with toss around the term “Big Data.” It all revolves around the rapidly expanding data inventory we are facing. Yes, I can really start to see the time of the Zettabyte. An email this past week from our director of distance learning, Ana Thompson, reinforced for me the challenges we have with storage:
“At this time, I would like to ask all of you to please check your accounts and delete any recordings that you do not need. You have the option to download any of the WebEx ARF files to your computer…”
I often have thoughts such as “How does Google provide so much space for my Gmail accounts?” and “How do they provide enough space for all of those photos everyone posts to Facebook and all those videos on YouTube?” And think of this…those applications have only taken off in the past few years. In 2006, prior to the golden age of the previously mentioned media-rich applications, the amount of digital information created, captured, and replicated was 1,288 x 10^18 bits. In computer parlance, that’s 161 exabytes or 161 billion gigabytes (keep reading for more on these terms). This is about 3 million times the information in all the books ever written (EMC, 2007).
At Casper College we have begun a rapid expansion of storage, which by the start of our next fiscal year will be about 100 times what it was only 4 years ago. This does not even count the space we are using for distributing data via services such as YouTube and Vimeo.
Beyond the institutional need for storage, personal data storage is rapidly changing. For the past dozen years or so, Gordon Bell of Microsoft has been attempting to store all the information he creates and captures. The project originally stored encoded archival material, such as books he read, music he listened to, or documents he created on his PC. It then evolved to capturing audio recordings of conversations, phone calls, web pages accessed, medical information, and even pictures captured by a camera that automatically takes pictures when its sensors indicate that the user might want a photograph.

The original plan was to test the hypothesis that an individual could store a lifetime’s worth of information on a single terabyte drive, which, if compressed and excluding pre-recorded video (movies or TV shows he watched), still seems possible. By 2009 Bell had collected more than 350 gigabytes worth, not including the streaming audio and video; this collection is considered by Bell a replica of his biological memory (Sutter). However, in one experiment where TV programs he watched were recorded, he quickly ran up 2 terabytes of storage. So the one-terabyte capacity is considered reasonable for text/audio recording at 20th-century resolutions, but not full video.

In his experiment, Bell mimicked one of the trends forecast for the digital universe. In 2000 he was shooting digital camera pictures at 2 MB per image; when he got a new camera in 2005 the images swelled to 5 MB. Along the way his email files got bigger as his attachments got bigger. So let’s see: at one terabyte per person, if everyone on the planet recorded everything Gordon Bell did, we’d need 620 exabytes of storage, about 30 times what’s available today (EMC 2007, Krynsky 2009, Sutter 2009).
First, I think it may be time for a review of what we already know. Let’s go with some basics first (WOW, I feel like I am getting ready to teach my Introduction to Computers class of 20 years ago…CP101 I believe it was).
The basic numbers
| Abbreviation | Stands for | Spoken as | Approximate # | Actual # |
| --- | --- | --- | --- | --- |
| K | Kilo | kay or killa | 1,000 (a thousand) | 1,024 |
| M | Mega | meg | 1,000,000 (a million) | 1,048,576 |
| G | Giga | gig or giga | 1,000,000,000 (a billion) | 1,073,741,824 |
The pattern is fairly simple. Each time you move up to a bigger number (K to M to G), you stick another ,000 onto the end of the preceding number.
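That "stick another ,000 on" rule can be generated mechanically. A short Python sketch prints the approximation next to the power of 1,024 the computer actually uses:

```python
prefixes = ["K", "M", "G", "T", "P", "E", "Z"]
for i, p in enumerate(prefixes, start=1):
    approx = 1000 ** i  # the "stick another ,000 on" approximation
    actual = 1024 ** i  # the power of two the hardware actually uses
    print(f"{p}: approx {approx:,}  actual {actual:,}")
```

Notice how the gap between the two columns widens as you climb: at K it is a 2.4% difference, but by the zettabyte row the binary figure is about 18% larger.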
Bits, Bytes, Kilobytes and beyond.
A “bit” is the smallest unit of information that can be stored in a computer, and consists of either a 1 or 0 (or on/off state). All computer calculations are in bits. It is pretty easy to picture a byte – it’s the equivalent of a character on a page – or even a megabyte, which contains about the same amount of information as a small novel.
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer, and for this reason it is the basic addressable element in many computer architectures. Formally, however, an octet is the correct term for exactly eight bits, while a byte is the smallest number of bits that can be accessed in a computer system, which may or may not equal eight. In practice, modern computers use 8-bit bytes, and the terms are used interchangeably (with byte being more common in North America, and octet often being preferred in Europe).
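You can peek at the eight bits inside a byte directly. A one-line Python illustration, assuming the standard ASCII encoding:

```python
# Show each character's byte as its eight individual bits
for ch in "A;":
    print(ch, "->", format(ord(ch), "08b"))  # e.g. A -> 01000001
```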
Please note that all numbers are an approximation, but I have included actual numbers on KB, MB and GB for emphasis. Here is the progression:
Old Familiar Data Terms
Bit (b) 1 or 0
Byte (B) 8 bits
Kilobyte (KB) approximately 1,000 bytes, or a thousand bytes (Actual 1,024 bytes)
Megabyte (MB) approximately 1,000 KB, or a million bytes (Actual 1,048,576 bytes)
Gigabyte (GB) approximately 1,000 MB, or a billion bytes (Actual 1,073,741,824 bytes)
Terabyte (TB) 1,000 GB
New Data Terms
Petabyte (PB) 1,000 TB
Exabyte (EB) 1,000 PB
Zettabyte (ZB) 1,000 EB
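The whole progression, old terms and new, fits in one small function. Here is a sketch in Python (decimal units by default, with the binary variant available for the "actual" numbers):

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit, binary=False):
    """Convert a value in the given unit to bytes."""
    base = 1024 if binary else 1000
    return value * base ** UNITS.index(unit)

# A zettabyte really is a billion terabytes:
print(to_bytes(1, "ZB") / to_bytes(1, "TB"))  # 1000000000.0
```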
In 2007, the digital universe was 281 exabytes. That is 281 billion gigabytes, and in that year, for the first time, the data generated exceeded storage capacity. Next year, one prediction says it will be 1,800 billion gigabytes. That is 1.8 zettabytes; again, this is a number so unfamiliar that the Microsoft Word spellchecker does not recognize it. A zettabyte is a billion terabytes (Lawerence 2010). You can also say a zettabyte is roughly 1,000 exabytes. “To place that amount of volume in more practical terms, an exabyte alone has the capacity to hold over 36,000 years worth of HD quality video…or stream the entire Netflix catalog more than 3,000 times. A zettabyte is equivalent to about 250 billion DVDs.” (Arthur 2011)
Arthur (2011) says, “Cisco sees the movement towards the exabyte as an inevitable endpoint of the growth in video traffic online. Its analysis suggests that we’ll have shifted into the zettabyte age by 2015.”
How does this relate to my life? It depends, but if you participate in common Internet activities such as posting to Facebook, uploading YouTube videos, etc., then you are part of the challenge of providing enough storage. The point is that it takes a defined amount of “space” to store information outside of our brains. That’s because the information which needs to be stored, whether words, numbers, or pictures, takes up space. In a computer, this basic “unit” of measure, as defined above, is the byte. This is basically the amount of space it takes to store one character, like the letter “A” or a punctuation mark such as the semicolon. So it takes about four bytes to store the word “Kent”. It takes about 2,000 bytes to store one double-spaced page of typed text.
When you see an uppercase letter “B”, that stands for “byte”. So instead of saying it takes “four bytes” to store the alphabetic representation of my name “Kent”, I would say it takes about 4B to store the word “Kent”. To carry this example further, I could say it takes about 2,000B to store a typed page of text, or, with the understanding that 1,000B = 1 Kilobyte or 1KB, I would probably say this document takes up about 2KB of storage space. If you had a 5 ¼ DD (Double Density) floppy disk, which had a storage capacity of 360KB, then you could simply divide 2KB into 360KB and determine that it could hold approximately 180 typed pages of text (I vaguely remember giving this example every semester while teaching the introduction to microcomputers class many years ago). The formula is not quite so simple once you start adding images, highlighting, and complicated formatting to a document. Let’s look at an example. If you already have files stored on your computer, and know how to get around in folders, you can see that every file has a size. You’ll need to use the Details view (choose View > Details from the menu bar above the file icons). The figure below shows an example where you can see the sizes of some pictures in a folder on my computer.
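That old classroom exercise is just integer division. A Python version, using the 2,000-byte page and 360KB disk from above:

```python
page_bytes = 2_000           # one double-spaced typed page
floppy_bytes = 360 * 1_000   # 5 1/4" DD floppy, 360KB
print(floppy_bytes // page_bytes)  # 180 pages per disk
```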
Now let’s take some common 2011 applications for data storage and see how this information can be applied.
Examples of Gigabyte Sized Storage
- One hour of SDTV video at 2.2 Mbit/s is approximately 1 GB.
- Seven minutes of HDTV video at 19.39 Mbit/s is approximately 1 GB.
- 114 minutes of uncompressed CD-quality audio at 1.4 Mbit/s is approximately 1 GB.
- A DVD-R can hold about 4.7 GB.
- A dual-layered Blu-ray disc can hold about 50 GB.
- A Universal Media Disc can hold about 0.9 GB of data. (1.8 GB on dual-layered discs.)
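The video figures above all fall out of the same bitrate arithmetic. A quick Python check (rates in Mbit/s, one decimal gigabyte):

```python
def minutes_per_gb(mbit_per_s):
    """Minutes of media one (decimal) gigabyte holds at a given bitrate."""
    return 1e9 * 8 / (mbit_per_s * 1e6) / 60

print(round(minutes_per_gb(2.2)))    # SDTV: about 61 minutes, roughly one hour
print(round(minutes_per_gb(19.39)))  # HDTV: about 7 minutes
```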
These numbers are beginning to look small as shown in the following:
Examples of Terabyte sized storage:
- Library data – The U.S. Library of Congress Web Capture team claims that “As of April 2011, the Library has collected about 235 terabytes of data” and that it adds about 5 terabytes per month.
- Online databases – Ancestry.com claims approximately 600 TB of genealogical data with the inclusion of US Census data from 1790 to 1930.
- Computer hardware – Hitachi introduced the world’s first one terabyte hard disk drive in 2007.
- Historical Internet traffic – In 1993, total Internet traffic amounted to approximately 100 TB for the year. As of June 2008, Cisco Systems estimated Internet traffic at 160 TB/s (which, assuming traffic to be statistically constant, comes to 5 zettabytes for the year). In other words, the amount of Internet traffic per second in 2008 exceeded all of the Internet traffic in 1993.
- Social networks – As of May 2009, Yahoo! Groups had “40 terabytes of data to index”.
- Video – Released in 2009, the 3D animated film Monsters vs. Aliens used 100 TB of storage during development.
- Usenet messages – In October 2000, the Deja News Usenet archive had stored over 500 million Usenet messages which used 1.5 TB of storage.
- Encyclopedia – Wikipedia’s January 2010 raw data uses a 5.87 terabyte dump.
- Climate science – In 2010, Germany’s Climate Research Centre (DKRZ) was generating 10,000 TB of data per year, from a supercomputer with a 20 TB memory and 7,000 TB disk space.
- Audio – One terabyte of audio recorded at CD quality will contain around 2,000 hours of audio. Additionally, one terabyte of compressed audio recorded at 128 kbit/s will contain about 17,000 hours of audio.
- The first 20 years worth of observations by the Hubble Space Telescope has amassed more than 45 terabytes of data. 
- The IBM computer Watson, against which Jeopardy! contestants competed in February 2011, has 16 terabytes of RAM.
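A couple of the figures in that list are easy to sanity-check with simple arithmetic. A quick Python sketch, assuming decimal units and a 128 kbit/s rate for the compressed audio:

```python
# Internet traffic: 160 TB/s, held constant for a year, in zettabytes
tb_per_year = 160 * 365 * 24 * 3600     # 160 TB/s x ~31.5 million seconds
print(round(tb_per_year / 1e9, 1))      # 1 ZB = a billion TB -> about 5.0

# Compressed audio: hours in one terabyte at 128 kbit/s
bytes_per_second = 128_000 / 8          # 16,000 bytes each second
hours = 1e12 / bytes_per_second / 3600
print(round(hours))                     # about 17,361, i.e. roughly 17,000
```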
Examples of the use of the petabyte to describe data sizes in different fields are:
- The world’s effective capacity to exchange information through two-way telecommunication networks was 281 petabytes of (optimally compressed) information in 1986, 471 petabytes in 1993, 2,200 petabytes in 2000, and 65,000 (optimally compressed) petabytes in 2007 (this is the informational equivalent to every person exchanging 6 newspapers per day). 
- Computer hardware: Teradata Database 12 has a capacity of 50 petabytes of compressed data.
- Internet: Google processes about 24 petabytes of data per day. The BBC’s iPlayer is reported to use 7 petabytes of bandwidth each month.
- Telecoms: AT&T transfers about 19 petabytes of data through their networks each day.
- Physics: The experiments in the Large Hadron Collider produce about 15 petabytes of data per year, which will be distributed over the LHC Computing Grid.
- Neurology: The adult human brain has been estimated to store a limit of up to 2.5 petabytes of binary data equivalent.
- Climate science: The German Climate Computing Center (DKRZ) has a storage capacity of 60 petabytes of climate data.
- Archives: The Internet Archive contains about 5.8 petabytes of data as of December 2010. It was growing at the rate of about 100 terabytes per month in March 2009.
- Games: World of Warcraft uses 1.3 petabytes of storage to maintain its game. Steam, a digital gaming service developed by Valve, delivers over 30 petabytes of content monthly.
- Film: The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.
- In August 2011, IBM was reported to have built the largest storage array ever, with a capacity of 120 petabytes.
Every year the data mass increases by about 60 percent
Let’s backtrack for a moment: earlier we said that in 2007, the digital universe was 281 exabytes. That is 281 billion gigabytes, and in that year, for the first time, the data generated exceeded storage capacity. Next year, one prediction says it will be 1,800 billion gigabytes.
Lawerence (2010) cites a 2008 IDC study, which indicates the data universe will have increased 10-fold from 2006 to 2011. In other words, IDC states that over those five years data will increase 10-fold. Taking the 5th root of 10 gives just under 60% compound annual growth.
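That fifth-root arithmetic in Python:

```python
# A 10-fold increase over five years, expressed as a compound annual rate
growth = 10 ** (1 / 5) - 1
print(f"{growth:.1%}")  # 58.5%, just under 60% per year
```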
[Chart: Digital Information Created, Captured, and Replicated Worldwide, in Exabytes (log scale)]
As I think about it, over the past few years we have been adding a terabyte here and a terabyte there to address the need to store data (actually the very recent past few years). We have been offloading data to video services such as YouTube and Vimeo. The array of devices with which we are dealing is absolutely overwhelming but makes the data storage needs obvious. My incomplete list of data-guzzling tools includes: digital TV, digital movies, OCR, scanners, document imaging, digital HD video cameras, digital cameras, VoIP, surveillance cameras, smart phones, Internet access in emerging countries, sensor-based applications, traditional PC activities such as email and IM, videoconferencing, gaming, GPS, datacenters supporting “cloud computing,” and social networks.
In April 1965, Gordon Moore, then a relatively unknown physical chemist, wrote a three-and-a-half-page article in the journal Electronics on the accelerating increase in computing power provided by integrated circuits, which would ultimately lead to machines that can process data faster. He said integrated circuits would lead to such wonders as home computers (or at least terminals connected to a central computer), automatic controls for automobiles, and personal portable communications equipment. The timeline below sums up what we have seen in this area since 1965.
Machines capable of processing data faster will also generate data more quickly. Moore’s Law suggests that computer power that grows geometrically will produce data geometrically, and I believe that is exactly what we have seen.
Background Reading and References
Arthur, Charles. “What’s a Zettabyte? By 2015, the Internet Will Know, Says Cisco.” The Guardian. Guardian News and Media, 29 June 2011. Web. 1 Oct. 2012. <http://www.guardian.co.uk/technology/blog/2011/jun/29/zettabyte-data-internet-cisco>.
Brand, Stewart. “Escaping The Digital Dark Age.” Rense.com. Published in Library Journal Vol. 124, Issue 2, p. 46-49, 20 June 2003. Web. 22 Jan. 2012. <http://www.rense.com/general38/escap.htm>.
Besser, H. (2007). Collaboration for electronic preservation. Library Trends, 56(1), 216-229.
Breeding, M. (2012). From disaster recovery to digital preservation. Computers In Libraries, 32(4), 22-25.
Grey, Tim. “Losing Memories to Digital.” Tim Grey’s Blog. Timgrey.com, 23 Mar. 2010. Web. 22 Jan. 2012. <http://timgrey.com/blog/2010/losing-memories-to-digital/>.
Groenewald, R., & Breytenbach, A. (2011). The use of metadata and preservation methods for continuous access to digital data. Electronic Library, 29(2), 236-248.
Kozierok, Charles. “The TCP/IP Guide – Binary Information and Representation: Bits, Bytes, Nibbles, Octets and Characters.” The TCP/IP Guide. Web. 22 Jan. 2012. <http://www.tcpipguide.com/free/t_BinaryInformationandRepresentationBitsBytesNibbles-3.htm>.
Harvey, R. (2012). Preserving digital materials, 2nd ed. Berlin: De Gruyter Saur.
Krynsky, Mark. “Wired Article on Lifestreaming Pioneer Gordon Bell.” Lifestream Blog, 24 Aug. 2009. Web. 22 Jan. 2012. <http://lifestreamblog.com/wired-article-on-lifestreaming-pioneer-gordon-bell/>.
Lawerence, Katerine. “Rethinking the LAMP Stack — Drupal Disruptive Open Source Part 2.” PINGV Creative Blog. PINGV, 2 Dec. 2010. Web. 22 Jan. 2012. <http://pingv.com/blog/rethinking-the-lamp-stack-disruptive-technology>.
Melvin, Jasmin. “Mobile Device Boom Sparks U.S. Net Address Shortage.” Reuters, 28 Sept. 2010. Web. 22 Jan. 2012. <http://www.reuters.com/article/2010/09/28/us-usa-internet-upgrade-i…>.
Sutter, John D. “Microsoft Researcher Building ‘e-memory’.” CNN, 24 Oct. 2009. Web. 22 Jan. 2012. <http://www.cnn.com/2009/TECH/10/24/tech.total.recall.microsoft.bell/index.html?iref=allsearc>.
Thomas, C. F. (2000). Replication: the forgotten component in digital library interoperability? Technicalities, 20(4), 3-5.
Wilson, Carson. “Longevity of Film versus Digital Images.” Apples & Oranges: How Digital and Film Cameras Differ. CarsonWilson.com, 13 Sept. 2005. Web. 22 Jan. 2012. <http://carsonwilson.com/apples/index.php?/archives/10-Longevity-of-Film-versus-Digital-Images.html>.
“25 Fascinating Facts About the Dead Sea Scrolls @ Century One Bookstore.” Archaeology | Biblical Studies | Dead Sea Scrolls | Religion | Century One Bookstore. Century One Bookstore. Web. 22 Jan. 2012. <http://www.centuryone.com/25dssfacts.html>
“KB, Mb, GHz, and All of That Stuff.” Coolnerds Home Page. Web. 22 Jan. 2012. <http://www.coolnerds.com/Newbies/kBmBgB/SizeAndSpeed.htm>.
“The Expanding Digital Universe.” EMC/IDC White Paper. Web. 22 Jan. 2012. <http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf>.