According to IDC, the total amount of digital information created and replicated in 2013 surpassed 4.4 Zettabytes, and the size of the digital universe is more than doubling every two years. The pool of information is expected to grow to almost 44 Zettabytes (44 trillion gigabytes!) by 2020.
By whatever measure you choose, and however much you trust the numbers, that’s a lot of data to store and process. In fact, this growth rate appears to be faster than Moore’s Law, and it shows no sign of slowing down.
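For readers who like to check the arithmetic, here is a rough sketch of the growth rate implied by the two IDC figures quoted above. The two-year doubling period used for Moore's Law is an assumption for comparison only; it is variously quoted as anywhere from 18 to 24 months.

```python
import math

# Figures quoted from IDC in the text above.
size_2013_zb = 4.4
size_2020_zb = 44.0
years = 2020 - 2013

# Overall growth, implied compound annual growth, and implied doubling time.
growth_factor = size_2020_zb / size_2013_zb
annual_growth = growth_factor ** (1 / years)
doubling_time_years = years * math.log(2) / math.log(growth_factor)

# Assumption: a two-year doubling period as the Moore's Law yardstick.
MOORE_DOUBLING_YEARS = 2.0

print(f"Growth over {years} years: {growth_factor:.1f}x")
print(f"Implied annual growth: {(annual_growth - 1) * 100:.1f}%")
print(f"Implied doubling time: {doubling_time_years:.2f} years "
      f"(yardstick: {MOORE_DOUBLING_YEARS:.1f} years)")
```

The answer is sensitive to which endpoints and which Moore's Law period you pick, so treat this as a rough comparison rather than a verdict.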
Emerging technologies such as artificial intelligence, personal video broadcasting (e.g., Periscope), augmented reality (Pokémon Go, of course), sophisticated analytics, and distributed ledgers (e.g., blockchain) all mean there will be no shortage of data to be classified, curated and stored. Even streetlights will someday be sources of smart city data. A more mundane example is a device that “phones home” every minute to report its CPU temperature.
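To get a feel for how quickly a "phone home" device adds up, here is a back-of-the-envelope sketch. The 100-byte report size and the fleet of one million devices are hypothetical numbers chosen purely for illustration; they do not come from any real product.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 reports per device at one per minute


def yearly_bytes(devices: int, bytes_per_report: int) -> int:
    """Raw bytes generated per year by a fleet reporting once a minute."""
    return devices * bytes_per_report * MINUTES_PER_YEAR


def human(n_bytes: float) -> str:
    """Format a byte count using decimal (power-of-1000) units."""
    units = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]
    value = float(n_bytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


# Hypothetical assumptions: 100 bytes per report, one million devices.
one_device = yearly_bytes(devices=1, bytes_per_report=100)
fleet = yearly_bytes(devices=1_000_000, bytes_per_report=100)

print("One device per year:", human(one_device))
print("One million devices per year:", human(fleet))
```

Under these assumptions a single device is negligible, but a large fleet reaches tens of terabytes a year before any replication, indexing or backup copies are counted.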
Companies offering free consumer applications often depend on the data they have collected to generate profits. And public sector organizations (governments and municipalities) are increasingly making “open data” available for innovators to use in creative ways.
The bottom line is that the functions of collecting, analyzing, organizing and preserving data have become essential for every web-scale system; storage is the “glue” that ties it all together.
Basic storage requirements
Intuitively, an ideal data storage facility would provide:
- Almost unlimited storage capacity at very low cost (preferably free);
- Capacity on demand for scalability and elasticity;
- High bandwidth (so that lots of data can be handled at the same time);
- Low latency (getting data in or out of storage should take minimal time and effort);
- Very high security, reliability and longevity (with automated recovery after failures); and
- Adaptive and agile administration and management services.
Needless to say, storage life cycle management has always been a primary service of the enterprise IT department. More recently, cloud storage service offerings have been added to the mix of resources for primary storage, backup and disaster recovery.
What was suitable for traditional structured corporate data is no longer sufficient in today’s unstructured multimedia environment: photos, video, audio, images, text, messages, social graphs, sensor data (e.g., seismic readings), genomic data and more are now part of the mix. The rise of unstructured data and of very large data sets is disrupting the storage status quo.
In addition to the above, modern storage platforms should include:
- Security and privacy mechanisms to protect data assets;
- Orchestration, distribution, backup and archival capabilities;
- Support for monitoring, billing and performance management; and
- Multitenancy for the sharing of resources among different customers.
There will be many other specific needs, but these highlight how requirements are evolving as we move to a cloud-based world.
Many different storage technologies have been used over the past 50 years, from punched cards to high-density storage arrays. Storage technologies emerge, get used and then fade away. For example, magnetic cores were replaced by RAM; floppy disks by CDs, DVDs and removable magnetic disks; and magnetic tape by hard drives. The most recent step, though surely not the last, is the solid-state drive.
Recently Seagate announced a 60TB solid-state drive (SSD), the largest-capacity SSD to date. According to The Verge, 60TB is enough to store 400 million photos or 12,000 movies. These levels of capacity may be needed just for Pokémon Go if the frenzy continues!
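As a quick sanity check on those figures, the sketch below works out the average file sizes they imply. It assumes decimal terabytes (10^12 bytes), which is how drive vendors usually quote capacity; the resulting per-photo and per-movie sizes are derived here, not stated by Seagate or The Verge.

```python
TB = 10**12  # assumption: decimal terabytes, as drive capacities are usually quoted

capacity_bytes = 60 * TB
photos = 400_000_000
movies = 12_000

# Average size each item could have if the drive held nothing else.
bytes_per_photo = capacity_bytes / photos
bytes_per_movie = capacity_bytes / movies

print(f"Implied photo size: {bytes_per_photo / 10**3:.0f} kB")
print(f"Implied movie size: {bytes_per_movie / 10**9:.0f} GB")
```

In other words, the quoted counts correspond to fairly small photos and roughly DVD-sized movies; higher-resolution media would fill the drive with far fewer items.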
As prices come down, SSD devices are taking their place in the arsenal of storage devices.
There are various ways to slice and dice storage systems including: block, file and object storage; solid state vs. spinning media; integrated vs. network-based; centralized vs. distributed; permanent vs. temporary, etc. Each class of storage solution has its pros and cons and use cases.
There is no doubt that planning, engineering and managing the “storage environment” for both in-house and cloud-based systems is an important but possibly under-appreciated cross-disciplinary IT role.
Storage as a service
According to Gartner, “object storage is pervasive as the underlying platform for cloud applications that we consume in our personal lives, such as content streaming, photo sharing and file collaboration services. The degree of awareness and the level of adoption of object storage are less in the enterprise, but they continue to grow.”
Gartner goes on to say:
“Object storage is characterized by access through RESTful interfaces via a standard Internet Protocol (IP), such as HTTP, that have granular, object-level security and rich metadata that can be tagged to it. Object storage products are available in a variety of deployment models — virtual appliances, managed hosting, purpose-built hardware appliances or software that can be installed on standard server hardware. These products are capable of huge scale in capacity, and many of the vendors included in this research have production deployments beyond 10PB. They are better-suited to workloads that require high bandwidth than transactional workloads that demand high input/output operations per second (IOPS) and low latency.”
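To make that description a little more concrete, here is a minimal in-memory sketch of the object storage model: whole objects addressed by a key in a flat namespace, each carrying its own metadata. The class and method names (ObjectStore, put, get, head, list_keys) are hypothetical and only loosely mirror the HTTP verbs a RESTful service would expose; this is not any vendor's API.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StoredObject:
    data: bytes               # the object body, always read or written whole
    metadata: Dict[str, str]  # free-form tags attached to the object
    checksum: str             # SHA-256 of the body, used like an entity tag


class ObjectStore:
    """Toy in-memory model of an object store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, StoredObject]] = {}

    def put(self, bucket: str, key: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> str:
        # Roughly PUT /<bucket>/<key>: replaces any existing object under that key.
        checksum = hashlib.sha256(data).hexdigest()
        obj = StoredObject(data, dict(metadata or {}), checksum)
        self._buckets.setdefault(bucket, {})[key] = obj
        return checksum

    def get(self, bucket: str, key: str) -> StoredObject:
        # Roughly GET /<bucket>/<key>: a missing key raises KeyError (think 404).
        return self._buckets[bucket][key]

    def head(self, bucket: str, key: str) -> Dict[str, str]:
        # Roughly HEAD /<bucket>/<key>: metadata only, without the body.
        return dict(self.get(bucket, key).metadata)

    def list_keys(self, bucket: str, prefix: str = "") -> List[str]:
        # Keys live in a flat namespace; "folders" are just shared prefixes.
        return sorted(k for k in self._buckets.get(bucket, {})
                      if k.startswith(prefix))


store = ObjectStore()
store.put("photos", "2016/07/cat.jpg", b"\xff\xd8 fake jpeg bytes",
          {"content-type": "image/jpeg", "camera": "phone"})
store.put("photos", "2016/08/dog.jpg", b"\xff\xd8 more fake bytes",
          {"content-type": "image/jpeg"})

print(store.list_keys("photos", prefix="2016/07/"))
print(store.head("photos", "2016/07/cat.jpg"))
```

Note what is missing: there is no way to update a few bytes in the middle of an object. That whole-object style of access is one reason this kind of storage suits high-bandwidth workloads better than transactional ones.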
Some examples of storage services offered by cloud service providers are:
- Amazon: Amazon Elastic Block Store (EBS; block storage), Amazon Elastic File System (EFS; file storage, currently in preview), Amazon Glacier (archival object storage), Amazon Simple Storage Service (S3; multiple tiers of object storage)
- Microsoft: Object Storage (standard and archive blob storage), File Storage, Block Storage (standard and premium)
- Google: Google Cloud Storage (object storage), persistent disk storage (block storage), Google Cloud Storage Nearline (archival object storage)
It seems to me that storage (and information in general) does not get the attention it deserves. If IoT, big data, smart systems and other mass data generation apps are to be successful, we will need to become more expert at collecting, organizing and managing both private corporate and personal data and the open data that will be available for use.
That is my view. What do you think: is storage a tidal wave that is approaching faster than we expect?