The Evolution of Modern Data Storage
by Andrew Mullen on October 4, 2017
Digital information is being generated today at unprecedented rates. It wasn’t that long ago that an entire enterprise server with thousands of users required only one or two terabytes of data storage. Now the amount of data companies must handle on a daily basis is measured in petabytes or even exabytes. IDC reports that 16.1 zettabytes (that’s 16.1 billion terabytes) of data was generated worldwide in 2016 alone.
This explosion of data is transforming the corporate IT landscape. Traditional storage solutions were never designed for today’s data-intensive environment, and are proving increasingly inadequate for ever-growing capacity and performance demands.
Why the Old Storage Model Doesn't Work Anymore
The traditional approach to storage was designed for a world in which the IT infrastructure of a typical large enterprise was expected to handle thousands of files and perhaps several hundred concurrent users. Systems were expected to be taken offline for planned maintenance at scheduled intervals, and periodic unscheduled downtime was simply a fact of life.
For most companies, that world no longer exists. Many have become global enterprises that work with petabytes of data and many thousands or even millions of users spread around the world. Customers not only expect but demand that these corporate systems be online 24/7, with no downtime, scheduled or otherwise, and that whenever additional storage capacity is needed, it will be automatically provisioned without disrupting operations.
Legacy systems were simply not designed for this type of environment. Historically, data storage has revolved around storage arrays built with dedicated, proprietary hardware that was housed on-premises in corporate data centers. These devices were very costly and quite limited in their ability to scale. Increases in capacity required the acquisition of additional hardware through capital expenditures and had to be planned for well in advance due to corporate budget cycles. That meant administrators had to be prophets who could accurately forecast capacity requirements several years into the future. But today, with data usage growing at exponential rates, those forecasts are rarely on target; actual needs usually far exceed projections.
For the huge amounts of unstructured data organizations are generating and accumulating today, reaching into the petabyte range and beyond, these legacy storage systems are far too costly to purchase, manage, and upgrade.
Cloud Computing Necessitates a New Approach to Storage
The limitations in the traditional storage model became particularly evident with the advent of cloud computing. In today’s corporate environment, the movement of IT infrastructure from on-site data centers to the public cloud is already well underway. Now, rather than buying hardware and software products to install in their data centers, corporations are purchasing IT services from cloud service providers (CSPs).
Charles Foley, Senior Vice President at Talon Storage, notes that corporations no longer want to buy bricks of IT infrastructure, such as servers, storage and networking, and put them all together in their own data centers. They’d rather get the functionality they need through the cloud.
And CSPs are happy to take on that role. As Foley put it in a recent interview, “Public clouds want to be the infrastructure layer for the world to tap into.”
But in order to provide the services their customers now expect, cloud platforms must build systems with unlimited scalability, 100 percent up-time, accessibility from diverse locations around the world, and greatly simplified storage management. Technology planners at cloud-based businesses such as Google, Facebook, and Amazon quickly realized that in a modern multi-petabyte storage environment, traditional SAN (storage area network) solutions were far too difficult and costly to scale and manage.
As they searched for ways to meet the rigorous demands cloud computing imposes on data storage, cloud technologists turned to a new storage model, software-defined storage, or SDS.
The Emergence of Software-Defined Storage
The defining characteristic of SDS is that it lifts the intelligence of the storage system out of hardware and into an overarching layer of software. That approach acknowledges the reality that it is simply much easier to do things in software than in hardware.
That’s the point emphasized by Talon’s Charles Foley. He explains that today’s environment requires technologies that are more flexible and dynamic than ever before. And that necessitates a software-defined approach not only to storage, but to networking (SDN) and to the data center as a whole (SDDC). Says Foley, “In order to get scale and flexibility, you must break apart the underlying physical resource from the logic that drives it.”
Shifting the storage system’s intelligence from hardware to software gives an SDS implementation some important advantages over traditional storage solutions. For example, because each storage unit in the system is directly managed by software, an SDS-based storage system works just as well with storage arrays built on inexpensive x86 servers and commodity disk drives, as with costly dedicated storage appliances. In fact, a wide range of devices and media can be intentionally employed, allowing the software to use tiering and caching algorithms to dynamically allocate specific storage units based on the performance or cost requirements of the workloads that use them.
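To make the tiering idea concrete, here is a minimal sketch of how a software layer might place a workload on the cheapest tier that still meets its performance target. The tier names, latency figures, and costs below are invented for illustration; they are not drawn from any particular SDS product.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str           # e.g. an NVMe SSD pool or a commodity HDD pool
    latency_ms: float   # typical read latency for this tier
    cost_per_gb: float  # relative cost (illustrative numbers)

# Hypothetical tiers the SDS layer manages
TIERS = [
    Tier("nvme-ssd", latency_ms=0.1, cost_per_gb=0.25),
    Tier("sata-hdd", latency_ms=8.0, cost_per_gb=0.03),
]

def place_workload(max_latency_ms: float) -> Tier:
    """Pick the cheapest tier that still meets the workload's latency target."""
    candidates = [t for t in TIERS if t.latency_ms <= max_latency_ms]
    if not candidates:
        raise ValueError("no tier meets the latency requirement")
    return min(candidates, key=lambda t: t.cost_per_gb)

# A latency-sensitive database goes to flash; cold archive data goes to disk.
print(place_workload(max_latency_ms=1.0).name)   # nvme-ssd
print(place_workload(max_latency_ms=20.0).name)  # sata-hdd
```

Real tiering engines also factor in access frequency and migrate data between tiers over time, but the core decision is the same cost-versus-performance trade-off.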
SDS-based storage is inherently highly scalable. Rather than scaling up by adding additional drives behind a storage controller, SDS systems scale out by incorporating additional nodes. Each node, often consisting of an x86-based server and attached drives, brings with it added compute power as well as increased storage capacity.
Perhaps the greatest advantage of SDS is its substantial reduction in management complexity. Administrators, users, and applications (via APIs) interact with the system through a single, consistent interface that is the same no matter what mix of storage hardware devices may be employed. In fact, the software-defined paradigm provides users with the ability to manage multiple data centers as if they were a single computer.
Because with SDS the entire system is managed at a granular level by the software, the most sophisticated functionality can be implemented and controlled at a central location, and uniformly applied across the system. For example, both low-level functions such as deduplication, replication, and snapshots, as well as high level features such as backup/restore and disaster recovery regimes, can be implemented once and extended to all devices in the system, no matter what their individual capabilities or characteristics might be. Administrators are not required to concern themselves with the idiosyncrasies of the various hardware/firmware configurations that may be included in the system.
How Talon Storage Fits Into the Software-Defined Ecosystem
The fastest growing portion of the information explosion companies are dealing with today is in unstructured data, the kind that’s created by people. In fact, unstructured data, including the files and folders associated with widely used applications such as MS Office, Adobe Creative Suite, Autodesk, etc., makes up 80 percent of all enterprise data. And some 60 to 80 percent of that information is generated and used at the edge – that is, in the remote and branch office (ROBO) sites of national and global enterprises.
A major problem for such companies is the difficulty of making information generated at the edge available to the organization as a whole. Often collaboration among globally dispersed teams is hindered because critical information is isolated at one or another of the ROBO sites, with no practical way to deliver it in real time to users in other locations.
Part of the problem is the challenge of distributing data generated or changed in one location to all the other locations that use it. Network bandwidth limitations can make this an impractically slow process for large datasets. And when local changes are not efficiently propagated, it’s very easy for data consistency problems to arise, with users in various locations unknowingly working on different versions of what they presume to be the same data.
What’s needed is a means to ensure that all users in the organization, no matter where in the world they may be located, are interacting with a single authoritative copy of the data that incorporates changes made at remote sites in near real time. And that's exactly the issue on which Talon has focused.
The company has developed an SDS-based solution, called Talon FAST™, that allows global corporations to centralize their data and safely provide concurrent access to users around the world. Talon explains its mission in the following terms:
To provide visibility, access, seamless performance to a global community, no matter where their data might be.
How Talon FAST™ Works
Talon FAST™ solves the problem of providing concurrent global access to all of an organization’s data, wherever it might reside, by consolidating it into a single, centralized, authoritative instance. When that central dataset is accessed from a ROBO site, only the active portion of the data is downloaded and locally cached using the Talon FAST™ Intelligent File Caching engine. When users make changes to the cached information, delta differencing is used to send only the changes back to the central location for incorporation into the authoritative instance of the dataset. This vastly reduces the amount of data that must be sent from and to remote locations and essentially eliminates bandwidth and responsiveness issues. Users interact with the data as if it were resident at their location.
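Talon does not publish its delta-differencing implementation, but the general technique is well known: split a file into fixed-size blocks, hash each block, and transmit only the blocks whose hashes differ from the centrally held version. The toy sketch below uses a 4-byte block size purely for readability; real systems use kilobyte-scale blocks.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real systems use e.g. 4 KB blocks

def block_hashes(data: bytes) -> list:
    """Hash each fixed-size block of the data."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [hashlib.sha256(b).hexdigest() for b in blocks]

def delta(old: bytes, new: bytes) -> dict:
    """Return only the blocks of `new` that differ from `old` (index -> bytes)."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    changed = {}
    for i, h in enumerate(new_h):
        if i >= len(old_h) or old_h[i] != h:
            changed[i] = new[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
    return changed

def apply_delta(old: bytes, changed: dict, new_len: int) -> bytes:
    """Rebuild the new version from the old data plus the changed blocks."""
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for i, block in changed.items():
        out[i * BLOCK_SIZE:i * BLOCK_SIZE + len(block)] = block
    return bytes(out)

central = b"ABCDEFGHIJKL"   # authoritative copy at the central site
edited  = b"ABCDxxGHIJKL"   # a user's local edit at a ROBO site
changes = delta(central, edited)   # only block 1 changed; one block crosses the WAN
assert apply_delta(central, changes, len(edited)) == edited
```

The bandwidth saving comes from the fact that the size of the transfer scales with the size of the edit, not the size of the file.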
But what happens if users in different locations are working on the same files, and make simultaneous but incompatible changes? Talon FAST™ eliminates that possibility through its Global File Locking mechanism. Once a change to a file is initiated anywhere in the system, all other instances of that file are locked against editing so that no other changes can be made until the initial change is completed and reflected back to the centralized instance of the data. This ensures that users can never make conflicting changes that result in data inconsistencies.
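A central write-lock table is the standard way to enforce this kind of single-writer guarantee. The sketch below is a toy model of the concept, not Talon's implementation; the site names and file path are made up.

```python
import threading

class GlobalLockManager:
    """Toy central lock table: at most one site may edit a given file at a time."""
    def __init__(self):
        self._mu = threading.Lock()
        self._locks = {}  # path -> site currently holding the write lock

    def acquire(self, path: str, site: str) -> bool:
        with self._mu:
            holder = self._locks.get(path)
            if holder is None or holder == site:
                self._locks[path] = site
                return True
            return False  # another site is editing; caller must wait

    def release(self, path: str, site: str) -> None:
        with self._mu:
            if self._locks.get(path) == site:
                del self._locks[path]

mgr = GlobalLockManager()
print(mgr.acquire("/projects/plan.dwg", site="london"))  # True: first writer wins
print(mgr.acquire("/projects/plan.dwg", site="tokyo"))   # False: file is locked
mgr.release("/projects/plan.dwg", site="london")
print(mgr.acquire("/projects/plan.dwg", site="tokyo"))   # True: lock is free again
```

Because every edit must pass through the one lock table, two sites can never hold conflicting write access to the same file, which is precisely the inconsistency scenario the article describes.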
In addition to Intelligent File Caching and Global File Locking, which are the heart of the Talon FAST™ SDS solution, the system also natively incorporates data deduplication, compression, and streaming capabilities. Through the use of these features, all backup/restore and disaster recovery functions can be shifted out of remote sites and implemented universally at the central location.
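Deduplication is typically implemented as content-addressed storage: each chunk is keyed by a hash of its contents, so identical chunks are stored only once, and each stored chunk can also be compressed. The following is a minimal sketch of that idea, not a description of Talon's internals.

```python
import hashlib
import zlib

class DedupStore:
    """Toy content-addressed store: identical chunks are stored (compressed) once."""
    def __init__(self):
        self.chunks = {}  # sha256 digest -> compressed bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.chunks:       # dedup: skip chunks we already hold
            self.chunks[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self.chunks[key])

store = DedupStore()
k1 = store.put(b"quarterly report" * 100)
k2 = store.put(b"quarterly report" * 100)  # duplicate: no new storage consumed
assert k1 == k2 and len(store.chunks) == 1
assert store.get(k1) == b"quarterly report" * 100
```

Centralizing a store like this is what lets backup and disaster recovery be handled once, at the hub, rather than separately at every remote site.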
Talon Plays a Critical Role in the Modern Storage Marketplace
Talon occupies a unique niche in the SDS ecosystem and works hand-in-hand with other providers in the SDS and cloud storage communities. For example, Talon FAST™, which is a storage-agnostic solution that runs on a physical or virtual Windows Server, is closely integrated with Microsoft Azure and native Azure features such as Active Directory. Talon FAST™ also integrates well with the offerings of storage providers such as NetApp and SoftNAS, as well as with those of other players in the cloud space, such as Nutanix and Scality. For example, Talon and SoftNAS recently entered into a strategic partnership that aims to “provide joint customers with a central cloud-based storage namespace that is secure, highly resilient, and can grow on-demand.”
If you’d like to know more about how Talon can help your company implement a state-of-the-art storage infrastructure, please watch this brief Talon FAST video.