The IT Pro's Guide to Enterprise Data Storage
by Andrew Mullen on October 5, 2016
The market for enterprise data storage is expanding fast. Applications such as social media, cloud services, server virtualization, and big data analytics are driving an explosion in the amount of data companies generate, access, and must retain. The users who depend on the instant availability of that data to do their jobs include a company's employees and customers, whether seated in their offices or moving around the world with their mobile devices. An even greater driver of enterprise data growth is the exponential expansion of the Internet of Things: machines and mobile devices that talk not to humans, but to each other.
In a 2013 survey of more than 1,000 IT professionals, storage vendor EMC reported that respondents identified "managing storage growth" as their biggest challenge.
Unstructured data, which already accounts for about 80 percent of installed storage capacity, is growing at a rate of more than 50 percent per year. Not only must such data be made available in real time to the rapidly increasing number of applications and users that need it on a daily basis, but much of it must also be retained, perhaps for decades, in case it is required at some future time. And all of it must be backed up to ensure that a company's mission-critical information can never literally go up in flames or be washed away by some natural or man-made disaster.
Making all that possible is the job of enterprise data storage.
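To get a feel for what growth of "more than 50 percent per year" means for capacity planning, the short Python sketch below converts an annual growth rate into a doubling time and projects capacity forward. It assumes steady compound growth, and the 100 TB starting point is a hypothetical figure chosen only for illustration.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double at a constant compound growth rate."""
    if annual_growth_rate <= 0:
        raise ValueError("annual_growth_rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_capacity(current: float, annual_growth_rate: float, years: int) -> float:
    """Capacity after `years` of compound growth, in the same unit as `current`."""
    return current * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    rate = 0.50  # the 50 percent per year growth figure cited above
    print(f"Doubling time: {doubling_time_years(rate):.2f} years")

    # 100 TB is a hypothetical starting capacity, not a figure from the article.
    print(f"100 TB after 5 years: {projected_capacity(100, rate, 5):.1f} TB")
```

At 50 percent annual growth the data volume doubles in well under two years, so a system sized only for today's needs is outgrown quickly.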
What is Enterprise Data Storage?
The term "enterprise data storage" refers to a system of products and services that provide a centralized digital repository for a company's business information. At the enterprise level, such systems are designed to handle large amounts of data, and to make it available to many users concurrently.
Enterprise storage comprises several major subdivisions. Primary storage holds the data to which users need immediate, real-time access. Backup storage holds copies of data that must be retained for disaster recovery. Archival storage saves data that may not be required for current use, but that may be needed for historical or future purposes.
Benefits of a Good Enterprise Data Storage System
Today's emerging workloads are totally dependent on quick and reliable access to stored data. Applications such as online transaction processing, e-commerce, distributed computer-aided design, and big data analysis, not to mention the quickly expanding area of machine-to-machine (M2M) interactions that underlie the Internet of Things, are crucial to the enterprise's operations. An inadequate or technologically obsolete storage system can effectively hamstring the modern corporation.
On the other hand, a good data storage system will provide the enterprise with some important advantages in the following areas:
Capacity: As stated before, the distinguishing mark of enterprise-level data is that there is a huge amount of it. In a 2012 report, IDC and EMC estimated that by 2020, the amount of data created and stored in a single year will reach 40,000 exabytes (an exabyte is equal to a billion gigabytes). That would represent a doubling time of about two years. A good storage system will not only accommodate the amount of data the enterprise generates in a particular year, but will also allow it to retain and retrieve all the data accumulated in previous years.
Scalability: Not only is enterprise data growing at a staggering rate, but as business opportunities quickly appear (and often just as quickly disappear), the amount of data storage a company needs in order to survive and thrive in a changing environment can expand very rapidly. An effective enterprise storage system will be able to quickly scale: that is, to swiftly and seamlessly add capacity as needed.
Reliability: Storing huge amounts of data is useless if it cannot be reliably retrieved. In any electronic system, a certain rate of data errors will predictably occur. So, an effective storage system will provide a means of recovering data that may be lost.
This can be done by a system of data replication that keeps copies of the data in several different geographical areas, so that if one repository is wiped out, the data can be restored from a different location. Alternatively, some storage systems include data coding schemes that allow recalculation of lost data without the storage space penalty of keeping entire copies. A key requirement of such data recovery is that it should be seamless - that is, lost data is reconstituted in a way that is transparent to users of the system.
Security: Because we live in a hacker-infested world, keeping a company's data secure is of paramount importance. An effective enterprise storage system will be able to provide easy access to authorized users, while reliably keeping intruders out.
Responsiveness: Whether the users of a data storage system are humans, machines, or computer applications, all will be sensitive to how quickly their requests for specific datasets are fulfilled. A storage system that takes too long to retrieve and deliver needed data can be, in many use cases, practically useless. An effective enterprise storage system will be perceived by users as providing almost instantaneous access to data.
Accessibility: A good enterprise data storage system will be concurrently accessible to a large number of users. For example, enterprises in the Architecture, Engineering, and Construction (AEC) sector often work with huge data sets that should be available to employees, contractors, and partners based at any number of locations around the world. Or, the users may be mobile employees, such as truck drivers, who must access a company database on their handheld devices to get route instructions. All require practically instant access to stored data, wherever the user may be at that time.
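The data coding schemes mentioned under Reliability can be illustrated with the simplest possible case: a single XOR parity block, the idea behind parity-based RAID levels. The Python sketch below is an illustration under stated assumptions, not a description of any particular product. The function names and the sample blocks are hypothetical, and production systems typically use more elaborate codes that tolerate more than one lost block.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks: List[bytes]) -> bytes:
    """Compute one parity block that protects all the data blocks."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    # Three hypothetical data blocks, each stored on a different device.
    data = [b"ACME", b"2016", b"DATA"]
    parity = make_parity(data)

    # Simulate losing the second block, then rebuild it.
    recovered = recover_block([data[0], data[2]], parity)
    print(recovered == data[1])
```

In this example one parity block protects three data blocks, so the extra space needed is a third of the data size, whereas keeping a full second copy would double it. The trade-off is that a single parity block can only rebuild one lost block at a time.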
Types of Enterprise Data Storage Solutions
Enterprise data storage systems are usually characterized by some combination of three different types of storage.
Direct Attached Storage (DAS) refers to storage that is directly attached to a server, such as the hard disk drives installed in full-featured computers. Enterprise-level DAS, however, adds features to enhance reliability, usually some form of data redundancy such as RAID (Redundant Array of Independent Disks). DAS is normally the cheapest solution, but it suffers from the disadvantage that the number of available disk slots on most servers is limited. This means that increasing storage capacity beyond a certain level can become difficult or expensive.
Network Attached Storage (NAS) refers to storage that is attached to the network, either through one of the servers on the network (in which case that server sees it as DAS, but offers NAS access to other servers on the network), or through a dedicated device. In either case, the device housing the storage acts as a file server, and network clients access the data at the file (as opposed to block) level. The advantage of NAS is that capacity can be expanded simply by adding NAS nodes to the network. The disadvantage is that responsiveness is dependent on the speed of the network, and may quickly deteriorate as additional attached devices contribute to network congestion.
Storage Area Network (SAN) is the term used for a network in which attached clients access the data store at the block level, using protocols such as Fibre Channel or iSCSI. The SAN storage appears to each server on the network as a locally attached disk, which the server manages as it would any other disk. One advantage of SAN is that servers can be configured to boot from the SAN rather than from a local disk. This provides greater robustness in the event of hardware failure, since a new server can be easily configured to appear to the network exactly as if it were the one being replaced.
In addition, SAN allows greater flexibility in managing the storage environment because the location and configuration of the physical storage devices are immaterial to users. Since servers on the network have direct access only to the SAN device, but not to the storage implemented on that device, expansion and reconfiguration of SAN storage can be accomplished without impacting the devices that use that storage. This allows implementation of more sophisticated backup and disaster recovery schemes than can be easily accomplished with DAS and NAS.
The major disadvantage of SAN is greater initial implementation costs.
Most enterprise storage systems consist of some combination of DAS, NAS, and SAN.
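The distinction between file-level access (NAS) and block-level access (SAN) drawn above can be made concrete with a small Python sketch. It is only an analogy run on a local machine: an ordinary file stands in for a SAN volume, a temporary directory stands in for a NAS share, and the 512-byte block size and all names are assumptions made for illustration.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size for this illustration


def read_block(device_path: str, block_number: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Block-level access: data is addressed by block number, with no notion of files."""
    with open(device_path, "rb") as dev:
        dev.seek(block_number * block_size)
        return dev.read(block_size)


def read_file(mount_point: str, relative_path: str) -> bytes:
    """File-level access: data is addressed by path, and the file server finds the blocks."""
    with open(os.path.join(mount_point, relative_path), "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # An ordinary file stands in for a SAN volume (hypothetical).
        volume = os.path.join(tmp, "volume.img")
        with open(volume, "wb") as f:
            f.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
        print(read_block(volume, 1)[:4])

        # The directory stands in for a NAS share (hypothetical).
        with open(os.path.join(tmp, "report.txt"), "wb") as f:
            f.write(b"quarterly numbers")
        print(read_file(tmp, "report.txt"))
```

With block-level access the client decides how blocks are organized into files, which is why a SAN volume looks like a local disk. With file-level access that bookkeeping stays on the storage device.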
Hardware Considerations in Enterprise Data Storage Solutions
Internal view of a hard disk drive
Historically, the most cost-effective server data storage technology has been the hard disk drive (HDD). Since IBM shipped the first hard drive in 1956, the HDD has been king of the data center. Until recently that technology provided a combination of performance and low cost that was unmatched.
But HDDs have some built-in limitations in terms of speed and capacity. The drive consists of rotating platters that are accessed by physically moving a read/write head to hover over the track on which a block of data is to be read or written. The amount of time it takes the head to get in position is called the seek time. Once the head is in position, it must wait for the data location on the platter to rotate under it. The average amount of time required for the data block location to rotate to where the head is positioned is called rotational latency (often shortened to latency). The combination of seek time and latency imposes physical limitations on how quickly HDDs can respond to data I/O (input/output) requests. Also, since the HDD is essentially a rotating machine, there are limits to how much can be crammed into a given amount of physical space.
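The relationship between rotation speed, seek time, and responsiveness can be worked out with a few lines of Python. This is a simplified model that ignores data transfer time, caching, and command queuing. On average the platter must turn half a revolution before the data arrives under the head, so the average latency is half the rotation period. The seek time in the example is an assumed value, not a measurement.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the data to rotate under the head: half a revolution, in ms."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def avg_access_time_ms(seek_ms: float, rpm: float) -> float:
    """Average time to reach a random block: seek time plus rotational latency."""
    return seek_ms + avg_rotational_latency_ms(rpm)


def max_random_iops(seek_ms: float, rpm: float) -> float:
    """Rough upper bound on random I/O operations per second for one drive."""
    return 1000.0 / avg_access_time_ms(seek_ms, rpm)


if __name__ == "__main__":
    assumed_seek_ms = 8.5  # hypothetical average seek time
    for rpm in (7200, 15000):
        latency = avg_rotational_latency_ms(rpm)
        iops = max_random_iops(assumed_seek_ms, rpm)
        print(f"{rpm} RPM: latency {latency:.2f} ms, about {iops:.0f} random IOPS")
```

Under these assumptions a single drive manages on the order of a hundred random operations per second, a ceiling set by mechanics rather than electronics.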
In the last several years a new contender has appeared, and is quickly gaining momentum. The Solid State Drive (SSD), or flash drive, is semiconductor memory that retains data even after it is powered down. Since there are no moving parts in an SSD, there are no delays due to seek time or rotational latency. Data is read and written very quickly, through random access, much like the RAM in a computer. Also, semiconductor modules can be packaged much more densely than HDD platters for the same amount of storage.
HDD vs SSD
Although flash drives significantly outperform HDDs in terms of storage density and speed, until recently the hindrance to their wide adoption for enterprise storage has been price. But now flash memory costs are quickly coming down, due in large part to the economies realized through the widespread use of the technology in consumer devices. According to Chris DeAcutis, Vice President of Marketing and Corporate Development at PEAK Resources, enterprise flash memory currently costs around $4 to $5 per gigabyte.
This is still considerably more than the $0.80 to $1.00 per gigabyte price for high capacity HDDs. But, says DeAcutis, "Comparing the cost per gig is an outdated way to look at the cost of storage. You have to take into account the impressive performance improvements that you gain through the use of flash." Moreover, flash memory has a much smaller storage footprint, which saves data center space and reduces the electrical power required to operate and cool the storage arrays. When those factors are taken into account, the difference in total cost of ownership (TCO) between SSDs and HDDs is much smaller than it might first appear.
In fact, according to David Floyer, CTO at Wikibon, the four-year TCO curves of Flash vs Disk crossed in 2016, and by 2020, the TCO for HDDs will be about $74 per TB (terabyte) vs. about $9 for Flash. IDC estimates that enterprise flash drive capacity is now growing at a sustained rate of almost 50 percent per year.
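The size of the gaps quoted above is easier to see as ratios. The Python sketch below uses only the per-gigabyte prices attributed to DeAcutis and the 2020 TCO projection attributed to Floyer. The helper name `ratio_range` is hypothetical, and the calculation simply restates the quoted figures rather than adding any new data.

```python
from typing import Tuple

PriceRange = Tuple[float, float]  # (low, high) in dollars


def ratio_range(numerator: PriceRange, denominator: PriceRange) -> Tuple[float, float]:
    """Smallest and largest possible ratio between two (low, high) price ranges."""
    n_low, n_high = numerator
    d_low, d_high = denominator
    return n_low / d_high, n_high / d_low


# Purchase prices per gigabyte quoted in the article.
FLASH_PER_GB: PriceRange = (4.00, 5.00)
HDD_PER_GB: PriceRange = (0.80, 1.00)

# Projected 2020 four-year TCO per terabyte quoted in the article.
HDD_TCO_PER_TB_2020 = 74.0
FLASH_TCO_PER_TB_2020 = 9.0

if __name__ == "__main__":
    low, high = ratio_range(FLASH_PER_GB, HDD_PER_GB)
    print(f"Flash purchase price: {low:.2f}x to {high:.2f}x the HDD price per GB")

    projected = HDD_TCO_PER_TB_2020 / FLASH_TCO_PER_TB_2020
    print(f"Projected 2020 TCO: HDD about {projected:.1f}x the flash cost per TB")
```

The two ratios point in opposite directions, which is the heart of the argument: flash costs several times more to buy per gigabyte, yet the quoted projection has it costing several times less to own.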
Currently, many enterprises are employing both HDD and SSD technologies. They use SSD for data for which the quickest access times are important, and HDD for backup or archival data. But as SSD prices continue to come down, most observers are confident that flash memory arrays will eventually displace HDDs completely.
Business Considerations in Enterprise Data Storage Solutions
A major consideration for enterprises in choosing the best approach for their data storage system is whether to implement it in house, in their own data centers, or in the cloud. The momentum is now with cloud storage, which offers advantages in cost and ease of management. Cloud storage vendors offer their products on a monthly fee basis in which the customer pays only for the amount of storage actually used during that billing period. Thus, cloud data storage is funded by OpEx rather than the CapEx funding required when companies build, staff, and maintain their own data centers.
On the other hand, enterprises often have business-critical data that they prefer to keep on site for security or performance reasons.
Actually, most large companies are now moving to a hybrid approach, a combination of both on-premises and cloud storage. In this model, the enterprise keeps some of its data in its own data center, either managing it itself, or calling on a cloud services vendor to do so. Less sensitive data is stored in either the public cloud or a private cloud dedicated entirely to that company's operations.
The hybrid model allows an enterprise to get the cost and operational advantages of the cloud, while retaining the security and performance benefits of the on-site data center.
Choosing an Enterprise Data Storage Solution
Whichever technological and financial choices seem best suited to your company's needs, you'll still want to ensure that your enterprise data storage system meets the highest standards in terms of access, reliability, security, and cost effectiveness. Talon can help you do that with our FAST™ data centralization and caching technology. If you'd like to know more, please take a look at the Talon FAST™ video.