Overcoming Common Challenges Around Data Centralization, Consolidation, and Collaboration with Solutions of the Future
by Michael Fiorenza on August 21, 2017
In today’s corporate IT environment, data sprawl is a pervasive fact of life.
According to the Merriam-Webster dictionary, sprawl is defined as “to spread or develop irregularly or without restraint.” That definition aptly describes the current state of the data that is the lifeblood of the modern corporation.
The amount of data businesses create and consume continues to grow at exponential rates. Much of that information is generated, modified, and stored not at a corporate headquarters location, but in remote or branch office (ROBO) sites around the country or, increasingly, around the world. According to the 2016 Riverbed Remote Office/Branch Office IT Survey, almost half of all corporate employees work at ROBO sites, and about half of all business data is stored there. For many corporate IT departments, exercising adequate control over the production, distribution, and use of that vital information is becoming more and more difficult.
The fact is that the unrestrained dispersal of sometimes business-critical data across a wide range of ROBO sites and business units presents a number of daunting challenges to a company’s IT organization. Let’s take a look at the most pressing of these.
The Challenge of Maintaining Data Security
Wherever an organization’s sensitive information is stored, that location becomes a target for would-be intruders. When ROBO locations store the data they generate and use on site, they also take on the responsibility of ensuring that it remains well protected. That means the site must be able to prevent or quickly recover from both human-made threats, such as theft or the attempted corruption of a ransomware attack, as well as unintended occurrences such as human error, equipment malfunctions, or natural disasters.
For remote locations that typically lack even one dedicated IT staff member, keeping locally stored data safe and secure is a tall order. Providing first-class protection for data stored on-site is not a trivial task. It requires a level of specialized data security expertise that, in most cases, is simply not available in remote locations. Although the IT staff at a company’s main or headquarters location may be well versed in the skills necessary to implement and maintain a first-class data security regime, attempting to make that expertise available at ROBO sites can result in a costly, inefficient, and often ineffective use of staff resources.
That’s perhaps the major reason why the movement to consolidate all of a company’s data in a central location is gathering steam. In the Riverbed survey, 46 percent of respondents said they struggle to provide the needed level of IT expertise at ROBO sites. Many have concluded that if it’s not cost-effective to send the security experts to where the data is, it makes sense to send the data to where the security experts are.
The Challenge of Distributing Data Across the Organization
For the modern corporation, information is gold. Not only are a wide range of documents, files, and databases an indispensable part of the company’s daily operations, but the insights that result from analytics performed on that data are the raw material for decision-making at all levels of the organization’s executive and managerial hierarchy.
That’s why information generated in remote sites must not only be made available wherever needed across the company, but it also must be up-to-the-minute accurate. In today’s business environment, geographically dispersed departments often collaborate in real time using a common pool of shared data. Front line and customer-facing employees (as well as privileged outsiders such as suppliers or subcontractors) require instant access to relevant information, wherever in the organization that data may have been generated or stored. And decision-makers must have confidence that the information on which they base their conclusions is not outdated.
But quickly and reliably dispersing ROBO-generated data throughout an extensive organization is a technically non-trivial task. A major limiting factor is WAN bandwidth. Most ROBOs don’t have access to sophisticated tools such as WAN accelerators that can help move large amounts of data across the network more rapidly. This often results in unacceptable delays as users attempt to access files stored at remote locations.
In a top-flight distributed data solution, the bandwidth issue is addressed through delta differencing. In this approach, only the portions of a dataset that have changed are transmitted over the network, rather than the entire dataset. By substantially reducing the amount of information that must be transferred to keep locations in sync, delta differencing can significantly reduce, or even eliminate, the practical impact of network bandwidth limitations.
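To make the idea concrete, here is a minimal sketch of block-level delta differencing, assuming a simple fixed-size-block scheme (production systems such as rsync use rolling checksums and much larger blocks; the tiny block size and function names here are illustrative only):

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size for illustration; real systems use KB-sized blocks


def block_hashes(data: bytes) -> list:
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def delta(old: bytes, new: bytes) -> dict:
    """Return only the blocks of `new` that differ from the old copy."""
    old_hashes = block_hashes(old)
    changes = {}
    for offset in range(0, len(new), BLOCK_SIZE):
        block = new[offset:offset + BLOCK_SIZE]
        index = offset // BLOCK_SIZE
        if (index >= len(old_hashes)
                or old_hashes[index] != hashlib.sha256(block).hexdigest()):
            changes[offset] = block  # this block changed; it must be sent
    return changes


def apply_delta(old: bytes, changes: dict, new_len: int) -> bytes:
    """Reconstruct the new version from the old copy plus the delta."""
    buf = bytearray(old[:new_len].ljust(new_len, b"\x00"))
    for offset, block in changes.items():
        buf[offset:offset + len(block)] = block
    return bytes(buf)


old = b"hello world!"
new = b"hello WORLD!"
d = delta(old, new)  # only the changed blocks cross the network
assert apply_delta(old, d, len(new)) == new
```

Only the entries in `d` would travel over the WAN; the receiving site patches its existing copy rather than downloading the whole file again.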
Another issue that often surfaces when attempting to combine datasets generated in several locations is the fact that different sites may each have their own unique naming and formatting conventions for the data they produce. Before ROBO data can be integrated with information from other sources or locations, it may need to undergo a process of data normalization to bring its various attributes into alignment with a common standard.
The necessity for data normalization can be eliminated by imposing and enforcing a universal formatting standard for all data, whatever its source. The difficulty, however, is in consistently enforcing that standard, especially when ROBO sites, or even individual workers, engage in the common practice of informally employing applications that fit their needs, but which the company's central IT authority may not even be aware of. Perhaps the best way to minimize such “shadow IT” is to consolidate all the company’s data into a centralized dataset instance that remote users access exactly as if it were local.
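As a small illustration of what normalization involves, the sketch below maps per-site field names and date formats onto a common standard. The alias table, field names, and accepted date formats are hypothetical examples, not part of any real schema:

```python
from datetime import datetime

# Hypothetical per-site naming conventions; a real mapping would come
# from a company-wide data dictionary.
FIELD_ALIASES = {
    "cust_name": "customer_name",
    "CustomerName": "customer_name",
    "order_dt": "order_date",
    "OrderDate": "order_date",
}

# Date formats assumed to be in use at different branch sites.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]


def normalize_date(value: str) -> str:
    """Coerce any recognized site-specific date format to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value}")


def normalize_record(record: dict) -> dict:
    """Rename fields to their canonical names and standardize date values."""
    out = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key, key)
        if canonical == "order_date":
            value = normalize_date(value)
        out[canonical] = value
    return out


branch_a = {"CustomerName": "Acme", "OrderDate": "08/21/2017"}
branch_b = {"cust_name": "Acme", "order_dt": "2017-08-21"}
assert normalize_record(branch_a) == normalize_record(branch_b)
```

Once both branches' records pass through the same normalization step, data generated under different local conventions can be merged into one consistent dataset.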
The Challenge of Maintaining Data Consistency
One of the most potentially dangerous issues that affects the dispersal of data generated or modified in different locations is the difficulty of maintaining data consistency. That term refers to the necessity of ensuring that when different users attempt to access the same dataset, they all are presented with exactly the same information.
The difficulty arises from the fact that when data is shared across a geographically dispersed organization, users in different locations can simultaneously attempt to make inconsistent changes to the data. When that happens, the copy of that dataset held in each of those locations is different. Which is the correct version? The answer is both and neither. The inconsistency that arises when users are allowed to simultaneously make changes to a common dataset is a recipe for widespread confusion and the ultimate corruption of the data.
The modern answer to this challenge is the use of intelligent file locking. This technology precludes simultaneous changes by ensuring that once a change is initiated in one location, other users are not allowed to make any changes until the original modification is incorporated into the common dataset instance.
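The semantics of such a lock service can be sketched in a few lines. This is a toy in-process model, assuming a single central lock table; a real implementation (including Talon's) coordinates locks across the WAN and handles timeouts, failover, and partial connectivity:

```python
import threading


class GlobalFileLock:
    """Toy central lock service: at most one site may edit a file at a time."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._holders = {}               # path -> site currently editing it

    def acquire(self, path: str, site: str) -> bool:
        """Try to start editing `path` from `site`; deny if another site holds it."""
        with self._guard:
            if path in self._holders and self._holders[path] != site:
                return False             # someone else is editing: changes blocked
            self._holders[path] = site
            return True

    def release(self, path: str, site: str) -> None:
        """Finish editing: the change is merged centrally, so the lock is freed."""
        with self._guard:
            if self._holders.get(path) == site:
                del self._holders[path]


locks = GlobalFileLock()
assert locks.acquire("/projects/plan.docx", site="NYC")      # NYC starts editing
assert not locks.acquire("/projects/plan.docx", site="LON")  # London must wait
locks.release("/projects/plan.docx", site="NYC")             # NYC's change lands
assert locks.acquire("/projects/plan.docx", site="LON")      # now London may edit
```

Because the second site's write is simply refused until the first change is incorporated, divergent copies of the file can never come into existence in the first place.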
How Data Consolidation and Centralization Meet These Challenges
The most effective means of addressing these issues is the consolidation of all of a company’s data into a single, centralized, authoritative instance. In such a system, all users have concurrent access to that common dataset, which appears to them as though it is local.
With this arrangement, users at various company sites are never working with disparate versions of what is supposedly the same data. Instead, they all interact in a controlled fashion with the central dataset. The files that are currently in use at a particular location, and only those, are locally cached. Then, by the use of delta differencing, only the changes to the cached version of those files are transmitted back to the central instance, minimizing latency effects due to WAN bandwidth limitations.
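The caching behavior described above can be modeled with a simple read-through cache. This sketch is a stand-in, not the actual product mechanism: the in-memory `central` store, the TTL-based purge, and all names here are assumptions for illustration:

```python
import time


class BranchCache:
    """Toy read-through cache for a branch office.

    `central` stands in for the authoritative central file store;
    only files actually read at this site ever land in the cache.
    """

    def __init__(self, central: dict, ttl_seconds: float = 3600.0):
        self.central = central
        self.ttl = ttl_seconds
        self._cache = {}  # path -> (fetched_at, data)

    def read(self, path: str) -> bytes:
        entry = self._cache.get(path)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]                    # warm hit: served locally, no WAN trip
        data = self.central[path]              # miss: fetch from the central instance
        self._cache[path] = (time.time(), data)
        return data

    def purge_stale(self) -> None:
        """Drop entries older than the TTL, mirroring automatic purging of stale data."""
        now = time.time()
        self._cache = {p: e for p, e in self._cache.items()
                       if now - e[0] < self.ttl}


central = {"/reports/q2.xlsx": b"spreadsheet bytes"}
cache = BranchCache(central)
assert cache.read("/reports/q2.xlsx") == b"spreadsheet bytes"  # first read: WAN fetch
assert "/reports/q2.xlsx" in cache._cache                      # subsequent reads are local
```

Combined with delta differencing on the write path, this keeps WAN traffic proportional to what each site actually touches and changes, rather than to the total size of the company's data.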
Because users always interact with the authoritative version of the files they use, the necessity of data normalization is avoided – data must conform to the expected format before it can be entered into the system.
One of the greatest benefits of consolidating an organization’s data into a single, centralized instance is that it allows state-of-the-art data security, backup/restore, and disaster recovery protections to be implemented once at the central location. There’s no need for ROBO sites to concern themselves with developing such solutions – they can be designed, implemented, and managed by the IT organization’s most expert and highly skilled staff members.
The Talon FAST™ Solution
The Talon FAST™ Fabric is an example of a modern software-defined storage (SDS) solution that’s especially designed to help global enterprises meet the challenges of handling data that is widely dispersed in remote locations. It incorporates all the features necessary to facilitate the consolidation of all a company’s data into a central instance.
The Talon FAST™ software, which runs on Windows Server 2012 and above, makes use of intelligent file caching and delta differencing to ensure that only the data currently in use is present at remote sites. Only changes to that cached data are transmitted back to the central dataset instance. “Stale” cached data is automatically purged over time. In addition, Talon FAST™ employs global file locking to ensure data consistency by preventing simultaneous changes to the same file.
Because it is storage-agnostic, Talon FAST™ allows customers to leverage their existing IT infrastructure without being required to purchase specialized storage or server hardware.
With its SDS foundation, Talon FAST™ is well positioned to help enterprises overcome the challenges of data consolidation and centralization, both now and well into the future. If you would like to know more about how Talon FAST™ can help your company get its data under control, please watch this brief video to learn more about the Talon FAST™ solution.