Trends in Storage – Then and Now

October 7th, 2015

Steve Adil
Disk Products Manager

 

It’s 1980. Having just completed my first year in IT as a COBOL programmer, I find myself bored. The job does not keep me busy enough, and no one has any ideas about what else I can do, so I look around. There’s this guy they call a “data administrator,” who comes through now and then and always seems busy. Our company is a beta site for a new IBM software tool called “Hierarchical Storage Manager,” which is his big project. I ask if he needs help. I’m now the company’s second storage administrator.

Back in 1979, storage was something that programmers did on the side. Every application team had their “DASD volumes” and was responsible for performance and availability. Every time something broke, the application team would work with Operations to try to get the data back. It sometimes took days or weeks, and sometimes the data would never be retrieved. At that point, it would have to be re-created. DASD utilization levels were typically at 30-40%, and data usage patterns were not considered when trying to improve efficiency.

IBM brought HSM to market to help customers lower their costs. Implementing HSM meant having some control over the storage. That, coupled with the application teams’ growing difficulty and dissatisfaction with managing storage, produced a new paradigm: centralized storage management. This meant making our team responsible for the day-to-day care and feeding of storage resources, but the big change was coming: standards.

Another dramatic new technology came out around the same time – the IBM 3850 Mass Storage System. This technology used tape drives, with automation, to emulate disk. That is the reverse of the trend we see today (using disk to emulate tape), and it was driven by the need to reduce cost. IBM was able to make this work because disk had not yet become dramatically faster than tape, as it is today. The MSS, as it was called, used round tape cartridges that resided in a honeycomb of cells. When one was called for, a robotic arm picked out the cartridge and mounted it on a drive. The data was then read to a disk buffer in anticipation of being used by an application. The tapes were “pooled” – grouped into small, medium and large. Applications were required to allocate space through a new process built on an already developing change: standards.

HSM’s influence on standards was to introduce dataset naming conventions. These were needed to identify how data was to be used. At my company, the standard called for identifying the owning application through the first “node” in the dataset name – also referred to as the “high-level qualifier.” The next node identified another aspect of usage; the third, something else. So, a dataset name would look something like PLSD.VSAM1.PROVIDER. The first node identified the owner as the Personal Lines Systems Division. The second node defined the type of dataset it was, and the third identified that it was the Provider database. A test copy of the same file could be TST.VSAM1.PROVIDER. The standard now allowed HSM to decide how to treat the files. HSM’s role was to move qualifying datasets off of expensive DASD onto cheaper DASD or tape, compressing them along the way to save expensive resources. The PLSD qualifier told HSM that the data was not to be touched. The TST qualifier was defined to HSM as migratable to disk after 30 days of no access, and to tape after 120 days.
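
To make that concrete, here is a minimal sketch, in Python rather than actual HSM control statements, of how a qualifier-driven policy like this might be expressed. The qualifiers and day counts come from the example above; the table and function names are invented purely for illustration.

# Hypothetical sketch: mapping high-level qualifiers to migration rules,
# loosely modeled on the policies described above. Not actual HSM syntax.

MIGRATION_POLICY = {
    "PLSD": None,                                       # production data: never migrated
    "TST": {"to_disk_days": 30, "to_tape_days": 120},   # test data: migrate when idle
}

def migration_action(dataset_name, days_since_last_access):
    """Decide what to do with a dataset based on its high-level qualifier."""
    hlq = dataset_name.split(".")[0]        # first node, e.g. "TST" in TST.VSAM1.PROVIDER
    policy = MIGRATION_POLICY.get(hlq)
    if policy is None:                      # unknown or protected qualifiers are left alone
        return "keep on primary DASD"
    if days_since_last_access >= policy["to_tape_days"]:
        return "migrate to tape (compressed)"
    if days_since_last_access >= policy["to_disk_days"]:
        return "migrate to cheaper DASD (compressed)"
    return "keep on primary DASD"

print(migration_action("PLSD.VSAM1.PROVIDER", 400))   # keep on primary DASD
print(migration_action("TST.VSAM1.PROVIDER", 45))     # migrate to cheaper DASD (compressed)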

The MSS influenced standards in a different way. Originally, disk volumes were allocated to applications simply by specifying the type of volume to be used, along with the name of a disk volume assigned to that application. Because the MSS created virtual disk volumes and pooled them in “Mass Storage Volume Groups” (MSVGP) according to user-defined criteria (in our case, small, medium and large data requirements), the application no longer used a volume name to allocate. Instead, a new methodology was employed: esoteric group names. The unit name, specified in a job’s Job Control Language (JCL), would be UNIT=DASD if the application needed a disk volume to be made available. UNIT=TAPE would cause a tape to be mounted. If the job specified UNIT=3330V, an MSS virtual disk volume would be selected from one of the esoteric pools, depending on the volume group selected.
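
For readers who never wrote JCL, here is a rough sketch, in Python rather than JCL, of the idea behind esoteric pooling: the job names a class of storage, and the system picks a volume from the matching group. The volume names and size groupings below are invented for illustration.

# Hypothetical sketch of esoteric pooling. A job asks for a unit name,
# and a volume is chosen from the corresponding pool.

POOLS = {
    "DASD": ["DSK001", "DSK002"],          # real disk volumes
    "TAPE": ["TAP001", "TAP002"],          # tape drives
    "3330V": {                             # MSS virtual volumes, pooled by MSVGP
        "SMALL": ["MSV101", "MSV102"],
        "MEDIUM": ["MSV201"],
        "LARGE": ["MSV301"],
    },
}

def allocate(unit, msvgp=None):
    """Return a volume for the unit name given in the JCL (and the MSVGP, if virtual)."""
    pool = POOLS[unit]
    if isinstance(pool, dict):             # MSS case: choose from the requested volume group
        pool = pool[msvgp]
    return pool[0]                         # a real allocator would balance across the pool

print(allocate("DASD"))                    # e.g. DSK001
print(allocate("3330V", msvgp="MEDIUM"))   # a virtual volume from the medium group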

So, we have now implemented standards for dataset names and for the type of storage to be allocated to a job. Because of this control, we can now put some policies into effect. However, policies will not be effective without participation.

The third trend that began in the 1980s was the movement of storage responsibilities from the application groups to a centralized Storage Management group. The Storage group was responsible for setting up applications, and their batch jobs, with the standards for how their datasets were to be named and the storage pools they were authorized to use. In the beginning, it was up to the applications to monitor and manage the amount of available space within the pools of storage they were assigned. This forced participation in the standards by everyone who shared the disk (and tape) resources owned by the organization.

Next came automation. Moving the tasks of creating and managing storage to a central group dramatically improved the efficiency and effectiveness of the storage resources, but as the organization grew, the job of managing storage became more and more complex and difficult. IBM responded to this trend with an OS-based facility known as System Managed Storage (SMS). Built into the Mainframe’s storage layer, SMS was able to provide selectable classes of storage to applications for their requirements. It integrated naming conventions, the concept of esoteric pooling, and hierarchical storage management into one facility. Based on just the dataset name, SMS could decide the type of storage to allocate: fast, slow, backed up, not backed up, etc. It could also determine how to treat the data over its lifecycle – should it be migrated to cheaper media, and when?

These fundamental changes were pretty much set in concrete by the end of the 1980s, and haven’t changed dramatically since then. By this time, I had gotten a new job – creating a centralized storage function for a fairly large banking organization. After designing and implementing standards, I moved on to implementing SMS. After a few years, I saw the opportunity to build on this structure and further improve storage management efficiency and effectiveness. I developed software that automated how the storage process looked to my users. Formerly, they would fill out a form to request storage services. Our team would then perform the necessary work to get them set up, so their jobs would get the resources they needed. The software that I wrote replaced both the forms and the process performed by my team. Instead of making calculations for storage requirements and entering them on a form, the application owner would enter the “logical” requirements into the software. The software would calculate the “physical” requirements as they mapped to the storage resources. I took this a step further: I used capacity forecasts that were submitted during budgeting season and subtracted the storage requirements for a particular project from the overall project allocation. Once approved, the environment was created for the necessary storage to be used by the application for the new project. I was then able to provide reports to the organization’s management on where storage was allocated, as well as who was within budget.
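
As a rough, hypothetical sketch of that workflow (the growth and overhead factors and the budget figures below are invented for illustration, not the actual calculations the tool used): the owner states logical requirements, the tool computes a physical figure and draws it down from the project’s approved allocation.

# Hypothetical sketch of a self-service storage request with budget tracking.

PROJECT_BUDGET_GB = {"PROVIDER-REWRITE": 500}      # approved during budgeting season (invented)

def physical_requirement_gb(logical_gb, annual_growth=0.25, overhead=0.15):
    """Translate a logical request into physical capacity (a year of growth plus overhead)."""
    return round(logical_gb * (1 + annual_growth) * (1 + overhead), 1)

def approve(project, logical_gb):
    """Deduct the physical requirement from the project's remaining allocation."""
    needed = physical_requirement_gb(logical_gb)
    remaining = PROJECT_BUDGET_GB[project]
    if needed > remaining:
        return f"{project}: request for {needed} GB exceeds remaining budget of {remaining} GB"
    PROJECT_BUDGET_GB[project] = remaining - needed
    return f"{project}: {needed} GB allocated, {PROJECT_BUDGET_GB[project]} GB of budget left"

print(approve("PROVIDER-REWRITE", 200))   # fits within the allocation
print(approve("PROVIDER-REWRITE", 300))   # flagged as over budget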

So, this all sounds good, right? But how much effect can it have overall? Well, I have a good proof point: during the banking crisis of the mid-1980s, our main competitor was unable to control capital expenditures and ultimately went out of business. My storage team was able to conserve over $3 million in capital over a two-year period. How, you ask? Since I had a high level of insight into how storage was used, I was able to map out where resources could be reallocated or otherwise harvested for the important projects. I have to think that went a long way toward preserving our viability.

Today’s Trends

All of the disciplines described above went out the door in the move to “client/server” processing environments in the 1990s. Once again, individual applications owned not only their storage but the processing environment as well, and were responsible for the care and feeding of the hardware. Typically, the storage used in these environments was not as robust as what was deployed on mainframes. Once the reality of managing the hardware became clear to the application owners, they began to turn to centralized IT for help.

The first trend to surface was data protection. Again, IBM saw the need, took the mainframe HSM software, and redeployed it for this new environment. Originally called ADSTAR Distributed Storage Manager (ADSM), it was HSM recoded to work on Open platforms such as the RS/6000 and S/36. ADSM evolved into Tivoli Storage Manager (TSM), and from the beginning it was a robust backup technology. It quickly evolved to cover x86 and all the major Open platforms. TSM is still a major player in the market, and is now called Spectrum Protect.

Next, Storage Resource Management (SRM) tools came out. To help IT departments get control over resource consumption and stay ahead of application hardware requirements, these tools tracked hardware utilization. AppIQ was one of the early players (founded by Ash Ashutosh, today’s Actifio founder) and is still around after being purchased by HP. Others quickly followed, and the SRM evolution is paying dividends with each new feature. Mainline has been very successful in assisting our customers with storage assessments, using SRM tools to gather pertinent data. Usage patterns are analyzed and recommendations are made to improve efficiency. This is an important step in setting the environment up for policy-based storage management. Once usage patterns are established, policies can be constructed to maximize the efficient use of storage resources. Once policies are established, the organization has a base on which to create automated storage management processes. And in order to implement policies, there needs to be centralized management of the storage layer.

Along the way, the involvement of a centralized storage management team has ebbed and flowed. IT organizations see the value in consolidating this effort, yet at the same time, hot new requirements or technology groups go around the central IT group and deploy their own environments. Often, these do not conform to the standards set forth by central IT. This dynamic gave rise to the next trend: software defined storage. The move toward Software Defined Storage (SDS) married the need for central control of resources with the reality that many organizations could not realistically move toward a centralized management model.

SDS is not convergence or hyperconvergence, yet it is a central component of a converged environment. It can also stand alone and does not require convergence. SDS is partly virtualization of the storage infrastructure. It becomes easier to understand if you look at what server virtualization did for that environment. Technologies like VMware took day-to-day physical management of the server and replaced it with a hypervisor. Consolidation of platforms onto fewer physical servers, and automation of the underlying management tasks, greatly improved efficiency and effectiveness. The same is happening in storage management. Pooling, as it looked on the Mainframe, now works well in the Open world. It is eerily similar to what the 3850 MSS did for storage management in 1978.

Virtualization also consolidates the toolset used by storage managers. Instead of a different tool for point-in-time or remote replication, thin provisioning, performance tiering, etc. from each array, one common toolset is used no matter what the underlying storage infrastructure looks like. Taking LUNs (logical unit numbers) from various arrays and pooling them by class of service makes provisioning storage much simpler.
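
As a sketch of that pooling idea, assuming invented array, LUN and class-of-service names rather than any particular virtualization product, the layer sits between requesters and arrays and hands out capacity by class:

# Hypothetical sketch of storage virtualization: LUNs from different arrays are
# grouped into class-of-service pools, and volumes are provisioned from a pool
# without the requester knowing which array is underneath.

from collections import defaultdict

class VirtualizationLayer:
    def __init__(self):
        self.pools = defaultdict(list)     # class of service -> backing LUNs

    def add_lun(self, array, lun, free_gb, service_class):
        self.pools[service_class].append({"array": array, "lun": lun, "free_gb": free_gb})

    def provision(self, size_gb, service_class):
        """Carve a virtual volume out of any backing LUN in the requested class."""
        for backing in self.pools[service_class]:
            if backing["free_gb"] >= size_gb:
                backing["free_gb"] -= size_gb
                return f"{size_gb} GB in class {service_class} (backed by {backing['array']}/{backing['lun']})"
        return f"no capacity left in class {service_class}"

layer = VirtualizationLayer()
layer.add_lun("ArrayA", "LUN01", 1000, "GOLD")
layer.add_lun("ArrayB", "LUN07", 2000, "SILVER")
print(layer.provision(500, "GOLD"))        # requester sees only the class, not the array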

Virtualization is a major component of SDS, and other aspects provide benefits as well. SDS consolidates the different ways that storage is mapped to hosts. Block and file environments are more easily unified. Object storage technologies become just another choice for the application. SDS also enables consolidation of management into automated policies designed to promote efficiency and effectiveness.

The top layer of the SDS infrastructure is the policy management engine. SDS promises to do for non-mainframe environments what System Managed Storage did for the Mainframe: dramatically improve the storage environment. The ultimate goal of the policy layer is to enable clients to automatically select the correct area of the storage infrastructure for their particular application requirements. Clients can then manage the data’s lifecycle and locality based on the predefined requirements for that data, as well as the characteristics of how the data is accessed over its lifespan.
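
A rough illustration of what such a policy layer might do, assuming a made-up set of storage classes, attributes and costs (none of this reflects a particular SDS product): the application states its requirements, and the engine picks the cheapest class that satisfies them.

# Hypothetical sketch of an SDS policy engine.

STORAGE_CLASSES = [
    {"name": "flash",   "max_latency_ms": 1,  "replicated": True,  "cost_per_gb": 0.50},
    {"name": "hybrid",  "max_latency_ms": 5,  "replicated": True,  "cost_per_gb": 0.20},
    {"name": "archive", "max_latency_ms": 50, "replicated": False, "cost_per_gb": 0.03},
]

def place(latency_ms_required, needs_replication):
    """Pick the cheapest class meeting the latency and protection requirements."""
    candidates = [c for c in STORAGE_CLASSES
                  if c["max_latency_ms"] <= latency_ms_required
                  and (c["replicated"] or not needs_replication)]
    if not candidates:
        return "no class satisfies the request"
    return min(candidates, key=lambda c: c["cost_per_gb"])["name"]

print(place(latency_ms_required=5, needs_replication=True))     # hybrid
print(place(latency_ms_required=100, needs_replication=False))  # archive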

Multi-tenancy makes centralized policies available to a diverse set of clients and enables the SDS infrastructure to dynamically respond to both current and future storage requirements as they evolve. The end state of an SDS environment is automation and simplicity. Complex organizations can leverage that simplicity, through the consolidated toolset, to increase responsiveness to requirements and reduce the number of technology touch points. Training of new hires becomes less intensive, and the skill set required to manage storage can be streamlined. Finally, organizations that do not have deep storage skills can leverage the technology to achieve a sophisticated storage structure.

Use Case Differentiation

We have reviewed the trends toward standardization and automation of storage environments. Software Defined Storage has been compared to Converged/Hyperconverged structures. How do these – and the traditional hardware-centric landscape – compare in how they are used?

Hardware-centric infrastructures depend on the IT department to integrate the components. The positive side of this is the ability to decide upon “best of breed” technologies and to build custom solutions that perform to the highest levels, from both a process and a price/performance perspective. The downside is the difficulty posed by change – either from the organizational side or from the technology side. Staff changes require intensive training of the new hire to understand and manage the environment. Technology changes are often disruptive from both a process standpoint and a financial standpoint.

Converged infrastructures provide simplification of not only the infrastructure itself but of the support requirement. Convergence is a vendor-specific phenomenon in that the solution is provided from a single source even if some of the underlying components are provided by different vendors. Because of this single sourcing, the support structure becomes simplified. Organizational or technological change becomes much easier to manage.

The SDS option can be a challenge. The positive side is freedom from hardware constraints. Commoditization of the storage layer reduces cost, increases flexibility and improves responsiveness. The challenge lies in the increased dependence on the IT department to integrate and manage the components. As with the hardware-centric choice, deployment becomes an IT responsibility. Day-to-day management can be difficult, and availability is often not as robust. Technology change is up to the customer to evaluate and integrate. Over time, providers of SDS options will get better at packaging solutions that reduce dependency on the customer, so this development bears watching.

The Future

The pundits who follow trends in storage commonly predict a move toward software defined infrastructures. They see IT migrating toward packaged solutions that can be ordered with a minimum number of SKUs and easily customized for their specific needs. Commoditization of the hardware layer allows customers to choose cost-effective solutions and frees them from real or perceived vendor lock-in.

I agree that this is an important future trend, and at the same time, I see resistance to that approach. Many organizations have highly evolved IT staffs, and they see the move toward software defined management of a commoditized hardware layer as too risky. They typically have a good relationship with their hardware vendor and leave the integration of the various components to that vendor. Software defined infrastructures leave the IT organization in charge of integration and support, something that evolved organizations shy away from. Convergence and hyperconvergence can moderate the integration/support issue, but then the organization faces “hyper vendor lock.”

My belief is that we will see a move toward software defined/converged infrastructures where it makes sense for each situation. When the choices are designed to coexist with other architectures, it’s not an all-or-nothing decision. Converged and hyperconverged infrastructures may face the same dynamic. SDS will probably not become as pervasive as the pundits would have you believe, and at the same time, it will drive the vendor community toward more non-proprietary solutions.
