z Systems Solutions Consultant
The mainframe houses most of the enterprise’s business data, even in today’s highly distributed industries. The mainframe is known for its extremely high transaction rates, and continues to be the most secure platform in the industry. Companies are choosing to maintain this infrastructure because of the value it provides and the tremendously high cost of moving the applications to another platform.
Companies are also searching for the best and easiest ways to make that mainframe data accessible to the business. So, that leads us into the world of analytics.
What is the problem?
Around the turn of the century, Extract, Transform and Load (ETL) tools emerged, leading to a proliferation of data warehouses and data marts. These tools eliminated the need to write home-grown applications to move mainframe data to the data warehouses. At the time, this was received as a triumph for the IT industry. As the years went by, the IT industry came to see these ETL tools as a growing expense item, due to the staff required to maintain not only the tool, but also all the logic defined for the ETL process at the extract and load sites. In addition, these tools ran on mainframe General Purpose processors, leading to an increase in mainframe software costs to support the ETL process.
Besides the ETL complexities and escalating costs, data scientists were not getting the data they wanted in a timely fashion. The ETL process left them looking at day-old, or even week-old, data, so decisions were made on outdated assumptions. Day-old and week-old data fell short of meeting the needs of business analysts and data scientists. The industry began searching for faster, more real-time access to the mainframe data.
IBM Open Data Analytics for z/OS and Apache Spark
IBM and Rocket Software have collaborated, using Apache Spark and a proprietary “optimized data layer” from Rocket Software, to enable access to mainframe data such as DB2, CICS, VSAM, IDMS, Sequential and other data types, as well as distributed data such as Oracle and SQL Server. The solution can also access Hadoop data, enabling a complete view of the customer.
In July 2017, IBM announced the IBM Open Data Analytics for z/OS product, the next evolution of the z/OS Platform for Apache Spark that was announced in 2016. Whereas the earlier product supported only Apache Spark, the new product also enables Python applications to connect to mainframe data through Apache Spark running in z/OS. It also supports the Anaconda math, science and parallel computing packages. With Anaconda and Python, data analysts and data scientists have access to hundreds of the industry’s data science libraries, thereby shortening the time to develop queries.
The IBM Open Data Analytics for z/OS product provides an Eclipse-based tool that data analysts use to quickly create, test and validate queries against federated test data. Once a query is validated, the data scientist executes it against the mainframe data and other federated data sources, leading to highly accurate business decisions based on near real-time data.
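To make the "validate on federated test data first" idea concrete, here is a minimal sketch in Python. It does not use the IBM or Rocket Software tooling or APIs. As a stand-in, it uses Python's built-in sqlite3 module with two separate in-memory databases that play the roles of a mainframe source and a distributed source. The table names (accounts, web_orders), the schema name (distributed) and the sample rows are all hypothetical and exist only for illustration.

```python
import sqlite3


def build_test_sources() -> sqlite3.Connection:
    """Create two separate in-memory databases holding small test data.

    Assumption: 'main' stands in for a mainframe source and 'distributed'
    stands in for a distributed source. Neither is a real product API.
    """
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent in-memory database under its own schema.
    conn.execute("ATTACH DATABASE ':memory:' AS distributed")
    conn.execute(
        "CREATE TABLE main.accounts (customer_id INTEGER, balance REAL)"
    )
    conn.execute(
        "CREATE TABLE distributed.web_orders (customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO main.accounts VALUES (?, ?)",
        [(1, 500.0), (2, 120.0)],
    )
    conn.executemany(
        "INSERT INTO distributed.web_orders VALUES (?, ?)",
        [(1, 40.0), (1, 60.0), (2, 15.0)],
    )
    return conn


# One query that joins data across both sources, so the analyst sees a
# single combined view of each customer.
FEDERATED_QUERY = """
SELECT a.customer_id, a.balance, SUM(o.amount) AS total_orders
FROM main.accounts AS a
JOIN distributed.web_orders AS o ON o.customer_id = a.customer_id
GROUP BY a.customer_id, a.balance
ORDER BY a.customer_id
"""


def run_query(conn: sqlite3.Connection) -> list:
    """Run the federated query and return all result rows."""
    return conn.execute(FEDERATED_QUERY).fetchall()


conn = build_test_sources()
rows = run_query(conn)
for row in rows:
    print(row)
```

The point of the sketch is the workflow, not the technology: the query text is checked against small, known test data, and the same query text could then be pointed at the production sources once its results look right.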
Together with the new z14 mainframe capabilities and the IBM specialty processors called zIIPs, IBM Open Data Analytics for z/OS offers the analytics industry quick and efficient access to mainframe data. This will be explored in a series of blog posts:
- The Financial Business Case for IBM Open Data Analytics for z/OS
- IBM Open Data Analytics for z/OS Proof of Technology – What to Know to Begin Your Project
- IBM Open Data Analytics for z/OS Proof of Technology – How to Develop POT Use Cases that Make Sense
Watch for the next blog post, which will outline the development of a business case for IBM Open Data Analytics for z/OS.
For further information, please reach out to Marianne Eggett at email@example.com or your local Mainline Account Executive to learn how IBM Open Data Analytics for z/OS can help your IT infrastructure.