We’ve determined that the best way to help people understand the power of The Single Pass Optimization Engine is to contrast it–and begin to integrate it–with the better known Apache Spark.
The POC will be composed of various runs of Spark on z/OS using GenevaERS components in some, and ultimately full native GenevaERS.
GenevaERS Spark POC Approach
The configurations run include the following:
- The initial execution to produce the required outputs will be native Spark, on an open platform.
- Spark on z/OS, utilizing JZOS as the IO engine. JZOS is a set of Java Utilities to access z/OS facilities. They are C++ routines having Java wrappers, allowing them to be included easily in Spark. Execution of this process will all be on z/OS.
- The first set of code planned for release for GenevaERS is a set of utilities that perform GenevaERS encapsulated functions, such as GenevaERS overlapped BSAM IO, and the unique GenevaERS join algorithm. As part of this POC, we’ll encapsulate these modules, written in z/OS Assembler, in Java wrappers, to allow them to be called by Spark.
- If time permits, we’ll switch out the standard Spark join processes for the GenevaERS utility module.
- The last test is native GenevaERS execution.
The results of this POC will start to contribute to this type of architectural choice framework, which will highlight where GenevaERS’s single pass optimization can enhance Apache Spark’s formidable capabilities.
Note that the POC scope will not test the efficiencies of the Map and Reduce functions directly, but will provide the basis upon which that work can be done.
Start Contributing
Open source is all about people making contributions. Expertise is needed in all types off efforts to carry of the POC.
- Data preparation. We are working to build a data set that can be used on either z/OS or open systems to provide a fair comparison for either platform, but with enough volume to test some performance. Work here includes scripting, data analysis, and cross platform capabilities and design.
- Spark. We want some good Spark code, that uses the power of that system, and makes the POC give real world results. Expertise needed includes writing Spark functions, data design, and turning.
- Java JNI. Java Native Interface is the means by which one calls other language routines under Java, and thus under Spark. Assistance can be used in helping to encapsulate the GenevaERS utility function, GVBUR20 to perform fast IO for our test.
- GenevaERS. The configuration we create we hope to be able to extract as GenevaERS VDP XML, and provide it as a download for initial installation testing. A similar goal with the sample JCL that will be provided. GenevaERS expertise in these fields is needed.
- Documentation, Repository Work, and on and on and on. At the end of drafting this blog entry, facing the distinct chance it will be released with typos and other problems, we recognize we could use help in many more areas.
The focus for our work is this Spark-POC repository. Clone, fork, edit, commit, create pull request, repeat.
New to git? or want to see our standing meeting cadence, check out the How We Work page for more information.