Tape/Z is an evolving toolkit for analysing HLASM (High Level Assembler) code. The library supports working with mainframe assembler code, including parsing, control flow analysis, dependency tracing, and flowchart visualization.
- Project Overview
- Getting Started
- Programmatic Usage
- CLI Usage
- Workflow
- Analysis Pipeline
- Useful Neo4J Queries
- Contributing
- Reporting Issues
- A Note on Copyright
- License
Tape/Z is designed to parse, analyse, and process HLASM (High Level Assembler) code, which is commonly used in mainframe environments. The project uses ANTLR4 to define the grammar for HLASM instructions and provides tools for working with parsed HLASM code.
Philosophically, this is more a set of tools intended for use in your own projects.
Internally, it reuses many of the components and much of the class infrastructure of Cobol-REKT, and is intended to be a sibling project to that one.
- HLASM Parsing: Parses HLASM code including labels, instructions, operands, and comments
- Embedded SQL Support: Recognizes and parses DB2 SQL statements embedded in HLASM code
- Macro Expansion: Handles macro definitions and expansions, including copybook inclusion
- Control Flow Analysis: Builds control flow graphs (CFG) to visualize program execution paths
- Dependency Tracing: Identifies and tracks dependencies between HLASM modules
- Cyclomatic Complexity: Calculates cyclomatic complexity metrics for code sections
- Neo4J Integration: Stores analysis results in a Neo4J graph database for advanced querying
- API Access: Provides a Model Context Protocol (MCP) server for programmatic access to analysis capabilities
Before you begin, ensure you have the following installed:
- Java 21 or higher
- Maven 3.6 or higher
- Neo4J (optional, for graph storage)
- Clone the repository:
  git clone --recurse-submodules -j8 https://github.com/avishek-sen-gupta/tape-z.git
  cd tape-z
  Or, if you have already cloned the repository without submodules, you can use:
- Build the project:
- Set up environment variables for Neo4J (if using):
  export NEO4J_URI=bolt://localhost:7687
  export NEO4J_USERNAME=neo4j
  export NEO4J_PASSWORD=your_password
- (Optional) Install Neo4J:
  - Download from Neo4J Download Page
  - Follow the installation instructions for your platform
  - Start the Neo4J server before running any code which needs Neo4J integration
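If your own code needs the same Neo4J settings, one way to pick them up is sketched below. This is only an illustration: `Neo4jSettings` and `Neo4jSettingsDemo` are hypothetical names, not Tape/Z classes, and the fallback values for the URI and username are assumptions taken from the example exports above.

```java
import java.util.Map;

// Hypothetical holder for the three connection settings; not a Tape/Z class.
record Neo4jSettings(String uri, String username, String password) {

    // Reads the settings from a map of environment variables (normally System.getenv()).
    // The defaults for URI and username are assumptions based on the README example values.
    static Neo4jSettings fromEnvironment(Map<String, String> env) {
        String uri = env.getOrDefault("NEO4J_URI", "bolt://localhost:7687");
        String username = env.getOrDefault("NEO4J_USERNAME", "neo4j");
        String password = env.get("NEO4J_PASSWORD");
        if (password == null || password.isBlank()) {
            // There is no sensible default for a password, so fail early.
            throw new IllegalStateException("NEO4J_PASSWORD is not set");
        }
        return new Neo4jSettings(uri, username, password);
    }
}

class Neo4jSettingsDemo {
    public static void main(String[] args) {
        try {
            Neo4jSettings settings = Neo4jSettings.fromEnvironment(System.getenv());
            System.out.println("Neo4J at " + settings.uri() + " as " + settings.username());
        } catch (IllegalStateException e) {
            System.out.println("Neo4J integration disabled: " + e.getMessage());
        }
    }
}
```

Taking the environment as a `Map` rather than calling `System.getenv()` inside the method keeps the lookup easy to exercise in tests.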
- See HLASMCFGMain for running the analysis pipeline.
- The HlasmCodeAnalysisResult structure contains the following important results:
  - controlFlowGraph is the Control Flow Graph.
  - complexitiesByLabel contains a map of sections and their cyclomatic complexities.
  - flattened contains the list of all instructions. These are TranspilerInstructions with the appropriate TranspilerNode instances.
  - dependencyMap contains the call relations between different HLASM programs. The technique for determining what constitutes a call to an external program is still somewhat specific, and will be refined later.
- Use ExportCFGToNeo4JTask to export the CFG to Neo4J.
- See HLASMFlowchartMain to see how to build a flowchart.
  - Pass in a VerbatimBasicBlockTextMaker instance if you do not want AI summarisation. Otherwise, pass in an AIBasicBlockTextMaker instance.
- Use ExportFlowchartToNeo4JTask to export the flowchart to Neo4J.
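To make the values in complexitiesByLabel easier to interpret, here is a minimal sketch of how cyclomatic complexity is conventionally derived from a control flow graph, using McCabe's formula M = E - N + 2P with P = 1 for a single connected section. `SectionGraph` is a hypothetical stand-in and not the Tape/Z graph type; the library's own calculation may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for one section's control flow graph; not a Tape/Z class.
class SectionGraph {
    private final Map<String, Set<String>> successors = new LinkedHashMap<>();

    // Registers both endpoints as nodes and records the directed edge.
    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        successors.computeIfAbsent(to, k -> new LinkedHashSet<>());
    }

    int nodeCount() {
        return successors.size();
    }

    int edgeCount() {
        return successors.values().stream().mapToInt(Set::size).sum();
    }

    // McCabe's formula M = E - N + 2P, assuming P = 1 (one connected section).
    int cyclomaticComplexity() {
        return edgeCount() - nodeCount() + 2;
    }
}

class CyclomaticComplexityDemo {
    public static void main(String[] args) {
        SectionGraph g = new SectionGraph();
        // A compare-and-branch whose two arms rejoin: one decision point.
        g.addEdge("ENTRY", "COMPARE");
        g.addEdge("COMPARE", "EQUAL");
        g.addEdge("COMPARE", "NOTEQUAL");
        g.addEdge("EQUAL", "EXIT");
        g.addEdge("NOTEQUAL", "EXIT");
        System.out.println(g.cyclomaticComplexity()); // 5 edges - 5 nodes + 2 = 2
    }
}
```

A straight-line section with no branches comes out at 1, and each additional conditional branch adds one.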
Tape/Z provides a command-line interface (CLI), built with PicoCLI and shipped as the tapez-cli JAR, with several commands for analysing and visualizing HLASM code. The outputs of these commands can be used for further analysis, visualization, or integration with other tools.
The CLI provides the following commands:
- cfg-to-json: Exports the Control Flow Graph (CFG) to JSON
- flowchart: Builds a flowchart for the entire program in one go
- flowchart-sections: Builds flowcharts for all sections of the program, section by section
The cfg-to-json command analyses a HLASM file and exports its control flow graph to JSON format.
Parameters:
- Path to the HLASM file to analyse (positional parameter)
- -c, --copybook: Path to the copybook directory (required)
- -o, --output: Path where the output JSON file will be written (required)
- -e, --external: Path for external programs (required)
Example:
The flowchart command builds a flowchart visualization for the entire HLASM program.
Parameters:
- HLASM program name to analyse (positional parameter)
- -s, --srcDir: The HLASM source directory (required)
- -cp, --copyBooksDir: Copybook directory (required)
- -o, --outputDir: Output directory (required)
- -e, --external: Path for external programs (required)
- -m, --model: Foundation model to use (optional)
Example:
NOTE: The command above requires an Ollama endpoint to be running to summarise the contents of the flowchart blocks. If you don't wish to do the summarisation, leave out the -m parameter.
The flowchart-sections command builds flowcharts for all sections of the HLASM program, section by section.
Parameters:
- HLASM program name to analyse (positional parameter)
- -s, --srcDir: The HLASM source directory (required)
- -cp, --copyBooksDir: Copybook directory (required)
- -o, --outputDir: Output directory (required)
- -e, --external: Path for external programs (required)
- -m, --model: Foundation model to use (optional)
Example:
To see all available commands and general help information:
To see help for a specific command:
The typical workflow is:
- HLASM code is parsed using the grammar from hlasm-parser and hlasm-format-loader
- The parsed code is analysed by hlasm-graph-loader using algorithms from mojo-common to build control flow graphs
- The analysis results are stored in Neo4J using the woof module
- The hlasm-mcp-server provides API access to the analysis capabilities and results
The library processes code through a pipeline which runs multiple passes on the code:
- File Reading: The source HLASM file is read line by line.
- Line Truncation: Lines are truncated beyond column 72, following HLASM standards.
- Macro Expansion: Macros are expanded, and copybooks are included.
- Label Block Extraction: Labeled blocks are identified and extracted.
- Line Continuation Handling: Continued lines are collapsed into single logical lines.
- HLASM Parsing: The code is parsed using the ANTLR4-generated parser.
- SQL Parsing: Embedded SQL statements are identified and parsed.
- Macro Processing: Both structured and unstructured macros are processed.
- External Call Resolution: External calls to other modules are resolved.
- Dependency Tracking: Dependencies between modules are identified and tracked.
- Code Flattening: The hierarchical code structure is flattened for analysis.
- Control Flow Graph Generation: A control flow graph is built from the flattened code.
- Cyclomatic Complexity Calculation: Complexity metrics are calculated for code sections.
- Independent Component Identification: Independent code components are identified.
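The two earliest text-level passes, line truncation and line continuation handling, can be sketched roughly as follows. This is a simplified illustration and not the Tape/Z implementation: it assumes the standard HLASM fixed format, where a non-blank character in column 72 marks a continued statement and the continuation text starts at column 16. `HlasmLinePrep` is a hypothetical class name, and the sketch ignores comment lines and alternate statement formats.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating truncation and continuation collapsing; not a Tape/Z class.
class HlasmLinePrep {
    static final int INDICATOR_COLUMN = 72; // continuation indicator (1-based column)
    static final int CONTINUE_COLUMN = 16;  // continued text starts here (1-based column)

    // Drops everything beyond column 72 (the sequence-number area).
    static String truncate(String line) {
        return line.length() > INDICATOR_COLUMN ? line.substring(0, INDICATOR_COLUMN) : line;
    }

    // A line is continued when column 72 holds a non-blank character.
    static boolean isContinued(String line) {
        return line.length() >= INDICATOR_COLUMN && line.charAt(INDICATOR_COLUMN - 1) != ' ';
    }

    // Columns 1-71 with trailing blanks removed, i.e. the statement without the indicator.
    static String statementField(String line) {
        String field = line.length() >= INDICATOR_COLUMN
                ? line.substring(0, INDICATOR_COLUMN - 1)
                : line;
        return field.stripTrailing();
    }

    // Collapses physical lines into logical lines.
    static List<String> collapse(List<String> rawLines) {
        List<String> logical = new ArrayList<>();
        StringBuilder current = null;
        for (String raw : rawLines) {
            String line = truncate(raw);
            String field = statementField(line);
            if (current == null) {
                current = new StringBuilder(field);
            } else {
                // Continuation line: only the text from column 16 onwards is appended.
                current.append(field.length() >= CONTINUE_COLUMN
                        ? field.substring(CONTINUE_COLUMN - 1)
                        : "");
            }
            if (!isContinued(line)) {
                logical.add(current.toString());
                current = null;
            }
        }
        if (current != null) {
            logical.add(current.toString()); // dangling continuation at end of input
        }
        return logical;
    }

    public static void main(String[] args) {
        String first = String.format("%-71s", "LABEL    MVC   TARGET,") + "X" + "00010000";
        String second = String.format("%-71s", " ".repeat(15) + "SOURCE") + " " + "00020000";
        System.out.println(collapse(List.of(first, second)));
    }
}
```

In the demo, the two physical lines collapse into the single logical line `LABEL    MVC   TARGET,SOURCE`, with the sequence numbers in columns 73-80 discarded.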
Identify dead code
Delete all nodes
Match the whole graph
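As an illustration of what identifying dead code means on a control flow graph, one common reading is "any node that cannot be reached from the entry point". The sketch below shows that idea in memory with a breadth-first search. It is not the Neo4J query and not Tape/Z code; `DeadCodeSketch` and the node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: finds nodes that are unreachable from the entry node.
class DeadCodeSketch {

    static Set<String> unreachable(Map<String, List<String>> successors, String entry) {
        // Breadth-first search from the entry node.
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(entry);
        queue.add(entry);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String next : successors.getOrDefault(node, List.of())) {
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        // Every known node (edge source or edge target) that was never visited is a candidate.
        Set<String> dead = new TreeSet<>(successors.keySet());
        successors.values().forEach(dead::addAll);
        dead.removeAll(visited);
        return dead;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cfg = Map.of(
                "ENTRY", List.of("LOOP"),
                "LOOP", List.of("LOOP", "EXIT"),
                "EXIT", List.of(),
                "ORPHAN", List.of("EXIT"));
        System.out.println(unreachable(cfg, "ENTRY")); // prints [ORPHAN]
    }
}
```

Unreachable nodes are only candidates: code reached through computed branches or external calls may not appear as an edge in the graph.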
Contributions to Tape/Z are welcome! Here's how you can contribute:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please make sure to update tests as appropriate and follow the existing code style.
If you encounter any bugs or have feature requests, please file an issue on the GitHub repository. When reporting issues, please include:
- A clear and descriptive title
- Steps to reproduce the issue, including a clear minimal example HLASM program where this issue occurs
- Expected behavior
- Actual behavior
- Any relevant logs or error messages
- Your environment (OS, Java version, etc.)
The DB2 grammar has been graciously borrowed from the eclipse-che4z COBOL support project, and thus (together with any changes) falls under the Eclipse Public License v2.0.
The rest of the code falls under the MIT License.
MIT License
Copyright (c) 2025 Avishek Sen Gupta
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.