MetaGraph is a tool for scalable construction of annotated genome graphs and sequence-to-graph alignment.
The default index representations in MetaGraph are extremely scalable and support building graphs with trillions of nodes and millions of annotation labels. At the same time, the provided workflows and their careful implementation, combined with low-level optimizations of the core data structures, enable exceptional query and alignment performance.
- Large-scale indexing of sequences
- Python API for querying in the server mode
- Encoding k-mer counts (e.g., expression values) and k-mer coordinates in source sequences (e.g., for lossless encoding of genomes)
- Sequence alignment against very large annotated graphs (sub-k seeding allows using arbitrarily short seeds)
- Scalable cleaning of very large de Bruijn graphs (to remove sequencing errors)
- Support for custom alphabets (e.g., {A,C,G,T,N} or amino acids)
- Algorithms for differential assembly
- Use of succinct data structures and efficient representation schemes for extremely high scalability
- Algorithmic choices that work efficiently with succinct data structures (e.g., always prefer batched operations)
- Modular support of different graph and annotation representations
- Use of generic and extensible interfaces to support adding custom index representations / algorithms with little code overhead
Online documentation is available at https://metagraph.ethz.ch/static/docs/index.html. Offline sources are here.
Install the latest release on Linux or Mac OS X with Anaconda:
If docker is available on the system, immediately get started with
and replace ${HOME} with a directory on the host system to map it under /mnt in the container.
By default, it executes the binary compiled for the DNA alphabet {A,C,G,T}. To run the binary compiled for the DNA5 or Protein alphabet, just replace metagraph with metagraph_DNA5 or metagraph_Protein, respectively, e.g.:
Running MetaGraph with docker is straightforward. The following command (or a similar one) can be handy for checking which directory is mounted in the container:
For more complex workflows, consider running docker in the interactive mode:
All different versions of the container image are listed here.
To compile from source (e.g., for builds with a custom alphabet or other configurations), see the online documentation.
- Build de Bruijn graph from Fasta files, FastQ files, or KMC k-mer counters:
  ./metagraph build
- Annotate graph using the column compressed annotation:
  ./metagraph annotate
- Transform the built annotation to a different annotation scheme:
  ./metagraph transform_anno
- Query annotated graph:
  ./metagraph query
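
The build, annotate, and query steps above can be pictured with a toy in-memory model. The sketch below is only a conceptual illustration in plain Python, not MetaGraph's implementation or API: it uses a dictionary instead of succinct data structures, and the function names, the sample labels, and the `min_fraction` parameter are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield all overlapping k-mers of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_and_annotate(labeled_seqs, k):
    """Toy 'build' + 'annotate': map each k-mer to the labels it occurs in."""
    index = defaultdict(set)
    for label, seq in labeled_seqs.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def query(index, seq, k, min_fraction):
    """Toy 'query': labels that contain at least min_fraction of the query k-mers."""
    query_kmers = list(kmers(seq, k))
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids inserting unseen k-mers into the defaultdict
        for label in index.get(kmer, ()):
            hits[label] += 1
    total = len(query_kmers)
    return {label: n / total for label, n in hits.items() if n / total >= min_fraction}


# Hypothetical input: two labeled sequences indexed with k = 4
index = build_and_annotate(
    {"sample_A": "ACGTACGTTG", "sample_B": "TTGACGTACC"}, k=4
)
strict = query(index, "ACGTACGT", k=4, min_fraction=0.9)
relaxed = query(index, "ACGTACGT", k=4, min_fraction=0.8)
```

Here `strict` keeps only `sample_A`, because one of the five query k-mers (`TACG`) is absent from `sample_B`; lowering the threshold to 0.8 admits both labels.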
./metagraph
- Cluster columns
Requires N*R/8 + 6*N^2 bytes of RAM, where N is the number of columns and R is the number of rows subsampled.
- Construct Multi-BRWT
Requires M*V/8 + Size(BRWT) bytes of RAM, where M is the number of rows in the annotation and V is the number of nodes merged concurrently.
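
The two RAM formulas above are easy to evaluate before launching a job. The following is a minimal sketch that simply transcribes them into Python; the function and parameter names are hypothetical, and the example numbers are made up for illustration rather than taken from a real dataset.

```python
def cluster_columns_ram_bytes(num_columns, num_rows_subsampled):
    """RAM for column clustering: N*R/8 + 6*N^2 bytes."""
    n, r = num_columns, num_rows_subsampled
    return n * r / 8 + 6 * n ** 2


def multi_brwt_ram_bytes(num_rows, nodes_merged_concurrently, brwt_size_bytes):
    """RAM for Multi-BRWT construction: M*V/8 + Size(BRWT) bytes."""
    m, v = num_rows, nodes_merged_concurrently
    return m * v / 8 + brwt_size_bytes


# Illustrative (made-up) inputs:
# 1,000 columns with 1,000,000 subsampled rows
clustering = cluster_columns_ram_bytes(1_000, 1_000_000)
# 1e9 rows, 40 nodes merged concurrently, and an assumed 2 GB BRWT
brwt = multi_brwt_ram_bytes(1_000_000_000, 40, 2_000_000_000)

print(f"clustering: {clustering / 1e6:.0f} MB")
print(f"multi-BRWT: {brwt / 1e9:.1f} GB")
```

Note that the clustering estimate grows quadratically in the number of columns, so for very wide annotations the `6*N^2` term dominates, while reducing the number of subsampled rows only shrinks the first term.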
See metagraph/tests/data/example.diff.json and metagraph/tests/data/example_simple.diff.json for sample files.
Stats for graph
Stats for annotation
Stats for both
Simply run docker build .
The Makefile in the top level source directory can be used to build and test metagraph more conveniently. The following arguments are supported:
- env: environment in which to compile/run ("": on the host, docker: in a docker container)
- alphabet: compile metagraph for a certain alphabet (e.g. DNA or Protein, default DNA)
- additional_cmake_args: additional arguments to pass to cmake.
Examples:
Creating a new version release is done in three steps:
- Update package.json and set the version
- Add a tag with that new version
- Make a new release on github
MetaGraph is distributed under the GPLv3 License (see LICENSE). Please find further information in the AUTHORS and COPYRIGHTS files.

