Using TaxoniumTools

Installing taxoniumtools

Taxoniumtools is available from PyPI. You can install it with pip.

pip install taxoniumtools

The usher_to_taxonium utility will then be available for use.

Using usher_to_taxonium from taxoniumtools


First get some files:


Then convert from UShER pb format to Taxonium jsonl format:

usher_to_taxonium --input tfci.pb --output tfci-taxonium.jsonl.gz --metadata tfci.meta.tsv.gz --genbank \
--columns genbank_accession,country,date,pangolin_lineage

You can then open that tfci-taxonium.jsonl.gz file at


Right now Taxoniumtools is limited in the types of genome annotations it can support. For SARS-CoV-2 we recommend using the exact modified .gb file we use in the example, which splits ORF1ab into ORF1a and ORF1b to avoid the need to model ribosome slippage.


Some people ask what the “L” in JSONL is for. JSONL means “JSON Lines”: each line of the file is a separate JSON object. In the Taxonium JSONL format, the very first line contains metadata about the tree as a whole, and each additional line contains information about a single node. It’s important to use the “jsonl” extension instead of “json”, as otherwise the interface may try to parse your tree as a Nextstrain JSON file.
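The layout described above (one header line, then one line per node) can be read with the Python standard library alone. The following is a minimal sketch, assuming a gzipped file such as the tfci-taxonium.jsonl.gz produced in the example; the function name read_taxonium_jsonl is hypothetical and is not part of taxoniumtools, and the sketch makes no assumption about which keys the objects contain.

```python
import gzip
import json


def read_taxonium_jsonl(path):
    """Return (header, nodes) from a gzipped JSON Lines file.

    The first line is parsed as the tree-level metadata object and every
    later non-empty line as one node object.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        header = json.loads(handle.readline())
        nodes = [json.loads(line) for line in handle if line.strip()]
    return header, nodes
```

For very large trees it may be preferable to iterate over the lines rather than collect every node in a list.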


This tool will convert an UShER protobuf file into a Taxonium file. At its simplest it just takes the -i and -o parameters, describing the input and output files. But for the most complete results you can add metadata, a reference genome, or even create a time tree.

Convert a Usher pb to Taxonium jsonl format

usage: usher_to_taxonium [-h] -i INPUT -o OUTPUT [-m METADATA] [-g GENBANK]
                         [-c COLUMNS] [-C]
                         [--chronumental_steps CHRONUMENTAL_STEPS]
                         [--chronumental_date_output CHRONUMENTAL_DATE_OUTPUT]
                         [--chronumental_tree_output CHRONUMENTAL_TREE_OUTPUT]
                         [--chronumental_reference_node CHRONUMENTAL_REFERENCE_NODE]
                         [-j CONFIG_JSON] [-t TITLE]
                         [--overlay_html OVERLAY_HTML] [--remove_after_pipe]
                         [--clade_types CLADE_TYPES] [--name_internal_nodes]
                         [--shear] [--shear_threshold SHEAR_THRESHOLD]
                         [--only_variable_sites] [--key_column KEY_COLUMN]
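When the conversion is driven from a script, the command line can be assembled programmatically. The sketch below builds an argument list using only flags that appear in the usage text above; the helper name build_usher_to_taxonium_command and its keyword arguments are hypothetical conveniences, not part of taxoniumtools.

```python
import shlex


def build_usher_to_taxonium_command(input_pb, output_jsonl, metadata=None,
                                    genbank=None, columns=None,
                                    chronumental=False):
    """Assemble an argument list from flags listed in the usage text."""
    command = ["usher_to_taxonium", "--input", input_pb,
               "--output", output_jsonl]
    if metadata:
        command += ["--metadata", metadata]
    if genbank:
        command += ["--genbank", genbank]
    if columns:
        # --columns takes a single comma-separated value.
        command += ["--columns", ",".join(columns)]
    if chronumental:
        command.append("--chronumental")
    return command


if __name__ == "__main__":
    cmd = build_usher_to_taxonium_command(
        "tfci.pb", "tfci-taxonium.jsonl.gz",
        metadata="tfci.meta.tsv.gz", columns=["country", "date"])
    # Print a shell-quoted form of the command for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once taxoniumtools is installed.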

Named Arguments

-i, --input

File path to input Usher protobuf file (.pb)

-o, --output

File path for output Taxonium jsonl file

-m, --metadata

File path for input metadata file (CSV/TSV)

-g, --genbank

File path for GenBank file containing reference genome (N.B. currently only one chromosome is supported, and no compound features)

-c, --columns

Column names to include in the metadata, separated by commas, e.g. pangolin_lineage,country

-C, --chronumental

Runs Chronumental to build a time tree. The metadata TSV must include a date column.

Default: False


--chronumental_steps

Number of steps to run Chronumental for


--chronumental_date_output

Optional output file for the Chronumental date table, if you want to keep it (a table mapping nodes to their inferred dates).


--chronumental_tree_output

Optional output file for the Chronumental time tree, in nwk format.


--chronumental_reference_node

A reference node to be used for Chronumental. This should be from early in the outbreak and have a well-defined date. If not set, the oldest sample will be picked automatically by Chronumental.

-j, --config_json

A JSON file to use as a config file containing things such as search parameters

-t, --title

A title for the tree. This will be shown at the top of the window as “[Title] - powered by Taxonium”


--overlay_html

A file containing HTML to put in the About box when this tree is loaded. This could contain information about who built the tree and what data was used.


--remove_after_pipe

If set, we will remove anything after a pipe (|) in each node’s name, after joining to metadata

Default: False
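As an illustration of the effect of --remove_after_pipe on a single name, the following sketch keeps only the text before the first pipe. The function and the example node name are invented for illustration and are not taken from taxoniumtools.

```python
def strip_after_pipe(name):
    """Keep only the part of a node name before the first pipe character."""
    return name.split("|", 1)[0]


# Hypothetical node name, invented for illustration.
print(strip_after_pipe("sample_1|ACC0001|2021-05-04"))  # prints sample_1
```

Names without a pipe are returned unchanged. Per the description above, the shortening happens after the join to metadata, so the metadata key must still match the full, unshortened name.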


--clade_types

Optionally specify clade types provided in the UShER file, comma separated, e.g. ‘nextstrain,pango’. The order must match that used in the UShER pb file. If you haven’t specifically annotated clades in your protobuf, don’t use this.


--name_internal_nodes

If set, we will name internal nodes node_xxx

Default: False


--shear

If set, we will ‘shear’ the tree. This will iterate over all nodes. If a particular sub-branch makes up fewer than e.g. 1/1000 of the total descendants, then in most cases it represents a sequencing error. (But it could also represent recombinants, or a real, unfit branch.) We remove these to simplify the interpretation of the tree.

Default: False


--shear_threshold

Threshold for shearing. The default is 1000, meaning branches will be removed if they make up fewer than 1/1000 of the nodes. Has no effect unless --shear is set.

Default: 1000


--only_variable_sites

Only store information about the root sequence at a particular position if there is variation at that position somewhere in the tree. This helps to speed up the loading of larger genomes such as MPXV.

Default: False
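The idea behind --only_variable_sites can be sketched with a toy example: rather than keeping every base of the root sequence, keep only the bases at positions where some mutation occurs. This is an illustration under stated assumptions (1-based positions, mutated positions supplied as a set) and does not reflect how taxoniumtools represents this data internally; the function name is hypothetical.

```python
def keep_variable_root_sites(root_sequence, mutated_positions):
    """Map each mutated 1-based position to the root base at that position."""
    return {pos: root_sequence[pos - 1] for pos in sorted(mutated_positions)}


# Toy six-base root sequence with variation at positions 2 and 5 only.
print(keep_variable_root_sites("ACGTAC", {2, 5}))  # prints {2: 'C', 5: 'A'}
```

For a long genome with few variable positions, the resulting mapping is far smaller than the full root sequence.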


--key_column

The column in the metadata file whose values match the node names in the tree

Default: “strain”
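Before running a conversion it can be useful to confirm that the metadata file actually contains the key column and every column requested with --columns. The sketch below is a hypothetical pre-flight check, not part of taxoniumtools. It assumes that a path containing ".csv" is comma-separated, that anything else is tab-separated, and that a ".gz" suffix means gzip compression.

```python
import csv
import gzip


def missing_metadata_columns(path, columns, key_column="strain"):
    """Return the requested column names that are absent from the header row."""
    opener = gzip.open if path.endswith(".gz") else open
    delimiter = "," if ".csv" in path else "\t"
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        header = next(csv.reader(handle, delimiter=delimiter))
    wanted = [key_column, *columns]
    return [name for name in wanted if name not in header]
```

An empty list means every requested column, including the key column, was found in the header.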

Using the parameters above you can trigger usher_to_taxonium to launch Chronumental and create a time tree which will be packaged into your tree.