BIOM(1) biom-format BIOM(1)

NAME

biom - BIOM Documentation
The BIOM file format (canonically pronounced biome) is designed to be a general-use format for representing biological sample-by-observation contingency tables. BIOM is a recognized standard for the Earth Microbiome Project and is a Genomics Standards Consortium supported project.
The BIOM format is designed for general use in broad areas of comparative -omics. For example, in marker-gene surveys, the primary use of this format is to represent OTU tables: the observations in this case are OTUs and the matrix contains counts corresponding to the number of times each OTU is observed in each sample. With respect to metagenome data, this format would be used to represent metagenome tables: the observations in this case might correspond to SEED subsystems, and the matrix would contain counts corresponding to the number of times each subsystem is observed in each metagenome. Similarly, with respect to genome data, this format may be used to represent a set of genomes: the observations in this case again might correspond to SEED subsystems, and the counts would correspond to the number of times each subsystem is observed in each genome.
There are two components to the BIOM project: first is the definition of the BIOM format, and second is development of support objects in multiple programming languages to support the use of BIOM in diverse bioinformatics applications. The version of the BIOM file format is independent of the version of the biom-format software.
There are official implementations of BIOM format support objects (APIs) in the Python and R programming languages. The rest of this site contains details about the BIOM file format (which is independent of the API) and the Python biom-format API. For more details about the R API, please see the CRAN biom package.
The following projects currently support the BIOM format:
QIIME
MG-RAST
PICRUSt
Mothur
phyloseq
MEGAN
VAMPS
metagenomeSeq
Phinch

If you are using BIOM in your project, and would like your project to be listed, please submit a pull request to the BIOM project. More information on submitting pull requests can be found in the BIOM project's GitHub repository.

BIOM DOCUMENTATION

These pages provide format specifications and API information for the BIOM table objects.

The biom file format

The BIOM project consists of two independent tools: the biom-format software package, which contains software tools for working with BIOM-formatted files and the tables they represent; and the BIOM file format. As of the 1.0.0 software version and the 1.0 file format version, the version of the software and the file format are independent of one another. Version specific documentation of the file formats can be found on the following pages.

The biom file format: Version 1.0

The biom format is based on JSON to provide the overall structure for the format. JSON is a widely supported format with native parsers available within many programming languages.
Required top-level fields:
id                  : <string or null> a field that can be used to identify the table (or null)
format              : <string> The name and version of the current biom format
format_url          : <url> A string with a static URL providing format details
type                : <string> Table type (a controlled vocabulary)
                      Acceptable values:
                       "OTU table"
                       "Pathway table"
                       "Function table"
                       "Ortholog table"
                       "Gene table"
                       "Metabolite table"
                       "Taxon table"
generated_by        : <string> Package and revision that built the table
date                : <datetime> Date the table was built (ISO 8601 format)
rows                : <list of objects> An ORDERED list of objects describing the rows
                      (explained in detail below)
columns             : <list of objects> An ORDERED list of objects describing the columns
                      (explained in detail below)
matrix_type         : <string> Type of matrix data representation (a controlled vocabulary)
                      Acceptable values:
                       "sparse" : only non-zero values are specified
                       "dense" : every element must be specified
matrix_element_type : <string> Value type in matrix (a controlled vocabulary)
                      Acceptable values:
                       "int" : integer
                       "float" : floating point
                       "unicode" : unicode string
shape               : <list of ints>, the number of rows and number of columns in data
data                : <list of lists>, counts of observations by sample
                       if matrix_type is "sparse", [[row, column, value],
                                                    [row, column, value],
                                                    ...]
                       if matrix_type is "dense",  [[value, value, value, ...],
                                                    [value, value, value, ...],
                                                    ...]
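The relationship between the two matrix_type encodings can be made concrete with a short sketch. This is illustrative only (not part of the specification): it expands a "sparse" list of [row, column, value] triples into the equivalent "dense" nested-list form, with unspecified elements defaulting to zero as the definitions above imply.

```python
def sparse_to_dense(data, shape):
    """Expand BIOM 1.0 sparse triples [row, column, value] into a dense matrix."""
    n_rows, n_cols = shape
    # Every element must be specified in the dense form; absent sparse
    # entries are zero by definition.
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in data:
        dense[row][col] = value
    return dense

# A 2x3 table with two non-zero entries:
print(sparse_to_dense([[0, 1, 5], [1, 2, 3]], [2, 3]))
# [[0, 5, 0], [0, 0, 3]]
```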


Optional top-level fields:
comment             : <string> A free text field containing any information that you
                       feel is relevant (or just feel like sharing)


The rows value is an ORDERED list of objects where each object corresponds to a single row in the matrix. Each object can currently store arbitrary keys, although this might become restricted based on table type. Each object must provide, at the minimum:
id                  : <string> an arbitrary UNIQUE identifier
metadata            : <an object or null> An object containing key/value metadata pairs


The columns value is an ORDERED list of objects where each object corresponds to a single column in the matrix. Each object can currently store arbitrary keys, although this might become restricted based on table type. Each object must provide, at the minimum:
id                  : <string> an arbitrary UNIQUE identifier
metadata            : <an object or null> An object containing key/value metadata pairs
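The constraints stated above (rows and columns are ordered lists, ids are unique, and their lengths must agree with shape) can be checked mechanically. The sketch below is a hedged illustration, not an official validator; the function name check_table is invented for this example.

```python
def check_table(table):
    """Check the row/column constraints described in the text for a parsed table."""
    rows, cols, shape = table["rows"], table["columns"], table["shape"]
    # shape is [number of rows, number of columns]
    assert shape == [len(rows), len(cols)], "shape must match rows x columns"
    for axis in (rows, cols):
        ids = [entry["id"] for entry in axis]
        assert len(ids) == len(set(ids)), "ids must be UNIQUE"
        # every object must provide an id and a metadata key (possibly null)
        assert all("metadata" in entry for entry in axis), "metadata key required"
    return True

example = {
    "rows": [{"id": "GG_OTU_1", "metadata": None},
             {"id": "GG_OTU_2", "metadata": None}],
    "columns": [{"id": "Sample1", "metadata": None}],
    "shape": [2, 1],
}
print(check_table(example))  # True
```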


Example biom files

Below are examples of minimal and rich biom files in both sparse and dense formats. To decide which of these you should generate for new data types, see the section on sparse-or-dense.

Minimal sparse OTU table

{
    "id":null,
    "format": "Biological Observation Matrix 0.9.1-dev",
    "format_url": "http://biom-format.org/documentation/format_versions/biom-1.0.html",
    "type": "OTU table",
    "generated_by": "QIIME revision 1.4.0-dev",
    "date": "2011-12-19T19:00:00",
    "rows":[
            {"id":"GG_OTU_1", "metadata":null},
            {"id":"GG_OTU_2", "metadata":null},
            {"id":"GG_OTU_3", "metadata":null},
            {"id":"GG_OTU_4", "metadata":null},
            {"id":"GG_OTU_5", "metadata":null}
        ],
    "columns": [
            {"id":"Sample1", "metadata":null},
            {"id":"Sample2", "metadata":null},
            {"id":"Sample3", "metadata":null},
            {"id":"Sample4", "metadata":null},
            {"id":"Sample5", "metadata":null},
            {"id":"Sample6", "metadata":null}
        ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [5, 6],
    "data":[[0,2,1],
            [1,0,5],
            [1,1,1],
            [1,3,2],
            [1,4,3],
            [1,5,1],
            [2,2,1],
            [2,3,4],
            [2,4,2],
            [3,0,2],
            [3,1,1],
            [3,2,1],
            [3,5,1],
            [4,1,1],
            [4,2,1]
           ]
}
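Because BIOM 1.0 is plain JSON, a file like the one above can be read with any JSON parser. The sketch below uses Python's standard json module on a trimmed-down table (only the fields it touches are included) and tallies total counts per sample; a real file should of course carry all required fields.

```python
import json

doc = json.loads("""{
  "shape": [2, 3],
  "matrix_type": "sparse",
  "columns": [{"id": "Sample1", "metadata": null},
              {"id": "Sample2", "metadata": null},
              {"id": "Sample3", "metadata": null}],
  "data": [[0, 0, 7], [1, 0, 1], [1, 2, 4]]
}""")

# Sum counts per sample: the column index of each sparse triple maps into
# the ORDERED "columns" list.
totals = {c["id"]: 0 for c in doc["columns"]}
for row, col, value in doc["data"]:
    totals[doc["columns"][col]["id"]] += value

print(totals)  # {'Sample1': 8, 'Sample2': 0, 'Sample3': 4}
```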


Minimal dense OTU table

{
    "id":null,
    "format": "Biological Observation Matrix 0.9.1-dev",
    "format_url": "http://biom-format.org/documentation/format_versions/biom-1.0.html",
    "type": "OTU table",
    "generated_by": "QIIME revision 1.4.0-dev",
    "date": "2011-12-19T19:00:00",
    "rows":[
            {"id":"GG_OTU_1", "metadata":null},
            {"id":"GG_OTU_2", "metadata":null},
            {"id":"GG_OTU_3", "metadata":null},
            {"id":"GG_OTU_4", "metadata":null},
            {"id":"GG_OTU_5", "metadata":null}
        ],
    "columns": [
            {"id":"Sample1", "metadata":null},
            {"id":"Sample2", "metadata":null},
            {"id":"Sample3", "metadata":null},
            {"id":"Sample4", "metadata":null},
            {"id":"Sample5", "metadata":null},
            {"id":"Sample6", "metadata":null}
        ],
    "matrix_type": "dense",
    "matrix_element_type": "int",
    "shape": [5,6],
    "data":  [[0,0,1,0,0,0],
              [5,1,0,2,3,1],
              [0,0,1,4,2,0],
              [2,1,1,0,0,1],
              [0,1,1,0,0,0]]
}
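The minimal sparse and minimal dense examples above encode the same matrix; densifying the sparse triples reproduces the dense "data" field exactly. The check below is illustrative, copying the two data payloads verbatim from the examples.

```python
# "data" from the minimal sparse OTU table:
sparse_data = [[0, 2, 1], [1, 0, 5], [1, 1, 1], [1, 3, 2], [1, 4, 3],
               [1, 5, 1], [2, 2, 1], [2, 3, 4], [2, 4, 2], [3, 0, 2],
               [3, 1, 1], [3, 2, 1], [3, 5, 1], [4, 1, 1], [4, 2, 1]]

# "data" from the minimal dense OTU table:
dense_data = [[0, 0, 1, 0, 0, 0],
              [5, 1, 0, 2, 3, 1],
              [0, 0, 1, 4, 2, 0],
              [2, 1, 1, 0, 0, 1],
              [0, 1, 1, 0, 0, 0]]

dense_from_sparse = [[0] * 6 for _ in range(5)]
for r, c, v in sparse_data:
    dense_from_sparse[r][c] = v

print(dense_from_sparse == dense_data)  # True
```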


Rich sparse OTU table

{
 "id":null,
 "format": "Biological Observation Matrix 0.9.1-dev",
 "format_url": "http://biom-format.org/documentation/format_versions/biom-1.0.html",
 "type": "OTU table",
 "generated_by": "QIIME revision 1.4.0-dev",
 "date": "2011-12-19T19:00:00",
 "rows":[
    {"id":"GG_OTU_1", "metadata":{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}},
    {"id":"GG_OTU_2", "metadata":{"taxonomy":["k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae", "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum", "s__"]}},
    {"id":"GG_OTU_3", "metadata":{"taxonomy":["k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia", "o__Methanosarcinales", "f__Methanosarcinaceae", "g__Methanosarcina", "s__"]}},
    {"id":"GG_OTU_4", "metadata":{"taxonomy":["k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Halanaerobiales", "f__Halanaerobiaceae", "g__Halanaerobium", "s__Halanaerobiumsaccharolyticum"]}},
    {"id":"GG_OTU_5", "metadata":{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}}
    ],
 "columns":[
    {"id":"Sample1", "metadata":{
                             "BarcodeSequence":"CGCTTATCGAGA",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample2", "metadata":{
                             "BarcodeSequence":"CATACCAGTAGC",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample3", "metadata":{
                             "BarcodeSequence":"CTCTCTACCTGT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample4", "metadata":{
                             "BarcodeSequence":"CTCTCGGCCTGT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}},
    {"id":"Sample5", "metadata":{
                             "BarcodeSequence":"CTCTCTACCAAT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}},
    {"id":"Sample6", "metadata":{
                             "BarcodeSequence":"CTAACTACCAAT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}}
            ],
 "matrix_type": "sparse",
 "matrix_element_type": "int",
 "shape": [5, 6],
 "data":[[0,2,1],
         [1,0,5],
         [1,1,1],
         [1,3,2],
         [1,4,3],
         [1,5,1],
         [2,2,1],
         [2,3,4],
         [2,5,2],
         [3,0,2],
         [3,1,1],
         [3,2,1],
         [3,5,1],
         [4,1,1],
         [4,2,1]
        ]
}


Rich dense OTU table

{
 "id":null,
 "format": "Biological Observation Matrix 0.9.1-dev",
 "format_url": "http://biom-format.org/documentation/format_versions/biom-1.0.html",
 "type": "OTU table",
 "generated_by": "QIIME revision 1.4.0-dev",
 "date": "2011-12-19T19:00:00",
 "rows":[
    {"id":"GG_OTU_1", "metadata":{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}},
    {"id":"GG_OTU_2", "metadata":{"taxonomy":["k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae", "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum", "s__"]}},
    {"id":"GG_OTU_3", "metadata":{"taxonomy":["k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia", "o__Methanosarcinales", "f__Methanosarcinaceae", "g__Methanosarcina", "s__"]}},
    {"id":"GG_OTU_4", "metadata":{"taxonomy":["k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Halanaerobiales", "f__Halanaerobiaceae", "g__Halanaerobium", "s__Halanaerobiumsaccharolyticum"]}},
    {"id":"GG_OTU_5", "metadata":{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}}
    ],
 "columns":[
    {"id":"Sample1", "metadata":{
                             "BarcodeSequence":"CGCTTATCGAGA",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample2", "metadata":{
                             "BarcodeSequence":"CATACCAGTAGC",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample3", "metadata":{
                             "BarcodeSequence":"CTCTCTACCTGT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample4", "metadata":{
                             "BarcodeSequence":"CTCTCGGCCTGT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}},
    {"id":"Sample5", "metadata":{
                             "BarcodeSequence":"CTCTCTACCAAT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}},
    {"id":"Sample6", "metadata":{
                             "BarcodeSequence":"CTAACTACCAAT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}}
            ],
 "matrix_type": "dense",
 "matrix_element_type": "int",
 "shape": [5,6],
 "data":  [[0,0,1,0,0,0],
           [5,1,0,2,3,1],
           [0,0,1,4,2,0],
           [2,1,1,0,0,1],
           [0,1,1,0,0,0]]
}


The biom file format: Version 2.0

The biom format is based on HDF5 to provide the overall structure for the format. HDF5 is a widely supported format with native parsers available within many programming languages.
Required top-level attributes:
id                  : <string or null> a field that can be used to identify the table (or null)
format              : <string> The name and version of the current biom format
format-url          : <url> A string with a static URL providing format details
type                : <string> Table type (a controlled vocabulary)
                      Acceptable values:
                       "OTU table"
                       "Pathway table"
                       "Function table"
                       "Ortholog table"
                       "Gene table"
                       "Metabolite table"
                       "Taxon table"
generated-by        : <string> Package and revision that built the table
creation-date       : <datetime> Date the table was built (ISO 8601 format)
nnz                 : <int> The number of non-zero elements in the table
shape               : <list of ints>, the number of rows and number of columns in data


Required groups:
observation/        : The HDF5 group that contains observation specific information and an observation oriented view of the data
observation/matrix  : The HDF5 group that contains matrix data oriented for observation-wise operations (e.g., in compressed sparse row format)
sample/             : The HDF5 group that contains sample specific information and a sample oriented view of the data
sample/matrix       : The HDF5 group that contains matrix data oriented for sample-wise operations (e.g., in compressed sparse column format)


Required datasets:
observation/ids            : <string> or <variable length string> A (N,) dataset of the observation IDs, where N is the total number of IDs
observation/matrix/data    : <float64> A (nnz,) dataset containing the actual matrix data
observation/matrix/indices : <int32> A (nnz,) dataset containing the column indices (e.g., maps into samples/ids)
observation/matrix/indptr  : <int32> A (M+1,) dataset containing the compressed row offsets
sample/ids                 : <string> or <variable length string> A (M,) dataset of the sample IDs, where M is the total number of IDs
sample/matrix/data         : <float64> A (nnz,) dataset containing the actual matrix data
sample/matrix/indices      : <int32> A (nnz,) dataset containing the row indices (e.g., maps into observation/ids)
sample/matrix/indptr       : <int32> A (N+1,) dataset containing the compressed column offsets
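The data/indices/indptr triplet is the standard compressed sparse layout: for row i of the observation-oriented (CSR) view, the non-zero values are data[indptr[i]:indptr[i+1]] and their column positions are indices[indptr[i]:indptr[i+1]]. The sketch below decodes one row using the arrays from the example table later in this document; csr_row is an invented helper name.

```python
# Arrays copied from the observation/matrix datasets of the 5x6 example table.
data    = [1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1]
indices = [2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2]
indptr  = [0, 1, 6, 9, 13, 15]

def csr_row(i):
    """Return {column index: value} for observation row i."""
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))

print(csr_row(0))  # {2: 1} -> observation 0 has a single count, in sample 2
print(csr_row(1))  # {0: 5, 1: 1, 3: 2, 4: 3, 5: 1}
```

The sample-oriented (CSC) view stores the same matrix with the roles of rows and columns swapped, so either axis can be traversed without transposing.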


Optional datasets:
observation/metadata       : <variable length string or null> If specified, a (1,) dataset containing a JSON-string representation of the metadata
sample/metadata            : <variable length string or null> If specified, a (1,) dataset containing a JSON-string representation of the metadata


The metadata for each axis (observation and sample) are described with JSON. The required structure, if the metadata are specified, is a list of objects, where the list is in index order with respect to the axis (e.g., the object at element 0 corresponds to ID 0 for the given axis). Any metadata that corresponds to the ID, such as taxonomy, can be represented in the object. For instance, the following JSON string describes taxonomy for three IDs:
Metadata description:
[
    {"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]},
    {"taxonomy": ["k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae", "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum", "s__"]},
    {"taxonomy": ["k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia", "o__Methanosarcinales", "f__Methanosarcinaceae", "g__Methanosarcina", "s__"]}
]
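Since the metadata dataset is a single JSON string in index order, pairing it with the corresponding ids dataset is a one-line zip after parsing. The sketch below uses abbreviated taxonomy lists for brevity.

```python
import json

# observation/ids and observation/metadata (as a JSON string), aligned by index.
ids = ["GG_OTU_1", "GG_OTU_2", "GG_OTU_3"]
metadata_json = json.dumps([
    {"taxonomy": ["k__Bacteria", "p__Proteobacteria"]},
    {"taxonomy": ["k__Bacteria", "p__Cyanobacteria"]},
    {"taxonomy": ["k__Archaea", "p__Euryarchaeota"]},
])

# Element i of the parsed list describes ids[i].
by_id = dict(zip(ids, json.loads(metadata_json)))
print(by_id["GG_OTU_3"]["taxonomy"][0])  # k__Archaea
```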


Example biom files

Below is an example of a rich biom file. To decide which representation you should generate for new data types, see the section on sparse-or-dense.

BIOM 2.0 OTU table in the HDF5 data description language (DDL)

HDF5 "rich_sparse_otu_table_hdf5.biom" {
GROUP "/" {
   ATTRIBUTE "creation-date" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2014-05-13T14:50:32.052446"
      }
   }
   ATTRIBUTE "format-url" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "http://biom-format.org"
      }
   }
   ATTRIBUTE "format-version" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 2, 0
      }
   }
   ATTRIBUTE "generated-by" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "example"
      }
   }
   ATTRIBUTE "id" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "No Table ID"
      }
   }
   ATTRIBUTE "nnz" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SCALAR
      DATA {
      (0): 15
      }
   }
   ATTRIBUTE "shape" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 5, 6
      }
   }
   ATTRIBUTE "type" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "otu table"
      }
   }
   GROUP "observation" {
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
         DATA {
         (0): "GG_OTU_1", "GG_OTU_2", "GG_OTU_3", "GG_OTU_4", "GG_OTU_5"
         }
      }
      GROUP "matrix" {
         DATASET "data" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1
            }
         }
         DATASET "indices" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2
            }
         }
         DATASET "indptr" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): 0, 1, 6, 9, 13, 15
            }
         }
      }
      DATASET "metadata" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): "[{"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}, {"taxonomy": ["k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae", "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum", "s__"]}, {"taxonomy": ["k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia", "o__Methanosarcinales", "f__Methanosarcinaceae", "g__Methanosarcina", "s__"]}, {"taxonomy": ["k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Halanaerobiales", "f__Halanaerobiaceae", "g__Halanaerobium", "s__Halanaerobiumsaccharolyticum"]}, {"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}]"
         }
      }
   }
   GROUP "sample" {
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): "Sample1", "Sample2", "Sample3", "Sample4", "Sample5",
         (5): "Sample6"
         }
      }
      GROUP "matrix" {
         DATASET "data" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 5, 2, 1, 1, 1, 1, 1, 1, 1, 2, 4, 3, 1, 2, 1
            }
         }
         DATASET "indices" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 1, 3, 1, 3, 4, 0, 2, 3, 4, 1, 2, 1, 1, 2, 3
            }
         }
         DATASET "indptr" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
            DATA {
            (0): 0, 2, 5, 9, 11, 12, 15
            }
         }
      }
      DATASET "metadata" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): "[{"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CGCTTATCGAGA", "Description": "human gut", "BODY_SITE": "gut"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CATACCAGTAGC", "Description": "human gut", "BODY_SITE": "gut"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CTCTCTACCTGT", "Description": "human gut", "BODY_SITE": "gut"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CTCTCGGCCTGT", "Description": "human skin", "BODY_SITE": "skin"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CTCTCTACCAAT", "Description": "human skin", "BODY_SITE": "skin"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CTAACTACCAAT", "Description": "human skin", "BODY_SITE": "skin"}]"
         }
      }
   }
}
}


The biom file format: Version 2.1

The biom format is based on HDF5 to provide the overall structure for the format. HDF5 is a widely supported format with native parsers available within many programming languages.
Required top-level attributes:
id                   : <string or null> a field that can be used to identify the table (or null)
type                 : <string> Table type (a controlled vocabulary)
                       Acceptable values:
                        "OTU table"
                        "Pathway table"
                        "Function table"
                        "Ortholog table"
                        "Gene table"
                        "Metabolite table"
                        "Taxon table"
format-url           : <url> A string with a static URL providing format details
format-version       : <tuple> The version of the current biom format, major and minor
generated-by         : <string> Package and revision that built the table
creation-date        : <datetime> Date the table was built (ISO 8601 format)
shape                : <list of ints>, the number of rows and number of columns in data
nnz                  : <int> The number of non-zero elements in the table


Required groups:
observation/               : The HDF5 group that contains observation specific information and an observation oriented view of the data
observation/matrix         : The HDF5 group that contains matrix data oriented for observation-wise operations (e.g., in compressed sparse row format)
observation/metadata       : The HDF5 group that contains observation specific metadata information
observation/group-metadata : The HDF5 group that contains observation specific group metadata information (e.g., phylogenetic tree)
sample/                    : The HDF5 group that contains sample specific information and a sample oriented view of the data
sample/matrix              : The HDF5 group that contains matrix data oriented for sample-wise operations (e.g., in compressed sparse column format)
sample/metadata            : The HDF5 group that contains sample specific metadata information
sample/group-metadata      : The HDF5 group that contains sample specific group metadata information (e.g., relationships between samples)


Required datasets:
observation/ids            : <string> or <variable length string> A (N,) dataset of the observation IDs, where N is the total number of IDs
observation/matrix/data    : <float64> A (nnz,) dataset containing the actual matrix data
observation/matrix/indices : <int32> A (nnz,) dataset containing the column indices (e.g., maps into samples/ids)
observation/matrix/indptr  : <int32> A (M+1,) dataset containing the compressed row offsets
sample/ids                 : <string> or <variable length string> A (M,) dataset of the sample IDs, where M is the total number of IDs
sample/matrix/data         : <float64> A (nnz,) dataset containing the actual matrix data
sample/matrix/indices      : <int32> A (nnz,) dataset containing the row indices (e.g., maps into observation/ids)
sample/matrix/indptr       : <int32> A (N+1,) dataset containing the compressed column offsets


Under the observation/metadata and sample/metadata groups, the user can specify an arbitrary number of datasets, each representing a metadata category for that axis. The expected structure for each of these metadata datasets is a list of atomic type objects (int, float, str, ...) where the index order of the list corresponds to the index order of the relevant axis IDs. Special complex metadata fields have been defined, and they are stored in a specific way. Currently, the available special metadata fields are:
observation/metadata/taxonomy      : <string> or <variable length string> A (N, ?) dataset containing the taxonomy names assigned to the observation
observation/metadata/KEGG_Pathways : <string> or <variable length string> A (N, ?) dataset containing the KEGG Pathways assigned to the observation
observation/metadata/collapsed_ids : <string> or <variable length string> A (N, ?) dataset containing the observation ids of the original table that have been collapsed in the given observation
sample/metadata/collapsed_ids      : <string> or <variable length string> A (M, ?) dataset containing the sample ids of the original table that have been collapsed in the given sample


Under the observation/group-metadata and sample/group-metadata groups, the user can specify an arbitrary number of datasets, each representing a relationship between the ids for that axis. The expected structure for each of these group metadata datasets is a single string or variable length string. Each of these datasets should define an attribute called data_type, which specifies how the string should be interpreted. One example of such a group metadata dataset is observation/group-metadata/phylogeny, with the attribute observation/group-metadata/phylogeny.attrs['data_type'] = "newick", which stores a single string with the newick representation of the phylogenetic tree for the observations.
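The data_type convention above amounts to a small dispatch on how to interpret the stored string. The sketch below mimics it with a plain dict standing in for an HDF5 group (no h5py dependency); the newick string and the interpret helper are made up for illustration, and a real reader would hand the string to an actual tree parser.

```python
import re

# Stand-in for observation/group-metadata: each entry pairs a string payload
# with a "data_type" attribute saying how the string should be interpreted.
group_metadata = {
    "phylogeny": {"data_type": "newick",
                  "value": "((GG_OTU_1,GG_OTU_2),GG_OTU_3);"},
}

def interpret(name):
    entry = group_metadata[name]
    if entry["data_type"] == "newick":
        # Toy interpretation: just extract the leaf/taxon names.
        return re.findall(r"[A-Za-z0-9_]+", entry["value"])
    raise ValueError("unknown data_type: " + entry["data_type"])

print(interpret("phylogeny"))  # ['GG_OTU_1', 'GG_OTU_2', 'GG_OTU_3']
```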

Example biom files

Below is an example of a rich biom file. To decide which representation you should generate for new data types, see the section on sparse-or-dense.

BIOM 2.1 OTU table in the HDF5 data description language (DDL)

HDF5 "examples/rich_sparse_otu_table_hdf5.biom" {
GROUP "/" {
   ATTRIBUTE "creation-date" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2014-07-29T16:16:36.617320"
      }
   }
   ATTRIBUTE "format-url" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "http://biom-format.org"
      }
   }
   ATTRIBUTE "format-version" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 2, 1
      }
   }
   ATTRIBUTE "generated-by" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "example"
      }
   }
   ATTRIBUTE "id" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "No Table ID"
      }
   }
   ATTRIBUTE "nnz" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SCALAR
      DATA {
      (0): 15
      }
   }
   ATTRIBUTE "shape" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 5, 6
      }
   }
   ATTRIBUTE "type" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "otu table"
      }
   }
   GROUP "observation" {
      GROUP "group-metadata" {
      }
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
         DATA {
         (0): "GG_OTU_1", "GG_OTU_2", "GG_OTU_3", "GG_OTU_4", "GG_OTU_5"
         }
      }
      GROUP "matrix" {
         DATASET "data" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1
            }
         }
         DATASET "indices" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2
            }
         }
         DATASET "indptr" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): 0, 1, 6, 9, 13, 15
            }
         }
      }
      GROUP "metadata" {
         DATASET "taxonomy" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 5, 7 ) / ( 5, 7 ) }
            DATA {
            (0,0): "k__Bacteria", "p__Proteobacteria",
            (0,2): "c__Gammaproteobacteria", "o__Enterobacteriales",
            (0,4): "f__Enterobacteriaceae", "g__Escherichia", "s__",
            (1,0): "k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae",
            (1,3): "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum",
            (1,6): "s__",
            (2,0): "k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia",
            (2,3): "o__Methanosarcinales", "f__Methanosarcinaceae",
            (2,5): "g__Methanosarcina", "s__",
            (3,0): "k__Bacteria", "p__Firmicutes", "c__Clostridia",
            (3,3): "o__Halanaerobiales", "f__Halanaerobiaceae",
            (3,5): "g__Halanaerobium", "s__Halanaerobiumsaccharolyticum",
            (4,0): "k__Bacteria", "p__Proteobacteria",
            (4,2): "c__Gammaproteobacteria", "o__Enterobacteriales",
            (4,4): "f__Enterobacteriaceae", "g__Escherichia", "s__"
            }
         }
      }
   }
   GROUP "sample" {
      GROUP "group-metadata" {
      }
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): "Sample1", "Sample2", "Sample3", "Sample4", "Sample5",
         (5): "Sample6"
         }
      }
      GROUP "matrix" {
         DATASET "data" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 5, 2, 1, 1, 1, 1, 1, 1, 1, 2, 4, 3, 1, 2, 1
            }
         }
         DATASET "indices" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 1, 3, 1, 3, 4, 0, 2, 3, 4, 1, 2, 1, 1, 2, 3
            }
         }
         DATASET "indptr" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
            DATA {
            (0): 0, 2, 5, 9, 11, 12, 15
            }
         }
      }
      GROUP "metadata" {
         DATASET "BODY_SITE" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): "gut", "gut", "gut", "skin", "skin", "skin"
            }
         }
         DATASET "BarcodeSequence" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): "CGCTTATCGAGA", "CATACCAGTAGC", "CTCTCTACCTGT",
            (3): "CTCTCGGCCTGT", "CTCTCTACCAAT", "CTAACTACCAAT"
            }
         }
         DATASET "Description" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): "human gut", "human gut", "human gut", "human skin",
            (4): "human skin", "human skin"
            }
         }
         DATASET "LinkerPrimerSequence" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): "CATGCTGCCTCCCGTAGGAGT", "CATGCTGCCTCCCGTAGGAGT",
            (2): "CATGCTGCCTCCCGTAGGAGT", "CATGCTGCCTCCCGTAGGAGT",
            (4): "CATGCTGCCTCCCGTAGGAGT", "CATGCTGCCTCCCGTAGGAGT"
            }
         }
      }
   }
}
}
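The observation/matrix group in the DDL above stores the table in standard compressed sparse row (CSR) form via the data, indices, and indptr arrays. As a sketch, the matrix can be rebuilt with scipy from the three arrays shown:

```python
import numpy as np
from scipy.sparse import csr_matrix

# The three arrays from observation/matrix in the DDL above
data = np.array([1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1],
                dtype=np.float64)
indices = np.array([2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2],
                   dtype=np.int32)
indptr = np.array([0, 1, 6, 9, 13, 15], dtype=np.int32)

# The file's shape attribute is (5, 6): 5 observations by 6 samples
table = csr_matrix((data, indices, indptr), shape=(5, 6))
```

Row i of `table` then holds the counts for the i-th entry of observation/ids; for example, `table[1, 0]` is the count of GG_OTU_2 in Sample1.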


Release versions contain three integers in the following format: major-version.minor-version.micro-version. When -dev is appended to the end of a version string, it indicates a development (or between-release) version. For example, 1.0.0-dev would refer to the development version following the 1.0.0 release.
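The convention above can be sketched as a small parser. parse_version is a hypothetical helper, not part of the biom API:

```python
def parse_version(v):
    """Split a version string like '1.0.0' or '1.0.0-dev' into
    ((major, minor, micro), is_dev). Hypothetical helper."""
    is_dev = v.endswith("-dev")
    if is_dev:
        v = v[:-len("-dev")]  # strip the development suffix
    major, minor, micro = (int(x) for x in v.split("."))
    return (major, minor, micro), is_dev
```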

Tips and FAQs regarding the BIOM file format

Motivation for the BIOM format

The BIOM format was motivated by several goals: first, to facilitate efficient handling and storage of large, sparse biological contingency tables; second, to support encapsulation of core study data (contingency table data and sample/observation metadata) in a single file; and third, to facilitate the use of these tables between tools that support this format (e.g., passing data between QIIME, MG-RAST, and VAMPS).

Efficient handling and storage of very large tables

In QIIME, we began hitting limitations with OTU table objects when working with thousands of samples and hundreds of thousands of OTUs. In the near future we expect that we'll be dealing with hundreds of thousands of samples in single analyses.
The OTU table format up to QIIME 1.4.0 involved a dense matrix: if an OTU was not observed in a given sample, that was indicated with a zero. We now primarily represent OTU tables in a sparse format: if an OTU is not observed in a sample, there is no count for that OTU. The two ways of representing these data are illustrated below.
A dense representation of an OTU table:
OTU ID PC.354  PC.355  PC.356
OTU0   0   0   4
OTU1   6   0   0
OTU2   1   0   7
OTU3   0   0   3


A sparse representation of an OTU table:
PC.354 OTU1 6
PC.354 OTU2 1
PC.356 OTU0 4
PC.356 OTU2 7
PC.356 OTU3 3


OTU table data tends to be sparse (e.g., greater than 90% of counts are zero, and frequently as many as 99% of counts are zero), in which case the latter format is more convenient to work with as it has a smaller memory footprint. Both of these representations are supported in the biom-format project via dense and sparse Table types. Generally, if less than 85% of your counts are zero, a dense representation will be more efficient.
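The ~85% rule of thumb above is easy to check directly. prefer_sparse is a hypothetical helper for illustration, not part of the biom API:

```python
import numpy as np

def prefer_sparse(counts, zero_fraction_threshold=0.85):
    """Return True if a sparse representation is likely more efficient,
    using the ~85%-zeros rule of thumb. Hypothetical helper."""
    counts = np.asarray(counts)
    zero_fraction = np.count_nonzero(counts == 0) / counts.size
    return zero_fraction >= zero_fraction_threshold

# The small dense OTU table shown earlier: 7 of 12 counts are zero (~58%),
# so by the rule of thumb a dense representation is the better fit.
otu = np.array([[0, 0, 4],
                [6, 0, 0],
                [1, 0, 7],
                [0, 0, 3]])
```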

Encapsulation of core study data (OTU table data and sample/OTU metadata) in a single file

Formats such as JSON and HDF5 enable more efficient storage of highly sparse data and allow for storage of arbitrary amounts of sample and OTU metadata in a single file. Sample metadata corresponds to what is generally found in QIIME mapping files. At this stage, inclusion of this information in the OTU table file is optional, but it may be useful for sharing these files with other QIIME users and for publishing or archiving results of analyses. OTU metadata (generally a taxonomic assignment for an OTU) is also optional. In contrast to the previous OTU table format, you can now store more than one OTU metadata value in this field, so, for example, you can store taxonomic assignments based on two different taxonomic assignment approaches.

Facilitating the use of tables between tools that support this format

Different tools, such as QIIME, MG-RAST, and VAMPS, work with similar data structures that represent different types of data. An example of this is a metagenome table that could be generated by MG-RAST (where, for example, columns are metagenomes and rows are functional categories). Exporting this data from MG-RAST in a suitable format will allow for the application of many of the QIIME tools to this data (such as generation of alpha rarefaction plots or beta diversity ordination plots). This new format is far more general than previous formats, so it will support adoption by groups working with different data types, and it is already being integrated to support transfer of data between QIIME, MG-RAST, and VAMPS.

File extension

We recommend that BIOM files use the .biom extension.

Quick start

BIOM has an example table and two methods for reading in Table objects that are immediately available at the package level.

Functions

load_table(f) Load a Table from a path

biom.load_table

biom.load_table(f)
Load a Table from a path
Parameters
f : str
Returns
Table
Raises
IOError
If the path does not exist


TypeError
If the data in the path does not appear to be a BIOM table



Examples
Parse a table from a path. BIOM will attempt to determine if the file is in TSV, HDF5, JSON, gzip'd JSON, or gzip'd TSV format and parse accordingly:
>>> from biom import load_table
>>> table = load_table('path/to/table.biom') # doctest: +SKIP

Examples

Load an example table:
>>> from biom import example_table
>>> print(example_table) # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 1.0 2.0
O2  3.0 4.0 5.0
Parse a table from an open file object:
>>> from biom import parse_table
>>> with open('path/to/table.biom') as f: # doctest: +SKIP
...     table = parse_table(f)
Parse a table from a path. BIOM will attempt to determine if the file is either in TSV, HDF5, JSON, gzip'd JSON or gzip'd TSV and parse accordingly:
>>> from biom import load_table
>>> table = load_table('path/to/table.biom') # doctest: +SKIP

BIOM Table ( biom.table)

The biom-format project provides rich Table objects to support use of the BIOM file format. The objects encapsulate matrix data (such as OTU counts) and abstract the interaction away from the programmer.

Classes

Table(data, observation_ids, sample_ids[, ...]) The (canonically pronounced 'teh') Table.

biom.table.Table

class biom.table.Table(data, observation_ids, sample_ids, observation_metadata=None, sample_metadata=None, table_id=None, type=None, create_date=None, generated_by=None, observation_group_metadata=None, sample_group_metadata=None, **kwargs)
The (canonically pronounced 'teh') Table.
Give in to the power of the Table!
Attributes
dtype The type of the objects in the underlying contingency matrix
matrix_data The sparse matrix object
nnz Number of non-zero elements of the underlying contingency matrix
shape The shape of the underlying contingency matrix

biom.table.Table.dtype

Table.dtype
The type of the objects in the underlying contingency matrix

biom.table.Table.matrix_data

Table.matrix_data
The sparse matrix object

biom.table.Table.nnz

Table.nnz
Number of non-zero elements of the underlying contingency matrix

biom.table.Table.shape

Table.shape
The shape of the underlying contingency matrix

Methods
__getitem__(args) Handles row or column slices
_extract_data_from_tsv(lines[, delim, ...]) Parse a classic table into (sample_ids, obs_ids, data, metadata, name)
add_group_metadata(group_md[, axis]) Take a dict of group metadata and add it to an axis
add_metadata(md[, axis]) Take a dict of metadata and add it to an axis.
collapse(f[, reduce_f, norm, ...]) Collapse partitions in a table by metadata or by IDs
copy() Returns a copy of the table
data(id[, axis, dense]) Returns data associated with an id
delimited_self([delim, header_key, ...]) Return self as a string in a delimited form
descriptive_equality(other) For use in testing, describe how the tables are not equal
exists(id[, axis]) Returns whether id exists in axis
filter(ids_to_keep[, axis, invert, inplace]) Filter a table based on a function or iterable.
from_hdf5(h5grp[, ids, axis]) Parse an HDF5 formatted BIOM table
from_json(json_table[, data_pump, ...]) Parse a biom otu table type
from_tsv(lines, obs_mapping, sample_mapping, ...) Parse a tab separated (observation x sample) formatted BIOM table
get_table_density() Returns the fraction of nonzero elements in the table.
get_value_by_ids(obs_id, samp_id) Return value in the matrix corresponding to (obs_id, samp_id)
group_metadata([axis]) Return the group metadata of the given axis
ids([axis]) Return the ids along the given axis
index(id, axis) Return the index of the identified sample/observation.
is_empty() Check whether the table is empty
iter([dense, axis]) Yields (value, id, metadata)
iter_data([dense, axis]) Yields axis values
iter_pairwise([dense, axis, tri, diag]) Pairwise iteration over self
max([axis]) Get the maximum nonzero value over an axis
merge(other[, sample, observation, ...]) Merge two tables together
metadata([id, axis]) Return the metadata of the identified sample/observation.
min([axis]) Get the minimum nonzero value over an axis
nonzero() Yields locations of nonzero elements within the data matrix
nonzero_counts(axis[, binary]) Get nonzero summaries about an axis
norm([axis, inplace]) Normalize in place sample values by an observation, or vice versa.
pa([inplace]) Convert the table to presence/absence data
partition(f[, axis]) Yields partitions
reduce(f, axis) Reduce over axis using function f
sort([sort_f, axis]) Return a table sorted along axis
sort_order(order[, axis]) Return a new table with axis in order
subsample(n[, axis, by_id]) Randomly subsample without replacement.
sum([axis]) Returns the sum by axis
to_hdf5(h5grp, generated_by[, compress]) Store CSC and CSR in place
to_json(generated_by[, direct_io]) Returns a JSON string representing the table in BIOM format.
to_tsv([header_key, header_value, ...]) Return self as a string in tab delimited form
transform(f[, axis, inplace]) Iterate over axis, applying a function f to each vector.
transpose() Transpose the contingency table

biom.table.Table.__getitem__

Table.__getitem__(args)
Handles row or column slices
Slicing over an individual axis is supported, but slicing over both axes at the same time is not supported. Partial slices, such as foo[0, 5:10], are not supported; however, full slices, such as foo[0, :], are supported.
Parameters
args : tuple or slice
The specific element (by index position) to return or an entire row or column of the data.


Returns
float or spmatrix
A float is return if a specific element is specified, otherwise a spmatrix object representing a vector of sparse data is returned.


Raises
IndexError
If the matrix is empty
If the arguments do not appear to be a tuple
If a slice on row and column is specified
If a partial slice is specified




Notes
Switching between slicing rows and columns is inefficient. Slicing of rows requires a CSR representation, while slicing of columns requires a CSC representation, and transforms are performed on the data if the data are not in the required representation. These transforms can be expensive if done frequently.
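The cost described in these Notes can be seen directly with scipy's sparse types, which the Table wraps. A minimal sketch, assuming scipy:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small matrix stored in CSR form (row-oriented)
m = csr_matrix(np.array([[0., 1., 2.],
                         [3., 4., 5.]]))

row = m.getrow(0)       # cheap: CSR slices rows natively
col = m.tocsc()[:, 1]   # column slicing forces a CSC conversion first
```

The tocsc() call is the transform the Notes warn about: done once it is fine, but alternating row and column slices repeats it.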

biom.table.Table._extract_data_from_tsv

static Table._extract_data_from_tsv(lines, delim='\t', dtype=<type 'float'>, header_mark=None, md_parse=None)
Parse a classic table into (sample_ids, obs_ids, data, metadata, name)
Parameters
lines: list or file-like object
delimited data to parse


delim: string
delimiter in file lines


dtype: type
header_mark: string or None
string that indicates start of header line


md_parse: function or None
function used to parse metadata


Returns
list
sample_ids


list
observation_ids


array
data


list
metadata


string
column name if last column is non-numeric



Notes
This is intended to be close to how QIIME classic OTU tables are parsed, with the exception of the additional md_name field.
This function is ported from QIIME ( http://www.qiime.org), previously named parse_classic_otu_table. QIIME is a GPL project, but we obtained permission from the authors of this function to port it to the BIOM Format project (and keep it under BIOM's BSD license).

biom.table.Table.add_group_metadata

Table.add_group_metadata(group_md, axis='sample')
Take a dict of group metadata and add it to an axis
Parameters
group_md : dict of tuples
group_md should be of the form {category: (data type, value)}


axis : {'sample', 'observation'}, optional
The axis to operate on


Raises
UnknownAxisError
If provided an unrecognized axis.




biom.table.Table.add_metadata

Table.add_metadata(md, axis='sample')
Take a dict of metadata and add it to an axis.
Parameters
md : dict of dict
md should be of the form {id: {dict_of_metadata}}


axis : {'sample', 'observation'}, optional
The axis to operate on




biom.table.Table.collapse

Table.collapse(f, reduce_f=<built-in function add>, norm=True, min_group_size=1, include_collapsed_metadata=True, one_to_many=False, one_to_many_mode='add', one_to_many_md_key='Path', strict=False, axis='sample')
Collapse partitions in a table by metadata or by IDs
Partition data by metadata or IDs and then collapse each partition into a single vector.
If include_collapsed_metadata is True, the metadata for the collapsed partition will be a category named 'collapsed_ids', which retains a list of the original ids that made up the partition.
The remainder is only relevant to setting one_to_many to True.
If one_to_many is True, allow vectors to collapse into multiple bins if the metadata describe a one-to-many relationship. Supplied functions must allow for iteration support over the metadata key and must return a tuple of (path, bin) to describe both the path in the hierarchy represented and the specific bin being collapsed into. The uniqueness of the bin is _not_ based on the path but on the name of the bin.
The metadata value for the corresponding collapsed column may include more (or less) information about the collapsed data. For example, if collapsing "FOO", and there are vectors that span three associations A, B, and C, such that vector 1 spans A and B, vector 2 spans B and C and vector 3 spans A and C, the resulting table will contain three collapsed vectors:
A, containing original vectors 1 and 3
B, containing original vectors 1 and 2
C, containing original vectors 2 and 3

If a vector maps to the same partition multiple times, it will be counted multiple times.
There are two supported modes for handling one-to-many relationships via one_to_many_mode: add and divide. add will add the vector counts to each partition that the vector maps to, which may increase the total number of counts in the output table. divide will divide a vector's counts by the number of metadata that the vector has before adding the counts to each partition. This will not increase the total number of counts in the output table.
If one_to_many_md_key is specified, that becomes the metadata key that describes the collapsed path. If a value is not specified, then it defaults to 'Path'.
If strict is specified, then all metadata pathways operated on must be indexable by metadata_f.
one_to_many and norm are not supported together.
one_to_many and reduce_f are not supported together.
one_to_many and min_group_size are not supported together.
A final note on space consumption. At present, the one_to_many functionality requires a temporary dense matrix representation.
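The add versus divide accounting described above can be sketched in pure Python. collapse_one_to_many is a hypothetical helper for illustration, not part of the biom API:

```python
def collapse_one_to_many(count, bins, mode="add"):
    """Distribute one vector's count across the bins it maps to,
    mimicking one_to_many_mode accounting. Hypothetical helper."""
    if mode == "add":
        share = count              # full count goes to every bin
    elif mode == "divide":
        share = count / len(bins)  # split evenly; totals are preserved
    else:
        raise ValueError(mode)
    return {b: share for b in bins}

# A vector with count 4 mapping to two bins, A and B:
added = collapse_one_to_many(4, ["A", "B"], mode="add")      # total inflates to 8
divided = collapse_one_to_many(4, ["A", "B"], mode="divide") # total stays 4
```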
Parameters
f : function
Function that is used to determine what partition a vector belongs to


reduce_f : function, optional
Defaults to operator.add. Function that reduces two vectors in a one-to-one collapse


norm : bool, optional
Defaults to True. If True, normalize the resulting table


min_group_size : int, optional
Defaults to 1. The minimum size of a partition when performing a one-to-one collapse


include_collapsed_metadata : bool, optional
Defaults to True. If True, retain the collapsed metadata keyed by the original IDs of the associated vectors


one_to_many : bool, optional
Defaults to False. Perform a one-to-many collapse


one_to_many_mode : {'add', 'divide'}, optional
The way to reduce two vectors in a one-to-many collapse


one_to_many_md_key : str, optional
Defaults to "Path". If include_collapsed_metadata is True, store the original vector metadata under this key


strict : bool, optional
Defaults to False. Requires full pathway data within a one-to-many structure


axis : {'sample', 'observation'}, optional
The axis to collapse


Returns
Table
The collapsed table



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a Table
>>> dt_rich = Table(
...    np.array([[5, 6, 7], [8, 9, 10], [11, 12, 13]]),
...    ['1', '2', '3'], ['a', 'b', 'c'],
...    [{'taxonomy': ['k__a', 'p__b']},
...     {'taxonomy': ['k__a', 'p__c']},
...     {'taxonomy': ['k__a', 'p__c']}],
...    [{'barcode': 'aatt'},
...     {'barcode': 'ttgg'},
...     {'barcode': 'aatt'}])
>>> print(dt_rich) # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID a   b   c
1   5.0 6.0 7.0
2   8.0 9.0 10.0
3   11.0    12.0    13.0
Create Function to determine what partition a vector belongs to
>>> bin_f = lambda id_, x: x['taxonomy'][1]
>>> obs_phy = dt_rich.collapse(
...    bin_f, norm=False, min_group_size=1,
...    axis='observation').sort(axis='observation')
>>> print(obs_phy) # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID a   b   c
p__b    5.0 6.0 7.0
p__c    19.0    21.0    23.0

biom.table.Table.copy

Table.copy()
Returns a copy of the table

biom.table.Table.data

Table.data(id, axis='sample', dense=True)
Returns data associated with an id
Parameters
id : str
ID of the samples or observations whose data will be returned.


axis : {'sample', 'observation'}
Axis to search for id.


dense : bool, optional
If True, return data as dense


Returns
np.ndarray or scipy.sparse.spmatrix
np.ndarray if dense, otherwise scipy.sparse.spmatrix


Raises
UnknownAxisError
If provided an unrecognized axis.



Examples
>>> from biom import example_table
>>> example_table.data('S1', axis='sample')
array([ 0.,  3.])

biom.table.Table.delimited_self

Table.delimited_self(delim='\t', header_key=None, header_value=None, metadata_formatter=<type 'str'>, observation_column_name='#OTU ID')
Return self as a string in a delimited form
Default str output for the Table is just row/col ids and table data without any metadata
Including observation metadata in output: If header_key is not None, the observation metadata with that name will be included in the delimited output. If header_value is also not None, the observation metadata will use the provided header_value as the observation metadata name (i.e., the column header) in the delimited output.
metadata_formatter: a function which takes a metadata entry and returns a formatted version that should be written to file
observation_column_name: the name of the first column in the output table, corresponding to the observation IDs. For example, the default will look something like:
#OTU ID Sample1 Sample2
OTU1    10      2
OTU2    4       8



biom.table.Table.descriptive_equality

Table.descriptive_equality(other)
For use in testing, describe how the tables are not equal

biom.table.Table.exists

Table.exists(id, axis='sample')
Returns whether id exists in axis
Parameters
id: str
id to check if exists


axis : {'sample', 'observation'}, optional
The axis to check


Returns
bool
True if id exists, False otherwise



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'])
Check whether sample ID is in the table:
>>> table.exists('S1')
True
>>> table.exists('S4')
False
Check whether an observation ID is in the table:
>>> table.exists('O1', 'observation')
True
>>> table.exists('O3', 'observation')
False

biom.table.Table.filter

Table.filter(ids_to_keep, axis='sample', invert=False, inplace=True)
Filter a table based on a function or iterable.
Parameters
ids_to_keep : iterable, or function(values, id, metadata) -> bool
If a function, it will be called with the values of the sample/observation, its id (a string) and the dictionary of metadata of each sample/observation, and must return a boolean. If it's an iterable, it must be a list of ids to keep.


axis : {'sample', 'observation'}, optional
It controls whether to filter samples or observations and defaults to "sample".


invert : bool, optional
Defaults to False. If set to True, discard samples or observations where ids_to_keep returns True


inplace : bool, optional
Defaults to True. Whether to return a new table or modify itself.


Returns
biom.Table
Returns itself if inplace, else returns a new filtered table.


Raises
UnknownAxisError
If provided an unrecognized axis.



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table, with observation metadata and sample metadata:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'],
...               [{'full_genome_available': True},
...                {'full_genome_available': False}],
...               [{'sample_type': 'a'}, {'sample_type': 'a'},
...                {'sample_type': 'b'}])
Define a function to keep only samples with sample_type == 'a'. This will drop sample S3, which has sample_type 'b':
>>> filter_fn = lambda val, id_, md: md['sample_type'] == 'a'
Get a filtered version of the table, leaving the original table untouched:
>>> new_table = table.filter(filter_fn, inplace=False)
>>> print(table.ids())
['S1' 'S2' 'S3']
>>> print(new_table.ids())
['S1' 'S2']
Using the same filtering function, discard all samples with sample_type 'a'. This will keep only sample S3, which has sample_type 'b':
>>> new_table = table.filter(filter_fn, inplace=False, invert=True)
>>> print(table.ids())
['S1' 'S2' 'S3']
>>> print(new_table.ids())
['S3']
Filter the table in-place using the same function (drop all samples where sample_type is not 'a'):
>>> table.filter(filter_fn)
2 x 2 <class 'biom.table.Table'> with 2 nonzero entries (50% dense)
>>> print(table.ids())
['S1' 'S2']
Filter out all observations in the table that do not have full_genome_available == True. This will filter out observation O2:
>>> filter_fn = lambda val, id_, md: md['full_genome_available']
>>> table.filter(filter_fn, axis='observation')
1 x 2 <class 'biom.table.Table'> with 0 nonzero entries (0% dense)
>>> print(table.ids(axis='observation'))
['O1']

biom.table.Table.from_hdf5

classmethod Table.from_hdf5(h5grp, ids=None, axis='sample')
Parse an HDF5 formatted BIOM table
If ids is provided, only the samples/observations listed in ids (depending on the value of axis) will be loaded
The expected structure of this group is described in the Notes below. A few basic definitions: N is the number of observations and M is the number of samples. Data are stored in both compressed sparse row (for observation-oriented operations) and compressed sparse column (for sample-oriented operations).
Parameters
h5grp : a h5py Group or an open h5py File
ids : iterable
The sample/observation ids of the samples/observations that we need to retrieve from the hdf5 biom table


axis : {'sample', 'observation'}, optional
The axis to subset on


Returns
biom.Table
A BIOM Table object


Raises
ValueError
If ids is not a subset of the sample or observation ids present in the hdf5 biom table



SEE ALSO:
Table.to_hdf5


Notes
The expected HDF5 group structure is below. An example of an HDF5 file in DDL can be found here [R9].
./id : str, an arbitrary ID
./type : str, the table type (e.g., OTU table)
./format-url : str, a URL that describes the format
./format-version : two element tuple of int32, major and minor
./generated-by : str, what generated this file
./creation-date : str, ISO format
./shape : two element tuple of int32, N by M
./nnz : int32 or int64, number of non zero elems
./observation : Group
./observation/ids : (N,) dataset of str or vlen str
./observation/matrix : Group
./observation/matrix/data : (nnz,) dataset of float64
./observation/matrix/indices : (nnz,) dataset of int32
./observation/matrix/indptr : (N+1,) dataset of int32
./observation/metadata : Group
[./observation/metadata/foo] : Optional, (N,) dataset of any valid HDF5 type in index order with IDs.
./observation/group-metadata : Group
[./observation/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
[./observation/group-metadata/foo.attrs['data_type']] : attribute of the foo dataset that describes contained type (e.g., newick)
./sample : Group
./sample/ids : (M,) dataset of str or vlen str
./sample/matrix : Group
./sample/matrix/data : (nnz,) dataset of float64
./sample/matrix/indices : (nnz,) dataset of int32
./sample/matrix/indptr : (M+1,) dataset of int32
./sample/metadata : Group
[./sample/metadata/foo] : Optional, (M,) dataset of any valid HDF5 type in index order with IDs.
./sample/group-metadata : Group
[./sample/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
[./sample/group-metadata/foo.attrs['data_type']] : attribute of the foo dataset that describes contained type (e.g., newick)

The '?' character on the dataset size means that it can be of arbitrary length.
The expected structure for each of the metadata datasets is a list of atomic type objects (int, float, str, ...), where the index order of the list corresponds to the index order of the relevant axis IDs. Special metadata fields have been defined, and they are stored in a specific way. Currently, the available special metadata fields are:
taxonomy: (N, ?) dataset of str or vlen str
KEGG_Pathways: (N, ?) dataset of str or vlen str
collapsed_ids: (N, ?) dataset of str or vlen str

References
[R7] http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csr_matrix.html
[R8] http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csc_matrix.html
[R9] http://biom-format.org/documentation/format_versions/biom-2.0.html

Examples
>>> from biom.table import Table
>>> from biom.util import biom_open
>>> with biom_open('rich_sparse_otu_table_hdf5.biom') as f:  # doctest: +SKIP
...     t = Table.from_hdf5(f)  # doctest: +SKIP

Parse an HDF5 biom table, subsetting to a single observation:
>>> from biom.util import biom_open  # doctest: +SKIP
>>> with biom_open('rich_sparse_otu_table_hdf5.biom') as f:  # doctest: +SKIP
...     t = Table.from_hdf5(f, ids=["GG_OTU_1"],
...                         axis='observation')  # doctest: +SKIP

biom.table.Table.from_json

classmethod Table.from_json(json_table, data_pump=None, input_is_dense=False)
Parse a biom otu table type
Parameters
json_table : dict
A JSON object or dict that represents the BIOM table


data_pump : tuple or None
A secondary source of data


input_is_dense : bool
If True, the data contained will be interpreted as dense


Returns
Table

Examples
>>> from biom import Table
>>> json_obj = {"id": "None",
...             "format": "Biological Observation Matrix 1.0.0",
...             "format_url": "http://biom-format.org",
...             "generated_by": "foo",
...             "type": "OTU table",
...             "date": "2014-06-03T14:24:40.884420",
...             "matrix_element_type": "float",
...             "shape": [5, 6],
...             "data": [[0,2,1.0],
...                      [1,0,5.0],
...                      [1,1,1.0],
...                      [1,3,2.0],
...                      [1,4,3.0],
...                      [1,5,1.0],
...                      [2,2,1.0],
...                      [2,3,4.0],
...                      [2,5,2.0],
...                      [3,0,2.0],
...                      [3,1,1.0],
...                      [3,2,1.0],
...                      [3,5,1.0],
...                      [4,1,1.0],
...                      [4,2,1.0]],
...             "rows": [{"id": "GG_OTU_1", "metadata": None},
...                      {"id": "GG_OTU_2", "metadata": None},
...                      {"id": "GG_OTU_3", "metadata": None},
...                      {"id": "GG_OTU_4", "metadata": None},
...                      {"id": "GG_OTU_5", "metadata": None}],
...             "columns": [{"id": "Sample1", "metadata": None},
...                         {"id": "Sample2", "metadata": None},
...                         {"id": "Sample3", "metadata": None},
...                         {"id": "Sample4", "metadata": None},
...                         {"id": "Sample5", "metadata": None},
...                         {"id": "Sample6", "metadata": None}]
...             }
>>> t = Table.from_json(json_obj)
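The "data" field above stores [row, column, value] triplets for the nonzero entries. As a plain-numpy sketch (not the biom API itself), this is how such triplets expand into a dense observation-by-sample matrix:

```python
import numpy as np

# [row, column, value] triplets, as in the "data" field of a BIOM 1.0 JSON table
triplets = [[0, 2, 1.0], [1, 0, 5.0], [2, 3, 4.0]]
shape = (5, 6)  # observations x samples, as in the "shape" field

dense = np.zeros(shape)
for r, c, v in triplets:
    dense[int(r), int(c)] = v  # place each value at its (row, column) position

print(dense[0, 2])  # 1.0
```

Cells without a triplet stay zero, which is why the JSON form is compact for sparse tables.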

biom.table.Table.from_tsv

static Table.from_tsv(lines, obs_mapping, sample_mapping, process_func, **kwargs)
Parse a tab separated (observation x sample) formatted BIOM table
Parameters
lines : list, or file-like object
The tab delimited data to parse


obs_mapping : dict or None
The corresponding observation metadata


sample_mapping : dict or None
The corresponding sample metadata


process_func : function
A function to transform the observation metadata


Returns
biom.Table
A BIOM Table object



Examples
Parse tab separated data into a table:
>>> from biom.table import Table
>>> from StringIO import StringIO
>>> tsv = 'a\tb\tc\n1\t2\t3\n4\t5\t6'
>>> tsv_fh = StringIO(tsv)
>>> func = lambda x : x
>>> test_table = Table.from_tsv(tsv_fh, None, None, func)

biom.table.Table.get_table_density

Table.get_table_density()
Returns the fraction of nonzero elements in the table.
Returns
float
The fraction of nonzero elements in the table
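The density is simply the number of nonzero entries divided by the total matrix size. A numpy-only sketch of that computation:

```python
import numpy as np

data = np.asarray([[0, 0, 1], [1, 3, 42]])  # 2 observations x 3 samples
density = np.count_nonzero(data) / data.size  # 4 nonzero entries out of 6
print(density)
```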




biom.table.Table.get_value_by_ids

Table.get_value_by_ids(obs_id, samp_id)
Return value in the matrix corresponding to (obs_id, samp_id)
Parameters
obs_id : str
The ID of the observation


samp_id : str
The ID of the sample


Returns
float
The data value corresponding to the specified matrix position
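Conceptually this is an index lookup on each axis followed by a matrix access. A minimal sketch with made-up IDs and data (not the biom implementation):

```python
import numpy as np

data = np.asarray([[0, 0, 1], [1, 3, 42]])
obs_ids = ['O1', 'O2']
samp_ids = ['S1', 'S2', 'S3']

def get_value_by_ids(obs_id, samp_id):
    # Map each ID to its positional index, then read the matrix.
    return data[obs_ids.index(obs_id), samp_ids.index(samp_id)]

print(get_value_by_ids('O2', 'S3'))  # 42
```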




biom.table.Table.group_metadata

Table.group_metadata(axis='sample')
Return the group metadata of the given axis
Parameters
axis : {'sample', 'observation'}, optional
Axis to search for the group metadata. Defaults to 'sample'


Returns
dict
The corresponding group metadata for the given axis


Raises
UnknownAxisError
If provided an unrecognized axis.



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table, with group observation metadata and no group sample metadata:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> group_observation_md = {'tree': ('newick', '(O1:0.3,O2:0.4);')}
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'],
...               observation_group_metadata=group_observation_md)
Get the observation group metadata:
>>> table.group_metadata(axis='observation')
{'tree': ('newick', '(O1:0.3,O2:0.4);')}
Get the sample group metadata:
>>> table.group_metadata() is None
True

biom.table.Table.ids

Table.ids(axis='sample')
Return the ids along the given axis
Parameters
axis : {'sample', 'observation'}, optional
Axis to search for id. Defaults to 'sample'


Returns
1-D numpy array
The ids along the given axis


Raises
UnknownAxisError
If provided an unrecognized axis.



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'])
Get the ids along the observation axis:
>>> print table.ids(axis='observation')
['O1' 'O2']
Get the ids along the sample axis:
>>> print table.ids()
['S1' 'S2' 'S3']

biom.table.Table.index

Table.index(id, axis)
Return the index of the identified sample/observation.
Parameters
id : str
ID of the sample or observation whose index will be returned.


axis : {'sample', 'observation'}
Axis to search for id.


Returns
int
Index of the sample/observation identified by id.


Raises
UnknownAxisError
If provided an unrecognized axis.


UnknownIDError
If provided an unrecognized sample/observation ID.



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'])
Get the index of the observation with ID "O2":
>>> table.index('O2', 'observation')
1
Get the index of the sample with ID "S1":
>>> table.index('S1', 'sample')
0

biom.table.Table.is_empty

Table.is_empty()
Check whether the table is empty
Returns
bool
True if the table is empty, False otherwise




biom.table.Table.iter

Table.iter(dense=True, axis='sample')
Yields (value, id, metadata)
Parameters
dense : bool, optional
Defaults to True. If False, yield compressed sparse row or compressed sparse columns if axis is 'observation' or 'sample', respectively.


axis : {'sample', 'observation'}, optional
The axis to iterate over.


Returns
GeneratorType
A generator that yields (values, id, metadata)



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'Z3'])
Iterate over samples and keep those whose ID starts with a Z:
>>> [(values, id, metadata)
...     for values, id, metadata in table.iter() if id[0]=='Z']
[(array([  1.,  42.]), 'Z3', None)]
Iterate over samples and sum the second value (observation O2) from each:
>>> col = [values[1] for values, id, metadata in table.iter()]
>>> sum(col)
46.0

biom.table.Table.iter_data

Table.iter_data(dense=True, axis='sample')
Yields axis values
Parameters
dense : bool, optional
Defaults to True. If False, yield compressed sparse row or compressed sparse columns if axis is 'observation' or 'sample', respectively.


axis : {'sample', 'observation'}, optional
Axis to iterate over.


Returns
generator
Yields list of values for each value in axis


Raises
UnknownAxisError
If axis other than 'sample' or 'observation' passed



Examples
>>> import numpy as np
>>> from biom.table import Table
>>> data = np.arange(30).reshape(3,10) # 3 X 10 OTU X Sample table
>>> obs_ids = ['o1', 'o2', 'o3']
>>> sam_ids = ['s%i' %i for i in range(1,11)]
>>> bt = Table(data, observation_ids=obs_ids, sample_ids=sam_ids)
Let's find the sample with the largest sum:
>>> sample_gen = bt.iter_data(axis='sample')
>>> max_sample_count = max([sample.sum() for sample in sample_gen])
>>> print max_sample_count
57.0

biom.table.Table.iter_pairwise

Table.iter_pairwise(dense=True, axis='sample', tri=True, diag=False)
Pairwise iteration over self
Parameters
dense : bool, optional
Defaults to True. If False, yield compressed sparse row or compressed sparse columns if axis is 'observation' or 'sample', respectively.


axis : {'sample', 'observation'}, optional
The axis to iterate over.


tri : bool, optional
If True, just yield [i, j] and not [j, i]


diag : bool, optional
If True, yield [i, i]


Returns
GeneratorType
Yields [(val_i, id_i, metadata_i), (val_j, id_j, metadata_j)]


Raises
UnknownAxisError

Examples
>>> from biom import example_table
By default, only the upper triangle without the diagonal of the resulting pairwise combinations is yielded.
>>> iter_ = example_table.iter_pairwise()
>>> for (val_i, id_i, md_i), (val_j, id_j, md_j) in iter_:
...     print id_i, id_j
S1 S2
S1 S3
S2 S3
The full pairwise combinations can also be yielded though.
>>> iter_ = example_table.iter_pairwise(tri=False, diag=True)
>>> for (val_i, id_i, md_i), (val_j, id_j, md_j) in iter_:
...     print id_i, id_j
S1 S1
S1 S2
S1 S3
S2 S1
S2 S2
S2 S3
S3 S1
S3 S2
S3 S3
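The tri and diag options map onto standard combinatorics: tri=True with diag=False corresponds to itertools.combinations, while tri=False with diag=True is the full Cartesian product. A sketch of just the ID-pairing logic (not the biom implementation, which also yields values and metadata):

```python
from itertools import combinations, product

ids = ['S1', 'S2', 'S3']

# tri=True, diag=False: upper triangle without the diagonal
upper = list(combinations(ids, 2))
print(upper)  # [('S1', 'S2'), ('S1', 'S3'), ('S2', 'S3')]

# tri=False, diag=True: all ordered pairs, including (i, i)
full = list(product(ids, repeat=2))
print(len(full))  # 9
```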

biom.table.Table.max

Table.max(axis='sample')
Get the maximum nonzero value over an axis
Parameters
axis : {'sample', 'observation', 'whole'}, optional
Defaults to "sample". The axis over which to calculate maxima.


Returns
scalar of self.dtype or np.array of self.dtype
Raises
UnknownAxisError
If provided an unrecognized axis.



Examples
>>> from biom import example_table
>>> print example_table.max(axis='observation')
[ 2.  5.]

biom.table.Table.merge

Table.merge(other, sample='union', observation='union', sample_metadata_f=prefer_self, observation_metadata_f=prefer_self)
Merge two tables together
The axes, samples and observations, can be controlled independently. Both can work on either "union" or "intersection".
sample_metadata_f and observation_metadata_f define how to merge metadata between tables. The default is to keep the metadata associated with self if self has metadata, and otherwise take the metadata from other. These functions are given both metadata dicts and must return a single metadata dict.
Parameters
other : biom.Table
The other table to merge with this one


sample : {'union', 'intersection'}, optional
observation : {'union', 'intersection'}, optional
sample_metadata_f : function, optional
Defaults to biom.util.prefer_self. Defines how to handle sample metadata during merge.


observation_metadata_f : function, optional
Defaults to biom.util.prefer_self. Defines how to handle observation metadata during merge.


Returns
biom.Table
The merged table



Notes
There is an implicit type conversion to float.
The return type is always that of self.

Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x2 table and a 3x2 table:
>>> d_a = np.asarray([[2, 0], [6, 1]])
>>> t_a = Table(d_a, ['O1', 'O2'], ['S1', 'S2'])
>>> d_b = np.asarray([[4, 5], [0, 3], [10, 10]])
>>> t_b = Table(d_b, ['O1', 'O2', 'O3'], ['S1', 'S2'])
Merging the tables results in overlapping samples/observations (e.g., O1 and S2) being summed and non-overlapping ones being added to the resulting table (e.g., O3).
>>> merged_table = t_a.merge(t_b)
>>> print merged_table  # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1      S2
O1      6.0     5.0
O2      6.0     4.0
O3      10.0    10.0
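The union-merge arithmetic above can be sketched with plain dicts (not the biom implementation): overlapping cells are summed, and cells present in only one table are carried through.

```python
from collections import defaultdict

# Nonzero cells of the two tables above, keyed by (observation, sample)
t_a = {('O1', 'S1'): 2, ('O1', 'S2'): 0, ('O2', 'S1'): 6, ('O2', 'S2'): 1}
t_b = {('O1', 'S1'): 4, ('O1', 'S2'): 5, ('O2', 'S2'): 3,
       ('O3', 'S1'): 10, ('O3', 'S2'): 10}

merged = defaultdict(float)
for table in (t_a, t_b):
    for key, value in table.items():
        merged[key] += value  # overlapping cells are summed

print(merged[('O1', 'S1')])  # 2 + 4 = 6.0
print(merged[('O3', 'S1')])  # only in t_b -> 10.0
```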

biom.table.Table.metadata

Table.metadata(id=None, axis='sample')
Return the metadata of the identified sample/observation.
Parameters
id : str
ID of the sample or observation whose index will be returned.


axis : {'sample', 'observation'}
Axis to search for id.


Returns
defaultdict or None
The corresponding metadata defaultdict, or None if that axis does not have metadata.


Raises
UnknownAxisError
If provided an unrecognized axis.


UnknownIDError
If provided an unrecognized sample/observation ID.



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table, with observation metadata and no sample metadata:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'],
...               [{'foo': 'bar'}, {'x': 'y'}], None)
Get the metadata of the observation with ID "O2":
>>> # casting to `dict` as the return is `defaultdict`
>>> dict(table.metadata('O2', 'observation'))
{'x': 'y'}
Get the metadata of the sample with ID "S1":
>>> table.metadata('S1', 'sample') is None
True

biom.table.Table.min

Table.min(axis='sample')
Get the minimum nonzero value over an axis
Parameters
axis : {'sample', 'observation', 'whole'}, optional
Defaults to "sample". The axis over which to calculate minima.


Returns
scalar of self.dtype or np.array of self.dtype
Raises
UnknownAxisError
If provided an unrecognized axis.



Examples
>>> from biom import example_table
>>> print example_table.min(axis='sample')
[ 3.  1.  2.]

biom.table.Table.nonzero

Table.nonzero()
Yields locations of nonzero elements within the data matrix
Returns
generator
Yields (observation_id, sample_id) for each nonzero element




biom.table.Table.nonzero_counts

Table.nonzero_counts(axis, binary=False)
Get nonzero summaries about an axis
Parameters
axis : {'sample', 'observation', 'whole'}
The axis on which to count nonzero entries


binary : bool, optional
Defaults to False. If False, return number of nonzero entries. If True, sum the values of the entries.


Returns
numpy.array
Counts in index order to the axis
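Following the parameter description above, a numpy sketch of the two summaries for axis='sample' (one value per column); this is illustrative only, not the biom implementation:

```python
import numpy as np

data = np.asarray([[0, 0, 1], [1, 3, 42]])  # observations x samples

# Number of nonzero entries per sample (column)
counts = np.count_nonzero(data, axis=0)
print(counts)  # [1 1 2]

# Sum of the entries per sample instead
sums = data.sum(axis=0)
print(sums)  # [ 1  3 43]
```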




biom.table.Table.norm

Table.norm(axis='sample', inplace=True)
Normalize in place sample values by an observation, or vice versa.
Parameters
axis : {'sample', 'observation'}, optional
The axis to use for normalization


inplace : bool, optional
Defaults to True. If True, performs the normalization in place. Otherwise, returns a new table with the normalization applied.


Returns
biom.Table
The normalized table



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x2 table:
>>> data = np.asarray([[2, 0], [6, 1]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2'])
Get a version of the table normalized on the 'sample' axis, leaving the original table untouched:
>>> new_table = table.norm(inplace=False)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  2.0 0.0
O2  6.0 1.0
>>> print new_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  0.25    0.0
O2  0.75    1.0
Get a version of the table normalized on the 'observation' axis, again leaving the original table untouched:
>>> new_table = table.norm(axis='observation', inplace=False)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  2.0 0.0
O2  6.0 1.0
>>> print new_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  1.0 0.0
O2  0.857142857143  0.142857142857
Do the same normalization on 'observation', this time in-place:
>>> table.norm(axis='observation')
2 x 2 <class 'biom.table.Table'> with 3 nonzero entries (75% dense)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  1.0 0.0
O2  0.857142857143  0.142857142857
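The normalization arithmetic shown above can be sketched with plain numpy broadcasting (not the biom API):

```python
import numpy as np

data = np.asarray([[2, 0], [6, 1]], dtype=float)

# axis='sample': divide each column by its sum, so each sample sums to 1
normed = data / data.sum(axis=0)
print(normed[:, 0])  # [0.25 0.75]

# axis='observation': divide each row by its sum, so each observation sums to 1
normed_obs = data / data.sum(axis=1, keepdims=True)
print(normed_obs[1])
```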

biom.table.Table.pa

Table.pa(inplace=True)
Convert the table to presence/absence data
Parameters
inplace : bool, optional
Defaults to True


Returns
Table
Returns itself if inplace, else returns a new presence/absence table.



Examples
>>> from biom.table import Table
>>> import numpy as np
Create a 2x3 BIOM table
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'])
Convert to presence/absence data
>>> _ = table.pa()
>>> print table.data('O1', 'observation')
[ 0.  0.  1.]
>>> print table.data('O2', 'observation')
[ 1.  1.  1.]
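Presence/absence conversion amounts to clipping every nonzero value to 1; a numpy sketch of the same operation:

```python
import numpy as np

data = np.asarray([[0, 0, 1], [1, 3, 42]], dtype=float)
pa = (data > 0).astype(float)  # 1.0 where a count exists, else 0.0
print(pa[1])  # [1. 1. 1.]
```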

biom.table.Table.partition

Table.partition(f, axis='sample')
Yields partitions
Parameters
f : function
f is given the ID and metadata of the vector and must return what partition the vector is part of.


axis : {'sample', 'observation'}, optional
The axis to iterate over


Returns
GeneratorType
A generator that yields (partition, Table)



Examples
>>> import numpy as np
>>> from biom.table import Table
>>> from biom.util import unzip
Create a 2x3 BIOM table, with observation metadata and sample metadata:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'],
...               [{'full_genome_available': True},
...                {'full_genome_available': False}],
...               [{'sample_type': 'a'}, {'sample_type': 'a'},
...                {'sample_type': 'b'}])
Define a function to bin by sample_type
>>> f = lambda id_, md: md['sample_type']
Partition the table and view results
>>> bins, tables = table.partition(f)
>>> print bins[1] # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  0.0 0.0
O2  1.0 3.0
>>> print tables[1] # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S3
O1  1.0
O2  42.0

biom.table.Table.reduce

Table.reduce(f, axis)
Reduce over axis using function f
Parameters
f : function
The function to use for the reduce operation


axis : {'sample', 'observation'}
The axis on which to operate


Returns
numpy.array
A one-dimensional array representing the reduced rows (observations) or columns (samples) of the data matrix


Raises
UnknownAxisError
If axis is neither "sample" nor "observation"


TableException
If the table's data matrix is empty



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 table
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'],
...               [{'foo': 'bar'}, {'x': 'y'}], None)
Create a reduce function
>>> func = lambda x, y: x + y
Reduce table on samples
>>> table.reduce(func, 'sample') # doctest: +NORMALIZE_WHITESPACE
array([  1.,   3.,  43.])
Reduce table on observations
>>> table.reduce(func, 'observation') # doctest: +NORMALIZE_WHITESPACE
array([  1.,  46.])

biom.table.Table.sort

Table.sort(sort_f=natsort, axis='sample')
Return a table sorted along axis
Parameters
sort_f : function, optional
Defaults to biom.util.natsort. A function that takes a list of values and sorts it


axis : {'sample', 'observation'}, optional
The axis to operate on


Returns
biom.Table
A table whose samples or observations are sorted according to the sort_f function



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table:
>>> data = np.asarray([[1, 0, 4], [1, 3, 0]])
>>> table = Table(data, ['O2', 'O1'], ['S2', 'S1', 'S3'])
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2  S1  S3
O2  1.0 0.0 4.0
O1  1.0 3.0 0.0
Sort the order of samples in the table using the default natural sorting:
>>> new_table = table.sort()
>>> print new_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O2  0.0 1.0 4.0
O1  3.0 1.0 0.0
Sort the order of observations in the table using the default natural sorting:
>>> new_table = table.sort(axis='observation')
>>> print new_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2  S1  S3
O1  1.0 3.0 0.0
O2  1.0 0.0 4.0
Sort the samples in reverse order using a custom sort function:
>>> sort_f = lambda x: list(sorted(x, reverse=True))
>>> new_table = table.sort(sort_f=sort_f)
>>> print new_table  # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S3  S2  S1
O2  4.0 1.0 0.0
O1  0.0 1.0 3.0

biom.table.Table.sort_order

Table.sort_order(order, axis='sample')
Return a new table with axis in order
Parameters
order : iterable
The desired order for axis


axis : {'sample', 'observation'}, optional
The axis to operate on


Returns
Table
A table where the observations or samples are sorted according to order



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table:
>>> data = np.asarray([[1, 0, 4], [1, 3, 0]])
>>> table = Table(data, ['O2', 'O1'], ['S2', 'S1', 'S3'])
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2  S1  S3
O2  1.0 0.0 4.0
O1  1.0 3.0 0.0
Sort the table using a list of samples:
>>> sorted_table = table.sort_order(['S2', 'S3', 'S1'])
>>> print sorted_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2      S3      S1
O2      1.0     4.0     0.0
O1      1.0     0.0     3.0
Additionally, you could sort the table's observations:
>>> sorted_table = table.sort_order(['O1', 'O2'], axis="observation")
>>> print sorted_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2      S1      S3
O1      1.0     3.0     0.0
O2      1.0     0.0     4.0

biom.table.Table.subsample

Table.subsample(n, axis='sample', by_id=False)
Randomly subsample without replacement.
Parameters
n : int
Number of items to subsample from counts.


axis : {'sample', 'observation'}, optional
The axis to sample over


by_id : boolean, optional
If False, the subsampling is based on the counts contained in the matrix (e.g., rarefaction). If True, the subsampling is based on the IDs (e.g., fetch a random subset of samples). Default is False.


Returns
biom.Table
A subsampled version of self


Raises
ValueError
If n is less than zero.



Notes
Subsampling is performed without replacement. If n is greater than the sum of a given vector, that vector is omitted from the result.
Adapted from skbio.math.subsample, see biom-format/licenses for more information about scikit-bio.
This code assumes absolute abundance if by_id is False.

Examples
>>> import numpy as np
>>> from biom.table import Table
>>> table = Table(np.array([[0, 2, 3], [1, 0, 2]]), ['O1', 'O2'],
...               ['S1', 'S2', 'S3'])
Subsample 1 item over the sample axis by value (e.g., rarefaction):
>>> print table.subsample(1).sum(axis='sample')
[ 1.  1.  1.]
Subsample 2 items over the sample axis, note that 'S1' is filtered out:
>>> ss = table.subsample(2)
>>> print ss.sum(axis='sample')
[ 2.  2.]
>>> print ss.ids()
['S2' 'S3']
Subsample by IDs over the sample axis. For this example, we're going to randomly select 2 samples and do this 100 times, and then print out the set of IDs observed.
>>> ids = set([tuple(table.subsample(2, by_id=True).ids())
...            for i in range(100)])
>>> print sorted(ids)
[('S1', 'S2'), ('S1', 'S3'), ('S2', 'S3')]
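Rarefaction of a single count vector (the by_id=False case) can be sketched as drawing n items without replacement from a pool in which each observation appears as many times as its count. This is an illustrative sketch, not biom's implementation:

```python
import numpy as np

def rarefy(counts, n, rng):
    # Expand counts into a pool of observation indices, draw n without
    # replacement, and re-tally. Vectors summing to less than n would be
    # omitted from the result by biom.
    pool = np.repeat(np.arange(counts.size), counts)
    drawn = rng.choice(pool, size=n, replace=False)
    return np.bincount(drawn, minlength=counts.size)

rng = np.random.default_rng(42)
sample = np.array([0, 2, 3])  # counts for one sample
sub = rarefy(sample, 2, rng)
print(sub.sum())  # always 2
```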

biom.table.Table.sum

Table.sum(axis='whole')
Returns the sum by axis
Parameters
axis : {'whole', 'sample', 'observation'}, optional
The axis on which to operate.


Returns
numpy.array or float
If axis is "whole", returns a float representing the whole table sum. If axis is either "sample" or "observation", returns a numpy.array that holds a sum for each sample or observation, respectively.



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'])
Add all values in the table:
>>> table.sum()
array(47.0)
Add all values per sample:
>>> table.sum(axis='sample') # doctest: +NORMALIZE_WHITESPACE
array([  1.,  3.,  43.])
Add all values per observation:
>>> table.sum(axis='observation') # doctest: +NORMALIZE_WHITESPACE
array([  1.,  46.])

biom.table.Table.to_hdf5

Table.to_hdf5(h5grp, generated_by, compress=True)
Store CSC and CSR in place
The resulting structure of this group is below. A few basic definitions, N is the number of observations and M is the number of samples. Data are stored in both compressed sparse row [R10] (CSR, for observation oriented operations) and compressed sparse column [R11] (CSC, for sample oriented operations).
Parameters
h5grp : h5py.Group or h5py.File
The HDF5 entity in which to write the BIOM formatted data.


generated_by : str
A description of what generated the table


compress : bool, optional
Defaults to True, meaning fields will be compressed with gzip; False means no compression



SEE ALSO:
Table.from_hdf5


Notes
The expected HDF5 group structure is below. An example of an HDF5 file in DDL can be found here [R12].
./id : str, an arbitrary ID
./type : str, the table type (e.g., OTU table)
./format-url : str, a URL that describes the format
./format-version : two element tuple of int32, major and minor
./generated-by : str, what generated this file
./creation-date : str, ISO format
./shape : two element tuple of int32, N by M
./nnz : int32 or int64, number of non zero elems
./observation : Group
./observation/ids : (N,) dataset of str or vlen str
./observation/matrix : Group
./observation/matrix/data : (nnz,) dataset of float64
./observation/matrix/indices : (nnz,) dataset of int32
./observation/matrix/indptr : (N+1,) dataset of int32
./observation/metadata : Group
[./observation/metadata/foo] : Optional, (N,) dataset of any valid HDF5 type in index order with IDs.
./observation/group-metadata : Group
[./observation/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
[./observation/group-metadata/foo.attrs['data_type']] : attribute of the foo dataset that describes contained type (e.g., newick)
./sample : Group
./sample/ids : (M,) dataset of str or vlen str
./sample/matrix : Group
./sample/matrix/data : (nnz,) dataset of float64
./sample/matrix/indices : (nnz,) dataset of int32
./sample/matrix/indptr : (M+1,) dataset of int32
./sample/metadata : Group
[./sample/metadata/foo] : Optional, (M,) dataset of any valid HDF5 type in index order with IDs.
./sample/group-metadata : Group
[./sample/group-metadata/foo] : Optional, (?,) dataset of group metadata that relates IDs
[./sample/group-metadata/foo.attrs['data_type']] : attribute of the foo dataset that describes contained type (e.g., newick)

The '?' character on the dataset size means that it can be of arbitrary length.
The expected structure for each of the metadata datasets is a list of atomic type objects (int, float, str, ...), where the index order of the list corresponds to the index order of the relevant axis IDs. Special metadata fields have been defined, and they are stored in a specific way. Currently, the available special metadata fields are:
taxonomy: (N, ?) dataset of str or vlen str
KEGG_Pathways: (N, ?) dataset of str or vlen str
collapsed_ids: (N, ?) dataset of str or vlen str

References
[R10]
http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csr_matrix.html
[R11]
http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.sparse.csc_matrix.html
[R12]
http://biom-format.org/documentation/format_versions/biom-2.0.html

Examples
>>> from biom.util import biom_open  # doctest: +SKIP
>>> from biom.table import Table
>>> from numpy import array
>>> t = Table(array([[1, 2], [3, 4]]), ['a', 'b'], ['x', 'y'])
>>> with biom_open('foo.biom', 'w') as f:  # doctest: +SKIP
...     t.to_hdf5(f, "example")
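For a small matrix, the CSR triple (data, indices, indptr) described in the group structure above can be inspected directly with scipy:

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0, 0, 1], [1, 3, 42]], dtype=np.float64)
m = csr_matrix(dense)

print(m.data)     # nonzero values in row-major order
print(m.indices)  # column (sample) index of each nonzero value
print(m.indptr)   # row boundaries; row i spans data[indptr[i]:indptr[i+1]]
```

Note that indptr has one entry per row plus one, which is why the observation-oriented CSR stores N+1 offsets.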
    

biom.table.Table.to_json

Table.to_json(generated_by, direct_io=None)
Returns a JSON string representing the table in BIOM format.
Parameters
generated_by : str
a string describing the software used to build the table


direct_io : file or file-like object, optional
Defaults to None. Must implement a write function. If direct_io is not None, the final output is written directly to direct_io during processing.


Returns
str
A JSON-formatted string representing the biom table




biom.table.Table.to_tsv

Table.to_tsv(header_key=None, header_value=None, metadata_formatter=str, observation_column_name='#OTU ID')
Return self as a string in tab delimited form
The default str output for the Table is just the row/column ids and table data, without any metadata
Parameters
header_key : str or None, optional
Defaults to None


header_value : str or None, optional
Defaults to None


metadata_formatter : function, optional
Defaults to str. a function which takes a metadata entry and returns a formatted version that should be written to file


observation_column_name : str, optional
Defaults to "#OTU ID". The name of the first column in the output table, corresponding to the observation IDs.


Returns
str
tab delimited representation of the Table



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 BIOM table, with observation metadata and no sample metadata:
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'],
...               [{'foo': 'bar'}, {'x': 'y'}], None)
>>> print table.to_tsv() # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1      S2      S3
O1      0.0     0.0     1.0
O2      1.0     3.0     42.0
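The TSV layout is a comment line, a header row of sample IDs, and one row per observation. A sketch of how such a string could be assembled (plain Python, not the biom implementation):

```python
import numpy as np

data = np.asarray([[0, 0, 1], [1, 3, 42]], dtype=float)
obs_ids = ['O1', 'O2']
samp_ids = ['S1', 'S2', 'S3']

lines = ['# Constructed from biom file',
         '\t'.join(['#OTU ID'] + samp_ids)]  # header row of sample IDs
for obs_id, row in zip(obs_ids, data):
    # one tab-delimited row per observation
    lines.append('\t'.join([obs_id] + [str(v) for v in row]))
tsv = '\n'.join(lines)
print(tsv.splitlines()[2])
```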

biom.table.Table.transform

Table.transform(f, axis='sample', inplace=True)
Iterate over axis, applying a function f to each vector.
Only nonzero values can be modified: the density of the table can't increase, but zeroing values is fine.
Parameters
f : function(data, id, metadata) -> new data
A function that takes three values: an array of nonzero values corresponding to each observation or sample, an observation or sample id, and an observation or sample metadata entry. It must return an array of transformed values that replace the original values.


axis : {'sample', 'observation'}, optional
The axis to operate on. Can be "sample" or "observation".


inplace : bool, optional
Defaults to True. Whether to return a new table or modify itself.


Returns
biom.Table
Returns itself if inplace, else returns a new transformed table.


Raises
UnknownAxisError
If provided an unrecognized axis.



Examples
>>> import numpy as np
>>> from biom.table import Table
Create a 2x3 table
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, ['O1', 'O2'], ['S1', 'S2', 'S3'],
...               [{'foo': 'bar'}, {'x': 'y'}], None)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 1.0
O2  1.0 3.0 42.0
Create a transform function
>>> f = lambda data, id_, md: data / 2
Transform to a new table on samples
>>> table2 = table.transform(f, 'sample', False)
>>> print table2 # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 0.5
O2  0.5 1.5 21.0
table hasn't changed:
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 1.0
O2  1.0 3.0 42.0
Transform in place on observations:
>>> table3 = table.transform(f, 'observation', True)
table is different now:
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 0.5
O2  0.5 1.5 21.0
but the table returned (table3) is the same as table:
>>> print table3 # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 0.5
O2  0.5 1.5 21.0

biom.table.Table.transpose

Table.transpose()
Transpose the contingency table
The returned table will be an entirely new table, including copies of the (transposed) data, sample/observation IDs and metadata.
Returns
Table
Return a new table that is the transpose of caller table.





Examples

First, let's create a toy table to play around with. For this example, we're going to construct a 10x4 Table, or one that has 10 observations and 4 samples. Each observation and sample will be given an arbitrary but unique name. We'll also add on some metadata.
>>> import numpy as np
>>> from biom.table import Table
>>> data = np.arange(40).reshape(10, 4)
>>> sample_ids = ['S%d' % i for i in range(4)]
>>> observ_ids = ['O%d' % i for i in range(10)]
>>> sample_metadata = [{'environment': 'A'}, {'environment': 'B'},
...                    {'environment': 'A'}, {'environment': 'B'}]
>>> observ_metadata = [{'taxonomy': ['Bacteria', 'Firmicutes']},
...                    {'taxonomy': ['Bacteria', 'Firmicutes']},
...                    {'taxonomy': ['Bacteria', 'Proteobacteria']},
...                    {'taxonomy': ['Bacteria', 'Proteobacteria']},
...                    {'taxonomy': ['Bacteria', 'Proteobacteria']},
...                    {'taxonomy': ['Bacteria', 'Bacteroidetes']},
...                    {'taxonomy': ['Bacteria', 'Bacteroidetes']},
...                    {'taxonomy': ['Bacteria', 'Firmicutes']},
...                    {'taxonomy': ['Bacteria', 'Firmicutes']},
...                    {'taxonomy': ['Bacteria', 'Firmicutes']}]
>>> table = Table(data, observ_ids, sample_ids, observ_metadata,
...               sample_metadata, table_id='Example Table')
Now that we have a table, let's explore it at a high level first.
>>> table
10 x 4 <class 'biom.table.Table'> with 39 nonzero entries (97% dense)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S1  S2  S3
O0  0.0 1.0 2.0 3.0
O1  4.0 5.0 6.0 7.0
O2  8.0 9.0 10.0    11.0
O3  12.0    13.0    14.0    15.0
O4  16.0    17.0    18.0    19.0
O5  20.0    21.0    22.0    23.0
O6  24.0    25.0    26.0    27.0
O7  28.0    29.0    30.0    31.0
O8  32.0    33.0    34.0    35.0
O9  36.0    37.0    38.0    39.0
>>> print table.ids() # doctest: +NORMALIZE_WHITESPACE
['S0' 'S1' 'S2' 'S3']
>>> print table.ids(axis='observation') # doctest: +NORMALIZE_WHITESPACE
['O0' 'O1' 'O2' 'O3' 'O4' 'O5' 'O6' 'O7' 'O8' 'O9']
>>> print table.nnz  # number of nonzero entries
39
While it's fun to just poke at the table, let's dig deeper. First, we're going to convert table into relative abundances (within each sample), and then filter table to just the samples associated with environment 'A'. The filtering gets fancy: we can pass in an arbitrary function to determine what samples we want to keep. This function must accept a sparse vector of values, the corresponding ID and the corresponding metadata, and should return True or False, where True indicates that the vector should be retained.
>>> normed = table.norm(axis='sample', inplace=False)
>>> filter_f = lambda values, id_, md: md['environment'] == 'A'
>>> env_a = normed.filter(filter_f, axis='sample', inplace=False)
>>> print env_a # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S2
O0  0.0 0.01
O1  0.0222222222222 0.03
O2  0.0444444444444 0.05
O3  0.0666666666667 0.07
O4  0.0888888888889 0.09
O5  0.111111111111  0.11
O6  0.133333333333  0.13
O7  0.155555555556  0.15
O8  0.177777777778  0.17
O9  0.2 0.19
But what if we wanted individual tables per environment? While we could just perform some fancy iteration, we can instead rely on Table.partition for these operations. partition, like filter, accepts a function. However, the partition method only passes the corresponding ID and metadata to the function. The function should return what partition the data are a part of. Within this example, we're also going to sum up our tables over the partitioned samples. Please note that we're using the original table (i.e., not normalized) here.
>>> part_f = lambda id_, md: md['environment']
>>> env_tables = table.partition(part_f, axis='sample')
>>> for partition, env_table in env_tables:
...     print partition, env_table.sum('sample')
A [ 180.  200.]
B [ 190.  210.]
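The grouping semantics of partition can be sketched in plain Python: samples are assigned to groups according to the value the key function computes from each sample's ID and metadata. This is an illustrative helper (partition_ids is hypothetical, not part of the biom API, which returns whole sub-tables rather than ID lists):

```python
# Sketch of partition semantics: group sample IDs by the value a key
# function computes from each sample's ID and metadata.
def partition_ids(ids, metadata, part_f):
    groups = {}
    for id_, md in zip(ids, metadata):
        groups.setdefault(part_f(id_, md), []).append(id_)
    return groups

sample_ids = ['S0', 'S1', 'S2', 'S3']
sample_metadata = [{'environment': 'A'}, {'environment': 'B'},
                   {'environment': 'A'}, {'environment': 'B'}]
groups = partition_ids(sample_ids, sample_metadata,
                       lambda id_, md: md['environment'])
# groups == {'A': ['S0', 'S2'], 'B': ['S1', 'S3']}
```

Table.partition performs this same grouping but yields (partition, sub-table) pairs, as shown above.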
For this last example, and to highlight a bit more functionality, we're going to first transform the table such that all multiples of three are retained, while all non-multiples of three are set to zero. Following this, we'll collapse the table by taxonomy, and then convert the table into presence/absence data.
First, let's set up the transform. We're going to define a function that takes the modulus of every value in the vector and checks whether it is equal to zero. If it is, we'll keep the value; otherwise, we'll set the value to zero.
>>> transform_f = lambda v,i,m: np.where(v % 3 == 0, v, 0)
>>> mult_of_three = table.transform(transform_f, inplace=False)
>>> print mult_of_three # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S1  S2  S3
O0  0.0 0.0 0.0 3.0
O1  0.0 0.0 6.0 0.0
O2  0.0 9.0 0.0 0.0
O3  12.0    0.0 0.0 15.0
O4  0.0 0.0 18.0    0.0
O5  0.0 21.0    0.0 0.0
O6  24.0    0.0 0.0 27.0
O7  0.0 0.0 30.0    0.0
O8  0.0 33.0    0.0 0.0
O9  36.0    0.0 0.0 39.0
Next, we're going to collapse the table over the phylum level taxon. To do this, we're going to define a helper variable for the index position of the phylum (see the construction of the table above). Next, we're going to pass this to Table.collapse, and since we want to collapse over the observations, we'll need to specify 'observation' as the axis.
>>> phylum_idx = 1
>>> collapse_f = lambda id_, md: '; '.join(md['taxonomy'][:phylum_idx + 1])
>>> collapsed = mult_of_three.collapse(collapse_f, axis='observation')
>>> print collapsed # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S1  S2  S3
Bacteria; Firmicutes  7.2 6.6 7.2 8.4
Bacteria; Bacteroidetes   12.0    10.5    0.0 13.5
Bacteria; Proteobacteria  4.0 3.0 6.0 5.0
Finally, let's convert the table to presence/absence data.
>>> pa = collapsed.pa()
>>> print pa # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S1  S2  S3
Bacteria; Firmicutes  1.0 1.0 1.0 1.0
Bacteria; Bacteroidetes   1.0 1.0 0.0 1.0
Bacteria; Proteobacteria  1.0 1.0 1.0 1.0

Converting between file formats

The convert command in the biom-format project can be used to convert between biom and tab-delimited table formats. This is useful for several reasons:
converting biom format tables to tab-delimited tables for easy viewing in programs such as Excel
converting between sparse and dense biom formats

NOTE:
The tab-delimited tables are commonly referred to as the classic format tables, while BIOM formatted tables are referred to as biom tables.



General usage examples

Convert a tab-delimited table to an HDF5 or JSON biom format. Note that you must specify the type of table here:
biom convert -i table.txt -o table.from_txt_json.biom --table-type="OTU table" --to-json
biom convert -i table.txt -o table.from_txt_hdf5.biom --table-type="OTU table" --to-hdf5


Convert biom format to tab-delimited table format:
biom convert -i table.biom -o table.from_biom.txt --to-tsv


Convert biom format to classic format, including the taxonomy observation metadata as the last column of the classic format table. Because the BIOM format can support an arbitrary number of observation (or sample) metadata entries, and the classic format can support only a single observation metadata entry, you must specify which of the observation metadata entries you want to include in the output table:
biom convert -i table.biom -o table.from_biom_w_taxonomy.txt --to-tsv --header-key taxonomy


Convert biom format to classic format, including the taxonomy observation metadata as the last column of the classic format table, but renaming that column as ConsensusLineage. This is useful when using legacy tools that require a specific name for the observation metadata column:
biom convert -i table.biom -o table.from_biom_w_consensuslineage.txt --to-tsv --header-key taxonomy --output-metadata-id "ConsensusLineage"
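The classic layout these commands produce can be sketched in plain Python. This is an illustrative stand-in (the helper name to_classic_tsv is hypothetical), not the biom-format implementation:

```python
# Sketch: write a dense table in the "classic" tab-delimited layout,
# optionally appending one observation metadata column (as with
# --header-key taxonomy). Hypothetical helper, not part of biom-format.
def to_classic_tsv(data, observ_ids, sample_ids, taxonomy=None):
    lines = ["# Constructed from biom file"]
    header = ["#OTU ID"] + list(sample_ids)
    if taxonomy is not None:
        header.append("taxonomy")
    lines.append("\t".join(header))
    for i, obs_id in enumerate(observ_ids):
        row = [obs_id] + [str(v) for v in data[i]]
        if taxonomy is not None:
            row.append("; ".join(taxonomy[i]))
        lines.append("\t".join(row))
    return "\n".join(lines)
```

Passing a taxonomy list appends it as the final column, mirroring the --header-key taxonomy behavior above.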


Special case usage examples

Round-tripping between biom and tsv

In specific cases (see this comment), it is still useful to convert a biom table to tsv so that it can be opened in Excel, edited, and then converted back to biom. For these cases, follow these steps:
Convert from biom to txt:
biom convert -i otu_table.biom -o otu_table.txt --to-tsv --header-key taxonomy


Make your changes in Excel.
Convert back to biom:
biom convert -i otu_table.txt -o new_otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy



Converting QIIME 1.4.0 and earlier OTU tables to BIOM format

If you are converting a QIIME 1.4.0 or earlier OTU table to BIOM format, there are a few steps to go through. First, for convenience, you might want to rename the ConsensusLineage column to taxonomy. You can do this with the following command:
sed 's/Consensus Lineage/ConsensusLineage/' < otu_table.txt | sed 's/ConsensusLineage/taxonomy/' > otu_table.taxonomy.txt


Then, you'll want to perform the conversion including a step to convert the taxonomy string from the classic OTU table to a taxonomy list, as it's represented in QIIME 1.4.0-dev and later:
biom convert -i otu_table.taxonomy.txt -o otu_table.from_txt.biom --table-type="OTU table" --process-obs-metadata taxonomy


Adding sample and observation metadata to biom files

Frequently you'll have an existing BIOM file and want to add sample and/or observation metadata to it. For samples, metadata is frequently environmental or technical details about your samples: the subject that a sample was collected from, the pH of the sample, the PCR primers used to amplify DNA from the samples, etc. For observations, metadata is frequently a categorization of the observation: the taxonomy of an OTU, or the EC hierarchy of a gene. You can use the biom add-metadata command to add this information to an existing BIOM file.
To get help with add-metadata you can call:
biom add-metadata -h


This command takes a BIOM file, and corresponding sample and/or observation mapping files. The following examples are used in the commands below. You can find these files in the biom-format/examples directory.
Your BIOM file might look like the following:
{
    "id":null,
    "format": "1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "some software package",
    "date": "2011-12-19T19:00:00",
    "rows":[
            {"id":"GG_OTU_1", "metadata":null},
            {"id":"GG_OTU_2", "metadata":null},
            {"id":"GG_OTU_3", "metadata":null},
            {"id":"GG_OTU_4", "metadata":null},
            {"id":"GG_OTU_5", "metadata":null}
        ],
    "columns": [
            {"id":"Sample1", "metadata":null},
            {"id":"Sample2", "metadata":null},
            {"id":"Sample3", "metadata":null},
            {"id":"Sample4", "metadata":null},
            {"id":"Sample5", "metadata":null},
            {"id":"Sample6", "metadata":null}
        ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [5, 6],
    "data":[[0,2,1],
            [1,0,5],
            [1,1,1],
            [1,3,2],
            [1,4,3],
            [1,5,1],
            [2,2,1],
            [2,3,4],
            [2,5,2],
            [3,0,2],
            [3,1,1],
            [3,2,1],
            [3,5,1],
            [4,1,1],
            [4,2,1]
           ]
}
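The sparse "matrix_type" stores each nonzero value as a [row, column, value] triple. As a sketch (illustrative only, not the biom-format implementation), the "data" entries above expand into a dense matrix of the stated "shape" like so:

```python
# Sketch: expand sparse [row, col, value] triples (as in the "data"
# field above) into a dense matrix of the stated "shape".
def to_dense(triples, shape):
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in triples:
        dense[row][col] = value
    return dense

data = [[0, 2, 1], [1, 0, 5], [1, 1, 1], [1, 3, 2], [1, 4, 3],
        [1, 5, 1], [2, 2, 1], [2, 3, 4], [2, 5, 2], [3, 0, 2],
        [3, 1, 1], [3, 2, 1], [3, 5, 1], [4, 1, 1], [4, 2, 1]]
dense = to_dense(data, [5, 6])
```

The indices are 0-based, so dense[0][2] is GG_OTU_1's count in Sample3, and any cell not named by a triple stays zero.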


A sample metadata mapping file could then look like the following. Notice that there is an extra sample in here with respect to the above BIOM table. Any samples in the mapping file that are not in the BIOM file are ignored.
#SampleID       BarcodeSequence DOB
# Some optional
# comment lines...
Sample1 AGCACGAGCCTA    20060805
Sample2 AACTCGTCGATG    20060216
Sample3 ACAGACCACTCA    20060109
Sample4 ACCAGCGACTAG    20070530
Sample5 AGCAGCACTTGT    20070101
Sample6 AGCAGCACAACT    20070716


An observation metadata mapping file might look like the following. Notice that there is an extra observation in here with respect to the above BIOM table. Any observations in the mapping file that are not in the BIOM file are ignored.
#OTUID  taxonomy        confidence
# Some optional
# comment lines
GG_OTU_0        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__       0.980
GG_OTU_1        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        0.665
GG_OTU_2        Root;k__Bacteria        0.980
GG_OTU_3        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        1.000
GG_OTU_4        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        0.842
GG_OTU_5        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        1.000
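Parsing a mapping file like the ones above can be sketched in plain Python: the first line supplies the column headers, subsequent comment lines are skipped, and entries whose ID is not in the table are ignored. This is a hypothetical helper (parse_mapping), not the biom-format parser:

```python
# Sketch: parse a tab-delimited metadata mapping file. The first line
# is the header; later '#' lines are optional comments; IDs not in
# known_ids are ignored. Hypothetical helper, not the biom parser.
def parse_mapping(lines, known_ids):
    header = None
    metadata = {}
    for line in lines:
        line = line.rstrip("\n")
        if header is None:
            # first line holds the column names; drop the leading '#'
            header = line.lstrip("#").split("\t")[1:]
            continue
        if line.startswith("#") or not line.strip():
            continue  # optional comment lines
        fields = line.split("\t")
        if fields[0] in known_ids:
            metadata[fields[0]] = dict(zip(header, fields[1:]))
    return metadata
```

Applied to the observation mapping above with the table's five OTU IDs, GG_OTU_0 would simply be dropped.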


Adding metadata

To add sample metadata to a BIOM file, you can run the following:
biom add-metadata -i min_sparse_otu_table.biom -o table.w_smd.biom --sample-metadata-fp sam_md.txt


To add observation metadata to a BIOM file, you can run the following:
biom add-metadata -i min_sparse_otu_table.biom -o table.w_omd.biom --observation-metadata-fp obs_md.txt


You can also combine these in a single command to add both observation and sample metadata:
biom add-metadata -i min_sparse_otu_table.biom -o table.w_md.biom --observation-metadata-fp obs_md.txt --sample-metadata-fp sam_md.txt


In the last case, the resulting BIOM file will look like the following:
{
    "columns": [
        {
            "id": "Sample1",
            "metadata": {
                "BarcodeSequence": "AGCACGAGCCTA",
                "DOB": "20060805"
            }
        },
        {
            "id": "Sample2",
            "metadata": {
                "BarcodeSequence": "AACTCGTCGATG",
                "DOB": "20060216"
            }
        },
        {
            "id": "Sample3",
            "metadata": {
                "BarcodeSequence": "ACAGACCACTCA",
                "DOB": "20060109"
            }
        },
        {
            "id": "Sample4",
            "metadata": {
                "BarcodeSequence": "ACCAGCGACTAG",
                "DOB": "20070530"
            }
        },
        {
            "id": "Sample5",
            "metadata": {
                "BarcodeSequence": "AGCAGCACTTGT",
                "DOB": "20070101"
            }
        },
        {
            "id": "Sample6",
            "metadata": {
                "BarcodeSequence": "AGCAGCACAACT",
                "DOB": "20070716"
            }
        }
    ],
    "data": [
        [0, 2, 1.0],
        [1, 0, 5.0],
        [1, 1, 1.0],
        [1, 3, 2.0],
        [1, 4, 3.0],
        [1, 5, 1.0],
        [2, 2, 1.0],
        [2, 3, 4.0],
        [2, 5, 2.0],
        [3, 0, 2.0],
        [3, 1, 1.0],
        [3, 2, 1.0],
        [3, 5, 1.0],
        [4, 1, 1.0],
        [4, 2, 1.0]
    ],
    "date": "2012-12-11T07:36:15.467843",
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "generated_by": "some software package",
    "id": null,
    "matrix_element_type": "float",
    "matrix_type": "sparse",
    "rows": [
        {
            "id": "GG_OTU_1",
            "metadata": {
                "confidence": "0.665",
                "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
            }
        },
        {
            "id": "GG_OTU_2",
            "metadata": {
                "confidence": "0.980",
                "taxonomy": "Root;k__Bacteria"
            }
        },
        {
            "id": "GG_OTU_3",
            "metadata": {
                "confidence": "1.000",
                "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
            }
        },
        {
            "id": "GG_OTU_4",
            "metadata": {
                "confidence": "0.842",
                "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
            }
        },
        {
            "id": "GG_OTU_5",
            "metadata": {
                "confidence": "1.000",
                "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
            }
        }
    ],
    "shape": [5, 6],
    "type": "OTU table"
}


Processing metadata while adding

There are some additional parameters you can pass to this command for more complex processing.
You can tell the command to process certain metadata column values as integers (--int-fields), floating point (i.e., decimal or real) numbers (--float-fields), or as hierarchical semicolon-delimited data (--sc-separated).
biom add-metadata -i min_sparse_otu_table.biom -o table.w_md.biom --observation-metadata-fp obs_md.txt --sample-metadata-fp sam_md.txt --int-fields DOB --sc-separated taxonomy --float-fields confidence


Here your resulting BIOM file will look like the following, where DOB values are now integers (compare to the above: they're not quoted now), confidence values are now floating point numbers (again, not quoted), and taxonomy values are now lists where each entry is a taxonomy level, as opposed to above where they appear as a single semicolon-separated string.
{
    "columns": [
        {
            "id": "Sample1",
            "metadata": {
                "BarcodeSequence": "AGCACGAGCCTA",
                "DOB": 20060805
            }
        },
        {
            "id": "Sample2",
            "metadata": {
                "BarcodeSequence": "AACTCGTCGATG",
                "DOB": 20060216
            }
        },
        {
            "id": "Sample3",
            "metadata": {
                "BarcodeSequence": "ACAGACCACTCA",
                "DOB": 20060109
            }
        },
        {
            "id": "Sample4",
            "metadata": {
                "BarcodeSequence": "ACCAGCGACTAG",
                "DOB": 20070530
            }
        },
        {
            "id": "Sample5",
            "metadata": {
                "BarcodeSequence": "AGCAGCACTTGT",
                "DOB": 20070101
            }
        },
        {
            "id": "Sample6",
            "metadata": {
                "BarcodeSequence": "AGCAGCACAACT",
                "DOB": 20070716
            }
        }
    ],
    "data": [
        [0, 2, 1.0],
        [1, 0, 5.0],
        [1, 1, 1.0],
        [1, 3, 2.0],
        [1, 4, 3.0],
        [1, 5, 1.0],
        [2, 2, 1.0],
        [2, 3, 4.0],
        [2, 5, 2.0],
        [3, 0, 2.0],
        [3, 1, 1.0],
        [3, 2, 1.0],
        [3, 5, 1.0],
        [4, 1, 1.0],
        [4, 2, 1.0]
    ],
    "date": "2012-12-11T07:30:29.870689",
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "generated_by": "some software package",
    "id": null,
    "matrix_element_type": "float",
    "matrix_type": "sparse",
    "rows": [
        {
            "id": "GG_OTU_1",
            "metadata": {
                "confidence": 0.665,
                "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
            }
        },
        {
            "id": "GG_OTU_2",
            "metadata": {
                "confidence": 0.98,
                "taxonomy": ["Root", "k__Bacteria"]
            }
        },
        {
            "id": "GG_OTU_3",
            "metadata": {
                "confidence": 1.0,
                "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
            }
        },
        {
            "id": "GG_OTU_4",
            "metadata": {
                "confidence": 0.842,
                "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
            }
        },
        {
            "id": "GG_OTU_5",
            "metadata": {
                "confidence": 1.0,
                "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
            }
        }
    ],
    "shape": [5, 6],
    "type": "OTU table"
}


If you have multiple fields that you'd like processed in one of these ways, you can pass a comma-separated list of field names (e.g., --float-fields confidence,pH).
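The per-field processing described above can be sketched in plain Python. The helper below (process_fields, a hypothetical name) mimics the effect of --int-fields, --float-fields, and --sc-separated on a single metadata entry; it is not the biom-format implementation:

```python
# Sketch of per-field metadata processing: coerce selected fields to
# int or float, or split semicolon-separated strings into lists.
# Hypothetical helper, not the biom-format implementation.
def process_fields(md, int_fields=(), float_fields=(), sc_separated=()):
    out = {}
    for key, value in md.items():
        if key in int_fields:
            out[key] = int(value)
        elif key in float_fields:
            out[key] = float(value)
        elif key in sc_separated:
            out[key] = [v.strip() for v in value.split(";")]
        else:
            out[key] = value
    return out
```

For example, processing {'DOB': '20060805', 'confidence': '0.980', 'taxonomy': 'Root;k__Bacteria'} with the flags above yields an int, a float, and a two-element list, matching the JSON output shown earlier.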

Renaming (or naming) metadata columns while adding

You can also override the names of the metadata fields provided in the mapping files with the --observation-header and --sample-header parameters. This is useful if you want to rename metadata columns, or if metadata column headers aren't present in your metadata mapping file. If you pass either of these parameters, you must name all columns in order. If there are more columns in the metadata mapping file than there are headers, the extra columns will be ignored (so this is also a useful way to select only the first n columns from your mapping file). For example, if you want to rename the DOB column in the sample metadata mapping, you could do the following:
biom add-metadata -i min_sparse_otu_table.biom -o table.w_smd.biom --sample-metadata-fp sam_md.txt --sample-header SampleID,BarcodeSequence,DateOfBirth
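The "extra columns are ignored" behavior matches Python's built-in zip, which stops at the shorter of its inputs; a quick illustration:

```python
# Pairing two headers with a three-field row silently drops the
# trailing field, because zip stops at the shorter input.
headers = ["OTUID", "taxonomy"]
fields = ["GG_OTU_1", "Root;k__Bacteria", "0.665"]  # extra confidence field
record = dict(zip(headers, fields))
# record == {"OTUID": "GG_OTU_1", "taxonomy": "Root;k__Bacteria"}
```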


If you have a mapping file without headers such as the following:
GG_OTU_0        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__       0.980
GG_OTU_1        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        0.665
GG_OTU_2        Root;k__Bacteria        0.980
GG_OTU_3        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        1.000
GG_OTU_4        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        0.842
GG_OTU_5        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        1.000


you could name these while adding them as follows:
biom add-metadata -i min_sparse_otu_table.biom -o table.w_omd.biom --observation-metadata-fp obs_md.txt --observation-header OTUID,taxonomy,confidence


As a variation on the last command, if you only want to include the taxonomy column and exclude the confidence column, you could run:
biom add-metadata -i min_sparse_otu_table.biom -o table.w_omd.biom --observation-metadata-fp obs_md.txt --observation-header OTUID,taxonomy


Summarizing BIOM tables

If you have an existing BIOM file and want to compile a summary of the information in that table, you can use the biom summarize-table command.
To get help with biom summarize-table you can call:
biom summarize-table -h


This command takes a BIOM file or gzipped BIOM file as input, and will print a summary of the count information on a per-sample basis to the new file specified by the -o parameter. The example file used in the commands below can be found in the biom-format/examples directory.

Summarizing sample data

To summarize the per-sample data in a BIOM file, you can run:
biom summarize-table -i rich_sparse_otu_table.biom -o rich_sparse_otu_table_summary.txt


The following information will be written to rich_sparse_otu_table_summary.txt:
Num samples: 6
Num observations: 5
Total count: 27
Table density (fraction of non-zero values): 0.500
Counts/sample summary:
 Min: 3.0
 Max: 7.0
 Median: 4.000
 Mean: 4.500
 Std. dev.: 1.500
 Sample Metadata Categories: LinkerPrimerSequence; BarcodeSequence; Description; BODY_SITE
 Observation Metadata Categories: taxonomy

Counts/sample detail:
 Sample5: 3.0
 Sample2: 3.0
 Sample6: 4.0
 Sample3: 4.0
 Sample4: 6.0
 Sample1: 7.0


As you can see, general summary information about the table is provided, including the number of samples, the number of observations, the total count (i.e., the sum of all values in the table), and so on, followed by the per-sample counts.

Summarizing sample data qualitatively

To summarize the per-sample data in a BIOM file qualitatively, where the number of unique observations per sample (rather than the total count of observations per sample) are provided, you can run:
biom summarize-table -i rich_sparse_otu_table.biom --qualitative -o rich_sparse_otu_table_qual_summary.txt


The following information will be written to rich_sparse_otu_table_qual_summary.txt:
Num samples: 6
Num observations: 5
Observations/sample summary:
 Min: 1
 Max: 4
 Median: 2.500
 Mean: 2.500
 Std. dev.: 0.957
 Sample Metadata Categories: LinkerPrimerSequence; BarcodeSequence; Description; BODY_SITE
 Observation Metadata Categories: taxonomy

Observations/sample detail:
 Sample5: 1
 Sample4: 2
 Sample1: 2
 Sample6: 3
 Sample2: 3
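Both flavors of summary can be computed directly from the sparse [row, col, value] triples shown in the example BIOM file earlier; a minimal sketch (not the biom summarize-table implementation):

```python
# Sketch: compute the quantitative (total count per sample) and
# qualitative (unique observations per sample) summaries from sparse
# triples. The triples below are the "data" entries of the example
# BIOM file shown earlier in this document.
data = [[0, 2, 1], [1, 0, 5], [1, 1, 1], [1, 3, 2], [1, 4, 3],
        [1, 5, 1], [2, 2, 1], [2, 3, 4], [2, 5, 2], [3, 0, 2],
        [3, 1, 1], [3, 2, 1], [3, 5, 1], [4, 1, 1], [4, 2, 1]]
n_observations, n_samples = 5, 6
counts = [0] * n_samples    # quantitative: sum of values per sample
observed = [0] * n_samples  # qualitative: nonzero observations per sample
for _, col, value in data:
    counts[col] += value
    observed[col] += 1
total = sum(counts)
density = float(len(data)) / (n_observations * n_samples)  # nonzero / cells
```

For this table the totals are 27 across all samples and the density is 15/30 = 0.500, matching the summary output above.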


THE BIOM FORMAT LICENSE

The BIOM Format project is licensed under the terms of the Modified BSD
License (also known as New or Revised BSD), as follows:
Copyright (c) 2011-2014, The BIOM Format Development Team <gregcaporaso@gmail.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the BIOM Format Development Team nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE BIOM FORMAT DEVELOPMENT TEAM BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The following banner should be used in any source code file to indicate the copyright and license terms:
#-----------------------------------------------------------------------------
# Copyright (c) 2011-2014, The BIOM Format Development Team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file COPYING.txt, distributed with this software.
#-----------------------------------------------------------------------------


The latest official version of the biom-format project is 2.1 and of the BIOM file format is 2.0. Details on the file format can be found here.
To install the biom-format project, you can download the latest version here, or work with the development version. Generally we recommend working with the release version as it will be more stable, but if you want access to the latest features (and can tolerate some instability) you should work with the development version.
The biom-format project has the following dependencies:
Python >= 2.7 and < 3.0
numpy >= 1.7.0
pyqi 0.3.2
scipy >= 0.13.0
h5py >= 2.2.0 (optional; must be installed if creating or reading HDF5 formatted files)



The easiest way to install the latest version of the biom-format project and its required dependencies is via pip:
pip install numpy
pip install biom-format


That's it!
If you decided not to install biom-format using pip, it is also possible to manually install the latest release. We'll illustrate the install process in the $HOME/code directory. You can either work in this directory on your system (creating it, if necessary, by running mkdir $HOME/code) or replace all occurrences of $HOME/code in the following instructions with your working directory. Please note that numpy must be installed prior to installing biom-format. Change to this directory to start the install process:
cd $HOME/code


Download the latest release, which can be found here. After downloading, unpack and install (note: x.y.z refers to the downloaded version):
tar xzf biom-format-x.y.z.tar.gz
cd $HOME/code/biom-format-x.y.z


Alternatively, to install the development version, pull it from GitHub, and change to the resulting directory:
git clone git://github.com/biocore/biom-format.git
cd $HOME/code/biom-format


To install (either the development or release version), follow these steps:
sudo python setup.py install


If you do not have sudo access on your system (or don't want to install the biom-format project in the default location) you'll need to install the library code and scripts in specified directories, and then tell your system where to look for those files. You can do this as follows:
echo "export PATH=$HOME/bin/:$PATH" >> $HOME/.bashrc
echo "export PYTHONPATH=$HOME/lib/:$PYTHONPATH" >> $HOME/.bashrc
mkdir -p $HOME/bin $HOME/lib/
source $HOME/.bashrc
python setup.py install --install-scripts=$HOME/bin/ --install-purelib=$HOME/lib/ --install-lib=$HOME/lib/


You should then have access to the biom-format project. You can test this by running the following command:
python -c "from biom import __version__; print __version__"


You should see the current version of the biom-format project.
Next you can run:
which biom


You should get a file path ending with biom printed to your screen if it is installed correctly. Finally, to see a list of all biom commands, run:
biom


ENABLING TAB COMPLETION OF BIOM COMMANDS

The biom command referenced in the previous section is a driver for commands in biom-format, powered by the pyqi project. You can enable tab completion of biom command names and command options (meaning that when you begin typing the name of a command or option you can auto-complete it by hitting the tab key) by following a few simple steps from the pyqi documentation. While this step is optional, tab completion is very convenient so it's worth enabling.
To enable tab completion, follow the steps outlined under Configuring bash completion in the pyqi install documentation, substituting biom for my-project and my_project in all commands. After completing those steps and closing and re-opening your terminal, auto-completion should be enabled.
There is also a BIOM format package for R, called biom. This package includes basic tools for reading biom-format files, accessing and subsetting data tables from a biom object, as well as limited support for writing a biom object back to a biom-format file. The design of this API is intended to match the Python API and other tools included with the biom-format project, but with a decidedly "R flavor" that should be familiar to R users. This includes S4 classes and methods, as well as extensions of common core functions/methods.
To install the latest stable release of the biom package enter the following command from within an R session:
install.packages("biom")


To install the latest development version of the biom package, enter the following lines in an R session:
install.packages("devtools") # if not already installed
library("devtools")
install_github("biom", "joey711")


Please post any support or feature requests and bugs to the biom issue tracker.
See the biom project on GitHub for further details, or if you would like to contribute.
Note that the licenses between the biom R package (GPL-2) and the other biom-format software (Modified BSD) are different.
You can cite the BIOM format as follows (link):
The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.
Daniel McDonald, Jose C. Clemente, Justin Kuczynski, Jai Ram Rideout, Jesse Stombaugh, Doug Wendel, Andreas Wilke, Susan Huse, John Hufnagle, Folker Meyer, Rob Knight, and J. Gregory Caporaso.
GigaScience 2012, 1:7. doi:10.1186/2047-217X-1-7
The biom-format project was conceived of and developed by the QIIME, MG-RAST, and VAMPS development groups to support interoperability of our software packages. If you have questions about the biom-format project you can contact gregcaporaso@gmail.com.

AUTHOR

The BIOM Project

COPYRIGHT

2011-2013, The BIOM Format Development Team
August 12, 2014 2.1