.\" Man page generated from reStructuredText.
.
.TH "BIOM" "1" "August 12, 2014" "2.1" "biom-format"
.SH NAME
biom \- BIOM Documentation
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.sp
The \fI\%BIOM file format\fP (canonically pronounced \fIbiome\fP) is designed to be a general\-use format for representing biological sample by observation contingency tables. BIOM is a recognized standard for the \fI\%Earth Microbiome Project\fP and is a \fI\%Genomic Standards Consortium\fP supported project.
.sp
The \fI\%BIOM format\fP is designed for general use in broad areas of comparative \-omics. For example, in marker\-gene surveys, the primary use of this format is to represent OTU tables: the observations in this case are OTUs and the matrix contains counts corresponding to the number of times each OTU is observed in each sample. With respect to metagenome data, this format would be used to represent metagenome tables: the observations in this case might correspond to SEED subsystems, and the matrix would contain counts corresponding to the number of times each subsystem is observed in each metagenome. Similarly, with respect to genome data, this format may be used to represent a set of genomes: the observations in this case again might correspond to SEED subsystems, and the counts would correspond to the number of times each subsystem is observed in each genome.
.sp
There are two components to the BIOM project: first is the \fI\%definition of the BIOM format\fP, and second is \fI\%development of support objects\fP in multiple programming languages to support the use of BIOM in diverse bioinformatics applications. The version of the BIOM file format is independent of the version of the \fIbiom\-format\fP software.
.sp
There are official implementations of BIOM format support objects (APIs) in the Python and R programming languages. The rest of this document contains details about the BIOM file format (which is independent of the API) and the Python \fBbiom\-format\fP API. For more details about the R API, please see the \fI\%CRAN biom package\fP\&.
.sp
The following projects use the BIOM format:
.INDENT 0.0
.IP \(bu 2
\fI\%QIIME\fP
.IP \(bu 2
\fI\%MG\-RAST\fP
.IP \(bu 2
\fI\%PICRUSt\fP
.IP \(bu 2
\fI\%Mothur\fP
.IP \(bu 2
\fI\%phyloseq\fP
.IP \(bu 2
\fI\%MEGAN\fP
.IP \(bu 2
\fI\%VAMPS\fP
.IP \(bu 2
\fI\%metagenomeSeq\fP
.IP \(bu 2
\fI\%Phinch\fP
.UNINDENT
.sp
If you are using BIOM in your project, and would like your project to be listed, please submit a \fI\%pull request\fP to the BIOM project. More information on \fI\%submitting pull requests can be found here\fP\&.
.SH BIOM DOCUMENTATION
.sp
These pages provide format specifications and API information for the BIOM table objects.
.SS The biom file format
.sp
The BIOM project consists of two independent components: the \fIbiom\-format\fP software package, which contains software tools for working with BIOM\-formatted files and the tables they represent; and the BIOM file format. As of the 1.0.0 software version and the 1.0 file format version, the versions of the software and the file format are independent of one another. Version\-specific documentation of the file formats can be found in the following sections.
.SS The biom file format: Version 1.0
.sp
The \fBbiom\fP format uses \fI\%JSON\fP to provide its overall structure. JSON is a widely supported format with native parsers available within many programming languages.
.sp
Required top\-level fields:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
id                  : <string or null> a field that can be used to id a table (or null)
format              : <string> The name and version of the current biom format
format_url          : <url> A string with a static URL providing format details
type                : <string> Table type (a controlled vocabulary)
                      Acceptable values:
                       "OTU table"
                       "Pathway table"
                       "Function table"
                       "Ortholog table"
                       "Gene table"
                       "Metabolite table"
                       "Taxon table"
generated_by        : <string> Package and revision that built the table
date                : <datetime> Date the table was built (ISO 8601 format)
rows                : <list of objects> An ORDERED list of objects describing the rows
                      (explained in detail below)
columns             : <list of objects> An ORDERED list of objects describing the columns
                      (explained in detail below)
matrix_type         : <string> Type of matrix data representation (a controlled vocabulary)
                      Acceptable values:
                       "sparse" : only non\-zero values are specified
                       "dense" : every element must be specified
matrix_element_type : <string> Value type in matrix (a controlled vocabulary)
                      Acceptable values:
                       "int" : integer
                       "float" : floating point
                       "unicode" : unicode string
shape               : <list of ints>, the number of rows and number of columns in data
data                : <list of lists>, counts of observations by sample
                       if matrix_type is "sparse", [[row, column, value],
                                                    [row, column, value],
                                                    ...]
                       if matrix_type is "dense",  [[value, value, value, ...],
                                                    [value, value, value, ...],
                                                    ...]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Optional top\-level fields:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
comment             : <string> A free text field containing any information that you
                       feel is relevant (or just feel like sharing)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The rows value is an ORDERED list of objects where each object corresponds to a single
row in the matrix. Each object can currently store arbitrary keys, although
this might become restricted based on table type. Each object must provide,
at the minimum:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
id                  : <string> an arbitrary UNIQUE identifier
metadata            : <an object or null> An object containing key/value metadata pairs
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The columns value is an ORDERED list of objects where each object corresponds to a single
column in the matrix. Each object can currently store arbitrary keys, although
this might become restricted based on table type. Each object must provide,
at the minimum:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
id                  : <string> an arbitrary UNIQUE identifier
metadata            : <an object or null> An object containing key/value metadata pairs
.ft P
.fi
.UNINDENT
.UNINDENT
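.sp
The required fields above lend themselves to a mechanical check. The following Python sketch is illustrative only and is not part of the \fBbiom\-format\fP API: the helper name \fBcheck_biom_v1\fP, its messages, and the two\-row example table are assumptions made for this example. It tests only that the required keys are present, that row and column objects carry a unique \fBid\fP and a \fBmetadata\fP entry, and that \fBshape\fP agrees with the lengths of \fBrows\fP and \fBcolumns\fP\&.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Required top-level keys of a BIOM 1.0 document, as listed in this section.
REQUIRED_FIELDS = (
    "id", "format", "format_url", "type", "generated_by", "date",
    "rows", "columns", "matrix_type", "matrix_element_type",
    "shape", "data",
)


def check_biom_v1(table):
    """Return a list of problems found in a parsed table (hypothetical helper)."""
    problems = ["missing field: " + key
                for key in REQUIRED_FIELDS if key not in table]
    if problems:
        return problems
    # Every row and column object must provide at least "id" and "metadata".
    for axis in ("rows", "columns"):
        ids = []
        for entry in table[axis]:
            if "id" not in entry or "metadata" not in entry:
                problems.append(axis + " entry lacks id or metadata")
            else:
                ids.append(entry["id"])
        if len(ids) != len(set(ids)):
            problems.append(axis + " ids are not unique")
    # shape is [number of rows, number of columns].
    if list(table["shape"]) != [len(table["rows"]), len(table["columns"])]:
        problems.append("shape does not match rows and columns")
    return problems


# A made-up 2 x 1 dense table, used only to exercise the check.
example = {
    "id": None,
    "format": "Biological Observation Matrix 0.9.1-dev",
    "format_url": "http://biom-format.org/documentation/format_versions/biom-1.0.html",
    "type": "OTU table",
    "generated_by": "example",
    "date": "2011-12-19T19:00:00",
    "rows": [{"id": "GG_OTU_1", "metadata": None},
             {"id": "GG_OTU_2", "metadata": None}],
    "columns": [{"id": "Sample1", "metadata": None}],
    "matrix_type": "dense",
    "matrix_element_type": "int",
    "shape": [2, 1],
    "data": [[0], [5]],
}

print(check_biom_v1(example))  # prints []
```
.ft P
.fi
.UNINDENT
.UNINDENT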
.SS Example biom files
.sp
Below are examples of minimal and rich biom files in both sparse and dense formats. To decide which of these you should generate for new data types, see the section on \fIsparse\-or\-dense\fP\&.
.SS Minimal sparse OTU table
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
    "id":null,
    "format": "Biological Observation Matrix 0.9.1\-dev",
    "format_url": "http://biom\-format.org/documentation/format_versions/biom\-1.0.html",
    "type": "OTU table",
    "generated_by": "QIIME revision 1.4.0\-dev",
    "date": "2011\-12\-19T19:00:00",
    "rows":[
            {"id":"GG_OTU_1", "metadata":null},
            {"id":"GG_OTU_2", "metadata":null},
            {"id":"GG_OTU_3", "metadata":null},
            {"id":"GG_OTU_4", "metadata":null},
            {"id":"GG_OTU_5", "metadata":null}
        ],
    "columns": [
            {"id":"Sample1", "metadata":null},
            {"id":"Sample2", "metadata":null},
            {"id":"Sample3", "metadata":null},
            {"id":"Sample4", "metadata":null},
            {"id":"Sample5", "metadata":null},
            {"id":"Sample6", "metadata":null}
        ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [5, 6],
    "data":[[0,2,1],
            [1,0,5],
            [1,1,1],
            [1,3,2],
            [1,4,3],
            [1,5,1],
            [2,2,1],
            [2,3,4],
            [2,4,2],
            [3,0,2],
            [3,1,1],
            [3,2,1],
            [3,5,1],
            [4,1,1],
            [4,2,1]
           ]
}
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Minimal dense OTU table
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
    "id":null,
    "format": "Biological Observation Matrix 0.9.1\-dev",
    "format_url": "http://biom\-format.org/documentation/format_versions/biom\-1.0.html",
    "type": "OTU table",
    "generated_by": "QIIME revision 1.4.0\-dev",
    "date": "2011\-12\-19T19:00:00",
    "rows":[
            {"id":"GG_OTU_1", "metadata":null},
            {"id":"GG_OTU_2", "metadata":null},
            {"id":"GG_OTU_3", "metadata":null},
            {"id":"GG_OTU_4", "metadata":null},
            {"id":"GG_OTU_5", "metadata":null}
        ],
    "columns": [
            {"id":"Sample1", "metadata":null},
            {"id":"Sample2", "metadata":null},
            {"id":"Sample3", "metadata":null},
            {"id":"Sample4", "metadata":null},
            {"id":"Sample5", "metadata":null},
            {"id":"Sample6", "metadata":null}
        ],
    "matrix_type": "dense",
    "matrix_element_type": "int",
    "shape": [5,6],
    "data":  [[0,0,1,0,0,0],
              [5,1,0,2,3,1],
              [0,0,1,4,2,0],
              [2,1,1,0,0,1],
              [0,1,1,0,0,0]]
}
.ft P
.fi
.UNINDENT
.UNINDENT
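.sp
The minimal sparse and minimal dense tables above describe the same matrix. As a minimal sketch (the function names \fBsparse_to_dense\fP and \fBdense_to_sparse\fP are hypothetical and not part of the \fBbiom\-format\fP API), the following Python converts the \fBdata\fP field of the minimal sparse table into the dense layout and back:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def sparse_to_dense(shape, triples):
    """Expand [row, column, value] triples into a list of dense rows."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in triples:
        dense[row][col] = value
    return dense


def dense_to_sparse(dense):
    """Keep only the non-zero cells, in row-major order."""
    return [[r, c, v]
            for r, row in enumerate(dense)
            for c, v in enumerate(row)
            if v != 0]


# "shape" and "data" copied from the minimal sparse OTU table above.
shape = [5, 6]
sparse_data = [
    [0, 2, 1],
    [1, 0, 5], [1, 1, 1], [1, 3, 2], [1, 4, 3], [1, 5, 1],
    [2, 2, 1], [2, 3, 4], [2, 4, 2],
    [3, 0, 2], [3, 1, 1], [3, 2, 1], [3, 5, 1],
    [4, 1, 1], [4, 2, 1],
]

dense_data = sparse_to_dense(shape, sparse_data)
for row in dense_data:
    print(row)
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The printed rows match the \fBdata\fP field of the minimal dense table, and \fBdense_to_sparse\fP recovers the original triples because they are already listed in row\-major order.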
.SS Rich sparse OTU table
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
 "id":null,
 "format": "Biological Observation Matrix 0.9.1\-dev",
 "format_url": "http://biom\-format.org/documentation/format_versions/biom\-1.0.html",
 "type": "OTU table",
 "generated_by": "QIIME revision 1.4.0\-dev",
 "date": "2011\-12\-19T19:00:00",
 "rows":[
    {"id":"GG_OTU_1", "metadata":{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}},
    {"id":"GG_OTU_2", "metadata":{"taxonomy":["k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae", "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum", "s__"]}},
    {"id":"GG_OTU_3", "metadata":{"taxonomy":["k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia", "o__Methanosarcinales", "f__Methanosarcinaceae", "g__Methanosarcina", "s__"]}},
    {"id":"GG_OTU_4", "metadata":{"taxonomy":["k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Halanaerobiales", "f__Halanaerobiaceae", "g__Halanaerobium", "s__Halanaerobiumsaccharolyticum"]}},
    {"id":"GG_OTU_5", "metadata":{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}}
    ],
 "columns":[
    {"id":"Sample1", "metadata":{
                             "BarcodeSequence":"CGCTTATCGAGA",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample2", "metadata":{
                             "BarcodeSequence":"CATACCAGTAGC",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample3", "metadata":{
                             "BarcodeSequence":"CTCTCTACCTGT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample4", "metadata":{
                             "BarcodeSequence":"CTCTCGGCCTGT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}},
    {"id":"Sample5", "metadata":{
                             "BarcodeSequence":"CTCTCTACCAAT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}},
    {"id":"Sample6", "metadata":{
                             "BarcodeSequence":"CTAACTACCAAT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}}
            ],
 "matrix_type": "sparse",
 "matrix_element_type": "int",
 "shape": [5, 6],
 "data":[[0,2,1],
         [1,0,5],
         [1,1,1],
         [1,3,2],
         [1,4,3],
         [1,5,1],
         [2,2,1],
         [2,3,4],
         [2,5,2],
         [3,0,2],
         [3,1,1],
         [3,2,1],
         [3,5,1],
         [4,1,1],
         [4,2,1]
        ]
}
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Rich dense OTU table
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
 "id":null,
 "format": "Biological Observation Matrix 0.9.1\-dev",
 "format_url": "http://biom\-format.org/documentation/format_versions/biom\-1.0.html",
 "type": "OTU table",
 "generated_by": "QIIME revision 1.4.0\-dev",
 "date": "2011\-12\-19T19:00:00",
 "rows":[
    {"id":"GG_OTU_1", "metadata":{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}},
    {"id":"GG_OTU_2", "metadata":{"taxonomy":["k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae", "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum", "s__"]}},
    {"id":"GG_OTU_3", "metadata":{"taxonomy":["k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia", "o__Methanosarcinales", "f__Methanosarcinaceae", "g__Methanosarcina", "s__"]}},
    {"id":"GG_OTU_4", "metadata":{"taxonomy":["k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Halanaerobiales", "f__Halanaerobiaceae", "g__Halanaerobium", "s__Halanaerobiumsaccharolyticum"]}},
    {"id":"GG_OTU_5", "metadata":{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}}
    ],
 "columns":[
    {"id":"Sample1", "metadata":{
                             "BarcodeSequence":"CGCTTATCGAGA",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample2", "metadata":{
                             "BarcodeSequence":"CATACCAGTAGC",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample3", "metadata":{
                             "BarcodeSequence":"CTCTCTACCTGT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"gut",
                             "Description":"human gut"}},
    {"id":"Sample4", "metadata":{
                             "BarcodeSequence":"CTCTCGGCCTGT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}},
    {"id":"Sample5", "metadata":{
                             "BarcodeSequence":"CTCTCTACCAAT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}},
    {"id":"Sample6", "metadata":{
                             "BarcodeSequence":"CTAACTACCAAT",
                             "LinkerPrimerSequence":"CATGCTGCCTCCCGTAGGAGT",
                             "BODY_SITE":"skin",
                             "Description":"human skin"}}
            ],
 "matrix_type": "dense",
 "matrix_element_type": "int",
 "shape": [5,6],
 "data":  [[0,0,1,0,0,0],
           [5,1,0,2,3,1],
           [0,0,1,4,2,0],
           [2,1,1,0,0,1],
           [0,1,1,0,0,0]]
}
.ft P
.fi
.UNINDENT
.UNINDENT
.SS The biom file format: Version 2.0
.sp
The \fBbiom\fP format uses \fI\%HDF5\fP to provide its overall structure. HDF5 is a widely supported format with native parsers available within many programming languages.
.sp
Required top\-level attributes:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
id                  : <string or null> a field that can be used to id a table (or null)
format              : <string> The name and version of the current biom format
format\-url          : <url> A string with a static URL providing format details
type                : <string> Table type (a controlled vocabulary)
                      Acceptable values:
                       "OTU table"
                       "Pathway table"
                       "Function table"
                       "Ortholog table"
                       "Gene table"
                       "Metabolite table"
                       "Taxon table"
generated\-by        : <string> Package and revision that built the table
creation\-date       : <datetime> Date the table was built (ISO 8601 format)
nnz                 : <int> The number of non\-zero elements in the table
shape               : <list of ints>, the number of rows and number of columns in data
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Required groups:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
observation/        : The HDF5 group that contains observation specific information and an observation oriented view of the data
observation/matrix  : The HDF5 group that contains matrix data oriented for observation\-wise operations (e.g., in compressed sparse row format)
sample/             : The HDF5 group that contains sample specific information and a sample oriented view of the data
sample/matrix       : The HDF5 group that contains matrix data oriented for sample\-wise operations (e.g., in compressed sparse column format)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Required datasets:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
observation/ids            : <string> or <variable length string> A (N,) dataset of the observation IDs, where N is the total number of IDs
observation/matrix/data    : <float64> A (nnz,) dataset containing the actual matrix data
observation/matrix/indices : <int32> A (nnz,) dataset containing the column indices (e.g., maps into sample/ids)
observation/matrix/indptr  : <int32> A (N+1,) dataset containing the compressed row offsets
sample/ids                 : <string> or <variable length string> A (M,) dataset of the sample IDs, where M is the total number of IDs
sample/matrix/data         : <float64> A (nnz,) dataset containing the actual matrix data
sample/matrix/indices      : <int32> A (nnz,) dataset containing the row indices (e.g., maps into observation/ids)
sample/matrix/indptr       : <int32> A (M+1,) dataset containing the compressed column offsets
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Optional datasets:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
observation/metadata       : <variable length string or null> If specified, a (1,) dataset containing a JSON\-string representation of the metadata
sample/metadata            : <variable length string or null> If specified, a (1,) dataset containing a JSON\-string representation of the metadata
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The metadata for each axis (observation and sample) are described with JSON. The required structure, if the metadata are specified, is a list of objects, where the list is in index order with respect to the axis (e.g., the object at element 0 corresponds to ID 0 for the given axis). Any metadata that corresponds to the ID, such as taxonomy, can be represented in the object. For instance, the following JSON string describes taxonomy for three IDs:
.sp
Metadata description:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[
    {"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]},
    {"taxonomy": ["k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae", "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum", "s__"]},
    {"taxonomy": ["k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia", "o__Methanosarcinales", "f__Methanosarcinaceae", "g__Methanosarcina", "s__"]}
]
.ft P
.fi
.UNINDENT
.UNINDENT
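.sp
The index alignment between IDs and metadata objects can be sketched in Python as follows. This is an illustration rather than the \fBbiom\-format\fP API: the helper \fBmetadata_by_id\fP is hypothetical, the IDs are borrowed from the example tables in this document, and the taxonomy lists are shortened for brevity.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
import json

# Axis IDs in index order.
observation_ids = ["GG_OTU_1", "GG_OTU_2", "GG_OTU_3"]

# A JSON string of the kind stored in observation/metadata:
# one object per ID, in the same order as the IDs.
metadata_json = json.dumps([
    {"taxonomy": ["k__Bacteria", "p__Proteobacteria"]},
    {"taxonomy": ["k__Bacteria", "p__Cyanobacteria"]},
    {"taxonomy": ["k__Archaea", "p__Euryarchaeota"]},
])


def metadata_by_id(ids, metadata_string):
    """Pair each ID with the metadata object at the same index."""
    if metadata_string is None:
        # The metadata dataset is optional.
        return {axis_id: None for axis_id in ids}
    entries = json.loads(metadata_string)
    if len(entries) != len(ids):
        raise ValueError("metadata list must have one object per ID")
    return dict(zip(ids, entries))


lookup = metadata_by_id(observation_ids, metadata_json)
print(lookup["GG_OTU_3"]["taxonomy"][0])  # prints k__Archaea
```
.ft P
.fi
.UNINDENT
.UNINDENT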
.SS Example biom files
.sp
Below is an example of a rich biom file. To decide which representation you should generate for new data types, see the section on \fIsparse\-or\-dense\fP\&.
.SS BIOM 2.0 OTU table in the HDF5 data description language (DDL)
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
HDF5 "rich_sparse_otu_table_hdf5.biom" {
GROUP "/" {
   ATTRIBUTE "creation\-date" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2014\-05\-13T14:50:32.052446"
      }
   }
   ATTRIBUTE "format\-url" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "http://biom\-format.org"
      }
   }
   ATTRIBUTE "format\-version" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 2, 0
      }
   }
   ATTRIBUTE "generated\-by" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "example"
      }
   }
   ATTRIBUTE "id" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "No Table ID"
      }
   }
   ATTRIBUTE "nnz" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SCALAR
      DATA {
      (0): 15
      }
   }
   ATTRIBUTE "shape" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 5, 6
      }
   }
   ATTRIBUTE "type" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "otu table"
      }
   }
   GROUP "observation" {
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
         DATA {
         (0): "GG_OTU_1", "GG_OTU_2", "GG_OTU_3", "GG_OTU_4", "GG_OTU_5"
         }
      }
      GROUP "matrix" {
         DATASET "data" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1
            }
         }
         DATASET "indices" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2
            }
         }
         DATASET "indptr" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): 0, 1, 6, 9, 13, 15
            }
         }
      }
      DATASET "metadata" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): "[{"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}, {"taxonomy": ["k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae", "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum", "s__"]}, {"taxonomy": ["k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia", "o__Methanosarcinales", "f__Methanosarcinaceae", "g__Methanosarcina", "s__"]}, {"taxonomy": ["k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Halanaerobiales", "f__Halanaerobiaceae", "g__Halanaerobium", "s__Halanaerobiumsaccharolyticum"]}, {"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}]"
         }
      }
   }
   GROUP "sample" {
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): "Sample1", "Sample2", "Sample3", "Sample4", "Sample5",
         (5): "Sample6"
         }
      }
      GROUP "matrix" {
         DATASET "data" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 5, 2, 1, 1, 1, 1, 1, 1, 1, 2, 4, 3, 1, 2, 1
            }
         }
         DATASET "indices" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 1, 3, 1, 3, 4, 0, 2, 3, 4, 1, 2, 1, 1, 2, 3
            }
         }
         DATASET "indptr" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
            DATA {
            (0): 0, 2, 5, 9, 11, 12, 15
            }
         }
      }
      DATASET "metadata" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): "[{"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CGCTTATCGAGA", "Description": "human gut", "BODY_SITE": "gut"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CATACCAGTAGC", "Description": "human gut", "BODY_SITE": "gut"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CTCTCTACCTGT", "Description": "human gut", "BODY_SITE": "gut"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CTCTCGGCCTGT", "Description": "human skin", "BODY_SITE": "skin"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CTCTCTACCAAT", "Description": "human skin", "BODY_SITE": "skin"}, {"LinkerPrimerSequence": "CATGCTGCCTCCCGTAGGAGT", "BarcodeSequence": "CTAACTACCAAT", "Description": "human skin", "BODY_SITE": "skin"}]"
         }
      }
   }
}
}
.ft P
.fi
.UNINDENT
.UNINDENT
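.sp
The \fBdata\fP, \fBindices\fP and \fBindptr\fP datasets in the listing above can be expanded back into a dense table. The Python sketch below assumes the usual compressed sparse row (CSR) and compressed sparse column (CSC) conventions; the function \fBexpand_compressed\fP is hypothetical and not part of the \fBbiom\-format\fP API, and the arrays are copied from the listing (written as integers for brevity, although the file stores them as float64).
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def expand_compressed(data, indices, indptr, shape, by_row=True):
    """Rebuild a dense table from data/indices/indptr arrays.

    by_row=True reads the arrays as CSR (observation/matrix);
    by_row=False reads them as CSC (sample/matrix).
    """
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for major in range(len(indptr) - 1):
        # Consecutive indptr entries delimit the values of one row (or column).
        for k in range(indptr[major], indptr[major + 1]):
            minor = indices[k]
            if by_row:
                dense[major][minor] = data[k]
            else:
                dense[minor][major] = data[k]
    return dense


shape = (5, 6)

# observation/matrix: one indptr segment per observation (row).
by_observation = expand_compressed(
    data=[1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1],
    indices=[2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2],
    indptr=[0, 1, 6, 9, 13, 15],
    shape=shape,
)

# sample/matrix: one indptr segment per sample (column).
by_sample = expand_compressed(
    data=[5, 2, 1, 1, 1, 1, 1, 1, 1, 2, 4, 3, 1, 2, 1],
    indices=[1, 3, 1, 3, 4, 0, 2, 3, 4, 1, 2, 1, 1, 2, 3],
    indptr=[0, 2, 5, 9, 11, 12, 15],
    shape=shape,
    by_row=False,
)

print(by_observation == by_sample)  # prints True
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Both orientations decode to the same table, which is why either group can serve row\-wise or column\-wise access without transposing the other.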
.SS The biom file format: Version 2.1
.sp
The \fBbiom\fP format uses \fI\%HDF5\fP to provide its overall structure. HDF5 is a widely supported format with native parsers available within many programming languages.
.sp
Required top\-level attributes:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
id                   : <string or null> a field that can be used to id a table (or null)
type                 : <string> Table type (a controlled vocabulary)
                       Acceptable values:
                        "OTU table"
                        "Pathway table"
                        "Function table"
                        "Ortholog table"
                        "Gene table"
                        "Metabolite table"
                        "Taxon table"
format\-url           : <url> A string with a static URL providing format details
format\-version       : <tuple> The version of the current biom format, major and minor
generated\-by         : <string> Package and revision that built the table
creation\-date        : <datetime> Date the table was built (ISO 8601 format)
shape                : <list of ints>, the number of rows and number of columns in data
nnz                  : <int> The number of non\-zero elements in the table
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Required groups:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
observation/               : The HDF5 group that contains observation specific information and an observation oriented view of the data
observation/matrix         : The HDF5 group that contains matrix data oriented for observation\-wise operations (e.g., in compressed sparse row format)
observation/metadata       : The HDF5 group that contains observation specific metadata information
observation/group\-metadata : The HDF5 group that contains observation specific group metadata information (e.g., phylogenetic tree)
sample/                    : The HDF5 group that contains sample specific information and a sample oriented view of the data
sample/matrix              : The HDF5 group that contains matrix data oriented for sample\-wise operations (e.g., in compressed sparse column format)
sample/metadata            : The HDF5 group that contains sample specific metadata information
sample/group\-metadata      : The HDF5 group that contains sample specific group metadata information (e.g., relationships between samples)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Required datasets:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
observation/ids            : <string> or <variable length string> A (N,) dataset of the observation IDs, where N is the total number of IDs
observation/matrix/data    : <float64> A (nnz,) dataset containing the actual matrix data
observation/matrix/indices : <int32> A (nnz,) dataset containing the column indices (e.g., maps into sample/ids)
observation/matrix/indptr  : <int32> A (N+1,) dataset containing the compressed row offsets
sample/ids                 : <string> or <variable length string> A (M,) dataset of the sample IDs, where M is the total number of IDs
sample/matrix/data         : <float64> A (nnz,) dataset containing the actual matrix data
sample/matrix/indices      : <int32> A (nnz,) dataset containing the row indices (e.g., maps into observation/ids)
sample/matrix/indptr       : <int32> A (M+1,) dataset containing the compressed column offsets
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Under the \fBobservation/metadata\fP and \fBsample/metadata\fP groups, the user can specify an arbitrary number of datasets, each of which represents a metadata category for that axis. The expected structure for each of these metadata datasets is a list of atomic type objects (int, float, str, ...) where the index order of the list corresponds to the index order of the relevant axis IDs. Special complex metadata fields have been defined, and they are stored in a specific way. Currently, the available special metadata fields are:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
observation/metadata/taxonomy      : <string> or <variable length string> A (N, ?) dataset containing the taxonomy names assigned to the observation
observation/metadata/KEGG_Pathways : <string> or <variable length string> A (N, ?) dataset containing the KEGG Pathways assigned to the observation
observation/metadata/collapsed_ids : <string> or <variable length string> A (N, ?) dataset containing the observation ids of the original table that have been collapsed in the given observation
sample/metadata/collapsed_ids      : <string> or <variable length string> A (M, ?) dataset containing the sample ids of the original table that have been collapsed in the given sample
.ft P
.fi
.UNINDENT
.UNINDENT
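.sp
To make the index\-order rule concrete, here is a minimal sketch in plain Python. It does not read an HDF5 file or use the \fBbiom\fP package; the two lists stand in for the \fBsample/ids\fP and \fBsample/metadata/BODY_SITE\fP datasets of the example file in this document, and the helper name \fBmetadata_by_id\fP is hypothetical.
.sp
.nf
.ft C
```python
# Stand-ins for two datasets on the same axis (values from the example file).
sample_ids = ["Sample1", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6"]
body_site = ["gut", "gut", "gut", "skin", "skin", "skin"]


def metadata_by_id(axis_ids, category_values):
    """Pair each ID with the metadata value stored at the same index."""
    if len(axis_ids) != len(category_values):
        raise ValueError("a metadata dataset needs one entry per axis ID")
    return dict(zip(axis_ids, category_values))


site_of = metadata_by_id(sample_ids, body_site)
print(site_of["Sample4"])  # the value stored at index 3 of body_site
```
.ft P
.fi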
.sp
Under the \fBobservation/group\-metadata\fP and \fBsample/group\-metadata\fP groups, the user can specify an arbitrary number of datasets, each of which represents a relationship between the ids for that axis. The expected structure for each of these group metadata datasets is a single string or variable length string. Each of these datasets should define an attribute called \fBdata_type\fP, which specifies how the string should be interpreted. One example of such a group metadata dataset is \fBobservation/group\-metadata/phylogeny\fP, with the attribute \fBobservation/group\-metadata/phylogeny.attrs[\(aqdata_type\(aq] = "newick"\fP, which stores a single string holding the phylogenetic tree for the observations in newick format.
.SS Example biom files
.sp
Below is an example of a rich biom file. To decide which representation you should generate for new data types, see the section on \fIsparse\-or\-dense\fP\&.
.SS BIOM 2.1 OTU table in the HDF5 data description language (DDL)
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
HDF5 "examples/rich_sparse_otu_table_hdf5.biom" {
GROUP "/" {
   ATTRIBUTE "creation\-date" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2014\-07\-29T16:16:36.617320"
      }
   }
   ATTRIBUTE "format\-url" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "http://biom\-format.org"
      }
   }
   ATTRIBUTE "format\-version" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 2, 1
      }
   }
   ATTRIBUTE "generated\-by" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "example"
      }
   }
   ATTRIBUTE "id" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "No Table ID"
      }
   }
   ATTRIBUTE "nnz" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SCALAR
      DATA {
      (0): 15
      }
   }
   ATTRIBUTE "shape" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 5, 6
      }
   }
   ATTRIBUTE "type" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "otu table"
      }
   }
   GROUP "observation" {
      GROUP "group\-metadata" {
      }
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
         DATA {
         (0): "GG_OTU_1", "GG_OTU_2", "GG_OTU_3", "GG_OTU_4", "GG_OTU_5"
         }
      }
      GROUP "matrix" {
         DATASET "data" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1
            }
         }
         DATASET "indices" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2
            }
         }
         DATASET "indptr" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): 0, 1, 6, 9, 13, 15
            }
         }
      }
      GROUP "metadata" {
         DATASET "taxonomy" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 5, 7 ) / ( 5, 7 ) }
            DATA {
            (0,0): "k__Bacteria", "p__Proteobacteria",
            (0,2): "c__Gammaproteobacteria", "o__Enterobacteriales",
            (0,4): "f__Enterobacteriaceae", "g__Escherichia", "s__",
            (1,0): "k__Bacteria", "p__Cyanobacteria", "c__Nostocophycideae",
            (1,3): "o__Nostocales", "f__Nostocaceae", "g__Dolichospermum",
            (1,6): "s__",
            (2,0): "k__Archaea", "p__Euryarchaeota", "c__Methanomicrobia",
            (2,3): "o__Methanosarcinales", "f__Methanosarcinaceae",
            (2,5): "g__Methanosarcina", "s__",
            (3,0): "k__Bacteria", "p__Firmicutes", "c__Clostridia",
            (3,3): "o__Halanaerobiales", "f__Halanaerobiaceae",
            (3,5): "g__Halanaerobium", "s__Halanaerobiumsaccharolyticum",
            (4,0): "k__Bacteria", "p__Proteobacteria",
            (4,2): "c__Gammaproteobacteria", "o__Enterobacteriales",
            (4,4): "f__Enterobacteriaceae", "g__Escherichia", "s__"
            }
         }
      }
   }
   GROUP "sample" {
      GROUP "group\-metadata" {
      }
      DATASET "ids" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
         DATA {
         (0): "Sample1", "Sample2", "Sample3", "Sample4", "Sample5",
         (5): "Sample6"
         }
      }
      GROUP "matrix" {
         DATASET "data" {
            DATATYPE  H5T_IEEE_F64LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 5, 2, 1, 1, 1, 1, 1, 1, 1, 2, 4, 3, 1, 2, 1
            }
         }
         DATASET "indices" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
            DATA {
            (0): 1, 3, 1, 3, 4, 0, 2, 3, 4, 1, 2, 1, 1, 2, 3
            }
         }
         DATASET "indptr" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
            DATA {
            (0): 0, 2, 5, 9, 11, 12, 15
            }
         }
      }
      GROUP "metadata" {
         DATASET "BODY_SITE" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): "gut", "gut", "gut", "skin", "skin", "skin"
            }
         }
         DATASET "BarcodeSequence" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): "CGCTTATCGAGA", "CATACCAGTAGC", "CTCTCTACCTGT",
            (3): "CTCTCGGCCTGT", "CTCTCTACCAAT", "CTAACTACCAAT"
            }
         }
         DATASET "Description" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): "human gut", "human gut", "human gut", "human skin",
            (4): "human skin", "human skin"
            }
         }
         DATASET "LinkerPrimerSequence" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
            DATA {
            (0): "CATGCTGCCTCCCGTAGGAGT", "CATGCTGCCTCCCGTAGGAGT",
            (2): "CATGCTGCCTCCCGTAGGAGT", "CATGCTGCCTCCCGTAGGAGT",
            (4): "CATGCTGCCTCCCGTAGGAGT", "CATGCTGCCTCCCGTAGGAGT"
            }
         }
      }
   }
}
}
.ft P
.fi
.UNINDENT
.UNINDENT
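.sp
The two \fBmatrix\fP groups in the dump above store the same table twice: \fBobservation/matrix\fP uses compressed row offsets and \fBsample/matrix\fP uses compressed column offsets. The following is a minimal sketch in plain Python of how those arrays expand into a dense table. It does not use the \fBbiom\fP package, the arrays are copied by hand from the dump, and the helper names \fBcsr_to_dense\fP and \fBcsc_to_dense\fP are hypothetical.
.sp
.nf
.ft C
```python
def csr_to_dense(data, indices, indptr, n_cols):
    """Expand row-compressed arrays into a list of dense rows."""
    dense = []
    for r in range(len(indptr) - 1):
        row = [0] * n_cols
        # Entries indptr[r] .. indptr[r + 1] - 1 belong to row r.
        for k in range(indptr[r], indptr[r + 1]):
            row[indices[k]] = data[k]
        dense.append(row)
    return dense


def csc_to_dense(data, indices, indptr, n_rows):
    """Expand column-compressed arrays into a list of dense rows."""
    n_cols = len(indptr) - 1
    dense = [[0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        # Entries indptr[c] .. indptr[c + 1] - 1 belong to column c.
        for k in range(indptr[c], indptr[c + 1]):
            dense[indices[k]][c] = data[k]
    return dense


# The "shape" attribute: 5 observations by 6 samples.
n_obs, n_samp = 5, 6

# observation/matrix (row-compressed)
obs_data = [1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1]
obs_indices = [2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2]
obs_indptr = [0, 1, 6, 9, 13, 15]

# sample/matrix (column-compressed)
samp_data = [5, 2, 1, 1, 1, 1, 1, 1, 1, 2, 4, 3, 1, 2, 1]
samp_indices = [1, 3, 1, 3, 4, 0, 2, 3, 4, 1, 2, 1, 1, 2, 3]
samp_indptr = [0, 2, 5, 9, 11, 12, 15]

from_obs = csr_to_dense(obs_data, obs_indices, obs_indptr, n_samp)
from_samp = csc_to_dense(samp_data, samp_indices, samp_indptr, n_obs)

for row in from_obs:
    print(row)
```
.ft P
.fi
.sp
Both expansions give the same 5 x 6 table, and each \fBindptr\fP array has one more entry than the number of rows (or columns) it compresses.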
.sp
Release versions contain three integers in the following format: \fBmajor\-version.minor\-version.micro\-version\fP\&. When \fB\-dev\fP is appended to the end of a version string, it indicates a development (or between\-release) version. For example, \fB1.0.0\-dev\fP would refer to the development version following the 1.0.0 release.
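.sp
As an illustration of the version scheme, the sketch below splits a version string into its three integers and a development flag. It is plain Python and not part of the \fBbiom\fP package; the function name \fBparse_version\fP is hypothetical.
.sp
.nf
.ft C
```python
def parse_version(version):
    """Return (major, minor, micro, is_dev) for a version string."""
    suffix = "-dev"
    is_dev = version.endswith(suffix)
    core = version[:len(version) - len(suffix)] if is_dev else version
    parts = core.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("expected major.minor.micro[-dev]: %r" % version)
    major, minor, micro = [int(p) for p in parts]
    return (major, minor, micro, is_dev)


print(parse_version("1.0.0-dev"))
```
.ft P
.fi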
.SS Tips and FAQs regarding the BIOM file format
.SS Motivation for the BIOM format
.sp
The BIOM format was motivated by several goals. First, to facilitate efficient handling and storage of large, sparse biological contingency tables; second, to support encapsulation of core study data (contingency table data and sample/observation metadata) in a single file; and third, to facilitate the use of these tables between tools that support this format (e.g., passing of data between \fI\%QIIME\fP, \fI\%MG\-RAST\fP, and \fI\%VAMPS\fP).
.SS Efficient handling and storage of very large tables
.sp
In \fI\%QIIME\fP, we began hitting limitations with OTU table objects when working with thousands of samples and hundreds of thousands of OTUs. In the near future we expect that we\(aqll be dealing with hundreds of thousands of samples in single analyses.
.sp
The OTU table format up to QIIME 1.4.0 involved a dense matrix: if an OTU was not observed in a given sample, that would be indicated with a zero. We now primarily represent OTU tables in a sparse format: if an OTU is not observed in a sample, there is no count for that OTU. The two ways of representing this data are exemplified here.
.sp
A dense representation of an OTU table:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
OTU ID PC.354  PC.355  PC.356
OTU0   0   0   4
OTU1   6   0   0
OTU2   1   0   7
OTU3   0   0   3
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A sparse representation of an OTU table:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
PC.354 OTU1 6
PC.354 OTU2 1
PC.356 OTU0 4
PC.356 OTU2 7
PC.356 OTU3 3
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
OTU table data tends to be sparse (e.g., greater than 90% of counts are zero, and frequently as many as 99% of counts are zero), in which case the latter format is more convenient to work with because it has a smaller memory footprint. Both of these representations are supported in the biom\-format project via dense and sparse Table types. Generally, if fewer than 85% of your counts are zero, a dense representation will be more efficient.
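.sp
The conversion between the two listings above, and the rule of thumb about the fraction of zero counts, can be sketched in plain Python. This is an illustration only: it does not use the \fBbiom\fP Table types, and the helper names \fBdense_to_sparse\fP and \fBfraction_zero\fP are hypothetical.
.sp
.nf
.ft C
```python
sample_ids = ["PC.354", "PC.355", "PC.356"]
otu_ids = ["OTU0", "OTU1", "OTU2", "OTU3"]
dense = [
    [0, 0, 4],
    [6, 0, 0],
    [1, 0, 7],
    [0, 0, 3],
]


def dense_to_sparse(dense, otu_ids, sample_ids):
    """List (sample, OTU, count) for every nonzero cell, sample by sample."""
    triples = []
    for j, sample_id in enumerate(sample_ids):
        for i, otu_id in enumerate(otu_ids):
            if dense[i][j] != 0:
                triples.append((sample_id, otu_id, dense[i][j]))
    return triples


def fraction_zero(dense):
    """Fraction of cells in the dense table that hold a zero count."""
    total = sum(len(row) for row in dense)
    zeros = sum(1 for row in dense for value in row if value == 0)
    return zeros / float(total)


triples = dense_to_sparse(dense, otu_ids, sample_ids)
for triple in triples:
    print("%s %s %d" % triple)

# Rule of thumb from the text: below 85% zeros, dense tends to be cheaper.
prefer_dense = fraction_zero(dense) < 0.85
```
.ft P
.fi
.sp
In this toy table 7 of the 12 counts are zero, so the rule of thumb favors the dense form; real OTU tables are usually far sparser.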
.SS Encapsulation of core study data (OTU table data and sample/OTU metadata) in a single file
.sp
Formats such as JSON and HDF5 make storage of highly sparse data more efficient and allow arbitrary amounts of sample and OTU metadata to be stored in a single file. Sample metadata corresponds to what is generally found in QIIME mapping files. At this stage, inclusion of this information in the OTU table file is optional, but it may be useful for sharing these files with other QIIME users and for publishing or archiving results of analyses. OTU metadata (generally a taxonomic assignment for an OTU) is also optional. In contrast to the previous OTU table format, you can now store more than one OTU metadata value in this field, so, for example, you can store taxonomic assignments based on two different taxonomic assignment approaches.
.SS Facilitating the use of tables between tools that support this format
.sp
Different tools, such as \fI\%QIIME\fP, \fI\%MG\-RAST\fP, and \fI\%VAMPS\fP, work with similar data structures that represent different types of data. An example of this is a \fImetagenome\fP table that could be generated by MG\-RAST (where, for example, columns are metagenomes and rows are functional categories). Exporting this data from MG\-RAST in a suitable format will allow for the application of many of the QIIME tools to this data (such as generation of alpha rarefaction plots or beta diversity ordination plots). This new format is far more general than previous formats, so it will support adoption by groups working with different data types, and it is already being integrated to support transfer of data between \fI\%QIIME\fP, \fI\%MG\-RAST\fP, and \fI\%VAMPS\fP\&.
.SS File extension
.sp
We recommend that BIOM files use the \fB\&.biom\fP extension.
.SS Quick start
.sp
BIOM has an example table and two methods for reading in \fITable\fP objects that
are immediately available at the package level.
.SS Functions
.TS
center;
|l|l|.
_
T{
\fBload_table\fP(f)
T}	T{
Load a \fITable\fP from a path
T}
_
.TE
.SS biom.load_table
.INDENT 0.0
.TP
.B biom.load_table(f)
Load a \fITable\fP from a path
.INDENT 7.0
.TP
.B Parameters
\fBf\fP : str
.TP
.B Returns
Table
.TP
.B Raises
\fBIOError\fP
.INDENT 7.0
.INDENT 3.5
If the path does not exist
.UNINDENT
.UNINDENT
.sp
\fBTypeError\fP
.INDENT 7.0
.INDENT 3.5
If the data in the path does not appear to be a BIOM table
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
Parse a table from a path. BIOM will attempt to determine whether the file
is in TSV, HDF5, JSON, gzip\(aqd JSON or gzip\(aqd TSV format and parse
accordingly:
.sp
.nf
.ft C
>>> from biom import load_table
>>> table = load_table(\(aqpath/to/table.biom\(aq) # doctest: +SKIP
.ft P
.fi
.UNINDENT
.SS Examples
.sp
Load an example table:
.sp
.nf
.ft C
>>> from biom import example_table
>>> print example_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 1.0 2.0
O2  3.0 4.0 5.0
.ft P
.fi
.sp
Parse a table from an open file object:
.sp
.nf
.ft C
>>> from biom import parse_table
>>> with open(\(aqpath/to/table.biom\(aq) as f: # doctest: +SKIP
\&...     table = parse_table(f)
.ft P
.fi
.sp
Parse a table from a path. BIOM will attempt to determine if the file is
either in TSV, HDF5, JSON, gzip\(aqd JSON or gzip\(aqd TSV and parse accordingly:
.sp
.nf
.ft C
>>> from biom import load_table
>>> table = load_table(\(aqpath/to/table.biom\(aq) # doctest: +SKIP
.ft P
.fi
.SS BIOM Table (\fBbiom.table\fP)
.sp
The biom\-format project provides rich \fBTable\fP objects to support use of the
BIOM file format. The objects encapsulate matrix data (such as OTU counts) and
abstract the interaction away from the programmer.
.SS Classes
.TS
center;
|l|l|.
_
T{
\fBTable\fP(data, observation_ids, sample_ids[, ...])
T}	T{
The (canonically pronounced \(aqteh\(aq) Table.
T}
_
.TE
.SS biom.table.Table
.INDENT 0.0
.TP
.B class biom.table.Table(data, observation_ids, sample_ids, observation_metadata=None, sample_metadata=None, table_id=None, type=None, create_date=None, generated_by=None, observation_group_metadata=None, sample_group_metadata=None, **kwargs)
The (canonically pronounced \(aqteh\(aq) Table.
.sp
Give in to the power of the Table!
Attributes
.TS
center;
|l|l|.
_
T{
\fBdtype\fP
T}	T{
The type of the objects in the underlying contingency matrix
T}
_
T{
\fBmatrix_data\fP
T}	T{
The sparse matrix object
T}
_
T{
\fBnnz\fP
T}	T{
Number of non\-zero elements of the underlying contingency matrix
T}
_
T{
\fBshape\fP
T}	T{
The shape of the underlying contingency matrix
T}
_
.TE
.SS biom.table.Table.dtype
.INDENT 7.0
.TP
.B Table.dtype
The type of the objects in the underlying contingency matrix
.UNINDENT
.SS biom.table.Table.matrix_data
.INDENT 7.0
.TP
.B Table.matrix_data
The sparse matrix object
.UNINDENT
.SS biom.table.Table.nnz
.INDENT 7.0
.TP
.B Table.nnz
Number of non\-zero elements of the underlying contingency matrix
.UNINDENT
.SS biom.table.Table.shape
.INDENT 7.0
.TP
.B Table.shape
The shape of the underlying contingency matrix
.UNINDENT
Methods
.TS
center;
|l|l|.
_
T{
\fB__getitem__\fP(args)
T}	T{
Handles row or column slices
T}
_
T{
\fB_extract_data_from_tsv\fP(lines[, delim, ...])
T}	T{
Parse a classic table into (sample_ids, obs_ids, data, metadata, name)
T}
_
T{
\fBadd_group_metadata\fP(group_md[, axis])
T}	T{
Take a dict of group metadata and add it to an axis
T}
_
T{
\fBadd_metadata\fP(md[, axis])
T}	T{
Take a dict of metadata and add it to an axis.
T}
_
T{
\fBcollapse\fP(f[, reduce_f, norm, ...])
T}	T{
Collapse partitions in a table by metadata or by IDs
T}
_
T{
\fBcopy\fP()
T}	T{
Returns a copy of the table
T}
_
T{
\fBdata\fP(id[, axis, dense])
T}	T{
Returns data associated with an \fIid\fP
T}
_
T{
\fBdelimited_self\fP([delim, header_key, ...])
T}	T{
Return self as a string in a delimited form
T}
_
T{
\fBdescriptive_equality\fP(other)
T}	T{
For use in testing, describe how the tables are not equal
T}
_
T{
\fBexists\fP(id[, axis])
T}	T{
Returns whether id exists in axis
T}
_
T{
\fBfilter\fP(ids_to_keep[, axis, invert, inplace])
T}	T{
Filter a table based on a function or iterable.
T}
_
T{
\fBfrom_hdf5\fP(h5grp[, ids, axis])
T}	T{
Parse an HDF5 formatted BIOM table
T}
_
T{
\fBfrom_json\fP(json_table[, data_pump, ...])
T}	T{
Parse a biom otu table type
T}
_
T{
\fBfrom_tsv\fP(lines, obs_mapping, sample_mapping, ...)
T}	T{
Parse a tab separated (observation x sample) formatted BIOM table
T}
_
T{
\fBget_table_density\fP()
T}	T{
Returns the fraction of nonzero elements in the table.
T}
_
T{
\fBget_value_by_ids\fP(obs_id, samp_id)
T}	T{
Return value in the matrix corresponding to \fB(obs_id, samp_id)\fP
T}
_
T{
\fBgroup_metadata\fP([axis])
T}	T{
Return the group metadata of the given axis
T}
_
T{
\fBids\fP([axis])
T}	T{
Return the ids along the given axis
T}
_
T{
\fBindex\fP(id, axis)
T}	T{
Return the index of the identified sample/observation.
T}
_
T{
\fBis_empty\fP()
T}	T{
Check whether the table is empty
T}
_
T{
\fBiter\fP([dense, axis])
T}	T{
Yields \fB(value, id, metadata)\fP
T}
_
T{
\fBiter_data\fP([dense, axis])
T}	T{
Yields axis values
T}
_
T{
\fBiter_pairwise\fP([dense, axis, tri, diag])
T}	T{
Pairwise iteration over self
T}
_
T{
\fBmax\fP([axis])
T}	T{
Get the maximum nonzero value over an axis
T}
_
T{
\fBmerge\fP(other[, sample, observation, ...])
T}	T{
Merge two tables together
T}
_
T{
\fBmetadata\fP([id, axis])
T}	T{
Return the metadata of the identified sample/observation.
T}
_
T{
\fBmin\fP([axis])
T}	T{
Get the minimum nonzero value over an axis
T}
_
T{
\fBnonzero\fP()
T}	T{
Yields locations of nonzero elements within the data matrix
T}
_
T{
\fBnonzero_counts\fP(axis[, binary])
T}	T{
Get nonzero summaries about an axis
T}
_
T{
\fBnorm\fP([axis, inplace])
T}	T{
Normalize in place sample values by an observation, or vice versa.
T}
_
T{
\fBpa\fP([inplace])
T}	T{
Convert the table to presence/absence data
T}
_
T{
\fBpartition\fP(f[, axis])
T}	T{
Yields partitions
T}
_
T{
\fBreduce\fP(f, axis)
T}	T{
Reduce over axis using function \fIf\fP
T}
_
T{
\fBsort\fP([sort_f, axis])
T}	T{
Return a table sorted along axis
T}
_
T{
\fBsort_order\fP(order[, axis])
T}	T{
Return a new table with \fIaxis\fP in \fIorder\fP
T}
_
T{
\fBsubsample\fP(n[, axis, by_id])
T}	T{
Randomly subsample without replacement.
T}
_
T{
\fBsum\fP([axis])
T}	T{
Returns the sum by axis
T}
_
T{
\fBto_hdf5\fP(h5grp, generated_by[, compress])
T}	T{
Store CSC and CSR in place
T}
_
T{
\fBto_json\fP(generated_by[, direct_io])
T}	T{
Returns a JSON string representing the table in BIOM format.
T}
_
T{
\fBto_tsv\fP([header_key, header_value, ...])
T}	T{
Return self as a string in tab delimited form
T}
_
T{
\fBtransform\fP(f[, axis, inplace])
T}	T{
Iterate over \fIaxis\fP, applying a function \fIf\fP to each vector.
T}
_
T{
\fBtranspose\fP()
T}	T{
Transpose the contingency table
T}
_
.TE
.SS biom.table.Table.__getitem__
.INDENT 7.0
.TP
.B Table.__getitem__(args)
Handles row or column slices
.sp
Slicing over an individual axis is supported, but slicing over both
axes at the same time is not supported. Partial slices, such as
\fIfoo[0, 5:10]\fP are not supported, however full slices are supported,
such as \fIfoo[0, :]\fP\&.
.INDENT 7.0
.TP
.B Parameters
\fBargs\fP : tuple or slice
.INDENT 7.0
.INDENT 3.5
The specific element (by index position) to return or an entire
row or column of the data.
.UNINDENT
.UNINDENT
.TP
.B Returns
float or spmatrix
.INDENT 7.0
.INDENT 3.5
A float is returned if a specific element is specified, otherwise a
spmatrix object representing a vector of sparse data is returned.
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBIndexError\fP
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
If the matrix is empty
.IP \(bu 2
If the arguments do not appear to be a tuple
.IP \(bu 2
If a slice on row and column is specified
.IP \(bu 2
If a partial slice is specified
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
Notes
.sp
Switching between slicing rows and columns is inefficient.  Slicing of
rows requires a CSR representation, while slicing of columns requires a
CSC representation, and transforms are performed on the data if the
data are not in the required representation. These transforms can be
expensive if done frequently.
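.sp
The reason behind this note can be seen in a minimal sketch in plain Python. It does not show how \fBTable\fP or scipy perform the slice; it only illustrates that a row is one contiguous run in row\-compressed (CSR) arrays, while a column is scattered across every row. The arrays are copied from \fBobservation/matrix\fP in the example file of this document, and the helper names \fBcsr_row\fP and \fBcsr_column\fP are hypothetical.
.sp
.nf
.ft C
```python
data = [1, 5, 1, 2, 3, 1, 1, 4, 2, 2, 1, 1, 1, 1, 1]
indices = [2, 0, 1, 3, 4, 5, 2, 3, 5, 0, 1, 2, 5, 1, 2]
indptr = [0, 1, 6, 9, 13, 15]


def csr_row(data, indices, indptr, r):
    """Row r is a single contiguous slice: (column, value) pairs."""
    start, stop = indptr[r], indptr[r + 1]
    return list(zip(indices[start:stop], data[start:stop]))


def csr_column(data, indices, indptr, c):
    """Column c needs a scan over every stored entry: (row, value) pairs."""
    hits = []
    for r in range(len(indptr) - 1):
        for k in range(indptr[r], indptr[r + 1]):
            if indices[k] == c:
                hits.append((r, data[k]))
    return hits


print(csr_row(data, indices, indptr, 1))
print(csr_column(data, indices, indptr, 2))
```
.ft P
.fi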
.UNINDENT
.SS biom.table.Table._extract_data_from_tsv
.INDENT 7.0
.TP
.B static Table._extract_data_from_tsv(lines, delim=\(aq\et\(aq, dtype=<type \(aqfloat\(aq>, header_mark=None, md_parse=None)
Parse a classic table into (sample_ids, obs_ids, data, metadata,
name)
.INDENT 7.0
.TP
.B Parameters
\fBlines: list or file\-like object\fP
.INDENT 7.0
.INDENT 3.5
delimited data to parse
.UNINDENT
.UNINDENT
.sp
\fBdelim: string\fP
.INDENT 7.0
.INDENT 3.5
delimiter in file lines
.UNINDENT
.UNINDENT
.sp
\fBdtype: type\fP
.sp
\fBheader_mark:  string or None\fP
.INDENT 7.0
.INDENT 3.5
string that indicates start of header line
.UNINDENT
.UNINDENT
.sp
\fBmd_parse:  function or None\fP
.INDENT 7.0
.INDENT 3.5
function used to parse metadata
.UNINDENT
.UNINDENT
.TP
.B Returns
list
.INDENT 7.0
.INDENT 3.5
sample_ids
.UNINDENT
.UNINDENT
.sp
list
.INDENT 7.0
.INDENT 3.5
observation_ids
.UNINDENT
.UNINDENT
.sp
array
.INDENT 7.0
.INDENT 3.5
data
.UNINDENT
.UNINDENT
.sp
list
.INDENT 7.0
.INDENT 3.5
metadata
.UNINDENT
.UNINDENT
.sp
string
.INDENT 7.0
.INDENT 3.5
column name if last column is non\-numeric
.UNINDENT
.UNINDENT
.UNINDENT
Notes
.sp
This is intended to be close to how QIIME classic OTU tables are parsed
with the exception of the additional md_name field.
.sp
This function is ported from QIIME (\fI\%http://www.qiime.org\fP), previously
named parse_classic_otu_table. QIIME is a GPL project, but we obtained
permission from the authors of this function to port it to the BIOM
Format project (and keep it under BIOM\(aqs BSD license).
.UNINDENT
.SS biom.table.Table.add_group_metadata
.INDENT 7.0
.TP
.B Table.add_group_metadata(group_md, axis=\(aqsample\(aq)
Take a dict of group metadata and add it to an axis
.INDENT 7.0
.TP
.B Parameters
\fBgroup_md\fP : dict of tuples
.INDENT 7.0
.INDENT 3.5
\fIgroup_md\fP should be of the form \fB{category: (data type, value)}\fP
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to operate on
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS biom.table.Table.add_metadata
.INDENT 7.0
.TP
.B Table.add_metadata(md, axis=\(aqsample\(aq)
Take a dict of metadata and add it to an axis.
.INDENT 7.0
.TP
.B Parameters
\fBmd\fP : dict of dict
.INDENT 7.0
.INDENT 3.5
\fImd\fP should be of the form \fB{id: {dict_of_metadata}}\fP
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to operate on
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS biom.table.Table.collapse
.INDENT 7.0
.TP
.B Table.collapse(f, reduce_f=<built\-in function add>, norm=True, min_group_size=1, include_collapsed_metadata=True, one_to_many=False, one_to_many_mode=\(aqadd\(aq, one_to_many_md_key=\(aqPath\(aq, strict=False, axis=\(aqsample\(aq)
Collapse partitions in a table by metadata or by IDs
.sp
Partition data by metadata or IDs and then collapse each partition into
a single vector.
.sp
If \fIinclude_collapsed_metadata\fP is \fBTrue\fP, the metadata for the
collapsed partition will be a category named \(aqcollapsed_ids\(aq, in which
a list of the original ids that made up the partition is retained.
.sp
The remainder is only relevant to setting \fIone_to_many\fP to \fBTrue\fP\&.
.sp
If \fIone_to_many\fP is \fBTrue\fP, allow vectors to collapse into multiple
bins if the metadata describe a one\-many relationship. Supplied
functions must allow for iteration support over the metadata key and
must return a tuple of (path, bin) as to describe both the path in the
hierarchy represented and the specific bin being collapsed into. The
uniqueness of the bin is _not_ based on the path but by the name of the
bin.
.sp
The metadata value for the corresponding collapsed column may include
more (or less) information about the collapsed data. For example, if
collapsing "FOO", and there are vectors that span three associations A,
B, and C, such that vector 1 spans A and B, vector 2 spans B and C and
vector 3 spans A and C, the resulting table will contain three
collapsed vectors:
.INDENT 7.0
.IP \(bu 2
A, containing original vectors 1 and 3
.IP \(bu 2
B, containing original vectors 1 and 2
.IP \(bu 2
C, containing original vectors 2 and 3
.UNINDENT
.sp
If a vector maps to the same partition multiple times, it will be
counted multiple times.
.sp
There are two supported modes for handling one\-to\-many relationships
via \fIone_to_many_mode\fP: \fBadd\fP and \fBdivide\fP\&. \fBadd\fP will add the
vector counts to each partition that the vector maps to, which may
increase the total number of counts in the output table. \fBdivide\fP
will divide a vector\(aqs counts by the number of metadata that the
vector has before adding the counts to each partition. This will not
increase the total number of counts in the output table.
.sp
If \fIone_to_many_md_key\fP is specified, that becomes the metadata
key that describes the collapsed path. If a value is not specified,
then it defaults to \(aqPath\(aq.
.sp
If \fIstrict\fP is specified, then all metadata pathways operated on
must be indexable by \fImetadata_f\fP\&.
.sp
\fIone_to_many\fP and \fInorm\fP are not supported together.
.sp
\fIone_to_many\fP and \fIreduce_f\fP are not supported together.
.sp
\fIone_to_many\fP and \fImin_group_size\fP are not supported together.
.sp
A final note on space consumption. At present, the \fIone_to_many\fP
functionality requires a temporary dense matrix representation.
.INDENT 7.0
.TP
.B Parameters
\fBf\fP : function
.INDENT 7.0
.INDENT 3.5
Function that is used to determine what partition a vector belongs
to
.UNINDENT
.UNINDENT
.sp
\fBreduce_f\fP : function, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBoperator.add\fP\&. Function that reduces two vectors in
a one\-to\-one collapse
.UNINDENT
.UNINDENT
.sp
\fBnorm\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. If \fBTrue\fP, normalize the resulting table
.UNINDENT
.UNINDENT
.sp
\fBmin_group_size\fP : int, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fB1\fP\&. The minimum size of a partition when performing
a one\-to\-one collapse
.UNINDENT
.UNINDENT
.sp
\fBinclude_collapsed_metadata\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. If \fBTrue\fP, retain the collapsed metadata
keyed by the original IDs of the associated vectors
.UNINDENT
.UNINDENT
.sp
\fBone_to_many\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBFalse\fP\&. Perform a one\-to\-many collapse
.UNINDENT
.UNINDENT
.sp
\fBone_to_many_mode\fP : {\(aqadd\(aq, \(aqdivide\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The way to reduce two vectors in a one\-to\-many collapse
.UNINDENT
.UNINDENT
.sp
\fBone_to_many_md_key\fP : str, optional
.INDENT 7.0
.INDENT 3.5
Defaults to "Path". If \fIinclude_collapsed_metadata\fP is \fBTrue\fP,
store the original vector metadata under this key
.UNINDENT
.UNINDENT
.sp
\fBstrict\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBFalse\fP\&. Requires full pathway data within a
one\-to\-many structure
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to collapse
.UNINDENT
.UNINDENT
.TP
.B Returns
Table
.INDENT 7.0
.INDENT 3.5
The collapsed table
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a \fBTable\fP
.sp
.nf
.ft C
>>> dt_rich = Table(
\&...    np.array([[5, 6, 7], [8, 9, 10], [11, 12, 13]]),
\&...    [\(aq1\(aq, \(aq2\(aq, \(aq3\(aq], [\(aqa\(aq, \(aqb\(aq, \(aqc\(aq],
\&...    [{\(aqtaxonomy\(aq: [\(aqk__a\(aq, \(aqp__b\(aq]},
\&...     {\(aqtaxonomy\(aq: [\(aqk__a\(aq, \(aqp__c\(aq]},
\&...     {\(aqtaxonomy\(aq: [\(aqk__a\(aq, \(aqp__c\(aq]}],
\&...    [{\(aqbarcode\(aq: \(aqaatt\(aq},
\&...     {\(aqbarcode\(aq: \(aqttgg\(aq},
\&...     {\(aqbarcode\(aq: \(aqaatt\(aq}])
>>> print dt_rich # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID a   b   c
1   5.0 6.0 7.0
2   8.0 9.0 10.0
3   11.0    12.0    13.0
.ft P
.fi
.sp
Create a function to determine which partition a vector belongs to, then collapse the observations with it:
.sp
.nf
.ft C
>>> bin_f = lambda id_, x: x[\(aqtaxonomy\(aq][1]
>>> obs_phy = dt_rich.collapse(
\&...    bin_f, norm=False, min_group_size=1,
\&...    axis=\(aqobservation\(aq).sort(axis=\(aqobservation\(aq)
>>> print obs_phy # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID a   b   c
p__b    5.0 6.0 7.0
p__c    19.0    21.0    23.0
.ft P
.fi
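.sp
The example above is a one\-to\-one collapse. For the one\-to\-many case, the difference between the \fBadd\fP and \fBdivide\fP modes described earlier can be illustrated with a minimal sketch in plain Python. This is not the implementation used by \fBTable.collapse\fP: the function name \fBcollapse_one_to_many\fP and the counts are hypothetical, and a single number stands in for each vector. The memberships follow the A, B, C example in the text.
.sp
.nf
.ft C
```python
def collapse_one_to_many(counts, memberships, mode="add"):
    """Sum each vector into every bin it maps to.

    counts:      {vector_id: count}
    memberships: {vector_id: [bin, ...]}
    """
    if mode not in ("add", "divide"):
        raise ValueError("mode must be add or divide")
    collapsed = {}
    for vector_id, count in counts.items():
        bins = memberships[vector_id]
        if mode == "divide":
            # Split the count evenly over the bins the vector maps to.
            share = count / float(len(bins))
        else:
            # Give the full count to every bin the vector maps to.
            share = float(count)
        for bin_name in bins:
            collapsed[bin_name] = collapsed.get(bin_name, 0.0) + share
    return collapsed


# Hypothetical counts; memberships as in the A, B, C example.
counts = {"vector1": 4, "vector2": 6, "vector3": 10}
memberships = {
    "vector1": ["A", "B"],
    "vector2": ["B", "C"],
    "vector3": ["A", "C"],
}

added = collapse_one_to_many(counts, memberships, mode="add")
divided = collapse_one_to_many(counts, memberships, mode="divide")
print(sorted(added.items()))
print(sorted(divided.items()))
```
.ft P
.fi
.sp
With \fBadd\fP the collapsed total (40) is larger than the original total (20); with \fBdivide\fP the total is preserved.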
.UNINDENT
.SS biom.table.Table.copy
.INDENT 7.0
.TP
.B Table.copy()
Returns a copy of the table
.UNINDENT
.SS biom.table.Table.data
.INDENT 7.0
.TP
.B Table.data(id, axis=\(aqsample\(aq, dense=True)
Returns data associated with an \fIid\fP
.INDENT 7.0
.TP
.B Parameters
\fBid\fP : str
.INDENT 7.0
.INDENT 3.5
ID of the sample or observation whose data will be returned.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}
.INDENT 7.0
.INDENT 3.5
Axis to search for \fIid\fP\&.
.UNINDENT
.UNINDENT
.sp
\fBdense\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
If \fBTrue\fP, return data as dense
.UNINDENT
.UNINDENT
.TP
.B Returns
np.ndarray or scipy.sparse.spmatrix
.INDENT 7.0
.INDENT 3.5
np.ndarray if \fBdense\fP, otherwise scipy.sparse.spmatrix
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> from biom import example_table
>>> example_table.data(\(aqS1\(aq, axis=\(aqsample\(aq)
array([ 0.,  3.])
.ft P
.fi
.UNINDENT
.SS biom.table.Table.delimited_self
.INDENT 7.0
.TP
.B Table.delimited_self(delim=\(aq\et\(aq, header_key=None, header_value=None, metadata_formatter=<type \(aqstr\(aq>, observation_column_name=\(aq#OTU ID\(aq)
Return self as a string in a delimited form
.sp
Default str output for the Table is just row/col ids and table data
without any metadata
.sp
Including observation metadata in output: If \fBheader_key\fP is not
\fBNone\fP, the observation metadata with that name will be included
in the delimited output. If \fBheader_value\fP is also not \fBNone\fP, the
observation metadata will use the provided \fBheader_value\fP as the
observation metadata name (i.e., the column header) in the delimited
output.
.sp
\fBmetadata_formatter\fP: a function which takes a metadata entry and
returns a formatted version that should be written to file
.sp
\fBobservation_column_name\fP: the name of the first column in the output
table, corresponding to the observation IDs. For example, the default
will look something like:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#OTU ID     Sample1 Sample2
OTU1        10      2
OTU2        4       8
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.SS biom.table.Table.descriptive_equality
.INDENT 7.0
.TP
.B Table.descriptive_equality(other)
For use in testing, describe how the tables are not equal
.UNINDENT
.SS biom.table.Table.exists
.INDENT 7.0
.TP
.B Table.exists(id, axis=\(aqsample\(aq)
Returns whether id exists in axis
.INDENT 7.0
.TP
.B Parameters
\fBid: str\fP
.INDENT 7.0
.INDENT 3.5
The ID whose existence is checked
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to check
.UNINDENT
.UNINDENT
.TP
.B Returns
bool
.INDENT 7.0
.INDENT 3.5
\fBTrue\fP if \fIid\fP exists, \fBFalse\fP otherwise
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq])
.ft P
.fi
.sp
Check whether sample ID is in the table:
.sp
.nf
.ft C
>>> table.exists(\(aqS1\(aq)
True
>>> table.exists(\(aqS4\(aq)
False
.ft P
.fi
.sp
Check whether an observation ID is in the table:
.sp
.nf
.ft C
>>> table.exists(\(aqO1\(aq, \(aqobservation\(aq)
True
>>> table.exists(\(aqO3\(aq, \(aqobservation\(aq)
False
.ft P
.fi
.UNINDENT
.SS biom.table.Table.filter
.INDENT 7.0
.TP
.B Table.filter(ids_to_keep, axis=\(aqsample\(aq, invert=False, inplace=True)
Filter a table based on a function or iterable.
.INDENT 7.0
.TP
.B Parameters
\fBids_to_keep\fP : iterable, or function(values, id, metadata) \-> bool
.INDENT 7.0
.INDENT 3.5
If a function, it will be called with the values of the
sample/observation, its id (a string) and the dictionary
of metadata of each sample/observation, and must return a
boolean. If it\(aqs an iterable, it must be a list of ids to
keep.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
It controls whether to filter samples or observations and
defaults to "sample".
.UNINDENT
.UNINDENT
.sp
\fBinvert\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBFalse\fP\&. If set to \fBTrue\fP, discard samples or
observations where \fIids_to_keep\fP returns True
.UNINDENT
.UNINDENT
.sp
\fBinplace\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. If \fBTrue\fP, modify the table in place;
otherwise, return a new filtered table.
.UNINDENT
.UNINDENT
.TP
.B Returns
biom.Table
.INDENT 7.0
.INDENT 3.5
Returns itself if \fIinplace\fP, else returns a new filtered table.
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table, with observation metadata and sample
metadata:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq],
\&...               [{\(aqfull_genome_available\(aq: True},
\&...                {\(aqfull_genome_available\(aq: False}],
\&...               [{\(aqsample_type\(aq: \(aqa\(aq}, {\(aqsample_type\(aq: \(aqa\(aq},
\&...                {\(aqsample_type\(aq: \(aqb\(aq}])
.ft P
.fi
.sp
Define a function to keep only samples with sample_type == \(aqa\(aq. This
will drop sample S3, which has sample_type \(aqb\(aq:
.sp
.nf
.ft C
>>> filter_fn = lambda val, id_, md: md[\(aqsample_type\(aq] == \(aqa\(aq
.ft P
.fi
.sp
Get a filtered version of the table, leaving the original table
untouched:
.sp
.nf
.ft C
>>> new_table = table.filter(filter_fn, inplace=False)
>>> print table.ids()
[\(aqS1\(aq \(aqS2\(aq \(aqS3\(aq]
>>> print new_table.ids()
[\(aqS1\(aq \(aqS2\(aq]
.ft P
.fi
.sp
Using the same filtering function, discard all samples with sample_type
\(aqa\(aq. This will keep only sample S3, which has sample_type \(aqb\(aq:
.sp
.nf
.ft C
>>> new_table = table.filter(filter_fn, inplace=False, invert=True)
>>> print table.ids()
[\(aqS1\(aq \(aqS2\(aq \(aqS3\(aq]
>>> print new_table.ids()
[\(aqS3\(aq]
.ft P
.fi
.sp
Filter the table in\-place using the same function (drop all samples
where sample_type is not \(aqa\(aq):
.sp
.nf
.ft C
>>> table.filter(filter_fn)
2 x 2 <class \(aqbiom.table.Table\(aq> with 2 nonzero entries (50% dense)
>>> print table.ids()
[\(aqS1\(aq \(aqS2\(aq]
.ft P
.fi
.sp
Filter out all observations in the table that do not have
full_genome_available == True. This will filter out observation O2:
.sp
.nf
.ft C
>>> filter_fn = lambda val, id_, md: md[\(aqfull_genome_available\(aq]
>>> table.filter(filter_fn, axis=\(aqobservation\(aq)
1 x 2 <class \(aqbiom.table.Table\(aq> with 0 nonzero entries (0% dense)
>>> print table.ids(axis=\(aqobservation\(aq)
[\(aqO1\(aq]
.ft P
.fi
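.sp
The selection rule can also be sketched without biom. The following is a
minimal illustration, assuming each sample is represented as a
(values, id, metadata) tuple; the name \fBsketch_filter\fP is hypothetical
and the function returns only the ids that would be kept.
.sp
.nf
.ft C
def sketch_filter(vectors, ids_to_keep, invert=False):
    # vectors: list of (values, id, metadata) tuples for one axis.
    if callable(ids_to_keep):
        keep = ids_to_keep
    else:
        wanted = set(ids_to_keep)
        keep = lambda values, id_, md: id_ in wanted
    # With invert=True the decision of the predicate is flipped.
    return [v[1] for v in vectors if bool(keep(*v)) != invert]

samples = [([0, 1], "S1", {"sample_type": "a"}),
           ([0, 3], "S2", {"sample_type": "a"}),
           ([1, 42], "S3", {"sample_type": "b"})]
is_a = lambda values, id_, md: md["sample_type"] == "a"
kept = sketch_filter(samples, is_a)
dropped = sketch_filter(samples, is_a, invert=True)
by_id = sketch_filter(samples, ["S2", "S3"])
.ft P
.fi
.sp
Here \fBkept\fP holds S1 and S2, \fBdropped\fP holds S3, and \fBby_id\fP holds
S2 and S3.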
.UNINDENT
.SS biom.table.Table.from_hdf5
.INDENT 7.0
.TP
.B classmethod Table.from_hdf5(h5grp, ids=None, axis=\(aqsample\(aq)
Parse an HDF5 formatted BIOM table
.sp
If \fIids\fP is provided, only the samples/observations listed in \fIids\fP
(depending on the value of \fIaxis\fP) will be loaded.
.sp
The expected structure of the group is described in the Notes below. In
those definitions, N is the number of observations and M is the number of
samples. Data are stored in both compressed sparse row [R7] (for
observation oriented operations) and compressed sparse column [R8] (for
sample oriented operations) form.
.INDENT 7.0
.TP
.B Parameters
\fBh5grp\fP : an h5py \fBGroup\fP or an open h5py \fBFile\fP
.sp
\fBids\fP : iterable
.INDENT 7.0
.INDENT 3.5
The sample/observation ids of the samples/observations that we need
to retrieve from the hdf5 biom table
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to subset on
.UNINDENT
.UNINDENT
.TP
.B Returns
biom.Table
.INDENT 7.0
.INDENT 3.5
A BIOM \fBTable\fP object
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBValueError\fP
.INDENT 7.0
.INDENT 3.5
If \fIids\fP are not a subset of the samples or observations ids
present in the hdf5 biom table
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBSEE ALSO:\fP
.INDENT 7.0
.INDENT 3.5
\fBTable.to_hdf5\fP
.UNINDENT
.UNINDENT
Notes
.sp
The expected HDF5 group structure is below. An example of an HDF5 file
in DDL can be found in [R9]\&.
.INDENT 7.0
.IP \(bu 2
\&./id                                                  : str, an arbitrary ID
.IP \(bu 2
\&./type                                                : str, the table type (e.g., OTU table)
.IP \(bu 2
\&./format\-url                                          : str, a URL that describes the format
.IP \(bu 2
\&./format\-version                                      : two element tuple of int32, major and minor
.IP \(bu 2
\&./generated\-by                                        : str, what generated this file
.IP \(bu 2
\&./creation\-date                                       : str, ISO format
.IP \(bu 2
\&./shape                                               : two element tuple of int32, N by M
.IP \(bu 2
\&./nnz                                                 : int32 or int64, number of nonzero elements
.IP \(bu 2
\&./observation                                         : Group
.IP \(bu 2
\&./observation/ids                                     : (N,) dataset of str or vlen str
.IP \(bu 2
\&./observation/matrix                                  : Group
.IP \(bu 2
\&./observation/matrix/data                             : (nnz,) dataset of float64
.IP \(bu 2
\&./observation/matrix/indices                          : (nnz,) dataset of int32
.IP \(bu 2
\&./observation/matrix/indptr                           : (N+1,) dataset of int32
.IP \(bu 2
\&./observation/metadata                                : Group
.IP \(bu 2
[./observation/metadata/foo]                          : Optional, (N,) dataset of any valid HDF5 type in index order with IDs.
.IP \(bu 2
\&./observation/group\-metadata                          : Group
.IP \(bu 2
[./observation/group\-metadata/foo]                    : Optional, (?,) dataset of group metadata that relates IDs
.IP \(bu 2
[./observation/group\-metadata/foo.attrs[\(aqdata_type\(aq]] : attribute of the foo dataset that describes contained type (e.g., newick)
.IP \(bu 2
\&./sample                                              : Group
.IP \(bu 2
\&./sample/ids                                          : (M,) dataset of str or vlen str
.IP \(bu 2
\&./sample/matrix                                       : Group
.IP \(bu 2
\&./sample/matrix/data                                  : (nnz,) dataset of float64
.IP \(bu 2
\&./sample/matrix/indices                               : (nnz,) dataset of int32
.IP \(bu 2
\&./sample/matrix/indptr                                : (M+1,) dataset of int32
.IP \(bu 2
\&./sample/metadata                                     : Group
.IP \(bu 2
[./sample/metadata/foo]                               : Optional, (M,) dataset of any valid HDF5 type in index order with IDs.
.IP \(bu 2
\&./sample/group\-metadata                               : Group
.IP \(bu 2
[./sample/group\-metadata/foo]                         : Optional, (?,) dataset of group metadata that relates IDs
.IP \(bu 2
[./sample/group\-metadata/foo.attrs[\(aqdata_type\(aq]]      : attribute of the foo dataset that describes contained type (e.g., newick)
.UNINDENT
.sp
The \(aq?\(aq character on the dataset size means that it can be of arbitrary
length.
.sp
The expected structure for each of the metadata datasets is a list of
atomic type objects (int, float, str, ...), where the index order of
the list corresponds to the index order of the relevant axis IDs.
Special metadata fields have been defined, and they are stored in a
specific way. Currently, the available special metadata fields are:
.INDENT 7.0
.IP \(bu 2
taxonomy: (N, ?) dataset of str or vlen str
.IP \(bu 2
KEGG_Pathways: (N, ?) dataset of str or vlen str
.IP \(bu 2
collapsed_ids: (N, ?) dataset of str or vlen str
.UNINDENT
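.sp
To make the \fBdata\fP, \fBindices\fP and \fBindptr\fP datasets concrete, the
following plain Python sketch compresses a small dense matrix along its
rows (compressed sparse row) and along its columns (compressed sparse
column). It assumes the conventional layout used by scipy.sparse; the
function name \fBsketch_compress\fP is hypothetical and the code does not
read or write HDF5.
.sp
.nf
.ft C
def sketch_compress(dense, by_row=True):
    # Compress a dense list of rows along rows (CSR) or columns (CSC).
    if not by_row:
        dense = [list(col) for col in zip(*dense)]
    data, indices, indptr = [], [], [0]
    for vector in dense:
        for position, value in enumerate(vector):
            if value != 0:
                data.append(float(value))
                indices.append(position)
        # indptr records where each vector ends in data and indices.
        indptr.append(len(data))
    return data, indices, indptr

dense = [[0, 0, 1], [1, 3, 42]]   # 2 observations by 3 samples
csr = sketch_compress(dense, by_row=True)
csc = sketch_compress(dense, by_row=False)
.ft P
.fi
.sp
In both orientations \fBdata\fP and \fBindices\fP have one entry per nonzero
value, and \fBindptr\fP has one entry more than the number of compressed
vectors: three entries for the two rows and four entries for the three
columns in this example.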
References
.IP [R7] 5
\fI\%http://docs.scipy.org/doc/scipy\-0.13.0/reference/generated/scipy.sparse.csr_matrix.html\fP
.IP [R8] 5
\fI\%http://docs.scipy.org/doc/scipy\-0.13.0/reference/generated/scipy.sparse.csc_matrix.html\fP
.IP [R9] 5
\fI\%http://biom\-format.org/documentation/format_versions/biom\-2.0.html\fP
Examples
.sp
Parse an HDF5 BIOM table:
.sp
.nf
.ft C
>>> from biom.table import Table
>>> from biom.util import biom_open
>>> with biom_open(\(aqrich_sparse_otu_table_hdf5.biom\(aq) as f: # doctest: +SKIP
\&...     t = Table.from_hdf5(f) # doctest: +SKIP
.ft P
.fi
.sp
Parse an HDF5 BIOM table, loading only a subset of the observations:
.sp
.nf
.ft C
>>> from biom.table import Table
>>> from biom.util import biom_open
>>> with biom_open(\(aqrich_sparse_otu_table_hdf5.biom\(aq) as f: # doctest: +SKIP
\&...     t = Table.from_hdf5(f, ids=["GG_OTU_1"],
\&...                         axis=\(aqobservation\(aq) # doctest: +SKIP
.ft P
.fi
.UNINDENT
.SS biom.table.Table.from_json
.INDENT 7.0
.TP
.B classmethod Table.from_json(json_table, data_pump=None, input_is_dense=False)
Parse a JSON\-formatted BIOM table
.INDENT 7.0
.TP
.B Parameters
\fBjson_table\fP : dict
.INDENT 7.0
.INDENT 3.5
A JSON object or dict that represents the BIOM table
.UNINDENT
.UNINDENT
.sp
\fBdata_pump\fP : tuple or None
.INDENT 7.0
.INDENT 3.5
A secondary source of data
.UNINDENT
.UNINDENT
.sp
\fBinput_is_dense\fP : bool
.INDENT 7.0
.INDENT 3.5
If \fITrue\fP, the data contained will be interpreted as dense
.UNINDENT
.UNINDENT
.TP
.B Returns
Table
.UNINDENT
Examples
.sp
.nf
.ft C
>>> from biom import Table
>>> json_obj = {"id": "None",
\&...             "format": "Biological Observation Matrix 1.0.0",
\&...             "format_url": "http://biom\-format.org",
\&...             "generated_by": "foo",
\&...             "type": "OTU table",
\&...             "date": "2014\-06\-03T14:24:40.884420",
\&...             "matrix_element_type": "float",
\&...             "shape": [5, 6],
\&...             "data": [[0,2,1.0],
\&...                      [1,0,5.0],
\&...                      [1,1,1.0],
\&...                      [1,3,2.0],
\&...                      [1,4,3.0],
\&...                      [1,5,1.0],
\&...                      [2,2,1.0],
\&...                      [2,3,4.0],
\&...                      [2,5,2.0],
\&...                      [3,0,2.0],
\&...                      [3,1,1.0],
\&...                      [3,2,1.0],
\&...                      [3,5,1.0],
\&...                      [4,1,1.0],
\&...                      [4,2,1.0]],
\&...             "rows": [{"id": "GG_OTU_1", "metadata": None},
\&...                      {"id": "GG_OTU_2", "metadata": None},
\&...                      {"id": "GG_OTU_3", "metadata": None},
\&...                      {"id": "GG_OTU_4", "metadata": None},
\&...                      {"id": "GG_OTU_5", "metadata": None}],
\&...             "columns": [{"id": "Sample1", "metadata": None},
\&...                         {"id": "Sample2", "metadata": None},
\&...                         {"id": "Sample3", "metadata": None},
\&...                         {"id": "Sample4", "metadata": None},
\&...                         {"id": "Sample5", "metadata": None},
\&...                         {"id": "Sample6", "metadata": None}]
\&...             }
>>> t = Table.from_json(json_obj)
.ft P
.fi
.UNINDENT
.SS biom.table.Table.from_tsv
.INDENT 7.0
.TP
.B static Table.from_tsv(lines, obs_mapping, sample_mapping, process_func, **kwargs)
Parse a tab separated (observation x sample) formatted BIOM table
.INDENT 7.0
.TP
.B Parameters
\fBlines\fP : list, or file\-like object
.INDENT 7.0
.INDENT 3.5
The tab delimited data to parse
.UNINDENT
.UNINDENT
.sp
\fBobs_mapping\fP : dict or None
.INDENT 7.0
.INDENT 3.5
The corresponding observation metadata
.UNINDENT
.UNINDENT
.sp
\fBsample_mapping\fP : dict or None
.INDENT 7.0
.INDENT 3.5
The corresponding sample metadata
.UNINDENT
.UNINDENT
.sp
\fBprocess_func\fP : function
.INDENT 7.0
.INDENT 3.5
A function to transform the observation metadata
.UNINDENT
.UNINDENT
.TP
.B Returns
biom.Table
.INDENT 7.0
.INDENT 3.5
A BIOM \fBTable\fP object
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
Parse tab separated data into a table:
.sp
.nf
.ft C
>>> from biom.table import Table
>>> from StringIO import StringIO
>>> tsv = \(aqa\etb\etc\en1\et2\et3\en4\et5\et6\(aq
>>> tsv_fh = StringIO(tsv)
>>> func = lambda x : x
>>> test_table = Table.from_tsv(tsv_fh, None, None, func)
.ft P
.fi
.UNINDENT
.SS biom.table.Table.get_table_density
.INDENT 7.0
.TP
.B Table.get_table_density()
Returns the fraction of nonzero elements in the table.
.INDENT 7.0
.TP
.B Returns
float
.INDENT 7.0
.INDENT 3.5
The fraction of nonzero elements in the table
.UNINDENT
.UNINDENT
.UNINDENT
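.sp
As an illustration of the quantity being returned, the following plain
Python sketch computes the same fraction for a dense list of rows. The
name \fBsketch_density\fP is hypothetical, and returning 0.0 for a table
with no cells is an assumption of this sketch.
.sp
.nf
.ft C
def sketch_density(dense):
    # Fraction of cells holding a nonzero value; 0.0 if there are no cells.
    cells = sum(len(row) for row in dense)
    nonzero = sum(1 for row in dense for v in row if v != 0)
    return float(nonzero) / cells if cells else 0.0

density = sketch_density([[0, 0, 1], [1, 3, 42]])
.ft P
.fi
.sp
Four of the six cells are nonzero, so the density here is two thirds.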
.UNINDENT
.SS biom.table.Table.get_value_by_ids
.INDENT 7.0
.TP
.B Table.get_value_by_ids(obs_id, samp_id)
Return value in the matrix corresponding to \fB(obs_id, samp_id)\fP
.INDENT 7.0
.TP
.B Parameters
\fBobs_id\fP : str
.INDENT 7.0
.INDENT 3.5
The ID of the observation
.UNINDENT
.UNINDENT
.sp
\fBsamp_id\fP : str
.INDENT 7.0
.INDENT 3.5
The ID of the sample
.UNINDENT
.UNINDENT
.TP
.B Returns
float
.INDENT 7.0
.INDENT 3.5
The data value corresponding to the specified matrix position
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS biom.table.Table.group_metadata
.INDENT 7.0
.TP
.B Table.group_metadata(axis=\(aqsample\(aq)
Return the group metadata of the given axis
.INDENT 7.0
.TP
.B Parameters
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
Axis to search for the group metadata. Defaults to \(aqsample\(aq
.UNINDENT
.UNINDENT
.TP
.B Returns
dict
.INDENT 7.0
.INDENT 3.5
The corresponding group metadata for the given axis
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table, with group observation metadata and no group
sample metadata:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> group_observation_md = {\(aqtree\(aq: (\(aqnewick\(aq, \(aq(O1:0.3,O2:0.4);\(aq)}
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq],
\&...               observation_group_metadata=group_observation_md)
.ft P
.fi
.sp
Get the observation group metadata:
.sp
.nf
.ft C
>>> table.group_metadata(axis=\(aqobservation\(aq)
{\(aqtree\(aq: (\(aqnewick\(aq, \(aq(O1:0.3,O2:0.4);\(aq)}
.ft P
.fi
.sp
Get the sample group metadata:
.sp
.nf
.ft C
>>> table.group_metadata() is None
True
.ft P
.fi
.UNINDENT
.SS biom.table.Table.ids
.INDENT 7.0
.TP
.B Table.ids(axis=\(aqsample\(aq)
Return the ids along the given axis
.INDENT 7.0
.TP
.B Parameters
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
Axis to return ids from. Defaults to \(aqsample\(aq
.UNINDENT
.UNINDENT
.TP
.B Returns
1\-D numpy array
.INDENT 7.0
.INDENT 3.5
The ids along the given axis
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq])
.ft P
.fi
.sp
Get the ids along the observation axis:
.sp
.nf
.ft C
>>> print table.ids(axis=\(aqobservation\(aq)
[\(aqO1\(aq \(aqO2\(aq]
.ft P
.fi
.sp
Get the ids along the sample axis:
.sp
.nf
.ft C
>>> print table.ids()
[\(aqS1\(aq \(aqS2\(aq \(aqS3\(aq]
.ft P
.fi
.UNINDENT
.SS biom.table.Table.index
.INDENT 7.0
.TP
.B Table.index(id, axis)
Return the index of the identified sample/observation.
.INDENT 7.0
.TP
.B Parameters
\fBid\fP : str
.INDENT 7.0
.INDENT 3.5
ID of the sample or observation whose index will be returned.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}
.INDENT 7.0
.INDENT 3.5
Axis to search for \fIid\fP\&.
.UNINDENT
.UNINDENT
.TP
.B Returns
int
.INDENT 7.0
.INDENT 3.5
Index of the sample/observation identified by \fIid\fP\&.
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.sp
\fBUnknownIDError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized sample/observation ID.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq])
.ft P
.fi
.sp
Get the index of the observation with ID "O2":
.sp
.nf
.ft C
>>> table.index(\(aqO2\(aq, \(aqobservation\(aq)
1
.ft P
.fi
.sp
Get the index of the sample with ID "S1":
.sp
.nf
.ft C
>>> table.index(\(aqS1\(aq, \(aqsample\(aq)
0
.ft P
.fi
.UNINDENT
.SS biom.table.Table.is_empty
.INDENT 7.0
.TP
.B Table.is_empty()
Check whether the table is empty
.INDENT 7.0
.TP
.B Returns
bool
.INDENT 7.0
.INDENT 3.5
\fBTrue\fP if the table is empty, \fBFalse\fP otherwise
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS biom.table.Table.iter
.INDENT 7.0
.TP
.B Table.iter(dense=True, axis=\(aqsample\(aq)
Yields \fB(value, id, metadata)\fP
.INDENT 7.0
.TP
.B Parameters
\fBdense\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. If \fBFalse\fP, yield compressed sparse row or
compressed sparse columns if \fIaxis\fP is \(aqobservation\(aq or \(aqsample\(aq,
respectively.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to iterate over.
.UNINDENT
.UNINDENT
.TP
.B Returns
GeneratorType
.INDENT 7.0
.INDENT 3.5
A generator that yields (values, id, metadata)
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqZ3\(aq])
.ft P
.fi
.sp
Iterate over samples and keep those whose ID starts with a Z:
.sp
.nf
.ft C
>>> [(values, id, metadata)
\&...     for values, id, metadata in table.iter() if id[0]==\(aqZ\(aq]
[(array([  1.,  42.]), \(aqZ3\(aq, None)]
.ft P
.fi
.sp
Iterate over samples and sum the values of the second observation:
.sp
.nf
.ft C
>>> col = [values[1] for values, id, metadata in table.iter()]
>>> sum(col)
46.0
.ft P
.fi
.UNINDENT
.SS biom.table.Table.iter_data
.INDENT 7.0
.TP
.B Table.iter_data(dense=True, axis=\(aqsample\(aq)
Yields axis values
.INDENT 7.0
.TP
.B Parameters
\fBdense\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. If \fBFalse\fP, yield compressed sparse row or
compressed sparse columns if \fIaxis\fP is \(aqobservation\(aq or \(aqsample\(aq,
respectively.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
Axis to iterate over.
.UNINDENT
.UNINDENT
.TP
.B Returns
generator
.INDENT 7.0
.INDENT 3.5
Yields list of values for each value in \fIaxis\fP
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If axis other than \(aqsample\(aq or \(aqobservation\(aq passed
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
>>> data = np.arange(30).reshape(3,10) # 3 X 10 OTU X Sample table
>>> obs_ids = [\(aqo1\(aq, \(aqo2\(aq, \(aqo3\(aq]
>>> sam_ids = [\(aqs%i\(aq %i for i in range(1,11)]
>>> bt = Table(data, observation_ids=obs_ids, sample_ids=sam_ids)
.ft P
.fi
.sp
Find the largest sample sum:
.sp
.nf
.ft C
>>> sample_gen = bt.iter_data(axis=\(aqsample\(aq)
>>> max_sample_count = max([sample.sum() for sample in sample_gen])
>>> print max_sample_count
57.0
.ft P
.fi
.UNINDENT
.SS biom.table.Table.iter_pairwise
.INDENT 7.0
.TP
.B Table.iter_pairwise(dense=True, axis=\(aqsample\(aq, tri=True, diag=False)
Pairwise iteration over self
.INDENT 7.0
.TP
.B Parameters
\fBdense\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. If \fBFalse\fP, yield compressed sparse row or
compressed sparse columns if \fIaxis\fP is \(aqobservation\(aq or \(aqsample\(aq,
respectively.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to iterate over.
.UNINDENT
.UNINDENT
.sp
\fBtri\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
If \fBTrue\fP, just yield [i, j] and not [j, i]
.UNINDENT
.UNINDENT
.sp
\fBdiag\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
If \fBTrue\fP, yield [i, i]
.UNINDENT
.UNINDENT
.TP
.B Returns
GeneratorType
.INDENT 7.0
.INDENT 3.5
Yields [(val_i, id_i, metadata_i), (val_j, id_j, metadata_j)]
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> from biom import example_table
.ft P
.fi
.sp
By default, only the upper triangle of the pairwise combinations,
without the diagonal, is yielded.
.sp
.nf
.ft C
>>> iter_ = example_table.iter_pairwise()
>>> for (val_i, id_i, md_i), (val_j, id_j, md_j) in iter_:
\&...     print id_i, id_j
S1 S2
S1 S3
S2 S3
.ft P
.fi
.sp
All pairwise combinations, including the diagonal, can also be yielded:
.sp
.nf
.ft C
>>> iter_ = example_table.iter_pairwise(tri=False, diag=True)
>>> for (val_i, id_i, md_i), (val_j, id_j, md_j) in iter_:
\&...     print id_i, id_j
S1 S1
S1 S2
S1 S3
S2 S1
S2 S2
S2 S3
S3 S1
S3 S2
S3 S3
.ft P
.fi
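.sp
How \fItri\fP and \fIdiag\fP select pairs can be sketched in plain Python over
a list of ids. This is an illustration of the semantics shown above, not
the biom implementation; the name \fBsketch_pairs\fP is hypothetical and
only ids are paired, not values or metadata.
.sp
.nf
.ft C
def sketch_pairs(ids, tri=True, diag=False):
    pairs = []
    for i, id_i in enumerate(ids):
        for j, id_j in enumerate(ids):
            if i == j and not diag:
                continue   # skip [i, i] unless diag is set
            if tri and j < i:
                continue   # skip [j, i] when only one triangle is wanted
            pairs.append((id_i, id_j))
    return pairs

upper = sketch_pairs(["S1", "S2", "S3"])
full = sketch_pairs(["S1", "S2", "S3"], tri=False, diag=True)
.ft P
.fi
.sp
For three ids, \fBupper\fP contains three pairs and \fBfull\fP contains all
nine.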
.UNINDENT
.SS biom.table.Table.max
.INDENT 7.0
.TP
.B Table.max(axis=\(aqsample\(aq)
Get the maximum nonzero value over an axis
.INDENT 7.0
.TP
.B Parameters
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq, \(aqwhole\(aq}, optional
.INDENT 7.0
.INDENT 3.5
Defaults to "sample". The axis over which to calculate maxima.
.UNINDENT
.UNINDENT
.TP
.B Returns
scalar of self.dtype or np.array of self.dtype
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> from biom import example_table
>>> print example_table.max(axis=\(aqobservation\(aq)
[ 2.  5.]
.ft P
.fi
.UNINDENT
.SS biom.table.Table.merge
.INDENT 7.0
.TP
.B Table.merge(other, sample=\(aqunion\(aq, observation=\(aqunion\(aq, sample_metadata_f=prefer_self, observation_metadata_f=prefer_self)
Merge two tables together
.sp
The axes, samples and observations, can be controlled independently.
Both can work on either "union" or "intersection".
.sp
\fIsample_metadata_f\fP and \fIobservation_metadata_f\fP define how to
merge metadata between tables. The default is to keep the metadata
associated with self if self has metadata, and otherwise to take the
metadata from other. These functions are given both metadata dicts and
must return a single metadata dict.
.INDENT 7.0
.TP
.B Parameters
\fBother\fP : biom.Table
.INDENT 7.0
.INDENT 3.5
The other table to merge with this one
.UNINDENT
.UNINDENT
.sp
\fBsample\fP : {\(aqunion\(aq, \(aqintersection\(aq}, optional
.sp
\fBobservation\fP : {\(aqunion\(aq, \(aqintersection\(aq}, optional
.sp
\fBsample_metadata_f\fP : function, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBbiom.util.prefer_self\fP\&. Defines how to handle sample
metadata during merge.
.UNINDENT
.UNINDENT
.sp
\fBobservation_metadata_f\fP : function, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBbiom.util.prefer_self\fP\&. Defines how to handle
observation metadata during merge.
.UNINDENT
.UNINDENT
.TP
.B Returns
biom.Table
.INDENT 7.0
.INDENT 3.5
The merged table
.UNINDENT
.UNINDENT
.UNINDENT
Notes
.INDENT 7.0
.IP \(bu 2
There is an implicit type conversion to \fBfloat\fP\&.
.IP \(bu 2
The return type is always that of \fBself\fP
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x2 table and a 3x2 table:
.sp
.nf
.ft C
>>> d_a = np.asarray([[2, 0], [6, 1]])
>>> t_a = Table(d_a, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq])
>>> d_b = np.asarray([[4, 5], [0, 3], [10, 10]])
>>> t_b = Table(d_b, [\(aqO1\(aq, \(aqO2\(aq, \(aqO3\(aq], [\(aqS1\(aq, \(aqS2\(aq])
.ft P
.fi
.sp
Merging the tables sums the values of overlapping samples/observations
(see \fIO1\fP and \fIS2\fP) and adds the non\-overlapping ones to the
resulting table (see \fIO3\fP).
.sp
.nf
.ft C
>>> merged_table = t_a.merge(t_b)
>>> print merged_table  # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1      S2
O1      6.0     5.0
O2      6.0     4.0
O3      10.0    10.0
.ft P
.fi
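.sp
The summing behaviour of a union merge can be sketched with plain
dictionaries. This is a simplified illustration only: the name
\fBsketch_merge\fP is hypothetical, each table is assumed to be a mapping
of observation id to a mapping of sample id to value, and metadata
handling is ignored.
.sp
.nf
.ft C
def sketch_merge(a, b):
    # a and b map observation id to {sample id: value}.
    merged = {}
    for table in (a, b):
        for obs_id, row in table.items():
            target = merged.setdefault(obs_id, {})
            for samp_id, value in row.items():
                # Overlapping cells are summed; new cells start at zero.
                target[samp_id] = target.get(samp_id, 0.0) + value
    return merged

t_a = {"O1": {"S1": 2, "S2": 0}, "O2": {"S1": 6, "S2": 1}}
t_b = {"O1": {"S1": 4, "S2": 5}, "O2": {"S1": 0, "S2": 3},
       "O3": {"S1": 10, "S2": 10}}
merged = sketch_merge(t_a, t_b)
.ft P
.fi
.sp
The values in \fBmerged\fP match the merged table printed above, with all
values converted to float.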
.UNINDENT
.SS biom.table.Table.metadata
.INDENT 7.0
.TP
.B Table.metadata(id=None, axis=\(aqsample\(aq)
Return the metadata of the identified sample/observation.
.INDENT 7.0
.TP
.B Parameters
\fBid\fP : str
.INDENT 7.0
.INDENT 3.5
ID of the sample or observation whose metadata will be returned.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}
.INDENT 7.0
.INDENT 3.5
Axis to search for \fIid\fP\&.
.UNINDENT
.UNINDENT
.TP
.B Returns
defaultdict or None
.INDENT 7.0
.INDENT 3.5
The corresponding metadata \fBdefaultdict\fP or \fBNone\fP if that axis
does not have metadata.
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.sp
\fBUnknownIDError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized sample/observation ID.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table, with observation metadata and no sample
metadata:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq],
\&...               [{\(aqfoo\(aq: \(aqbar\(aq}, {\(aqx\(aq: \(aqy\(aq}], None)
.ft P
.fi
.sp
Get the metadata of the observation with ID "O2":
.sp
.nf
.ft C
>>> # casting to \(gadict\(ga as the return is \(gadefaultdict\(ga
>>> dict(table.metadata(\(aqO2\(aq, \(aqobservation\(aq))
{\(aqx\(aq: \(aqy\(aq}
.ft P
.fi
.sp
Get the metadata of the sample with ID "S1":
.sp
.nf
.ft C
>>> table.metadata(\(aqS1\(aq, \(aqsample\(aq) is None
True
.ft P
.fi
.UNINDENT
.SS biom.table.Table.min
.INDENT 7.0
.TP
.B Table.min(axis=\(aqsample\(aq)
Get the minimum nonzero value over an axis
.INDENT 7.0
.TP
.B Parameters
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq, \(aqwhole\(aq}, optional
.INDENT 7.0
.INDENT 3.5
Defaults to "sample". The axis over which to calculate minima.
.UNINDENT
.UNINDENT
.TP
.B Returns
scalar of self.dtype or np.array of self.dtype
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> from biom import example_table
>>> print example_table.min(axis=\(aqsample\(aq)
[ 3.  1.  2.]
.ft P
.fi
.UNINDENT
.SS biom.table.Table.nonzero
.INDENT 7.0
.TP
.B Table.nonzero()
Yields locations of nonzero elements within the data matrix
.INDENT 7.0
.TP
.B Returns
generator
.INDENT 7.0
.INDENT 3.5
Yields \fB(observation_id, sample_id)\fP for each nonzero element
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS biom.table.Table.nonzero_counts
.INDENT 7.0
.TP
.B Table.nonzero_counts(axis, binary=False)
Get nonzero summaries about an axis
.INDENT 7.0
.TP
.B Parameters
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq, \(aqwhole\(aq}
.INDENT 7.0
.INDENT 3.5
The axis on which to count nonzero entries
.UNINDENT
.UNINDENT
.sp
\fBbinary\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBFalse\fP\&. If \fBTrue\fP, return the number of nonzero
entries. If \fBFalse\fP, sum the values of the entries.
.UNINDENT
.UNINDENT
.TP
.B Returns
numpy.array
.INDENT 7.0
.INDENT 3.5
Counts in the index order of the axis
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS biom.table.Table.norm
.INDENT 7.0
.TP
.B Table.norm(axis=\(aqsample\(aq, inplace=True)
Normalize the values of each sample so that they sum to 1, or do the same for each observation.
.INDENT 7.0
.TP
.B Parameters
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to use for normalization
.UNINDENT
.UNINDENT
.sp
\fBinplace\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. If \fBTrue\fP, performs the normalization in
place. Otherwise, returns a new table with the normalization
applied.
.UNINDENT
.UNINDENT
.TP
.B Returns
biom.Table
.INDENT 7.0
.INDENT 3.5
The normalized table
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x2 table:
.sp
.nf
.ft C
>>> data = np.asarray([[2, 0], [6, 1]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq])
.ft P
.fi
.sp
Get a version of the table normalized on the \(aqsample\(aq axis, leaving the
original table untouched:
.sp
.nf
.ft C
>>> new_table = table.norm(inplace=False)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  2.0 0.0
O2  6.0 1.0
>>> print new_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  0.25    0.0
O2  0.75    1.0
.ft P
.fi
.sp
Get a version of the table normalized on the \(aqobservation\(aq axis,
again leaving the original table untouched:
.sp
.nf
.ft C
>>> new_table = table.norm(axis=\(aqobservation\(aq, inplace=False)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  2.0 0.0
O2  6.0 1.0
>>> print new_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  1.0 0.0
O2  0.857142857143  0.142857142857
.ft P
.fi
.sp
Do the same normalization on \(aqobservation\(aq, this time in\-place:
.sp
.nf
.ft C
>>> table.norm(axis=\(aqobservation\(aq)
2 x 2 <class \(aqbiom.table.Table\(aq> with 3 nonzero entries (75% dense)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  1.0 0.0
O2  0.857142857143  0.142857142857
.ft P
.fi
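.sp
The arithmetic behind these results can be sketched in plain Python:
every value is divided by the total of the sample (column) or observation
(row) it belongs to. The name \fBsketch_norm\fP is hypothetical, and the
sketch assumes that no total is zero.
.sp
.nf
.ft C
def sketch_norm(dense, axis="sample"):
    # dense is a list of observation rows; each column is a sample.
    if axis == "sample":
        totals = [float(sum(col)) for col in zip(*dense)]
        return [[v / totals[j] for j, v in enumerate(row)]
                for row in dense]
    totals = [float(sum(row)) for row in dense]
    return [[v / totals[i] for v in row] for i, row in enumerate(dense)]

by_sample = sketch_norm([[2, 0], [6, 1]])
by_observation = sketch_norm([[2, 0], [6, 1]], axis="observation")
.ft P
.fi
.sp
With the 2x2 table used above, the sample totals are 8 and 1 and the
observation totals are 2 and 7, which gives the values printed in the
examples.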
.UNINDENT
.SS biom.table.Table.pa
.INDENT 7.0
.TP
.B Table.pa(inplace=True)
Convert the table to presence/absence data
.INDENT 7.0
.TP
.B Parameters
\fBinplace\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP
.UNINDENT
.UNINDENT
.TP
.B Returns
Table
.INDENT 7.0
.INDENT 3.5
Returns itself if \fIinplace\fP, else returns a new presence/absence
table.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> from biom.table import Table
>>> import numpy as np
.ft P
.fi
.sp
Create a 2x3 BIOM table:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq])
.ft P
.fi
.sp
Convert to presence/absence data:
.sp
.nf
.ft C
>>> _ = table.pa()
>>> print table.data(\(aqO1\(aq, \(aqobservation\(aq)
[ 0.  0.  1.]
>>> print table.data(\(aqO2\(aq, \(aqobservation\(aq)
[ 1.  1.  1.]
.ft P
.fi
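.sp
The conversion itself is simple enough to sketch in pure Python. The helper name \fIpresence_absence\fP is hypothetical and the block only illustrates the mapping; it does not use \fIbiom\fP or reproduce its sparse implementation.
.sp
.nf
.ft C
```python
def presence_absence(rows):
    """Map every nonzero value to 1.0 and every zero to 0.0."""
    return [[1.0 if value != 0 else 0.0 for value in row] for row in rows]


# Same values as the 2x3 table above.
data = [[0, 0, 1], [1, 3, 42]]
pa_rows = presence_absence(data)
```
.ft P
.fi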
.UNINDENT
.SS biom.table.Table.partition
.INDENT 7.0
.TP
.B Table.partition(f, axis=\(aqsample\(aq)
Yields partitions
.INDENT 7.0
.TP
.B Parameters
\fBf\fP : function
.INDENT 7.0
.INDENT 3.5
\fIf\fP is given the ID and metadata of the vector and must return
what partition the vector is part of.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to iterate over
.UNINDENT
.UNINDENT
.TP
.B Returns
GeneratorType
.INDENT 7.0
.INDENT 3.5
A generator that yields (partition, \fITable\fP)
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table, with observation metadata and sample
metadata:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq],
\&...               [{\(aqfull_genome_available\(aq: True},
\&...                {\(aqfull_genome_available\(aq: False}],
\&...               [{\(aqsample_type\(aq: \(aqa\(aq}, {\(aqsample_type\(aq: \(aqa\(aq},
\&...                {\(aqsample_type\(aq: \(aqb\(aq}])
.ft P
.fi
.sp
Define a function to bin by sample_type
.sp
.nf
.ft C
>>> f = lambda id_, md: md[\(aqsample_type\(aq]
.ft P
.fi
.sp
Partition the table and view the results. Each item yielded is a
(partition, \fITable\fP) pair, so the two partitions can be unpacked directly:
.sp
.nf
.ft C
>>> (bin_a, table_a), (bin_b, table_b) = table.partition(f)
>>> print table_a # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2
O1  0.0 0.0
O2  1.0 3.0
>>> print table_b # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S3
O1  1.0
O2  42.0
.ft P
.fi
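.sp
The grouping step can be pictured as follows. This is a simplified pure\-Python sketch under stated assumptions: the helper \fIpartition_ids\fP is hypothetical, and it collects only the IDs that fall into each partition, whereas the real method yields a \fITable\fP per partition.
.sp
.nf
.ft C
```python
from collections import OrderedDict


def partition_ids(ids, metadata, f):
    """Group ids by the label that f returns for each (id, metadata) pair."""
    groups = OrderedDict()
    for id_, md in zip(ids, metadata):
        groups.setdefault(f(id_, md), []).append(id_)
    return list(groups.items())


sample_ids = ["S1", "S2", "S3"]
sample_md = [{"sample_type": "a"}, {"sample_type": "a"}, {"sample_type": "b"}]
parts = partition_ids(sample_ids, sample_md, lambda id_, md: md["sample_type"])
```
.ft P
.fi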
.UNINDENT
.SS biom.table.Table.reduce
.INDENT 7.0
.TP
.B Table.reduce(f, axis)
Reduce over axis using function \fIf\fP
.INDENT 7.0
.TP
.B Parameters
\fBf\fP : function
.INDENT 7.0
.INDENT 3.5
The function to use for the reduce operation
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}
.INDENT 7.0
.INDENT 3.5
The axis on which to operate
.UNINDENT
.UNINDENT
.TP
.B Returns
numpy.array
.INDENT 7.0
.INDENT 3.5
A one\-dimensional array representing the reduced rows
(observations) or columns (samples) of the data matrix
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If \fIaxis\fP is neither "sample" nor "observation"
.UNINDENT
.UNINDENT
.sp
\fBTableException\fP
.INDENT 7.0
.INDENT 3.5
If the table\(aqs data matrix is empty
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 table
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq],
\&...               [{\(aqfoo\(aq: \(aqbar\(aq}, {\(aqx\(aq: \(aqy\(aq}], None)
.ft P
.fi
.sp
Create a reduce function
.sp
.nf
.ft C
>>> func = lambda x, y: x + y
.ft P
.fi
.sp
Reduce table on samples
.sp
.nf
.ft C
>>> table.reduce(func, \(aqsample\(aq) # doctest: +NORMALIZE_WHITESPACE
array([  1.,   3.,  43.])
.ft P
.fi
.sp
Reduce table on observations
.sp
.nf
.ft C
>>> table.reduce(func, \(aqobservation\(aq) # doctest: +NORMALIZE_WHITESPACE
array([  1.,  46.])
.ft P
.fi
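.sp
The two reductions above amount to folding \fIf\fP over each column or each row. The following pure\-Python sketch illustrates that idea on a dense list of rows; the helper \fIreduce_axis\fP is an assumption made for illustration and is not part of \fIbiom\fP\&.
.sp
.nf
.ft C
```python
import operator
from functools import reduce


def reduce_axis(rows, f, axis):
    """Fold f over each sample (column) or each observation (row)."""
    if axis == "sample":
        vectors = zip(*rows)
    elif axis == "observation":
        vectors = rows
    else:
        raise ValueError("axis must be sample or observation")
    return [reduce(f, vector) for vector in vectors]


data = [[0.0, 0.0, 1.0], [1.0, 3.0, 42.0]]
per_sample = reduce_axis(data, operator.add, "sample")
per_observation = reduce_axis(data, operator.add, "observation")
```
.ft P
.fi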
.UNINDENT
.SS biom.table.Table.sort
.INDENT 7.0
.TP
.B Table.sort(sort_f=biom.util.natsort, axis=\(aqsample\(aq)
Return a table sorted along axis
.INDENT 7.0
.TP
.B Parameters
\fBsort_f\fP : function, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBbiom.util.natsort\fP\&. A function that takes a list of
values and sorts it
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to operate on
.UNINDENT
.UNINDENT
.TP
.B Returns
biom.Table
.INDENT 7.0
.INDENT 3.5
A table whose samples or observations are sorted according to the
\fIsort_f\fP function
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table:
.sp
.nf
.ft C
>>> data = np.asarray([[1, 0, 4], [1, 3, 0]])
>>> table = Table(data, [\(aqO2\(aq, \(aqO1\(aq], [\(aqS2\(aq, \(aqS1\(aq, \(aqS3\(aq])
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2  S1  S3
O2  1.0 0.0 4.0
O1  1.0 3.0 0.0
.ft P
.fi
.sp
Sort the order of samples in the table using the default natural
sorting:
.sp
.nf
.ft C
>>> new_table = table.sort()
>>> print new_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O2  0.0 1.0 4.0
O1  3.0 1.0 0.0
.ft P
.fi
.sp
Sort the order of observations in the table using the default natural
sorting:
.sp
.nf
.ft C
>>> new_table = table.sort(axis=\(aqobservation\(aq)
>>> print new_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2  S1  S3
O1  1.0 3.0 0.0
O2  1.0 0.0 4.0
.ft P
.fi
.sp
Sort the samples in reverse order using a custom sort function:
.sp
.nf
.ft C
>>> sort_f = lambda x: list(sorted(x, reverse=True))
>>> new_table = table.sort(sort_f=sort_f)
>>> print new_table  # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S3  S2  S1
O2  4.0 1.0 0.0
O1  0.0 1.0 3.0
.ft P
.fi
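.sp
Natural sorting differs from plain string sorting once IDs contain multi\-digit numbers. The sketch below shows one common way to get natural ordering in pure Python. It is an approximation for illustration only: the name \fInatural_sort\fP is hypothetical and this is not the code of \fBbiom.util.natsort\fP\&.
.sp
.nf
.ft C
```python
import re


def natural_sort(values):
    """Sort strings so that embedded digit runs compare as numbers."""
    def key(value):
        # Split into digit and non-digit runs; digit runs become integers.
        parts = re.split("([0-9]+)", value)
        return [int(part) if part.isdigit() else part for part in parts]
    return sorted(values, key=key)


ids = ["S10", "S2", "S1"]
natural = natural_sort(ids)
lexical = sorted(ids)
```
.ft P
.fi
.sp
Here \fInatural\fP places S2 before S10, while \fIlexical\fP places S10 before S2. Any function with the same shape (a list in, a sorted list out) could be supplied as \fIsort_f\fP\&.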
.UNINDENT
.SS biom.table.Table.sort_order
.INDENT 7.0
.TP
.B Table.sort_order(order, axis=\(aqsample\(aq)
Return a new table with \fIaxis\fP in \fIorder\fP
.INDENT 7.0
.TP
.B Parameters
\fBorder\fP : iterable
.INDENT 7.0
.INDENT 3.5
The desired order for axis
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to operate on
.UNINDENT
.UNINDENT
.TP
.B Returns
Table
.INDENT 7.0
.INDENT 3.5
A table where the observations or samples are sorted according to
\fIorder\fP
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table:
.sp
.nf
.ft C
>>> data = np.asarray([[1, 0, 4], [1, 3, 0]])
>>> table = Table(data, [\(aqO2\(aq, \(aqO1\(aq], [\(aqS2\(aq, \(aqS1\(aq, \(aqS3\(aq])
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2  S1  S3
O2  1.0 0.0 4.0
O1  1.0 3.0 0.0
.ft P
.fi
.sp
Sort the table using a list of samples:
.sp
.nf
.ft C
>>> sorted_table = table.sort_order([\(aqS2\(aq, \(aqS3\(aq, \(aqS1\(aq])
>>> print sorted_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2      S3      S1
O2      1.0     4.0     0.0
O1      1.0     0.0     3.0
.ft P
.fi
.sp
Additionally, you can sort the table\(aqs observations:
.sp
.nf
.ft C
>>> sorted_table = table.sort_order([\(aqO1\(aq, \(aqO2\(aq], axis="observation")
>>> print sorted_table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S2      S1      S3
O1      1.0     3.0     0.0
O2      1.0     0.0     4.0
.ft P
.fi
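.sp
Reordering along an axis is a permutation of columns (or rows). The pure\-Python sketch below reproduces the sample reordering shown above on a dense list of rows; the helper \fIreorder_columns\fP is hypothetical and does not reflect how \fIbiom\fP stores data internally.
.sp
.nf
.ft C
```python
def reorder_columns(rows, ids, order):
    """Return (ids, rows) with columns rearranged to follow order."""
    position = {id_: index for index, id_ in enumerate(ids)}
    picks = [position[id_] for id_ in order]  # KeyError if an id is unknown
    return list(order), [[row[i] for i in picks] for row in rows]


ids = ["S2", "S1", "S3"]
data = [[1.0, 0.0, 4.0], [1.0, 3.0, 0.0]]
new_ids, new_rows = reorder_columns(data, ids, ["S2", "S3", "S1"])
```
.ft P
.fi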
.UNINDENT
.SS biom.table.Table.subsample
.INDENT 7.0
.TP
.B Table.subsample(n, axis=\(aqsample\(aq, by_id=False)
Randomly subsample without replacement.
.INDENT 7.0
.TP
.B Parameters
\fBn\fP : int
.INDENT 7.0
.INDENT 3.5
Number of items to draw from each vector along \fIaxis\fP, or the number
of IDs to keep if \fIby_id\fP is \fITrue\fP\&.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to sample over
.UNINDENT
.UNINDENT
.sp
\fBby_id\fP : boolean, optional
.INDENT 7.0
.INDENT 3.5
If \fIFalse\fP, the subsampling is based on the counts contained in the
matrix (e.g., rarefaction). If \fITrue\fP, the subsampling is based on
the IDs (e.g., fetch a random subset of samples). Default is
\fIFalse\fP\&.
.UNINDENT
.UNINDENT
.TP
.B Returns
biom.Table
.INDENT 7.0
.INDENT 3.5
A subsampled version of self
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBValueError\fP
.INDENT 7.0
.INDENT 3.5
If \fIn\fP is less than zero.
.UNINDENT
.UNINDENT
.UNINDENT
Notes
.sp
Subsampling is performed without replacement. If \fIn\fP is greater than
the sum of a given vector, that vector is omitted from the result.
.sp
Adapted from \fIskbio.math.subsample\fP, see biom\-format/licenses for more
information about scikit\-bio.
.sp
This code assumes absolute abundance if \fIby_id\fP is False.
.sp
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
>>> table = Table(np.array([[0, 2, 3], [1, 0, 2]]), [\(aqO1\(aq, \(aqO2\(aq],
\&...               [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq])
.ft P
.fi
.sp
Subsample 1 item over the sample axis by value (e.g., rarefaction):
.sp
.nf
.ft C
>>> print table.subsample(1).sum(axis=\(aqsample\(aq)
[ 1.  1.  1.]
.ft P
.fi
.sp
Subsample 2 items over the sample axis. Note that \(aqS1\(aq is filtered out because it holds only one count:
.sp
.nf
.ft C
>>> ss = table.subsample(2)
>>> print ss.sum(axis=\(aqsample\(aq)
[ 2.  2.]
>>> print ss.ids()
[\(aqS2\(aq \(aqS3\(aq]
.ft P
.fi
.sp
Subsample by IDs over the sample axis. For this example, we\(aqre going to
randomly select 2 samples and do this 100 times, and then print out the
set of IDs observed.
.sp
.nf
.ft C
>>> ids = set([tuple(table.subsample(2, by_id=True).ids())
\&...            for i in range(100)])
>>> print sorted(ids)
[(\(aqS1\(aq, \(aqS2\(aq), (\(aqS1\(aq, \(aqS3\(aq), (\(aqS2\(aq, \(aqS3\(aq)]
.ft P
.fi
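.sp
Subsampling by value can be pictured as drawing items from a pool in which every counted item appears once. The block below is a pure\-Python sketch of that idea for a single vector. The helper \fIsubsample_counts\fP and its \fIrng\fP argument are assumptions made for illustration; this is not the routine adapted from scikit\-bio.
.sp
.nf
.ft C
```python
import random


def subsample_counts(counts, n, rng=random):
    """Draw n items without replacement from a vector of integer counts.

    Returns None when the vector holds fewer than n items, mirroring the
    rule that such vectors are omitted from the result.
    """
    if n < 0:
        raise ValueError("n must not be negative")
    if n > sum(counts):
        return None
    # One entry per counted item, labeled with the index it came from.
    pool = [index for index, count in enumerate(counts) for _ in range(count)]
    result = [0] * len(counts)
    for index in rng.sample(pool, n):
        result[index] += 1
    return result


# Sample S3 from the table above holds counts 3 (O1) and 2 (O2).
s3 = subsample_counts([3, 2], 2, random.Random(0))
# Sample S1 holds a single count, so asking for 2 items drops it.
s1 = subsample_counts([0, 1], 2)
```
.ft P
.fi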
.UNINDENT
.SS biom.table.Table.sum
.INDENT 7.0
.TP
.B Table.sum(axis=\(aqwhole\(aq)
Returns the sum by axis
.INDENT 7.0
.TP
.B Parameters
\fBaxis\fP : {\(aqwhole\(aq, \(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis on which to operate.
.UNINDENT
.UNINDENT
.TP
.B Returns
numpy.array or float
.INDENT 7.0
.INDENT 3.5
If \fIaxis\fP is "whole", returns a float representing the whole
table sum. If \fIaxis\fP is either "sample" or "observation", returns a
numpy.array that holds a sum for each sample or observation,
respectively.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq])
.ft P
.fi
.sp
Add all values in the table:
.sp
.nf
.ft C
>>> table.sum()
array(47.0)
.ft P
.fi
.sp
Add all values per sample:
.sp
.nf
.ft C
>>> table.sum(axis=\(aqsample\(aq) # doctest: +NORMALIZE_WHITESPACE
array([  1.,  3.,  43.])
.ft P
.fi
.sp
Add all values per observation:
.sp
.nf
.ft C
>>> table.sum(axis=\(aqobservation\(aq) # doctest: +NORMALIZE_WHITESPACE
array([  1.,  46.])
.ft P
.fi
.UNINDENT
.SS biom.table.Table.to_hdf5
.INDENT 7.0
.TP
.B Table.to_hdf5(h5grp, generated_by, compress=True)
Store CSC and CSR in place
.sp
The resulting structure of this group is described in the Notes below.
A few basic definitions: N is the number of observations and M is the
number of samples. Data are stored in both compressed sparse row [R10]
(CSR, for observation\-oriented operations) and compressed sparse column
[R11] (CSC, for sample\-oriented operations).
.INDENT 7.0
.TP
.B Parameters
\fBh5grp\fP : \fIh5py.Group\fP or \fIh5py.File\fP
.INDENT 7.0
.INDENT 3.5
The HDF5 entity in which to write the BIOM formatted data.
.UNINDENT
.UNINDENT
.sp
\fBgenerated_by\fP : str
.INDENT 7.0
.INDENT 3.5
A description of what generated the table
.UNINDENT
.UNINDENT
.sp
\fBcompress\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. \fBTrue\fP means fields will be compressed
with gzip; \fBFalse\fP means no compression.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBSEE ALSO:\fP
.INDENT 7.0
.INDENT 3.5
\fBTable.from_hdf5\fP
.UNINDENT
.UNINDENT
Notes
.sp
The expected HDF5 group structure is below. An example of an HDF5 file
in DDL can be found here [R12]\&.
.INDENT 7.0
.IP \(bu 2
\&./id                                                  : str, an arbitrary ID
.IP \(bu 2
\&./type                                                : str, the table type (e.g., OTU table)
.IP \(bu 2
\&./format\-url                                          : str, a URL that describes the format
.IP \(bu 2
\&./format\-version                                      : two element tuple of int32, major and minor
.IP \(bu 2
\&./generated\-by                                        : str, what generated this file
.IP \(bu 2
\&./creation\-date                                       : str, ISO format
.IP \(bu 2
\&./shape                                               : two element tuple of int32, N by M
.IP \(bu 2
\&./nnz                                                 : int32 or int64, number of non zero elems
.IP \(bu 2
\&./observation                                         : Group
.IP \(bu 2
\&./observation/ids                                     : (N,) dataset of str or vlen str
.IP \(bu 2
\&./observation/matrix                                  : Group
.IP \(bu 2
\&./observation/matrix/data                             : (nnz,) dataset of float64
.IP \(bu 2
\&./observation/matrix/indices                          : (nnz,) dataset of int32
.IP \(bu 2
\&./observation/matrix/indptr                           : (N+1,) dataset of int32
.IP \(bu 2
\&./observation/metadata                                : Group
.IP \(bu 2
[./observation/metadata/foo]                          : Optional, (N,) dataset of any valid HDF5 type in index order with IDs.
.IP \(bu 2
\&./observation/group\-metadata                          : Group
.IP \(bu 2
[./observation/group\-metadata/foo]                    : Optional, (?,) dataset of group metadata that relates IDs
.IP \(bu 2
[./observation/group\-metadata/foo.attrs[\(aqdata_type\(aq]] : attribute of the foo dataset that describes contained type (e.g., newick)
.IP \(bu 2
\&./sample                                              : Group
.IP \(bu 2
\&./sample/ids                                          : (M,) dataset of str or vlen str
.IP \(bu 2
\&./sample/matrix                                       : Group
.IP \(bu 2
\&./sample/matrix/data                                  : (nnz,) dataset of float64
.IP \(bu 2
\&./sample/matrix/indices                               : (nnz,) dataset of int32
.IP \(bu 2
\&./sample/matrix/indptr                                : (M+1,) dataset of int32
.IP \(bu 2
\&./sample/metadata                                     : Group
.IP \(bu 2
[./sample/metadata/foo]                               : Optional, (M,) dataset of any valid HDF5 type in index order with IDs.
.IP \(bu 2
\&./sample/group\-metadata                               : Group
.IP \(bu 2
[./sample/group\-metadata/foo]                         : Optional, (?,) dataset of group metadata that relates IDs
.IP \(bu 2
[./sample/group\-metadata/foo.attrs[\(aqdata_type\(aq]]      : attribute of the foo dataset that describes contained type (e.g., newick)
.UNINDENT
.sp
The \(aq?\(aq character on the dataset size means that it can be of arbitrary
length.
.sp
The expected structure for each of the metadata datasets is a list of
atomic type objects (int, float, str, ...), where the index order of
the list corresponds to the index order of the relevant axis IDs.
Special metadata fields have been defined, and they are stored in a
specific way. Currently, the available special metadata fields are:
.INDENT 7.0
.IP \(bu 2
taxonomy: (N, ?) dataset of str or vlen str
.IP \(bu 2
KEGG_Pathways: (N, ?) dataset of str or vlen str
.IP \(bu 2
collapsed_ids: (N, ?) dataset of str or vlen str
.UNINDENT
References
.IP [R10] 5
\fI\%http://docs.scipy.org/doc/scipy\-0.13.0/reference/generated/scipy.sparse.csr_matrix.html\fP
.IP [R11] 5
\fI\%http://docs.scipy.org/doc/scipy\-0.13.0/reference/generated/scipy.sparse.csc_matrix.html\fP
.IP [R12] 5
\fI\%http://biom\-format.org/documentation/format_versions/biom\-2.0.html\fP
.sp
Examples
.sp
.nf
.ft C
>>> from biom.util import biom_open  # doctest: +SKIP
>>> from biom.table import Table
>>> from numpy import array
>>> t = Table(array([[1, 2], [3, 4]]), [\(aqa\(aq, \(aqb\(aq], [\(aqx\(aq, \(aqy\(aq])
>>> with biom_open(\(aqfoo.biom\(aq, \(aqw\(aq) as f:  # doctest: +SKIP
\&...     t.to_hdf5(f, "example")
.ft P
.fi
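.sp
The \fIdata\fP, \fIindices\fP and \fIindptr\fP datasets listed in the Notes follow the usual compressed sparse layout. The pure\-Python sketch below builds the three arrays from a small dense matrix so their roles are visible. The helper \fIcompress_rows\fP is hypothetical; \fIbiom\fP itself relies on sparse matrix classes rather than code like this.
.sp
.nf
.ft C
```python
def compress_rows(rows):
    """Return (data, indices, indptr) for a dense list of rows."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for column, value in enumerate(row):
            if value != 0:
                data.append(float(value))
                indices.append(column)
        # indptr gains one entry per row: where the next row starts in data.
        indptr.append(len(data))
    return data, indices, indptr


dense = [[0, 0, 1], [1, 3, 42]]  # 2 observations by 3 samples
csr = compress_rows(dense)  # row oriented
csc = compress_rows([list(column) for column in zip(*dense)])  # column oriented
```
.ft P
.fi
.sp
In both layouts \fIdata\fP and \fIindices\fP have one entry per nonzero value, and \fIindptr\fP holds one more entry than the number of compressed rows (CSR) or columns (CSC).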
.UNINDENT
.SS biom.table.Table.to_json
.INDENT 7.0
.TP
.B Table.to_json(generated_by, direct_io=None)
Returns a JSON string representing the table in BIOM format.
.INDENT 7.0
.TP
.B Parameters
\fBgenerated_by\fP : str
.INDENT 7.0
.INDENT 3.5
A string describing the software used to build the table
.UNINDENT
.UNINDENT
.sp
\fBdirect_io\fP : file or file\-like object, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBNone\fP\&. Must implement a \fBwrite\fP method. If
\fIdirect_io\fP is not \fBNone\fP, the final output is written directly
to \fIdirect_io\fP during processing.
.UNINDENT
.UNINDENT
.TP
.B Returns
str
.INDENT 7.0
.INDENT 3.5
A JSON\-formatted string representing the biom table
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS biom.table.Table.to_tsv
.INDENT 7.0
.TP
.B Table.to_tsv(header_key=None, header_value=None, metadata_formatter=str, observation_column_name=\(aq#OTU ID\(aq)
Return self as a string in tab delimited form
.sp
Default \fBstr\fP output for the \fBTable\fP is just row/col ids and table
data without any metadata
.INDENT 7.0
.TP
.B Parameters
\fBheader_key\fP : str or \fBNone\fP, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBNone\fP
.UNINDENT
.UNINDENT
.sp
\fBheader_value\fP : str or \fBNone\fP, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBNone\fP
.UNINDENT
.UNINDENT
.sp
\fBmetadata_formatter\fP : function, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBstr\fP\&. A function that takes a metadata entry and
returns a formatted version that should be written to file
.UNINDENT
.UNINDENT
.sp
\fBobservation_column_name\fP : str, optional
.INDENT 7.0
.INDENT 3.5
Defaults to "#OTU ID". The name of the first column in the output
table, corresponding to the observation IDs.
.UNINDENT
.UNINDENT
.TP
.B Returns
str
.INDENT 7.0
.INDENT 3.5
tab delimited representation of the Table
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 BIOM table, with observation metadata and no sample
metadata:
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq],
\&...               [{\(aqfoo\(aq: \(aqbar\(aq}, {\(aqx\(aq: \(aqy\(aq}], None)
>>> print table.to_tsv() # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1      S2      S3
O1      0.0     0.0     1.0
O2      1.0     3.0     42.0
.ft P
.fi
.UNINDENT
.SS biom.table.Table.transform
.INDENT 7.0
.TP
.B Table.transform(f, axis=\(aqsample\(aq, inplace=True)
Iterate over \fIaxis\fP, applying a function \fIf\fP to each vector.
.sp
Only nonzero values can be modified, so the density of the
table cannot increase. Zeroing values is fine, however.
.INDENT 7.0
.TP
.B Parameters
\fBf\fP : function(data, id, metadata) \-> new data
.INDENT 7.0
.INDENT 3.5
A function that takes three values: an array of nonzero
values corresponding to each observation or sample, an
observation or sample id, and an observation or sample
metadata entry. It must return an array of transformed
values that replace the original values.
.UNINDENT
.UNINDENT
.sp
\fBaxis\fP : {\(aqsample\(aq, \(aqobservation\(aq}, optional
.INDENT 7.0
.INDENT 3.5
The axis to operate on. Can be "sample" or "observation".
.UNINDENT
.UNINDENT
.sp
\fBinplace\fP : bool, optional
.INDENT 7.0
.INDENT 3.5
Defaults to \fBTrue\fP\&. Whether to return a new table or modify
itself.
.UNINDENT
.UNINDENT
.TP
.B Returns
biom.Table
.INDENT 7.0
.INDENT 3.5
Returns itself if \fIinplace\fP, else returns a new transformed table.
.UNINDENT
.UNINDENT
.TP
.B Raises
\fBUnknownAxisError\fP
.INDENT 7.0
.INDENT 3.5
If provided an unrecognized axis.
.UNINDENT
.UNINDENT
.UNINDENT
Examples
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
.ft P
.fi
.sp
Create a 2x3 table
.sp
.nf
.ft C
>>> data = np.asarray([[0, 0, 1], [1, 3, 42]])
>>> table = Table(data, [\(aqO1\(aq, \(aqO2\(aq], [\(aqS1\(aq, \(aqS2\(aq, \(aqS3\(aq],
\&...               [{\(aqfoo\(aq: \(aqbar\(aq}, {\(aqx\(aq: \(aqy\(aq}], None)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 1.0
O2  1.0 3.0 42.0
.ft P
.fi
.sp
Create a transform function
.sp
.nf
.ft C
>>> f = lambda data, id_, md: data / 2
.ft P
.fi
.sp
Transform to a new table on samples
.sp
.nf
.ft C
>>> table2 = table.transform(f, \(aqsample\(aq, False)
>>> print table2 # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 0.5
O2  0.5 1.5 21.0
.ft P
.fi
.sp
\fItable\fP hasn\(aqt changed
.sp
.nf
.ft C
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 1.0
O2  1.0 3.0 42.0
.ft P
.fi
.sp
Transform in place on observations
.sp
.nf
.ft C
>>> table3 = table.transform(f, \(aqobservation\(aq, True)
.ft P
.fi
.sp
\fItable\fP is different now
.sp
.nf
.ft C
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 0.5
O2  0.5 1.5 21.0
.ft P
.fi
.sp
but the table returned (\fItable3\fP) is the same as \fItable\fP
.sp
.nf
.ft C
>>> print table3 # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S1  S2  S3
O1  0.0 0.0 0.5
O2  0.5 1.5 21.0
.ft P
.fi
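.sp
The rule that only nonzero values can be modified can be illustrated with a dense pure\-Python sketch. It reflects one reading of that rule and makes two simplifying assumptions: the helper \fItransform_columns\fP is hypothetical, and its \fIf\fP receives only the values, not the ID and metadata that the real method also passes.
.sp
.nf
.ft C
```python
def transform_columns(rows, f):
    """Apply f to the nonzero values of each column and return new rows."""
    columns = []
    for column in zip(*rows):
        nonzero = [i for i, value in enumerate(column) if value != 0]
        new_values = f([column[i] for i in nonzero])
        updated = [0.0] * len(column)  # zeros are never passed to f
        for i, value in zip(nonzero, new_values):
            updated[i] = value
        columns.append(updated)
    return [list(row) for row in zip(*columns)]


data = [[0.0, 0.0, 1.0], [1.0, 3.0, 42.0]]
halved = transform_columns(data, lambda values: [v / 2 for v in values])
shifted = transform_columns(data, lambda values: [v + 10 for v in values])
```
.ft P
.fi
.sp
In \fIshifted\fP the zero entries stay zero even though the function adds 10, because those entries are never handed to the function.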
.UNINDENT
.SS biom.table.Table.transpose
.INDENT 7.0
.TP
.B Table.transpose()
Transpose the contingency table
.sp
The returned table will be an entirely new table, including copies of
the (transposed) data, sample/observation IDs and metadata.
.INDENT 7.0
.TP
.B Returns
Table
.INDENT 7.0
.INDENT 3.5
Return a new table that is the transpose of caller table.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS Examples
.sp
First, let\(aqs create a toy table to play around with. For this example, we\(aqre
going to construct a 10x4 \fITable\fP, or one that has 10 observations and 4
samples. Each observation and sample will be given an arbitrary but unique
name. We\(aqll also add on some metadata.
.sp
.nf
.ft C
>>> import numpy as np
>>> from biom.table import Table
>>> data = np.arange(40).reshape(10, 4)
>>> sample_ids = [\(aqS%d\(aq % i for i in range(4)]
>>> observ_ids = [\(aqO%d\(aq % i for i in range(10)]
>>> sample_metadata = [{\(aqenvironment\(aq: \(aqA\(aq}, {\(aqenvironment\(aq: \(aqB\(aq},
\&...                    {\(aqenvironment\(aq: \(aqA\(aq}, {\(aqenvironment\(aq: \(aqB\(aq}]
>>> observ_metadata = [{\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqFirmicutes\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqFirmicutes\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqProteobacteria\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqProteobacteria\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqProteobacteria\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqBacteroidetes\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqBacteroidetes\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqFirmicutes\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqFirmicutes\(aq]},
\&...                    {\(aqtaxonomy\(aq: [\(aqBacteria\(aq, \(aqFirmicutes\(aq]}]
>>> table = Table(data, observ_ids, sample_ids, observ_metadata,
\&...               sample_metadata, table_id=\(aqExample Table\(aq)
.ft P
.fi
.sp
Now that we have a table, let\(aqs explore it at a high level first.
.sp
.nf
.ft C
>>> table
10 x 4 <class \(aqbiom.table.Table\(aq> with 39 nonzero entries (97% dense)
>>> print table # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S1  S2  S3
O0  0.0 1.0 2.0 3.0
O1  4.0 5.0 6.0 7.0
O2  8.0 9.0 10.0    11.0
O3  12.0    13.0    14.0    15.0
O4  16.0    17.0    18.0    19.0
O5  20.0    21.0    22.0    23.0
O6  24.0    25.0    26.0    27.0
O7  28.0    29.0    30.0    31.0
O8  32.0    33.0    34.0    35.0
O9  36.0    37.0    38.0    39.0
>>> print table.ids() # doctest: +NORMALIZE_WHITESPACE
[\(aqS0\(aq \(aqS1\(aq \(aqS2\(aq \(aqS3\(aq]
>>> print table.ids(axis=\(aqobservation\(aq) # doctest: +NORMALIZE_WHITESPACE
[\(aqO0\(aq \(aqO1\(aq \(aqO2\(aq \(aqO3\(aq \(aqO4\(aq \(aqO5\(aq \(aqO6\(aq \(aqO7\(aq \(aqO8\(aq \(aqO9\(aq]
>>> print table.nnz  # number of nonzero entries
39
.ft P
.fi
.sp
While it\(aqs fun to just poke at the table, let\(aqs dig deeper. First, we\(aqre going
to convert \fItable\fP into relative abundances (within each sample), and then
filter \fItable\fP to just the samples associated with environment \(aqA\(aq. The
filtering gets fancy: we can pass in an arbitrary function to determine what
samples we want to keep. This function must accept a sparse vector of values,
the corresponding ID and the corresponding metadata, and should return \fBTrue\fP
or \fBFalse\fP, where \fBTrue\fP indicates that the vector should be retained.
.sp
.nf
.ft C
>>> normed = table.norm(axis=\(aqsample\(aq, inplace=False)
>>> filter_f = lambda values, id_, md: md[\(aqenvironment\(aq] == \(aqA\(aq
>>> env_a = normed.filter(filter_f, axis=\(aqsample\(aq, inplace=False)
>>> print env_a # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S2
O0  0.0 0.01
O1  0.0222222222222 0.03
O2  0.0444444444444 0.05
O3  0.0666666666667 0.07
O4  0.0888888888889 0.09
O5  0.111111111111  0.11
O6  0.133333333333  0.13
O7  0.155555555556  0.15
O8  0.177777777778  0.17
O9  0.2 0.19
.ft P
.fi
.sp
But, what if we wanted individual tables per environment? While we could just
perform some fancy iteration, we can instead just rely on \fITable.partition\fP for
these operations. \fIpartition\fP, like \fIfilter\fP, accepts a function. However, the
\fIpartition\fP method only passes the corresponding ID and metadata to the
function. The function should return what partition the data are a part of.
Within this example, we\(aqre also going to sum up our tables over the partitioned
samples. Please note that we\(aqre using the original table (i.e., not normalized)
here.
.sp
.nf
.ft C
>>> part_f = lambda id_, md: md[\(aqenvironment\(aq]
>>> env_tables = table.partition(part_f, axis=\(aqsample\(aq)
>>> for partition, env_table in env_tables:
\&...     print partition, env_table.sum(\(aqsample\(aq)
A [ 180.  200.]
B [ 190.  210.]
.ft P
.fi
.sp
For this last example, and to highlight a bit more functionality, we\(aqre going
to first transform the table such that all multiples of three will be retained,
while all non\-multiples of three will get set to zero. Following this, we\(aqll
then collapse the table by taxonomy, and then convert the table into
presence/absence data.
.sp
First, let\(aqs set up the transform. We\(aqre going to define a function that takes
every value in the vector modulo 3 and checks whether the result is zero. If it
is, we\(aqll keep the value; otherwise we\(aqll set the value to zero.
.sp
.nf
.ft C
>>> transform_f = lambda v,i,m: np.where(v % 3 == 0, v, 0)
>>> mult_of_three = table.transform(transform_f, inplace=False)
>>> print mult_of_three # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S1  S2  S3
O0  0.0 0.0 0.0 3.0
O1  0.0 0.0 6.0 0.0
O2  0.0 9.0 0.0 0.0
O3  12.0    0.0 0.0 15.0
O4  0.0 0.0 18.0    0.0
O5  0.0 21.0    0.0 0.0
O6  24.0    0.0 0.0 27.0
O7  0.0 0.0 30.0    0.0
O8  0.0 33.0    0.0 0.0
O9  36.0    0.0 0.0 39.0
.ft P
.fi
.sp
Next, we\(aqre going to collapse the table at the phylum level. To do
this, we\(aqre going to define a helper variable for the index position of the
phylum (see the construction of the table above). We\(aqll use it in the function
that we pass to \fITable.collapse\fP, and since we want to collapse over the
observations, we\(aqll need to specify \(aqobservation\(aq as the axis. Note that the
collapsed values shown below are averages over the observations in each group
rather than sums (for example, Firmicutes in S0 is 36 / 5 = 7.2).
.sp
.nf
.ft C
>>> phylum_idx = 1
>>> collapse_f = lambda id_, md: \(aq; \(aq.join(md[\(aqtaxonomy\(aq][:phylum_idx + 1])
>>> collapsed = mult_of_three.collapse(collapse_f, axis=\(aqobservation\(aq)
>>> print collapsed # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S1  S2  S3
Bacteria; Firmicutes  7.2 6.6 7.2 8.4
Bacteria; Bacteroidetes   12.0    10.5    0.0 13.5
Bacteria; Proteobacteria  4.0 3.0 6.0 5.0
.ft P
.fi
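.sp
The numbers in the collapsed table can be reproduced by averaging the rows that share a phylum. The following pure\-Python sketch does this on dense rows. It is an illustration under stated assumptions: the helper \fIcollapse_mean\fP is hypothetical, and the phylum labels are typed in directly rather than read from metadata.
.sp
.nf
.ft C
```python
def collapse_mean(rows, groups):
    """Average rows that share a group label, keeping first-seen group order."""
    sums, sizes, order = {}, {}, []
    for row, group in zip(rows, groups):
        if group not in sums:
            sums[group] = [0.0] * len(row)
            sizes[group] = 0
            order.append(group)
        sums[group] = [total + value for total, value in zip(sums[group], row)]
        sizes[group] += 1
    return [(g, [total / sizes[g] for total in sums[g]]) for g in order]


# Multiples of three from the 10 x 4 table above; everything else is zero.
rows = [
    [value if value % 3 == 0 else 0 for value in range(4 * i, 4 * i + 4)]
    for i in range(10)
]
phyla = (["Firmicutes"] * 2 + ["Proteobacteria"] * 3
         + ["Bacteroidetes"] * 2 + ["Firmicutes"] * 3)
collapsed_rows = dict(collapse_mean(rows, phyla))
```
.ft P
.fi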
.sp
Finally, let\(aqs convert the table to presence/absence data.
.sp
.nf
.ft C
>>> pa = collapsed.pa()
>>> print pa # doctest: +NORMALIZE_WHITESPACE
# Constructed from biom file
#OTU ID S0  S1  S2  S3
Bacteria; Firmicutes  1.0 1.0 1.0 1.0
Bacteria; Bacteroidetes   1.0 1.0 0.0 1.0
Bacteria; Proteobacteria  1.0 1.0 1.0 1.0
.ft P
.fi
.SS Converting between file formats
.INDENT 0.0
.TP
.B The \fBconvert\fP command in the biom\-format project can be used to convert between biom and tab\-delimited table formats. This is useful for several reasons:
.INDENT 7.0
.IP \(bu 2
converting biom format tables to tab\-delimited tables for easy viewing in programs such as Excel
.IP \(bu 2
converting between sparse and dense biom formats
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 7.0
.INDENT 3.5
The tab\-delimited tables are commonly referred to as the \fIclassic format\fP tables, while BIOM formatted tables are referred to as \fIbiom tables\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.SS General usage examples
.sp
Convert a tab\-delimited table to an HDF5 or JSON biom format. Note that you \fImust\fP specify the type of table here:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom convert \-i table.txt \-o table.from_txt_json.biom \-\-table\-type="OTU table" \-\-to\-json
biom convert \-i table.txt \-o table.from_txt_hdf5.biom \-\-table\-type="OTU table" \-\-to\-hdf5
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Convert biom format to tab\-delimited table format:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom convert \-i table.biom \-o table.from_biom.txt \-\-to\-tsv
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Convert biom format to classic format, including the \fBtaxonomy\fP observation metadata as the last column of the classic format table. Because the BIOM format can support an arbitrary number of observation (or sample) metadata entries, and the classic format can support only a single observation metadata entry, you must specify which of the observation metadata entries you want to include in the output table:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom convert \-i table.biom \-o table.from_biom_w_taxonomy.txt \-\-to\-tsv \-\-header\-key taxonomy
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Convert biom format to classic format, including the \fBtaxonomy\fP observation metadata as the last column of the classic format table, but renaming that column as \fBConsensusLineage\fP\&. This is useful when using legacy tools that require a specific name for the observation metadata column:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom convert \-i table.biom \-o table.from_biom_w_consensuslineage.txt \-\-to\-tsv \-\-header\-key taxonomy \-\-output\-metadata\-id "ConsensusLineage"
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Special case usage examples
.SS Round\-tripping between biom and tsv
.sp
In specific cases (see \fI\%this comment\fP), it is still useful to convert a biom table to tsv so that you can open it in Excel, make some changes to the file, and then convert it back to biom. In these cases, follow these steps:
.INDENT 0.0
.IP \(bu 2
Convert from biom to txt:
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
biom convert \-i otu_table.biom \-o otu_table.txt \-\-to\-tsv \-\-header\-key taxonomy
.ft P
.fi
.UNINDENT
.UNINDENT
.IP \(bu 2
Make your changes in Excel.
.IP \(bu 2
Convert back to biom:
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
biom convert \-i otu_table.txt \-o new_otu_table.biom \-\-to\-hdf5 \-\-table\-type="OTU table" \-\-process\-obs\-metadata taxonomy
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.SS Converting QIIME 1.4.0 and earlier OTU tables to BIOM format
.sp
If you are converting a QIIME 1.4.0 or earlier OTU table to BIOM format, there are a few steps to go through. First, for convenience, you might want to rename the \fBConsensusLineage\fP column to \fBtaxonomy\fP\&. You can do this with the following command:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
sed \(aqs/Consensus Lineage/ConsensusLineage/\(aq < otu_table.txt | sed \(aqs/ConsensusLineage/taxonomy/\(aq > otu_table.taxonomy.txt
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Then, you\(aqll want to perform the conversion including a step to convert the taxonomy \fIstring\fP from the classic OTU table to a taxonomy \fIlist\fP, as it\(aqs represented in QIIME 1.4.0\-dev and later:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom convert \-i otu_table.taxonomy.txt \-o otu_table.from_txt.biom \-\-table\-type="OTU table" \-\-process\-obs\-metadata taxonomy
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Adding sample and observation metadata to biom files
.sp
Frequently you\(aqll have an existing BIOM file and want to add sample and/or observation metadata to it. For samples, metadata is frequently environmental or technical details about your samples: the subject that a sample was collected from, the pH of the sample, the PCR primers used to amplify DNA from the samples, etc. For observations, metadata is frequently a categorization of the observation: the taxonomy of an OTU, or the EC hierarchy of a gene. You can use the \fBbiom add\-metadata\fP command to add this information to an existing BIOM file.
.sp
To get help with \fBadd\-metadata\fP you can call:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom add\-metadata \-h
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This command takes a BIOM file and the corresponding sample and/or observation mapping files. The following example files are used in the commands below. You can find these files in the \fBbiom\-format/examples\fP directory.
.sp
Your BIOM file might look like the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
    "id":null,
    "format": "1.0.0",
    "format_url": "http://biom\-format.org",
    "type": "OTU table",
    "generated_by": "some software package",
    "date": "2011\-12\-19T19:00:00",
    "rows":[
            {"id":"GG_OTU_1", "metadata":null},
            {"id":"GG_OTU_2", "metadata":null},
            {"id":"GG_OTU_3", "metadata":null},
            {"id":"GG_OTU_4", "metadata":null},
            {"id":"GG_OTU_5", "metadata":null}
        ],
    "columns": [
            {"id":"Sample1", "metadata":null},
            {"id":"Sample2", "metadata":null},
            {"id":"Sample3", "metadata":null},
            {"id":"Sample4", "metadata":null},
            {"id":"Sample5", "metadata":null},
            {"id":"Sample6", "metadata":null}
        ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [5, 6],
    "data":[[0,2,1],
            [1,0,5],
            [1,1,1],
            [1,3,2],
            [1,4,3],
            [1,5,1],
            [2,2,1],
            [2,3,4],
            [2,5,2],
            [3,0,2],
            [3,1,1],
            [3,2,1],
            [3,5,1],
            [4,1,1],
            [4,2,1]
           ]
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A sample metadata mapping file could then look like the following. Notice that there is an extra sample in here with respect to the above BIOM table. Any samples in the mapping file that are not in the BIOM file are ignored.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#SampleID       BarcodeSequence DOB
# Some optional
# comment lines...
Sample1 AGCACGAGCCTA    20060805
Sample2 AACTCGTCGATG    20060216
Sample3 ACAGACCACTCA    20060109
Sample4 ACCAGCGACTAG    20070530
Sample5 AGCAGCACTTGT    20070101
Sample6 AGCAGCACAACT    20070716
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
An observation metadata mapping file might look like the following. Notice that there is an extra observation in here with respect to the above BIOM table. Any observations in the mapping file that are not in the BIOM file are ignored.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#OTUID  taxonomy        confidence
# Some optional
# comment lines
GG_OTU_0        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__       0.980
GG_OTU_1        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        0.665
GG_OTU_2        Root;k__Bacteria        0.980
GG_OTU_3        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        1.000
GG_OTU_4        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        0.842
GG_OTU_5        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        1.000
.ft P
.fi
.UNINDENT
.UNINDENT
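.sp
The rule that extra entries in a mapping file are ignored can be sketched in a few lines of Python. This is only an illustration, not the code used by \fBbiom add\-metadata\fP: the helper name \fBparse_mapping\fP is hypothetical, and splitting on runs of whitespace is a simplifying assumption (real mapping files are tab\-delimited).
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Illustrative sketch only; parse_mapping is a hypothetical helper,
# not part of the biom-format API.
def parse_mapping(lines):
    """Return {id: {column: value}} from mapping-file lines."""
    header = None
    records = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first "#" line names the columns; later ones are comments.
            if header is None:
                header = line.lstrip("#").split()
            continue
        fields = line.split()
        records[fields[0]] = dict(zip(header[1:], fields[1:]))
    return records


mapping_lines = [
    "#OTUID taxonomy confidence",
    "# Some optional",
    "# comment lines",
    "GG_OTU_0 Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__ 0.980",
    "GG_OTU_2 Root;k__Bacteria 0.980",
]
table_ids = ["GG_OTU_2"]  # IDs present in the (assumed) BIOM table

metadata = parse_mapping(mapping_lines)
# Keep only entries whose ID is in the table; GG_OTU_0 is dropped.
matched = {i: metadata[i] for i in table_ids if i in metadata}
print(matched)
```
.ft P
.fi
.UNINDENT
.UNINDENT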
.SS Adding metadata
.sp
To add sample metadata to a BIOM file, you can run the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom add\-metadata \-i min_sparse_otu_table.biom \-o table.w_smd.biom \-\-sample\-metadata\-fp sam_md.txt
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To add observation metadata to a BIOM file, you can run the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom add\-metadata \-i min_sparse_otu_table.biom \-o table.w_omd.biom \-\-observation\-metadata\-fp obs_md.txt
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You can also combine these in a single command to add both observation and sample metadata:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom add\-metadata \-i min_sparse_otu_table.biom \-o table.w_md.biom \-\-observation\-metadata\-fp obs_md.txt \-\-sample\-metadata\-fp sam_md.txt
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In the last case, the resulting BIOM file will look like the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
    "columns": [
        {
            "id": "Sample1",
            "metadata": {
                "BarcodeSequence": "AGCACGAGCCTA",
                "DOB": "20060805"
            }
        },
        {
            "id": "Sample2",
            "metadata": {
                "BarcodeSequence": "AACTCGTCGATG",
                "DOB": "20060216"
            }
        },
        {
            "id": "Sample3",
            "metadata": {
                "BarcodeSequence": "ACAGACCACTCA",
                "DOB": "20060109"
            }
        },
        {
            "id": "Sample4",
            "metadata": {
                "BarcodeSequence": "ACCAGCGACTAG",
                "DOB": "20070530"
            }
        },
        {
            "id": "Sample5",
            "metadata": {
                "BarcodeSequence": "AGCAGCACTTGT",
                "DOB": "20070101"
            }
        },
        {
            "id": "Sample6",
            "metadata": {
                "BarcodeSequence": "AGCAGCACAACT",
                "DOB": "20070716"
            }
        }
    ],
    "data": [
        [0, 2, 1.0],
        [1, 0, 5.0],
        [1, 1, 1.0],
        [1, 3, 2.0],
        [1, 4, 3.0],
        [1, 5, 1.0],
        [2, 2, 1.0],
        [2, 3, 4.0],
        [2, 5, 2.0],
        [3, 0, 2.0],
        [3, 1, 1.0],
        [3, 2, 1.0],
        [3, 5, 1.0],
        [4, 1, 1.0],
        [4, 2, 1.0]
    ],
    "date": "2012\-12\-11T07:36:15.467843",
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom\-format.org",
    "generated_by": "some software package",
    "id": null,
    "matrix_element_type": "float",
    "matrix_type": "sparse",
    "rows": [
        {
            "id": "GG_OTU_1",
            "metadata": {
                "confidence": "0.665",
                "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
            }
        },
        {
            "id": "GG_OTU_2",
            "metadata": {
                "confidence": "0.980",
                "taxonomy": "Root;k__Bacteria"
            }
        },
        {
            "id": "GG_OTU_3",
            "metadata": {
                "confidence": "1.000",
                "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
            }
        },
        {
            "id": "GG_OTU_4",
            "metadata": {
                "confidence": "0.842",
                "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
            }
        },
        {
            "id": "GG_OTU_5",
            "metadata": {
                "confidence": "1.000",
                "taxonomy": "Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae"
            }
        }
    ],
    "shape": [5, 6],
    "type": "OTU table"
}
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Processing metadata while adding
.sp
There are some additional parameters you can pass to this command for more complex processing.
.sp
You can tell the command to process certain metadata column values as integers (\fB\-\-int\-fields\fP), floating point (i.e., decimal or real) numbers (\fB\-\-float\-fields\fP), or as hierarchical semicolon\-delimited data (\fB\-\-sc\-separated\fP).
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom add\-metadata \-i min_sparse_otu_table.biom \-o table.w_md.biom \-\-observation\-metadata\-fp obs_md.txt \-\-sample\-metadata\-fp sam_md.txt \-\-int\-fields DOB \-\-sc\-separated taxonomy \-\-float\-fields confidence
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Here your resulting BIOM file will look like the following, where \fBDOB\fP values are now integers (compare to the above: they\(aqre not quoted now), \fBconfidence\fP values are now floating point numbers (again, not quoted now), and \fBtaxonomy\fP values are now lists where each entry is a taxonomy level, as opposed to above, where they appear as a single semicolon\-separated string.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
    "columns": [
        {
            "id": "Sample1",
            "metadata": {
                "BarcodeSequence": "AGCACGAGCCTA",
                "DOB": 20060805
            }
        },
        {
            "id": "Sample2",
            "metadata": {
                "BarcodeSequence": "AACTCGTCGATG",
                "DOB": 20060216
            }
        },
        {
            "id": "Sample3",
            "metadata": {
                "BarcodeSequence": "ACAGACCACTCA",
                "DOB": 20060109
            }
        },
        {
            "id": "Sample4",
            "metadata": {
                "BarcodeSequence": "ACCAGCGACTAG",
                "DOB": 20070530
            }
        },
        {
            "id": "Sample5",
            "metadata": {
                "BarcodeSequence": "AGCAGCACTTGT",
                "DOB": 20070101
            }
        },
        {
            "id": "Sample6",
            "metadata": {
                "BarcodeSequence": "AGCAGCACAACT",
                "DOB": 20070716
            }
        }
    ],
    "data": [
        [0, 2, 1.0],
        [1, 0, 5.0],
        [1, 1, 1.0],
        [1, 3, 2.0],
        [1, 4, 3.0],
        [1, 5, 1.0],
        [2, 2, 1.0],
        [2, 3, 4.0],
        [2, 5, 2.0],
        [3, 0, 2.0],
        [3, 1, 1.0],
        [3, 2, 1.0],
        [3, 5, 1.0],
        [4, 1, 1.0],
        [4, 2, 1.0]
    ],
    "date": "2012\-12\-11T07:30:29.870689",
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom\-format.org",
    "generated_by": "some software package",
    "id": null,
    "matrix_element_type": "float",
    "matrix_type": "sparse",
    "rows": [
        {
            "id": "GG_OTU_1",
            "metadata": {
                "confidence": 0.665,
                "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
            }
        },
        {
            "id": "GG_OTU_2",
            "metadata": {
                "confidence": 0.98,
                "taxonomy": ["Root", "k__Bacteria"]
            }
        },
        {
            "id": "GG_OTU_3",
            "metadata": {
                "confidence": 1.0,
                "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
            }
        },
        {
            "id": "GG_OTU_4",
            "metadata": {
                "confidence": 0.842,
                "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
            }
        },
        {
            "id": "GG_OTU_5",
            "metadata": {
                "confidence": 1.0,
                "taxonomy": ["Root", "k__Bacteria", "p__Firmicutes", "c__Clostridia", "o__Clostridiales", "f__Lachnospiraceae"]
            }
        }
    ],
    "shape": [5, 6],
    "type": "OTU table"
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If you have multiple fields that you\(aqd like processed in one of these ways, you can pass a comma\-separated list of field names (e.g., \fB\-\-float\-fields confidence,pH\fP).
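.sp
The effect of these three options on a single metadata record can be sketched in Python. This is an illustration under stated assumptions, not the implementation used by \fBbiom add\-metadata\fP: \fBprocess_fields\fP and its parameter names are hypothetical, and stripping whitespace around each taxonomy level is an assumption.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Illustrative sketch only; process_fields is a hypothetical helper,
# not part of the biom-format API.
def process_fields(record, int_fields=(), float_fields=(), sc_separated=()):
    """Return a copy of record with the named fields converted."""
    processed = dict(record)
    for name in int_fields:
        processed[name] = int(processed[name])
    for name in float_fields:
        processed[name] = float(processed[name])
    for name in sc_separated:
        # Split "a;b;c" into ["a", "b", "c"], one entry per level.
        processed[name] = [level.strip() for level in processed[name].split(";")]
    return processed


obs = {"taxonomy": "Root;k__Bacteria", "confidence": "0.980"}
print(process_fields(obs, float_fields=["confidence"], sc_separated=["taxonomy"]))

sample = {"BarcodeSequence": "AGCACGAGCCTA", "DOB": "20060805"}
print(process_fields(sample, int_fields=["DOB"]))
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Fields that are not named in any of the three arguments, such as \fBBarcodeSequence\fP here, are left as strings.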
.SS Renaming (or naming) metadata columns while adding
.sp
You can also override the names of the metadata fields provided in the mapping files with the \fB\-\-observation\-header\fP and \fB\-\-sample\-header\fP parameters. This is useful if you want to rename metadata columns, or if metadata column headers aren\(aqt present in your metadata mapping file. If you pass either of these parameters, you must name all columns in order. If there are more columns in the metadata mapping file than there are headers, the extra columns will be ignored (so this is also a useful way to select only the first n columns from your mapping file). For example, if you want to rename the \fBDOB\fP column in the sample metadata mapping you could do the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom add\-metadata \-i min_sparse_otu_table.biom \-o table.w_smd.biom \-\-sample\-metadata\-fp sam_md.txt \-\-sample\-header SampleID,BarcodeSequence,DateOfBirth
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If you have a mapping file without headers such as the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
GG_OTU_0        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__       0.980
GG_OTU_1        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        0.665
GG_OTU_2        Root;k__Bacteria        0.980
GG_OTU_3        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        1.000
GG_OTU_4        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        0.842
GG_OTU_5        Root;k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae        1.000
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
you could name these while adding them as follows:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom add\-metadata \-i min_sparse_otu_table.biom \-o table.w_omd.biom \-\-observation\-metadata\-fp obs_md.txt \-\-observation\-header OTUID,taxonomy,confidence
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
As a variation on the last command, if you only want to include the \fBtaxonomy\fP column and exclude the \fBconfidence\fP column, you could run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom add\-metadata \-i min_sparse_otu_table.biom \-o table.w_omd.biom \-\-observation\-metadata\-fp obs_md.txt \-\-observation\-header OTUID,taxonomy
.ft P
.fi
.UNINDENT
.UNINDENT
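.sp
The in\-order naming rule, including the way surplus columns are dropped, behaves like pairing two sequences position by position. A minimal Python sketch follows; \fBname_columns\fP is a hypothetical helper used for illustration, not something provided by biom\-format.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Illustrative sketch only; name_columns is a hypothetical helper.
def name_columns(headers, fields):
    """Pair headers with fields in order; surplus fields are dropped."""
    # zip() stops at the shorter sequence, so any column without a
    # matching header is ignored.
    named = dict(zip(headers, fields))
    # The first column holds the ID rather than a metadata value.
    row_id = named.pop(headers[0])
    return row_id, named


row = ["GG_OTU_2", "Root;k__Bacteria", "0.980"]
print(name_columns(["OTUID", "taxonomy", "confidence"], row))
print(name_columns(["OTUID", "taxonomy"], row))
```
.ft P
.fi
.UNINDENT
.UNINDENT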
.SS Summarizing BIOM tables
.sp
If you have an existing BIOM file and want to compile a summary of the information in that table, you can use the \fBbiom summarize\-table\fP command.
.sp
To get help with \fBbiom summarize\-table\fP you can call:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom summarize\-table \-h
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This command takes a BIOM file or gzipped BIOM file as input, and will print a summary of the count information on a per\-sample basis to the new file specified by the \fB\-o\fP parameter. The example file used in the commands below can be found in the \fBbiom\-format/examples\fP directory.
.SS Summarizing sample data
.sp
To summarize the per\-sample data in a BIOM file, you can run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom summarize\-table \-i rich_sparse_otu_table.biom \-o rich_sparse_otu_table_summary.txt
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The following information will be written to \fBrich_sparse_otu_table_summary.txt\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Num samples: 6
Num observations: 5
Total count: 27
Table density (fraction of non\-zero values): 0.500

Counts/sample summary:
 Min: 3.0
 Max: 7.0
 Median: 4.000
 Mean: 4.500
 Std. dev.: 1.500
 Sample Metadata Categories: LinkerPrimerSequence; BarcodeSequence; Description; BODY_SITE
 Observation Metadata Categories: taxonomy

Counts/sample detail:
 Sample5: 3.0
 Sample2: 3.0
 Sample6: 4.0
 Sample3: 4.0
 Sample4: 6.0
 Sample1: 7.0
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
As you can see, general summary information about the table is provided, including the number of samples, the number of observations, the total count (i.e., the sum of all values in the table), and so on, followed by the per\-sample counts.
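.sp
These figures can be reproduced from the sparse data triples alone. The Python 3 sketch below uses only the standard library and assumes that the example table holds the same [row, column, value] triples as the minimal table shown earlier, which is consistent with the totals above. It is not the implementation of \fBbiom summarize\-table\fP.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
import statistics

# [row, column, value] triples, assumed to match the example table.
data = [[0, 2, 1], [1, 0, 5], [1, 1, 1], [1, 3, 2], [1, 4, 3],
        [1, 5, 1], [2, 2, 1], [2, 3, 4], [2, 5, 2], [3, 0, 2],
        [3, 1, 1], [3, 2, 1], [3, 5, 1], [4, 1, 1], [4, 2, 1]]
num_obs, num_samples = 5, 6  # the "shape" entry of the table

# Sum the values in each column to get the count per sample.
counts = [0] * num_samples
for row, col, value in data:
    counts[col] += value

print("Total count:", sum(counts))
# Every stored triple is non-zero here, so len(data) is the non-zero count.
print("Table density: %.3f" % (len(data) / (num_obs * num_samples)))
print("Min:", min(counts), "Max:", max(counts))
print("Median: %.3f" % statistics.median(counts))
print("Mean: %.3f" % statistics.mean(counts))
# The population standard deviation matches the value in the summary.
print("Std. dev.: %.3f" % statistics.pstdev(counts))
```
.ft P
.fi
.UNINDENT
.UNINDENT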
.SS Summarizing sample data qualitatively
.sp
To summarize the per\-sample data in a BIOM file qualitatively, where the number of unique observations per sample (rather than the total count of observations per sample) is provided, you can run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom summarize\-table \-i rich_sparse_otu_table.biom \-\-qualitative \-o rich_sparse_otu_table_qual_summary.txt
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The following information will be written to \fBrich_sparse_otu_table_qual_summary.txt\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Num samples: 6
Num observations: 5

Observations/sample summary:
 Min: 1
 Max: 4
 Median: 2.500
 Mean: 2.500
 Std. dev.: 0.957
 Sample Metadata Categories: LinkerPrimerSequence; BarcodeSequence; Description; BODY_SITE
 Observation Metadata Categories: taxonomy

Observations/sample detail:
 Sample5: 1
 Sample4: 2
 Sample1: 2
 Sample6: 3
 Sample2: 3
 Sample3: 4
.ft P
.fi
.UNINDENT
.UNINDENT
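.sp
In qualitative mode each non\-zero entry contributes one observation to its sample, whatever its value. The Python 3 sketch below illustrates this under the assumption that the example table holds the [row, column, value] triples of the minimal table shown earlier; it is not the implementation of \fBbiom summarize\-table\fP.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
import statistics

# [row, column, value] triples, assumed to match the example table.
data = [[0, 2, 1], [1, 0, 5], [1, 1, 1], [1, 3, 2], [1, 4, 3],
        [1, 5, 1], [2, 2, 1], [2, 3, 4], [2, 5, 2], [3, 0, 2],
        [3, 1, 1], [3, 2, 1], [3, 5, 1], [4, 1, 1], [4, 2, 1]]
num_samples = 6

# Count the non-zero entries in each column instead of summing them.
observed = [0] * num_samples
for row, col, value in data:
    if value != 0:
        observed[col] += 1

print("Min:", min(observed), "Max:", max(observed))
print("Median: %.3f" % statistics.median(observed))
print("Mean: %.3f" % statistics.mean(observed))
print("Std. dev.: %.3f" % statistics.pstdev(observed))
```
.ft P
.fi
.UNINDENT
.UNINDENT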
.SH THE BIOM FORMAT LICENSE
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
The BIOM Format project is licensed under the terms of the Modified BSD
License (also known as New or Revised BSD), as follows:

Copyright (c) 2011\-2014, The BIOM Format Development Team <gregcaporaso@gmail.com>

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of the BIOM Format Development Team nor the names of its
      contributors may be used to endorse or promote products derived from this
      software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE BIOM FORMAT DEVELOPMENT TEAM BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The following banner should be used in any source code file to indicate the
copyright and license terms:

#\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
# Copyright (c) 2011\-2014, The BIOM Format Development Team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file COPYING.txt, distributed with this software.
#\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The latest official version of the biom\-format project is 2.1 and of the BIOM file format is 2.0. Details on the \fI\%file format can be found here\fP\&.
.sp
To install the \fBbiom\-format\fP project, you can download the \fI\%latest version here\fP, or work with the development version. Generally we recommend working with the release version as it will be more stable, but if you want access to the latest features (and can tolerate some instability) you should work with the development version.
.sp
The biom\-format project has the following dependencies:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
\fI\%Python\fP >= 2.7 and < 3.0
.IP \(bu 2
\fI\%numpy\fP >= 1.7.0
.IP \(bu 2
\fI\%pyqi\fP 0.3.2
.IP \(bu 2
\fI\%scipy\fP >= 0.13.0
.IP \(bu 2
\fI\%h5py\fP >= 2.2.0 (optional; must be installed if creating or reading HDF5 formatted files)
.UNINDENT
.UNINDENT
.UNINDENT
.sp
The easiest way to install the latest version of the biom\-format project and its required dependencies is via pip:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
pip install numpy
pip install biom\-format
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
That\(aqs it!
.sp
If you decide not to install biom\-format using pip, it is also possible to manually install the latest release. We\(aqll illustrate the install process in the \fB$HOME/code\fP directory. You can either work in this directory on your system (creating it, if necessary, by running \fBmkdir $HOME/code\fP) or replace all occurrences of \fB$HOME/code\fP in the following instructions with your working directory. Please note that \fBnumpy\fP must be installed prior to installing \fBbiom\-format\fP\&. Change to this directory to start the install process:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
cd $HOME/code
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Download the \fI\%latest release, which can be found here\fP\&. After downloading, unpack and install (note: x.y.z refers to the downloaded version):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
tar xzf biom\-format\-x.y.z.tar.gz
cd $HOME/code/biom\-format\-x.y.z
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Alternatively, to install the development version, pull it from GitHub, and change to the resulting directory:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
git clone https://github.com/biocore/biom\-format.git
cd $HOME/code/biom\-format
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To install (either the development or release version), run the following command:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
sudo python setup.py install
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If you do not have sudo access on your system (or don\(aqt want to install the \fBbiom\-format\fP project in the default location) you\(aqll need to install the library code and scripts in specified directories, and then tell your system where to look for those files. You can do this as follows:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
echo "export PATH=$HOME/bin/:$PATH" >> $HOME/.bashrc
echo "export PYTHONPATH=$HOME/lib/:$PYTHONPATH" >> $HOME/.bashrc
mkdir \-p $HOME/bin $HOME/lib/
source $HOME/.bashrc
python setup.py install \-\-install\-scripts=$HOME/bin/ \-\-install\-purelib=$HOME/lib/ \-\-install\-lib=$HOME/lib/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You should then have access to the biom\-format project. You can test this by running the following command:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
python \-c "from biom import __version__; print(__version__)"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You should see the current version of the biom\-format project.
.sp
Next you can run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
which biom
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You should get a file path ending with \fBbiom\fP printed to your screen if it is installed correctly. Finally, to see a list of all \fBbiom\fP commands, run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
biom
.ft P
.fi
.UNINDENT
.UNINDENT
.SH ENABLING TAB COMPLETION OF BIOM COMMANDS
.sp
The \fBbiom\fP command referenced in the previous section is a driver for commands in biom\-format, powered by \fI\%the pyqi project\fP\&. You can enable tab completion of biom command names and command options (meaning that when you begin typing the name of a command or option you can auto\-complete it by hitting the \fItab\fP key) by following a few simple steps from the pyqi documentation. While this step is optional, tab completion is very convenient so it\(aqs worth enabling.
.sp
To enable tab completion, follow the steps outlined under \fI\%Configuring bash completion\fP in the pyqi install documentation, substituting \fBbiom\fP for \fBmy\-project\fP and \fBmy_project\fP in all commands. After completing those steps and closing and re\-opening your terminal, auto\-completion should be enabled.
.sp
There is also a BIOM format package for R, called \fBbiom\fP\&. This package includes basic tools for reading biom\-format files, accessing and subsetting data tables from a biom object, as well as limited support for writing a biom\-object back to a biom\-format file. The design of this API is intended to match the python API and other tools included with the biom\-format project, but with a decidedly "R flavor" that should be familiar to R users. This includes S4 classes and methods, as well as extensions of common core functions/methods.
.sp
To install the latest stable release of the \fBbiom\fP package enter the following command from within an R session:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
install.packages("biom")
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To install the latest development version of the \fBbiom\fP package, enter the following lines in an R session:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
install.packages("devtools") # if not already installed
library("devtools")
install_github("joey711/biom")
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Please post any support or feature requests and bugs to \fI\%the biom issue tracker\fP\&.
.sp
See \fI\%the biom project on GitHub\fP for further details, or if you would like to contribute.
.sp
Note that the \fBbiom\fP R package (GPL\-2) and the other biom\-format software (Modified BSD) are released under different licenses.
.sp
You can cite the BIOM format as follows (\fI\%link\fP):
.nf
The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome\-ome.
Daniel McDonald, Jose C. Clemente, Justin Kuczynski, Jai Ram Rideout, Jesse Stombaugh, Doug Wendel, Andreas Wilke, Susan Huse, John Hufnagle, Folker Meyer, Rob Knight, and J. Gregory Caporaso.
GigaScience 2012, 1:7. doi:10.1186/2047\-217X\-1\-7
.fi
.sp
.sp
The biom\-format project was conceived of and developed by the \fI\%QIIME\fP, \fI\%MG\-RAST\fP, and \fI\%VAMPS\fP development groups to support interoperability of our software packages. If you have questions about the biom\-format project you can contact \fI\%gregcaporaso@gmail.com\fP\&.
.SH AUTHOR
The BIOM Project
.SH COPYRIGHT
2011-2013, The BIOM Format Development Team
.\" Generated by docutils manpage writer.
.