XTRACT(1) NCBI Entrez Direct User's Manual XTRACT(1)

NAME

xtract - convert XML into a table of data values

SYNOPSIS

xtract [-help] [-cleanup] [-compress] [-input filename] [-pattern expr] [-group expr] [-block expr] [-subset expr] [-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else] [-position pos] [-equals str] [-contains str] [-starts-with str] [-ends-with str] [-is-not str] [-gt N] [-ge N] [-lt N] [-le N] [-eq N] [-ne N] [-ret str] [-tab str] [-sep str] [-pfx str] [-sfx str] [-clr] [-pfc str] [-rst] [-def str] [-lbl str] [-element element] [-first element] [-last element] [-NAME] [-num element] [-len element] [-sum element] [-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element] [-dev element] [-encode element] [-upper element] [-lower element] [-title element] [-terms element] [-words element] [-pairs element] [-phrase element] [-letters element] [-0-based element] [-1-based element] [-ucsc-based element] [-insd arg ...] [-head str] [-tail str] [-hd str] [-tl str] [-format fmt] [-filter element action target] [-verify] [-outline] [-synopsis] [-examples] [-version]

DESCRIPTION

xtract converts an XML document into a table of data values according to user-specified rules.

OPTIONS

Processing

-cleanup
Fix non-ASCII spaces.
-compress
Compress runs of spaces.
-input filename
Read from file instead of standard input.

Exploration Argument Hierarchy

-pattern expr
-group expr
-block expr
-subset expr
Name of record within set. Use of different argument names allows command-line control of nested looping.
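
As a minimal sketch of nested looping, the commands below write a small made-up record set to a file and then visit each Author inside each PubmedArticle. The file name pubmed_sample.xml and all data values are invented for illustration; only the tag names follow PubMed conventions. With -pattern selecting the record and -block iterating within it, each article is expected to yield one output line.

```shell
# Hypothetical sample data; the values are made up.
cat > pubmed_sample.xml <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <AuthorList>
          <Author><LastName>Smith</LastName><Initials>JA</Initials></Author>
          <Author><LastName>Jones</LastName><Initials>RB</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
EOF

# Run only if xtract (Entrez Direct) is installed.
if command -v xtract >/dev/null 2>&1; then
  # Outer loop: one row per PubmedArticle.
  # Inner loop: each Author block, printing initials and last name joined by a space.
  xtract -input pubmed_sample.xml -pattern PubmedArticle -element PMID \
    -block Author -sep " " -element Initials,LastName
fi
```

Adding a -subset level inside the -block would follow the same idea, one further level down.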

Exploration Constructs

Object
DateCreated
Parent/Child
Book/AuthorList
Heterogeneous
"PubmedArticleSet/*"
Nested
"*/Taxon"
Recursive
"**/Gene-commentary"

Conditional Execution

-if expr [constraint]
Element (or @attribute) must exist and satisfy any specified constraint.
-unless expr [constraint]
Skip if element matches.
-and condition
Preceding and following tests must both pass.
-or condition
Any passing test suffices.
-else
Execute if conditional test failed.
-position pos
Must be at first/last location in list.

String Constraints

-equals str
String must match exactly.
-contains str
Substring must be present.
-starts-with str
Substring must be at beginning.
-ends-with str
Substring must be at end.
-is-not str
String must not match.

Numeric Constraints

-gt N
Greater than.
-ge N
Greater than or equal to.
-lt N
Less than.
-le N
Less than or equal to.
-eq N
Equal to.
-ne N
Not equal to.
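
The conditional arguments and constraints above can be combined as in the following sketch. The file cond_sample.xml and its contents are hypothetical. The first command is intended to keep records containing a LastName that equals "smith" (string comparisons are case-insensitive, see NOTES); the second is intended to keep records from 2010 or later that also have more than one Author.

```shell
# Hypothetical sample data; the values are made up.
cat > cond_sample.xml <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <PMID>111</PMID>
    <Year>2009</Year>
    <Author><LastName>Smith</LastName></Author>
  </PubmedArticle>
  <PubmedArticle>
    <PMID>222</PMID>
    <Year>2015</Year>
    <Author><LastName>Jones</LastName></Author>
    <Author><LastName>Smith</LastName></Author>
  </PubmedArticle>
</PubmedArticleSet>
EOF

# Run only if xtract (Entrez Direct) is installed.
if command -v xtract >/dev/null 2>&1; then
  # String constraint on an element value.
  xtract -input cond_sample.xml -pattern PubmedArticle \
    -if LastName -equals smith -element PMID

  # Numeric constraint joined with -and; "#Author" is the Object Count construct.
  xtract -input cond_sample.xml -pattern PubmedArticle \
    -if Year -ge 2010 -and "#Author" -gt 1 -element PMID Year
fi
```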

Format Customization

-ret str
Override line break between patterns.
-tab str
Replace tab character between fields.
-sep str
Separator between group members.
-pfx str
Prefix to print before group.
-sfx str
Suffix to print after group.
-clr
Clear queued tab separator.
-pfc str
Preface combines -clr and -pfx.
-rst
Reset -sep, -pfx, and -sfx.
-def str
Default placeholder for missing fields.
-lbl str
Insert arbitrary text.
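
A brief sketch of how the formatting arguments might be combined, using a hypothetical file format_sample.xml with made-up values. Here -tab replaces the separator between fields, -sep joins multiple values of the same element, and -def supplies a placeholder where a record lacks a field (the second record has no Volume).

```shell
# Hypothetical sample data; the values are made up.
cat > format_sample.xml <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <PMID>111</PMID>
    <LastName>Smith</LastName>
    <LastName>Jones</LastName>
    <Volume>12</Volume>
  </PubmedArticle>
  <PubmedArticle>
    <PMID>222</PMID>
    <LastName>Lee</LastName>
  </PubmedArticle>
</PubmedArticleSet>
EOF

# Run only if xtract (Entrez Direct) is installed.
if command -v xtract >/dev/null 2>&1; then
  # Fields separated by "|", repeated LastName values by "; ",
  # and "-" printed in place of a missing field.
  xtract -input format_sample.xml -pattern PubmedArticle \
    -tab "|" -sep "; " -def "-" -element PMID LastName Volume
fi
```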

Element Selection

-element element
Print all items that match tag name.
-first element
Only print value of first item.
-last element
Only print value of last item.
-NAME
Record value in named variable.
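
The selection arguments can be illustrated with the following sketch, which assumes a hypothetical file select_sample.xml containing made-up values. The first command prints the PMID, the number of Author objects, and the first and last LastName. The second stores the PMID in a variable named PMID and prints it again inside each Author block through the "&PMID" construct.

```shell
# Hypothetical sample data; the values are made up.
cat > select_sample.xml <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <AuthorList>
        <Author><LastName>Smith</LastName></Author>
        <Author><LastName>Jones</LastName></Author>
        <Author><LastName>Lee</LastName></Author>
      </AuthorList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
EOF

# Run only if xtract (Entrez Direct) is installed.
if command -v xtract >/dev/null 2>&1; then
  # All PMIDs, the Author count, then only the first and last LastName.
  xtract -input select_sample.xml -pattern PubmedArticle \
    -element PMID "#Author" -first LastName -last LastName

  # Save the PMID in a variable, then reuse it for every Author.
  xtract -input select_sample.xml -pattern PubmedArticle \
    -PMID MedlineCitation/PMID \
    -block Author -element "&PMID" LastName
fi
```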

-element Constructs

Tag
Caption
Group
Initials,LastName
Attribute
DescriptorName@MajorTopicYN
Recursive
"**/Gene-commentary_accession"
Object Count
"#Author"
Item Length
"%Title"
Element Depth
"^PMID"
Variable
"&NAME"

Special -element Operations

Parent Index
"+"
XML Subtree
"*"
Children
"$"
Attributes
"@"

Numeric Processing

-num element
Count.
-len element
Length.
-sum element
Sum.
-min element
Minimum.
-max element
Maximum.
-inc element
Increment.
-dec element
Decrement.
-sub element
Difference.
-avg element
Average.
-dev element
Deviation.

String Processing

-encode element
XML-encode <, >, &, ", and ' characters.
-upper element
Convert text to uppercase.
-lower element
Convert text to lowercase.
-title element
Capitalize initial letters of words.

Phrase Processing

-terms element
Partition phrase at spaces.
-words element
Split at punctuation marks.
-pairs element
Adjacent informative words.
-phrase element
Experimental index generation.
-letters element
Separate individual letters.

Sequence Coordinates

-0-based element
Zero-based.
-1-based element
One-based.
-ucsc-based element
Half-open.

Command Generator

-insd arg ...
Generate INSDSeq extraction commands. Print them if invoked standalone; run them if invoked as part of a pipeline. Requires one or more arguments, which may appear in the following order:
Descriptor(s)
INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]
Completeness
complete/partial
Feature(s)
CDS/mRNA/...[,...]
Qualifier(s)
INSDFeature_key/"#INSDInterval"/gene/product/... [...]
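
As a hedged illustration of the pipeline form, the sketch below feeds a tiny hand-written INSDSeq record to -insd and asks for the gene and product qualifiers of CDS features. The file name insd_sample.xml, the accession, and the qualifier values are all invented; the tag layout is assumed to follow the usual INSDSeq XML structure.

```shell
# Hypothetical sample data; accession and qualifier values are made up.
cat > insd_sample.xml <<'EOF'
<INSDSet>
  <INSDSeq>
    <INSDSeq_accession-version>AB000001.1</INSDSeq_accession-version>
    <INSDSeq_feature-table>
      <INSDFeature>
        <INSDFeature_key>CDS</INSDFeature_key>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>gene</INSDQualifier_name>
            <INSDQualifier_value>abc1</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>product</INSDQualifier_name>
            <INSDQualifier_value>hypothetical protein</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
    </INSDSeq_feature-table>
  </INSDSeq>
</INSDSet>
EOF

# Run only if xtract (Entrez Direct) is installed.
if command -v xtract >/dev/null 2>&1; then
  # Invoked in a pipeline, -insd runs the generated extraction on the input.
  # Arguments: feature (CDS), then qualifiers (gene, product).
  cat insd_sample.xml | xtract -insd CDS gene product
fi
```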

Miscellaneous

-head str
Print before everything else.
-tail str
Print after everything else.
-hd str
Print before each record.
-tl str
Print after each record.

Reformatting

-format fmt
compact
Compress runs of spaces.
flush
Suppress line indentation.
indent
Indent according to nesting depth.
expand
Place each attribute on a separate line.

Modification

-filter element action target
Actions:
retain
Keep matching elements (no-op).
remove
Remove matching elements.
encode
HTML-escape special characters.
decode
Decode HTML escapes.
shrink
Compress runs of spaces.
expand
Place each attribute on a separate line.

Targets:
content
Plain-text content.
cdata
CDATA blocks.
comment
Comments.
object
The whole object.
attributes
Attributes.
container
Start and end tags.

Validation

-verify
Report XML data integrity problems.

Summary

-outline
Display outline of XML structure.
-synopsis
Display count of unique XML paths.
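
Before writing extraction rules for an unfamiliar document, the two summary arguments can be used to inspect its structure. The sketch below assumes a hypothetical file outline_sample.xml with made-up content, read from standard input.

```shell
# Hypothetical sample data; the values are made up.
cat > outline_sample.xml <<'EOF'
<PubmedArticleSet>
  <PubmedArticle>
    <PMID>111</PMID>
    <Author><LastName>Smith</LastName></Author>
    <Author><LastName>Jones</LastName></Author>
  </PubmedArticle>
</PubmedArticleSet>
EOF

# Run only if xtract (Entrez Direct) is installed.
if command -v xtract >/dev/null 2>&1; then
  # Show the nesting of element names.
  xtract -outline < outline_sample.xml

  # Count how often each unique XML path occurs.
  xtract -synopsis < outline_sample.xml
fi
```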

Documentation

-help
Print usage information and some example argument combinations.
-examples
Complete examples of edirect(1) and xtract usage.
-version
Print version number.

NOTES

String constraints use case-insensitive comparisons.

Numeric constraints and selection arguments use integer values.

-num and -len selections are synonyms for Object Count (#) and Item Length (%).

-words, -pairs, and -phrase convert to lower case.

SEE ALSO

edirect(1), xy-plot(1).
2017-01-24 NCBI