XTRACT(1) | NCBI Entrez Direct User's Manual | XTRACT(1) |
NAME¶
xtract - convert XML into a table of data values
SYNOPSIS¶
xtract [-help] [-cleanup] [-compress] [-input filename] [-pattern expr] [-group expr] [-block expr] [-subset expr] [-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else] [-position pos] [-equals str] [-contains str] [-starts-with str] [-ends-with str] [-is-not str] [-gt N] [-ge N] [-lt N] [-le N] [-eq N] [-ne N] [-ret str] [-tab str] [-sep str] [-pfx str] [-sfx str] [-clr] [-pfc str] [-rst] [-def str] [-lbl str] [-element element] [-first element] [-last element] [-NAME] [-num element] [-len element] [-sum element] [-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element] [-dev element] [-encode element] [-upper element] [-lower element] [-title element] [-terms element] [-words element] [-pairs element] [-phrase element] [-letters element] [-0-based element] [-1-based element] [-ucsc-based element] [-insd arg ...] [-head str] [-tail str] [-hd str] [-tl str] [-format fmt] [-filter element action target] [-verify] [-outline] [-synopsis] [-examples] [-version]
DESCRIPTION¶
xtract converts an XML document into a table of data values according to user-specified rules.
OPTIONS¶
Processing¶
- -cleanup
- Fix non-ASCII spaces.
- -compress
- Compress runs of spaces.
- -input filename
- Read from file instead of standard input.
Exploration Argument Hierarchy¶
- -pattern expr
- -group expr
- -block expr
- -subset expr
- Name of record within set. Use of different argument names allows command-line control of nested looping.
Exploration Constructs¶
- Object
- DateCreated
- Parent/Child
- Book/AuthorList
- Heterogeneous
- "PubmedArticleSet/*"
- Nested
- "*/Taxon"
- Recursive
- "**/Gene-commentary"
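The nested looping that -pattern and -block control can be pictured with a small Python model. This is a sketch of the idea only, not xtract's implementation: the sample XML, the function name extract_rows, and the use of xml.etree.ElementTree are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, loosely shaped like a PubMed record set.
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <PMID>1</PMID>
    <Author><LastName>Smith</LastName><Initials>JA</Initials></Author>
    <Author><LastName>Jones</LastName><Initials>B</Initials></Author>
  </PubmedArticle>
  <PubmedArticle>
    <PMID>2</PMID>
    <Author><LastName>Lee</LastName><Initials>C</Initials></Author>
  </PubmedArticle>
</PubmedArticleSet>
"""


def extract_rows(xml_text, pattern, outer_field, block, inner_fields):
    """Rough model of: -pattern P -element F -block B -element X,Y

    One output row per `pattern` record (outer loop); within each record,
    one pass per `block` object (inner loop), appending its fields.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.iter(pattern):            # outer loop: -pattern
        fields = [e.text for e in record.iter(outer_field)]
        for sub in record.iter(block):           # inner loop: -block
            for name in inner_fields:
                fields.extend(e.text for e in sub.iter(name))
        rows.append("\t".join(fields))           # tab-separated row
    return rows


for row in extract_rows(SAMPLE, "PubmedArticle", "PMID",
                        "Author", ["Initials", "LastName"]):
    print(row)
```

Each record yields one tab-separated line, with the fields of every nested Author appended in document order.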
Conditional Execution¶
- -if expr [constraint]
- Element (or @attribute) must exist and satisfy any specified constraint.
- -unless expr [constraint]
- Skip if element matches.
- -and condition
- Preceding and following tests must both pass.
- -or condition
- Any passing test suffices.
- -else
- Execute if conditional test failed.
- -position pos
- Must be at first/last location in list.
String Constraints¶
- -equals str
- String must match exactly.
- -contains str
- Substring must be present.
- -starts-with str
- Substring must be at beginning.
- -ends-with str
- Substring must be at end.
- -is-not str
- String must not match.
Numeric Constraints¶
- -gt N
- Greater than.
- -ge N
- Greater than or equal to.
- -lt N
- Less than.
- -le N
- Less than or equal to.
- -eq N
- Equal to.
- -ne N
- Not equal to.
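The string and numeric constraints can be modelled as simple predicates. The sketch below follows the NOTES section (string comparisons ignore case; numeric comparisons use integers); the function names string_constraint and numeric_constraint are hypothetical and are not part of xtract.

```python
import operator

# Keys mirror the option names without the leading dash.
STRING_OPS = {
    "equals": lambda value, target: value == target,
    "contains": lambda value, target: target in value,
    "starts-with": lambda value, target: value.startswith(target),
    "ends-with": lambda value, target: value.endswith(target),
    "is-not": lambda value, target: value != target,
}

NUMERIC_OPS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def string_constraint(op, value, target):
    """Case-insensitive string test, e.g. op='starts-with'."""
    return STRING_OPS[op](value.lower(), target.lower())


def numeric_constraint(op, value, n):
    """Integer comparison, e.g. op='ge'; both sides are converted to int."""
    return NUMERIC_OPS[op](int(value), int(n))


print(string_constraint("equals", "Homo Sapiens", "homo sapiens"))
print(numeric_constraint("ge", "2017", 2000))
```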
Format Customization¶
- -ret str
- Override line break between patterns.
- -tab str
- Replace tab character between fields.
- -sep str
- Separator between group members.
- -pfx str
- Prefix to print before group.
- -sfx str
- Suffix to print after group.
- -clr
- Clear queued tab separator.
- -pfc str
- Preface combines -clr and -pfx.
- -rst
- Reset -sep, -pfx, and -sfx.
- -def str
- Default placeholder for missing fields.
- -lbl str
- Insert arbitrary text.
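How the formatting options combine can be sketched as a small row formatter. This is a simplified model under stated assumptions, not xtract's code: it assumes the default for -tab and -sep is a tab character, that a missing field is skipped unless -def supplies a placeholder, and it leaves out -clr, -pfc, -rst, and -lbl. The function name format_row is hypothetical.

```python
def format_row(groups, tab="\t", sep="\t", pfx="", sfx="",
               default=None, ret="\n"):
    """Join groups of values into one output line.

    groups  -- list of lists; each inner list holds the values one
               -element argument matched (empty when nothing matched)
    tab     -- text between fields              (models -tab)
    sep     -- text between group members       (models -sep)
    pfx/sfx -- text around each non-empty group (models -pfx / -sfx)
    default -- placeholder for an empty group   (models -def)
    ret     -- text ending the line             (models -ret)
    """
    cells = []
    for values in groups:
        if values:
            cells.append(pfx + sep.join(values) + sfx)
        elif default is not None:
            cells.append(default)
    return tab.join(cells) + ret


# Three fields: an identifier, a two-member group, and a missing field.
print(format_row([["12345"], ["JA", "Smith"], []], sep=" ", default="-"),
      end="")
```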
Element Selection¶
- -element element
- Print all items that match tag name.
- -first element
- Only print value of first item.
- -last element
- Only print value of last item.
- -NAME
- Record value in named variable.
-element Constructs¶
- Tag
- Caption
- Group
- Initials,LastName
- Attribute
- DescriptorName@MajorTopicYN
- Recursive
- "**/Gene-commentary_accession"
- Object Count
- "#Author"
- Item Length
- "%Title"
- Element Depth
- "^PMID"
- Variable
- "&NAME"
Special -element Operations¶
- Parent Index
- "+"
- XML Subtree
- "*"
- Children
- "$"
- Attributes
- "@"
Numeric Processing¶
- -num element
- Count.
- -len element
- Length.
- -sum element
- Sum.
- -min element
- Minimum.
- -max element
- Maximum.
- -inc element
- Increment.
- -dec element
- Decrement.
- -sub element
- Difference.
- -avg element
- Average.
- -dev element
- Deviation.
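A rough Python model of several numeric operations, assuming they act on the integer values of all matched items. Only -num, -sum, -min, -max, -inc, and -dec are modelled; -len, -sub, -avg, and -dev are omitted because their details are not pinned down here. The function name numeric is hypothetical.

```python
def numeric(op, values):
    """Apply one numeric operation to the text values of matched items."""
    if op == "num":                      # count of matched items
        return len(values)
    nums = [int(v) for v in values]      # integer values, per NOTES
    if op == "sum":
        return sum(nums)
    if op == "min":
        return min(nums)
    if op == "max":
        return max(nums)
    if op == "inc":                      # add one to each value
        return [n + 1 for n in nums]
    if op == "dec":                      # subtract one from each value
        return [n - 1 for n in nums]
    raise ValueError("unsupported operation: " + op)


lengths = ["3", "5", "10"]
print(numeric("num", lengths), numeric("sum", lengths), numeric("max", lengths))
```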
String Processing¶
- -encode element
- Escape <, >, &, ", and ' characters as XML entities.
- -upper element
- Convert text to uppercase.
- -lower element
- Convert text to lowercase.
- -title element
- Capitalize initial letters of words.
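The string operations have close analogues in Python's standard library, shown here as a sketch. The five characters listed for -encode are exactly those with predefined XML entities, so this example assumes entity escaping (html.escape) rather than percent-encoding; that reading is an assumption, and the exact entity spellings xtract emits may differ.

```python
import html
import string


def encode(text):
    """Escape <, >, &, ", and ' as entities (assumed meaning of -encode)."""
    return html.escape(text, quote=True)


def upper(text):
    return text.upper()


def lower(text):
    return text.lower()


def title(text):
    """Capitalize the initial letter of each space-separated word."""
    return string.capwords(text)


print(encode('a < b & "c"'))
print(title("nucleotide sequence"))
```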
Phrase Processing¶
- -terms element
- Partition phrase at spaces.
- -words element
- Split at punctuation marks.
- -pairs element
- Adjacent informative words.
- -phrase element
- Experimental index generation.
- -letters element
- Separate individual letters.
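The difference between -terms, -words, and -letters can be illustrated with a short Python sketch. It assumes that -words breaks at any non-alphanumeric character and lower-cases the result (the NOTES section confirms only the lower-casing); -pairs and -phrase are not modelled. All function names are hypothetical.

```python
import re


def terms(text):
    """Partition a phrase at whitespace only; punctuation stays attached."""
    return text.split()


def words(text):
    """Split at punctuation and spaces, in lower case (assumed behaviour)."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def letters(text):
    """Separate individual non-space characters."""
    return [c for c in text if not c.isspace()]


phrase = "Beta-lactamase inhibitor"
print(terms(phrase))
print(words(phrase))
print(letters("ACGT"))
```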
Sequence Coordinates¶
- -0-based element
- Zero-based.
- -1-based element
- One-based.
- -ucsc-based element
- Half-open.
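The three coordinate conventions named above relate to one another as in the following sketch. These are the standard definitions of the conventions; how xtract applies them to individual elements is not specified here, and the function names are hypothetical.

```python
def one_based_to_zero_based(start, stop):
    """1-based closed interval -> 0-based closed interval."""
    return start - 1, stop - 1


def one_based_to_half_open(start, stop):
    """1-based closed interval -> 0-based half-open (UCSC-style) interval.

    Only the start shifts; the stop becomes an exclusive end.
    """
    return start - 1, stop


def half_open_to_one_based(start, stop):
    """0-based half-open interval -> 1-based closed interval."""
    return start + 1, stop


# The first ten bases of a sequence in each convention.
print(one_based_to_zero_based(1, 10))
print(one_based_to_half_open(1, 10))
```

In the half-open form the interval length is simply stop minus start, which is one reason that convention is popular for genome browsers.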
Command Generator¶
- -insd arg ...
- Generate INSDSeq extraction commands. Print them if invoked standalone; run them if invoked as part of a pipeline. Requires one or more arguments, which may appear in the following order:
- Descriptor(s)
- INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]
- Completeness
- complete/partial
- Feature(s)
- CDS/mRNA/...[,...]
- Qualifier(s)
- INSDFeature_key/"#INSDInterval"/gene/product/... [...]
Miscellaneous¶
- -head str
- Print before everything else.
- -tail str
- Print after everything else.
- -hd str
- Print before each record.
- -tl str
- Print after each record.
Reformatting¶
- -format fmt
- compact
- Compress runs of spaces.
- flush
- Suppress line indentation.
- indent
- Indent according to nesting depth.
- expand
- Place each attribute on a separate line.
Modification¶
- -filter element action target
- Actions:
- retain
- Keep matching elements (no-op).
- remove
- Remove matching elements.
- encode
- HTML-escape special characters.
- decode
- Decode HTML escapes.
- shrink
- Compress runs of spaces.
- expand
- Place each attribute on a separate line.
- Targets:
- content
- Plain-text content.
- cdata
- CDATA blocks.
- comment
- Comments.
- object
- The whole object.
- attributes
- Attributes.
- container
- Start and end tags.
Validation¶
- -verify
- Report XML data integrity problems.
Summary¶
- -outline
- Display outline of XML structure.
- -synopsis
- Display count of unique XML paths.
Documentation¶
- -help
- Print usage information and some example argument combinations.
- -examples
- Complete examples of edirect(1) and xtract usage.
- -version
- Print version number.
NOTES¶
String constraints use case-insensitive comparisons.
Numeric constraints and selection arguments use integer values.
-num and -len selections are synonyms for Object Count (#) and Item Length (%).
-words, -pairs, and -phrase convert to lower case.
SEE ALSO¶
edirect(1), xy-plot(1).
2017-01-24 | NCBI |