.TH XTRACT 1 2017-01-24 NCBI "NCBI Entrez Direct User's Manual"
.SH NAME
xtract \- convert XML into a table of data values
.SH SYNOPSIS
\fBxtract\fP
[\|\fB\-help\fP\|]
[\|\fB\-cleanup\fP\|]
[\|\fB\-compress\fP\|]
[\|\fB\-input\fP\ \fIfilename\fP\|]
[\|\fB\-pattern\fP\ \fIexpr\fP\|]
[\|\fB\-group\fP\ \fIexpr\fP\|]
[\|\fB\-block\fP\ \fIexpr\fP\|]
[\|\fB\-subset\fP\ \fIexpr\fP\|]
[\|\fB\-if\fP\ \fIexpr\fP\ [\|\fIconstraint\fP\|]\|]
[\|\fB\-unless\fP\ \fIexpr\fP\ [\|\fIconstraint\fP\|]\|]
[\|\fB\-and\fP\ \fIcondition\fP\|]
[\|\fB\-or\fP\ \fIcondition\fP\|]
[\|\fB\-else\fP\|]
[\|\fB\-position\fP\ \fIpos\fP\|]
[\|\fB\-equals\fP\ \fIstr\fP\|]
[\|\fB\-contains\fP\ \fIstr\fP\|]
[\|\fB\-starts\-with\fP\ \fIstr\fP\|]
[\|\fB\-ends\-with\fP\ \fIstr\fP\|]
[\|\fB\-is\-not\fP\ \fIstr\fP\|]
[\|\fB\-gt\fP\ \fIN\fP\|]
[\|\fB\-ge\fP\ \fIN\fP\|]
[\|\fB\-lt\fP\ \fIN\fP\|]
[\|\fB\-le\fP\ \fIN\fP\|]
[\|\fB\-eq\fP\ \fIN\fP\|]
[\|\fB\-ne\fP\ \fIN\fP\|]
[\|\fB\-ret\fP\ \fIstr\fP\|]
[\|\fB\-tab\fP\ \fIstr\fP\|]
[\|\fB\-sep\fP\ \fIstr\fP\|]
[\|\fB\-pfx\fP\ \fIstr\fP\|]
[\|\fB\-sfx\fP\ \fIstr\fP\|]
[\|\fB\-clr\fP\|]
[\|\fB\-pfc\fP\ \fIstr\fP\|]
[\|\fB\-rst\fP\|]
[\|\fB\-def\fP\ \fIstr\fP\|]
[\|\fB\-lbl\fP\ \fIstr\fP\|]
[\|\fB\-element\fP\ \fIelement\fP\|]
[\|\fB\-first\fP\ \fIelement\fP\|]
[\|\fB\-last\fP\ \fIelement\fP\|]
[\|\fB\-\fP\fINAME\fP\|]
[\|\fB\-num\fP\ \fIelement\fP\|]
[\|\fB\-len\fP\ \fIelement\fP\|]
[\|\fB\-sum\fP\ \fIelement\fP\|]
[\|\fB\-min\fP\ \fIelement\fP\|]
[\|\fB\-max\fP\ \fIelement\fP\|]
[\|\fB\-inc\fP\ \fIelement\fP\|]
[\|\fB\-dec\fP\ \fIelement\fP\|]
[\|\fB\-sub\fP\ \fIelement\fP\|]
[\|\fB\-avg\fP\ \fIelement\fP\|]
[\|\fB\-dev\fP\ \fIelement\fP\|]
[\|\fB\-encode\fP\ \fIelement\fP\|]
[\|\fB\-upper\fP\ \fIelement\fP\|]
[\|\fB\-lower\fP\ \fIelement\fP\|]
[\|\fB\-title\fP\ \fIelement\fP\|]
[\|\fB\-terms\fP\ \fIelement\fP\|]
[\|\fB\-words\fP\ \fIelement\fP\|]
[\|\fB\-pairs\fP\ \fIelement\fP\|]
[\|\fB\-phrase\fP\ \fIelement\fP\|]
[\|\fB\-letters\fP\ \fIelement\fP\|]
[\|\fB\-0\-based\fP\ \fIelement\fP\|]
[\|\fB\-1\-based\fP\ \fIelement\fP\|]
[\|\fB\-ucsc\-based\fP\ \fIelement\fP\|]
[\|\fB\-insd\fP\ \fIarg\fP\ ...\|]
[\|\fB\-head\fP\ \fIstr\fP\|]
[\|\fB\-tail\fP\ \fIstr\fP\|]
[\|\fB\-hd\fP\ \fIstr\fP\|]
[\|\fB\-tl\fP\ \fIstr\fP\|]
[\|\fB\-format\fP\ \fIfmt\fP\|]
[\|\fB\-filter\fP\ \fIelement\fP \fIaction\fP\ \fItarget\fP\|]
[\|\fB\-verify\fP\|]
[\|\fB\-outline\fP\|]
[\|\fB\-synopsis\fP\|]
[\|\fB\-examples\fP\|]
[\|\fB\-version\fP\|]
.SH DESCRIPTION
\fBxtract\fP converts an XML document into a table of data values
according to user\-specified rules.
.SH OPTIONS
.SS Processing
.TP
\fB\-cleanup\fP
Fix non\-ASCII spaces.
.TP
\fB\-compress\fP
Compress runs of spaces.
.TP
\fB\-input\fP\ \fIfilename\fP
Read from file instead of standard input.
.SS Exploration Argument Hierarchy
.PD 0
.TP
\fB\-pattern\fP\ \fIexpr\fP
.TP
\fB\-group\fP\ \fIexpr\fP
.TP
\fB\-block\fP\ \fIexpr\fP
.TP
\fB\-subset\fP\ \fIexpr\fP
Name of record within set.
Use of different argument names allows command-line control of nested looping.
.PD
.SS Exploration Constructs
.PD 0
.IP Object 15
\fBDateCreated\fP
.IP Parent/Child 15
\fBBook/AuthorList\fP
.IP Heterogeneous 15
\fB"PubmedArticleSet/*"\fP
.IP Nested 15
\fB"*/Taxon"\fP
.IP Recursive 15
\fB"**/Gene-commentary"\fP
.PD
.SS Conditional Execution
.TP
\fB\-if\fP\ \fIexpr\fP\ [\|\fIconstraint\fP\|]
Element (or \fB@\fP\fIattribute\fP) must exist
and satisfy any specified constraint.
.TP
\fB\-unless\fP\ \fIexpr\fP\ [\|\fIconstraint\fP\|]
Skip if element matches.
.TP
\fB\-and\fP\ \fIcondition\fP
Preceding and following tests must both pass.
.TP
\fB\-or\fP\ \fIcondition\fP
Any passing test suffices.
.TP
\fB\-else\fP
Execute if conditional test failed.
.TP
\fB\-position\fP\ \fIpos\fP
Must be at \fBfirst\fP/\fBlast\fP location in list.
.SS String Constraints
.TP
\fB\-equals\fP\ \fIstr\fP
String must match exactly.
.TP
\fB\-contains\fP\ \fIstr\fP
Substring must be present.
.TP
\fB\-starts\-with\fP\ \fIstr\fP
Substring must be at beginning.
.TP
\fB\-ends\-with\fP\ \fIstr\fP
Substring must be at end.
.TP
\fB\-is\-not\fP\ \fIstr\fP
String must not match.
.SS Numeric Constraints
.TP
\fB\-gt\fP\ \fIN\fP
Greater than.
.TP
\fB\-ge\fP\ \fIN\fP
Greater than or equal to.
.TP
\fB\-lt\fP\ \fIN\fP
Less than.
.TP
\fB\-le\fP\ \fIN\fP
Less than or equal to.
.TP
\fB\-eq\fP\ \fIN\fP
Equal to.
.TP
\fB\-ne\fP\ \fIN\fP
Not equal to.
.SS Format Customization
.TP
\fB\-ret\fP\ \fIstr\fP
Override line break between patterns.
.TP
\fB\-tab\fP\ \fIstr\fP
Replace tab character between fields.
.TP
\fB\-sep\fP\ \fIstr\fP
Separator between group members.
.TP
\fB\-pfx\fP\ \fIstr\fP
Prefix to print before group.
.TP
\fB\-sfx\fP\ \fIstr\fP
Suffix to print after group.
.TP
\fB\-clr\fP
Clear queued tab separator.
.TP
\fB\-pfc\fP\ \fIstr\fP
Preface combines \fB\-clr\fP and \fB\-pfx\fP.
.TP
\fB\-rst\fP
Reset \fB\-sep\fP, \fB\-pfx\fP, and \fB\-sfx\fP.
.TP
\fB\-def\fP\ \fIstr\fP
Default placeholder for missing fields.
.TP
\fB\-lbl\fP\ \fIstr\fP
Insert arbitrary text.
.SS Element Selection
.TP
\fB\-element\fP\ \fIelement\fP
Print all items that match tag name.
.TP
\fB\-first\fP\ \fIelement\fP
Only print value of first item.
.TP
\fB\-last\fP\ \fIelement\fP
Only print value of last item.
.TP
\fB\-\fP\fINAME\fP
Record value in named variable.
.SS \-element Constructs
.PD 0
.IP Tag 15
\fBCaption\fP
.IP Group 15
\fBInitials,LastName\fP
.IP Attribute 15
\fBDescriptorName@MajorTopicYN\fP
.IP Recursive 15
\fB"**/Gene-commentary_accession"\fP
.IP "Object Count" 15
\fB"#Author"\fP
.IP "Item Length" 15
\fB"%Title"\fP
.IP "Element Depth" 15
\fB"^PMID"\fP
.IP Variable 15
\fB"&NAME"\fP
.PD
.SS Special \-element Operations
.PD 0
.IP "Parent Index" 15
\fB"+"\fP
.IP "XML Subtree" 15
\fB"*"\fP
.IP Children 15
\fB"$"\fP
.IP Attributes 15
\fB"@"\fP
.PD
.SS Numeric Processing
.TP
\fB\-num\fP\ \fIelement\fP
Count.
.TP
\fB\-len\fP\ \fIelement\fP
Length.
.TP
\fB\-sum\fP\ \fIelement\fP
Sum.
.TP
\fB\-min\fP\ \fIelement\fP
Minimum.
.TP
\fB\-max\fP\ \fIelement\fP
Maximum.
.TP
\fB\-inc\fP\ \fIelement\fP
Increment.
.TP
\fB\-dec\fP\ \fIelement\fP
Decrement.
.TP
\fB\-sub\fP\ \fIelement\fP
Difference.
.TP
\fB\-avg\fP\ \fIelement\fP
Average.
.TP
\fB\-dev\fP\ \fIelement\fP
Deviation.
.SS String Processing
.TP
\fB\-encode\fP\ \fIelement\fP
URL\-encode \fB<\fP, \fB>\fP, \fB&\fP, \fB\(dq\fP, and \fB\[aq]\fP characters.
.TP
\fB\-upper\fP\ \fIelement\fP
Convert text to uppercase.
.TP
\fB\-lower\fP\ \fIelement\fP
Convert text to lowercase.
.TP
\fB\-title\fP\ \fIelement\fP
Capitalize initial letters of words.
.SS Phrase Processing
.TP
\fB\-terms\fP\ \fIelement\fP
Partition phrase at spaces.
.TP
\fB\-words\fP\ \fIelement\fP
Split at punctuation marks.
.TP
\fB\-pairs\fP\ \fIelement\fP
Adjacent informative words.
.TP
\fB\-phrase\fP\ \fIelement\fP
Experimental index generation.
.TP
\fB\-letters\fP\ \fIelement\fP
Separate individual letters.
.SS Sequence Coordinates
.TP
\fB\-0\-based\fP\ \fIelement\fP
Zero\-based.
.TP
\fB\-1\-based\fP\ \fIelement\fP
One\-based.
.TP
\fB\-ucsc\-based\fP\ \fIelement\fP
Half\-open.
.SS Command Generator
.TP
\fB\-insd\fP\ \fIarg\fP\ ...
Generate INSDSeq extraction commands.
Print them if invoked standalone;
run them if invoked as part of a pipeline.
Requires one or more arguments,
which may appear in the following order:
.RS
.\".PD 0
.IP Descriptor(s) 15
.BR INSDSeq_sequence / INSDSeq_definition / INSDSeq_division "/... [\|...\|]"
.IP Completeness 15
.BR complete / partial
.IP Feature(s) 15
.BR CDS / mRNA /...[\| , ...\|]
.IP Qualifier(s)
.BR INSDFeature_key / \(dq#INSDInterval\(dq / gene / product "/... [\|...\|]"
.\".PD
.RE
.SS Miscellaneous
.TP
\fB\-head\fP\ \fIstr\fP
Print before everything else.
.TP
\fB\-tail\fP\ \fIstr\fP
Print after everything else.
.TP
\fB\-hd\fP\ \fIstr\fP
Print before each record.
.TP
\fB\-tl\fP\ \fIstr\fP
Print after each record.
.SS Reformatting
.TP
\fB\-format\fP\ \fIfmt\fP
.PD 0
.RS
.IP \fBcompact\fP 9
Compress runs of spaces.
.IP \fBflush\fP 9
Suppress line indentation.
.IP \fBindent\fP 9
Indent according to nesting depth.
.IP \fBexpand\fP 9
Place each attribute on a separate line.
.RE
.PD
.SS Modification
.TP
\fB\-filter\fP\ \fIelement\fP \fIaction\fP\ \fItarget\fP
Actions:
.PD 0
.RS
.IP \fBretain\fP 12
Keep matching elements (no\-op).
.IP \fBremove\fP 12
Remove matching elements.
.IP \fBencode\fP 12
HTML\-escape special characters.
.IP \fBdecode\fP 12
Decode HTML escapes.
.IP \fBshrink\fP 12
Compress runs of spaces.
.IP \fBexpand\fP 12
Place each attribute on a separate line.
.PD
.P
Targets:
.PD 0
.IP \fBcontent\fP 12
Plain\-text content.
.IP \fBcdata\fP 12
\fBCDATA\fP blocks.
.IP \fBcomment\fP 12
Comments.
.IP \fBobject\fP 12
The whole object.
.IP \fBattributes\fP 12
Attributes.
.IP \fBcontainer\fP 12
Start and end tags.
.RE
.PD
.SS Validation
.TP
\fB\-verify\fP
Report XML data integrity problems.
.SS Summary
.TP
\fB\-outline\fP
Display outline of XML structure.
.TP
\fB\-synopsis\fP
Display count of unique XML paths.
.SS Documentation
.TP
\fB\-help\fP
Print usage information and some example argument combinations.
.TP
\fB\-examples\fP
Complete examples of \fBedirect\fP(1) and \fBxtract\fP usage.
.TP
\fB\-version\fP
Print version number.
.SH NOTES
String constraints use case\-insensitive comparisons.
Numeric constraints and selection arguments use integer values.
\fB\-num\fP and \fB\-len\fP selections are synonyms
for Object Count (\fB#\fP) and Item Length (\fB%\fP).
\fB\-words\fP, \fB\-pairs\fP, and \fB\-phrase\fP convert to lower case.
.SH SEE ALSO
.BR edirect (1),
.BR xy-plot (1).