.\" Automatically generated by Pod::Man 2.28 (Pod::Simple 3.29)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{
.    if \nF \{
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "STAG-IR 1p"
.TH STAG-IR 1p "2016-05-29" "perl v5.22.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
stag\-ir.pl \- information retrieval using a simple relational index
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\&  stag\-ir.pl \-r person \-k social_security_no \-d Pg:mydb myrecords.xml
\&  stag\-ir.pl \-d Pg:mydb \-q 999\-9999\-9999 \-q 888\-8888\-8888
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Indexes stag nodes (\s-1XML\s0 Elements) in a simple relational db structure
\&\- keyed by \s-1ID\s0 with an \s-1XML\s0 Blob as a value
.PP
Imagine you have a very large file of data, in a stag compatible
format such as \s-1XML.\s0 You want to index all the elements of type
\&\fBperson\fR; each person can be uniquely identified by
\&\fBsocial_security_no\fR, which is a direct subnode of \fBperson\fR
.PP
The first thing to do is to build the index, which will be stored
in the database mydb
.PP
.Vb 1
\&  stag\-ir.pl \-r person \-k social_security_no \-d Pg:mydb myrecords.xml
.Ve
.PP
You can then use the index \*(L"person-idx\*(R" to retrieve \fBperson\fR nodes by
their social security number
.PP
.Vb 1
\&  stag\-ir.pl \-d Pg:mydb \-q 999\-9999\-9999 > some\-person.xml
.Ve
.PP
You can export using different stag formats
.PP
.Vb 1
\&  stag\-ir.pl \-d Pg:mydb \-q 999\-9999\-9999 \-w sxpr > some\-person.sxpr
.Ve
.PP
You can retrieve multiple nodes (although these need to be rooted to
make a valid file)
.PP
.Vb 1
\&  stag\-ir.pl \-d Pg:mydb \-q 999\-9999\-9999 \-q 888\-8888\-8888 \-top personset
.Ve
.PP
Or you can use a list of IDs from a file (newline delimited)
.PP
.Vb 1
\&  stag\-ir.pl \-d Pg:mydb \-qf my_ss_nmbrs.txt \-top personset
.Ve
.SS "\s-1ARGUMENTS\s0"
.IX Subsection "ARGUMENTS"
\fI\-d \s-1DB_NAME\s0\fR
.IX Subsection "-d DB_NAME"
.PP
This database will be used for storing the stag nodes
.PP
The name can be a logical name or \s-1DBI\s0 locator or DBStag shorthand \-
see DBIx::DBStag
.PP
The database must already exist
.PP
\fI\-clear\fR
.IX Subsection "-clear"
.PP
Deletes all data from the relation type (specified with \fB\-r\fR) before
loading
.PP
\fI\-insertonly\fR
.IX Subsection "-insertonly"
.PP
Does not check if the \s-1ID\s0 in the file exists in the db \- will always
attempt an \s-1INSERT \s0(and will fail if \s-1ID\s0 already exists)
.PP
This is the fastest way to load data (only one \s-1SQL\s0 operation per node
rather than two) but is only safe if there is no existing data
.PP
(Default is clobber mode \- existing data with same \s-1ID\s0 will be replaced)
.PP
\fI\-newonly\fR
.IX Subsection "-newonly"
.PP
If there is already data in the specified relation in the db, and the
\&\s-1XML\s0 being loaded specifies an \s-1ID\s0 that is already in the db, then this
node will be ignored
.PP
(Default is clobber mode \- existing data with same \s-1ID\s0 will be replaced)
.PP
\fI\-transaction_size\fR
.IX Subsection "-transaction_size"
.PP
A commit will be performed every n INSERTs/UPDATEs (and at the end)
.PP
Default is autocommit
.PP
Note that if you are using \-insertonly together with transactions, and
the input file contains an \s-1ID\s0 already in the database, then the
transaction will fail because this script will try to insert a
duplicate \s-1ID\s0
.PP
\fI\-r RELATION-NAME\fR
.IX Subsection "-r RELATION-NAME"
.PP
This is the name of the stag node (\s-1XML\s0 element) that will be stored in
the index; for example, with the \s-1XML\s0 below you may want to use the node
name \fBperson\fR and the unique key \fBid\fR
.PP
.Vb 9
\&  <person_set>
\&    <person>
\&      <id>...</id>
\&    </person>
\&    <person>
\&      <id>...</id>
\&    </person>
\&    ...
\&  </person_set>
.Ve
.PP
This flag should only be used when you want to store data
.PP
\fI\-k UNIQUE-KEY\fR
.IX Subsection "-k UNIQUE-KEY"
.PP
This node will be used as the unique/primary key for the data
.PP
This node should be nested directly below the node that is being
stored in the index \- if it is more than one level below, specify a path
.PP
This flag should only be used when you want to store data
.PP
\fI\-u UNIQUE-KEY\fR
.IX Subsection "-u UNIQUE-KEY"
.PP
Synonym for \fB\-k\fR
.PP
\fI\-create\fR
.IX Subsection "-create"
.PP
If specified, this will create a table for the relation name specified
with \fB\-r\fR; you should use this the first time you index a relation
.PP
\fI\-idtype \s-1TYPE\s0\fR
.IX Subsection "-idtype TYPE"
.PP
(optional)
.PP
This is the \s-1SQL\s0 datatype for the unique key; it defaults to \s-1VARCHAR\s0(255)
.PP
If you know that your id is an integer, you can specify \s-1INTEGER\s0 here
.PP
If your id is always an 8\-character field you can do this
.PP
.Vb 1
\&  \-idtype \*(AqCHAR(8)\*(Aq
.Ve
.PP
This option only makes sense when combined with the \fB\-create\fR option
.PP
\fI\-p \s-1PARSER\s0\fR
.IX Subsection "-p PARSER"
.PP
This can be the name of a stag supported format (xml, sxpr, itext) \-
\&\s-1XML\s0 is assumed by default
.PP
It can also be a module name \- this module is used to parse the input
file into a stag stream; see Data::Stag::BaseGenerator for details
on writing your own parsers/event generators
.PP
This flag should only be used when you want to store data
.PP
\fI\-q QUERY-ID\fR
.IX Subsection "-q QUERY-ID"
.PP
Fetches the relation/node with unique key value equal to query-id
.PP
Multiple arguments can be passed by specifying \-q multiple times
.PP
This flag should only be used when you want to query data
.PP
\fI\-top NODE-NAME\fR
.IX Subsection "-top NODE-NAME"
.PP
If this is specified in conjunction with \fB\-q\fR or \fB\-qf\fR then all
the query result nodes will be nested inside a node with this name
(i.e. this provides a root for the resulting
document tree)
.PP
\fI\-qf QUERY-FILE\fR
.IX Subsection "-qf QUERY-FILE"
.PP
This is a file of newline-separated IDs; this is useful for querying
the index in batch
.PP
\fI\-keys\fR
.IX Subsection "-keys"
.PP
This will write a list of all primary keys in the index
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Data::Stag
.PP
For more complex stag to database mapping, see DBIx::DBStag and the
scripts
.PP
stag\-db.pl uses file \s-1DBM\s0 indexes
.PP
stag\-storenode.pl is for storing fully normalised stag trees
.PP
selectall_xml
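.SH "EXAMPLES"
.IX Header "EXAMPLES"
The load options described under \s-1ARGUMENTS\s0 can be combined. The
commands below are only a sketch assembled from the flags documented on
this page and have not been checked against a particular release. The
file name more\-records.xml, the transaction size of 1000, and the
assumption that \-create, \-insertonly and \-transaction_size may be given
together are illustrative; the database Pg:mydb must already exist
.PP
.Vb 9
\&  # first load: create the table, skip the existence check, commit in batches
\&  stag\-ir.pl \-create \-insertonly \-transaction_size 1000 \e
\&      \-r person \-k social_security_no \-d Pg:mydb myrecords.xml
\&
\&  # later load: keep nodes already indexed, add only IDs not yet in the db
\&  stag\-ir.pl \-newonly \-r person \-k social_security_no \-d Pg:mydb more\-records.xml
\&
\&  # full reload: delete the existing data for the relation, then load again
\&  stag\-ir.pl \-clear \-r person \-k social_security_no \-d Pg:mydb myrecords.xml
.Ve
.PP
Because \-insertonly fails on a duplicate \s-1ID\s0, \-newonly or the default
clobber mode is likely the safer choice whenever the relation may
already hold data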