.\" Automatically generated by Pod::Man 2.28 (Pod::Simple 3.29) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{ . if \nF \{ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" ======================================================================== .\" .IX Title "DBIx::DBStag::Cookbook 3pm" .TH DBIx::DBStag::Cookbook 3pm "2016-05-29" "perl v5.22.2" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" .Vb 1 \& DBIx::DBStag::Cookbook \- building and querying databases from XML .Ve .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 3 \& stag\-autoddl.pl \& stag\-storenode.pl \& selectall_xml.pl .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" This will give an outline of how to build a normalised relational database from \s-1XML\s0 source data, set up \s-1SQL\s0 templates, issue relational queries that return hierarchical results (as \s-1XML\s0 or as perl objects), and autogenerate a web query front end for this data. .PP Why would you want to do this? Well, it gives you the full power of the relational model and \s-1SQL,\s0 combined with the convenience of representations which allow for the nesting of data entities (\s-1SQL\s0 query results are typically flat relations which are inconvenient for complex hierarchical data). .PP The dataset we will use is the \s-1CIA\s0 World Factbook. .PP The web interface should end up looking something like this \- .SS "\s-1AUTOGENERATING A RELATIONAL DATABASE\s0" .IX Subsection "AUTOGENERATING A RELATIONAL DATABASE" Download \s-1CIA\s0 world factbook in \s-1XML\s0 format; this has kindly been made available by The University of Goettingen as part of their Mondial database project, see for details. .PP The actual \s-1XML\s0 file is available via .PP Or from .PP \fIPre-processing\fR .IX Subsection "Pre-processing" .PP We need to do some pre-processing of the \s-1XML\s0 to make it more database-friendly. This is necessitated by the way Stag handles attributes (Stag prefers \s-1XML\s0 documents that have a simple tree format). We also want to turn XXX_id fields into XXX_ciaid, because we prefer to use XXX_id for surrogate keys in the database. .PP .Vb 4 \& stag\-mogrify.pl \-w xml \-r \*(Aqs/text$/quant/\*(Aq\e \& \-r \*(Aqs/id$/ciaid/\*(Aq\e \& \-r \*(Aqs/(.*)\e\-//\*(Aq\e \& cia.xml > cia\-pp.xml .Ve .PP See also .PP \fIGenerating the \s-1SQL DDL\s0\fR .IX Subsection "Generating the SQL DDL" .PP Next we generate the \s-1SQL \s0\fB\s-1CREATE TABLE\s0\fR statements .PP .Vb 1 \& stag\-autoddl.pl \-t cia\-pp2.xml cia\-pp.xml > cia\-schema.sql .Ve .PP This does further post-processing of the \s-1XML,\s0 to make it suitable for relational storage; see the file \fBcia\-pp2.xml\fR which is generated as a side-effect of running the above script. .PP Load the database (the following instructions assume you have postgresql on your localhost; please consult your \s-1DBMS\s0 manual if this is not the case) .PP .Vb 2 \& createdb cia \& psql \-a \-e cia < cia\-schema.sql >& create.log .Ve .PP (check there are no errors in the log file) .PP \fI\s-1LOAD THE DATA\s0\fR .IX Subsection "LOAD THE DATA" .PP Turn the processed \s-1XML\s0 into relations: .PP .Vb 1 \& stag\-storenode.pl \-d dbi:Pg:dbname=cia cia\-pp2.xml >& load.log .Ve .SS "\s-1FETCHING TREE DATA USING SQL\s0" .IX Subsection "FETCHING TREE DATA USING SQL" You can issue \s-1SQL\s0 queries (using optional stag-specific extensions) and get the results back in a hierarchical format such as \s-1XML\s0 .PP \fI\s-1SQL\s0 to \s-1XML\s0 via the command line\fR .IX Subsection "SQL to XML via the command line" .PP Fetch countries nested under continents: .PP .Vb 2 \& selectall_xml.pl \-d dbi:Pg:dbname=cia\e \& "SELECT * FROM continent INNER JOIN country ON (continent.name=country.continent)" .Ve .PP Or, edit a file containing the \s-1SQL \s0(the following query fetches data on countries bordering other countries on different continents) .PP .Vb 8 \& cat > myquery.sql \& select c1.*, c2.* \& from country AS c1 \& inner join borders on (c1.country_id = borders.country_id) \& inner join country AS c2 on (borders.country=c2.ciaid) \& where c1.continent != c2.continent \& order by c1.name, c2.name \& use nesting (set(c1(c2))); .Ve .PP (the final clause is a DBStag \s-1SQL\s0 extension \- it nests country c2 under country c1) .PP Then query for \s-1XML\s0 .PP .Vb 1 \& selectall_xml.pl \-d dbi:Pg:dbname=cia \-f myquery.sql > myresults.xml .Ve .PP \fI\s-1SQL\s0 to \s-1XML\s0 via the Interactive Query Shell\fR .IX Subsection "SQL to XML via the Interactive Query Shell" .PP Query the data using the stag query shell (qsh). You type in \s-1SQL\s0 queries, and get results back as \s-1XML \s0(or any other tree format, such as indented text or S\-Expressions). .PP The following can be cut and pasted directly onto the unix command line: .PP Simple query rooted at \fBcountry\fR: .PP .Vb 4 \& stag\-qsh \-d dbi:Pg:dbname=cia \& \el \& SELECT * FROM country INNER JOIN country_coasts USING (country_id) \& WHERE country.name = \*(AqFrance\*(Aq; .Ve .PP (type \eq to quit stag-qsh) .PP Or a more advanced query, still rooted at \fBcountry\fR .PP .Vb 9 \& stag\-qsh \-d dbi:Pg:dbname=cia \& \el \& SELECT * \& FROM country \& LEFT OUTER JOIN religions USING (country_id) \& LEFT OUTER JOIN languages USING (country_id) \& INNER JOIN continent ON (continent.name=country.continent) \& WHERE continent.ciaid = \*(Aqaustralia\*(Aq \& USE NESTING (set(country(religions)(languages)(continent))); .Ve .PP See DBIx::DBStag for more details on fetching hierarchical data from relational database .SS "\s-1USING TEMPLATES\s0" .IX Subsection "USING TEMPLATES" If you have a particular pattern of \s-1SQL\s0 you execute a lot, you can reuse this \s-1SQL\s0 by creating \fBtemplates\fR .PP \fICreating Templates\fR .IX Subsection "Creating Templates" .PP First create a place for your templates: .PP .Vb 1 \& mkdir ./templates .Ve .PP (do not change directory after this) .PP The following command specifies a colon-separated path for directories containing templates (all templates must end with .stg) .PP .Vb 1 \& setenv DBSTAG_TEMPLATE_DIRS ".:templates:/usr/local/share/sql/templates" .Ve .PP Auto-generate templates (you can customize these later): .PP .Vb 2 \& stag\-autoschema.pl \-w sxpr cia\-pp2.xml > cia\-stagschema.sxpr \& stag\-autotemplate.pl \-no_pp \-s cia \-dir ./templates cia\-stagschema.sxpr .Ve .PP The first command creates an S\-Expression representation of the Schema; the second generates \s-1SQL\s0 templates from these. .PP You may wish to examine a template: .PP .Vb 1 \& more templates/cia\-country.stg .Ve .PP You can hand generate as many templates as you like; see DBIx::DBStag::SQLTemplate for more details .PP For more example templates for this schema, see .PP \fIExecuting Templates from the Command Line\fR .IX Subsection "Executing Templates from the Command Line" .PP now execute a template from the command line: .PP .Vb 1 \& selectall_xml.pl \-d dbi:Pg:dbname=cia /cia\-country country_name=Austria .Ve .PP You should get back a tree (rooted in \fBcountry\fR), that looks similar to this: .PP .Vb 10 \& \& \& 3 \& federal republic \& 8023244 \& 83850 \& Austria \& 2.3 \& ... \& \& 1 \& German \& 100 \& 3 \& \& ... .Ve .PP \fIExecuting Templates with the Stag Query Shell\fR .IX Subsection "Executing Templates with the Stag Query Shell" .PP You can also do this interactively using qsh .PP First, we need to inform stag-qsh what the schema is. The schema is used to determine which templates are appropriate. Later we will discover how to set up a resources file, which will allow stag to infer the schema. .PP Call qsh from command line: .PP .Vb 1 \& stag\-qsh \-d dbi:Pg:dbname=cia \-s cia .Ve .PP Interactive perl/qsh: .PP .Vb 3 \& \el \& t cia\-country \& /borders_country=cid\-cia\-Germany .Ve .PP (do not leave spaces at the beginning of the line) .PP The above should fetch all countries bordering Germany .PP If we prefer objects over hierarchical formats such as \s-1XML,\s0 we can do this using perl. For example, to print the religions of spanish speaking countries: .PP Still in qsh (multi-line mode), type the following: .PP .Vb 5 \& # find all Spanish\-speaking countries \& $dataset = \& $dbh\->selectall_stag(\-template=>\*(Aqcia\-country\*(Aq,\-bind=>{languages_name=>\*(AqSpanish\*(Aq}); \& # get country objects from query results \& @lcountry = $dataset\->get_country; \& \& foreach $country (@lcountry) { \& printf("Country: %s\en Religions:%s\en", \& $country\->sget_name, \& join(\*(Aq & \*(Aq, \& map { \& $_\->get_name.\*(Aq \*(Aq.$_\->get_quant.\*(Aq%\*(Aq \& } $country\->get_religions)) \& } \& print "\en\enDone!\en"; \& \eq .Ve .PP See Data::Stag for more details on using Stag objects .SS "\s-1BUILDING A CGI/WEB INTERFACE\s0" .IX Subsection "BUILDING A CGI/WEB INTERFACE" We can construct a generic but powerful default cgi interface for our data, using ubiq.cgi, which should come with your distribution. .PP You may have to modify some of the directories below, depending on your web server set up (we assume Apache here). .PP We want to create the \s-1CGI,\s0 and give it access to our templates: .PP .Vb 6 \& mkdir /usr/local/httpd/cgi\-bin/cia \& cp templates/*.stg /usr/local/httpd/cgi\-bin/cia \& cp \`which ubiq.cgi\` /usr/local/httpd/cgi\-bin/cia \& chmod +x /usr/local/httpd/cgi\-bin/cia/ubiq.cgi \& mkdir /usr/local/httpd/cgi\-bin/cia/cache \& chmod 777 /usr/local/httpd/cgi\-bin/cia/cache .Ve .PP Set up the environment for the \s-1CGI\s0 script. It must be able to see the templates and the necessary perl libraries (if not installed system-wide) .PP .Vb 4 \& cat > /usr/local/httpd/cgi\-bin/cia/dbenv.pl \& $ENV{DBSTAG_DBIMAP_FILE} = "./resources.conf"; \& $ENV{DBSTAG_TEMPLATE_DIRS} = ".:./templates:/usr/local/share/sql/templates"; \& $ENV{STAGLIB} = "/users/me/lib/DBIx\-DBStag:/users/me/lib/stag"; .Ve .PP We must create a basic resources file, currently containing one db: .PP .Vb 2 \& cat > /usr/local/httpd/cgi\-bin/cia/resources.conf \& cia rdb Pg:cia schema=cia .Ve .PP Fields are whitespace delimited; do not leave a space before the initial 'cia' .PP (note that if you set \s-1DBSTAG_DBIMAP_FILE\s0 to the avove file on the command line, you can use the shortened name of \fBcia\fR instead of \&\fBdbi:Pg:dbname=cia\fR) .PP You should be able to use the interface via http://localhost/cgi\-bin/cia/ubiq.cgi .PP You can customize this by overriding some of the existing display functions; .PP .Vb 4 \& cat > /usr/local/httpd/cgi\-bin/cia/ubiq\-customize.pl \& # \-\-\- CUSTOM SETTINGS \& { \& no warnings \*(Aqredefine\*(Aq; \& \& *g_title = sub { \& "U * B * I * Q \- CIA World Factbook"; \& }; \& *short_intro = sub { \& "Demo interface to CIA World Factbook" \& }; \& add_initfunc(sub { \& $dbname = \*(Aqcia\*(Aq; \& $schema = \*(Aqcia\*(Aq; \& }); \& } .Ve .PP From here on you can customise the web interface, create new templates, integrate this with other data. Consult DBIx::DBStag and the script \fBubiq.cgi\fR for further details. .SS "\s-1FURTHER EXPLORATION\s0" .IX Subsection "FURTHER EXPLORATION" This cookbook has focused on an example with relatively simple \s-1XML,\s0 with only a few layers of nesting. .PP There is a more complex example you can download from the Mondial project site here: .PP This also integrates data on cities, which increases the depth of the \&\s-1XML\s0 tree. .PP You could use the tutorial above to try and turn this \s-1XML\s0 into a database. .SH "WEBSITE" .IX Header "WEBSITE" .SH "AUTHOR" .IX Header "AUTHOR" Chris Mungall .PP .Vb 1 \& cjm at fruitfly dot org .Ve .SH "COPYRIGHT" .IX Header "COPYRIGHT" Copyright (c) 2002 Chris Mungall .PP This module is free software. You may distribute this module under the same terms as perl itself