NAME¶
Bio::DB::GFF::Adaptor::dbi::pg_fts -- Database adaptor for a specific postgres
schema with a TSearch2 implementation
SYNOPSIS¶
#create new GFF database connection
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg_fts',
-dsn => 'dbi:Pg:dbname=worm');
#add full text indexing 'stuff'
#assumes that TSearch2 is available to PostgreSQL
#this will take a VERY long time for a reasonably large database
$db->install_TSearch2();
...some time later...
#we don't like full text searching...
$db->remove_TSearch2();
DESCRIPTION¶
This adaptor is based on Bio::DB::GFF::Adaptor::dbi::pg but it implements the
TSearch2 PostgreSQL contrib module for fast full text searching. To use this
module with your PostgreSQL GFF database, you need to make TSearch2 available
in the database.
To use this adaptor, follow these steps:
- Install TSearch2 contrib module for Pg
- Can be as easy as `sudo yum install postgresql-contrib`, or
you may need to recompile PostgreSQL to include it. See
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
<http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html>
for more details
- Load the TSearch2 functions to you database
-
% cat tsearch2.sql | psql <your database>
- Load your data using the pg adaptor:
-
% bp_pg_bulk_load_gff.pl -c -d yeast saccharomyces_cerevisiae.gff
or
% bp_load_gff.pl -c -d yeast -a dbi::pg saccharomyces_cerevisiae.gff
- Add GFF/TSearch2 specific modifications
- Execute a perl script like this one:
#!/usr/bin/perl -w
use strict;
use Bio::DB::GFF;
my $db = Bio::DB::GFF->new(
-adaptor => 'dbi::pg_fts',
-dsn => 'dbi:Pg:dbname=yeast',
-user => 'scott',
);
print "Installing TSearch2 columns...\n";
$db->install_TSearch2();
print "Done\n";
Note that this last step will take a long time. For a S. cerevisiae database
with 15K rows, it took over an hour on my laptop, and with a C. elegans
database (~10 million rows) it took well over a day.
If at some point you add more data you your database, you need to run a similar
script to the one above, only executing the
update_TSearch2() method.
Finally, if you want to remove the TSearch2 columns from your database and go
back to using the pg adaptor, you can execute a script like the one above,
only executing the
remove_TSearch2() method.
NOTES ABOUT TSearch2 SEARCHING¶
You should know a few things about how searching with TSearch2 works in the
GBrowse environment:
- 1.
- TSearch2 does not do wild cards, so you should encourage
your users not to use them. If wild cards are used, the adaptor will fall
back on an ILIKE search, which will be much slower.
- 2.
- However, TSearch2 does do 'word stemming'. That is, if you
search for 'copy', it will find 'copy', 'copies', and 'copied'.
- 3.
- TSearch2 does not do phrase searching; all of the terms in
the search string are ANDed together.
ACKNOWLEDGEMENTS¶
Special thanks to Russell Smithies and Paul Smale at AgResearch in New Zealand
for giving me their recipe for doing full text indexing in a GFF database.
BUGS¶
Please report bugs to the BioPerl and/or GBrowse mailing lists
(mailto:bioperl-l@lists.open-bio.org
<mailto:bioperl-l@lists.open-bio.org> and
mailto:gmod-gbrowse@lists.sourceforge.net
<mailto:gmod-gbrowse@lists.sourceforge.net> respectively).
SEE ALSO¶
Please see Bio::DB::GFF::Adaptor::dbi::pg for more information about tuning your
PostgreSQL server for GFF data, and for general information about GFF database
access, see Bio::DB::GFF.
AUTHOR¶
Scott Cain, cain@cshl.edu
APPENDIX¶
search_notes¶
Title : search_notes
Usage : @search_results = $db->search_notes("full text string",$limit)
Function: Search the notes for a text string, using PostgreSQL TSearch2
Returns : array of results
Args : full text search string, and an optional row limit
Status : public
This is based on the mysql-specific method that makes use of the TSearch2
functionality in PosgreSQL's contrib directory. Given a search string, it
performs a full-text search of the notes table and returns an array of
results. Each row of the returned array is a arrayref containing the following
fields:
column 1 A Bio::DB::GFF::Featname object, for passing to segment()
column 2 The text of the note
column 3 A relevance score.
make_features_by_name_where_part¶
Title : make_features_by_name_where_part
Function: constructs a TSearch2-compliant WHERE clause for a name search
Status : protected
install_TSearch2¶
Title : install_TSearch2
Function: installs schema modifications for use with TSearch2
Usage : $db->install_TSearch2
Status : public
update_TSearch2¶
Title : update_TSearch2
Function: Updates TSearch2 columns
Usage : $db->update_TSearch2
Status : public
remove_TSearch2¶
Title : remove_TSearch2
Function: Removes TSearch2 columns
Usage : $db->remove_TSearch2
Status : public