NAME
Boulder::LocusLink - Fetch LocusLink data records as parsed Boulder Stones
SYNOPSIS
# parse a file of LocusLink records
$ll = new Boulder::LocusLink(-accessor=>'File',
                             -param => '/home/data/LocusLink/LL_tmpl');
while (my $s = $ll->get) {
    print $s->Identifier;
    print $s->Gene;
}
# parse flatfile records yourself
open (LL,"/home/data/LocusLink/LL_tmpl") or die "Cannot open LL_tmpl: $!";
local $/ = "*RECORD*";
while (<LL>) {
    my $s = Boulder::LocusLink->parse($_);
    # etc.
}
DESCRIPTION
Boulder::LocusLink provides retrieval and parsing services for NCBI LocusLink
records. It returns LocusLink entries in Stone format, allowing easy access to
the various fields and values. Boulder::LocusLink is a descendant of
Boulder::Stream, and provides a stream-like interface to a series of Stone
objects.
Access to LocusLink is provided by a single accessor, which gives access to a
local LocusLink database. When you create a new Boulder::LocusLink stream, you
provide the accessor, along with accessor-specific parameters that control
what entries to fetch. The accessor is:
- File
- This provides access to local LocusLink entries by reading
from a flat file (typically the Hs.dat file downloadable from NCBI's FTP
site). The stream will return a Stone corresponding to each of the entries
in the file, starting from the top of the file and working downward. The
parameter is the path to the local file.
It is also possible to parse a single LocusLink entry from a text string stored
in a scalar variable, returning a Stone object.
Boulder::LocusLink methods
This section lists the public methods that the
Boulder::LocusLink class
makes available.
- new()
-
# Local fetch via File
$ll=new Boulder::LocusLink(-accessor => 'File',
                           -param => '/data/LocusLink/Hs.dat');
The new() method creates a new Boulder::LocusLink stream on
the accessor provided. The only possible accessor is File. If
successful, the method returns the stream object. Otherwise it returns
undef.
new() takes the following arguments:
-accessor Name of the accessor to use
-param Parameters to pass to the accessor
Specify the accessor to use with the -accessor argument. If not
specified, it defaults to File.
-param is an accessor-specific argument. The only possibility is:
For File, the -param argument must be a string-valued
scalar, which will be interpreted as the path to the file to read
LocusLink entries from.
- get()
- The get() method is inherited from
Boulder::Stream, and simply returns the next parsed LocusLink
Stone, or undef if there is nothing more to fetch. It has the same
semantics as the parent class, including the ability to restrict access to
certain top-level tags.
- put()
- The put() method is inherited from the parent
Boulder::Stream class, and will write the passed Stone to standard output
in Boulder format. This means that it is currently not possible to write a
Boulder::LocusLink object back into LocusLink flatfile form.
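The three methods above can be combined as in the following minimal sketch. It assumes Boulder::LocusLink is installed and that put() can be called on a stream opened for reading, as the description of put() suggests. The helper names open_locuslink_stream and dump_records are hypothetical and not part of the module; the file path is the one used in the new() example and will differ on your system.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Boulder::LocusLink): open a stream on a
# local flat file. Returns undef if the file is unreadable, the module is
# not installed, or new() fails.
sub open_locuslink_stream {
    my ($path) = @_;
    return unless defined $path && -r $path;

    # Load the module at run time so that a missing installation yields
    # undef instead of a compile-time error.
    return unless eval { require Boulder::LocusLink; 1 };

    # new() is documented to return undef on failure; the eval also
    # catches the case where the accessor dies instead.
    my $stream = eval {
        Boulder::LocusLink->new(-accessor => 'File', -param => $path);
    };
    return $stream;
}

# Hypothetical helper: fetch up to $limit records with get() and write each
# one to standard output in Boulder format with put().
sub dump_records {
    my ($stream, $limit) = @_;
    my $count = 0;
    while (my $stone = $stream->get) {
        $stream->put($stone);
        last if ++$count >= $limit;
    }
    return $count;
}

# Example path from the new() documentation; adjust for your system.
my $stream = open_locuslink_stream('/data/LocusLink/Hs.dat');
if ($stream) {
    my $n = dump_records($stream, 3);
    print STDERR "wrote $n record(s)\n";
}
else {
    print STDERR "could not open a LocusLink stream\n";
}
```

Checking the return value of the helper before calling get() avoids calling a method on undef when the file or the module is missing.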
The tags returned by the parsing operation are taken from the names shown in the
flat file Hs.dat, since the database provider has not yet supplied a better
description of them.
These are tags that appear at the top level of the parsed LocusLink entry.
- Identifier
- The LocusLink identifier of this entry. Identifier is a
single-value tag.
Example:
my $identifierNo = $s->Identifier;
- Current_locusid
- If a locus has been merged with another, Current_locusid
contains the previous LOCUSID line. (This is a bit confusing; it should
arguably be called "previous_locusid", but this is how it is defined in the
NCBI README file.)
Example:
my $prevlocusid=$s->Current_locusid;
- Organism
- Source species, based on NCBI's Taxonomy.
Example:
my $theorganism=$s->Organism;
- Status
- Type of reference sequence record. "PROVISIONAL" means that the
record was generated automatically from an existing GenBank record and
information stored in the LocusLink database, with no curation.
"REVIEWED" means that it was generated from the most representative
complete GenBank sequence, or a merge of GenBank sequences, and from
information stored in the LocusLink database.
Example:
my $thestatus=$s->Status;
- LocAss
- A complex record made up of the following fields:
LOCUS_STRING
NM        The value in the LOCUS field of the RefSeq record
NP        The RefSeq accession number for an mRNA record
PRODUCT   The name of the product of this transcript
TRANSVAR  A variant-specific description
ASSEMBLY  The GenBank accession used to assemble the RefSeq record
Example:
my $theprod=$s->LocAss->Product;
- AccProt
- A complex record made up of the following fields:
ACCNUM  Nucleotide sequence accession number
TYPE    e=EST, m=mRNA, g=Genomic
PROT    Set of PID values for the coding region or regions annotated on the
nucleotide record. The first value is the PID (an integer or null), then
either MMDB or na, separated from the PID by a |. If MMDB is present, it
indicates that structure data are available for a protein related to the
protein referenced by the PID.
Example:
my $theprot=$s->AccProt->Prot;
- OFFICIAL_SYMBOL
- The symbol used for gene reports, validated by the appropriate
nomenclature committee.
- PREFERRED_SYMBOL
- Interim symbol used for display.
- OFFICIAL_GENE_NAME
- The gene description used for gene reports, validated by the
appropriate nomenclature committee. If the symbol is official, the gene name
will be official. No records will have both official and interim
nomenclature.
- PREFERRED_GENE_NAME
- Interim gene name used for display.
- PREFERRED_PRODUCT
- The name of the product used in the RefSeq record.
- ALIAS_SYMBOL
- Other symbols associated with this gene.
- ALIAS_PROT
- Other protein names associated with this gene.
- PhenoTable
- A complex record made up of Phenotype and Phenotype_ID.
- SUmmary
- Unigene
- Omim
- Chr
- Map
- STS
- ECNUM
- ButTable BUTTON LINK
- DBTable DB_DESCR DB_LINK
- PMID
- A subset of publications associated with this locus, given as a
comma-separated list of PubMed unique identifiers.
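Two of the tags above carry packed values: PMID is a comma-separated list, and the PROT field of AccProt joins a PID to either MMDB or na with a |. The sketch below shows one way to unpack them in plain Perl. The helpers split_pmids and parse_prot are hypothetical and not part of Boulder::LocusLink. The sample values are made up for illustration. The sketch also assumes the tags return the raw strings as described, and that a "null" PID appears either as an empty string or as the literal word null.

```perl
use strict;
use warnings;

# Hypothetical helper: split a comma-separated PMID value into a list of
# identifiers, trimming surrounding whitespace and dropping empty items.
sub split_pmids {
    my ($value) = @_;
    return () unless defined $value && length $value;
    my @ids = split /\s*,\s*/, $value;
    for (@ids) { s/^\s+//; s/\s+$//; }
    return grep { length } @ids;
}

# Hypothetical helper: split a PROT value of the form "PID|MMDB" or "PID|na".
# Returns a hash reference with the PID (undef when null) and a flag that is
# 1 when MMDB is present, meaning structure data are available.
sub parse_prot {
    my ($value) = @_;
    my ($pid, $flag) = split /\|/, $value, 2;

    # Assumption: a null PID is an empty string or the literal "null".
    $pid = undef if !defined $pid || $pid eq '' || $pid eq 'null';

    return {
        pid           => $pid,
        has_structure => (defined $flag && $flag eq 'MMDB') ? 1 : 0,
    };
}

# Made-up sample values, for illustration only.
my @pmids = split_pmids('10000001, 10000002,10000003');
print join(' ', @pmids), "\n";

my $prot = parse_prot('1234567|MMDB');
printf "pid=%s structure=%d\n", $prot->{pid}, $prot->{has_structure};
```

With a parsed Stone $s, the same helpers could presumably be applied to $s->PMID and $s->AccProt->Prot, provided those calls return the strings in the form described above.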
SEE ALSO
Boulder, Boulder::Blast, Boulder::Genbank
AUTHOR
Lincoln Stein <lstein@cshl.org>. Luca I.G. Toldo
<luca.toldo@merck.de>
Copyright (c) 1997 Lincoln D. Stein
Copyright (c) 1999 Luca I.G. Toldo
This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of
warranty.