NAME
Boulder::Unigene - Fetch Unigene data records as parsed Boulder Stones
SYNOPSIS
use Boulder::Unigene;

# parse a file of Unigene records
my $ug = Boulder::Unigene->new(-accessor => 'File',
                               -param    => '/data/unigene/Hs.dat');
while (my $s = $ug->get) {
    print $s->Identifier;
    print $s->Gene;
}

# parse flatfile records yourself
open(my $fh, '<', '/data/unigene/Hs.dat')
    or die "Cannot open /data/unigene/Hs.dat: $!";
local $/ = "*RECORD*";    # read one record per iteration
while (my $record = <$fh>) {
    my $s = Boulder::Unigene->parse($record);
    # etc.
}
close $fh;
DESCRIPTION
Boulder::Unigene provides retrieval and parsing services for NCBI Unigene
records. It returns Unigene entries in Stone format, allowing easy access to
the various fields and values. Boulder::Unigene is a descendant of
Boulder::Stream, and provides a stream-like interface to a series of Stone
objects.
Access to Unigene is provided by a single accessor, which reads from a
local Unigene database. When you create a new Boulder::Unigene stream, you
provide the accessor name, along with accessor-specific parameters that
control which entries to fetch. The accessor is:
- File
- This provides access to local Unigene entries by reading from a flat file
(typically the Hs.dat file downloadable from NCBI's FTP site). The stream will
return a Stone corresponding to each of the entries in the file, starting
from the top of the file and working downward. The parameter is the path
to the local file.
It is also possible to parse a single Unigene entry from a text string stored in
a scalar variable, returning a Stone object.
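The record-at-a-time reading that feeds single-entry parsing can be sketched in core Perl alone. This is a minimal sketch, not the module's own implementation: the helper name read_records and the sample text are hypothetical, and the "*RECORD*" separator is simply the one used in the synopsis.

```perl
use strict;
use warnings;

# Hypothetical helper: split a filehandle into raw record strings
# using a custom input record separator.
sub read_records {
    my ($fh, $separator) = @_;
    local $/ = $separator;            # restored automatically on scope exit
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;                 # chomp strips the current value of $/
        next unless $chunk =~ /\S/;   # skip chunks that are only whitespace
        push @records, $chunk;
    }
    return @records;
}

# Made-up stand-in text, not real Unigene content.
my $sample = "entry one\n*RECORD*\nentry two\n*RECORD*\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my @records = read_records($fh, '*RECORD*');
close $fh;

printf "%d records read\n", scalar @records;
```

Each element of @records could then be passed to Boulder::Unigene->parse, as in the synopsis. Localizing $/ inside the helper keeps the change from leaking into other code that reads line by line.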
Boulder::Unigene methods
This section lists the public methods that the Boulder::Unigene class
makes available.
- new()
-
# Local fetch via File
my $ug = Boulder::Unigene->new(-accessor => 'File',
                               -param    => '/data/unigene/Hs.dat');
The new() method creates a new Boulder::Unigene stream on the
accessor provided. The only available accessor is File. If
successful, the method returns the stream object. Otherwise it returns
undef.
new() takes the following arguments:
-accessor Name of the accessor to use
-param Parameters to pass to the accessor
Specify the accessor to use with the -accessor argument. If not
specified, it defaults to File.
-param is an accessor-specific argument. For File, the -param
argument must be a string-valued scalar, which is interpreted as the
path to the file from which to read Unigene entries.
- get()
- The get() method is inherited from Boulder::Stream, and
simply returns the next parsed Unigene Stone, or undef if there is nothing
more to fetch. It has the same semantics as the parent class, including
the ability to restrict access to certain top-level tags.
- put()
- The put() method is inherited from the parent Boulder::Stream
class, and will write the passed Stone to standard output in Boulder
format. This means that it is currently not possible to write a
Boulder::Unigene object back into Unigene flatfile form.
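Because new() is documented as returning undef on failure, callers may want to check the result before calling get(). The following is a sketch under stated assumptions: the path is the one from the synopsis, the $status variable is purely illustrative, and the eval guards are there only so the snippet also runs on a machine without the module or the data file.

```perl
use strict;
use warnings;

my $path = '/data/unigene/Hs.dat';   # path from the synopsis; adjust locally

my $status;
if (!eval { require Boulder::Unigene; 1 }) {
    $status = 'Boulder::Unigene is not installed';
}
elsif (!-r $path) {
    $status = "cannot read $path";
}
else {
    # eval guards against the constructor dying instead of returning undef
    my $ug = eval {
        Boulder::Unigene->new(-accessor => 'File', -param => $path);
    };
    $status = defined $ug ? 'stream opened' : 'new() did not return a stream';
}
print "$status\n";
```

Once the stream is open, the while (my $s = $ug->get) loop from the synopsis applies unchanged.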
The tags returned by the parsing operation are taken from the names used in
the Hs.dat flat file, since the database producer has not yet provided a
better description of them.
These are tags that appear at the top level of the parsed Unigene entry.
- Identifier
- The Unigene identifier of this entry. Identifier is a single-value tag.
Example:
my $identifierNo = $s->Identifier;
- Title
- The Unigene title for this entry.
Example:
my $titledef=$s->Title;
- Gene
- The gene associated with this Unigene entry.
Example:
my $thegene=$s->Gene;
- Cytoband
- The cytological band position of this entry.
Example:
my $thecytoband=$s->Cytoband;
- Counts
- The number of ESTs in this record.
Example:
my $thecounts=$s->Counts;
- LocusLink
- The ID of the LocusLink entry associated with this record.
Example:
my $thelocuslink=$s->LocusLink;
- Chromosome
- A list of the chromosome numbers to which this entry has been linked.
Example:
my @theChromosome=$s->Chromosome;
- ACC
- NAME
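The difference between reading a single-value tag and a list-valued tag such as Chromosome can be illustrated with a hand-built Stone. This sketch assumes the Stone module that ships with Boulder is installed, and that its insert() method appends values and its tag accessors return the first value in scalar context and all values in list context. The identifier and chromosome values are made up.

```perl
use strict;
use warnings;

my ($first, @all);
if (eval { require Stone; 1 }) {
    # Hand-built stand-in for a parsed entry; the values are hypothetical.
    my $s = Stone->new;
    $s->insert(Identifier => 'Hs.0000');
    $s->insert(Chromosome => '1');
    $s->insert(Chromosome => 'X');

    $first = $s->Chromosome;   # scalar context: assumed to give the first value
    @all   = $s->Chromosome;   # list context: assumed to give every value
    print "first=$first all=@all\n";
}
else {
    print "Stone module not installed; skipping\n";
}
```

Assigning to an array, as in the Chromosome example above, is what selects list context.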
The TXMAP tag points to a Stone record that contains multiple subtags. Each
subtag is the name of a feature which points, in turn, to a Stone that
describes the feature's location and other attributes.
Each feature will contain one or more of the following subtags:
- MARKER
- RHPANEL
- ORG
- PROTID
- PCT
- ALN
- ACC
- NID
- PID
- CLONE
- END
- LID
SEE ALSO
Boulder, Boulder::Blast, Boulder::Genbank
AUTHOR
Lincoln Stein <lstein@cshl.org>. Luca I.G. Toldo
<luca.toldo@merck.de>
Copyright (c) 1997 Lincoln D. Stein
Copyright (c) 1999 Luca I.G. Toldo
This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of
warranty.