NAME
WordNet::QueryData - direct perl interface to WordNet database
SYNOPSIS
  use WordNet::QueryData;

  my $wn = WordNet::QueryData->new( noload => 1);

  print "Synset: ", join(", ", $wn->querySense("cat#n#7", "syns")), "\n";
  print "Hyponyms: ", join(", ", $wn->querySense("cat#n#1", "hypo")), "\n";
  print "Parts of Speech: ", join(", ", $wn->querySense("run")), "\n";
  print "Senses: ", join(", ", $wn->querySense("run#v")), "\n";
  print "Forms: ", join(", ", $wn->validForms("lay down#v")), "\n";
  print "Noun count: ", scalar($wn->listAllWords("noun")), "\n";
  print "Antonyms: ", join(", ", $wn->queryWord("dark#n#1", "ants")), "\n";
DESCRIPTION
WordNet::QueryData provides a direct interface to the WordNet database files. It
requires the WordNet package (http://www.cogsci.princeton.edu/~wn/). It allows
the user direct access to the full WordNet semantic lexicon. All parts of
speech are supported and access is generally very efficient because the index
and morphological exclusion tables are loaded at initialization. The module can
optionally be used to load the indexes into memory for extra-fast lookups.
USAGE
LOCATING THE WORDNET DATABASE
To use QueryData, you must tell it where your WordNet database is. There are two
ways you can do this: 1) by setting the appropriate environment variables, or
2) by passing the location to QueryData when you invoke the "new"
function.
QueryData knows about two environment variables, WNHOME and WNSEARCHDIR. If
WNSEARCHDIR is set, QueryData looks for WordNet data files there. Otherwise,
QueryData looks for WordNet data files in WNHOME/dict (WNHOME\dict on a PC).
If WNHOME is not set, it defaults to "/usr/local/WordNet-3.0" on
Unix and "C:\Program Files\WordNet\3.0" on a PC. Normally, all you
have to do is to set the WNHOME variable to the location where you unpacked
your WordNet distribution. The database files are normally unpacked to the
"dict" subdirectory.
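
The search order described above can be sketched in plain Perl. This is only an illustration of the documented behaviour, not the module's own code: the helper name resolve_dict_dir and its two arguments are hypothetical, and treating an empty variable as "not set" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: mirrors the documented search order only.
# $env is a hash reference standing in for %ENV; $is_windows selects
# the PC default and path separator.
sub resolve_dict_dir {
    my ($env, $is_windows) = @_;

    # 1. WNSEARCHDIR, when set, is used as-is.
    return $env->{WNSEARCHDIR}
        if defined $env->{WNSEARCHDIR} && length $env->{WNSEARCHDIR};

    # 2. Otherwise use WNHOME, or the documented default when it is unset.
    my $home = $env->{WNHOME};
    if (!defined $home || !length $home) {
        $home = $is_windows
            ? 'C:\\Program Files\\WordNet\\3.0'
            : '/usr/local/WordNet-3.0';
    }

    # 3. The data files are expected in the "dict" subdirectory of WNHOME.
    my $sep = $is_windows ? '\\' : '/';
    return $home . $sep . 'dict';
}

print resolve_dict_dir(\%ENV, $^O eq 'MSWin32'), "\n";
```

Printing the resolved directory like this can help when checking which location a given environment would lead QueryData to.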
You can also pass the location of the database files directly to QueryData. To
do this, pass the location to "new":
  my $wn = WordNet::QueryData->new("/usr/local/wordnet/dict");
You can instead call the constructor with a hash of params, as in:
  my $wn = WordNet::QueryData->new(
      dir     => "/usr/local/wordnet/dict",
      verbose => 0,
      noload  => 1
  );
When calling "new" in this fashion, two additional arguments are
supported: "verbose" will output debugging information, and
"noload" will cause the object to *not* load the indexes at startup.
CACHING VERSUS NOLOAD
The "noload" option results in data being retrieved using a dictionary
lookup rather than caching the indexes in RAM. This method yields near-immediate
startup but *slightly* (though less than you might think) longer lookup
times. For the curious, here are some profile data for each method on a
dual-core Intel Mac, in seconds, averaged over 10000 iterations:
Caching versus noload times in seconds

                                      noload => 1    noload => 0
  ------------------------------------------------------------------
  new()                               0.00001        2.55
  queryWord("descending")             0.0009         0.0001
  querySense("sunset#n#1", "hype")    0.0007         0.0001
  validForms ("lay down#2")           0.0004         0.0001

Obviously the new() comparison is not very useful, because nothing is
happening with the constructor in the case of noload => 1. Similarly,
lookups with caching are basically just hash lookups, and therefore very fast.
The lookup times for noload => 1 illustrate the tradeoff between caching at
new() time and using dictionary lookups.
Because of the lookup speed increase when noload => 0, many users will find
it useful to set noload to 1 during development cycles, and to 0 when RAM is
less of a concern than speed. The bottom line is that noload => 1 saves you
over 2 seconds of startup time, and costs you about 0.0005 seconds per lookup.
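
To put the tradeoff in concrete terms, the figures above imply a break-even point: the number of lookups after which the one-time cost of loading the indexes has been repaid. The sketch below works that out from the table's new() time and the approximate per-lookup cost quoted above. The function name break_even_lookups is hypothetical, and real timings will vary by machine.

```perl
use strict;
use warnings;

# Figures taken from the timings above (seconds).
my $startup_cost_cached = 2.55;    # new() with noload => 0
my $extra_per_lookup    = 0.0005;  # approximate added cost per lookup with noload => 1

# Number of lookups at which loading the indexes up front pays for itself,
# rounded to the nearest whole lookup.
sub break_even_lookups {
    my ($startup, $per_lookup) = @_;
    return sprintf('%.0f', $startup / $per_lookup);
}

printf "Break-even at about %d lookups\n",
    break_even_lookups($startup_cost_cached, $extra_per_lookup);
```

With these numbers the division is 2.55 / 0.0005, or roughly 5100 lookups. A script that performs far fewer lookups than that per run is likely better off with noload => 1.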
QUERYING THE DATABASE
There are two primary query functions, 'querySense' and 'queryWord'. querySense
accesses semantic (sense to sense) relations; queryWord accesses lexical (word
to word) relations. The majority of relations are semantic. Some relations,
including "also see", antonym, pertainym, "participle of verb", and derived
forms, are lexical. See the following WordNet documentation for additional
information:
http://wordnet.princeton.edu/man/wninput.5WN#sect3
Both functions take as their first argument a query string, which is one of
three types:
(1) word (e.g. "dog")
(2) word#pos (e.g. "house#n")
(3) word#pos#sense (e.g. "ghostly#a#1")
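
The three types differ only in how many "#"-separated fields are present. As a rough illustration, the following sketch splits a query string and reports its type. The helper parse_query is hypothetical, does not consult WordNet, and assumes the word itself contains no "#" character.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a query string on "#" and reports which
# of the three types it is. No validation against WordNet is done.
sub parse_query {
    my ($query) = @_;
    my ($word, $pos, $sense) = split /#/, $query, 3;
    my $type = defined $sense ? 3
             : defined $pos   ? 2
             :                  1;
    return { word => $word, pos => $pos, sense => $sense, type => $type };
}

for my $q ('dog', 'house#n', 'ghostly#a#1') {
    my $parsed = parse_query($q);
    printf "%-12s type (%d)\n", $q, $parsed->{type};
}
```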
Types (1) or (2) passed to querySense or queryWord will return a list of
possible query strings at the next level of specificity. When type (3) is
passed to querySense or queryWord, it requires a second argument, a relation.
Relations generally only work with one function or the other, though some
relations can be either semantic or lexical; hence they may work for both
functions. Below is a list of known relations, grouped according to the
function they're most likely to work with:

  queryWord
  ---------
  also  - also see
  ants  - antonyms
  deri  - derived forms (nouns and verbs only)
  part  - participle of verb (adjectives only)
  pert  - pertainym (pertains to noun) (adjectives only)
  vgrp  - verb group (verbs only)

  querySense
  ----------
  also  - also see
  glos  - word definition
  syns  - synset words
  hype  - hypernyms
  inst  - instance of
  hypes - hypernyms and "instance of"
  hypo  - hyponyms
  hasi  - has instance
  hypos - hyponyms and "has instance"
  mmem  - member meronyms
  msub  - substance meronyms
  mprt  - part meronyms
  mero  - all meronyms
  hmem  - member holonyms
  hsub  - substance holonyms
  hprt  - part holonyms
  holo  - all holonyms
  attr  - attributes (?)
  sim   - similar to (adjectives only)
  enta  - entailment (verbs only)
  caus  - cause (verbs only)
  domn  - domain - all
  dmnc  - domain - category
  dmnu  - domain - usage
  dmnr  - domain - region
  domt  - member of domain - all (nouns only)
  dmtc  - member of domain - category (nouns only)
  dmtu  - member of domain - usage (nouns only)
  dmtr  - member of domain - region (nouns only)

When called in this manner, querySense and queryWord will return a list of
related words/senses. Note that as of WordNet 2.1, many hypernyms have become
"instance of" and many hyponyms have become "has instance."
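
Putting the pieces together, a bare word can be narrowed to type (2) and then type (3) strings before a relation is applied. The sketch below does this for the "glos" relation. The helper glosses_for is hypothetical, and it assumes that "glos" returns the definition as the first element of the returned list.

```perl
use strict;
use warnings;

# Hypothetical helper: maps every word#pos#sense string for a bare word
# to its gloss. $wn is expected to be a WordNet::QueryData object.
sub glosses_for {
    my ($wn, $word) = @_;
    my %gloss;
    for my $word_pos ($wn->querySense($word)) {        # type (1) -> type (2)
        for my $sense ($wn->querySense($word_pos)) {   # type (2) -> type (3)
            # Assumption: the definition is the first element returned.
            ($gloss{$sense}) = $wn->querySense($sense, 'glos');
        }
    }
    return %gloss;
}

# Example use (requires WordNet::QueryData and the WordNet data files):
#   use WordNet::QueryData;
#   my $wn    = WordNet::QueryData->new(noload => 1);
#   my %gloss = glosses_for($wn, 'cat');
#   print "$_: $gloss{$_}\n" for sort keys %gloss;
```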
Note that querySense and queryWord use type (3) query strings in different ways.
A type (3) string passed to querySense specifies a synset. A type (3) string
passed to queryWord specifies a specific sense of a specific word.
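
The difference can be seen by passing the same type (3) string to both functions. In this sketch, which uses the "syns" and "ants" relations from the SYNOPSIS, the helper name synset_and_antonyms is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: queries one type (3) string both ways.
# $wn is expected to be a WordNet::QueryData object.
sub synset_and_antonyms {
    my ($wn, $sense) = @_;
    return {
        # Semantic relation: the string names a whole synset.
        synset   => [ $wn->querySense($sense, 'syns') ],
        # Lexical relation: the string names one sense of one word.
        antonyms => [ $wn->queryWord($sense, 'ants') ],
    };
}

# Example use (requires WordNet::QueryData and the WordNet data files):
#   use WordNet::QueryData;
#   my $wn   = WordNet::QueryData->new(noload => 1);
#   my $info = synset_and_antonyms($wn, 'dark#n#1');
#   print "Synset: ",   join(", ", @{ $info->{synset} }),   "\n";
#   print "Antonyms: ", join(", ", @{ $info->{antonyms} }), "\n";
```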
OTHER FUNCTIONS
"validForms" accepts a type (1) or (2) query string. It returns a list
of all alternate forms (alternate spellings, conjugations, plural/singular
forms, etc.). The type (1) query returns alternates for all parts of speech
(noun, verb, adjective, adverb). WARNING: Only the first element of the list
returned by validForms is certain to be valid (i.e. recognized by WordNet).
The remaining elements may not be valid.
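
Given that warning, a cautious caller might keep only the first element. A minimal sketch, with the hypothetical helper name first_valid_form:

```perl
use strict;
use warnings;

# Hypothetical helper: returns only the first form reported by validForms,
# the one the warning above describes as certain to be valid.
# $wn is expected to be a WordNet::QueryData object.
sub first_valid_form {
    my ($wn, $query) = @_;
    my ($first) = $wn->validForms($query);   # list assignment keeps element 0
    return $first;                           # undef if nothing was returned
}

# Example use (requires WordNet::QueryData and the WordNet data files):
#   use WordNet::QueryData;
#   my $wn   = WordNet::QueryData->new(noload => 1);
#   my $form = first_valid_form($wn, 'lay down#v');
#   print defined $form ? "$form\n" : "no form found\n";
```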
"listAllWords" accepts a part of speech and returns the full list of
words in the WordNet database for that part of speech.
"level" accepts a type (3) query string and returns a distance (not
necessarily the shortest or longest) to the root in the hypernym directed
acyclic graph.
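
To see why such a distance need not be the shortest or longest, consider one way it could be computed: repeatedly following the first parent reported by the "hypes" relation. The sketch below is an assumption about a possible approach, not a statement of how "level" is implemented. The helper name depth_via_first_hypernym is hypothetical, and it counts the starting sense itself, so a root sense yields 1.

```perl
use strict;
use warnings;

# Hypothetical helper: walks up the hypernym graph by always taking the
# first parent. A sense with several parents has several paths to the
# root, so the count depends on which parent happens to be listed first.
# $wn is expected to be a WordNet::QueryData object.
sub depth_via_first_hypernym {
    my ($wn, $sense) = @_;
    my %seen;                 # guards against revisiting a sense
    my $depth = 0;
    while (defined $sense && !$seen{$sense}++) {
        $depth++;
        ($sense) = $wn->querySense($sense, 'hypes');   # first parent, or undef
    }
    return $depth;
}

# Example use (requires WordNet::QueryData and the WordNet data files):
#   use WordNet::QueryData;
#   my $wn = WordNet::QueryData->new(noload => 1);
#   print depth_via_first_hypernym($wn, 'cat#n#1'), "\n";
```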
"offset" accepts a type (3) query string and returns the binary offset
of that sense's location in the corresponding data file.
"tagSenseCnt" accepts a type (2) query string and returns the
tagsense_cnt value for that lemma: "number of senses of lemma that are
ranked according to their frequency of occurrence in semantic concordance
texts."
"lexname" accepts a type (3) query string and returns the lexname of
the sense; see WordNet lexnames man page for more information.
"frequency" accepts a type (3) query string and returns the frequency
count of the sense from tagged text; see WordNet cntlist man page for more
information.
See test.pl for additional example usage.
NOTES
Requires access to WordNet database files (data.noun/noun.dat,
index.noun/noun.idx, etc.)
COPYRIGHT
Copyright 2000-2005 Jason Rennie. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.
SEE ALSO
perl(1)
http://wordnet.princeton.edu/
http://people.csail.mit.edu/~jrennie/WordNet/