NAME
ra-retrieve - retrieve files that match a query for use with the Remembrance
Agent software
SYNOPSIS
ra-retrieve [--version] [-v] [-d] <base-dir> [--docnum <docnum>]
DESCRIPTION
ra-index and ra-retrieve make up the Savant search engine, an
information retrieval engine designed as a back-end for the Remembrance Agent
(RA). Given a collection of the user's accumulated email, usenet news
articles, papers, saved HTML files and other text notes, the RA attempts to
find those documents which are most relevant to the user's current context.
That is, it searches this collection of text for the documents which bear the
highest word-for-word similarity to the text the user is currently editing, in
the hope that they will also bear high conceptual similarity and thus be
useful to the user's current work. With the Emacs front-end, these suggestions
are continuously displayed in a small buffer at the bottom of the user's
window. If a suggestion looks useful, the full text can be retrieved with a
single command.
The Remembrance Agent works in two stages. First, the user's collection of text
documents is indexed into a database saved in a vector format. Second, once the
database has been created, the Remembrance Agent is run from Emacs, where it
periodically takes a sample of text from the working buffer and finds the
documents in the collection that are most similar to it. It summarizes the top
documents in a small Emacs window and allows you to retrieve the entire text of
any one with a keystroke. See the README file for information on using the
Emacs front-end.
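To make the idea of word-for-word similarity over a vector representation concrete, here is a minimal sketch in Python. It is not Savant's actual algorithm, whose weighting and index format are not described on this page. It uses plain term counts and cosine similarity as a stand-in, and the names term_vector, cosine and rank are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(text: str) -> Counter:
    """Map each lowercased word in the text to the number of times it occurs."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(context: str, documents: Dict[str, str], top: int = 5) -> List[Tuple[str, float]]:
    """Return up to `top` (name, score) pairs, most similar to the context first."""
    query = term_vector(context)
    scored = [(name, cosine(query, term_vector(body)))
              for name, body in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

In this sketch, the text being edited plays the role of the context, and each indexed document is scored against it. A real index would precompute the document vectors once rather than rebuilding them for every query.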
The RA is primarily designed as a proactive information provider that
continually gives you information that might be relevant to your current
environment. In this mode, ra-retrieve is run by a front-end and its
output is parsed into a more human-readable format. However, Savant can also
be used as a standard text and information retrieval search engine.
USAGE
The one required argument to ra-retrieve is <base-dir>, the directory
containing the index files created by ra-index. This starts an interactive
process that handles queries and returns the documents that best match each
query. With the -v option, a menu and other information are printed. Without
the -v option it is assumed that ra-retrieve has been run from a front-end, and
only minimal information is printed. The following commands are available:
- query <num-lines>
- Find the <num-lines> documents most relevant to a
query. The default is 5. Enter the text of the query, followed by a ^D (ASCII
04). If the query matches a predefined template, its fields are parsed
separately. For example, in Emacs mail mode the from, subject, date and
body of the message are all parsed individually. If no template matches,
the query is assumed to be plain text.
A query outputs up to <num-lines> summary lines followed by a period on a line
by itself. Each summary line contains a line number, a relevance number, a
document number, and a series of fields describing the document. The final
field is a comma-separated list of the words from the query that contributed
most to this document being chosen.
- retrieve <document number>
- Retrieve and print the document with the given document
number. The document number is the third field output by the query command.
The full text of the document is displayed. If the document is part of a
larger file, such as an email in an archive file, only that one document is
shown.
- loc-retrieve <document number>
- Retrieve and print the location of the document with the
given document number. Three values are displayed, each on its own line.
The first is the character offset of the beginning of the document. The
second is the character offset of the end of the document. The third is the
fully expanded filename of the file containing the document. This is
primarily so that front-ends can load the document and display it with their
own formatting.
- info
- Display version and database info.
- quit
- Quit.
- ?
- Display menu.
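For front-end authors, the exchange described by the query and loc-retrieve commands can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference client. The helper names are hypothetical. The sketch assumes UTF-8 text, a newline after the command line, offsets that count bytes, and an end offset that is exclusive. None of these details are specified on this page.

```python
from typing import List, NamedTuple, TextIO

EOT = b"\x04"  # ^D (ASCII 04), the query terminator named above


class Location(NamedTuple):
    start: int  # offset of the beginning of the document
    end: int    # offset of the end of the document
    path: str   # fully expanded filename


def frame_query(text: str, num_lines: int = 5) -> bytes:
    """Return the bytes for one query: command line, query text, then ^D.

    Assumption: UTF-8 encoding and a newline ending the command line.
    """
    return f"query {num_lines}\n".encode("utf-8") + text.encode("utf-8") + EOT


def read_summaries(stream: TextIO) -> List[str]:
    """Collect summary lines until a line that holds only a period."""
    lines = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == ".":
            break
        lines.append(line)
    return lines


def parse_location(reply: str) -> Location:
    """Parse the three lines printed by loc-retrieve."""
    start, end, path = reply.splitlines()[:3]
    return Location(int(start), int(end), path)


def read_document(loc: Location) -> bytes:
    """Read one document out of its containing file.

    Assumption: offsets count bytes and the end offset is exclusive.
    """
    with open(loc.path, "rb") as handle:
        handle.seek(loc.start)
        return handle.read(loc.end - loc.start)
```

A front-end would write the result of frame_query to the standard input of a running ra-retrieve process, pass its standard output to read_summaries, and then use parse_location and read_document to load a chosen document for its own display.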
OPTIONS
- -v
- Verbose mode. Print a menu and other info for running
without a front-end.
- -d
- Debug mode. Print not-so-useful information.
- --docnum <docnum>
- Print the contents of the specified document number and
exit. This option doesn't use as much memory as interactive mode, and is
useful for scripts that call this program.
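A script might call the --docnum form along these lines. This is a minimal sketch, assuming ra-retrieve is on the PATH and that the argument order follows the SYNOPSIS. The helper names docnum_command and fetch_document are hypothetical, and exit statuses are not documented on this page, so the error check is an assumption.

```python
import subprocess
from typing import List


def docnum_command(base_dir: str, docnum: int, binary: str = "ra-retrieve") -> List[str]:
    """Build the argument list in SYNOPSIS order: <base-dir> --docnum <docnum>."""
    return [binary, base_dir, "--docnum", str(docnum)]


def fetch_document(base_dir: str, docnum: int) -> bytes:
    """Run ra-retrieve once and return whatever it prints for the document.

    Assumption: a nonzero exit status signals failure (check=True raises then).
    """
    result = subprocess.run(
        docnum_command(base_dir, docnum),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the index directory name contains spaces.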
SEE ALSO
ra-index(1)
AUTHOR
Bradley Rhodes, MIT Media Lab. Please send comments and questions to
ra-bugs@media.mit.edu. New versions and updates can be found at
http://www.media.mit.edu/~rhodes/RA/
COPYRIGHT
All code included in versions up to and including 2.09:
Copyright (C) 2001 Massachusetts Institute of Technology.
All modifications subsequent to version 2.09 are copyright Bradley Rhodes or
their respective authors.
Developed by Bradley Rhodes at the Media Laboratory, MIT, Cambridge,
Massachusetts, with support from British Telecom and Merrill Lynch.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version. For commercial licensing under other terms, please consult the MIT
Technology Licensing Office.
This program may be subject to the following US and/or foreign patents
(pending): "Method and Apparatus for Automated, Context-Dependent
Retrieval of Information," MIT Case No. 7870TS. If any of these patents
are granted, royalty-free license to use this and derivative programs under
the GNU General Public License are hereby granted.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place - Suite 330, Boston, MA 02111-1307, USA.
BUGS
Dates are not currently indexed, so a query that tries to match on a date gets
no suggestions back.
Requires GNU make to compile.
The template structure isn't documented.