MARC2RIS(1) | RefDB Manual | MARC2RIS(1) |
NAME
marc2ris - converts MARC bibliographic data to the RIS format
SYNOPSIS
marc2ris [-e log-destination] [-h]
[-l log-level] [-L log-file] [-m]
[-o outfile] [-O outfile]
[-t input_type] [-u t|f] file
DESCRIPTION
marc2ris attempts to extract the information useful to RefDB from MARC datasets. MARC (Machine-Readable Cataloging) is a standard originating from the 1960s and is widely used by libraries and bibliographic agencies. Most libraries that offer Z39.50 access can provide the records in at least one MARC format (as with most other "standards", there are several to choose from). Currently the following MARC dialects are supported:
MARC21
This is an attempt to consolidate existing MARC variants
(mainly USMARC and CANMARC) and will most likely be the format supported by
all libraries in the near future. The format is described on the Library of
Congress MARC pages[1].
UNIMARC
This is the European equivalent of a standardization
attempt. The specification can be found here[2].
UKMARC
This format is fairly close to the USMARC variant and is
mainly used by libraries in the United Kingdom and in Ireland. Libraries
supporting this format may switch to MARC21 in the future. Unfortunately there
is no online description of this format, but this PDF document[3]
describes the main differences between USMARC and UKMARC.
OPTIONS
By default the script reads USMARC data from stdin and sends RIS data to stdout.
-e log-destination
log-destination can have the values 0, 1, or 2, or the
equivalent strings stderr, syslog, or file, respectively.
This value specifies where the log information goes. 0 (zero) means the
messages are sent to stderr. They are immediately available on the screen but
they may interfere with command output. 1 will send the output to the syslog
facility. Keep in mind that syslog must be configured to accept log messages
from user programs; see the syslog(8) man page for further information.
Unix-like systems usually save these messages in /var/log/user.log. 2 will
send the messages to a custom log file which can be specified with the
-L option.
-h
Displays help and usage screen, then exits.
-l log-level
Specify the priority up to which events are logged. This
is either a number between 0 and 7 or one of the strings emerg,
alert, crit, err, warning, notice,
info, debug, respectively (see also Log level definitions).
-1 disables logging completely. A low log level like 0 means that only
the most critical messages are logged. A higher log level means that less
critical events are logged as well. 7 will include debug messages. The latter
can be verbose and abundant, so you want to avoid this log level unless you
need to track down problems.
-L log-file
Specify the full path to a log file that will receive the
log messages. Typically this would be /var/log/refdba.
-m
Switch on additional MARC output. The output data will be
the RIS output interspersed with the source MARC data used to generate the
output. This is useful to fix conversion errors manually.
-o file
Send output to file. If file exists, its
contents will be overwritten.
-O file
Send output to file. If file exists, the
output will be appended.
-t input_type
Specify the MARC input type. The default is
MARC21. Other available types are UNIMARC and
UKMARC.
-u t|f
Request Unicode output if set to "t" (this is
the default). marc2ris attempts to convert the input data into Unicode (unless
the dataset explicitly states that it already uses Unicode). If the conversion
does not seem to work, set this to "f" as some MARC variants do not
state the character encoding explicitly.
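As a rough illustration of how these options combine, the following sketch converts a UNIMARC file, overwrites the output file, and logs warnings and more severe events to a custom log file. The file names records.mrc, records.ris and marc2ris.log are hypothetical placeholders. The script only prints the command line if marc2ris or the input file is not available.

```shell
#!/bin/sh
# Hypothetical file names, chosen for illustration only.
infile=records.mrc
outfile=records.ris
logfile=./marc2ris.log

# Build the command line from options documented above:
#   -t UNIMARC    input dialect
#   -e file       send log messages to a custom log file ...
#   -L logfile    ... at this path
#   -l warning    log events up to priority "warning"
#   -o outfile    write (and overwrite) the output file
set -- marc2ris -t UNIMARC -e file -L "$logfile" -l warning -o "$outfile" "$infile"

if command -v marc2ris >/dev/null 2>&1 && [ -r "$infile" ]; then
    "$@"
else
    # Dry run: show what would have been executed.
    echo "would run: $*"
fi
```

Replacing -o with -O would append to records.ris instead of overwriting it.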
CONFIGURATION
marc2ris evaluates the file marc2risrc to initialize itself.
Variable | Default | Comment |
outfile | (none) | The default output file name. |
outappend | t | Determines whether output is appended (t) to an existing file or overwrites (f) an existing file. |
unmapped | t | If set to t, unknown tags in the input data will be output following a <unmapped> tag; the resulting data can be inspected and then be sent through sed to strip off these additional lines. If set to f, unknown tags will be gracefully ignored. |
logfile | /var/log/med2ris.log | The full path of a custom log file. This is used only if logdest is set appropriately. |
logdest | 1 | The destination of the log information. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile. The latter needs a proper setting of logfile. |
loglevel | 6 | The log level up to which messages will be sent. A low setting (0) allows only the most important messages, a high setting (7) allows all messages including debug messages. -1 means nothing will be logged. |
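The sed step mentioned for the unmapped variable could look roughly like the sketch below. It assumes that each unknown tag is written on a single line that begins with the literal string <unmapped>. This page does not specify the exact layout, so inspect real output before relying on the pattern. The sample data and the file names with_unmapped.ris and clean.ris are invented for illustration.

```shell
#!/bin/sh
# Sample data, invented for illustration; the real layout of
# <unmapped> lines may differ from this assumption.
cat > with_unmapped.ris <<'EOF'
TY - BOOK
<unmapped>:Field(999): local data
AU - Knuth,Donald Ervin
EOF

# Delete every line that starts with the <unmapped> marker.
sed '/^<unmapped>/d' with_unmapped.ris > clean.ris
cat clean.ris
```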
DATA PROCESSING
The purpose of the MARC format is entirely different from the purpose of the RIS format, so you shouldn't be too surprised that the import of MARC data is somewhat rough around the edges. The filter apparently deals fine with quite a lot of datasets, but the following shortcomings are known (and more are likely to be discovered by the interested reader):
•Some fields, like 846, are currently ignored
completely. This, of course, is bound to change.
•Author names specified in the natural order, i.e.
something like First Middle Last, are not normalized due to the problems with
multiple middle or last names. Author names in the inverse order, i.e.
something like Last, First Middle, are normalized correctly in most cases.
Handling of non-European names is a matter of trial and error.
•Character set handling is somewhat limited. The output
data can use only the unaltered input character encoding or UTF-8.
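To make the inverse-order case concrete, here is a minimal sketch of the kind of normalization visible in the sample output shown later in this section ("Ershov, A. P." becomes "Ershov,A.P."). It is not the algorithm marc2ris actually uses. The function name normalize_author is hypothetical, and the three sed rules were inferred from those two sample names only.

```shell
#!/bin/sh
# Hypothetical helper: normalize a "Last, First Middle" name.
normalize_author() {
    printf '%s\n' "$1" | sed \
        -e 's/,[[:space:]]*$//' \
        -e 's/,[[:space:]]*/,/' \
        -e 's/\.[[:space:]]\{1,\}/./g'
    # Rule 1: drop a trailing comma.
    # Rule 2: drop the space after the comma that follows the last name.
    # Rule 3: drop spaces after the period of an initial.
}

normalize_author 'Ershov, A. P.'
normalize_author 'Knuth, Donald Ervin,'
```

Names in natural order, multiple last names, and non-European names are exactly the cases such simple rules do not cover.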
That said, there is still some hope. The -m command line option switches
on additional MARC output. That is, the generated output will contain
interspersed lines that show the contents of the original MARC fields used to
generate the following RIS line or lines. For example, the following output
snippet shows how marc2ris generated the author lines from the MARC
input:
<marc>empty author field (100)
<marc>:Author(Ind1): 1
<marc>:Author($a): Ershov, A. P.
<marc>:Author($b):
<marc>:Author($c):
<marc>:Author(Ind1): 1
<marc>:Author($a): Knuth, Donald Ervin,
<marc>:Author($b):
<marc>:Author($c):
AU - Ershov,A.P.
AU - Knuth,Donald Ervin
Once the output has been checked, the additional MARC lines can be stripped with grep:
~$ grep -v "<marc>" < withmarc.ris > womarc.ris
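The following self-contained sketch applies the same grep filter to a small sample and checks that the RIS lines survive while every <marc> line is gone. The sample is copied from the output snippet above. The file names match the command above and the variable names are arbitrary.

```shell
#!/bin/sh
# Small sample of -m output, taken from the snippet above.
cat > withmarc.ris <<'EOF'
<marc>:Author(Ind1): 1
<marc>:Author($a): Ershov, A. P.
AU - Ershov,A.P.
<marc>:Author(Ind1): 1
<marc>:Author($a): Knuth, Donald Ervin,
AU - Knuth,Donald Ervin
EOF

au_before=$(grep -c '^AU' withmarc.ris)

# Keep only the lines that do not contain the <marc> marker.
grep -v "<marc>" < withmarc.ris > womarc.ris

au_after=$(grep -c '^AU' womarc.ris)
# grep -c exits non-zero when it counts no matches, hence "|| true".
marc_left=$(grep -c "<marc>" womarc.ris || true)

echo "AU lines before: $au_before, after: $au_after, <marc> lines left: $marc_left"
```

Note that the unanchored pattern also removes any RIS line whose content happens to contain the string <marc>. Using "^<marc>" restricts the filter to lines that start with the marker.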
FILES
PREFIX/etc/refdb/marc2risrc
The global configuration file of marc2ris.
$HOME/.marc2risrc
The user configuration file of marc2ris.
SEE ALSO
RefDB (7), bib2ris (1), db2ris (1), en2ris (1), med2ris (1).
RefDB manual (local copy) PREFIX/share/doc/refdb-<version>/refdb-manual/index.html
RefDB manual (web) <http://refdb.sourceforge.net/manual/index.html>
RefDB on the web <http://refdb.sourceforge.net/>
AUTHOR
marc2ris was written by Markus Hoenicka <markus@mhoenicka.de>.
NOTES
1. Library of Congress MARC pages
2. here
3. PDF document
www.bl.uk/services/bibliographic/marcchange.pdf
2005-10-16 | RefDB Manual |