'\" t
.\" Title: marc2ris
.\" Author: [see the "Author" section]
.\" Generator: DocBook XSL Stylesheets v1.76.1
.\" Date: 2005-10-16
.\" Manual: RefDB Manual
.\" Source: RefDB Manual
.\" Language: English
.\"
.TH "MARC2RIS" "1" "2005\-10\-16" "RefDB Manual" "RefDB Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
marc2ris \- converts MARC bibliographic data to the RIS format
.SH "SYNOPSIS"
.HP \w'\fBmarc2ris\fR\ 'u
\fBmarc2ris\fR [\-e\ \fIlog\-destination\fR] [\-h] [\-l\ \fIlog\-level\fR] [\-L\ \fIlog\-file\fR] [\-m] [\-o\ \fIoutfile\fR] [\-O\ \fIoutfile\fR] [\-t\ \fIinput_type\fR] [\-u\ \fIt|f\fR] \fIfile\fR
.SH "DESCRIPTION"
.PP
marc2ris attempts to extract the information useful to RefDB from
MARC
datasets\&.
MARC
(MAchine\-Readable Cataloging) is a standard that originated in the 1960s and is widely used by libraries and bibliographic agencies\&. Most libraries that offer Z39\&.50 access can provide records in at least one
MARC
format (as with most other "standards", there are several to choose from)\&. Currently the following
MARC
dialects are supported:
.PP
\fBMARC21\fR
.RS 4
This is an attempt to consolidate existing MARC variants (mainly USMARC and CANMARC) and will most likely be the format supported by all libraries in the near future\&. The format is described on the
\m[blue]\fBLibrary of Congress MARC pages\fR\m[]\&\s-2\u[1]\d\s+2\&.
.RE
.PP
\fBUNIMARC\fR
.RS 4
This is the European equivalent of a standardization attempt\&. The specification can be found
\m[blue]\fBhere\fR\m[]\&\s-2\u[2]\d\s+2\&.
.RE
.PP
\fBUKMARC\fR
.RS 4
This format is fairly close to the USMARC variant and is mainly used by libraries in the United Kingdom and in Ireland\&. Libraries supporting this format may switch to MARC21 in the future\&. Unfortunately there is no online description of this format, but this
\m[blue]\fBPDF document\fR\m[]\&\s-2\u[3]\d\s+2
describes the main differences between USMARC and UKMARC\&.
.RE
.SH "OPTIONS"
.PP
By default marc2ris reads MARC21 data from stdin and sends RIS data to stdout\&.
.PP
\fB\-e\fR \fIlog\-destination\fR
.RS 4
log\-destination can have the values 0, 1, or 2, or the equivalent strings
\fIstderr\fR,
\fIsyslog\fR, or
\fIfile\fR, respectively\&. This value specifies where the log information is sent\&.
0
(zero) means the messages are sent to stderr\&. They are immediately available on the screen but they may interfere with command output\&.
1
will send the output to the syslog facility\&. Keep in mind that syslog must be configured to accept log messages from user programs; see the syslog(8) man page for further information\&. Unix\-like systems usually save these messages in
/var/log/user\&.log\&.
2
will send the messages to a custom log file which can be specified with the
\fB\-L\fR
option\&.
.RE
.PP
\fB\-h\fR
.RS 4
Displays the help and usage screen, then exits\&.
.RE
.PP
\fB\-l\fR \fIlog\-level\fR
.RS 4
Specify the priority up to which events are logged\&. This is either a number between
0
and
7
or one of the strings
\fIemerg\fR,
\fIalert\fR,
\fIcrit\fR,
\fIerr\fR,
\fIwarning\fR,
\fInotice\fR,
\fIinfo\fR,
\fIdebug\fR, respectively (see also Log level definitions)\&.
\fB\-1\fR
disables logging completely\&. A low log level like
0
means that only the most critical messages are logged\&. A higher log level means that less critical events are logged as well\&.
7
will include debug messages\&. These can be verbose and abundant, so avoid this log level unless you need to track down problems\&.
.RE
.PP
\fB\-L\fR \fIlog\-file\fR
.RS 4
Specify the full path to a log file that will receive the log messages\&. Typically this would be
/var/log/refdba\&.
.RE
.PP
\fB\-m\fR
.RS 4
Switch on additional MARC output\&. The output data will be the RIS output interspersed with the source MARC data used to generate the output\&. This is useful to fix conversion errors manually\&.
.RE
.PP
\fB\-o\fR \fIfile\fR
.RS 4
Send output to
\fIfile\fR\&. If
\fIfile\fR
exists, its contents will be overwritten\&.
.RE
.PP
\fB\-O\fR \fIfile\fR
.RS 4
Send output to
\fIfile\fR\&. If
\fIfile\fR
exists, the output will be appended\&.
.RE
.PP
\fB\-t\fR \fIinput_type\fR
.RS 4
Specify the MARC input type\&. The default is
\fIMARC21\fR\&. Other available types are
\fIUNIMARC\fR
and
\fIUKMARC\fR\&.
.RE
.PP
\fB\-u \fR\fB\fIt|f\fR\fR
.RS 4
Request Unicode output if set to "t" (this is the default)\&. marc2ris attempts to convert the input data into Unicode (unless the dataset explicitly states that it already uses Unicode)\&. If the conversion does not seem to work, set this to "f" as some MARC variants do not state the character encoding explicitly\&.
.RE
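.PP
The options above can be combined\&. The following invocations are a sketch only: the file names records\&.mrc and records\&.ris and the log file path are hypothetical, and only options described in this section are used\&.
.sp
.if n \{\
.RS 4
.\}
.nf
# convert a UNIMARC file, overwriting records\&.ris if it exists
~$ marc2ris \-t UNIMARC \-o records\&.ris records\&.mrc
# read from stdin without Unicode conversion, appending to records\&.ris
~$ marc2ris \-u f \-O records\&.ris < records\&.mrc
# send debug\-level log messages to a custom log file
~$ marc2ris \-e file \-L /tmp/marc2ris\&.log \-l debug \-o records\&.ris records\&.mrc
.fi
.if n \{\
.RE
.\}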
.SH "CONFIGURATION"
.PP
\fBmarc2ris\fR
evaluates the file
marc2risrc
to initialize itself\&.
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.B Table\ \&1.\ \&marc2risrc
.TS
allbox tab(:);
lB lB lB.
T{
Variable
T}:T{
Default
T}:T{
Comment
T}
.T&
l l l
l l l
l l l
l l l
l l l
l l l.
T{
outfile
T}:T{
(none)
T}:T{
The default output file name\&.
T}
T{
outappend
T}:T{
t
T}:T{
Determines whether output is appended (\fIt\fR) to an existing file or overwrites (\fIf\fR) an existing file\&.
T}
T{
unmapped
T}:T{
t
T}:T{
If set to \fIt\fR, unknown tags in the input data will be written to the output on additional, specially marked lines; the resulting data can be inspected and then be sent through \fBsed\fR to strip off these additional lines\&. If set to \fIf\fR, unknown tags will be silently ignored\&.
T}
T{
logfile
T}:T{
/var/log/med2ris\&.log
T}:T{
The full path of a custom log file\&. This is used only if logdest is set appropriately\&.
T}
T{
logdest
T}:T{
1
T}:T{
The destination of the log information\&. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile\&. The latter needs a proper setting of logfile\&.
T}
T{
loglevel
T}:T{
6
T}:T{
The log level up to which messages will be sent\&. A low setting (0) allows only the most important messages, a high setting (7) allows all messages including debug messages\&. \-1 means nothing will be logged\&.
T}
.TE
.sp 1
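.PP
As an illustration, a user configuration file that overrides some of the defaults in Table 1 might look like the sketch below\&. It assumes one whitespace\-separated variable and value per line, with comment lines starting with a hash sign, as in other RefDB configuration files\&. This syntax is not specified on this page and the paths are hypothetical, so compare with the sample file shipped with your installation\&.
.sp
.if n \{\
.RS 4
.\}
.nf
# hypothetical $HOME/\&.marc2risrc
# overwrite the default output file instead of appending to it
outfile /home/user/import\&.ris
outappend f
# drop unknown tags instead of writing them to the output
unmapped f
# log warnings and more severe messages to a custom file
logdest 2
logfile /home/user/marc2ris\&.log
loglevel 4
.fi
.if n \{\
.RE
.\}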
.SH "DATA PROCESSING"
.PP
The purpose of the MARC format is entirely different from the purpose of the RIS format, so you shouldn\*(Aqt be too surprised that the import of MARC data is somewhat rough around the edges\&. The filter appears to handle many datasets well, but the following shortcomings are known (and more are likely to be discovered):
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Some fields, like 846, are currently ignored completely\&. This, of course, is bound to change\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Author names specified in the natural order, i\&.e\&. something like First Middle Last, are not normalized because multiple middle or last names make the split ambiguous\&. Author names in the inverse order, i\&.e\&. something like Last, First Middle, are normalized correctly in most cases\&. Handling of non\-European names is a matter of trial and error\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Character set handling is somewhat limited\&. Only the unaltered input character encoding and UTF\-8 are available for the output data\&.
.RE
.PP
That said, there is still some hope\&. The
\fB\-m\fR
command line option switches on additional MARC output\&. That is, the generated output will contain interspersed lines that show the contents of the original MARC fields used to generate the following RIS line or lines\&. For example, the following output snippet shows how
\fBmarc2ris\fR
generated the author lines from the MARC input:
.sp
.if n \{\
.RS 4
.\}
.nf
empty author field (100)
:Author(Ind1): 1
:Author($a): Ershov, A\&. P\&.
:Author($b):
:Author($c):
:Author(Ind1): 1
:Author($a): Knuth, Donald Ervin,
:Author($b):
:Author($c):
AU \- Ershov,A\&.P\&.
AU \- Knuth,Donald Ervin
.fi
.if n \{\
.RE
.\}
.PP
If you feel marc2ris does not translate your data appropriately, the easiest way might be to use the
\fB\-m\fR
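switch and redirect the output into a file\&. Then you can analyze the situation and fix the RIS lines as you see fit\&. Finally you can strip the MARC lines off with a command like the following, where \fIpattern\fR stands for the marker that identifies the MARC lines in your output (an empty pattern would match, and thus remove, every line):
.sp
.if n \{\
.RS 4
.\}
.nf
~$ grep \-v "\fIpattern\fR" < withmarc\&.ris > womarc\&.ris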
.fi
.if n \{\
.RE
.\}
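.PP
To produce the annotated file in the first place, an invocation along these lines should work\&. This is a sketch using only the options documented above; records\&.mrc is a hypothetical input file name\&.
.sp
.if n \{\
.RS 4
.\}
.nf
# write RIS output interspersed with the source MARC data to withmarc\&.ris
~$ marc2ris \-m \-o withmarc\&.ris records\&.mrc
.fi
.if n \{\
.RE
.\}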
.SH "FILES"
.PP
PREFIX/etc/refdb/marc2risrc
.RS 4
The global configuration file of marc2ris\&.
.RE
.PP
$HOME/\&.marc2risrc
.RS 4
The user configuration file of marc2ris\&.
.RE
.SH "SEE ALSO"
.PP
\fBRefDB\fR
(7),
\fBbib2ris\fR
(1),
\fBdb2ris\fR
(1),
\fBen2ris\fR
(1),
\fBmed2ris\fR
(1)\&.
.PP
\fIRefDB manual (local copy) \fR
PREFIX/share/doc/refdb\-/refdb\-manual/index\&.html
.PP
\fIRefDB manual (web) \fR
<\m[blue]\fBhttp://refdb\&.sourceforge\&.net/manual/index\&.html\fR\m[]>
.PP
\fIRefDB on the web \fR
<\m[blue]\fBhttp://refdb\&.sourceforge\&.net/\fR\m[]>
.SH "AUTHOR"
.PP
marc2ris was written by Markus Hoenicka\&.
.SH "NOTES"
.IP " 1." 4
Library of Congress MARC pages
.RS 4
\%http://www.loc.gov/marc/
.RE
.IP " 2." 4
here
.RS 4
\%http://www.ifla.org/VI/3/p1996-1/sec-uni.htm
.RE
.IP " 3." 4
PDF document
.RS 4
\%http://www.bl.uk/services/bibliographic/marcchange.pdf
.RE