'\" t
.\" Title: marc2ris
.\" Author: [see the "Author" section]
.\" Generator: DocBook XSL Stylesheets v1.76.1
.\" Date: 2005-10-16
.\" Manual: RefDB Manual
.\" Source: RefDB Manual
.\" Language: English
.\"
.TH "MARC2RIS" "1" "2005\-10\-16" "RefDB Manual" "RefDB Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
marc2ris \- converts MARC bibliographic data to the RIS format
.SH "SYNOPSIS"
.HP \w'\fBmarc2ris\fR\ 'u
\fBmarc2ris\fR [\-e\ \fIlog\-destination\fR] [\-h] [\-l\ \fIlog\-level\fR] [\-L\ \fIlog\-file\fR] [\-m] [\-o\ \fIoutfile\fR] [\-O\ \fIoutfile\fR] [\-t\ \fIinput_type\fR] [\-u\ \fIt|f\fR] \fIfile\fR
.SH "DESCRIPTION"
.PP
marc2ris attempts to extract the information useful to RefDB from MARC datasets\&. MARC (Machine Readable Catalogue Format) is a standard originating from the 1960s and is widely used by libraries and bibliographic agencies\&. Most libraries that offer Z39\&.50 access can provide the records in at least one MARC format (as with most other "standards", there are several to choose from)\&.
Currently the following MARC dialects are supported:
.PP
\fBMARC21\fR
.RS 4
This is an attempt to consolidate the existing MARC variants (mainly USMARC and CANMARC) and will most likely become the format supported by all libraries in the near future\&. The format is described on the \m[blue]\fBLibrary of Congress MARC pages\fR\m[]\&\s-2\u[1]\d\s+2\&.
.RE
.PP
\fBUNIMARC\fR
.RS 4
This is the European counterpart of this standardization effort\&. The specification can be found \m[blue]\fBhere\fR\m[]\&\s-2\u[2]\d\s+2\&.
.RE
.PP
\fBUKMARC\fR
.RS 4
This format is fairly close to the USMARC variant and is mainly used by libraries in the United Kingdom and in Ireland\&. Libraries supporting this format may switch to MARC21 in the future\&. Unfortunately, there is no online description of this format, but this \m[blue]\fBPDF document\fR\m[]\&\s-2\u[3]\d\s+2 describes the main differences between USMARC and UKMARC\&.
.RE
.SH "OPTIONS"
.PP
By default the script reads MARC21 data from stdin and sends RIS data to stdout\&.
.PP
\fB\-e\fR \fIlog\-destination\fR
.RS 4
log\-destination can have the values 0, 1, or 2, or the equivalent strings \fIstderr\fR, \fIsyslog\fR, or \fIfile\fR, respectively\&. This value specifies where the log information goes\&. 0 (zero) means the messages are sent to stderr\&. They are immediately available on the screen, but they may interfere with the command output\&. 1 sends the messages to the syslog facility\&. Keep in mind that syslog must be configured to accept log messages from user programs; see the syslog(8) man page for further information\&. Unix\-like systems usually save these messages in /var/log/user\&.log\&. 2 sends the messages to a custom log file, which can be specified with the \fB\-L\fR option\&.
.RE
.PP
\fB\-h\fR
.RS 4
Displays a help and usage screen, then exits\&.
.RE
.PP
\fB\-l\fR \fIlog\-level\fR
.RS 4
Specify the priority up to which events are logged\&.
This is either a number between 0 and 7 or one of the strings \fIemerg\fR, \fIalert\fR, \fIcrit\fR, \fIerr\fR, \fIwarning\fR, \fInotice\fR, \fIinfo\fR, \fIdebug\fR, respectively (see also Log level definitions)\&. \fB\-1\fR disables logging completely\&. A low log level like 0 means that only the most critical messages are logged\&. A higher log level means that less critical events are logged as well\&. Level 7 includes debug messages, which can be verbose and abundant, so avoid this log level unless you need to track down problems\&.
.RE
.PP
\fB\-L\fR \fIlog\-file\fR
.RS 4
Specify the full path to a log file that will receive the log messages\&. Typically this would be /var/log/refdba\&.
.RE
.PP
\fB\-m\fR
.RS 4
Switch on additional MARC output\&. The output then consists of the RIS data interspersed with the MARC source data used to generate it\&. This is useful for fixing conversion errors manually\&.
.RE
.PP
\fB\-o\fR \fIfile\fR
.RS 4
Send output to \fIfile\fR\&. If \fIfile\fR exists, its contents will be overwritten\&.
.RE
.PP
\fB\-O\fR \fIfile\fR
.RS 4
Send output to \fIfile\fR\&. If \fIfile\fR exists, the output will be appended\&.
.RE
.PP
\fB\-t\fR \fIinput_type\fR
.RS 4
Specify the MARC input type\&. The default is \fIMARC21\fR\&. Other available types are \fIUNIMARC\fR and \fIUKMARC\fR\&.
.RE
.PP
\fB\-u \fR\fB\fIt|f\fR\fR
.RS 4
Request Unicode output if set to "t" (this is the default)\&. marc2ris attempts to convert the input data into Unicode (unless the dataset explicitly states that it already uses Unicode)\&. Some MARC variants do not state their character encoding explicitly; if the conversion does not seem to work, set this option to "f"\&.
.RE
.SH "CONFIGURATION"
.PP
\fBmarc2ris\fR reads its initial settings from the configuration file marc2risrc\&.
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.B Table\ \&1.\ \&marc2risrc
.TS
allbox tab(:);
lB lB lB.
T{
Variable
T}:T{
Default
T}:T{
Comment
T}
.T&
l l l
l l l
l l l
l l l
l l l
l l l.
T{
outfile
T}:T{
(none)
T}:T{
The default output file name\&.
T}
T{
outappend
T}:T{
t
T}:T{
Determines whether output is appended (\fIt\fR) to an existing file or overwrites (\fIf\fR) an existing file\&.
T}
T{
unmapped
T}:T{
t
T}:T{
If set to \fIt\fR, unknown tags in the input data will be output following a tag; the resulting data can be inspected and then sent through \fBsed\fR to strip off these additional lines\&. If set to \fIf\fR, unknown tags will be silently ignored\&.
T}
T{
logfile
T}:T{
/var/log/med2ris\&.log
T}:T{
The full path of a custom log file\&. This is used only if logdest is set to 2\&.
T}
T{
logdest
T}:T{
1
T}:T{
The destination of the log information\&. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile\&. The latter requires a proper setting of logfile\&.
T}
T{
loglevel
T}:T{
6
T}:T{
The log level up to which messages will be sent\&. A low setting (0) allows only the most important messages; a high setting (7) allows all messages, including debug messages\&. \-1 means nothing will be logged\&.
T}
.TE
.sp 1
.SH "DATA PROCESSING"
.PP
MARC and RIS serve entirely different purposes, so you shouldn\*(Aqt be too surprised that the import of MARC data is somewhat rough around the edges\&. The filter appears to handle many datasets well, but the following shortcomings are known (and the interested reader is likely to discover more):
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Some fields, like 846, are currently ignored completely\&. This, of course, is bound to change\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Author names specified in the natural order, i\&.e\&. in the form First Middle Last, are not normalized, because multiple middle or last names make it impossible to split them reliably\&.
Author names in the inverse order, i\&.e\&. in the form Last, First Middle, are normalized correctly in most cases\&. Handling of non\-European names is a matter of trial and error\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Character set handling is somewhat limited\&. The output data can use only the unaltered input character encoding or UTF\-8\&.
.RE
.PP
That said, there is still some hope\&. The \fB\-m\fR command line option switches on additional MARC output: the generated output then contains interspersed lines that show the contents of the original MARC fields used to generate the subsequent RIS line or lines\&. For example, the following output snippet shows how \fBmarc2ris\fR generated the author lines from the MARC input:
.sp
.if n \{\
.RS 4
.\}
.nf
empty author field (100)
:Author(Ind1): 1
:Author($a): Ershov, A\&. P\&.
:Author($b):
:Author($c):
:Author(Ind1): 1
:Author($a): Knuth, Donald Ervin,
:Author($b):
:Author($c):
AU \- Ershov,A\&.P\&.
AU \- Knuth,Donald Ervin
.fi
.if n \{\
.RE
.\}
.PP
If marc2ris does not translate your data appropriately, the easiest remedy may be to run it with the \fB\-m\fR switch and redirect the output into a file\&. You can then analyze the MARC data and fix the RIS lines as you see fit\&. Finally, strip off the MARC lines with a command like:
.sp
.if n \{\
.RS 4
.\}
.nf
~$ grep \-v "" < withmarc\&.ris > womarc\&.ris
.fi
.if n \{\
.RE
.\}
.SH "FILES"
.PP
PREFIX/etc/refdb/marc2risrc
.RS 4
The global configuration file of marc2ris\&.
.RE
.PP
$HOME/\&.marc2risrc
.RS 4
The user configuration file of marc2ris\&.
.RE
.SH "SEE ALSO"
.PP
\fBRefDB\fR (7), \fBbib2ris\fR (1), \fBdb2ris\fR (1), \fBen2ris\fR (1), \fBmed2ris\fR (1)\&.
.PP
\fIRefDB manual (local copy) \fR
PREFIX/share/doc/refdb\-/refdb\-manual/index\&.html
.PP
\fIRefDB manual (web) \fR
<\m[blue]\fBhttp://refdb\&.sourceforge\&.net/manual/index\&.html\fR\m[]>
.PP
\fIRefDB on the web \fR
<\m[blue]\fBhttp://refdb\&.sourceforge\&.net/\fR\m[]>
.SH "AUTHOR"
.PP
marc2ris was written by Markus Hoenicka \&.
.SH "NOTES"
.IP " 1." 4
Library of Congress MARC pages
.RS 4
\%http://www.loc.gov/marc/
.RE
.IP " 2." 4
here
.RS 4
\%http://www.ifla.org/VI/3/p1996-1/sec-uni.htm
.RE
.IP " 3." 4
PDF document
.RS 4
\%www.bl.uk/services/bibliographic/marcchange.pdf
.RE