.\" hey, Emacs: -*- nroff -*- .\" This program is free software; you can redistribute it and/or modify .\" it under the terms of the GNU General Public License as published by .\" the Free Software Foundation; either version 2 of the License, or .\" (at your option) any later version. .\" .\" This program is distributed in the hope that it will be useful, .\" but WITHOUT ANY WARRANTY; without even the implied warranty of .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the .\" GNU General Public License for more details. .\" .\" You should have received a copy of the GNU General Public License .\" along with this program; if not, write to the Free Software .\" Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA .\" .\" Please update the above date whenever this man page is modified. .\" .\" Some roff macros, for reference: .\" .nh disable hyphenation .\" .hy enable hyphenation .\" .ad l left justify .\" .ad b justify to both left and right margins (default) .\" .nf disable filling .\" .fi enable filling .\" .br insert line break .\" .sp insert n+1 empty lines .\" for manpage-specific macros, see man(7) .TH "FFE" "1" "2011-04-06" "Timo Savinen" "" .SH "NAME" ffe \- flat file extractor .SH "SYNOPSIS" .B ffe .RI [ options ]... .SH "DESCRIPTION" \fBffe\fP is a program for extracting fields from flat file records and displaying them in different formats. \fBffe\fP relies on the configuration file to control input file structure and the output format. .SH "OPTIONS" \fBffe\fP accepts the following options: .TP .BR \-c ", " \-\-configuration=\fIfile\fP Read the configuration from \fIfile\fP, default is ~/.fferc. .TP .BR \-s ", " \-\-structure=\fISTRUCTURE\fR Input file is processed using the structure \fISTRUCTURE\fR. .TP .BR \-p ", " \-\-print=\fIFORMAT\fR Use output format \fIFORMAT\fR for printing. All printing can be suppressed using format \fBno\fR. Original data is printed using format \fBraw\fR. .TP .BR \-o ", " \-\-output=\fINAME\fP Write output to \fINAME\fP instead of standard output. .TP .BR \-f ", " \-\-field\-list=\fILIST\fP Print only fields and constants specified in comma separated list \fILIST\fP. .TP .BR \-e ", " \-\-expression=\fIEXPRESSION\fR Print only those records for which the \fIEXPRESSION\fR evaluates to true. .TP .BR \-a ", " \-\-and Expressions are combined with logical and, default is logical or. .TP .BR \-X ", " \-\-casecmp Expressions are evaluated case insensitive. .TP .BR \-v ", " \-\-invert\-match Print only those records which don't match the expression. .TP .BR \-l ", " \-\-loose An invalid input line does not cause program to abort. .TP .BR \-r ", " \-\-replace=\fIFIELD\fR=\fIVALUE\fR Replace \fIFIELD\fRs contents with \fIVALUE\fR in output. \fIVALUE\fR can contain same directives as output option \fBdata\fR. .TP .BR \-d ", " \-\-debug All invalid input lines are written to file ffe_error_.log. .TP .BR \-I ", " \-\-info Show the structure information in configuration file and exit. .TP .BR \-? ", " \-\-help List all available options and their meanings and exit. .TP .BR \-V ", " \-\-version Show version of program and exit. .PP All remaining arguments are names of input files; if no input files are specified, then the standard input is read. .SS Expressions (option \-e, \-\-expression) Expression can be used to select specific records comparing field values. If the \fIvalue\fR starts with string "file:" then the rest of the \fIvalue\fR is considered as a file name. Every line in the file is used as value in comparison. Record will be selected if one or more values evaluates true. Expression notation: .TP .BR \fIfield\fR\fB=\fR\fIvalue\fR A record will be selected if the field \fIfield\fR is equal to the value \fIvalue\fR. .TP .BR \fIfield\fR\fB^\fR\fIvalue\fR A record will be selected if the field \fIfield\fR starts with the value \fIvalue\fR. .TP .BR \fIfield\fR\fB~\fR\fIvalue\fR A record will be selected if the field \fIfield\fR contains the value \fIvalue\fR. .TP .BR \fIfield\fR\fB!\fR\fIvalue\fR A record will be selected if the field \fIfield\fR is not equal to the value \fIvalue\fR. .TP .BR \fIfield\fR\fB?\fR\fIvalue\fR A record will be selected if the field \fIfield\fR matches the regular expression in \fIvalue\fR. .SH "FFE CONFIGURATION" \fBffe\fR uses the configuration file for extracting fields from the input file and for formatting the fields for output. Every line or binary block of the input file is considered as a record. Default configuration file is \fB~/.fferc\fR but another file can be given with '\-c' option. .PP Configuration file for \fBffe\fR is a text file. The file may contain empty lines. Commands are case\-sensitive. Comments begin with the #\-character and end at the end of the line. The \fIstring\fR and \fIchar\fR definitions can be enclosed in double quotation '"' characters. \fIchar\fR is a single character. \fIstring\fR and \fIchar\fR can contain following escape codes: '\ea','\eb','\et','\en','\ev','\ef', '\er', '\e"' and '\e#'. Character '\e' can be escaped as '\e\e'. Command Substitution allows the output of a command to replace parts of the configuration file. Syntax for command substitution is: .br `\fIcommand\fR` .br The \fIcommand\fR is executed and the `\fIcommand\fR` is substituted with the standard output of the command, with any trailing newlines deleted. Command substitutions may not be nested. Before executing the \fIcommand\fR ffe sets few environment variables: .TP \fBFFE_STRUCTURE\fR The name of the structure given using \-s,\-\-structure. .TP \fBFFE_OUIPUT\fR The name of the output file given using \-o,\-\-output. .TP \fBFFE_FORMAT\fR The name of the output format given using \-p,\-\-print. .TP \fBFFE_FIRST_FILE\fR The name of the first input file. .TP \fBFFE_FILES\fR A list of all input files. .PP If variable is already set it will not be replaced. .SS Input file structure .PP Input file structures are specified with keyword \fBstructure\fR: .PP \fBstructure\fR \fIname\fR {options...} .PP Options must be ended with newline, options are: .PP .TP \fBtype\fR \fBfixed\fR|\fBbinary\fR|\fBseparated\fR [\fIchar\fR] [\fB*\fR] Fields in the input are fixed length text fields, fixed length binary fields or text fields separated by \fIchar\fR. If * is given, multiple sequential separators are considered as one. Default separator is comma. .TP \fBquoted\fR [\fIchar\fR] Fields may be quoted with \fIchar\fR, default quotation mark is double quotation mark '"'. A quotation mark is assumed to be escaped as \e\fIchar\fR or doubling the mark as \fIchar\fR\fIchar\fR in input. Non escaped quotation marks are not preserved in output. .TP \fBheader\fR \fBfirst\fR|\fBall\fR|\fBno\fR Controls the occurrence of the header line. Default is no. If set as first or all, the first line of the first input file is considered as header line containing the names of the fields. First means that only the first file has a header, all means that all files have a header, although the names are still taken from the header of the first file. Header line is handled according the record definition, meaning that the name positions, separators etc. are the same as for the fields. .TP \fBoutput\fR \fIname\fR All records belonging this structure are printed according output format \fIname\fR. Default is to use output named as 'default'. .TP \fBrecord\fR \fIname\fR {options...} Defines one record for a structure. A structure can contain several record types. .SS Record options: .PP .TP \fBid\fR \fIposition\fR \fIstring\fR .TP \fBrid\fR \fIposition\fR \fIregexp\fR Identifies a record in the input file. Records are identified by the \fIstring\fR or by the regular expression in \fIregexp\fR in input record position \fIposition\fR. For fixed length and binary input the \fIposition\fR is the byte position of the input record and for separated input the \fIposition\fR means the \fIposition\fR'th field of the input record. Positions start from one. \fBId\fR's are required only if input structure contains several record types with equal lengths or field counts. Non printable characters can be escaped as \fB\exnn\fR where \fBnn\fR is the hexadecimal value of the character. A record definition can contain several \fBid\fR's, then all \fBid\fR'd must match the input line (\fBid\fR's are combined with logical and). In a multi\-record binary structure every record must have at least one \fBid\fR. .TP \fBfield\fR \fIname\fR|\fBFILLER\fR|\fB*\fR [\fIlength\fR]|\fB*\fR [\fIlookup\fR]|\fB*\fR [\fIoutput\fR] Specifies one field in a text input structure. \fIlength\fR is mandatory for fixed length input structure except for the last field. If the last field of a fixed length input structure has a \fB*\fR in place of \fIlength\fR then the last field can have arbitrary length. Length is also used for printing fields in fixed length format using the \fB%D\fR or \fB%D\fR directive. The order of fields in configuration file is essential, it specifies the field order in a record. If '*' is given instead of the name, then the 'name' will be the ordinal number of the field, or if the 'header' option has value 'first' or 'all', then the name of the field will taken from the header line (first line of the input). If \fIlookup\fR is given then the fields contents is used to make a lookup in lookup table \fIlookup\fR. If length is not needed (separated format) but lookup is needed, use asterisk (*) in place of length definition. If \fIoutput\fR is given field is printed using output \fIoutput\fR. Use asterisk in place of lookup if lookup is not needed. Naming the field as FILLER causes field not to be printed in output. .TP \fBfield\fR \fIname\fR|\fBFILLER\fR|\fB*\fR [\fIlength\fR]|\fItype\fR [\fIlookup\fR]|\fB*\fR [\fIoutput\fR] Specifies one field in a binary input structure. All other features are same as for the text structure except the \fItype\fR parameter. \fItype\fR specifies field data type and length and can have the following values: .IP \fBchar\fR Printable character. .IP \fBshort\fR Short integer having current system length and byte order. .IP \fBint\fR Integer having current system length and byte order. .IP \fBlong\fR Long integer having current system length and byte order. .IP \fBllong\fR Long long integer having current system length and byte order. .IP \fBushort\fR Unsigned short integer having current system length and byte order. .IP \fBuint\fR Unsigned integer having current system length and byte order. .IP \fBulong\fR Unsigned long integer having current system length and byte order. .IP \fBullong\fR Unsigned long long integer having current system length and byte order. .IP \fBint8\fR 8 bit integer. .IP \fBint16_be\fR Big endian 16 bit integer. .IP \fBint32_be\fR Big endian 32 bit integer. .IP \fBint64_be\fR Big endian 64 bit integer. .IP \fBint16_le\fR Little endian 16 bit integer. .IP \fBint32_le\fR Little endian 32 bit integer. .IP \fBint64_le\fR Little endian 64 bit integer. .IP \fBuint8\fR Unsigned 8 bit integer. .IP \fBuint16_be\fR Unsigned big endian 16 bit integer. .IP \fBuint32_be\fR Unsigned big endian 32 bit integer. .IP \fBuint64_be\fR Unsigned big endian 64 bit integer. .IP \fBuint16_le\fR Unsigned little endian 16 bit integer. .IP \fBuint32_le\fR Unsigned little endian 32 bit integer. .IP \fBuint64_le\fR Unsigned little endian 64 bit integer. .IP \fBfloat\fR Float having current system length and byte order. .IP \fBfloat_be\fR Float having current system length and big endian byte order. .IP \fBfloat_le\fR Float having current system length and little endian byte order. .IP \fBdouble\fR Double having current system length and byte order. .IP \fBdouble_be\fR Double having current system length and big endian byte order. .IP \fBdouble_le\fR Double having current system length and little endian byte order. .IP \fBbcd_be_\fIlen\fR\fR Bcd number having length \fIlen\fR and nybbles in big endian order. .IP \fBbcd_le_\fIlen\fR\fR Bcd number having length \fIlen\fR and nybbles in little endian order. .IP \fBhex_be_\fIlen\fR\fR Hexadecimal data in big endian order having length \fIlen\fR. .IP \fB \fBhex_le_\fIlen\fR\fR Hexadecimal data in little endian order having length \fIlen\fR. If \fIlength\fR is given instead of the \fItype\fR, then the field is assumed to be a printable string having length \fIlength\fR. String is printed until \fIlength\fR characters are printed or NULL character is found. Bcd number (\fBbcd_be_\fIlen\fR\fR and \fBbcd_le_\fIlen\fR\fR) is printed until \fIlen\fR bytes are read or a nybble having hexadecimal value \fBf\fR is found. Bcd number having big endian order is printed in order: most significant nybble first and least significant nybble second and bcd number having little endian order is printed in order: least significant nybble first and most significant nybble second. Bytes are always read in big endian order. Hexadecimal data (\fBhex_be_\fIlen\fR\fR and \fBhex_le_\fIlen\fR\fR) is printed as hexadecimal values. Big endian data is printed starting from the lower address and little endian data starting from the upper address. .TP \fBfield\-count\fR \fInumber\fR Same effect as having \fBfield *\fR \fInumber\fR times. Because length is not specified, this works only with separated structure. .TP \fBfields\-from\fR \fIrecord\fR Fields for this record are the same as for record \fIrecord\fR. .TP \fBoutput\fR \fIname\fR This record is printed according output format \fIname\fR. Default is to use output format specified in the structure. .TP \fBlevel\fR \fInumber\fR [\fIelement_name\fR|*] [\fIgroup_name\fR] Level can be used if the contents of a file should be printed as hierarchical multi\-level nested form document. Use * instead of the element name if it is not needed. number is the level of the record, starting from number one (highest level), \fIelement_name\fR is the name for the record, \fIgroup_name\fR is used to group records in the same and lower levels. Only \fInumber\fR is mandatory parameter. .TP \fBrecord\-length\fR \fBstrict\fR|\fBminimum\fR .IP \fBstrict\fR Input record length and field count must match the record definition in order to get it processed. This is default value. .IP \fBminimum\fR Input record length and field count can be the same or longer as defined for the record. The rest of the input line is ignored. .SS Output definitions .PP There can be several output definitions in the configuration file. Format can be selected with '\-p' option. Default format is named as 'default'. .TP \fBoutput\fR \fIname\fR|\fBdefault\fR {options...} Defines one output format. Output named as 'default' will be used if none is given for structure or record, or none is given with option '\-p'. There is two predefined output formats \fBno\fR and \fBraw\fR. \fBno\fR suppresses all printing and \fBraw\fR prints the original input data. .SS Output options .PP Pictures in output definition can contain printf\-style %\-directives: .LP .TP \fB%f\fR Name of the input file. .TP \fB%s\fR Name of the current structure. .TP \fB%r\fR Name of the current record. .TP \fB%o\fR Input record number in current file. .TP \fB%O\fR Input record number starting from the first file. .TP \fB%i\fR Byte offset of the current record in the current file. Starts from zero. .TP \fB%I\fR Byte offset of the current record starting from the first file. Starts from zero. .TP \fB%n\fR Field name. .TP \fB%t\fR Field contents, without leading and trailing whitespaces. .TP \fB%d\fR Field contents. Binary integer is printed as a decimal value. Floating point number is printed in the style \fB[\-]ddd.ddd\fR, where the number of digits after the decimal\-point character is 6. Bcd number is printed as a decimal number and hexadecimal data as consecutive hexadecimal values. .TP \fB%D\fR Field contents, right padded to the field length (requires length definition for the field). .TP \fB%C\fR Field contents, right padded to the field length (requires length definition for the field). Output field is cut if input field is longer that field length. .TP \fB%x\fR Unsigned hexadecimal value of a binary integer. Other fields are printed using directive \fB%d\fR. .TP \fB%l\fR Value from lookup. .TP \fB%L\fR Value from lookup, right padded to the field length (requires length definition for the field). .TP \fB%e\fR Does not print anything, causes still the "field empty" check to be performed. Can be used when only the names of non\-empty fields should be printed. .TP \fB%p\fR Fields start position in a record. For fixed structure this is field's byte position in the input line and for separated structure this is the ordinal number of the field. Starts from one. .TP \fB%h\fR Hexadecimal dump of a field. Byte values are printed as consecutive \fBxnn\fR values, where the \fBnn\fR is the hexadecimal value of a byte. Data is printed before any endian conversion. .TP \fB%g\fR Group name given by the keyword \fBgroup_name\fR in record definition. .TP \fB%m\fR Element name given by the keyword \fBelement_name\fR in record definition. .TP \fB%%\fR Percent sign. .TP \fBfile_header\fR \fIpicture\fR Picture is printed once before file contents. .TP \fBfile_trailer\fR \fIpicture\fR Picture is printed once after file contents. .TP \fBheader\fR \fIpicture\fR If specified, then the header line describing the field names is printed before records. Every field name is printed according the \fIpicture\fR using the same separator and fields length as defined for the fields. \fIPicture\fR can contain only \fB%n\fR directive. .TP \fBdata\fR \fIpicture\fR Field contents is printed according \fIpicture\fR. .TP \fBlookup\fR \fIpicture\fR If field is mapped to lookup table, this picture will be used instead of picture from \fBdata\fR option. If not given, then picture from \fBdata\fR will be used. .TP \fBseparator\fR \fIstring\fR All fields are terminated by \fIstring\fR, except the last field of the record. Default is not to print separator. .TP \fBrecord_header\fR \fIpicture\fR \fIpicture\fR is printed before the record content. Default is not to print header. .TP \fBrecord_trailer\fR \fIpicture\fR \fIpicture\fR is printed after the record content. Default is newline. .TP \fBjustify\fR \fBleft\fR|\fBright\fR|\fIchar\fR Fields are left or right justified. \fIchar\fR justifies output according the first occurrence of \fIchar\fR in the data picture. Default is left. .TP \fBindent\fR \fIstring\fR Record contents is intended by \fIstring\fR. Field contents is intended by two times the \fIstring\fR. Default is not to indent. .TP \fBfield\-list\fR \fIname1\fR,\fIname2\fR,... Only fields or constants named as \fIname1\fR,\fIname2\fR,... are printed, same effect as has '\-f' option. Default is to print all the fields. Fields are also printed in the same order as they are listed. .TP \fBno\-data\-print\fR \fByes\fR|\fBno\fR When set as no and \fBfield\-list\fR is given, suppresses printing of \fBrecord_header\fR and \fBrecord_trailer\fR in case where current record contains none of the fields specified in \fBfield\-list\fR. .TP \fBfield\-empty\-print\fR \fByes\fR|\fBno\fR When set as no, nothing is printed for fields which consist entirely of characters from \fBempty\-chars\fR. If none of the fields of a record are printed then the printing of \fBrecord_trailer\fR is also suppressed. Default is yes. .TP \fBempty\-chars\fR \fIstring\fR \fIstring\fR specifies a set of characters which define an "empty" field. Default is " \ef\en\er\et\ev" (space, form\-feed, newline, carriage return, horizontal tab and vertical tab). .TP \fBoutput\-file\fR \fIfile\fR Output is written to \fIfile\fR instead of the default output. If \- is given the standard output is used. .TP \fBgroup_header\fR \fIstring\fR If a record has a level and group name defined, \fIstring\fR is printed before the first record in the same group or if the group name has changed in the same level .TP \fBgroup_trailer\fR \fIstring\fR If a record has a level and group name defined, \fIstring\fR is printed after the records in lower levels or if the group name has changed in the same level or if a higher level record is found. .TP \fBelement_header\fR \fIstring\fR If record has a level and header name defined, \fIstring\fR is printed before the records contents. .TP \fBelement_header\fR \fIstring\fR If record has a level and header name defined, \fIstring\fR is printed after the records contents. .TP \fBhex\-caps\fR \fByes\fR|\fBno\fR Print hexadecimal numbers in capital letters. Default is no. .SS Lookup definitions .TP \fBlookup\fR \fIname\fR {options...} Defines one lookup table. .SS Lookup options: .TP \fBsearch\fR \fBexact\fR|\fBlongest\fR The search type for lookup table. .TP \fBdefault\-value\fR \fIvalue\fR \fIvalue\fR is printed if the lookup is not successful. .TP \fBpair\fR \fIkey\fR \fIvalue\fR One key/value pair for the lookup table. .TP \fBfile\fR \fIname\fR [\fIseparator\fR] Key/value pairs are read from file \fIname\fR. Every line is considered as a key/value pair separated by \fIseparator\fR. Default separator is semicolon. .SS Constants Additional to input fields constants values can be printed using option \fB\-f\fR,\fB\-\-field\-list\fR or output option \fBfield\-list\fR. Constant will be printed using \fBdata\fR output option. Constants are specified as .TP \fBconst\fR \fIname\fR \fIvalue\fR when the \fIname\fR appears in a field list, \fIvalue\fR will be printed for every record as the \fIname\fR were one of the input fields. .SS Input Preprocessor It is possible to define an input preprosessor for \fBffe\fR. An input preprocessor is simply an executable program which writes the contents of the input file to standard output which will be read by \fBffe\fR. If the input preprosessor does not write any characters on its standard output, then \fBffe\fR uses the original file. To set up an input preprocessor, set the \fBFFEOPEN\fR environment variable to a command line which will invoke your input preprocessor. This command line should include one occurrence of the string \fB%s\fR, which will be replaced by the input filename when the input preprocessor command is invoked. The input preprocessor is not used if \fBffe\fR is reading standard input. .SH "EXAMPLES" Example of fixed length flat file containing fields 'FirstName','LastName' and 'Age': .br John Ripper 23 .br Scott Tiger 45 .br Mary Moore 41 This file can be printed in XML with the following configuration: structure personnel { .br type fixed .br output XML .br record person { .br field FirstName 9 .br field LastName 13 .br field Age 2 .br } .br } .br output XML { .br file_header "\en" .br data "<%n>%d\en" .br record_header "<%r>\en" .br record_trailer "\en" .br indent " " .br } .SH "SEE ALSO" .LP More examples in Texinfo manual. If the \fBinfo\fR and \fBffe\fR are properly installed, the command .sp 3 \fBinfo\fR \fBffe\fR .sp 3 should give more information. .SH "AUTHOR" Timo Savinen