.\" Hey, EMACS: -*- nroff -*-
.\" First parameter, NAME, should be all caps
.\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
.\" other parameters are allowed: see man(7), man(1)
.TH DOCX2TXT 1 "February 25, 2012"
.\" Please adjust this date whenever revising the manpage.
.\"
.\" Some roff macros, for reference:
.\" .nh        disable hyphenation
.\" .hy        enable hyphenation
.\" .ad l      left justify
.\" .ad b      justify to both left and right margins
.\" .nf        disable filling
.\" .fi        enable filling
.\" .br        insert line break
.\" .sp        insert n+1 empty lines
.\" for manpage-specific macros, see man(7)
.SH NAME
docx2txt \- convert Microsoft OOXML files to plain text
.SH SYNOPSIS
.B docx2txt
.RI "[ infile.docx|\-|\-h ] [ outfile.txt|\- ]"
.br
.B docx2txt
.RI "< infile.docx"
.br
.B docx2txt
.RI "< infile.docx > outfile.txt"
.SH DESCRIPTION
This manual page briefly documents the
.B docx2txt
command.
.PP
.\" TeX users may be more comfortable with the \fB\fP and
.\" \fI\fP escape sequences to invoke bold face and italics,
.\" respectively.
\fBdocx2txt\fP is a tool that attempts to generate equivalent plain text files
from Microsoft .docx documents, preserving some formatting and document
information (which MS text conversion drops) along with appropriate character
conversions for a good ASCII or UTF-8 text experience.
It is a platform-independent solution consisting of a (core) Perl script,
(wrapper) Unix/Windows shell scripts, and a configuration file that controls
the appearance of the output text to a fair extent.
It can conveniently be used to build a Web-based docx document conversion
service.
.PP
With unzippers such as CakeCmd that can deal with corrupt Zip archives, this
tool can in many cases extract text from corrupt docx documents that MS Word
fails even to open.
.SH OPTIONS
.TP
.B \-h
When given as the first argument, display usage information.
.TP
.B \-
When given as the input file name, read the docx document from standard input.
.TP
.B \-
When given as the output file name, write the text to standard output.
.PP
If the second argument is omitted, the output is saved in infile.txt.
.SH AUTHOR
docx2txt was written by Sandeep Kumar .
.PP
This manual page was written by Khalid El Fathi ,
for the Debian project (and may be used by others).