.\" Automatically generated by Pod::Man 4.10 (Pod::Simple 3.35) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . 
ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "HOOLA 1" .TH HOOLA 1 "2018-12-25" "EN Tools" "EN Tools" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. 
.if n .ad l
.nh
.SH "NAME"
htmlstrip \- Strip HTML markup code
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
\&\fBhtmlstrip\fR
[\fB\-o\fR \fIoutputfile\fR]
[\fB\-O\fR \fIlevel\fR]
[\fB\-b\fR \fIblocksize\fR]
[\fB\-v\fR]
[\fIinputfile\fR]
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
HTMLstrip reads \fIinputfile\fR or from \f(CW\*(C`stdin\*(C'\fR and strips the contained
\&\s-1HTML\s0 markup. Use this program to shrink and compact your \s-1HTML\s0 files in a
safe way.
.SS "Recognized Content Types"
.IX Subsection "Recognized Content Types"
HTMLstrip recognizes three disjoint types of content while parsing:
.IP "\s-1HTML\s0 Tag (tag)" 4
.IX Item "HTML Tag (tag)"
This is a single \s-1HTML\s0 tag, i.e. a string beginning with an opening angle
bracket directly followed by an identifier, optionally followed by attributes,
and ending with a closing angle bracket.
.IP "Preformatted (pre)" 4
.IX Item "Preformatted (pre)"
This is any content enclosed in one of the following container tags:
.Sp
.Vb 3
\&  1.
\&  2.
\&  3. 
.Ve
.Sp
The non\-HTML\-3.2\-conforming \f(CW\*(C`<nostrip>\*(C'\fR tag is special here: It acts
like \f(CW\*(C`<pre>\*(C'\fR as a protection container for HTMLstrip but is also
stripped from the output.  Use this as a pseudo-block which just preserves its
body for the HTMLstrip processing but itself is removed from the output.
.IP "Plain Text (txt)" 4
.IX Item "Plain Text (txt)"
This is anything not falling into one of the two other categories, i.e. any
content both outside of preformatted areas and outside of \s-1HTML\s0 tags.
.SS "Supported Stripping Levels"
.IX Subsection "Supported Stripping Levels"
The amount of stripping can be controlled by an optimization level, specified
via option \fB\-O\fR (see below). Higher levels also include all of the lower
levels. The following stripping is done on each level:
.IP "\fBLevel 0:\fR" 4
.IX Item "Level 0:"
No real stripping, just removing the sharp/comment\-lines (\f(CW\*(C`#...\*(C'\fR) [txt,tag].
Such lines are a standard feature of \s-1WML,\s0 so this is always done.
.IP "\fBLevel 1:\fR" 4
.IX Item "Level 1:"
Minimal stripping: Same as level 0 plus stripping of blank and empty lines
[txt].
.IP "\fBLevel 2:\fR" 4
.IX Item "Level 2:"
Good stripping: Same as level 1 plus compression of multiple whitespaces (more
than one in sequence) to single whitespaces [txt,tag] and stripping of
trailing whitespaces at the end of a line [txt,tag,pre].
.Sp
\&\fBThis level is the default\fR because, while providing good optimization, it
does not destroy the \s-1HTML\s0 markup, which remains human readable.
.IP "\fBLevel 3:\fR" 4
.IX Item "Level 3:"
Best stripping: Same as level 2 plus stripping of leading whitespaces on a
line [txt]. This level can also be recommended when you still want to make sure
that the \s-1HTML\s0 markup is not destroyed in any case, but the resulting code is a
little bit ugly because of the removed whitespaces.
.IP "\fBLevel 4:\fR" 4
.IX Item "Level 4:"
Expert stripping: Same as level 3 plus stripping of \s-1HTML\s0 comment lines
(``\f(CW\*(C`<!\-\- ... \-\->\*(C'\fR'') and crunching of \s-1HTML\s0 tag ends [tag]. \fB\s-1BE
CAREFUL HERE:\s0\fR Comment lines are widely used to hide Java or
JavaScript code from browsers which are not capable of ignoring it.
When using this optimization level, make sure all your JavaScript code is still
hidden correctly by adding HTMLstrip's \f(CW\*(C`<nostrip>\*(C'\fR tags around the comment
delimiters.
.IP "\fBLevel 5:\fR" 4
.IX Item "Level 5:"
Crazy stripping: Same as level 4 plus wrapping lines around to fit in an 80
column view window. This saves some newlines, but it both leads to really
unreadable markup code and opens the door to a lot of problems when this
code is used to lay out the page in a browser. \fBUse with care. This is only
experimental!\fR
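.PP
As a hedged illustration of the level 4 warning above, the following sketch
feeds a small fragment to HTMLstrip on \f(CW\*(C`stdin\*(C'\fR, with a script comment
wrapped in a \f(CW\*(C`<nostrip>\*(C'\fR container. The fragment and its contents are
hypothetical; only the \fB\-O\fR option and the \f(CW\*(C`<nostrip>\*(C'\fR tag are taken
from this page.
.PP
.Vb 7
\& htmlstrip \-O 4 <<EOF
\& <script>
\& <nostrip><!\-\-
\&   document.title = "demo";
\& //\-\-></nostrip>
\& </script>
\& EOF
.Ve
.PP
Since the container is described as preserving its body while being removed
itself, the comment delimiters in this sketch should survive level 4.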
.PP
Additionally the following global strippings are done:
.ie n .IP """^\en"":" 4
.el .IP "\f(CW^\en\fR:" 4
.IX Item "^n:"
A leading newline is always stripped.
.ie n .IP """<suck>"":" 4
.el .IP "\f(CW<suck>\fR:" 4
.IX Item "<suck>:"
The \f(CW\*(C`<suck>\*(C'\fR tag just absorbs itself and all whitespaces around it.
This is like the backslash for line-continuation, but is done in Pass 8, i.e.
really at the end. Use this inside \s-1HTML\s0 tag definitions to absorb whitespaces,
for instance around \f(CW%body\fR when used inside \f(CW\*(C`<table>\*(C'\fR structures
which at some point are newline-sensitive in Netscape Navigator.
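.Sp
A minimal sketch of the intended use, assuming a hypothetical table cell whose
content should end up directly adjacent to the surrounding tags. The input
fragment is an illustration only and is not taken from the original
documentation:
.Sp
.Vb 7
\& htmlstrip <<EOF
\& <td>
\& <suck>
\& cell text
\& <suck>
\& </td>
\& EOF
.Ve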
.SH "OPTIONS"
.IX Header "OPTIONS"
.IP "\fB\-o\fR \fIoutputfile\fR" 4
.IX Item "-o outputfile"
This redirects the output to \fIoutputfile\fR. The output is sent to
\&\f(CW\*(C`stdout\*(C'\fR if no such option is specified or \fIoutputfile\fR is "\f(CW\*(C`\-\*(C'\fR".
.IP "\fB\-O\fR \fIlevel\fR" 4
.IX Item "-O level"
This sets the optimization/stripping level, i.e. how much HTMLstrip should
compress the contents.
.IP "\fB\-b\fR \fIblocksize\fR" 4
.IX Item "-b blocksize"
For efficiency reasons, input is divided into blocks of 16384 chars by default.
If you run into performance problems, you may try changing this value.
Any value between \f(CW1024\fR and \f(CW32766\fR is allowed.  With a value of
\&\f(CW0\fR, input is not divided into blocks.
.IP "\fB\-v\fR" 4
.IX Item "-v"
This enables verbose mode, in which some
processing information is printed to the console.
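.PP
The options above can be combined as in the following sketch. The file names
are hypothetical; the options are the ones documented in this section.
.PP
.Vb 8
\& # default stripping level, result written to stdout
\& htmlstrip page.html
\&
\& # level 3, verbose mode, result written to a file
\& htmlstrip \-v \-O 3 \-o page.stripped.html page.html
\&
\& # read from stdin, do not divide the input into blocks
\& htmlstrip \-b 0 \-o \- < page.html
.Ve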
.SH "AUTHORS"
.IX Header "AUTHORS"
.Vb 3
\& Ralf S. Engelschall
\& rse@engelschall.com
\& www.engelschall.com
\&
\& Denis Barbier
\& barbier@engelschall.com
.Ve