.\" Automatically generated by Pod::Man 2.28 (Pod::Simple 3.28)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{
.    if \nF \{
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "HTML::Truncate 3pm"
.TH HTML::Truncate 3pm "2009-07-14" "perl v5.20.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
HTML::Truncate \- (beta software) truncate HTML by percentage or character count while preserving well\-formedness.
.SH "VERSION"
.IX Header "VERSION"
0.20
.SH "ABSTRACT"
.IX Header "ABSTRACT"
When working with text it is common to want to truncate strings to make them fit a desired context. E.g., you might have a menu that is only 100px wide and prefer that text not wrap, so you'd truncate it to around 15\-30 characters, depending on preference and typeface size. This is trivial with plain text using substr, but with \s-1HTML\s0 it is somewhat difficult because whitespace has fluid significance and open tags that are not properly closed destroy well-formedness and can wreck an entire layout.
.PP
HTML::Truncate attempts to account for those two problems by padding truncation for spacing and entities and closing any tags that remain open at the point of truncation.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\&    use strict;
\&    use HTML::Truncate;
\&
\&    my $html = \*(Aq<p>We have to test something.</p>\*(Aq;
\&    my $readmore = \*(Aq... [readmore]\*(Aq;
\&
\&    my $html_truncate = HTML::Truncate\->new();
\&    $html_truncate\->chars(20);
\&    $html_truncate\->ellipsis($readmore);
\&    print $html_truncate\->truncate($html);
\&
\&    # or
\&
\&    use Encode;
\&    my $ht = HTML::Truncate\->new( utf8_mode => 1,
\&                                  chars => 1_000,
\&                                  );
\&    print Encode::encode_utf8( $ht\->truncate($html) );
.Ve
.SH "XHTML"
.IX Header "XHTML"
This module is designed to work with XHTML-style nested tags. More below.
.SH "WHITESPACE AND ENTITIES"
.IX Header "WHITESPACE AND ENTITIES"
Repeated natural whitespace (i.e., \*(L"\es+\*(R" and not \*(L" &nbsp; \*(R") in \s-1HTML\s0
\&\*(-- with rare exception (pre tags or user defined styles) \*(-- is not
meaningful. Therefore it is normalized when truncating. Entities are
also normalized. The following is counted as only 14 chars long.
.PP
.Vb 2
\&  <p>\en\enthis     is   &#8216;text&#8217;\en\en</p>
\&  ^^^^^^^12345\-\-\-\-678\-\-9\-\-\-\-\-\-01234\-\-\-\-\-\-^^^^^^^^
.Ve
.SH "METHODS"
.IX Header "METHODS"
.IP "\fBnew\fR" 4
.IX Item "new"
Accepts any of the methods below as hash-style arguments. \*(L"percent\*(R" and \*(L"chars\*(R" are incompatible, so don't use them both. Whichever is set most recently will erase the other.
.Sp
.Vb 3
\&  my $ht = HTML::Truncate\->new(utf8_mode => 1,
\&                               chars => 500, # default is 100
\&                               );
.Ve
.IP "\fButf8_mode\fR" 4
.IX Item "utf8_mode"
Set/get, true/false. If \f(CW\*(C`utf8_mode\*(C'\fR is set, \f(CWutf8_mode(1)\fR is also set in the underlying HTML::Parser, entities will be transformed with decode, and the default ellipsis will be a literal ellipsis character rather than the entity \f(CW\*(C`&#8230;\*(C'\fR.
.IP "\fBchars\fR" 4
.IX Item "chars"
Set/get. The number of characters remaining after truncation,
\&\fBexcluding\fR the \*(L"ellipsis\*(R".
.Sp
Entities are counted as single characters. E.g., \f(CW\*(C`&copy;\*(C'\fR is one character for truncation counts.
.Sp
Default is \*(L"100.\*(R" Side-effect: clears any \*(L"percent\*(R" that has been set.
.IP "\fBpercent\fR" 4
.IX Item "percent"
Set/get. A percentage to keep while truncating the rest. For a document of 1,000 chars, percent('15%') and chars(150) would be equivalent. The actual number of characters that the percentage represents cannot be known until the given \s-1HTML\s0 is parsed.
.Sp
Side-effect: clears any \*(L"chars\*(R" that has been set.
.IP "\fBellipsis\fR" 4
.IX Item "ellipsis"
Set/get. Ellipsis in this case means \-\-
.Sp
.Vb 3
\&  The omission of a word or phrase necessary for a complete
\&  syntactical construction but not necessary for understanding.
\&  http://www.answers.com/topic/ellipsis
.Ve
.Sp
What it will probably mean in most real applications is \*(L"read more.\*(R" The default is \f(CW\*(C`&#8230;\*(C'\fR, which if the utf8 flag is true will render as a literal ellipsis, \f(CW\*(C`chr(8230)\*(C'\fR.
.Sp
The reason the default is \f(CW\*(C`&#8230;\*(C'\fR and not \*(L"...\*(R" is that this is meant for use in \s-1HTML\s0 environments, not plain text, and \*(L"...\*(R" (dot-dot-dot) is not typographically correct or equivalent to a real horizontal ellipsis character.
.IP "\fBtruncate\fR" 4
.IX Item "truncate"
It returns the truncated \s-1XHTML\s0 if asked for a return value.
.Sp
.Vb 1
\&  my $truncated = $ht\->truncate($html);
.Ve
.Sp
It will truncate the string in place if no return value is expected (that is, when called in void context, where wantarray is not defined).
.Sp
.Vb 2
\&  $ht\->truncate($html);
\&  print $html;
.Ve
.Sp
It can also be called with inline arguments:
.Sp
.Vb 3
\&  print $ht\->truncate( $html,
\&                       $chars_or_percent,
\&                       $ellipsis );
.Ve
.Sp
No arguments are strictly required. Without \s-1HTML\s0 to operate upon it returns undef. The two optional arguments may be preset with the methods \*(L"chars\*(R" (or \*(L"percent\*(R") and \*(L"ellipsis\*(R".
.Sp
Valid nesting of tags is required (a la \s-1XHTML\s0). Therefore some old
\&\s-1HTML\s0 habits like <p> without a </p> are not supported and may cause a fatal error. See \*(L"repair\*(R" for help with badly formed
\&\s-1HTML.\s0
.Sp
Certain tags are omitted by default from the truncated output.
.RS 4
.IP "\(bu" 4
Skipped tags
.Sp
These will not be included in truncated output by default.
.Sp
.Vb 3
\&   ...
\&