.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.40)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "HTML::Microformats 3pm"
.TH HTML::Microformats 3pm "2021-09-12" "perl v5.32.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
HTML::Microformats \- parse microformats in HTML
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use HTML::Microformats;
\&
\& my $doc = HTML::Microformats
\&             \->new_document($html, $uri)
\&             \->assume_profile(qw(hCard hCalendar));
\& print $doc\->json(pretty => 1);
\&
\& use RDF::TrineShortcuts qw(rdf_query);
\& my $results = rdf_query($sparql, $doc\->model);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
The HTML::Microformats module is a wrapper for parser and handler
modules of various individual microformats (each of those modules
has a name like HTML::Microformats::Format::Foo).
.PP
The general pattern of usage is to create an HTML::Microformats
object (which corresponds to an \s-1HTML\s0 document) using the
\&\f(CW\*(C`new_document\*(C'\fR method; then ask for the data, as a Perl
hashref, a \s-1JSON\s0 string, or an RDF::Trine model.
.SS "Constructor"
.IX Subsection "Constructor"
.ie n .IP """$doc = HTML::Microformats\->new_document($html, $uri, %opts)""" 4
.el .IP "\f(CW$doc = HTML::Microformats\->new_document($html, $uri, %opts)\fR" 4
.IX Item "$doc = HTML::Microformats->new_document($html, $uri, %opts)"
Constructs a document object.
.Sp
\&\f(CW$html\fR is the \s-1HTML\s0 or \s-1XHTML\s0 source (string) or an XML::LibXML::Document.
.Sp
\&\f(CW$uri\fR is the document \s-1URI,\s0 important for resolving relative \s-1URL\s0 references.
.Sp
\&\f(CW%opts\fR are additional parameters; currently only one option is defined:
\&\f(CW$opts\fR{'type'} is set to 'text/html' or 'application/xhtml+xml', to
control how \f(CW$html\fR is parsed.
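.Sp
As a minimal sketch of passing \f(CW%opts\fR to the constructor (the \f(CW$xhtml\fR
variable and the example.com \s-1URI\s0 are illustrative placeholders, not part of
the module):
.Sp
.Vb 6
\& # $xhtml is assumed to already hold the XHTML source as a string.
\& my $doc = HTML::Microformats\->new_document(
\&     $xhtml,
\&     \*(Aqhttp://example.com/contact\*(Aq,        # placeholder document URI
\&     type => \*(Aqapplication/xhtml+xml\*(Aq,     # controls how $xhtml is parsed
\& );
.Ve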
.SS "Profile Management"
.IX Subsection "Profile Management"
HTML::Microformats uses \s-1HTML\s0 profiles (i.e. the profile attribute on the
\&\s-1HTML\s0 <head> element) to detect which microformats are used on a page. Any
microformats which do not have a profile \s-1URI\s0 declared will not be parsed.
.PP
Because many pages fail to properly declare which profiles they use, there
are various profile management methods to tell HTML::Microformats to
assume the presence of particular profile URIs, even if they're actually
missing.
.ie n .IP """$doc\->profiles""" 4
.el .IP "\f(CW$doc\->profiles\fR" 4
.IX Item "$doc->profiles"
This method returns a list of profile URIs declared by the document.
.ie n .IP """$doc\->has_profile(@profiles)""" 4
.el .IP "\f(CW$doc\->has_profile(@profiles)\fR" 4
.IX Item "$doc->has_profile(@profiles)"
This method returns true if and only if one or more of the profile URIs
in \f(CW@profiles\fR is declared by the document.
.ie n .IP """$doc\->add_profile(@profiles)""" 4
.el .IP "\f(CW$doc\->add_profile(@profiles)\fR" 4
.IX Item "$doc->add_profile(@profiles)"
Using \f(CW\*(C`add_profile\*(C'\fR you can add one or more profile URIs, and they are
treated as if they were found in the document.
.Sp
For example:
.Sp
.Vb 1
\& $doc\->add_profile(\*(Aqhttp://microformats.org/profile/rel\-tag\*(Aq)
.Ve
.Sp
This is useful for adding profile URIs declared outside the document itself
(e.g. in \s-1HTTP\s0 headers).
.Sp
Returns a reference to the document.
.ie n .IP """$doc\->assume_profile(@microformats)""" 4
.el .IP "\f(CW$doc\->assume_profile(@microformats)\fR" 4
.IX Item "$doc->assume_profile(@microformats)"
This method acts similarly to \f(CW\*(C`add_profile\*(C'\fR but allows you to use
names of microformats rather than URIs.
.Sp
For example:
.Sp
.Vb 1
\& $doc\->assume_profile(qw(hCard adr geo))
.Ve
.Sp
Microformat names are case sensitive, and must match
HTML::Microformats::Format::Foo module names.
.Sp
Returns a reference to the document.
.ie n .IP """$doc\->assume_all_profiles""" 4
.el .IP "\f(CW$doc\->assume_all_profiles\fR" 4
.IX Item "$doc->assume_all_profiles"
This method is equivalent to calling \f(CW\*(C`assume_profile\*(C'\fR for
all known microformats.
.Sp
Returns a reference to the document.
.SS "Parsing Microformats"
.IX Subsection "Parsing Microformats"
Generally speaking, you can skip this step. The \f(CW\*(C`data\*(C'\fR, \f(CW\*(C`json\*(C'\fR and
\&\f(CW\*(C`model\*(C'\fR methods will automatically parse the document for you.
.ie n .IP """$doc\->parse_microformats""" 4
.el .IP "\f(CW$doc\->parse_microformats\fR" 4
.IX Item "$doc->parse_microformats"
Scans through the document, finding microformat objects.
.Sp
On subsequent calls, does nothing (as everything is already parsed).
.Sp
Returns a reference to the document.
.ie n .IP """$doc\->clear_microformats""" 4
.el .IP "\f(CW$doc\->clear_microformats\fR" 4
.IX Item "$doc->clear_microformats"
Forgets information gleaned by \f(CW\*(C`parse_microformats\*(C'\fR and thus allows
\&\f(CW\*(C`parse_microformats\*(C'\fR to be run again. This is useful if you've modified
or added some profiles between runs of \f(CW\*(C`parse_microformats\*(C'\fR.
.Sp
Returns a reference to the document.
.SS "Retrieving Data"
.IX Subsection "Retrieving Data"
These methods allow you to retrieve the document's data, and do things
with it.
.ie n .IP """$doc\->objects($format);""" 4
.el .IP "\f(CW$doc\->objects($format);\fR" 4
.IX Item "$doc->objects($format);"
\&\f(CW$format\fR is, for example, 'hCard', 'adr' or 'RelTag'.
.Sp
Returns a list of objects of that type. (If called in scalar context,
returns an arrayref.)
.Sp
Each object is, for example, an HTML::Microformats::Format::hCard object, or an
HTML::Microformats::Format::RelTag object, etc. See the relevant documentation
for details.
.ie n .IP """$doc\->all_objects""" 4
.el .IP "\f(CW$doc\->all_objects\fR" 4
.IX Item "$doc->all_objects"
Returns a hashref of data. Each hashref key is the name of a microformat
(e.g. 'hCard', 'RelTag', etc), and the values are arrayrefs of objects.
.Sp
Each object is, for example, an HTML::Microformats::Format::hCard object, or an
HTML::Microformats::Format::RelTag object, etc. See the relevant documentation
for details.
.ie n .IP """$doc\->json(%opts)""" 4
.el .IP "\f(CW$doc\->json(%opts)\fR" 4
.IX Item "$doc->json(%opts)"
Returns data roughly equivalent to the \f(CW\*(C`all_objects\*(C'\fR method, but as a \s-1JSON\s0
string.
.Sp
\&\f(CW%opts\fR is a hash of options, suitable for passing to the \s-1JSON\s0
module's to_json function. The 'convert_blessed' and 'utf8' options are
enabled by default, but can be disabled by explicitly setting them to 0, e.g.
.Sp
.Vb 1
\& print $doc\->json( pretty=>1, canonical=>1, utf8=>0 );
.Ve
.ie n .IP """$doc\->model""" 4
.el .IP "\f(CW$doc\->model\fR" 4
.IX Item "$doc->model"
Returns data as an RDF::Trine::Model, suitable for serialising as
\&\s-1RDF\s0 or running \s-1SPARQL\s0 queries.
.ie n .IP """$object\->serialise_model(as => $format)""" 4
.el .IP "\f(CW$object\->serialise_model(as => $format)\fR" 4
.IX Item "$object->serialise_model(as => $format)"
As \f(CW\*(C`model\*(C'\fR but returns a string.
.ie n .IP """$doc\->add_to_model($model)""" 4
.el .IP "\f(CW$doc\->add_to_model($model)\fR" 4
.IX Item "$doc->add_to_model($model)"
Adds data to an existing RDF::Trine::Model.
.Sp
Returns a reference to the document.
.SS "Utility Functions"
.IX Subsection "Utility Functions"
.ie n .IP """HTML::Microformats\->modules""" 4
.el .IP "\f(CWHTML::Microformats\->modules\fR" 4
.IX Item "HTML::Microformats->modules"
Returns a list of Perl modules, each of which implements a specific
microformat.
.ie n .IP """HTML::Microformats\->formats""" 4
.el .IP "\f(CWHTML::Microformats\->formats\fR" 4
.IX Item "HTML::Microformats->formats"
As per \f(CW\*(C`modules\*(C'\fR, but strips 'HTML::Microformats::Format::' off the
module name, and sorts alphabetically.
.SH "WHY ANOTHER MICROFORMATS MODULE?"
.IX Header "WHY ANOTHER MICROFORMATS MODULE?"
There already exist two microformats packages on \s-1CPAN\s0 (see
Text::Microformat and Data::Microformat), so why create another?
.PP
Firstly, HTML::Microformats isn't being created from scratch. It's actually
a fork/clean\-up of a non-CPAN application (Swignition), and in that sense
predates Text::Microformat (though not Data::Microformat).
.PP
It has a number of other features that distinguish it from the existing
packages:
.IP "\(bu" 4
It supports more formats.
.Sp
HTML::Microformats supports hCard, hCalendar, rel-tag, geo, adr,
rel-enclosure, rel-license, hReview, hResume, hRecipe, xFolk, \s-1XFN,\s0
hAtom, hNews and more.
.IP "\(bu" 4
It supports more patterns.
.Sp
HTML::Microformats supports the include pattern, abbr pattern, table cell
header pattern, value excerpting and other intricacies of microformat
parsing better than the other modules on \s-1CPAN.\s0
.IP "\(bu" 4
It offers \s-1RDF\s0 support.
.Sp
One of the key features of HTML::Microformats is that it makes data
available as RDF::Trine models. This allows your application to benefit
from a rich, feature-laden Semantic Web toolkit. Data gleaned from
microformats can be stored in a triple store; output in \s-1RDF/XML\s0 or
Turtle; queried using the \s-1SPARQL\s0 or \s-1RDQL\s0 query languages; and more.
.Sp
If you're not comfortable using \s-1RDF,\s0 HTML::Microformats also makes all
its data available as native Perl objects.
.SH "BUGS"
.IX Header "BUGS"
Please report any bugs to .
.SH "SEE ALSO"
.IX Header "SEE ALSO"
HTML::Microformats::Documentation::Notes.
.PP
Individual format modules:
.IP "\(bu" 4
HTML::Microformats::Format::adr
.IP "\(bu" 4
HTML::Microformats::Format::figure
.IP "\(bu" 4
HTML::Microformats::Format::geo
.IP "\(bu" 4
HTML::Microformats::Format::hAtom
.IP "\(bu" 4
HTML::Microformats::Format::hAudio
.IP "\(bu" 4
HTML::Microformats::Format::hCalendar
.IP "\(bu" 4
HTML::Microformats::Format::hCard
.IP "\(bu" 4
HTML::Microformats::Format::hListing
.IP "\(bu" 4
HTML::Microformats::Format::hMeasure
.IP "\(bu" 4
HTML::Microformats::Format::hNews
.IP "\(bu" 4
HTML::Microformats::Format::hProduct
.IP "\(bu" 4
HTML::Microformats::Format::hRecipe
.IP "\(bu" 4
HTML::Microformats::Format::hResume
.IP "\(bu" 4
HTML::Microformats::Format::hReview
.IP "\(bu" 4
HTML::Microformats::Format::hReviewAggregate
.IP "\(bu" 4
HTML::Microformats::Format::OpenURL_COinS
.IP "\(bu" 4
HTML::Microformats::Format::RelEnclosure
.IP "\(bu" 4
HTML::Microformats::Format::RelLicense
.IP "\(bu" 4
HTML::Microformats::Format::RelTag
.IP "\(bu" 4
HTML::Microformats::Format::species
.IP "\(bu" 4
HTML::Microformats::Format::VoteLinks
.IP "\(bu" 4
HTML::Microformats::Format::XFN
.IP "\(bu" 4
HTML::Microformats::Format::XMDP
.IP "\(bu" 4
HTML::Microformats::Format::XOXO
.PP
Similar modules:
RDF::RDFa::Parser,
HTML::HTML5::Microdata::Parser,
XML::Atom::Microformats,
Text::Microformat,
Data::Microformat.
.PP
Related web sites: , .
.SH "AUTHOR"
.IX Header "AUTHOR"
Toby Inkster .
.SH "COPYRIGHT AND LICENCE"
.IX Header "COPYRIGHT AND LICENCE"
Copyright 2008\-2012 Toby Inkster
.PP
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
.SH "DISCLAIMER OF WARRANTIES"
.IX Header "DISCLAIMER OF WARRANTIES"
\&\s-1THIS PACKAGE IS PROVIDED \*(L"AS IS\*(R" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.\s0