NAME
Parse::MediaWikiDump::Links - Object capable of processing link dump files
ABOUT
This object is used to access the content of the SQL-based page links dump
files by providing an iterative interface for extracting the individual links
between articles. Objects returned are instances of Parse::MediaWikiDump::link.
SYNOPSIS
$pmwd = Parse::MediaWikiDump->new;
$links = $pmwd->links('pagelinks.sql');
$links = $pmwd->links(\*FILEHANDLE);
#print the links between articles
while(defined($link = $links->next)) {
    print 'from ', $link->from, ' to ', $link->namespace, ':', $link->to, "\n";
}
STATUS
This software is being RETIRED - MediaWiki::DumpFile is the official successor
to Parse::MediaWikiDump and includes a compatibility library called
MediaWiki::DumpFile::Compat that is 100% API compatible and is a near-perfect
stand-in for this module. It is faster in all instances where it counts and is
actively maintained. Any undocumented deviation of MediaWiki::DumpFile::Compat
from Parse::MediaWikiDump is considered a bug and will be fixed.
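As a rough illustration of the migration described above, the sketch below runs the SYNOPSIS loop on top of the compatibility library. It assumes that loading MediaWiki::DumpFile::Compat in place of Parse::MediaWikiDump makes the Parse::MediaWikiDump class available; check that module's own documentation before relying on this. The helper format_link is a hypothetical name introduced here for illustration and is not part of either module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render one link object as a line of text, using the
# from/namespace/to accessors shown in the SYNOPSIS.
sub format_link {
    my ($link) = @_;
    return 'from ' . $link->from . ' to ' . $link->namespace . ':' . $link->to;
}

# The module is loaded lazily, only when a dump file is named on the command
# line, so the helper above can be exercised without it being installed.
if (@ARGV) {
    # Assumption: the compatibility library supplies the
    # Parse::MediaWikiDump class when loaded instead of the original module.
    require MediaWiki::DumpFile::Compat;

    my $pmwd  = Parse::MediaWikiDump->new;
    my $links = $pmwd->links($ARGV[0]);

    while (defined(my $link = $links->next)) {
        print format_link($link), "\n";
    }
}
```

If the assumption holds, the only line that differs from a Parse::MediaWikiDump script is the one that loads the module.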
METHODS
- Parse::MediaWikiDump::Links->new
- Create a new instance of a page links dump file parser
- $links->next
- Return the next available Parse::MediaWikiDump::link object or undef if
there is no more data left
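The iteration contract of next (a link object per call, then undef once the data runs out) can be sketched as a small tally of links per target namespace. The subroutine count_links_by_namespace is a hypothetical helper, not part of the module; it only assumes an object whose next method behaves as described above and whose results offer the namespace accessor used in the SYNOPSIS.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tally links by target namespace id. $links is assumed
# to be any object whose next() returns link objects, then undef at the end.
sub count_links_by_namespace {
    my ($links) = @_;
    my %count;

    # next() returns undef when no data is left, which ends the loop.
    while (defined(my $link = $links->next)) {
        $count{ $link->namespace }++;
    }

    return \%count;
}

# Only load the real module when a dump file is named on the command line.
if (@ARGV) {
    require Parse::MediaWikiDump;

    my $links = Parse::MediaWikiDump->new->links($ARGV[0]);
    my $count = count_links_by_namespace($links);

    foreach my $ns (sort { $a <=> $b } keys %$count) {
        print "namespace $ns: $count->{$ns} links\n";
    }
}
```

Because only one link object is held at a time, memory use of the loop itself grows with the number of distinct namespaces rather than with the size of the dump.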
EXAMPLE
List all links between articles in a friendly way
#!/usr/bin/perl
use strict;
use warnings;
use Parse::MediaWikiDump;
my $pmwd = Parse::MediaWikiDump->new;
my $links = $pmwd->links(shift) or die "must specify a pagelinks dump file";
my $dump = $pmwd->pages(shift) or die "must specify an article dump file";
my %id_to_namespace;
my %id_to_pagename;
binmode(STDOUT, ':utf8');
#build a map from namespace ids to namespace names
foreach (@{$dump->namespaces}) {
    my $id = $_->[0];
    my $name = $_->[1];
    $id_to_namespace{$id} = $name;
}
#build a map between article ids and article titles
while(my $page = $dump->next) {
    my $id = $page->id;
    my $title = $page->title;
    $id_to_pagename{$id} = $title;
}
$dump = undef; #cleanup since we don't need it anymore
while(my $link = $links->next) {
    my $namespace = $link->namespace;
    my $from = $link->from;
    my $to = $link->to;
    my $namespace_name = $id_to_namespace{$namespace};
    my $fully_qualified;
    my $from_name = $id_to_pagename{$from};
    if ($namespace_name eq '') {
        #default namespace
        $fully_qualified = $to;
    } else {
        $fully_qualified = "$namespace_name:$to";
    }
    print "Article \"$from_name\" links to \"$fully_qualified\"\n";
}
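The title-qualification step in the example above can also be pulled out into a small helper. This is a minimal sketch, not part of the module: qualify_title is a hypothetical name, the sample namespace map is illustrative data, and the fallback for an unknown namespace id is an assumption about what would be useful (the original example would instead emit an "uninitialized value" warning in that case).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::MediaWikiDump: build the display
# title for a link target from a namespace-id-to-name map.
sub qualify_title {
    my ($id_to_namespace, $namespace_id, $title) = @_;
    my $name = $id_to_namespace->{$namespace_id};

    # Unknown namespace id: keep the numeric id so the gap stays visible.
    return "[namespace $namespace_id]:$title" unless defined $name;

    # The default namespace has an empty name and takes no prefix.
    return $title if $name eq '';

    return "$name:$title";
}

# Illustrative sample data; a real map would come from $dump->namespaces.
my %sample_namespaces = (0 => '', 14 => 'Category');

print qualify_title(\%sample_namespaces, 0,   'Perl'), "\n";
print qualify_title(\%sample_namespaces, 14,  'Programming_languages'), "\n";
print qualify_title(\%sample_namespaces, 999, 'Orphan'), "\n";
```

Keeping this logic in one subroutine makes it possible to check the prefix rules without reading a dump file.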