3 changes: 2 additions & 1 deletion s/index.md
@@ -25,4 +25,5 @@ Thanks to the following folks for their contributions:
* Jeffrey Kegler
* Bill Ricker
* Stuart Caie
* and Jeana Clark
* Jeana Clark
* David Precious
24 changes: 24 additions & 0 deletions s/perl.md
@@ -94,6 +94,30 @@ The [XML::Twig](http://search.cpan.org/dist/XML-Twig) module includes the
but finds local matches.


# HTML::TableExtract

A fairly common requirement when parsing websites is to scrape data from an HTML
table. [HTML::TableExtract](http://search.cpan.org/dist/HTML-TableExtract)
makes this straightforward.

Quick example:

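    use LWP::Simple ();
    use HTML::TableExtract;

    # $url is assumed to hold the address of the page to scrape.
    my $html = LWP::Simple::get($url)
        or die "Could not fetch $url\n";
    my $te = HTML::TableExtract->new(
        headers => [ 'Date', 'Home Team', 'Away Team', 'Venue' ],
    );
    $te->parse($html);

    # Each row is an array ref with one cell per requested header.
    for my $row ($te->rows) {
        my ($date, $home, $away, $venue) = @$row;
    }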

The above finds the first table on the page whose header row contains all of the
headers listed, and extracts each row from it. Cells come back in the order the
headers were given, so the script keeps working if extra columns are added or
columns are moved about. It stops matching only if one of those headers is
renamed or removed.
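
To see the header matching at work without fetching anything, here is a minimal sketch that parses an inline HTML string. The fixture data, the extra `Referee` column and the `@fixtures` array are all invented for illustration; only the `headers` option, `parse` and `rows` come from the example above.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up table: "Venue" comes first and there is an extra "Referee"
# column that the script does not ask for.
my $html = <<'HTML';
<table>
  <tr><th>Venue</th><th>Date</th><th>Referee</th><th>Home Team</th><th>Away Team</th></tr>
  <tr><td>North Park</td><td>2012-03-01</td><td>A. Smith</td><td>Reds</td><td>Blues</td></tr>
  <tr><td>East Ground</td><td>2012-03-08</td><td>B. Jones</td><td>Whites</td><td>Greens</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new(
    headers => [ 'Date', 'Home Team', 'Away Team', 'Venue' ],
);
$te->parse($html);

# Collect one summary line per data row; the header row itself is skipped.
my @fixtures;
for my $row ($te->rows) {
    my ($date, $home, $away, $venue) = @$row;
    push @fixtures, "$date: $home v $away at $venue";
}

print "$_\n" for @fixtures;
```

Although the columns in the HTML are in a different order, each row should still unpack as date, home team, away team, venue, and the `Referee` column should be left out.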


# To do

* Code examples