From 305df3aca75239dc46e04253a3631ee27b64447d Mon Sep 17 00:00:00 2001
From: David Precious
Date: Wed, 5 Sep 2012 16:43:03 +0100
Subject: [PATCH 1/2] Add HTML::TableExtract example

---
 s/perl.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/s/perl.md b/s/perl.md
index 38f9e7e..825329c 100644
--- a/s/perl.md
+++ b/s/perl.md
@@ -94,6 +94,30 @@ The [XML::Twig](http://search.cpan.org/dist/XML-Twig) module includes the
 but finds local matches.
 
 
+# HTML::TableExtract
+
+A fairly common requirement when parsing websites is to scrape data from a HTML
+table. [HTML::TableExtract](http://search.cpan.org/dist/HTML-TableExtract)
+makes this easy and simple.
+
+Quick example:
+
+    my $html = LWP::Simple::get($url);
+    my $te = HTML::TableExtract->new(
+        headers => [ 'Date', 'Home Team', 'Away Team', 'Venue' ],
+    );
+    $te->parse($html);
+
+    for my $row ($te->rows) {
+        my ($date, $home, $away, $venue) = @$row;
+    }
+
+The above will find the first table on the page which has the headers mentioned,
+and extract each row from it. Looking for the table by the headers means that,
+if extra columns are added or columns are moved about, your script will continue
+to work perfectly.
+
+
 # To do
 
 * Code examples

From a83479ba7801945a69a8679bfb66913e13cd0b37 Mon Sep 17 00:00:00 2001
From: David Precious
Date: Wed, 5 Sep 2012 16:43:33 +0100
Subject: [PATCH 2/2] Add myself to the contributors list.

(Hope this is OK.)

(I removed the "and " from Jeana's entry to make it just a simple list, so the
next person added doesn't require the entry before to be modified each time.)
---
 s/index.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/s/index.md b/s/index.md
index 9553cf5..387225e 100644
--- a/s/index.md
+++ b/s/index.md
@@ -25,4 +25,5 @@ Thanks to the following folks for their contributions:
 * Jeffrey Kegler
 * Bill Ricker
 * Stuart Caie
-* and Jeana Clark
+* Jeana Clark
+* David Precious