Snippets has posted 5883 posts at DZone.

Extract The Body Of An HTML Document

For example, print out just the body of Google's home page:

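use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;

my $ua  = LWP::UserAgent->new;
# The URL is missing from the original snippet; set it to the page to fetch.
my $req = HTTP::Request->new(GET => '');
my $res = $ua->request($req);

if ($res->is_success) {
  my $tree = HTML::TreeBuilder->new_from_content($res->content);
  my $body = $tree->find('body');
  # content_list() returns both child elements and plain text strings,
  # so only call as_HTML() on the elements.
  foreach my $e ($body->content_list()) {
    print ref($e) ? $e->as_HTML() : $e;
  }
  $tree->delete;  # free the parse tree
} else {
  die $res->status_line, "\n";
}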