
Jsoup Extension

Dmitriy Zayceff edited this page Apr 17, 2015 · 13 revisions

API: http://jphp-docs.readthedocs.org/en/latest/api_en/php/jsoup/

Jsoup is a library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. It is based on the jsoup Java library (jsoup.org).

Usage

Fetching a URL and parsing the result:

use php\jsoup\Jsoup;

$doc = Jsoup::connect("http://en.wikipedia.org/")->get();
$newsHeadlines = $doc->select("#mp-itn b a");

foreach ($newsHeadlines as $element) {
   echo "- {$element->text()}\n";
}

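To parse HTML from a string, use the static Jsoup::parse(string $html) method, or Jsoup::parse(string $html, string $baseUri) if the page came from the web, so that relative URLs in the document can be resolved against $baseUri.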

$html = "<html><head><title>First parse</title></head>
         <body><p>Parsed HTML into a doc.</p></body></html>";
$doc = Jsoup::parse($html);

echo $doc->title();
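
The two-argument form can be combined with select() in the same way as a fetched page. The following is a minimal sketch that uses only the calls shown on this page; the markup and the base URI http://example.com/ are made up for illustration, and it assumes the document returned by Jsoup::parse() supports select() just like the one returned by get().

```php
<?php
use php\jsoup\Jsoup;

// Hypothetical markup, used only for illustration.
$html = '<html><head><title>Second parse</title></head>'
      . '<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>';

// The base URI is an assumption; pass the address the HTML was loaded from.
$doc = Jsoup::parse($html, "http://example.com/");

echo $doc->title(), "\n";

// Iterate over every <p> element matched by the CSS selector.
foreach ($doc->select("p") as $paragraph) {
    echo "- {$paragraph->text()}\n";
}
```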

Loading a document from a URL with parameters:

$doc = Jsoup::connect("http://example.com")
  ->data("query", "Java")
  ->userAgent("Mozilla")
  ->cookie("auth", "token")
  ->timeout(3000) // in milliseconds (3 seconds)
  ->post();
