HTMLExtract

Introduced in t-ui beta 6.6

This feature lets you extract text from HTML pages and display it inside t-ui.

XPath

XPath is a language for selecting particular nodes and tags in HTML/XML documents. It is easy to learn and powerful.

Tutorial: w3schools
Examples: w3schools
Tester: freeformatter.com
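
To make the idea concrete, here is a minimal sketch of an XPath query run outside t-ui, using Python's standard library. The sample page and its URLs are hypothetical, and ElementTree supports only a subset of XPath and requires well-formed markup, so treat this as an illustration of what an expression selects rather than of how t-ui evaluates it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; real HTML is often not valid XML.
PAGE = """<html><body>
<a class="foo" href="https://example.com/one">First</a>
<a class="bar" href="https://example.com/two">Second</a>
<a class="foo" href="https://example.com/three">Third</a>
</body></html>"""

root = ET.fromstring(PAGE)

# ElementTree needs a leading "." for a document-wide search, so the
# XPath //a[@class="foo"] is written as .//a[@class='foo'] here.
matches = root.findall(".//a[@class='foo']")

for node in matches:
    print(node.tag, node.get("href"), node.text)
```

Only the two links with class "foo" are selected; the link with class "bar" is skipped.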

JSONPath

JSONPath offers the same kind of querying as XPath, but it works on JSON documents.

Tutorial: http://goessner.net/
Tester: jsonpath.com
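
Python has no JSONPath support in its standard library, so the sketch below hand-rolls a very small subset ($, .name, [index] and [*] on arrays) just to show what an expression selects. The function json_path and the sample document are hypothetical, and this is not the evaluator t-ui uses.

```python
import json
import re

# Hypothetical sample document.
DOC = json.loads("""
{"store": {"book": [
    {"title": "Sayings of the Century", "price": 8.95},
    {"title": "Moby Dick", "price": 8.99}
]}}
""")

# One path step: .name, [index] or [*]
TOKEN = re.compile(r"\.(\w+)|\[(\d+)\]|\[(\*)\]")

def json_path(doc, expression):
    """Evaluate a tiny JSONPath subset: $, .name, [index] and [*]."""
    if not expression.startswith("$"):
        raise ValueError("expression must start with $")
    nodes = [doc]
    pos = 1
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at position {pos}")
        name, index, star = match.groups()
        next_nodes = []
        for node in nodes:
            if name is not None and isinstance(node, dict) and name in node:
                next_nodes.append(node[name])
            elif index is not None and isinstance(node, list) and int(index) < len(node):
                next_nodes.append(node[int(index)])
            elif star is not None and isinstance(node, list):
                next_nodes.extend(node)
        nodes = next_nodes
        pos = match.end()
    return nodes

titles = json_path(DOC, "$.store.book[*].title")
print(titles)
```

As with XPath, one expression returns every matching value, here the title of each book.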

Format

Values:

%n -> newline
%t -> tag name
%t(attributeName) -> the value of the attribute attributeName of the matched node
%a(format)(separator) -> prints every attribute of the matched nodes
- %an -> attribute name
- %av -> attribute value
%v -> tag value
#[URL] -> link
#rrggbb[text] -> color the text

Example

Matched node:
<a href="https://github.com/Andre1299/TUI-ConsoleLauncher/subscription" class="myClass" role="button">This is a link</a>

Example 1

Format:
#[%t(href)]

Output:
https://github.com/Andre1299/TUI-ConsoleLauncher/subscription

Example 2

Format:
%t -> %v%n%a(%an = %av)(%n)

Output:

a -> This is a link
href = https://github.com/Andre1299/TUI-ConsoleLauncher/subscription
class = myClass
role = button
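
The two examples above can be reproduced with a small sketch of the placeholder expansion. This is an illustrative re-implementation based only on the table of values in this page, not t-ui's actual code: the function render is hypothetical, and because plain text cannot show links or colors, the sketch simply keeps the text wrapped by #[...] and #rrggbb[...].

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER = re.compile(
    r"%a\((?P<afmt>.*?)\)\((?P<asep>.*?)\)"   # %a(format)(separator)
    r"|%t\((?P<attr>[^)]*)\)"                 # %t(attributeName)
    r"|%(?P<simple>[tvn])"                    # %t, %v, %n
)
MARKUP = re.compile(r"#(?:[0-9a-fA-F]{6})?\[(.*?)\]")  # #[URL], #rrggbb[text]

def render(template, node):
    """Expand a format string against one matched node (illustrative only)."""
    simple = {"t": node.tag, "v": node.text or "", "n": "\n"}

    def attribute_line(name, value, fmt):
        # Inside %a(...), %an and %av refer to the current attribute.
        values = {"an": name, "av": value, "n": "\n"}
        return re.sub(r"%(an|av|n)", lambda m: values[m.group(1)], fmt)

    def expand(match):
        if match.group("afmt") is not None:
            sep = match.group("asep").replace("%n", "\n")
            return sep.join(
                attribute_line(name, value, match.group("afmt"))
                for name, value in node.attrib.items()
            )
        if match.group("attr") is not None:
            return node.get(match.group("attr"), "")
        return simple[match.group("simple")]

    # Plain text cannot show links or colors, so keep only the wrapped text.
    plain = MARKUP.sub(r"\1", template)
    return PLACEHOLDER.sub(expand, plain)

NODE = ET.fromstring(
    '<a href="https://github.com/Andre1299/TUI-ConsoleLauncher/subscription" '
    'class="myClass" role="button">This is a link</a>'
)

print(render("#[%t(href)]", NODE))
print(render("%t -> %v%n%a(%an = %av)(%n)", NODE))
```

All placeholders are expanded in a single pass, so text taken from the page (for instance an attribute value that happens to contain %n) is never expanded a second time.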

Steps

1. Find a webpage

2. Decide the node kind

An expression can match any number of nodes, but they will all be of the same kind. Decide carefully what kind of nodes you need.

3. Create a new XPath/JsonPath expression

4. Test!

5. Add the expression to t-ui

htmlextract -add [json OR xpath] [ID] [expression]

For instance:
htmlextract -add xpath 1 //a[@class="foo"]

6. Add a new format to t-ui (you can also use the default one)

htmlextract -add format [ID] [expression]

For instance:
htmlextract -add format 5 #[%t(href)]

7. Use it!

htmlextract -query [ID] [optional: Format ID] [webpage]

For instance:
htmlextract -query 1 5 https://website.com/page.html

[Format ID] is optional: if you omit it, t-ui uses the value of htmlextract_default_format instead.
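
Since the optional argument sits in the middle of the command, the two forms can be told apart by the number of arguments. The sketch below shows one way to do that; parse_query is a hypothetical helper, and how t-ui actually parses the command is not documented here.

```python
def parse_query(command):
    """Split a -query command into (expression ID, format ID or None, webpage)."""
    parts = command.split()
    if parts[:2] != ["htmlextract", "-query"]:
        raise ValueError("not an htmlextract -query command")
    args = parts[2:]
    if len(args) == 2:
        # No format ID: the caller should fall back to htmlextract_default_format.
        expression_id, webpage = args
        format_id = None
    elif len(args) == 3:
        expression_id, format_id, webpage = args
    else:
        raise ValueError("expected: htmlextract -query [ID] [Format ID] [webpage]")
    return expression_id, format_id, webpage

print(parse_query("htmlextract -query 1 5 https://website.com/page.html"))
print(parse_query("htmlextract -query 1 https://website.com/page.html"))
```

A format ID of None stands for "use the default format" in this sketch.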

Francesco Andreuzzi, Italy, andreuzzi.francesco@gmail.com
