Commit e806cb4

How to use jq (#651)
1 parent b4b1b68 commit e806cb4

2 files changed

+179
-0
lines changed

docs.json

Lines changed: 1 addition & 0 deletions
@@ -269,6 +269,7 @@
 {
 "group": "Tool demos",
 "pages": [
+"examplecode/tools/jq",
 "examplecode/tools/firecrawl",
 "examplecode/tools/langflow",
 "examplecode/tools/vectorshift",

examplecode/tools/jq.mdx

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
---
title: Query JSON with jq
---

[jq](https://jqlang.org/) is a lightweight and flexible command-line JSON processor. You can use `jq` on a local development machine to
slice, filter, map, and transform the JSON data that Unstructured outputs, in much the same way that tools such as `sed`, `awk`, and `grep` let you work with text.

To get `jq`, see the [Download jq](https://jqlang.org/download/) page.

<Info>
`jq` is not owned or supported by Unstructured. For questions about `jq` and
feature requests for future versions of `jq`, see the [Issues](https://github.com/jqlang/jq/issues) tab of the
`jq` repository in GitHub.
</Info>

The following command examples use `jq` with the
[spring-weather.html.json](https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/spring-weather.html.json) file in the
**example-docs** directory within the **Unstructured-IO/unstructured** repository in GitHub.
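
Before writing filters, it can help to check the overall shape of the JSON. The following sketch uses a small, hypothetical stand-in file named `sample-elements.json` (its contents are invented for illustration and are not taken from `spring-weather.html.json`) to count the elements, list the distinct `type` values, and show the keys of the first element.

```bash
# Hypothetical miniature stand-in for Unstructured output (illustrative data only).
cat > sample-elements.json <<'EOF'
[
  {"type": "Title", "text": "News Around NOAA", "metadata": {"link_urls": ["https://example.com/"]}},
  {"type": "NarrativeText", "text": "Spring weather can change quickly.", "metadata": {}},
  {"type": "Address", "text": "Silver Spring, MD 20910", "metadata": {}}
]
EOF

# Count the elements in the top-level array.
jq 'length' sample-elements.json
# → 3

# List the distinct element types (unique also sorts them).
jq -c 'map(.type) | unique' sample-elements.json
# → ["Address","NarrativeText","Title"]

# Show which keys the first element carries (keys returns them sorted).
jq -c '.[0] | keys' sample-elements.json
# → ["metadata","text","type"]
```

The same three commands should work unchanged against `spring-weather.html.json`, assuming the file is a top-level array of element objects as the examples in this page imply.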

Find the element with a `type` of `Address`, and print the value of its `text` field.

```bash
jq '.[]
  | select(.type == "Address")
  | .text' spring-weather.html.json
```

The output is:

```json
"Silver Spring, MD 20910"
```
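
The output above is a JSON string, so it keeps its double quotes. If the value is going into a shell variable or another command, the `-r` (`--raw-output`) option prints the string without quotes. A minimal sketch, again using a hypothetical `sample-elements.json` with invented contents:

```bash
# Hypothetical stand-in data (illustrative only).
cat > sample-elements.json <<'EOF'
[
  {"type": "Title", "text": "News Around NOAA"},
  {"type": "Address", "text": "Silver Spring, MD 20910"}
]
EOF

# -r prints string results as raw text instead of JSON-quoted strings.
address=$(jq -r '.[] | select(.type == "Address") | .text' sample-elements.json)

echo "Address: $address"
# → Address: Silver Spring, MD 20910
```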

Find all elements with a `type` of `Title`, and print the `text` field of each found element as a string in a JSON array.

```bash
jq '[
  .[]
  | select(.type == "Title")
  | .text]' spring-weather.html.json
```

The output is:

```json
[
  "News Around NOAA",
  "National Program",
  "Are You Weather-Ready for the Spring?",
  "Weather.gov >",
  "News Around NOAA > Are You Weather-Ready for the Spring?",
  "US Dept of Commerce",
  "National Oceanic and Atmospheric Administration",
  "National Weather Service",
  "News Around NOAA",
  "1325 East West Highway",
  "Comments? Questions? Please Contact Us.",
  "Disclaimer",
  "Information Quality",
  "Help",
  "Glossary",
  "Privacy Policy",
  "Freedom of Information Act (FOIA)",
  "About Us",
  "Career Opportunities"
]
```
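
Wrapping a filter in `[ ... ]` collects its results into an array. When the input is already an array, `map(...)` is an equivalent and slightly shorter way to write the same thing, and `length` or `group_by` can then summarize the result. The following is a sketch under that assumption, run against a hypothetical `sample-elements.json` with invented contents:

```bash
# Hypothetical stand-in data (illustrative only).
cat > sample-elements.json <<'EOF'
[
  {"type": "Title", "text": "News Around NOAA"},
  {"type": "NarrativeText", "text": "Spring weather can change quickly."},
  {"type": "Title", "text": "Help"},
  {"type": "Address", "text": "Silver Spring, MD 20910"}
]
EOF

# map(f) is shorthand for [.[] | f]; -c prints the array on one line.
jq -c 'map(select(.type == "Title") | .text)' sample-elements.json
# → ["News Around NOAA","Help"]

# Count the Title elements instead of listing them.
jq 'map(select(.type == "Title")) | length' sample-elements.json
# → 2

# Count elements per type; group_by sorts the groups by the grouping key.
jq -c 'group_by(.type) | map({type: .[0].type, count: length})' sample-elements.json
# → [{"type":"Address","count":1},{"type":"NarrativeText","count":1},{"type":"Title","count":2}]
```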

Find all elements with a `type` of `Title`. Of these, find the ones that have a `text` field that contains the phrase `Contact Us`, and print the contents of each found element's `metadata.link_urls` field.

```bash
jq '.[]
  | select(.type == "Title")
  | select(.text
    | contains("Contact Us"))
  | .metadata.link_urls' spring-weather.html.json
```

The output is:

```json
[
  "https://www.weather.gov/news/contact"
]
```
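
Two details of the command above are worth noting: `contains` on strings is a case-sensitive substring match, and an element with no `metadata.link_urls` field prints `null`. The sketch below shows one way to handle both, using `.metadata.link_urls[]?` to emit each URL on its own line and skip elements that lack the field, and `ascii_downcase` for a case-insensitive match. The file `sample-elements.json` and its contents are hypothetical.

```bash
# Hypothetical stand-in data (illustrative only).
cat > sample-elements.json <<'EOF'
[
  {"type": "Title", "text": "Comments? Questions? Please Contact Us.", "metadata": {"link_urls": ["https://example.com/contact"]}},
  {"type": "Title", "text": "Contact Us by mail", "metadata": {}},
  {"type": "Title", "text": "CONTACT US today", "metadata": {"link_urls": ["https://example.com/today"]}}
]
EOF

# Case-sensitive match; []? iterates the URLs and silently skips elements without link_urls.
jq -r '.[]
  | select(.type == "Title")
  | select(.text | contains("Contact Us"))
  | .metadata.link_urls[]?' sample-elements.json
# → https://example.com/contact

# Case-insensitive variant: lowercase the text before comparing.
jq -r '.[]
  | select(.type == "Title")
  | select(.text | ascii_downcase | contains("contact us"))
  | .metadata.link_urls[]?' sample-elements.json
# → https://example.com/contact
# → https://example.com/today
```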

Find all elements with a `type` of `ListItem`. Of these, find the ones whose `text` field contains the phrase `Weather Safety` (the `test` call with the `"i"` flag matches case-insensitively).
For each item in `metadata.link_texts`, print an object with the item's value as the key and the matching item in
`metadata.link_urls` as the value. Trim any leading and trailing whitespace from all keys and values. Wrap the output in a JSON array.

```bash
jq '[
  .[]
  | select(.type == "ListItem")
  | select(.text | test("Weather Safety"; "i"))
  | [.metadata.link_texts, .metadata.link_urls]
  | transpose[]
  | {
      (.[0] | gsub("^\\s+|\\s+$"; "")) : (.[1] | gsub("^\\s+|\\s+$"; ""))
    }
]' spring-weather.html.json
```

The output is:

```json
[
  {
    "Weather Safety": "http://www.weather.gov/safetycampaign"
  },
  {
    "Air Quality": "https://www.weather.gov/safety/airquality"
  },
  {
    "Beach Hazards": "https://www.weather.gov/safety/beachhazards"
  },
  {
    "Cold": "https://www.weather.gov/safety/cold"
  },
  {
    "Cold Water": "https://www.weather.gov/safety/coldwater"
  },
  {
    "Drought": "https://www.weather.gov/safety/drought"
  },
  {
    "Floods": "https://www.weather.gov/safety/flood"
  },
  {
    "Fog": "https://www.weather.gov/safety/fog"
  },
  {
    "Heat": "https://www.weather.gov/safety/heat"
  },
  {
    "Hurricanes": "https://www.weather.gov/safety/hurricane"
  },
  {
    "Lightning Safety": "https://www.weather.gov/safety/lightning"
  },
  {
    "Rip Currents": "https://www.weather.gov/safety/ripcurrent"
  },
  {
    "Safe Boating": "https://www.weather.gov/safety/safeboating"
  },
  {
    "Space Weather": "https://www.weather.gov/safety/space"
  },
  {
    "Sun (Ultraviolet Radiation)": "https://www.weather.gov/safety/heat-uv"
  },
  {
    "Thunderstorms & Tornadoes": "https://www.weather.gov/safety/thunderstorm"
  },
  {
    "Tornado": "https://www.weather.gov/safety/tornado"
  },
  {
    "Tsunami": "https://www.weather.gov/safety/tsunami"
  },
  {
    "Wildfire": "https://www.weather.gov/safety/wildfire"
  },
  {
    "Wind": "https://www.weather.gov/safety/wind"
  },
  {
    "Winter": "https://www.weather.gov/safety/winter"
  }
]
```
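
The pairing step in the command above relies on `transpose`, which turns an array of two parallel arrays into an array of `[text, url]` pairs. The sketch below shows that step in isolation on invented input, then shows a variation that merges the per-link objects into a single object with `add`. The field values and `example.com` URLs are hypothetical.

```bash
# transpose pairs up items at the same position in each inner array.
echo '[["a","b"],[1,2]]' | jq -c 'transpose'
# → [["a",1],["b",2]]

# Variation: build one {text: url} object per pair, trim whitespace, then merge with add.
echo '{"link_texts":[" Fog ","Heat"],"link_urls":["https://example.com/fog","https://example.com/heat "]}' |
jq -c '[.link_texts, .link_urls]
  | transpose
  | map({ (.[0] | gsub("^\\s+|\\s+$"; "")) : (.[1] | gsub("^\\s+|\\s+$"; "")) })
  | add'
# → {"Fog":"https://example.com/fog","Heat":"https://example.com/heat"}
```

A single merged object is convenient for lookups by link text, but it keeps only the last URL if two links share the same text. The array-of-objects form used above preserves every pair.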

## Additional resources

- [jq Tutorial](https://jqlang.org/tutorial/)
- [jq Manual](https://jqlang.org/manual/)
- [jq Playground](https://play.jqlang.org/)
