Commit 1f987cd

committed
Adding WOQL Beginners Guide
1 parent 415f0be commit 1f987cd

3 files changed

+313
-0
lines changed

Lines changed: 302 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,302 @@
1+
---
2+
nextjs:
3+
metadata:
4+
title: WOQL Getting Started
5+
description: >-
6+
Examples to Get Started with the TerminusDB Web Object Query Language (WOQL)
7+
openGraph:
8+
images: >-
9+
https://assets.terminusdb.com/docs/technical-documentation-terminuscms-og.png
10+
media: []
11+
---
12+
13+
# The WOQL beginners guide
14+
15+
The thinking behind the Web Object Query Language is well explained on the [WOQL explanation](/docs/woql-explanation) page. As mentioned there, it is a formal language for querying and updating TerminusDB databases. As a language, it builds on a declarative datalog foundation and is evaluated by binding variables as the abstract syntax tree is read.
16+
17+
This means that some logical behaviours depend on the order of binding the variables to values in the current instance or schema graph that is addressed.
18+
19+
## A foundation to get started
20+
21+
This page provides examples of WOQL queries and explains how to use them in practice. Queries can be written in a fluent or a functional style. In general we recommend the functional style, as it has been shown to be easier for practitioners to reason about. Senior WOQLers tend to use both styles, choosing based on elegance and the best way to express a particular problem.
22+
23+
In the tutorial we will use the JavaScript/TypeScript dialect of WOQL. The logic is the same in Python, but the syntax follows traditional Python patterns. For details, refer to the syntax of each.
24+
25+
All examples have been tested in the TerminusDB Logical Studio provided by [DFRNT](https://dfrnt.com). The reader will be expected to have already built a few first classes and stored documents in TerminusDB before embarking on the examples in the tutorial.
26+
27+
## Predicates, literals and variables in WOQL
28+
29+
WOQL as a language has three main parts: predicates, literals and variables.
30+
31+
### Predicates, the "functions" of WOQL
32+
33+
Predicates such as `triple()`, `eq()` and `read_document()` express logical constraints that are matched against the information stored in TerminusDB. The retrieved information can be seen as the full set of combinations, possibilities or solutions to the query that is sent.
34+
35+
### Literals, the "values" of WOQL
36+
37+
Literals are values of a datatype. A variable can be bound to a value using `eq("v:MyVariable", "John Doe")` to let the variable `MyVariable` be bound to the literal `John Doe`. Valid solutions are only those where the variable `MyVariable` will be equal to the literal or value `John Doe`.
38+
39+
### Variables, the "glue" of WOQL
40+
41+
Variables can be specified in two ways in the TypeScript and Python TerminusDB SDKs. They can be generated using the `Vars` function as native variables in the language, which leverages syntax coloring and gives a pure approach. Or they can be specified as strings using the `v:` prefix. In this tutorial we will use the `v:` prefix, as it needs less boilerplate code and is easy to copy and paste.
42+
43+
Think of each "binding", a set of bound variables returned from TerminusDB, as one solution to the logical constraints that were put in the query. The logical constraints are not limited to reading information: updates, creations and deletions can also be made in the same query.
44+
45+
## How WOQL sees the world
46+
47+
The world that one WOQL query operates on is a closed world as defined by the layer that the query is active on, usually the most recently defined layer, but it could also operate in read-only mode on a commit in the past.
48+
49+
All operations happen atomically in one go, and nothing is created, updated, or deleted in the world that the query operates on while it runs. Every action is recorded during the query and is applied when creating the next layer in the instance and schema graphs. This means that a query can reference many layers in the past, but can only create a new world that exists after the query completes.
50+
51+
This means a query can't operate on information that comes out of the current query. How the query logically sees the world is an important consideration when working with declarative logic in a datalog language like WOQL.
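
The layer behaviour described above can be pictured with a small toy model. This is only a sketch in plain JavaScript and not how TerminusDB is implemented: `runQuery`, `has`, `insert` and `pending` are hypothetical names chosen for illustration. Reads go against the layer the query started on, writes are only recorded, and the next layer exists once the query has completed.

```javascript
// Toy model (not the TerminusDB implementation): a layer is a frozen list of triples.
function runQuery(layer, query) {
  const pending = []; // insertions recorded during the query, not yet visible
  const db = {
    // Reads only ever see the layer the query started on.
    has: (s, p, o) =>
      layer.some(([ls, lp, lo]) => ls === s && lp === p && lo === o),
    // Writes are recorded and applied after the query completes.
    insert: (s, p, o) => {
      pending.push([s, p, o]);
    },
  };
  const result = query(db);
  const nextLayer = Object.freeze([...layer, ...pending]);
  return { result, nextLayer };
}

const layer0 = Object.freeze([
  ["Person/JohnDoe", "rdf:type", "@schema:Person"],
]);

const { result, nextLayer } = runQuery(layer0, (db) => {
  db.insert("Person/JohnDoe", "age", 34);
  // The insertion above is not visible inside the same query.
  return db.has("Person/JohnDoe", "age", 34);
});

console.log(result); // false
console.log(nextLayer.length); // 2
```

The original layer is left untouched, which is what allows a later query to read from a commit in the past.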
52+
53+
## Reading documents and triples from the instance graph
54+
55+
Information in TerminusDB is stored as hierarchical documents with a defined shape, similar in structure to JSON-LD documents. Each document is made up of frames, which are sets of connected triples; some call such frames field sets or FieldSets. Triples that are disconnected from documents are not allowed.
56+
57+
All triples that form documents exist in the instance graph. There are two graph types in TerminusDB, the instance graph and the schema graph. The instance graph contains the technical documents that are stored in the database as a graph of connected triples of a certain shape.
58+
59+
### The minimal document, a single defined triple
60+
61+
Documents can be minimal. The smallest possible document consists of a single Resource Description Framework (RDF) triple. It contains the `@id` of the document as the subject, a property called `rdf:type` as the predicate, and the type of the document, such as `@schema:Entity`, as the object.
62+
63+
The `@schema:` prefix in `@schema:Entity` is shared between the schema and instance graphs of TerminusDB data products. It enables the properties of types to be looked up in the schema graph using `quad()`, which will be addressed separately. The schema graph contains the schema that defines the data model for all documents stored in the instance graph.
64+
65+
The schema enforces a structure, the shape of documents, that all triples connecting the information stored in the instance graph must follow. Documents are read from the instance graph using the `read_document()` predicate, and individual triples are matched using `triple()`.
66+
67+
### Triples are the foundation for the WOQL language
68+
69+
Documents are the basic unit to exchange facts and hierarchical records. They can be read into WOQL variables. WOQL however mostly operates on triples once the TerminusDB engine has ensured that a new instance layer can be recorded for the future.
70+
71+
This means we need a bit of foundation for what a document is and what a triple is. A triple is in essence three parts glued together, which either mirror something in the world or stand on their own: a subject, a predicate and an object. The subject is identified by an `@id`. The predicate indicates something about the subject, and the object is the value of the predicate.
72+
73+
Example of a minimal document of the `Person` type:
74+
75+
```json
76+
{
77+
"@id": "Person/JohnDoe",
78+
"@type": "Person"
79+
}
80+
```
81+
82+
This is how the document is seen by WOQL in the triple store when the prefixes have been applied from the context.
83+
84+
{% table %}
85+
86+
- Subject
87+
- Predicate
88+
- Object
89+
90+
---
91+
92+
- Person/JohnDoe
93+
- rdf:type
94+
- @schema:Person
95+
96+
{% /table %}
97+
98+
This is the triple as it is actually stored on file.
99+
100+
{% table %}
101+
102+
- Subject
103+
- Predicate
104+
- Object
105+
106+
---
107+
108+
- terminusdb:///data/Person/JohnDoe
109+
- rdf:type
110+
- terminusdb:///schema#Person
111+
112+
{% /table %}
113+
114+
The first table shows how default prefixes in TerminusDB enable concise WOQL queries without the full IRI. The second table shows the full IRI that is used internally in the instance graph if the context has not been altered.
115+
116+
If we were to state that John Doe has an age, this is how it would look to WOQL. Here 34 is a literal value; it could be an integer or one of the other datatypes in TerminusDB.
117+
118+
{% table %}
119+
120+
- Subject
121+
- Predicate
122+
- Object
123+
124+
---
125+
126+
- Person/JohnDoe
127+
- age
128+
- 34
129+
130+
{% /table %}
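
To make the mapping from a document to its triples concrete, here is a minimal sketch in plain JavaScript. The helper `documentToTriples` is hypothetical and only handles flat documents with literal values; it is meant to mirror the tables above and is not how TerminusDB stores data internally.

```javascript
// Toy flattening of a flat document into [subject, predicate, object] triples.
function documentToTriples(doc) {
  const subject = doc["@id"];
  // The document type becomes the rdf:type triple.
  const triples = [[subject, "rdf:type", `@schema:${doc["@type"]}`]];
  for (const [key, value] of Object.entries(doc)) {
    if (key.startsWith("@")) continue; // @id and @type are handled above
    triples.push([subject, key, value]);
  }
  return triples;
}

const triples = documentToTriples({
  "@id": "Person/JohnDoe",
  "@type": "Person",
  age: 34,
});

console.log(triples);
```

The result holds two triples: the `rdf:type` triple of the minimal document and the `age` triple from the table above.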
131+
132+
We could also use the equivalent values below when querying, or even write out the full schema IRI, depending on our needs.
133+
134+
{% table %}
135+
136+
- Subject
137+
- Predicate
138+
- Object
139+
140+
---
141+
142+
- terminusdb:///data/Person/JohnDoe
143+
- @schema:age
144+
- 34
145+
146+
{% /table %}
147+
148+
The default context for TerminusDB is shown below.
149+
150+
```json
151+
{
152+
"@base": "terminusdb:///data/",
153+
"@schema": "terminusdb:///schema#"
154+
}
155+
```
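
The way shorthand and prefixed values relate to full IRIs under this context can be sketched as follows. `expandIri` is a hypothetical helper written for this page, not a TerminusDB API, and the rule that bare predicates resolve against `@schema` while other bare terms resolve against `@base` is a simplifying assumption based on the tables above.

```javascript
// Default context, copied from the JSON above.
const context = {
  "@base": "terminusdb:///data/",
  "@schema": "terminusdb:///schema#",
};
// Standard RDF namespace behind the rdf: prefix.
const RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// Hypothetical helper, not a TerminusDB API.
// position is "subject", "predicate" or "object"; term must be a string.
function expandIri(term, position, ctx = context) {
  if (term.includes("://")) return term; // already a full IRI
  if (term.startsWith("@schema:")) {
    return ctx["@schema"] + term.slice("@schema:".length);
  }
  if (term.startsWith("rdf:")) return RDF + term.slice("rdf:".length);
  // Assumption: bare predicates resolve against @schema, other bare terms against @base.
  return (position === "predicate" ? ctx["@schema"] : ctx["@base"]) + term;
}

console.log(expandIri("Person/JohnDoe", "subject")); // terminusdb:///data/Person/JohnDoe
console.log(expandIri("@schema:Person", "object")); // terminusdb:///schema#Person
console.log(expandIri("age", "predicate")); // terminusdb:///schema#age
```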
156+
157+
This matters because the foundation of TerminusDB rests on the Semantic Web, but with a closed world interpretation of the Resource Description Framework (RDF). This means that the context defines the data that is stored in TerminusDB, and that TerminusDB is authoritative for the information within its world. It is expected to give correct answers when reasoning about the world it knows about, a "closed world".
158+
159+
WOQL operates as a datalog semantic query language that enables declarative logic to be used for the semantic web and to traverse documents no matter their shape or how they are connected. The variables in WOQL use either shorthand IRIs, prefixed IRIs or the full IRI when resolving connected triples.
160+
161+
Let's continue the exploration by querying for documents.
162+
163+
### Reading the minimal document as a triple
164+
165+
First, let's look at how documents are structured in TerminusDB. A document is assembled from triples by TerminusDB into a set of JSON-LD-like documents, without the context object, including `@id` and `@type` for each frame or field set. Each frame is represented as a JSON object.
166+
167+
Documents in TerminusDB are hierarchical, where concrete types that can exist on their own are called documents, and documents that form parts of a document are called subdocuments. They are nested framed objects in the structure of a document. What connects the "levels" of a hierarchical document are triples that connect one document to a subdocument.
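
As a rough illustration of how a nested document can be assembled from connected triples, consider the toy function below. Everything in it is an assumption made for illustration: the `name`, `address` and `city` properties, the `Address` type and the subdocument id are hypothetical, and the toy nests any object that is itself a subject, whereas TerminusDB decides what is a subdocument from the schema.

```javascript
// Hypothetical triples: a Person with one nested Address subdocument.
const triples = [
  ["Person/JohnDoe", "rdf:type", "@schema:Person"],
  ["Person/JohnDoe", "name", "John Doe"],
  ["Person/JohnDoe", "address", "Person/JohnDoe/address/Address/1"],
  ["Person/JohnDoe/address/Address/1", "rdf:type", "@schema:Address"],
  ["Person/JohnDoe/address/Address/1", "city", "Dublin"],
];

// Toy assembly: each frame becomes a JSON object with @id and @type.
// Assumes the triples contain no cycles.
function assemble(id, allTriples) {
  const doc = { "@id": id };
  for (const [s, p, o] of allTriples) {
    if (s !== id) continue;
    if (p === "rdf:type") {
      doc["@type"] = o.replace("@schema:", "");
    } else if (allTriples.some(([s2]) => s2 === o)) {
      doc[p] = assemble(o, allTriples); // the object is itself a subject: nest it
    } else {
      doc[p] = o; // plain literal value
    }
  }
  return doc;
}

const person = assemble("Person/JohnDoe", triples);
console.log(JSON.stringify(person, null, 2));
```

The triple with the `address` predicate is what connects the two "levels" of the document in this sketch.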
168+
169+
### Querying a document in TerminusDB
170+
171+
Querying a document in TerminusDB is done using the `read_document("v:id", "v:document")` predicate. It takes two arguments, the first is the document id to read and the second is the variable to bind the read document to in the response. Both arguments are variables, and both will be returned in the bindings.
172+
173+
The whole nested document will be resolved into a complete object in the bindings of the WOQL result, with all the subdocuments assembled automatically. In a regular relational database, the client bears the responsibility for resolving the nested structure of documents using various kinds of joins as part of the query, whereas in TerminusDB the unfolding structure of documents can be specified elegantly in the schema.
174+
175+
To get back only the variable that holds the document, leverage the `select()` predicate, which filters the bindings to only include the variables that are specified as arguments to `select()`. This is a common pattern when you are only interested in a specific part of the result.
176+
177+
Here we want just a list of documents, so we select the `v:docs` variable to not get the data in the `v:docId` variable back in our query bindings.
178+
179+
```javascript
180+
WOQL.select("v:docs").read_document("v:docId", "v:docs");
181+
```
182+
183+
If the individual WOQL keywords have been added to the Javascript context, the query can be written as below in the **fluent** style, where the WOQL parts are joined together in a flow-like manner. The easiest way to think of it is that the next WOQL keyword is either the last argument or flows from the first predicate to the next.
184+
185+
```javascript
186+
select("v:docs").read_document("v:docId", "v:docs");
187+
```
188+
189+
The query can also be written as below in the **functional** style that we will be using mainly. Note the use of `v:` for specifying variables. How a variable used in multiple places gets bound depends on how deterministic each predicate is and on whether the variable becomes "grounded".
190+
191+
```javascript
192+
select("v:docs", read_document("v:docId", "v:docs"));
193+
```
194+
195+
In this example, `v:docs` was bound by the `read_document()` predicate ("read into the variable"), and `v:docId` is free floating. It should be noted that the first argument of `read_document()` is not fully floating, it only matches the first document it finds, unless it is preceded by a predicate that is fully free floating, such as `triple()` or `quad()` that bind variables to content in the triple store.
196+
197+
To match a specific document, the following query can be used. It matches the document whose `@id` equals `Person/JohnDoe`, checks that it is of type `Person`, reads it into the variable `v:docs` using `read_document()`, and filters the bindings to only return `v:docs`.
198+
199+
```javascript
200+
select("v:docs").and(
201+
eq("v:docId", "Person/JohnDoe"),
202+
triple("v:docId", "rdf:type", "@schema:Person"),
203+
read_document("v:docId", "v:docs")
204+
);
205+
```
206+
207+
Changing the order of the eq, triple and read_document predicates can change performance characteristics and sometimes also the meaning of the query.
208+
209+
## Using and, or, opt and not in queries
210+
211+
An important part of querying is to use boolean logic to filter out the data you are interested in. WOQL provides the `and()`, `or()`, `opt()`, and `not()` predicates to enable this.
212+
213+
### The and() predicate
214+
215+
The `and()` predicate is used to combine multiple predicates into a single predicate where all predicates must be true for its solutions to match and return. The use of `and()` is straightforward and is a good way to structure the query into logical blocks that must hold true.
216+
217+
No row will be returned for the variable `v:var` from this example:
218+
219+
```javascript
220+
and(
221+
eq("v:var", "a"),
222+
eq("v:var", "b")
223+
)
224+
```
225+
226+
One row will be returned from this example, where `v:var_a` is grounded to "a" and `v:var_b` is grounded to "b".
227+
228+
```javascript
229+
and(
230+
eq("v:var_a", "a"),
231+
eq("v:var_b", "b")
232+
)
233+
```
234+
235+
### The or() predicate
236+
237+
The `or()` predicate is used to combine multiple predicates into a single predicate where at least one predicate must be true for its solutions to match and return. The use of `or()` can be a bit tricky and sometimes it is better to use the `opt()`, optional, predicate.
238+
239+
The use of the `or()` predicate drives the cardinality of the solutions: each possible solution stands on its own, meaning that more than one solution to the query can be returned. Often the query author actually means that certain parts of the query are optional, so that a variable can be left ungrounded if the predicate does not hold true or does not fill the variable with a value.
240+
241+
Two rows will be returned from this example:
242+
243+
```javascript
244+
or(
245+
eq("v:var_a", "a"),
246+
eq("v:var_b", "b")
247+
)
248+
```
249+
250+
One row has `v:var_a` grounded to "a" and the other row has `v:var_b` grounded to "b". In each row, the other variable is left ungrounded and has no value.
251+
252+
### The opt() predicate
253+
254+
The `opt()` predicate means that the wrapped predicate is optional: if possible, it will ground the variable, but if the variable is already grounded or the predicate does not hold true, the predicate will be skipped. An example is to fill in the ungrounded values from the `or()` example, which could be done as follows:
255+
256+
```javascript
257+
and(
258+
or(
259+
eq("v:var_a", "a"),
260+
eq("v:var_b", "b")
261+
),
262+
opt().eq("v:var_a", "v:var_b")
263+
)
264+
```
265+
266+
Here we get one solution where both `v:var_a` and `v:var_b` are grounded to "a", and one solution where both `v:var_a` and `v:var_b` are grounded to "b". The `opt()` predicate is optional and is applied where possible, in this case filling the ungrounded value through equality. This is useful for handling optional values and for avoiding "exploding" cardinality.
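
The row counts in the `and()`, `or()` and `opt()` examples can be reproduced with a tiny solver written in plain JavaScript. This is a toy model under stated assumptions and not the WOQL engine: the functions below only share their names with the WOQL predicates, `run`, `walk` and `isVar` are hypothetical helpers, and `opt` takes its goal as an argument rather than using the fluent form shown above.

```javascript
// A goal maps one binding (a plain object) to zero or more extended bindings.
const isVar = (t) => typeof t === "string" && t.startsWith("v:");
// Follow a variable to its value, if it has one in this binding.
const walk = (t, b) => (isVar(t) && t in b ? walk(b[t], b) : t);

const eq = (x, y) => (b) => {
  const a = walk(x, b);
  const c = walk(y, b);
  if (a === c) return [b]; // already equal, nothing to add
  if (isVar(a)) return [{ ...b, [a]: c }]; // ground the left variable
  if (isVar(c)) return [{ ...b, [c]: a }]; // ground the right variable
  return []; // two different values: no solution
};
// All goals must hold: thread every binding through each goal in order.
const and = (...goals) => (b) => goals.reduce((bs, g) => bs.flatMap(g), [b]);
// Each branch contributes its own solutions.
const or = (...goals) => (b) => goals.flatMap((g) => g(b));
// Keep the binding unchanged when the goal has no solution.
const opt = (goal) => (b) => {
  const out = goal(b);
  return out.length > 0 ? out : [b];
};
const run = (goal) => goal({});

const noRows = run(and(eq("v:var", "a"), eq("v:var", "b")));
const oneRow = run(and(eq("v:var_a", "a"), eq("v:var_b", "b")));
const twoRows = run(or(eq("v:var_a", "a"), eq("v:var_b", "b")));
const filled = run(
  and(or(eq("v:var_a", "a"), eq("v:var_b", "b")), opt(eq("v:var_a", "v:var_b")))
);

console.log(noRows.length, oneRow.length, twoRows.length); // 0 1 2
console.log(filled);
```

In `filled`, both rows have both variables grounded, matching the description of the `opt()` example above.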
267+
268+
### The not() predicate
269+
270+
The `not()` predicate is used to negate a predicate, i.e. to make sure that something should not match. This can be used to find all document subjects, where we need to filter out lists and subdocuments:
271+
272+
```javascript
273+
select(
274+
"v:subject",
275+
"v:type",
276+
and(
277+
triple("v:subject", "rdf:type", "v:type"),
278+
not(quad("v:type", "sys:subdocument", "v:select", "schema")),
279+
not(eq("v:type", "rdf:List")),
280+
),
281+
)
282+
```
283+
284+
Here we select subject and type for the resulting bindings. We match all subjects of documents, but there are certain special types, such as Lists and Subdocuments, that are not top level documents and should thus be excluded from the result.
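
The filtering logic of the query above can be mimicked over plain arrays, as a sketch of what the two `not()` clauses exclude. The data is hypothetical: the `Address` type, the ids and the placeholder object of the `sys:subdocument` triple are assumptions made for illustration, and `topLevelDocuments` is not a TerminusDB function.

```javascript
// Hypothetical instance graph: one document, one subdocument, one list cell.
const instance = [
  ["Person/JohnDoe", "rdf:type", "@schema:Person"],
  ["Person/JohnDoe/address/Address/1", "rdf:type", "@schema:Address"],
  ["Cons/1", "rdf:type", "rdf:List"],
];
// Hypothetical schema graph: Address is marked as a subdocument.
// The object value is a placeholder; the query above matches it with a variable.
const schema = [["@schema:Address", "sys:subdocument", "placeholder"]];

function topLevelDocuments(instanceTriples, schemaTriples) {
  return instanceTriples
    .filter(([, p]) => p === "rdf:type")
    // not(quad(type, "sys:subdocument", _, "schema"))
    .filter(
      ([, , type]) =>
        !schemaTriples.some(([s, p]) => s === type && p === "sys:subdocument")
    )
    // not(eq(type, "rdf:List"))
    .filter(([, , type]) => type !== "rdf:List")
    .map(([subject, , type]) => ({ subject, type }));
}

const rows = topLevelDocuments(instance, schema);
console.log(rows);
```

Only the `Person/JohnDoe` row remains, since the other two subjects are excluded by the negated conditions.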
285+
286+
## Further Reading
287+
288+
### WOQL Explanation
289+
290+
[WOQL Explanation](/docs/woql-explanation/) for a more in-depth explanation of WOQL.
291+
292+
### WOQL Reference
293+
294+
[JavaScript](/docs/javascript/) and [Python](/docs/python/) WOQL Reference guides
295+
296+
### How-to guides
297+
298+
See the [How-to Guides](/docs/use-the-clients/) for further examples of using WOQL.
299+
300+
### Documents
301+
302+
[Documents](/docs/documents-explanation/) in a knowledge graph and how to use them.

src/lib/navigation.ts

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -346,6 +346,10 @@ export const navigation: Navigation[] = [
346346
title: 'Customer Data Processing',
347347
href: '/docs/python-woql-customer-data-processing-example',
348348
},
349+
{
350+
title: 'WOQL Getting Started',
351+
href: '/docs/woql-getting-started',
352+
},
349353
],
350354
},
351355
{

src/menu.json renamed to src/prebuild/menu_deprecated.json

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -588,6 +588,13 @@
588588
"Menu3Page": {
589589
"slug": "python-woql-customer-data-processing-example"
590590
}
591+
},
592+
{
593+
"Menu3Label": "WOQL Getting Started",
594+
"Order": "2001",
595+
"Menu3Page": {
596+
"slug": "woql-getting-started"
597+
}
591598
}
592599
]
593600
},
