In Part 1, an example of a Datalog query was provided:
[:find ?title
:where
[_ :movie/title ?title]]
Here, :find has a meaning similar to SQL’s SELECT, as both specify the query's target and define output refinements.
The real difference lies in :where. While SQL also uses WHERE for specifying row filtering conditions, there are two key differences in Datalog:
- Datalog does not use SQL's
FROMto specify tables. - The syntax of conditions in Datalog's
:wherelooks significantly different from SQL.
To explain these differences, we need to understand Datomic's information model.
When imagining the information model of an SQL database, it usually looks something like this:
- Each table can be seen as a set.
- Each set contains numerous entities, stored in rows.
- Relationships can exist between different sets.
Datomic's information model, however, is radically different. Imagine the entire database as if it had just one table with five columns:
[<entity-id> <attribute> <value> <tx-id> <op>]
...
[ 167 :person/name "James Cameron" 102 true]
[ 234 :movie/title "Die Hard" 102 true]
[ 234 :movie/year 1987 102 true]
[ 235 :movie/title "Terminator" 102 true]
[ 235 :movie/director 167 102 true]
...
Each row in this table is called a datom.
The five columns are as follows:
- Entity ID (entity-id)
- Attribute (attribute)
- Value (value)
- Transaction ID (tx-id)
- Operation (op)
To understand this, imagine converting multiple SQL tables into a single Datomic table. Here’s how this could be done:
- Define the Datomic table schema with three main columns:
entity-id,attribute, andvalue(shortened to the EAV model).entity-id: Long integer.attribute: String.value: Binary (BLOB).
- Add a unique column to each SQL table to serve as the global row identifier, which maps to
entity-id. - Prepend table names to column names (e.g.,
titlein themovietable becomes:movie/title). - Transform each row from every table into EAV format and insert it into the Datomic table.
Example transformation:
movie table
[entity-id :movie/title :movie/year :movie/director]
...
[214748 "Terminator" 1987 "James Francis Cameron"]
becomes:
Datomic table
[<entity-id> <attribute> <value>]
...
[214748 :movie/title "Terminator"]
[214748 :movie/year 1987]
[214748 :movie/director "James Francis Cameron"]
If you're thinking this approach is interesting and wondering if a similar schema could be used for SQL databases: this is known as the EAV model and is often considered an anti-pattern. However, some real-world applications, like WordPress's user meta and post meta tables, still use this model.
Datomic's information model is slightly more complex than the EAV model due to the addition of transaction IDs and operations:
- Transaction ID (tx-id)
- Datoms are written into the database at different times, and their write timestamps are recorded as special transaction datoms.
tx-idpoints to these transaction datoms. Since transaction IDs increase over time, they indicate the relative order of datoms' creation.
- Datoms are written into the database at different times, and their write timestamps are recorded as special transaction datoms.
- Operation (op)
- This column has two values:
trueorfalse. truemeans the datom's fact exists;falsemeans it does not exist. Deleting a fact involves writing new datoms where the operation is set tofalse.
- This column has two values:
The following query retrieves all entities with :person/name equal to "Ridley Scott":
[:find ?e
:where
[?e :person/name "Ridley Scott"]]
Explanation:
[?e :person/name "Ridley Scott"]matches this condition against the database's content. Here,?ein the:whereclause is a variable.- Returns all matching
?evalues.
Doesn’t it resemble an SQL query? The part executed first appears lower in the query.
- Try the Basic Queries!
