Commit 44d7987

committed
Version doc for v0.0.4
1 parent 90541b3 commit 44d7987

26 files changed

+6687
-0
lines changed

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
---
2+
title: Storage
3+
weight: 1
4+
---
5+
6+
The research conducted with LingoDB does not focus on storage aspects of database systems.
7+
Thus, LingoDB does not come with an optimized storage backend and currently does not provide transactional semantics.
8+
9+
## In-Memory Format: Apache Arrow
10+
11+
The Apache Arrow columnar layout is used for the in-memory representation of tabular data.
12+
Thus, LingoDB can exchange data with existing libraries and frameworks without any overhead and can directly query
13+
Apache Arrow tables.
14+
15+
## Persistent Storage
16+
17+
For many practical purposes, persistent storage is required.
18+
We chose a pragmatic approach:
19+
20+
1. Each database is represented by multiple files placed in one *database directory*
21+
2. The main database file is called `db.lingodb` and contains the database catalog and metadata in a binary format
22+
3. At the moment, further files are used to store the table data (e.g., in Apache Arrow format)
23+
24+
Given the database directory, LingoDB automatically loads the database catalog and metadata from the `db.lingodb` file.
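
The layout described above can be checked before pointing LingoDB at a database directory. The following is a minimal sketch and not part of LingoDB: the function name `is_lingodb_dir` is hypothetical, and it only tests for the `db.lingodb` catalog file; it does not validate the catalog contents or the table data files.

```shell
# Hypothetical helper (not shipped with LingoDB): succeeds if the given
# directory looks like a database directory, i.e. it contains `db.lingodb`.
is_lingodb_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/db.lingodb" ]; then
    echo "missing catalog file: $dir/db.lingodb" >&2
    return 1
  fi
  return 0
}

# Usage with a dataset path taken from the profiling guide:
#   is_lingodb_dir resources/data/tpch-1/ && echo "looks like a database directory"
```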
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
---
2+
title: Design
3+
type: docs
4+
weight: 4
5+
---
6+
7+
This section gives an overview of the overall design of LingoDB.
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
1+
2+
LingoDB is an open-source project that welcomes contributions from the community.
3+
However, it is also a research project that still undergoes major changes (often not in public repositories) that might conflict with your contributions.
4+
Furthermore, the project is developed by a very small team of researchers and students, which means that we have limited resources to review and merge pull requests.
5+
Finally, we have to ensure that the codebase stays maintainable and that the project's goals are met.
6+
Thus, please follow the guidelines below when planning to contribute to LingoDB.
7+
8+
### Micro-Changes such as fixing typos, etc
9+
If you find a small typo or similar in one of the LingoDB repositories, please open an *Issue* in the respective repository.
10+
We won't accept pull requests for such small changes, but we will be happy to fix them ourselves as soon as possible.
11+
12+
Examples:
13+
* Typos
14+
* Slight rephrasing of existing sentences
15+
* Updating npm dependencies
16+
* ...
17+
18+
### Medium-sized Changes: Create a Pull Request
19+
If you want to contribute a medium-sized change, please create a pull request in the respective repository.
20+
21+
Examples:
22+
* Any changes to the documentation
23+
* Bug fixes that do not require large changes or a redesign (e.g., fixing a segfault)
24+
* Smallish new features (e.g., adding a new command line option or a new SQL function such as `sin`)
25+
* Adding new tests
26+
27+
### Large Changes: Discuss first
28+
If you want to contribute a larger change, please open an issue in the respective repository first.
29+
This way, we can discuss the change before you start working on it and we can avoid situations like:
30+
* You working on a feature that is already in development
31+
* You working on a feature that is not in line with the project's goals and won't be merged
32+
* You working on a feature that will soon stop working due to other changes in the project
33+
34+
Examples:
35+
* Add a new compilation backend/target
36+
* Refactor the SQL parser
37+
* Refactorings
38+
* Larger features that touch the code base in many places
39+
* Anything that is more "researchy"
40+
41+
### Before Creating a Pull Request
42+
Before creating a pull request, please make sure that
43+
* the CI pipeline passes and the coverage does not decrease
44+
* the code is formatted according to the `.clang-format` file in the repository
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
---
2+
title: Debugging & Profiling
3+
---
4+
5+
Compared to interpreted execution engines, compiling engines come with many advantages but also some challenges.
6+
Especially debugging and profiling can become a challenge, as one not only needs to debug and profile the engine code, but also the generated code.
7+
Possible solutions to these problems have been discussed before for debugging [Hyper](https://ieeexplore.ieee.org/document/8667737) and [Umbra](https://dl.acm.org/doi/abs/10.1145/3395032.3395321) and [profiling Umbra](https://dl.acm.org/doi/abs/10.1145/3447786.3456254).
8+
9+
## Guide: Profiling queries
10+
For profiling queries, LingoDB comes with a *ct* tool that collects several metrics.
11+
For the following instructions, we assume that LingoDB was built in release mode with debug information (`build/lingodb-relwithdebinfo/.buildstamp`).
12+
13+
1. Run the `ct.py` script with a query and a dataset: `python3 tools/ct/ct.py resources/sql/tpch/1.sql resources/data/tpch-1/`. If the build directory is not `build/lingodb-relwithdebinfo`, it can be supplied via the `BIN_DIR` environment variable.
14+
2. Open the resulting `ct.json` file with the [CT Viewer](https://ct.lingo-db.com) and explore it in detail.
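
The first step can be wrapped in a small shell function that applies the default build directory when `BIN_DIR` is unset. This is only a sketch: `profile_query` is a hypothetical name, the directory check is an added convenience rather than something `ct.py` requires, and the function assumes it is run from the repository root.

```shell
# Sketch of a wrapper around the documented ct.py invocation.
# `profile_query` is a hypothetical name; run it from the repository root.
profile_query() {
  query="$1"
  dataset="$2"
  # Fall back to the default build directory named in this guide.
  bin_dir="${BIN_DIR:-build/lingodb-relwithdebinfo}"
  if [ ! -d "$bin_dir" ]; then
    echo "build directory not found: $bin_dir" >&2
    return 1
  fi
  BIN_DIR="$bin_dir" python3 tools/ct/ct.py "$query" "$dataset"
}

# Usage (the custom build directory below is illustrative):
#   BIN_DIR=build/my-relwithdebinfo profile_query resources/sql/tpch/1.sql resources/data/tpch-1/
```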
15+
16+
## Guide: Debugging
17+
* If the compilation fails: Use [Snapshotting](#snapshotting) to identify the broken/problematic pass. Then run that pass in isolation with [mlir-db-opt](../GettingStarted/CommandLineTools.md#performing-optimizations-and-lowerings) for detailed debugging (e.g., with gdb).
18+
* If compilation succeeds but execution fails in or because of the generated code: First check whether the error persists when switching to the [C++-Backend](#c-backend), if possible (i.e., if all MLIR operations are supported).
19+
  * If yes: debug with this backend.
20+
  * If not: use the [LLVM Debug Backend](#llvm-debug-backend).
21+
22+
## Components for Debugging and Profiling
23+
### Location Tracking in MLIR
24+
In MLIR, every operation is associated with a *Location* that must be provided during operation creation.
25+
While it is possible to provide an *Unknown Location*, it should be avoided.
26+
When parsing an MLIR file, MLIR automatically annotates the parsed operations with the corresponding file locations.
27+
When new operations are created during a pass, they are usually annotated with the location of the operation that is currently being transformed or lowered.
28+
**All passes in LingoDB ensure that correct locations are set afterwards.**
29+
30+
### Snapshotting
31+
MLIR already comes with a `LocationSnapshotPass` that takes an operation (e.g., an MLIR module) and writes it to disk, including the annotated locations.
32+
This file is then read back in, and each operation is annotated with its location *inside this newly written file*.
33+
34+
If enabled (cf. [Settings](Settings.md)), LingoDB performs location snapshots either after every MLIR pass or after selected (important) passes.
35+
36+
Using these snapshot files, we can track the origin of any operation by recursively applying the following steps:
37+
1. get the origin location of the current operation by looking in the appropriate snapshot file
38+
2. find the origin operation by going to this location
39+
40+
### Special Compiler Backends
41+
In addition to location tracking and snapshotting, LingoDB implements two special compiler backends for debugging.
42+
43+
#### LLVM-Debug Backend
44+
Instead of using the standard LLVM backend, another LLVM-based backend can be used that adds debug information and performs no optimizations.
45+
This backend is selected by setting the environment variable `LINGODB_EXECUTION_MODE=DEBUGGING`.
46+
During execution, standard debuggers like `gdb` will then point to the corresponding operation in the last snapshot that was performed.
47+
This enables basic tracking of problematic operations, but advanced debugging will remain difficult.
48+
49+
#### C++-Backend
50+
For more advanced debugging, a *C++-Backend* can be used by setting `LINGODB_EXECUTION_MODE=C`.
51+
This backend directly translates a fixed set of low-level generic MLIR operations to C++ statements and functions that are written to a file called `mlir-c-module.cpp`.
52+
Next, LingoDB automatically invokes `clang++` (must be installed!) with `-O0` and `-g` to compile this C++ file into a shared library with debug information.
53+
This shared library is then loaded with `dlopen` and the main function is called.
54+
Thus, the generated code can be debugged as any usual C++ program.
55+
To help with tracing an error back to higher-level MLIR operations, each C++ statement is preceded by a comment containing the original operation and its location.
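
Both debugging backends are selected through the same environment variable, so switching between them can be wrapped in a small helper. The sketch below is not part of LingoDB: `run_debug_backend` is a hypothetical name, it only accepts the two values documented in this section, and it assumes that `clang++` must be reachable via `PATH` for the C++ backend.

```shell
# Hypothetical wrapper: run a command with one of the two debugging
# execution modes described in this section.
run_debug_backend() {
  mode="$1"
  shift
  case "$mode" in
    DEBUGGING|C) ;;  # LLVM debug backend or C++ backend
    *)
      echo "unsupported mode for this helper: $mode" >&2
      return 1
      ;;
  esac
  # Assumption: the C++ backend looks up clang++ via PATH.
  if [ "$mode" = "C" ] && ! command -v clang++ >/dev/null 2>&1; then
    echo "the C++ backend requires clang++ on the PATH" >&2
    return 1
  fi
  env LINGODB_EXECUTION_MODE="$mode" "$@"
}

# Usage (replace the placeholders with the LingoDB tool and arguments you use):
#   run_debug_backend DEBUGGING gdb --args <lingodb-tool> <arguments>
#   run_debug_backend C <lingodb-tool> <arguments>
```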
56+
57+
58+
### Lightweight Tracing
59+
When compiled as `RelWithDebInfo`, LingoDB produces a trace file named `trace.json` that contains events (type, start timestamp, duration, thread).
60+
This trace file can then be opened with the [CT Viewer](https://ct.lingo-db.com).
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
1+
import Tabs from '@theme/Tabs';
2+
import TabItem from '@theme/TabItem';
3+
4+
LingoDB relies on three main external dependencies:
5+
* [LLVM/MLIR 20](https://github.com/llvm/llvm-project)
6+
* [Apache Arrow 20](https://arrow.apache.org/release/20.0.0.html)
7+
* [Boost Context 1.83](https://www.boost.org/doc/libs/1_83_0/libs/context/doc/html/index.html)
8+
9+
**Additional tools and libraries required:**
10+
* C++ compiler supporting C++20
11+
* CMake 3.13.4 or newer
12+
* Ninja
13+
* lit (optional, for testing), which can be installed, e.g., via `pip install lit`
14+
15+
We also provide a [Dockerfile](https://github.com/lingo-db/lingo-db/pkgs/container/lingodb-dev) that contains all dependencies and tools required to build LingoDB.
16+
17+
When building dependencies from source, make sure that either the CMake config files are installed in a system-wide location or that, for example, `CMAKE_PREFIX_PATH` is set accordingly.
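
For dependencies installed into custom prefixes, the `CMAKE_PREFIX_PATH` environment variable can be extended before configuring LingoDB. The following is a minimal sketch: `add_cmake_prefix` is a hypothetical helper, the install locations in the usage comment are illustrative, and it relies on the environment variable being a colon-separated list on Unix-like systems.

```shell
# Hypothetical helper: prepend an install prefix to CMAKE_PREFIX_PATH.
# On Unix-like systems the environment variable is a colon-separated list.
add_cmake_prefix() {
  prefix="$1"
  if [ -z "${CMAKE_PREFIX_PATH:-}" ]; then
    CMAKE_PREFIX_PATH="$prefix"
  else
    CMAKE_PREFIX_PATH="$prefix:$CMAKE_PREFIX_PATH"
  fi
  export CMAKE_PREFIX_PATH
}

# Usage with illustrative install locations:
#   add_cmake_prefix "$HOME/opt/llvm-20"
#   add_cmake_prefix "$HOME/opt/arrow-20"
```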
18+
19+
## LLVM/MLIR
20+
21+
<Tabs groupId="os-tabs">
22+
<TabItem value="linux" label="Ubuntu/Linux">
23+
Follow the instructions on [https://apt.llvm.org/](https://apt.llvm.org/) to install the repository on your system.
24+
Then install the following packages: `clang-20 llvm-20 libclang-20-dev llvm-20-dev libmlir-20-dev mlir-20-tools clang-tidy-20`
25+
26+
### Binaries
27+
For other recent Linux distributions, you can also rely on the pre-built binaries provided by the LLVM project on the GitHub release pages.
28+
29+
### Building from Source
30+
31+
```shell
32+
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-20.1.0-rc1/llvm-project-20.1.0-rc1.src.tar.xz
33+
tar -xf llvm-project-20.1.0-rc1.src.tar.xz
34+
mkdir llvm-project-20.1.0-rc1.src/build
35+
cd llvm-project-20.1.0-rc1.src
36+
export INSTALL_PREFIX=[install_prefix]
37+
env VIRTUAL_ENV=/venv cmake -B build -DLLVM_ENABLE_PROJECTS="llvm;mlir;clang;clang-tools-extra" -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_BUILD_EXAMPLES=OFF -DCMAKE_BUILD_TYPE=Release -G Ninja -DLLVM_ENABLE_ASSERTIONS=OFF -DLLVM_BUILD_TESTS=OFF -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=OFF -DLLVM_ENABLE_DUMP=ON -DLLVM_ENABLE_FFI=ON -DCMAKE_CXX_FLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_PARALLEL_TABLEGEN_JOBS=10 -DBUILD_SHARED_LIBS=OFF -DLLVM_INSTALL_UTILS=ON -DLLVM_ENABLE_ZLIB=OFF -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX llvm/
38+
cmake --build build --target install -j$(nproc)
39+
```
40+
41+
</TabItem>
42+
<TabItem value="macos" label="MacOS">
43+
Install LLVM/MLIR using Homebrew and make it available system-wide:
44+
45+
```shell
46+
brew install llvm@20
47+
brew link --force llvm@20
48+
```
49+
50+
### Binaries
51+
⚠️ **Caution**: the pre-built binaries provided by the LLVM project on the GitHub release pages **DO NOT** serve as a replacement, since they lack the required MLIR support.
52+
53+
### Building from Source
54+
55+
1. Install Xcode (through the App Store).
56+
2. Install the build prerequisites: `brew install cmake ninja z3`
57+
3. Make sure to replace `[install_prefix]` with your preferred install path for LLVM.
58+
```shell
59+
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-20.1.4/llvm-project-20.1.4.src.tar.xz
60+
tar -xf llvm-project-20.1.4.src.tar.xz
61+
mkdir llvm-project-20.1.4.src/build
62+
cd llvm-project-20.1.4.src
63+
export INSTALL_PREFIX=[install_prefix]
64+
export SDKROOT=$(xcrun --sdk macosx --show-sdk-path)
65+
env VIRTUAL_ENV=/venv cmake -B build -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;mlir;polly;lldb" -DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi;libunwind;pstl;openmp" -DLLVM_TARGETS_TO_BUILD="AArch64" -DLLVM_BUILD_EXAMPLES=OFF -DCMAKE_BUILD_TYPE=Release -G Ninja -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_BUILD_TESTS=OFF -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON -DLLVM_ENABLE_DUMP=ON -DLLVM_ENABLE_FFI=ON -DCMAKE_CXX_FLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_PARALLEL_TABLEGEN_JOBS=10 -DBUILD_SHARED_LIBS=OFF -DLLVM_INSTALL_UTILS=ON -DLLVM_ENABLE_ZLIB=OFF -DLLVM_POLLY_LINK_INTO_TOOLS=ON -DLLVM_BUILD_EXTERNAL_COMPILER_RT=ON -DLLVM_ENABLE_EH=OFF -DLLVM_ENABLE_RTTI=ON -DLLVM_INCLUDE_DOCS=OFF -DLLVM_INCLUDE_TESTS=OFF -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_USE_RELATIVE_PATHS_IN_FILES=ON -DLLVM_SOURCE_PREFIX=. -DLLDB_USE_SYSTEM_DEBUGSERVER=ON -DLIBOMP_INSTALL_ALIASES=OFF -DLIBCXX_INSTALL_MODULES=ON -DLLVM_CREATE_XCODE_TOOLCHAIN=OFF -DCLANG_FORCE_MATCHING_LIBCLANG_SOVERSION=OFF -DLLVM_BUILD_LLVM_C_DYLIB=ON -DLLVM_ENABLE_LIBCXX=ON -DLIBCXX_PSTL_BACKEND=libdispatch -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_PROJECT_TOP_LEVEL_INCLUDES=/opt/homebrew/Library/Homebrew/cmake/trap_fetchcontent_provider.cmake -Wno-dev -DCMAKE_OSX_SYSROOT=$(xcrun --sdk macosx --show-sdk-path) -DLLVM_ENABLE_Z3_SOLVER=ON -DFFI_INCLUDE_DIR=$(xcrun --sdk macosx --show-sdk-path)/usr/include/ffi -DFFI_LIBRARY_DIR=$(xcrun --sdk macosx --show-sdk-path)/usr/lib -DLIBCXX_INSTALL_LIBRARY_DIR=$INSTALL_PREFIX/lib/c++ -DLIBUNWIND_INSTALL_LIBRARY_DIR=$INSTALL_PREFIX/lib/unwind -DLIBCXXABI_INSTALL_LIBRARY_DIR=$INSTALL_PREFIX/c++ -DRUNTIMES_CMAKE_ARGS="-DCMAKE_INSTALL_RPATH=@loader_path|@loader_path/../unwind" -DBUILTINS_CMAKE_ARGS="-DCOMPILER_RT_ENABLE_IOS=OFF;-DCOMPILER_RT_ENABLE_WATCHOS=OFF;-DCOMPILER_RT_ENABLE_TVOS=OFF" -DCMAKE_PREFIX_PATH=/opt/homebrew -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX llvm/
66+
cmake --build build --target install -j$(sysctl -n hw.logicalcpu)
67+
```
68+
</TabItem>
69+
</Tabs>
70+
71+
## Apache Arrow
72+
73+
<Tabs groupId="os-tabs">
74+
<TabItem value="linux" label="Ubuntu/Linux">
75+
76+
```shell
77+
wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
78+
apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
79+
apt-get update
80+
apt-get install libarrow-dev=20.*
81+
```
82+
83+
### Binaries
84+
For other recent Linux distributions, you can also rely on the pre-built binaries provided by the Apache Arrow project.
85+
86+
### Building from Source
87+
88+
```shell
89+
wget https://github.com/apache/arrow/releases/download/apache-arrow-20.0.0/apache-arrow-20.0.0.tar.gz
90+
tar -xf apache-arrow-20.0.0.tar.gz
91+
cd apache-arrow-20.0.0/cpp
92+
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=[output-dir] -DARROW_DEPENDENCY_SOURCE=BUNDLED -DARROW_BUILD_STATIC=ON -DARROW_CSV=ON -DARROW_JSON=ON -DARROW_COMPUTE=ON
93+
cmake --build build --target install -j$(nproc)
94+
```
95+
96+
</TabItem>
97+
<TabItem value="macos" label="MacOS">
98+
99+
Install Apache Arrow using Homebrew:
100+
101+
```shell
102+
brew tap lingo-db/homebrew https://github.com/lingo-db/homebrew.git
103+
brew install lingo-db/homebrew/apache-arrow@20
104+
```
105+
106+
### Building from Source
107+
108+
```shell
109+
wget https://github.com/apache/arrow/releases/download/apache-arrow-20.0.0/apache-arrow-20.0.0.tar.gz
110+
tar -xf apache-arrow-20.0.0.tar.gz
111+
cd apache-arrow-20.0.0/cpp
112+
cmake -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=[output-dir] -DARROW_DEPENDENCY_SOURCE=BUNDLED -DARROW_BUILD_STATIC=ON -DARROW_CSV=ON -DARROW_JSON=ON -DARROW_COMPUTE=ON -DCMAKE_PREFIX_PATH=/opt/homebrew/ -DCMAKE_CXX_COMPILER=/opt/homebrew/bin/clang++ -DCMAKE_C_COMPILER=/opt/homebrew/bin/clang
113+
cmake --build build --target install -j$(sysctl -n hw.logicalcpu)
114+
```
115+
116+
</TabItem>
117+
</Tabs>
118+
119+
## Boost Context
120+
121+
<Tabs groupId="os-tabs">
122+
<TabItem value="linux" label="Ubuntu/Linux">
123+
124+
```shell
125+
apt-get install libboost-context1.83-dev
126+
```
127+
128+
### Building from Source
129+
```shell
130+
wget https://archives.boost.io/release/1.83.0/source/boost_1_83_0.tar.gz
131+
tar -xf boost_1_83_0.tar.gz
132+
cd boost_1_83_0
133+
./bootstrap.sh --prefix=/usr # or any other directory in the PATH/LD_LIBRARY_PATH
134+
./b2 install --with-context
135+
```
136+
137+
</TabItem>
138+
<TabItem value="macos" label="MacOS">
139+
140+
Install Boost Context using Homebrew:
141+
142+
```shell
143+
brew install boost
144+
```
145+
146+
</TabItem>
147+
</Tabs>
