
Commit bba4e51

TanZiYenElliezza
andauthored
docs: update the quickstart and sdk folder (#3537)
* Docs: Update-quickstart-sdk-folder
* Docs: update-quickstart-sdk-index-file
* Update compile.md Update referred document to the latest
* Update cxx_sdk.md with recent updates
* Update go_sdk.md
* Update rest_api.md
* Update java_sdk.md to most recent upate
* Update python_sdk.md to most recent updates
* Update python_sdk.md image reference link

---------

Co-authored-by: Siqi Wang <[email protected]>
1 parent 19a8c9a commit bba4e51

File tree

8 files changed

+476
-256
lines changed

docs/en/deploy/compile.md

Lines changed: 78 additions & 27 deletions
@@ -1,11 +1,9 @@
1-
# Build
1+
# Compilation from Source Code
22

3-
## 1. Quick Start
3+
## Compile and Use in Docker Container
44

5-
[quick-start]: quick-start
6-
7-
This section describes the steps to compile and use OpenMLDB inside its official docker image [hybridsql](https://hub.docker.com/r/4pdosc/hybridsql).
8-
The docker image has packed required tools and dependencies, so there is no need to set them up separately. To compile without the official docker image, refer to the section [Detailed Instructions for Build](#detailed-instructions-for-build) below.
5+
This section describes the steps to compile and use OpenMLDB inside its official docker image [hybridsql](https://hub.docker.com/r/4pdosc/hybridsql), mainly for quick start and development purposes in the docker container.
6+
The docker image has packed the required tools and dependencies, so there is no need to set them up separately. To compile without the official docker image, refer to the section [Detailed Instructions for Build](#detailed-instructions-for-build) below.
97

108
Keep in mind that the compile image and the [OpenMLDB version](https://github.com/4paradigm/OpenMLDB/releases) should always match. This section demonstrates compiling [OpenMLDB v0.8.3](https://github.com/4paradigm/OpenMLDB/releases/tag/v0.8.3) under `hybridsql:0.8.3`. If you prefer to compile the latest code on the `main` branch, pull the `hybridsql:latest` image instead.
119

@@ -15,13 +13,13 @@ Keep in mind that you should always use the same version of both compile image a
1513
docker pull 4pdosc/hybridsql:0.8
1614
```
1715

18-
2. Create a docker container with the hybridsql docker image
16+
2. Create a docker container
1917

2018
```bash
2119
docker run -it 4pdosc/hybridsql:0.8 bash
2220
```
2321

24-
3. Download the OpenMLDB source code inside the docker container, and setting the branch into v0.8.3
22+
3. Download the OpenMLDB source code inside the docker container and switch to v0.8.3
2523

2624
```bash
2725
cd ~
@@ -41,52 +39,49 @@ Keep in mind that you should always use the same version of both compile image a
4139
make install
4240
```
4341

44-
Now you've finished the compilation job, and you may try run OpenMLDB inside the docker container.
42+
Now that you've finished the compilation job, you may try running OpenMLDB inside the docker container.
4543

46-
## 2. Detailed Instructions for Build
44+
## Detailed Instructions for Build
4745

48-
[build]: build
46+
This chapter discusses compiling source code without relying on pre-built container environments.
4947

50-
### 2.1. Hardware Requirements
48+
### Hardware Requirements
5149

5250
- **Memory**: 8GB+ recommended.
5351
- **Disk Space**: >=25GB of free disk space for full compilation.
5452
- **Operating System**: CentOS 7, Ubuntu 20.04 or macOS >= 10.15. Other systems are not carefully tested, but issues and PRs are welcome.
53+
- **CPU Architecture**: Only x86 is currently supported. Other architectures such as ARM are not (please note that running x86 images on heterogeneous systems like M1 Mac is also not supported at this time).
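
The requirements above can be checked before starting a long build. The following is a minimal sketch, not part of OpenMLDB: the helper names `arch_supported` and `at_least_gb` are hypothetical, and only the architecture and free disk space are checked (memory reporting differs too much between Linux and macOS for a short example).

```bash
#!/usr/bin/env bash
# Preflight sketch: compare the host against the documented requirements.
# Helper names are hypothetical; thresholds mirror the list above.

# Succeeds only for x86-64 machine names as reported by `uname -m`.
arch_supported() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the first integer (in GB) is at least the second.
at_least_gb() {
  [ "$1" -ge "$2" ]
}

if arch_supported "$(uname -m)"; then
  echo "arch: ok"
else
  echo "arch: $(uname -m) is not supported"
fi

# Free space in GB on the filesystem that holds the current directory.
free_gb=$(df -Pk . | awk 'NR==2 {print int($4 / 1024 / 1024)}')
if at_least_gb "$free_gb" 25; then
  echo "disk: ok (${free_gb} GB free)"
else
  echo "disk: only ${free_gb} GB free, 25 GB needed for a full compilation"
fi
```

Run it from the directory where you plan to clone the source, since the disk check looks at the current filesystem.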
5554

56-
Note: By default, the parallel build is disabled, and it usually takes an hour to finish all the compile jobs. You can enable the parallel build by tweaking the `NPROC` option if your machine's resource is enough. This will reduce the compile time but also consume more memory. For example, the following command set the number of concurrent build jobs to 4:
55+
💡 Note: By default, the parallel build is disabled, and it usually takes an hour to finish all the compile jobs. You can enable the parallel build by tweaking the `NPROC` option if your machine has enough resources. This will reduce the compile time but also consume more memory. For example, the following command sets the number of concurrent build jobs to 4:
5756

5857
```bash
5958
make NPROC=4
6059
```
6160

62-
### 2.2. Prerequisites
63-
64-
Make sure those tools are installed
65-
61+
### Dependencies
6662
- gcc >= 8 or AppleClang >= 12.0.0
67-
- cmake 3.20 or later ( < cmake 3.24 is better)
63+
- cmake 3.20 or later (versions below 3.24 are recommended)
6864
- jdk 8
6965
- python3, python setuptools, python wheel
7066
- If you'd like to compile thirdparty from source, check out the [third-party requirements](../../third-party/README.md) for extra dependencies
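
The cmake constraint in the list above (at least 3.20, ideally below 3.24) can be verified with a short script. This is an illustrative sketch: `version_ge`, `cmake_supported` and `cmake_recommended` are hypothetical helpers, and the comparison assumes purely numeric dotted versions.

```bash
#!/usr/bin/env bash
# Sketch: check an installed cmake against the documented version range.
# All function names here are hypothetical helpers, not OpenMLDB tooling.

# version_ge A B: succeeds when dotted version A >= B (numeric fields only).
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for i in 0 1 2; do
    x=${a[i]:-0}
    y=${b[i]:-0}
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}

# Required: cmake 3.20 or later.
cmake_supported() { version_ge "$1" 3.20; }

# Recommended: at least 3.20 and below 3.24.
cmake_recommended() { cmake_supported "$1" && ! version_ge "$1" 3.24; }

if command -v cmake >/dev/null 2>&1; then
  # First line of `cmake --version` looks like "cmake version 3.22.1".
  v=$(cmake --version | awk 'NR==1 {print $3}')
  v=${v%%-*}   # drop any "-suffix" so only numeric fields remain
  if cmake_recommended "$v"; then
    echo "cmake $v: in the recommended range"
  elif cmake_supported "$v"; then
    echo "cmake $v: supported, but 3.24 or newer is outside the recommended range"
  else
    echo "cmake $v: too old, 3.20 or later is required"
  fi
else
  echo "cmake not found"
fi
```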
7167

72-
### 2.3. Build and Install OpenMLDB
68+
### Build and Install OpenMLDB
7369

7470
Building OpenMLDB requires certain thirdparty dependencies. Hence a Makefile is provided as a convenience to set up thirdparty dependencies automatically and run the CMake project in a single command `make`. The `make` command offers several methods to compile, each of which manages thirdparty differently:
7571

76-
- **Method One: Build and Run Inside Docker:** Using [hybridsql](https://hub.docker.com/r/4pdosc/hybridsql) docker image, the thirdparty is already bundled inside the image and no extra steps are required, refer to above section [Quick Start](#quick-start)
77-
- **Method Two: Download Pre-Compiled Thirdparty:** Command is `make && make install`. It downloads necessary prebuild libraries from [hybridsql-assert](https://github.com/4paradigm/hybridsql-asserts/releases) and [zetasql](https://github.com/4paradigm/zetasql/releases). Currently it supports CentOS 7, Ubuntu 20.04 and macOS.
78-
- **Method Three: Compile Thirdparty from Source:** This is the suggested way if the host system is not in the supported list for pre-compiled thirdparty (CentOS 7, Ubuntu 20.04 and macOS). Note that when compiling thirdparty for the first time requires extra time to finish, approximately 1 hour on a 2 core & 7 GB machine. To compile thirdparty from source, please pass `BUILD_BUNDLED=ON` to `make`:
72+
- **Method One: Download Pre-Compiled Thirdparty:** Command is `make && make install`. It downloads the necessary prebuilt libraries from [hybridsql-assert](https://github.com/4paradigm/hybridsql-asserts/releases) and [zetasql](https://github.com/4paradigm/zetasql/releases). Currently it supports CentOS 7, Ubuntu 20.04 and macOS.
73+
- **Method Two: Compile Thirdparty from Source:** This is the suggested way if the host system is not in the supported list for pre-compiled thirdparty (CentOS 7, Ubuntu 20.04 and macOS). Note that compiling thirdparty for the first time requires extra time, approximately 1 hour on a 2-core, 8 GB machine. To compile thirdparty from source, please pass `BUILD_BUNDLED=ON` to `make`:
7974

8075
```bash
8176
make BUILD_BUNDLED=ON
8277
make install
8378
```
8479

85-
All of the three methods above will install OpenMLDB binaries into `${PROJECT_ROOT}/openmldb` by default, you may tweak the installation directory with the option `CMAKE_INSTALL_PREFIX` (refer the following section [Extra options for `make`](#24-extra-options-for-make)).
80+
All of the methods above install OpenMLDB binaries into `${PROJECT_ROOT}/openmldb` by default. You may change the installation directory with the option `CMAKE_INSTALL_PREFIX` (refer to the following section [Extra Parameters for `make`](#extra-parameters-for-make)).
8681

87-
### 2.4. Extra Options for `make`
82+
### Extra Parameters for `make`
8883

89-
You can customize the `make` behavior by passing following arguments, e.g., changing the build mode to `Debug` instead of `Release`:
84+
You can customize the `make` behavior by passing the following arguments, e.g., changing the build mode to `Debug` instead of `Release`:
9085

9186
```bash
9287
make CMAKE_BUILD_TYPE=Debug
@@ -132,10 +127,14 @@ make CMAKE_BUILD_TYPE=Debug
132127

133128
Default: ON
134129

135-
- OPENMLDB_BUILD_TARGET: If you only want to build some targets, not all, e.g. only build a test `ddl_parser_test`, you can set it to `ddl_parser_test`. Multiple targets may be given, separated by spaces. It can reduce the build time, reduce the build output, save the storage space.
130+
- OPENMLDB_BUILD_TARGET: If you only want to build some targets, not all, e.g. only build a test `ddl_parser_test`, you can set it to `ddl_parser_test`. Multiple targets may be given, separated by spaces. It can reduce build time, reduce build output, and save storage space.
136131

137132
Default: all
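
Because multiple targets are separated by spaces, the value has to be quoted so the shell passes it to `make` as a single argument. The sketch below only prints the arguments instead of running a build; `show_args` is a hypothetical helper and `some_other_test` is a placeholder target name, not a real OpenMLDB target.

```bash
#!/usr/bin/env bash
# Sketch: show how quoting affects what `make` would receive.
# `show_args` is a hypothetical helper; `some_other_test` is a placeholder.

# Print every argument on its own line, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Quoted: the assignment stays one argument that carries both targets.
show_args make OPENMLDB_BUILD_TARGET="ddl_parser_test some_other_test"

# Unquoted: the shell splits at the space, so the second name would reach
# `make` as a separate argument rather than as part of the variable.
show_args make OPENMLDB_BUILD_TARGET=ddl_parser_test some_other_test
```

Replace `show_args make` with `make` once the argument list looks right.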
138133

134+
- THIRD_PARTY_CMAKE_FLAGS: You can use this to configure additional parameters when compiling third-party dependencies. For instance, to specify concurrent compilation for each third-party project, you can set `THIRD_PARTY_CMAKE_FLAGS` to `-DMAKEOPTS=-j8`. Please note that `NPROC` does not affect third-party compilation, and multiple third-party projects are built sequentially.
135+
136+
Default: ''
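
Since `NPROC` and `THIRD_PARTY_CMAKE_FLAGS` control parallelism separately, a from-source build that wants both has to set both. The sketch below assembles such a command from the detected core count and prints it rather than running it. `build_cmd` is a hypothetical helper, and combining these options with `BUILD_BUNDLED=ON` in one invocation is an assumption based on the option descriptions above.

```bash
#!/usr/bin/env bash
# Sketch: compose a make invocation that parallelizes both build stages.
# `build_cmd` is a hypothetical helper; the option names come from this page.

# build_cmd N: print a command using N jobs for the OpenMLDB build (NPROC)
# and N jobs inside each third-party project (MAKEOPTS).
build_cmd() {
  local n=$1
  echo "make BUILD_BUNDLED=ON NPROC=${n} THIRD_PARTY_CMAKE_FLAGS=-DMAKEOPTS=-j${n}"
}

# Core count: `nproc` on Linux, `sysctl` on macOS, otherwise fall back to 1.
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)

# Print the command for review; run it from the repository root if it fits.
build_cmd "$cores"
```

More jobs also mean more memory use, so a value below the core count may be the safer choice on machines close to the 8 GB recommendation.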
137+
139138
### Build Java SDK with Multi Processes
140139

141140
```
@@ -144,7 +143,7 @@ make SQL_JAVASDK_ENABLE=ON NPROC=4
144143

145144
The built jar packages are in the `target` path of each submodule. If you want to use jar packages you built yourself, please DO NOT add them via systemPath (you may get a `ClassNotFoundException` for Protobuf and other classes, which takes extra work to resolve at compile time and runtime). A better way is to run `mvn install -DskipTests=true -Dscalatest.skip=true -Dwagon.skip=true -Dmaven.test.skip=true -Dgpg.skip` to install them into your local m2 repository, where your project can pick them up.
146145

147-
## 3. Optimized Spark Distribution for OpenMLDB
146+
## Optimized Spark Distribution for OpenMLDB
148147

149148
[OpenMLDB Spark Distribution](https://github.com/4paradigm/spark) is the fork of [Apache Spark](https://github.com/apache/spark). It adopts specific optimization techniques for OpenMLDB. It provides native `LastJoin` implementation and achieves 10x~100x performance improvement compared with the original Spark distribution. The Java/Scala/Python/SQL APIs of the OpenMLDB Spark distribution are fully compatible with the standard Spark distribution.
150149

@@ -171,3 +170,55 @@ export SPARK_HOME=`pwd`
171170
```
172171

173172
3. Now you are all set to run OpenMLDB by enjoying the performance speedup from this optimized Spark distribution.
173+
174+
175+
## Build for Other OS
176+
As previously mentioned, if you want to run OpenMLDB or the SDK on a different OS, you will need to compile from the source code. We provide quick compilation solutions for several operating systems. For other OS, you'll need to perform source code compilation on your own.
177+
178+
### CentOS 6 or Other glibc Linux OS
179+
#### Local Compilation
180+
To compile a version compatible with CentOS 6, you can use Docker and the `steps/centos6_build.sh` script. As shown below, we use the current directory as the mount directory and place the compilation output locally.
181+
182+
```bash
183+
git clone https://github.com/4paradigm/OpenMLDB.git
184+
cd OpenMLDB
185+
docker run -it -v`pwd`:/root/OpenMLDB ghcr.io/4paradigm/centos6_gcc7_hybridsql bash
186+
```
187+
Execute the compilation script within the container, and the output will be in the `build` directory. If downloading `bazel` or `icu4c` fails during compilation, you can use the mirror sources provided by OpenMLDB by setting the environment variable `OPENMLDB_SOURCE=true`. The environment variables accepted by `make` also work here, as shown below.
188+
189+
```bash
190+
cd OpenMLDB
191+
bash steps/centos6_build.sh
192+
# THIRD_PARTY_CMAKE_FLAGS=-DMAKEOPTS=-j8 bash steps/centos6_build.sh # run fast when build single project
193+
# OPENMLDB_SOURCE=true bash steps/centos6_build.sh
194+
# SQL_JAVASDK_ENABLE=ON SQL_PYSDK_ENABLE=ON NPROC=8 bash steps/centos6_build.sh # NPROC will build openmldb in parallel, thirdparty should use THIRD_PARTY_CMAKE_FLAGS
195+
```
196+
197+
For a local compilation with a 2.20GHz CPU, SSD hard drive, and 32 threads to build both third-party libraries and the OpenMLDB core, the approximate timeframes are as follows:
198+
`THIRD_PARTY_CMAKE_FLAGS=-DMAKEOPTS=-j32 SQL_JAVASDK_ENABLE=ON SQL_PYSDK_ENABLE=ON NPROC=32 bash steps/centos6_build.sh`
199+
- third-party (excluding source code download time): Approximately 40 minutes:
200+
- Zetasql patch: 13 minutes
201+
- Compilation of all third-party dependencies: 30 minutes
202+
- OpenMLDB core, including Python and Java native components: Approximately 12 minutes
203+
204+
Please note that these times can vary depending on your specific hardware and system performance. The provided compilation commands and environment variables are optimized for multi-threaded compilation, which can significantly reduce build times.
205+
206+
#### Cloud Compilation
207+
208+
After forking the OpenMLDB repository, you can trigger the `Other OS Build` workflow in `Actions`, and the output will be available in the `Actions` `Artifacts`. Here's how to configure the workflow:
209+
210+
- Leave the `Use workflow from` setting on a branch (any branch works); do not set it to a specific tag.
211+
- Choose the desired `OS name`, which in this case is `centos6`.
212+
- If you are not compiling the main branch, provide the name of the branch, tag (e.g., v0.8.2), or SHA you want to compile in the `The branch, tag, or SHA to checkout, otherwise use the branch` field.
213+
- The compilation output will be accessible in "runs", as shown in an example [here](https://github.com/4paradigm/OpenMLDB/actions/runs/6044951902).
214+
- The workflow always produces the OpenMLDB binary file.
215+
- If you don't need the Java or Python SDK, you can configure `java sdk enable` or `python sdk enable` to be "OFF" to save compilation time.
216+
217+
Please note that this compilation process involves building third-party dependencies from source code, and it may take a while to complete due to limited resources. The approximate time for this process is around 3 hours and 5 minutes (2 hours for third-party dependencies and 1 hour for OpenMLDB). However, the workflow caches the compilation output for third-party dependencies, so the second compilation will be much faster, taking approximately 1 hour and 15 minutes for OpenMLDB.
218+
219+
### macOS 10.15, 11
220+
221+
macOS doesn't require compiling third-party dependencies from source code, so compilation is relatively fast, taking about 1 hour and 15 minutes. Local compilation follows the steps outlined in the [Detailed Instructions for Build](#detailed-instructions-for-build) and does not require compiling third-party dependencies (`BUILD_BUNDLED=OFF`). For cloud compilation on macOS, trigger the `Other OS Build` workflow in `Actions` with the specified macOS version (`os name` as `macos10` or `macos11`). You can also disable Java or Python SDK compilation if they are not needed, by setting `java sdk enable` or `python sdk enable` to `OFF`.
222+
223+
224+

docs/en/quickstart/sdk/cpp_sdk.md

Lines changed: 0 additions & 117 deletions
This file was deleted.
