
Commit 9f0d3fc

docs: updates on integration and develop folder (#3739)
* update integration title capitalization
* updates on develop folder
1 parent 713a8ed commit 9f0d3fc

9 files changed: +432 / -359 lines

docs/en/developer/built_in_function_develop_guide.md

Lines changed: 370 additions & 129 deletions
Large diffs are not rendered by default.

docs/en/developer/contributing.md

Lines changed: 23 additions & 0 deletions
@@ -1,3 +1,26 @@
 # Contributing
 Please refer to [Contribution Guideline](https://github.com/4paradigm/OpenMLDB/blob/main/CONTRIBUTING.md)
 
+## Pull Request (PR) Guidelines
+
+When submitting a PR, please pay attention to the following points:
+- PR Title: Please adhere to the [commit format](https://github.com/4paradigm/rfcs/blob/main/style-guide/commit-convention.md#conventional-commits-reference) for the PR title. **Note that this refers to the PR title, not the commits within the PR**.
+```{note}
+If the title does not meet the standard, `pr-linter / pr-name-lint (pull_request)` will fail with a status of `x`.
+```
+- PR Checks: A PR runs various checks; only `codecov/patch` and `codecov/project` are allowed to fail, while all other checks should pass. If another check fails and you cannot fix it, or you believe it should not be fixed, you can leave a comment in the PR.
+
+- PR Description: Please explain the intent of the PR in its first comment. We provide a PR comment template; you are not required to follow it, but make sure there is a sufficient explanation.
+
+- PR Files Changed: Pay attention to the `files changed` of the PR and do not include code changes outside the scope of the PR's intent. You can generally eliminate unnecessary diffs by running `git merge origin/main` followed by `git push` to the PR branch, as shown in the sketch below. If you need assistance, leave a comment in the PR.
+```{note}
+If your changes are not based on the latest main branch, a PR targeting main will show unnecessary code in `files changed`. For example, if main is at commit 10 and you started from commit 9 of the old main, added new_commit1 and then new_commit2 on top of it, and you only want to submit new_commit2, the PR will still include both new_commit1 and new_commit2.
+In this case, just run `git merge origin/main` and `git push` to the PR branch so that it only includes your changes.
+```
+```{seealso}
+If you want the branch history to be cleaner, you can use `git rebase -i origin/main` instead of `git merge`. It replays your commits one by one on top of the main branch. However, it rewrites the commit history, so you need `git push -f` to overwrite the branch.
+```
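A minimal shell sketch of the two branch-syncing options above, assuming the remote is named `origin` and your PR branch is currently checked out:

```
# Option 1: merge main into the PR branch (keeps history, no force push needed)
git fetch origin
git merge origin/main   # resolve conflicts if any
git push                # the PR's files changed now only shows your own changes

# Option 2: rebase for a cleaner history (rewrites commits)
git fetch origin
git rebase -i origin/main
git push -f             # force push is required after rewriting history
```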
+
+## Compilation Guidelines
+
+For compilation details, refer to the [Compilation Documentation](../deploy/compile.md). To avoid problems caused by operating system and tool version differences, we recommend compiling OpenMLDB inside the compilation image. Since compiling all of OpenMLDB requires significant disk space, we recommend using `OPENMLDB_BUILD_TARGET` to build only the parts you need.
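For example, a hedged sketch of building a single target inside the compilation image; the image name is a placeholder, see the Compilation Documentation for the exact image and tag:

```
# start the compilation image with the repository mounted (image name is a placeholder)
docker run -v "$(pwd)":/OpenMLDB -it <openmldb-compile-image> bash
cd /OpenMLDB
# build only the openmldb server binary instead of every target
make OPENMLDB_BUILD_TARGET=openmldb
```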

docs/en/developer/index.rst

Lines changed: 0 additions & 1 deletion
@@ -10,4 +10,3 @@ Developers
 built_in_function_develop_guide
 sdk_develop
 python_dev
-udf_develop_guide

docs/en/developer/python_dev.md

Lines changed: 17 additions & 2 deletions
@@ -2,9 +2,19 @@
 
 There are two modules in `python/`: the Python SDK and an OpenMLDB diagnostic tool.
 
-## SDK Testing Methods
+## SDK
+
+The Python SDK itself does not depend on the pytest and tox libraries that are used for testing. If you want to run the tests in the `tests` directory, install the testing dependencies as follows.
+
+```
+pip install 'openmldb[test]'
+pip install 'dist/....whl[test]'
+```
+
+### Testing Method
+
+Run the command `make SQL_PYSDK_ENABLE=ON OPENMLDB_BUILD_TARGET=cp_python_sdk_so` in the root directory and make sure the library in `python/openmldb_sdk/openmldb/native/` is the latest native library. Testing typically requires connecting to an OpenMLDB cluster. If you haven't started a cluster yet, or if you've made code changes to the service components, you also need to compile the target `openmldb` and start a onebox cluster; refer to the launch section of `steps/test_python.sh` for guidance.
 
-Run the command `make SQL_PYSDK_ENABLE=ON OPENMLDB_BUILD_TARGET=cp_python_sdk_so` under the root directory and make sure the library in `python/openmldb_sdk/openmldb/native/` was the latest native library.
 1. Package installation test: Install the compiled `whl`, then run `pytest tests/`. You can use the script `steps/test_python.sh` directly.
 2. Dynamic test: Make sure there is no OpenMLDB installed via `pip` or from the compiled `whl`. Run `pytest tests/` in `python/openmldb_sdk`, so that you can easily debug.
 
@@ -32,6 +42,11 @@ If the python log messages are required in all tests(even successful tests), ple
 pytest -so log_cli=true --log-cli-level=DEBUG tests/
 ```
 
+You can also run the diagnostic tool in module mode, which is suitable for testing the actual runtime behavior.
+```
+python -m diagnostic_tool.diagnose ...
+```
+
 ## Conda
 
 If you use conda, `pytest` may find the wrong Python and report errors like `ModuleNotFoundError: No module named 'IPython'`. Please use `python -m pytest`.
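Putting the steps above together, a minimal sketch of the package-installation test flow; the wheel filename is a placeholder for whatever was built under `python/openmldb_sdk/dist`:

```
# copy the freshly built native library into the Python SDK source tree
make SQL_PYSDK_ENABLE=ON OPENMLDB_BUILD_TARGET=cp_python_sdk_so

# install the built wheel together with its test extras
# (replace <built-wheel>.whl with the real filename under dist/)
cd python/openmldb_sdk
pip install 'dist/<built-wheel>.whl[test]'

# run the SDK tests against a running OpenMLDB cluster
# (see the launch section of steps/test_python.sh for cluster startup)
pytest tests/
```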

docs/en/developer/sdk_develop.md

Lines changed: 19 additions & 8 deletions
@@ -9,22 +9,19 @@ The OpenMLDB SDK can be divided into several layers, as shown in the figure. The
 The bottom layer is the SDK core layer, which is implemented as [SQLClusterRouter](https://github.com/4paradigm/OpenMLDB/blob/b6f122798f567adf2bb7766e2c3b81b633ebd231/src/sdk/sql_cluster_router.h#L110). It is the core layer of the **client**. All operations on OpenMLDB clusters can be done by using the methods of `SQLClusterRouter` after proper configuration.
 
 Three core methods of this layer that developers may need to use are:
-
 1. [ExecuteSQL](https://github.com/4paradigm/OpenMLDB/blob/b6f122798f567adf2bb7766e2c3b81b633ebd231/src/sdk/sql_cluster_router.h#L160) supports the execution of all SQL commands, including DDL, DML and DQL.
 2. [ExecuteSQLParameterized](https://github.com/4paradigm/OpenMLDB/blob/b6f122798f567adf2bb7766e2c3b81b633ebd231/src/sdk/sql_cluster_router.h#L166) supports parameterized SQL.
 3. [ExecuteSQLRequest](https://github.com/4paradigm/OpenMLDB/blob/b6f122798f567adf2bb7766e2c3b81b633ebd231/src/sdk/sql_cluster_router.h#L156) is the special method for the OpenMLDB-specific execution mode: [Online Request mode](../tutorial/modes.md#4-the-online-request-mode).
 
-
+Other methods, such as CreateDB, DropDB and DropTable, have not been removed yet for historical reasons. Developers don't need to be concerned about them.
 
 ### Wrapper Layer
-Due to the complexity of the implementation of the SDK Layer, we didn't develop the Java and Python SDKs from scratch, but to use Java and Python to call the **SDK Layer**. Specifically, we made a wrapper layer using Swig.
+Due to the complexity of the SDK Layer implementation, we didn't develop the Java and Python SDKs from scratch; instead, Java and Python call the **SDK Layer** through a wrapper layer generated with swig.
 
 Java Wrapper is implemented as [SqlClusterExecutor](https://github.com/4paradigm/OpenMLDB/blob/main/java/openmldb-jdbc/src/main/java/com/_4paradigm/openmldb/sdk/impl/SqlClusterExecutor.java). It is a simple wrapper of `sql_router_sdk`, covering the conversion of input types and the encapsulation of returned results and errors.
 
 Python Wrapper is implemented as [OpenMLDBSdk](https://github.com/4paradigm/OpenMLDB/blob/main/python/openmldb/sdk/sdk.py). Like the Java Wrapper, it is a simple wrapper as well.
 
-
-
 ### User Layer
 Although the Wrapper Layer can be used directly, it is not convenient enough. So we develop another layer, the User Layer of the Java/Python SDK.
 
@@ -36,7 +33,8 @@ The Python User Layer supports the `sqlalchemy`. See [sqlalchemy_openmldb](https
 
 We want an easier-to-use C++ SDK which doesn't need a Wrapper Layer.
 Therefore, in theory, developers only need to design and implement the user layer, which calls the SDK layer.
-However, in consideration of code reuse, the SDK Layer code may be changed to some extent, or the core SDK code structure may be adjusted (for example, exposing part of the SDK Layer header file, etc.).
+
+However, in consideration of code reuse, the SDK Layer code may be changed to some extent, or the core SDK code structure may be adjusted (for example, exposing part of the SDK Layer header files).
 
 ## Details of SDK Layer
 
@@ -48,7 +46,6 @@ The first two methods are using two options, which create a server connecting Cl
 ```
 These two methods, which do not expose the metadata-related DBSDK, are suitable for ordinary users. The underlayers of the Java and Python SDKs also use these two approaches.
 
-
 Another way is to create based on DBSDK:
 ```
 explicit SQLClusterRouter(DBSDK* sdk);
@@ -85,4 +82,18 @@ If you only want to run JAVA testing, try the commands below:
 ```
 mvn test -pl openmldb-jdbc -Dtest="SQLRouterSmokeTest"
 mvn test -pl openmldb-jdbc -Dtest="SQLRouterSmokeTest#AnyMethod"
-```
+```
+
+### batchjob test
+
+batchjob tests can be run using the following method:
+```
+$SPARK_HOME/bin/spark-submit --master local --class com._4paradigm.openmldb.batchjob.ImportOfflineData --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083 --conf spark.openmldb.zk.root.path=/openmldb --conf spark.openmldb.zk.cluster=127.0.0.1:2181 openmldb-batchjob/target/openmldb-batchjob-0.6.5-SNAPSHOT.jar load_data.txt true
+```
+
+Alternatively, you can copy the compiled openmldb-batchjob JAR file to the `lib` directory of the TaskManager in the OpenMLDB cluster, and then send commands from the client or the TaskManager client for testing.
+
+When using Hive as a data source, make sure the metastore service is available. For local testing, you can start the metastore service in the Hive directory; the default address is `thrift://localhost:9083`.
+```
+bin/hive --service metastore
+```
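A hedged sketch of the alternative batchjob test route mentioned above, in which the jar is deployed into the cluster's TaskManager; the paths are illustrative and depend on your deployment layout:

```
# copy the freshly built batchjob jar into the TaskManager's lib directory (paths illustrative)
cp openmldb-batchjob/target/openmldb-batchjob-*.jar <openmldb-deploy-dir>/taskmanager/lib/

# restart the TaskManager so the new jar is picked up (may be needed depending on the setup),
# then submit offline jobs from the OpenMLDB client to exercise the batchjob code
```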

docs/en/developer/udf_develop_guide.md

Lines changed: 0 additions & 216 deletions
This file was deleted.

docs/en/integration/deploy_integration/index.rst

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 =============================
-dispatch
+Dispatch
 =============================
 
 .. toctree::

docs/en/integration/index.rst

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 =============================
-Upstream and downstream ecology
+Upstream and Downstream Ecology
 =============================
 
 .. toctree::

docs/en/integration/online_datasources/index.rst

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 =============================
-online data source
+Online Data Source
 =============================
 
 .. toctree::
