Commit 906bd4c

Merge pull request #259 from wpleonardo/master
Added the articles of DBA advanced tutorials
2 parents e4295d5 + da975f9 commit 906bd4c

278 files changed: +12811 -3 lines changed

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
---
title: Overview
weight: 1
---

In the first half of 2024, we published a [getting-started tutorial](https://oceanbase.github.io/docs/user_manual/quick_starts/en-US/chapter_01_overview_of_the_oceanbase_database/introduction) for beginners of OceanBase Database Community Edition. That tutorial covers topics such as installing and deploying OceanBase clusters, migrating data, and testing the database.

As users become more familiar with OceanBase Database, we will release a tutorial for users who are running proof of concept (POC) tests or have already deployed OceanBase Database for their business. This tutorial, titled *OceanBase Advanced Tutorial for DBAs*, will contain the following parts:

+ Best practices for different scenarios
+ Troubleshooting manuals for the platform tools and database kernel
+ Application development specifications

> Note: At present, *OceanBase Advanced Tutorial for DBAs* applies only to OceanBase Database Community Edition. Therefore, the arbitration replica feature of OceanBase Database Enterprise Edition is not included in the tutorial outline in this topic. For more information about the differences between the two editions, see [Differences between Enterprise Edition and Community Edition](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001714481).

## Tutorial Outline
The following figure shows the outline of this tutorial.

(The outline will be adjusted based on user feedback and on what we learn while writing the tutorial, so the version shown in the figure is not final.)

![image.png](/img/user_manual/operation_and_maintenance/en-US/about_this_manual/001.png)

*OceanBase Advanced Tutorial for DBAs* (V4.x)

+ Best practices
    - O&M management
    - Data migration and synchronization
    - Parameter templates
    - OLAP scenario
    - High-concurrency scenario
    - SaaS multitenancy scenario
    - History/Archive database scenario
    - Read/Write splitting scenario
+ Troubleshooting
    - Troubleshooting manual for OceanBase Deployer (obd)
    - Troubleshooting manual for OceanBase Migration Service (OMS)
    - Troubleshooting manual for OceanBase Database Proxy (ODP)
    - Troubleshooting manual for OBServer
    - User manual for automatic diagnostics
+ Database development specifications
    - Tenant usage specifications
    - Database object design
    - SQL development specifications
    - Usage limitations
    - Others
+ Methods and examples for GitBook collaborative building

Additionally, we'd love to hear your suggestions on the structure and content of this tutorial. Feel free to tell us which topics you need most urgently, based on your experience with OceanBase Database Community Edition.

We will improve this tutorial based on your feedback. We look forward to your comments on our post in the [OceanBase community](https://ask.oceanbase.com/t/topic/35610431/).
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
---
title: About OceanBase Advanced Tutorial for DBAs
weight: 1
---
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
---
title: Database Object Design and Usage Specifications
weight: 1
---
> Note:
>
> **<font color="red">You must follow the specifications highlighted in red in this topic. </font>**
>
> The specifications that are not highlighted in red are optional but recommended. You can determine whether to follow these specifications based on your business requirements.


## Object Naming Specifications

Object naming conventions are widely covered elsewhere, so this section does not discuss them in detail. In short, an object name should not be excessively long and should reflect the object type and its business meaning, for example, `tbl_student_id`. Teams have their own naming styles. There is no single right answer, so this topic does not prescribe a unified convention.

We recommend that **you do not use special characters or keywords in object names unless you have special requirements**. Such names are awkward to read and use.

For example, a user once used the reserved keyword `table` as a table name and the escape character backtick (`) as a column name. The resulting statements look as if they have been encrypted. This not only makes internal users uncomfortable but also gives the technical support team a headache when they troubleshoot issues.
```
obclient [test]> create table `table` (```` int);
Query OK, 0 rows affected (0.050 sec)

obclient [test]> insert into `table` values(123);
Query OK, 1 row affected (0.007 sec)

obclient [test]> select ```` from `table`;
+------+
| `    |
+------+
|  123 |
+------+
1 row in set (0.000 sec)
```
This section does not describe how to rename an object because renaming is a simple operation.

## Tenant Usage Specifications

In OceanBase Database, each tenant is similar to a MySQL instance. For more information, see [Tenants](https://oceanbase.github.io/docs/user_manual/operation_and_maintenance/en-US/scenario_best_practices/chapter_01_multi_tenants/background_knowledge).

**<font color="red">You are not allowed to store data in the sys tenant. To store data, you must create a user tenant.</font>**

**<font color="red">The sys tenant is designed to store the metadata of user tenants and does not provide database services. Misuse of the sys tenant may have serious consequences.</font>**
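
As a rough illustration of the rule above, the following sketch shows how a user tenant might be created from the sys tenant. The names `unit_demo`, `pool_demo`, `tenant_demo`, and `zone1` and the resource sizes are hypothetical. The statement forms are assumed to match OceanBase Database V4.x, so check them against the documentation for your version.

```sql
-- Run in the sys tenant as an administrator.
-- All names and sizes below are hypothetical placeholders.

-- 1. Define a resource unit specification.
CREATE RESOURCE UNIT unit_demo MAX_CPU 2, MEMORY_SIZE '4G';

-- 2. Create a resource pool from that specification on one zone.
CREATE RESOURCE POOL pool_demo UNIT = 'unit_demo', UNIT_NUM = 1, ZONE_LIST = ('zone1');

-- 3. Create the user tenant that will hold business data.
CREATE TENANT IF NOT EXISTS tenant_demo
    PRIMARY_ZONE = 'zone1',
    RESOURCE_POOL_LIST = ('pool_demo')
    SET ob_tcp_invited_nodes = '%';
```

Business databases and tables are then created after you log in to `tenant_demo`, not to the sys tenant.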

## Database Usage Specifications

**<font color="red">You are not allowed to store user data in built-in metadatabases such as `information_schema` and `oceanbase`. Misuse of the metadatabases may cause serious impact.</font>**
```
obclient [test]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| obproxy            |
| oceanbase          |
| test               |
+--------------------+
5 rows in set (0.007 sec)
```

## Table Design Specifications

To build tables with less redundancy and a more reasonable schema, you must follow specific rules when you design a database. In a relational database, these rules are called normal forms. You need to understand the first three normal forms (1NF, 2NF, and 3NF) for database design.

* On top of the three normal forms, take business performance into account when you design a table schema. You can deliberately store redundant data to reduce table joins and improve business performance. A redundant column cannot be:

    * A column subject to frequent modifications

    * An excessively long column of the string type

* Specify a primary key when you create a table.

    * We recommend that you use a business column rather than an auto-increment column as the primary key or as part of a composite primary key.

    * Tables in OceanBase Database are index-organized tables (IOTs). If you do not specify a primary key for a table, the system automatically generates a hidden primary key for the table.

* We recommend that you specify the `COMMENT` attribute for tables and columns.

* To ensure that columns do not contain null values, we recommend that you explicitly specify the `NOT NULL` attribute for the columns.

* We recommend that you specify default values for columns in a table by using the `DEFAULT` clause as needed.

* Try to ensure that the same column in different tables has the same definition. This prevents implicit data type conversions during computation.

* The columns to be joined must be of the same data type. This prevents implicit data type conversions during computation. **Also pay attention to auxiliary attributes of the data type, such as collation, precision, and scale. Differences in these attributes may prevent indexes from being used and lead to non-optimal execution plans.**
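
The suggestions above can be sketched with two small tables. The table and column names (`customer`, `customer_order`, and so on) are hypothetical and only illustrate the pattern in MySQL mode.

```sql
-- Hypothetical tables: explicit primary keys on business columns,
-- COMMENT / NOT NULL / DEFAULT specified, and identical definitions
-- for the column that the two tables are joined on.
CREATE TABLE customer (
    customer_id BIGINT      NOT NULL COMMENT 'Business ID of the customer',
    name        VARCHAR(64) NOT NULL COMMENT 'Customer name',
    status      TINYINT     NOT NULL DEFAULT 1 COMMENT '1: active, 0: disabled',
    gmt_create  DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
    PRIMARY KEY (customer_id)
) COMMENT = 'Customer master data';

CREATE TABLE customer_order (
    order_id    BIGINT        NOT NULL COMMENT 'Business ID of the order',
    customer_id BIGINT        NOT NULL COMMENT 'Same definition as customer.customer_id',
    amount      DECIMAL(12,2) NOT NULL DEFAULT 0.00 COMMENT 'Order amount',
    PRIMARY KEY (order_id)
) COMMENT = 'Orders placed by customers';

-- Both join columns are BIGINT NOT NULL, so the join needs no implicit conversion.
SELECT o.order_id, c.name
FROM customer_order o
JOIN customer c ON c.customer_id = o.customer_id;
```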

## Column Design Specifications

- We recommend that you use the BIGINT type for auto-increment columns. A column of the INT type can easily reach its maximum value.

- We recommend that you specify a proper length for strings and a suitable precision and scale for numbers based on your business requirements. This saves storage space and improves query performance.

- When columns of different types are compared, the system performs implicit data type conversions. Based on the general implicit conversion order defined in SQL, a string is first converted to a number and then to a time. To make the intended conversion explicit and allow indexes to be used to accelerate queries, we recommend that you use the CAST or CONVERT function to explicitly convert data types before the comparison.
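
A minimal sketch of the explicit-conversion suggestion follows. The `contact` table and its index are hypothetical. Whether the first query can still use the index depends on the optimizer, which is exactly the uncertainty that the explicit cast removes.

```sql
-- Hypothetical table: phone numbers are stored as strings and indexed.
CREATE TABLE contact (
    contact_id BIGINT      NOT NULL,
    phone      VARCHAR(20) NOT NULL,
    PRIMARY KEY (contact_id),
    KEY idx_phone (phone)
);

-- Implicit conversion: a string column is compared with a numeric constant,
-- so the column values may have to be converted and idx_phone may not be used.
SELECT contact_id FROM contact WHERE phone = 13800000000;

-- Explicit conversion of the constant keeps the comparison in the column's
-- own type, so the predicate can be matched against idx_phone.
SELECT contact_id FROM contact WHERE phone = CAST(13800000000 AS CHAR);
```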


## Partition Design Specifications

**The advantage of a distributed database is that large tables can be split and stored on multiple nodes so that requests can be distributed to multiple nodes for processing. Access requests to a partition are processed by the node on which the partition resides**. When high-concurrency SQL requests access different partitions, the requests are processed by different nodes, and the total queries per second (QPS) across all nodes can be very high. In this case, you can add more nodes to further increase the QPS. This is the ideal case for a distributed database.

**The goal of partitioning is to evenly distribute large amounts of data and access requests to multiple nodes. This way, you can make full use of resources for parallel computing and eliminate overloads caused by frequent queries on hotspot data. You can also use the partition pruning feature to improve query efficiency.** Theoretically, if data and requests are spread evenly, 10 nodes can handle 10 times the load of a single node. However, if a table is unevenly partitioned, some nodes process more data or requests than others. This data skew leads to uneven resource utilization and load among nodes. Severely skewed data is also known as hotspot data. The most straightforward way to prevent hotspots is to distribute data randomly across nodes. The problem is that all partitions must then be scanned to find the desired data, so this method is infeasible. In practice, partitioning strategies are defined based on specific rules.

**<font color="red">You must plan partitions based on clear business query conditions and actual business scenarios. Do not partition a table arbitrarily. </font>** When you plan partitions, try to make sure that each partition holds roughly the same amount of data.

The three most common partitioning methods are as follows:

- `HASH` partitioning: This method is suitable when the partitioning column has a large number of distinct values (NDV) and it is difficult to clearly define ranges for partitioning. It evenly distributes data that follows no specific pattern across partitions. However, this method does not support partition pruning for range queries.

- `RANGE` partitioning: This method is suitable when ranges can be clearly defined based on the partitioning key. For example, you can use the `RANGE` partitioning method to partition a large table that records bank statements based on the column that represents time.

- `LIST` partitioning: This method is suitable when you want to explicitly distribute data to specific partitions. It can precisely distribute unordered or unrelated data to specific partitions. However, it does not support partition pruning for range queries.
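
The three methods can be sketched as follows. All table names, column names, and partition boundaries are hypothetical. The `RANGE` example uses the `RANGE COLUMNS` variant so that a `DATETIME` column can serve directly as the partitioning key.

```sql
-- HASH: user_id has many distinct values and no natural ranges.
CREATE TABLE user_profile (
    user_id BIGINT      NOT NULL,
    name    VARCHAR(64) NOT NULL,
    PRIMARY KEY (user_id)
) PARTITION BY HASH(user_id) PARTITIONS 8;

-- RANGE: bank statements split by time. The partitioning column
-- must be part of the primary key.
CREATE TABLE bank_statement (
    statement_id BIGINT        NOT NULL,
    trade_time   DATETIME      NOT NULL,
    amount       DECIMAL(16,2) NOT NULL,
    PRIMARY KEY (statement_id, trade_time)
) PARTITION BY RANGE COLUMNS(trade_time) (
    PARTITION p2023 VALUES LESS THAN ('2024-01-01 00:00:00'),
    PARTITION p2024 VALUES LESS THAN ('2025-01-01 00:00:00'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- LIST: rows are placed explicitly according to a discrete value.
CREATE TABLE branch_account (
    account_id BIGINT NOT NULL,
    region_id  INT    NOT NULL,
    PRIMARY KEY (account_id, region_id)
) PARTITION BY LIST(region_id) (
    PARTITION p_north VALUES IN (1, 2, 3),
    PARTITION p_south VALUES IN (4, 5, 6)
);

-- A range predicate on the RANGE key lets the optimizer prune partitions:
-- only p2024 needs to be scanned for this query.
SELECT SUM(amount) FROM bank_statement
WHERE trade_time >= '2024-03-01 00:00:00' AND trade_time < '2024-04-01 00:00:00';
```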

To support parallel computing and partition pruning, OceanBase Database supports subpartitioning. In MySQL mode, OceanBase Database supports the `HASH`, `RANGE`, `LIST`, `KEY`, `RANGE COLUMNS`, and `LIST COLUMNS` partitioning methods, and any two of them can be combined as the partitioning and subpartitioning methods.

For example, you can partition a bill table by the `user_id` column by using the `HASH` partitioning method, and then subpartition each partition by the bill creation time by using the `RANGE` partitioning method.

![image](/img/user_manual/operation_and_maintenance/en-US/development_specification/01_object_specification/001.png)

OceanBase Database supports both `RANGE-HASH` and `HASH-RANGE` composite partitioning. However, you can add and drop `RANGE` partitions only when `RANGE` is the first-level partitioning method. Therefore, for large tables, we recommend that you use `RANGE-HASH` partitioning to simplify maintenance operations such as adding and dropping partitions.
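
The recommended `RANGE-HASH` layout might look like the following sketch. The `bill` table, its columns, and the partition names are hypothetical. The exact `ADD PARTITION` and `DROP PARTITION` forms accepted for subpartitioned tables can vary by version, so treat the two `ALTER TABLE` statements as assumptions to verify.

```sql
-- Hypothetical bill table: RANGE on creation time at the first level,
-- HASH on user_id at the second level.
CREATE TABLE bill (
    bill_id    BIGINT        NOT NULL,
    user_id    BIGINT        NOT NULL,
    gmt_create DATETIME      NOT NULL,
    amount     DECIMAL(16,2) NOT NULL,
    PRIMARY KEY (bill_id, user_id, gmt_create)
)
PARTITION BY RANGE COLUMNS(gmt_create)
SUBPARTITION BY HASH(user_id) SUBPARTITIONS 4
(
    PARTITION p202401 VALUES LESS THAN ('2024-02-01 00:00:00'),
    PARTITION p202402 VALUES LESS THAN ('2024-03-01 00:00:00')
);

-- Routine maintenance on the first-level RANGE partitions:
-- add next month's partition, then retire the oldest one.
ALTER TABLE bill ADD PARTITION (PARTITION p202403 VALUES LESS THAN ('2024-04-01 00:00:00'));
ALTER TABLE bill DROP PARTITION p202401;
```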

### References

- For more information about partitions, see [Create and manage partitions](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001717055).

- If data is not evenly distributed to partitions in a partitioned table, query performance may be compromised due to data skew. We recommend that you use the SQL plan monitor tool of OceanBase Database to check whether query performance is compromised due to data skew. For more information about how to use the tool, see the "Collect sql_plan_monitor" section in [Use obdiag to Collect Information and Diagnose Issues](https://open.oceanbase.com/blog/8810787744).

## Index Design Specifications

### Index creation rules
For more information about the index creation rules, see the "Index tuning" section in [Common SQL tuning methods](https://oceanbase.github.io/docs/user_manual/quick_starts/en-US/chapter_07_diagnosis_and_tuning/sql_tuning).

### Global indexes and their application scenarios

In MySQL tenants of OceanBase Database, indexes are classified into two types: local indexes and global indexes.

The difference between the two is that a local index uses the same partitioning method as the primary table, whereas a global index can use a partitioning method different from that of the primary table. If you do not explicitly specify the index type in a MySQL tenant, a local index is created by default.

A global index can be regarded as an OceanBase Database extension to MySQL. For more information, see the "Global index" section in [Extended features of a MySQL tenant of OceanBase Database](https://oceanbase.github.io/docs/user_manual/quick_starts/en-US/chapter_06_using_ob_for_business_development/extended_functionality).
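
The distinction can be sketched as follows. The `trade` table and index names are hypothetical, and the `LOCAL` and `GLOBAL` keywords with a trailing partition clause are written as they are assumed to work in a MySQL tenant of OceanBase Database V4.x.

```sql
-- Hypothetical partitioned table.
CREATE TABLE trade (
    trade_id  BIGINT NOT NULL,
    user_id   BIGINT NOT NULL,
    seller_id BIGINT NOT NULL,
    PRIMARY KEY (trade_id, user_id)
) PARTITION BY HASH(user_id) PARTITIONS 8;

-- Local index: partitioned the same way as the primary table.
-- LOCAL is the default in a MySQL tenant and can be omitted.
CREATE INDEX idx_trade_user ON trade (user_id, trade_id) LOCAL;

-- Global index: partitioned independently of the primary table,
-- here by the indexed column itself.
CREATE INDEX idx_trade_seller ON trade (seller_id) GLOBAL
    PARTITION BY HASH(seller_id) PARTITIONS 4;
```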


### Index design suggestions
- Read the preceding "Index creation rules" and "Global indexes and their application scenarios" sections before you continue.

- Do not use global indexes unless necessary. Before you use global indexes, make sure that you understand their application scenarios.
    - The cost of table access by index primary key is high for a query that uses a global index, approximately ten times that of a query that uses a local index.
    - Global indexes are expensive to create, drop, and modify, and they compromise DML performance.
    - When you create a global index, we recommend that you specify a partitioning method. Otherwise, the global index is not partitioned by default. Select a column with a large NDV as the partitioning key of the partitioned global index.

- For multi-table join queries, the join columns must be indexed. This can improve join performance. Try to ensure that the data types of the join columns are the same. This prevents implicit data type conversions so that the indexes can be used.

- You can create a covering index to avoid table access by index primary key. Try to ensure that the extra columns included in an index for covering purposes are not of large object (LOB) data types.

- If an index contains multiple columns, we recommend that you place the column with a larger NDV before the others. For example, if the NDV of column `b` is larger than that of column `a` and the filter condition is `WHERE a = ? AND b = ?`, you can create the `idx(b,a)` index.

- For the filter condition `WHERE a = ? AND b = ?`, we recommend that you use a composite index such as `idx_ab(a,b)` instead of creating separate indexes `idx_a(a)` on column `a` and `idx_b(b)` on column `b`. This is because the two single-column indexes cannot be used at the same time.
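
The composite-index and covering-index suggestions can be sketched as follows. The `t_demo` table is hypothetical, and the sketch assumes that column `b` has a larger NDV than column `a`.

```sql
-- Hypothetical table; assume NDV(b) > NDV(a).
CREATE TABLE t_demo (
    id BIGINT NOT NULL,
    a  INT    NOT NULL,
    b  BIGINT NOT NULL,
    c  INT    NOT NULL,
    PRIMARY KEY (id)
);

-- One composite index serves WHERE a = ? AND b = ?,
-- with the higher-NDV column placed first.
CREATE INDEX idx_ba ON t_demo (b, a);

-- Covering variant: appending c lets the query below be answered from
-- the index alone, without accessing the primary table. In practice you
-- would keep only one of idx_ba and idx_bac.
CREATE INDEX idx_bac ON t_demo (b, a, c);

-- Check which index the optimizer chooses for the target query.
EXPLAIN SELECT c FROM t_demo WHERE a = 1 AND b = 100;
```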

### Index usage suggestions

**<font color="red">To modify an index, create the new index first, confirm that it has taken effect, and drop the old index only after you confirm that the old index is no longer needed. </font>**

This suggestion is a bit like reminding students to bring their admission tickets to an exam. Simple as it may seem, it is necessary. Every year at Ant Group, a few database administrators (DBAs) drop old indexes before the new ones take effect. As a result, certain Alipay services become unavailable, which leads to significant losses. In the end, we have to penalize the DBAs to calm public anger.

We hope that you strictly follow this suggestion. Otherwise, the impact may be severe.
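
The order of operations can be sketched as follows. The table `t_idx_demo` and the index names `idx_old` and `idx_new` are hypothetical.

```sql
-- Hypothetical starting point: idx_old(a) is to be replaced by idx_new(a, b).
CREATE TABLE t_idx_demo (
    id BIGINT NOT NULL,
    a  INT    NOT NULL,
    b  BIGINT NOT NULL,
    PRIMARY KEY (id),
    KEY idx_old (a)
);

-- Step 1: create the new index while the old one still exists.
CREATE INDEX idx_new ON t_idx_demo (a, b);

-- Step 2: confirm that the new index exists and that the target
-- queries actually use it.
SHOW INDEX FROM t_idx_demo;
EXPLAIN SELECT id FROM t_idx_demo WHERE a = 1 AND b = 100;

-- Step 3: only after the checks pass and the old index is confirmed
-- to be unused, drop the old index.
ALTER TABLE t_idx_demo DROP INDEX idx_old;
```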


## Auto-increment Column Design Specifications

For more information, see the "Sequences" section in [Extended features of a MySQL tenant of OceanBase Database](https://oceanbase.github.io/docs/user_manual/quick_starts/en-US/chapter_06_using_ob_for_business_development/extended_functionality).

- Read the aforementioned "Sequences" section before you continue.
- To be compatible with MySQL databases, an auto-increment column is created in ORDER mode by default.
- If the values do not need to be incremental but must be unique, we recommend that you set the increment mode to NOORDER. This improves performance.
- In NOORDER mode, auto-increment values requested from different nodes in a distributed deployment can jump. To avoid such jumps, set the increment mode of the auto-increment column to ORDER.
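
As a sketch of how the two modes might be selected, the following statements assume that the table option `AUTO_INCREMENT_MODE` is available in your OceanBase Database V4.x version. The table names are hypothetical.

```sql
-- Assumption: AUTO_INCREMENT_MODE is accepted as a table option in
-- MySQL mode of OceanBase Database V4.x. Table names are hypothetical.

-- Unique but not necessarily increasing values: better performance.
CREATE TABLE event_log (
    id      BIGINT       NOT NULL AUTO_INCREMENT,
    payload VARCHAR(255) NOT NULL,
    PRIMARY KEY (id)
) AUTO_INCREMENT_MODE = 'NOORDER';

-- Values that must not jump across nodes: keep the MySQL-compatible ORDER mode.
CREATE TABLE invoice (
    id     BIGINT        NOT NULL AUTO_INCREMENT,
    amount DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (id)
) AUTO_INCREMENT_MODE = 'ORDER';
```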


## Recycle Bin Design Specifications

For more information, see the "Recycle bin" section in [Extended features of a MySQL tenant of OceanBase Database](https://oceanbase.github.io/docs/user_manual/quick_starts/en-US/chapter_06_using_ob_for_business_development/extended_functionality).

- Read the aforementioned "Recycle bin" section before you continue.
- Although it seems convenient to perform FLASHBACK or PURGE operations on a table in the recycle bin by its original name (the `ORIGINAL_NAME` field), we recommend that you use its unique new name (the `OBJECT_NAME` field) to avoid losses caused by misremembering the operation rules.
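
A minimal sketch of working by `OBJECT_NAME` follows. The `__recycle_$_...` names and `bill_restored` are hypothetical placeholders. Copy the real object names from the `SHOW RECYCLEBIN` output.

```sql
-- List the objects in the recycle bin. The output includes both
-- OBJECT_NAME (unique) and ORIGINAL_NAME (may be ambiguous).
SHOW RECYCLEBIN;

-- Restore one specific dropped table by its unique OBJECT_NAME
-- (hypothetical value) and give it a new name.
FLASHBACK TABLE __recycle_$_100017_1585660840616120 TO BEFORE DROP RENAME TO bill_restored;

-- Permanently remove another recycled table, again by OBJECT_NAME
-- (hypothetical value).
PURGE TABLE __recycle_$_100017_1585660840616121;
```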

## Table Group Design Specifications

Table groups are covered here because they are an OceanBase Database extension that MySQL does not support.

For more information, see the "Table groups" section in [Extended features of a MySQL tenant of OceanBase Database](https://oceanbase.github.io/docs/user_manual/quick_starts/en-US/chapter_06_using_ob_for_business_development/extended_functionality).
