diff --git a/docs/blogs/tech/OpenStack-ob.md b/docs/blogs/tech/OpenStack-ob.md new file mode 100644 index 000000000..74a0c2f73 --- /dev/null +++ b/docs/blogs/tech/OpenStack-ob.md @@ -0,0 +1,119 @@ +--- +slug: OpenStack-ob +title: 'OpenStack: Partnering with OceanBase Database to Build a Highly Available and Scalable Infrastructure Platform' +--- + +> Feng Zhongyan, a senior director of the OceanBase Database community, was invited to present the joint technical solution by OceanBase Database and OpenStack at the OpenInfra Asia Summit on September 3, 2024. This article introduces the technical features and benefits of the solution in detail. + +OpenStack has long been at the forefront of cloud computing, offering a powerful open source platform for building and managing cloud infrastructure. Its modular architecture enables efficient management of network resources in both public and private clouds and ensures computing and storage capabilities. Its flexibility and broad community support make it the preferred solution for enterprises seeking scalable cloud infrastructure. + +As we all know, databases are essential to any software system. For OpenStack, MySQL is typically used to provide database services. As a globally renowned and widely adopted database, MySQL can run reliably and stably in most scenarios. However, as the business grows, additional solutions, such as data sharding, are often required to meet performance demands. Additionally, for OpenStack, a cloud infrastructure, high availability (HA) is crucial. Achieving HA with MySQL often requires complex configurations and external tools, such as Galera Cluster. This can incur extra complexity and O&M costs, thus affecting the overall efficiency and reliability of OpenStack. + +OceanBase Database, developed by Ant Group, is a distributed relational database designed to provide cloud-native, high-performance solutions for modern large-scale applications in highly distributed environments. It offers native HA and scalability without relying on external tools or complex configurations. OceanBase Database has set world records in both TPC-C and TPC-H benchmark tests, demonstrating its exceptional performance in processing complex transactional and analytical workloads. OceanBase Database has also demonstrated its reliability, scalability, and support for mission-critical applications in real-world scenarios, providing database services to over 1,000 clients across various industries. Its cloud-native architecture ensures seamless integration with next-generation cloud platforms such as OpenStack, making it a simplified, elastic, scalable, and high-performance database solution. + +Due to its cloud-native availability and scalability, OceanBase Database is an ideal choice for OpenStack. It helps reduce database O&M costs and improve the overall performance and stability of OpenStack. As cloud-native technologies become the trend in modern IT operations, deploying OpenStack and OceanBase Database on Kubernetes is a new form of building next-generation cloud infrastructure. + +Integrate OceanBase Database into OpenStack for Database Services +-------------------------------------- + +**Architecture of OpenStack** + +Let's first take a brief look at the architecture of OpenStack. OpenStack consists of multiple interconnected components, each responsible for different aspects of cloud infrastructure management, such as computing, storage, and networking. 
Each of these components has its own database to store state data, configurations, and metadata. The integrity and processing performance of these data are vital to the overall functionality and reliability of OpenStack. + +![1725507550](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-09/1725507550338.png) + +**Architecture of OceanBase Database** + +OceanBase Database is specifically designed to tackle the HA and scalability challenges of large-scale distributed systems. A typical OceanBase cluster spans three zones, with each consisting of multiple nodes for storing data replicas. OceanBase Database uses the Paxos consensus protocol to ensure data consistency across replicas. It returns a success message only after data is successfully synchronized to a majority of nodes, thus ensuring strong consistency and fault tolerance. + +Additionally, OceanBase Database has introduced a multitenancy architecture for efficient data isolation and resource management. A tenant in OceanBase Database is similar to a virtual instance in MySQL. Data in a tenant is typically partitioned, with leader partitions distributed across different servers. This enables OceanBase Database to make the most of the processing capabilities of all servers, thus boosting the overall system performance. + +![1725507560](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-09/1725507560254.png) + +To efficiently manage and route requests in the OceanBase cluster, we have introduced OBProxy. OBProxy is a lightweight, stateless proxy server that routes requests to the most suitable OBServer nodes in the cluster. It parses SQL requests and then routes them to the leader nodes of the corresponding table partitions to ensure strong data consistency. Since OBProxy is stateless, you can easily scale it out by deploying multiple instances behind a load balancer for HA. + +The preceding figure shows how OBProxy provides services in an OceanBase cluster through a load balancer. From the perspective of an application, all the components work together as one integrated system that provides database services. + +OceanBase Database is compatible with the MySQL protocol, allowing you to integrate it without modifying the OpenStack code. The following figure shows the architecture where OceanBase Database is used to provide database services: + +![1725507597](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-09/1725507597798.png) + +**Deploy OpenStack with OceanBase Database**: + +To use OceanBase Database, you need to only set the database service address of each OpenStack component to the address of OceanBase Database. For example, the configurations to be updated for the Keystone component are as follows: + +``` + endpoints: + oslo_db: + auth: + admin: + username: root + password: password + keystone: + username: keystone + password: password + hosts: + default: svc-openstack + path: /keystone + scheme: mysql+pymysql + port: + mysql: + default: 2883 +``` + +For most other components, only oslo\_db needs to be updated. For Nova, oslo\_db, oslo\_db\_api, and oslo\_db\_cell0 need to be updated. For more information, see [Deploy OpenStack](https://docs.openstack.org/openstack-helm/latest/install/openstack.html). + +**Deploy OceanBase Database**: + +You can easily deploy OceanBase Database on Kubernetes by using ob-operator. Only the following resources are required: + +· **OceanBase cluster**: Define an OceanBase cluster with three zones. 
Each zone must consist of at least one OBServer node. + +· **OceanBase tenant**: Define an OceanBase tenant with three replicas, which are distributed across the three zones. + +· **OBProxy**: Deploy OBProxy with at least two instances and one service to route requests. + +For more information about the configurations, see [Deploy OpenStack with OceanBase on Kubernetes](https://github.com/oceanbase/ob-operator/tree/master/example/openstack). + +HA +-------------- + +Once OceanBase Database is used to provide database services for OpenStack, it equips OpenStack with HA. With its distributed architecture, OceanBase Database offers native HA, which greatly reduces operation complexity and improves reliability. OceanBase Database implements HA based on the Paxos consensus protocol, a proven approach for ensuring data consistency. Paxos requires a majority of nodes to reach a consensus on any change to the data. This enables the system to run correctly with consistent data even if some nodes fail. For example, if a node in the OceanBase cluster goes offline, Paxos ensures that the remaining nodes can process transactions without data loss. This enables OceanBase Database to tolerate hardware failures, network partitions, and other potential disruptions. OceanBase Database automatically switches the leader in the event of a failure, and OBProxy can detect this change and seamlessly route requests to the new leader. This is a fully automated process without the need for manual intervention, which ensures that applications on OceanBase Database run uninterruptedly. + +![1725507649](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-09/1725507649227.png) + +ob-operator plays an important role in the disaster recovery process of OceanBase Database. It further enhances HA, enabling OceanBase Database to provide continuous services during failures and achieve automatic disaster recovery. ob-operator maintains a fixed IP address for an OBServer node so that a new Pod can be quickly started with the same IP address in case of a Pod failure, which minimizes downtime. If the data is intact, the new Pod can be directly attached to the existing storage, allowing recovery within minutes. Even if a majority of Pods fail, OceanBase Database can recover them as long as the data is accessible. + +![1725507664](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-09/1725507664317.png) + +OceanBase Database offers clear benefits for OpenStack. It automatically manages node failures and ensures recovery with minimal intervention, making the infrastructure of OpenStack secure and reliable and enabling OpenStack to provide continuous services without the need for complex configurations. + +For more information about other HA features such as backup, restore, and standby tenants, see [High availability](https://oceanbase.github.io/ob-operator/docs/manual/ob-operator-user-guide/high-availability/high-availability-intro). + +Scalability and Performance +----------- + +Scalability is another core benefit of OceanBase Database which enhances the flexibility of OpenStack. Unlike conventional databases, OceanBase Database does not become a bottleneck of the system in the event of heavy loads because its distributed architecture can be easily scaled out. By adding more nodes to the cluster, you can seamlessly increase the processing capability and storage capacity of OceanBase Database. + +OceanBase Database achieves dynamic scaling through several strategies. 
It stores data by partition to effectively prevent a single node from becoming a bottleneck. When new nodes are added to the cluster, OceanBase Database automatically balances loads and migrates data to maximize the system performance. + +OceanBase Database supports real-time dynamic adjustment of CPU, memory, and storage resources for tenants, enabling quick response to frequent load changes. This feature is crucial in cloud environments, allowing administrators to quickly adjust resources based on loads at any time. + +Scaling out an OceanBase cluster is easy. Assume that an OceanBase cluster has three zones and each zone has one OBServer node. When loads increase, the administrator can scale out the cluster to two OBServer nodes per zone by modifying the replica information of each zone defined in the cluster resource specification. + +![1725507686](/img/blogs/tech/OpenStack-ob/image/1725507686709.png) + +After adding the OBServer nodes, the administrator needs to only change the unitNum value to 2 to double the processing capability. + +![1725507713](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-09/1725507713331.png) + +The scalability of OceanBase Database empowers OpenStack to effortlessly support large-scale applications and fast-growing businesses, making it an ideal database solution for OpenStack. + +For more information about the scalability of OceanBase Database, including tenant management and dynamic resource adjustment, see [Manage tenants](https://oceanbase.github.io/ob-operator/docs/manual/ob-operator-user-guide/tenant-management-of-ob-operator/tenant-management-intro). + +Conclusion +---- + +OceanBase Database is a highly available and scalable database solution for OpenStack. Its cloud-native architecture, native HA, and seamless scalability align perfectly with modern cloud infrastructures. The next-generation solution of integrating OceanBase Database as the database layer into OpenStack helps tackle the scale, flexibility, and stability challenges in cloud-based scenarios. With this solution, cloud vendors can greatly enhance resilience and performance, reduce the O&M complexity, and improve the service reliability of their cloud environments. + +As demonstrated earlier, deploying OceanBase Database in an OpenStack environment in just a few simple steps brings obvious benefits in HA and scalability. As cloud-native technologies evolve, the integration of OceanBase Database with OpenStack and Kubernetes provides an innovative way to build powerful, scalable, and resilient cloud infrastructure. For enterprises aiming to improve and adapt their IT operations for the future, OceanBase Database is a powerful and easy-to-implement solution that enhances the key features of OpenStack, making it a significant force in the cloud computing field. \ No newline at end of file diff --git a/docs/blogs/tech/Standalone-distribute.md b/docs/blogs/tech/Standalone-distribute.md new file mode 100644 index 000000000..1f5805524 --- /dev/null +++ b/docs/blogs/tech/Standalone-distribute.md @@ -0,0 +1,116 @@ +--- +slug: Standalone-distributed +title: 'Thoughts on the Integrated Architecture of OceanBase Database for Standalone and Distributed Modes' +--- + +> **About the author**: +> Yang Chuanhui, CTO of OceanBase, joined the OceanBase team in 2010 as one of the founding members. He led all major tasks of OceanBase architecture design and technology upgrades, and nurtured OceanBase Database to its full blossom in Ant Group. 
Under his leadership, the OceanBase team took the TPC-C benchmark test and broke the world record twice. He is also the author of Large-scale Distributed Storage Systems: Theory Analysis and Practical Framework. Mr. Yang is determined to lead the OceanBase team in making the next-generation enterprise-level distributed database more open, flexible, efficient, and easier to use. + +My journey with large-scale distributed systems began in 2007, inspired by Google File System (GFS), MapReduce, and Bigtable of Google. In 2010, I joined Taobao to work on OceanBase Database. + +OceanBase Database initially adopted a distributed architecture with very limited SQL functionality, which was extended to support more SQL features and made more versatile over time. When I first delved into large-scale distributed systems, I saw it as a sophisticated field, much like ChatGPT is today. The technology was ahead of its time, and protocols were hard to grasp. It took me over a year to understand Paxos, reading a dozen papers and engaging in extensive technical discussions with colleagues. + +I once viewed distributed architectures as the pinnacle of IT software technology, believing only a distributed design can make a system cutting-edge. However, when we applied early versions of OceanBase Database to Taobao and Alipay, the feedback from users and database administrators (DBAs) centered on SQL compatibility, cost, and performance. They would not choose OceanBase Database over standalone MySQL unless OceanBase Database provides full compatibility and higher cost-effectiveness. They acknowledged the excellence of OceanBase Database in scalability and lossless disaster recovery and agreed that a distributed architecture is the right direction. However, they apologized, saying that due to rapid business growth this year, they could not afford to invest additional manpower and servers in database transformation. + +They wanted a versatile database. I remember discussing with the developers of Google Spanner and asking why Google accepted its poor performance in standalone mode. They explained that Google's engineers are skilled enough to adapt applications to be asynchronous. Additionally, Google had Jeff Dean, who was able to enforce a unified infrastructure from the top down. I admired the developers working on infrastructure at Google, but also realized that this model was not scalable. For developers, a truly user-friendly distributed database must deliver both the high performance and low complexity of a standalone system and the flexible scalability and high availability of a distributed system. + +In 2016, we released OceanBase Database V1.0 with a fully distributed architecture, where all nodes were both readable and writable. However, the high overhead for distributed operations on each node became an issue. With a large number of tables and partitions, even when idle, the system consumed several CPU cores for distributed operations. Therefore, the OceanBase Database V1.x series addressed database issues for large enterprises but struggled to gain widespread adoption in small and medium-sized enterprises. + +In 2018, we started exploring ways to lower the barrier to a distributed database, making it accessible to everyone. Adjusting the underlying architecture of the database required great caution. We spent over two years on technical discussions and overall architecture design, and began detailed design and coding in mid-2020. 
After two more years, we released OceanBase Database V4.0, codenamed "Little Fish," in August 2022. OceanBase Database V4.0 laid the foundation for the integrated architecture for standalone and distributed modes but had many known issues. These issues were resolved in OceanBase Database V4.1, which was unveiled at the developer conference in March 2023. + +We introduced the concept of integrated architecture in 2021 and, following the marketing team's suggestion, renamed it from "the integrated architecture for centralized and distributed modes" to "the integrated architecture for standalone and distributed modes" for clarity. + +## 1. Feasibility Analysis: How to Eliminate the Overhead Brought by a Distributed Architecture** + + + +The first step in architecture design is feasibility analysis. Anyone in technology knows that architecture design is about trade-offs—what makes it possible, the underlying principles, and what to sacrifice. In designing the integrated architecture, we assumed that, for a distributed database, despite its large data volume, 80% of operations are performed on single nodes, and 20% of operations are performed across nodes. + +In the early days of promoting OceanBase Database within Alibaba Group, I proactively interacted with developers of each business line as both BD and SA. Despite the complexity of Alibaba Group's business lines, covering e-commerce, finance, logistics, local services, entertainment, maps, and healthcare, I came to the conclusion that most online B2C businesses could be distributed through user ID (user_id) sharding. After sharding by user_id, most operations are performed within single users, with only a few being performed across users. + +**For example, in online banking, most of the time we are managing our own accounts, with only a small portion spent on cross-account actions such as transfers. Therefore, during the optimization of the system, we first ensured that the distributed architecture brought no overhead to the 80% single-server operations so that OceanBase Database could have comparable performance as standalone databases in terms of such operations. Then, we focused on maximizing the performance of the 20% cross-server operations.** + +The distributed architecture brought overhead to single-server operations mainly due to its following two features: high availability and scalability. In 2013, a DBA at Alipay told me that enabling strong synchronization in Oracle could result in a performance drop of at least 30%. OceanBase Database adopted a strong synchronization design based on Paxos for lossless disaster recovery. Without changes to the architecture, OceanBase Database would not be comparable to standalone databases in performance. + +Our approach was to make redo log commits asynchronous, thus eliminating the need for database worker threads to wait for log commit results. This minimized the overhead of strong synchronization, even with poor network and disk performance. In our sysbench test on a three-server OceanBase cluster, strong synchronization based on Paxos caused only about an 8% performance loss for OceanBase Database. This loss was acceptable and could be offset by optimizing other modules. The performance loss from scalability mainly came from data sharding, with each shard writing its own redo logs. You can think of each data shard as a mini database. The larger the number of shards, the greater the overhead brought by the distributed architecture for shard management on each node. 
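To make the user_id sharding assumption above more concrete, here is a minimal sketch in MySQL-mode syntax. The table and column names are purely illustrative and are not taken from this article; the point is only that a table hash-partitioned by user_id keeps a single user's rows in one partition, while a cross-user transfer spans two partitions and may therefore run as a cross-server transaction.

```sql
-- Purely illustrative schema: names are hypothetical, not from the article.
CREATE TABLE account (
  user_id    BIGINT NOT NULL,
  account_id BIGINT NOT NULL,
  balance    DECIMAL(16, 2) NOT NULL,
  PRIMARY KEY (user_id, account_id)
) PARTITION BY HASH(user_id) PARTITIONS 16;

-- Single-user access stays in one partition (the common case in the 80/20 assumption).
SELECT balance FROM account WHERE user_id = 42;

-- A cross-user transfer touches two partitions, which may live on different
-- nodes and therefore commit as a cross-server (distributed) transaction.
BEGIN;
UPDATE account SET balance = balance - 100 WHERE user_id = 42 AND account_id = 1;
UPDATE account SET balance = balance + 100 WHERE user_id = 77 AND account_id = 1;
COMMIT;
```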
+ +We have innovatively introduced dynamic log streams into the integrated architecture of OceanBase Database V4.0 for standalone and distributed modes. Each node has one log stream per tenant, with all data partitions of a tenant dynamically bound to its log stream. This avoids the overhead resulting from substantial log streams. Additionally, when you add new servers to the system, partitions can be unbound from the source log stream and re-bound to the target log stream for dynamic migration. + +Many may wonder, after all these years of database development, why the OceanBase team came up with this idea while others did not. There is no magic. I believe the key lies in the fact that very few distributed databases in the world need to handle the extreme business scenarios facing OceanBase Database, such as the Double 11 shopping festival. + +In the industry, scalability is implemented in different ways. Classic standalone databases do not support scalability and require changes to applications when a distributed architecture is needed. NewSQL implements scalability at the storage layer and implements functionality at the SQL layer. This approach is simpler but has a downside: every SQL request requires extra remote access, even when you are accessing your own account. OceanBase Database evolves from the fully distributed architecture in V1.x, V2.x, and V3.x to the integrated architecture for standalone and distributed modes in V4.x. + + + +## 2. What Does the Integrated Architecture Mean to Developers?** + + + +The integrated architecture for standalone and distributed modes seems versatile. So, what is its main focus at this stage? On one hand, I believe that the integrated architecture is a technological leap. It is more user- and developer-friendly than before and will gradually become a mainstream choice. On the other hand, it takes time for the new technology to mature and for OceanBase Database to have the same user experience and ecosystem as standalone databases. In the short term, the value of the integrated architecture for developers is reflected in the following aspects: + +**First, it greatly lowers the barrier to distributed databases.** Common standalone NewSQL systems, such as CockroachDB and YugabyteDB, have poor performance, which is only one-tenth to one-fifth of that of MySQL in sysbench performance tests. As databases with an integrated architecture mature, these NewSQL systems will be phased out. I have indeed seen many users switch from their NewSQL systems to OceanBase Database to cut costs and boost efficiency. + +**Second, it suits the scalability needs of growing businesses.** In my conversations with many small and medium-sized enterprises, I have found most of them ambitious. Although a standalone database suffices to handle their current data volumes, they are optimistic about future growth. They prefer to choose a database with an integrated architecture from the start to avoid the hassle of modifying applications and switching databases in the future. + +Will databases with an integrated architecture eventually replace standalone databases? I believe this is the trend, but I also think it will take quite a long time. + + + +## 3. Core Technical Mechanism** + + + +Dynamic log streams are the core technology used in the integrated architecture for standalone and distributed modes. 
To achieve true integration, the following technical challenges must be addressed: + + + +* **Application transparency**: To eliminate the need for application changes during the switch from the standalone mode to the distributed mode, the client must support dynamic routing. During partition migrations in the backend database, data must be dynamically routed to target servers. Additionally, both standalone and distributed modes must support all SQL features. +* **Single-server operations**: In standalone mode, only one set of redo logs exists, and single-server transactions write redo logs in a similar way as a classic standalone database. Classic standalone databases use a B+ tree in their storage engine. OceanBase Database has innovatively integrated the idea of a B+ tree into its storage engine based on a log-structured merge-tree (LSM tree). This retains the high compression ratio of the LSM tree, enables hotspot data to be stored in memory, and minimizes write amplification incurred by the LSM tree. As a result, even with strong synchronization enabled across three servers, OceanBase Database V4.1 outperforms MySQL 8.0 in both the performance and the storage cost of single-server operations. +* **Cross-server operations**: Cross-server operations must be supported by the underlying distributed architecture and cause no impact on the upper-layer SQL functionality. If a transaction involves only one server, it must be processed on that server. If a transaction involves multiple servers, it must be processed across the servers based on the two-phase commit protocol. Additionally, performance must be further optimized through distributed, parallel, and asynchronous technologies. +* **Migration cost**: Migration operations are performed in the background, typically with throttling applied during execution. Assume that data is migrated at the maximum speed of 200 MB/s, which uses about 20% of the bandwidth of a 10 Gbit/s network interface card (NIC). During the migration process, data is copied with minimal CPU usage. Unless in extreme scenarios such as midnight on the Double 11 shopping festival, the background migration does not affect foreground online transactions. If the data volume is 1 TB, the migration duration, calculated based on the formula 1 TB/200 MB/s = 5000s, is about 1.5 hours. + + + +## 4. Actual Results** + + + +We shared the performance data of OceanBase Database at the developer conference in March 2023 and demonstrated its scalability through TPC-C tests over the past years. + + + +* **Performance in standalone mode**: In scenarios where OceanBase Database and MySQL were deployed on servers with 32 CPU cores, OceanBase Database V4.1 outperformed MySQL 8.0 in all sysbench tests, including point-select, read-only, write-only, read-write, insert, and update scenarios. In the read-write scenario, the most comprehensive test, OceanBase Database V4.1 outperformed MySQL 8.0 by 39%. +* **Cost-effectiveness on public clouds**: We deployed MySQL in primary/standby mode on two servers with 4 CPU cores and 16 GB of memory, and deployed OceanBase Database on three servers of the same specifications, with two servers holding full-featured replicas and one server holding log-only replicas. Regardless of the storage size—from 100 GB, 300 GB, and 500 GB to 1 TB—OceanBase Database V4.1 delivered higher cost-effectiveness than MySQL 8.0 on Alibaba Cloud and Amazon Web Services (AWS). As the storage size increased, the advantage of OceanBase Database became more noticeable. 
Compared to MySQL on clouds, OceanBase Database cuts the total cost of ownership by 18.57% to 42.05% to offer the same performance while enabling lossless, three-replica disaster recovery. +* **Scalability in TPC-C tests**: OceanBase Database participated in two TPC-C tests, with the latest using more than 1,500 servers. The TPC-C workload well reflected real-world scenarios because it included 10%‒15% distributed transactions and 85%‒90% local transactions. As shown in the TPC-C report released on the official website, the performance of OceanBase Database increased proportionally with the number of servers. + + + +## 5. Issues Worth Discussion** + + + +The integrated architecture for standalone and distributed modes is not perfect. Some of its issues are worth further discussions with developers and users. + + + +### i. From distributed to standalone versus from standalone to distributed + + + +Which approach is better: from distributed to standalone or from standalone to distributed? I believe only the path from distributed to standalone is viable. Distributed databases are an order of magnitude more technically challenging and are less widely adopted than standalone databases. From a return on investment (ROI) perspective, it is unlikely for a standalone database to trade mainstream scenarios for higher-end but smaller-scale scenarios at greater expense, especially with all the existing technical debt. For all business cases, the most effective strategy is to start with the high-end market and then expand to the low-end market. + + + +Technological innovations often come from external sources. For example, it is Tesla, an electric vehicle company, rather than traditional fuel vehicle manufacturers, that led the transformation to electrification. Tesla first rolled out Model X and Model S for the high-end market, and then unveiled more affordable Model 3 to gradually capture the mainstream market. In this sense, distributed technology is much like the battery of electric vehicles. + + + +### ii. Fully distributed scenarios + + + +We have built the integrated architecture for standalone and distributed modes based on the following assumption: In a distributed database, the majority of requests are performed on single servers, while the minority of requests are performed across servers. If this assumption does not hold, the performance per server of the distributed database decreases when the number of servers increases. How do we address this? We can further divide fully distributed scenarios into two types. One is online analytical processing (OLAP) scenarios, which are hard to localize due to large data volumes and complex dimensions for individual users. + + + +However, this type of scenario does not require high concurrency. Therefore, the key to better performance is fully utilizing server resources through techniques such as parallelization and vectorization. As each SQL statement is very large, an extra network request contributes very little to the overhead throughout the execution of the SQL statement. The other is online transaction processing (OLTP) scenarios. Assume that an OLTP business involves only cross-user transfers. If the data volume is small, the integrated architecture can be deployed on a single node to avoid the overhead incurred by the distributed mode. If the data volume is large, the integrated architecture must be deployed on multiple servers. 
In this case, although a dramatic decrease in the performance per server is inevitable, a database with an integrated architecture is still comparable to a database with a shared-nothing architecture. \ No newline at end of file diff --git a/docs/blogs/tech/column-store.md b/docs/blogs/tech/column-store.md index f95804620..59db824da 100644 --- a/docs/blogs/tech/column-store.md +++ b/docs/blogs/tech/column-store.md @@ -3,7 +3,7 @@ slug: column-store title: 'The Present and Future of Columnar Storage in OceanBase Database' --- -OceanBase Database V4.3 provides the columnar storage feature to support real-time analysis business. As an extension of [**In-depth Interpretation of Columnar Storage**](https://open.oceanbase.com/blog/11685131568), this article further explores the application and evolution of columnar storage in the OceanBase Database architecture and its development trend. +OceanBase Database V4.3 provides the columnar storage feature to support real-time analysis business. As an extension of [**In-depth Interpretation of Columnar Storage**](https://oceanbase.github.io/docs/blogs/tech/analysis-column), this article further explores the application and evolution of columnar storage in the OceanBase Database architecture and its development trend. **1. Background** -------- diff --git a/docs/blogs/tech/hive-to-ob.md b/docs/blogs/tech/hive-to-ob.md new file mode 100644 index 000000000..7651bfe37 --- /dev/null +++ b/docs/blogs/tech/hive-to-ob.md @@ -0,0 +1,122 @@ +--- +slug: hive-to-ob +title: 'From Hive to OceanBase Database: Building an Efficient Real-Time Data Warehouse' +--- + +> **About the author:** coolmoon1202, a senior big data engineer working on high-performance software architecture design. + +Our business is highly related to travel, and we began searching for new data warehouse solutions due to the high latency and low efficiency of our original data warehouse, which was deployed in the early days of our company. In this article, I will share with you the story of our solution selection and lessons learned. It would be great if you could find something useful. + + +**Issues of the Previous Solution** + + + +Our online business environment mainly involves data statistics and analysis. Most data is collected from two sources. In the original architecture, real-time streaming data was collected from the frontend application and stored in Kafka. Then, Spark Streaming tasks would be launched every 10 minutes to synchronize data from Kafka to the Hive data warehouse. A huge amount of real-time streaming data would be collected, and the tables related might contain as many as tens of billions of records. + +Another major data source was a government-managed public data sharing and exchange platform. Data was collected from the platform, aggregated, and stored in an RDS database. Then, Spark tasks would be periodically launched to fully synchronize the data to the Hive data warehouse. Less data was collected from the platform. The largest table related might contain tens of millions of records. Data from different sources was aggregated in Hive, and then Spark would read the data and transfer it to the big data cluster for analysis. + +**This Spark + Hive solution caused three challenges**. + + + +**1. Data latency**: Data was imported into Hive periodically, which led to a data latency of greater than 10 minutes, making real-time updates impossible. + + + +**2. Architecture complexity**: Full data was periodically imported from RDS to Hive, and it was slow. 
It took over 3 minutes to import a table with tens of millions of records. + + + +**3. Poor cost efficiency**: Our original architecture used Spark to read data from Hive for analytical statistics. It took more than 3 minutes to analyze 100 million records. Using Spark for periodic data import and analysis consumed significant CPU and memory resources of the big data cluster, leading to queuing of concurrent tasks. + +To address those challenges, we decided to try a lightweight real-time data warehouse solution. Among others, OceanBase Database had its reputation as a homegrown native distributed database for its hybrid transaction and analytical processing (HTAP) capabilities and features that enable a real-time data warehouse, such as real-time writing, updating, and analysis of massive amounts of data. So, soon after we learned that it had been open source since June 2021, we tested its performance using OceanBase Database Community Edition V3.1.1. Details of the test are described in this article: Stress Test Results of OceanBase Database Community Edition V3.1.1. Note that the test results are for reference only. + +**Here is our conclusion**: In the testing environment, OceanBase Database achieved a maximum of 355,739 tpmC under TPC-C conditions, and completed all SQL queries in 24.05 seconds under TPC-H conditions. The results proved the extraordinary performance of OceanBase Database Community Edition V3.1.1 in online transaction processing (OLTP) and online analytical processing (OLAP), and indicated that it could be scaled out to cope with most high-concurrency scenarios involving massive data. + + + +We also tested TiDB and PolarDB-X. In comparison, OceanBase Database Community Edition did the best job in the TPC-H performance test and under our real business workload. Another convincing factor was that, OceanBase Database is backed by an open source community that provides excellent technical support. + +**Deployment and Benefits of the New Solution** + + + +Based on our evaluation, we replaced the Hive + Spark solution with an OceanBase + Flink solution, and deployed an OceanBase cluster in a 3-3-3 architecture using OceanBase Database Community Edition V3.1.3. + + + +* **Hardware configuration**: 9 Elastic Compute Service (ECS) instances are used, each with 32 CPU cores, 128 GB of memory, a 500 GB SSD for storing redo logs, and a 4 TB SSD for storing data. + + + + + +* **Resource allocation**: The OBServer memory limit is 102 GB, the system memory size is 30 GB, and the OceanBase Database Proxy (ODP) memory size is 4 GB. After deploying the OceanBase cluster, we set resources for the sys tenant to 4 CPU cores and 4 GB of memory. Then, we created a business tenant, and allocated 26 CPU cores and 64 GB of memory to it. We also set the primary\_zone parameter to RANDOM, so that leader partitions of the business tenant tables are randomly distributed across the 9 ECS instances. + + + +We deployed the OceanBase cluster using OceanBase Deployer (obd) instead of using OceanBase Cloud Platform (OCP) as planned, because OCP installation depends on OceanBase Database. The good news is we can use OCP to easily take over the cluster later. The following figure shows the topology of the OceanBase cluster. + + + +![](https://gw.alipayobjects.com/zos/oceanbase/e4b5cfe6-b452-4386-a1a5-094df5a5d49b/image/2022-11-03/30b8a7c8-690a-454a-8e5a-bbfb6dcc3673.png) + + + +**The OceanBase + Flink solution has brought the following three major benefits.** + + + +**Smaller end-to-end latency**. 
The new solution uses the OceanBase SQL mode for data querying and analysis. From data generation by the frontend application to OceanBase Database returning a query result, the time consumed is reduced from at least 10 minutes to less than 3 seconds. + + + +**Significant hardware cost savings**. The new solution uses Flink CDC to synchronize incremental data to OceanBase Database in real time, and the resource usage of incremental streaming tasks changes smoothly rather than sharply. In the session mode of Flink, incremental streaming tasks occupy much fewer resources, slashing the resource usage of the big data cluster from 140 CPU cores and 280 GB of memory to 23 CPU cores and 46 GB of memory, which translates to an 84% reduction in hardware costs. + + + +**Shorter SQL query time**: The new architecture allows us to enable parallel distributed SQL execution by specifying hints. As a result, the execution time of the following query, which involves roughly 60 million records, is reduced from 3 minutes to 15 seconds: +```sql +select /*+ parallel(36) */ count(1) from health_query_log where datetime >='2022-05-01 00:00:00' and datetime<='2022-06-01 00:00:00'; +``` + +**Summary** +---------------------- + +Now, let me share with you some experiences in using OceanBase Database Community Edition. + + + +**1. A table index can be quickly created and deleted.** You can create indexes as needed to greatly improve data retrieval efficiency. + + + +**2. A variety of window functions** are provided for you to handle complex queries and statistical jobs. + + + +**3. JSON data types are supported.** You can extract the required JSON data and create virtual columns. This is a very useful feature as you don't need to rerun historical data when upstream data structures change. + +**4. We strongly recommend you use the TableGroup feature, which improves the query speed, especially for multi-table joins.** + +**5. OceanBase Database Community Edition is compatible with most features and syntax of MySQL 5.7, greatly reducing the learning curve for developers.** The data synchronization from the RDS database was quite smooth. + +We have also noticed some features that are not supported by V3.1.3 or may be supported in later versions, and have submitted them to the community for further update. + +**1. Full-text indexes are not supported.** When you perform a fuzzy match query on Chinese strings, the database runs a full table scan. In a MySQL database, for example, if you want to use incomplete address information to perform a fuzzy match query, you can use a FullText index to enhance query performance. However, to do that in OceanBase Database Community Edition V3.1.3, we could only use a LIKE clause as a workaround. We discussed this issue with the OceanBase technical team, and they planned to support full-text indexes in later versions. + + + +**2. Materialized views are not supported**. Therefore, you cannot query a large table (with hundreds of millions of records) to get real-time incremental statistics. For example, if you query a large table using the COUNT() function with the GROUP BY clause, the database performs full data calculations, instead of using pre-calculated data of materialized views to reduce the calculation load. This results in unsatisfactory performance in some scenarios. If real-time statistics on massive data are required for your business, you have to seek alternative solutions. + + + +**3. Out-of-memory (OOM) errors may occur during execution**. 
Using the COUNT() function with the GROUP BY clause may cause OOM errors of OBServer nodes, leading to node failures. This issue can be avoided by rewriting the subquery, such as +```sql +SELECT COUNT(*) FROM (SELECT DISTINCT ...) +``` + + + +We would like to extend our thanks to the technical staff of the OceanBase community for their professional support in our real-time data warehouse transformation project. They were patient and timely responsive to all our questions throughout the project, from deployment to testing, migration, and O&M. They also offered suggestions on optimizing slow SQL statements, thus helping ensure the smooth progress of the project. We wish the OceanBase community a brilliant future. \ No newline at end of file diff --git a/docs/blogs/tech/parallel-execution-V.md b/docs/blogs/tech/parallel-execution-V.md new file mode 100644 index 000000000..86c8f9276 --- /dev/null +++ b/docs/blogs/tech/parallel-execution-V.md @@ -0,0 +1,131 @@ +--- +slug: parallel-execution-V +title: 'Mastering Parallel Execution in OceanBase Database: Part 5 - Parallel Execution Parameters' +--- + +> OceanBase Database provides a group of parameters for you to control the initialization and tuning of parallel execution. When OceanBase Database starts, the default values of parallel execution parameters can be calculated based on the number of CPU cores of the tenant and the tenant-level parameter `px_workers_per_cpu_quota`. You can also choose not to use the default values but to manually specify parameter values upon startup of OceanBase Database or manually adjust the parameter values later as needed. By default, parallel execution is enabled. +> This article introduces techniques for controlling parallel execution parameters from two aspects: default values and tuning of parallel execution parameters. + +This is the fifth article of a seven-part series on parallel execution. + +Part 1 + +[Introduction](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-I) + +Part 2 + +[Set the DOP](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-II) + +Part 3 + +[Concurrency Control and Queuing](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-III) + +Part 4 + +[Parallel Execution Types](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-IV) + +Part 5 + +[Parallel Execution Parameters](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-V) + +Part 6 + +[Troubleshooting and Tuning Tips](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VI) + +Part 7 + +[Get Started with a PoC Test](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VII) + +5.1 Default Values of Parallel Execution Parameters +------------ + +You can set parallel execution parameters to control the number of parallel execution (PX) threads and queuing in parallel execution. The following table describes the parameters. + +| Parameter name | Default value | Level | Description | +| --------------- | --------------- | --------------- | --------------- | +| px\_workers\_per\_cpu\_quota | 10 | Tenant-level parameter | The number of PX threads that can be allocated on each CPU core. Value range: \[1, 20\]. | +| parallel\_servers\_target | MIN CPU × px\_workers\_per\_cpu\_quota | Tenant-level variable | The number of PX threads that can be requested from each node of the tenant. | +| parallel\_degree\_policy | MANUAL | Tenant-level or session-level variable | The auto degree of parallelism (DOP) strategy. You can set the value to `AUTO` to enable auto DOP. 
After auto DOP is enabled, the optimizer automatically calculates the DOP for queries based on statistics. If you set the value to `MANUAL`, you can specify a DOP by using hints, a table-level PARALLEL attribute, or a session-level PARALLEL attribute. | +| \_parallel\_max\_active\_sessions | 0 | Tenant-level parameter | In the TPC-H benchmark, a power run requires a higher DOP than a throughput run. However, the TPC-H specification disallows dynamic changes to the DOP by using SQL. To support dynamic changes to the DOP, the `_parallel_max_active_sessions` parameter is introduced. When the value of `_parallel_max_active_sessions` is `0`, the number of active sessions that can be executed in parallel is unlimited. When the value of `_parallel_max_active_sessions` is greater than `0`, the value indicates the number of active sessions that can be executed in parallel. The threads of the extra sessions are suspended. After a query is completed, the suspended session threads are woken up to resume. | + +To lower the requirements for using parallel execution, OceanBase Database minimizes the number of parallel execution parameters. You can use the default values to directly enable parallel execution. In special scenarios, you can change the parameter values for optimization. + +### px\_workers\_per\_cpu\_quota + +This parameter specifies the number of PX threads that can be allocated on each CPU core. Assume that the value of `MIN_CPU` of the tenant is N. If the data to be processed in parallel is evenly distributed, the number of threads that can be allocated on each node is calculated by using the following formula: N × Value of `px_workers_per_cpu_quota`. If the data is unevenly distributed, the actual number of threads allocated on some nodes may exceed the value calculated by using the foregoing formula for a short time. After the parallel execution is completed, the excess threads are automatically reclaimed. + +`px_workers_per_cpu_quota` affects the default value of `parallel_servers_target` only during tenant creation. If you change the value of `px_workers_per_cpu_quota` after the tenant is created, the value of `parallel_servers_target` is not affected. + +Generally, you do not need to change the default value of `px_workers_per_cpu_quota`. If all CPU resources are occupied by parallel execution when resource isolation is disabled, you can try to decrease the value of `px_workers_per_cpu_quota` to lower the CPU utilization. + + + +### parallel\_servers\_target + +This parameter specifies the number of PX threads that can be requested from each node of the tenant. When thread resources are used up, subsequent PX requests need to wait in a queue. For the concept of queuing, see [Mastering Parallel Execution in OceanBase Database: Part 3 - Concurrency Control and Queuing](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-III). + +In parallel execution, the CPU utilization can be very low due to factors such as an excessively small value of `parallel_servers_target`, which downgrades the DOP for the SQL statement, resulting in fewer threads allocated than expected. In OceanBase Database of a version earlier than V3.2.3, the default value of `parallel_servers_target` is very small. You can increase the value of `parallel_servers_target` to resolve the issue. We recommend that you set `parallel_servers_target` to the value of `MIN_CPU` × 10. In OceanBase Database V3.2.3 and later, the default value of `parallel_servers_target` is the value of `MIN_CPU` × 10. 
Therefore, this issue does not occur. + + + +`MIN_CPU` specifies the minimum number of CPU cores for the tenant and is specified during tenant creation. + +After you set an appropriate value for `parallel_servers_target`, reconnect to your database and execute the following statement to view the latest value: + +```sql + show variables like 'parallel_servers_target'; +``` + +For ease of O&M, you can set `parallel_servers_target` to the maximum value to avoid frequent adjustment. Theoretically, you can set `parallel_servers_target` to an infinite value. However, this results in low efficiency, because all queries are executed once they are initiated, without the need to wait in a queue, and contend for CPU time slices, disk I/Os, and network I/Os. + +This issue is not severe in terms of throughput. However, resource contention will significantly increase the latency of individual SQL statements. Considering the CPU and I/O utilization, you can set `parallel_servers_target` to the value of `MIN_CPU` × 10. In a few I/O-intensive scenarios, CPU resources may not be fully used. In this case, you can set `parallel_servers_target` to the value of `MIN_CPU` × 20. + + + +### parallel\_degree\_policy + +This parameter specifies the DOP strategy. Valid values are `AUTO` and `MANUAL`. You can set the value to `AUTO` to enable auto DOP. In this case, the optimizer automatically calculates the DOP for queries based on statistics. If you set the value to `MANUAL`, you can specify a DOP by using hints, a table-level PARALLEL attribute, or a session-level PARALLEL attribute. + +In OceanBase Database V4.2 and later, if you are not familiar with the DOP setting rules, you can set `parallel_degree_policy` to `AUTO` to allow the optimizer to automatically select a DOP. For more information about the rules for automatically calculating a DOP, see [Mastering Parallel Execution in OceanBase Database: Part 2 - Set the DOP](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-II). OceanBase Database of a version earlier than V4.2 does not support the `parallel_degree_policy` parameter, and therefore does not support the auto DOP feature. In this case, you must manually specify a DOP. + + + +5.2 Tuning of Parallel Execution Parameters +-------------- + +### ob\_sql\_work\_area\_percentage + +This is a tenant-level variable that specifies the maximum memory space available for the SQL workarea. The value is in percentage that indicates the percentage of the memory space available for the SQL module to the total memory space of the tenant. The default value is `5`, which indicates 5%. When the memory space occupied by the SQL module exceeds the specified value, data in the memory is flushed to the disk. To view the actual memory usage of the SQL workarea, you can search for `WORK_AREA` in the `observer.log` file. Here is an example: + +```bash + [MEMORY] tenant_id=1001 ctx_id=WORK_AREA hold=2,097,152 used=0 limit=157,286,400 +``` + +In a scenario with more reads than writes, if data in the memory is flushed to the disk due to insufficient memory for the SQL workarea, you can increase the value of `ob_sql_work_area_percentage`. + + + +### workarea\_size\_policy + +OceanBase Database implements global adaptive memory management. When `workarea_size_policy` is set to `AUTO`, the execution framework allocates memory to operators, such as Hash Join, Group By, and Sort, based on the optimal strategy, and enables the adaptive data flush strategy. 
If `workarea_size_policy` is set to `MANUAL`, you must manually specify `_hash_area_size` and `_sort_area_size`. + +### \_hash\_area\_size + +This is a tenant-level parameter that allows you to manually specify the maximum memory space available for the hash algorithm of each operator. The default value is 128 MB. When the used memory space exceeds the specified value, data in the memory is flushed to the disk. This parameter applies to operators related to the hash algorithm, such as Hash Join, Hash Group By, and Hash Distinct. **Generally, you do not need to modify the value of this parameter and we recommend that you set `workarea_size_policy` to `AUTO`.** If you do not want the system to automatically flush data from the memory to the disk during the use of the hash algorithm, set `workarea_size_policy` to `MANUAL` and manually specify a `_hash_area_size` value. + +### \_sort\_area\_size + +This is a tenant-level parameter that allows you to manually specify the maximum memory space available for the sort algorithm of each operator. The default value is 128 MB. When the used memory space exceeds the specified value, data in the memory is flushed to the disk. This parameter is mainly used for the sort operator. **Generally, you do not need to modify the value of this parameter, and we recommend that you set `workarea_size_policy` to `AUTO`.** If you do not want the system to automatically flush data from the memory to the disk during the use of the sort algorithm, set `workarea_size_policy` to `MANUAL` and manually specify a `_sort_area_size` value. + + + +### \_px\_shared\_hash\_join + +This is a session-level system variable that determines whether to use a shared hash table during hash joins for optimization. The default value is `true`, which specifies to enable the shared hash join algorithm. When a hash join is executed in parallel, each PX thread independently calculates a hash table. When the left table uses broadcast redistribution, all hash tables calculated by the PX threads are identical. Therefore, each machine needs only one hash table for all PX threads to share to improve CPU cache efficiency. **Generally, you do not need to modify the value of this parameter.** + + + +5.3 Tuning of Parallel DML Parameters +------------------ + +The transaction mechanism is no longer a must in OceanBase Database V4.1 and later. Therefore, when you import data into a table, we recommend that you use the `INSERT INTO SELECT` statement in combination with the direct load feature to insert the data into the table at a time. This can shorten the import time and avoid memory shortage caused by a high write speed. \ No newline at end of file diff --git a/docs/blogs/tech/parallel-execution-VI.md b/docs/blogs/tech/parallel-execution-VI.md new file mode 100644 index 000000000..7ea168d6c --- /dev/null +++ b/docs/blogs/tech/parallel-execution-VI.md @@ -0,0 +1,265 @@ +--- +slug: parallel-execution-VI +title: 'Mastering Parallel Execution in OceanBase Database: Part 6 - Troubleshooting and Tuning Tips' +--- + + +> You can diagnose parallel execution issues from two perspectives. For the whole system, you can check whether the network, disk I/O, and CPU resources are used up. For specific SQL statements, you can locate the problematic SQL statements and check their internal status. + +This is the sixth article of a seven-part series on parallel execution. 
+ +Part 1 + +[Introduction](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-I) + +Part 2 + +[Set the DOP](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-II) + +Part 3 + +[Concurrency Control and Queuing](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-III) + +Part 4 + +[Parallel Execution Types](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-IV) + +Part 5 + +[Parallel Execution Parameters](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-V) + +Part 6 + +[Troubleshooting and Tuning Tips](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VI) + +Part 7 + +[Get Started with a PoC Test](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VII) + +6.1 System Diagnostics +-------- + +When a performance issue occurs in a busy business system, you first need to perform preliminary diagnostics for the whole system by using either of the following two methods: + +* OceanBase Cloud Platform (OCP): You can observe the system performance on the GUI. +* Command-line system tools such as TSAR: You can query historical monitoring data of network, disk, and CPU resources. + +TSAR is a tool for system monitoring and performance analysis. It can provide details about CPU, disk, and network resources. Here are some examples of using the TSAR command. + +```bash + tsar --cpu + + tsar --io + + tsar --traffic +``` + +TSAR also provides other options and parameters. For example, `-d 2` specifies to query the data of the last two days, and `-i 1` specifies to collect data at an interval of 1 minute and display the collected data by minute. + +```bash + tsar -n 2 -i 1 --cpu +``` + +If the disk or network resources are used up, you can first check whether the hardware capacity is too small or the parallel execution load is too heavy. + + + +6.2 SQL Diagnostics +---------- + +When a parallel execution issue occurs, you can perform diagnostics at the SQL layer, parallel execution (PX) thread layer, and operator layer in sequence. + +### 6.2.1 Verify Whether the SQL Query Is Still in Progress + +To verify whether the SQL query is running normally, query the `GV$OB_PROCESSLIST` view. If the value of the `TIME` field keeps increasing and the value of the `STATE` field is `ACTIVE`, the SQL query is still in progress. + +To verify whether the SQL query is repeatedly retried, view the `RETRY_CNT` and `RETRY_INFO` fields. `RETRY_CNT` indicates the number of retries. `RETRY_INFO` indicates the reason for the last retry. `TOTAL_TIME` indicates the total execution time of the SQL query, including the time consumed for each retry. If the SQL query is repeatedly retried, determine whether manual intervention is required based on the error code provided in `RETRY_INFO`. In OceanBase Database of a version earlier than V4.1, the most common error is `-4138 (OB_SNAPSHOT_DISCARDED)`. If this error is returned, increase the value of `undo_retention` by referring to Section 4.2.4 in [Mastering Parallel Execution in OceanBase Database: Part 4 - Parallel Execution Types](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-IV). For other errors such as `-4038 (OB_NOT_MASTER)`, wait for the automatic retry to succeed. If the number of retries consistently exceeds one while the system is stable, contact the OceanBase R&D team for further analysis. 
+ +```sql + -- MySQL mode + SELECT + TENANT,INFO,TRACE_ID,STATE,TIME,TOTAL_TIME,RETRY_CNT,RETRY_INFO + FROM + oceanbase.GV$OB_PROCESSLIST; +``` + +If you find the corresponding SQL statement in the `GV$OB_PROCESSLIST` view and the SQL statement is marked as `SESSION_KILLED` but fails to exit, contact the OceanBase R&D team to report the issue. This often occurs due to the following cause: + +* The `SESSION_KILLED` state is not detected correctly, preventing a timely exit from the execution process. + + + +### 6.2.2 Verify Whether the SQL Query Is Being Executed in Parallel + +You can query the `GV$OB_PX_WORKER_STAT` view for all active PX threads in an OceanBase cluster. + +```sql + -- MySQL mode + OceanBase(admin@oceanbase)>select * from oceanbase.GV$OB_PX_WORKER_STAT; + SESSION_ID: 3221520411 + TENANT_ID: 1002 + SVR_IP: 192.168.0.1 + SVR_PORT: 19510 + TRACE_ID: Y4C360B9E1F4D-0005F9A76E9E66B2-0-0 + QC_ID: 1 + SQC_ID: 0 + WORKER_ID: 0 + DFO_ID: 0 + START_TIME: 2023-04-23 17:29:17.372461 + + -- Oracle mode + OceanBase(root@SYS)>select * from SYS.GV$OB_PX_WORKER_STAT; + SESSION_ID: 3221520410 + TENANT_ID: 1004 + SVR_IP: 192.168.0.1 + SVR_PORT: 19510 + TRACE_ID: Y4C360B9E1F4D-0005F9A76E9E66B1-0-0 + QC_ID: 1 + SQC_ID: 0 + WORKER_ID: 0 + DFO_ID: 0 + START_TIME: 2023-04-23 17:29:15.372461 +``` + +Based on the trace ID queried from the `GV$OB_PROCESSLIST` view, you can query the `GV$OB_PX_WORKER_STAT` view for the data flow operations (DFOs) being executed in the current SQL query, as well as the execution time of the DFOs. + +If no required information is found in the `GV$OB_PX_WORKER_STAT` view, but you can still find the corresponding SQL query in the `GV$OB_PROCESSLIST` view, the possible causes are as follows: + +* All DFOs have been completed, the result set is large, and data is being output to the client. +* All DFOs except for the top-layer DFO have been completed. + + + +### 6.2.3 Verify the Execution Status of Each Operator + +Query the `oceanbase.GV$SQL_PLAN_MONITOR` view in MySQL mode or the `SYS.GV$SQL_PLAN_MONITOR` view in Oracle mode for the execution status of each operator in each PX thread. In OceanBase Database V4.1 and later, the `GV$SQL_PLAN_MONITOR` view records two parts of data: + +* Operators that have been completed, namely operators that have called the `close` operation and no longer process data in the current thread. +* Operators that are being executed, namely operators that have not called the `close` operation and are processing data. To query data of these operators from the `GV$SQL_PLAN_MONITOR` view, you must specify `request_id < 0` in the `WHERE` condition. When you use the `request_id < 0` condition to query this view, you are calling the `Realtime SQL PLAN MONITOR` operation. This operation may change in the future. + +In OceanBase Database of a version earlier than V4.1, you can view the status of only completed operators. + +The important fields in the `GV$SQL_PLAN_MONITOR` view are described as follows: + +* `TRACE_ID`: the unique ID of an SQL statement. +* `PLAN_LINE_ID`: the ID of an operator in the execution plan, which corresponds to the ID queried by using the `EXPLAIN` statement. +* `PLAN_OPERATION`: the name of the operator, such as `TABLE SCAN` or `HASH JOIN`. +* `OUTPUT_ROWS`: the number of rows generated by the current operator. +* `FIRST_CHANGE_TIME`: the time when the operator generated the first row. +* `LAST_CHANGE_TIME`: the time when the operator generated the last row. 
+* `FIRST_REFRESH_TIME`: the time when the monitoring of the operator started. +* `LAST_REFRESH_TIME`: the time when the monitoring of the operator ended. + +The preceding fields can basically describe the major data processing actions taken by an operator. Here are some examples. + +1. The following sample code queries the number of threads used by each operator in a completed SQL statement. + + ```sql + SELECT PLAN_LINE_ID, PLAN_OPERATION, COUNT(*) THREADS + FROM GV$SQL_PLAN_MONITOR + WHERE TRACE_ID = 'YA1E824573385-00053C8A6AB28111-0-0' + GROUP BY PLAN_LINE_ID, PLAN_OPERATION + ORDER BY PLAN_LINE_ID; + + +--------------+------------------------+---------+ + | PLAN_LINE_ID | PLAN_OPERATION | THREADS | + +--------------+------------------------+---------+ + | 0 | PHY_PX_FIFO_COORD | 1 | + | 1 | PHY_PX_REDUCE_TRANSMIT | 2 | + | 2 | PHY_GRANULE_ITERATOR | 2 | + | 3 | PHY_TABLE_SCAN | 2 | + +--------------+------------------------+---------+ + 4 rows in set (0.104 sec) + ``` + +2. The following sample code queries the operators being executed, the number of threads used, and the number of rows that have been generated in an SQL statement being executed. + + ```sql + SELECT PLAN_LINE_ID, CONCAT(LPAD('', PLAN_DEPTH, ' '), PLAN_OPERATION) OPERATOR, COUNT(*) THREADS, SUM(OUTPUT_ROWS) ROWS + FROM GV$SQL_PLAN_MONITOR + WHERE TRACE_ID = 'YA1E824573385-00053C8A6AB28111-0-0' AND REQUEST_ID < 0 + GROUP BY PLAN_LINE_ID, PLAN_OPERATION, PLAN_DEPTH + ORDER BY PLAN_LINE_ID; + ``` + +3. The following sample code queries the number of rows that have been processed by each operator and the number of rows that have been generated by each operator in a completed SQL statement. + + ```sql + SELECT PLAN_LINE_ID, CONCAT(LPAD('', PLAN_DEPTH, ' '), PLAN_OPERATION) OPERATOR, SUM(OUTPUT_ROWS) ROWS + FROM GV$SQL_PLAN_MONITOR + WHERE TRACE_ID = 'Y4C360B9E1F4D-0005F9A76E9E6193-0-0' + GROUP BY PLAN_LINE_ID, PLAN_OPERATION, PLAN_DEPTH + ORDER BY PLAN_LINE_ID; + +--------------+-----------------------------------+------+ + | PLAN_LINE_ID | OPERATOR | ROWS | + +--------------+-----------------------------------+------+ + | 0 | PHY_PX_MERGE_SORT_COORD | 2 | + | 1 | PHY_PX_REDUCE_TRANSMIT | 2 | + | 2 | PHY_SORT | 2 | + | 3 | PHY_HASH_GROUP_BY | 2 | + | 4 | PHY_PX_FIFO_RECEIVE | 2 | + | 5 | PHY_PX_DIST_TRANSMIT | 2 | + | 6 | PHY_HASH_GROUP_BY | 2 | + | 7 | PHY_HASH_JOIN | 2002 | + | 8 | PHY_HASH_JOIN | 2002 | + | 9 | PHY_JOIN_FILTER | 8192 | + | 10 | PHY_PX_FIFO_RECEIVE | 8192 | + | 11 | PHY_PX_REPART_TRANSMIT | 8192 | + | 12 | PHY_GRANULE_ITERATOR | 8192 | + | 13 | PHY_TABLE_SCAN | 8192 | + | 14 | PHY_GRANULE_ITERATOR | 8192 | + | 15 | PHY_TABLE_SCAN | 8192 | + | 16 | PHY_GRANULE_ITERATOR | 8192 | + | 17 | PHY_TABLE_SCAN | 8192 | + +--------------+-----------------------------------+------+ + 18 rows in set (0.107 sec) + ``` + +The `PLAN_DEPTH` field is used for indentation for better display effects. `PLAN_DEPTH` specifies the depth of an operator in the operator tree. + + + +**Note:** + +1. Information about operators that have not been scheduled is not recorded in the `GV$SQL_PLAN_MONITOR` view. +2. If a procedural language (PL) object contains multiple SQL statements, the statements share the same trace ID. + + + +6.3 Parallel Execution Tuning Tips +------------ + +This section describes the basic tips for parallel execution tuning in OceanBase Database. As tuning never truly ends, we will keep updating the content in this section to include new ideas and improvements. 
+ +### 6.3.1 Manually Collect Statistics + +If the statistics saved in the optimizer are outdated, a nonoptimal execution plan may be generated. OceanBase Database provides API operations for manually collecting statistics in V3.2 and V4.1. For more information, see [Manually collect statistics](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001134124). + +The syntax for collecting statistics on a primary table or an index table in OceanBase Database V4.1 is as follows: + +```sql + -- Collect the global statistics on the T1 table of the TEST user, and enable the AUTO strategy for determining the number of buckets for all columns. + call dbms_stats.gather_table_stats('TEST', 'T1', granularity=>'GLOBAL', method_opt=>'FOR ALL COLUMNS SIZE AUTO'); + -- Collect the statistics on the IDX index in the T1 table of the TEST user, set the degree of parallelism (DOP) to 4, and specify the table name. The table name must be specified because the index name is not unique. + call dbms_stats.gather_index_stats('TEST', 'IDX', degree=>4, tabname=>'T1'); +``` + +### 6.3.2 Modify the Partitioning Method for a Partition-wise Join + +For a large-table join in a proof of concept (PoC) scenario, if allowed by the business system, you can use the same partitioning method for the large tables and bind the tables to the same table group to achieve optimal performance for partition-wise joins. When you perform a partition-wise join, you must adjust the DOP to a value that matches the partition quantity to achieve optimal performance. + +### 6.3.3 Adapt the DOP and Partition Quantity + +Generally, preferable performance can be achieved if the DOP and the partition quantity are in an integral multiple relationship. For more information, see Section 1.6 in [Mastering Parallel Execution in OceanBase Database: Part 1 - Introduction to Parallel Execution](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-I). + +### 6.3.4 Create Indexes + +You can create appropriate indexes to reduce the amount of data to be scanned, thereby improving the parallel execution performance. You need to determine the tables and columns on which indexes are to be created based on specific SQL statements. + +### 6.3.5 Create Replicated Tables + +In OceanBase Database V4.2 and later, you can create replicated tables to reduce data redistribution, thereby improving the parallel execution performance. For more information, see the **Create a replicated table** section in [Create a table](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001031355). The basic syntax for creating a replicated table is as follows: + +```sql + create table dup_t1(c1 int) duplicate_scope = 'cluster'; +``` diff --git a/docs/blogs/tech/parallel-execution-VII.md b/docs/blogs/tech/parallel-execution-VII.md new file mode 100644 index 000000000..cf0f34e65 --- /dev/null +++ b/docs/blogs/tech/parallel-execution-VII.md @@ -0,0 +1,99 @@ +--- +slug: parallel-execution-VII +title: 'Mastering Parallel Execution in OceanBase Database: Part 7 - Get Started with a PoC Test' +--- + +> Parallel execution is a complex subject. You need to have a proper understanding of parallel execution to make full use of its capabilities. This article aims to help you get started with parallel execution and applies to OceanBase Database **V3.1 and later**. Parameters in this article are **not optimal** but can help **avoid most bad cases**. + +This is the last article of a seven-part series on parallel execution. 
+ +Part 1 + +[Introduction](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-I) + +Part 2 + +[Set the DOP](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-II) + +Part 3 + +[Concurrency Control and Queuing](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-III) + +Part 4 + +[Parallel Execution Types](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-IV) + +Part 5 + +[Parallel Execution Parameters](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-V) + +Part 6 + +[Troubleshooting and Tuning Tips](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VI) + +Part 7 + +[Get Started with a PoC Test](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VII) + +Initialize the Environment +----- + +Execute the following command in an analytical processing (AP) tenant: +```sql + /* MySQL */ + set global parallel_servers_target = MIN_CPU * 20; + + /* Oracle */ + alter system set parallel_servers_target = MIN_CPU * 20; +``` + +Collect Statistics +------ + +In OceanBase Database V3.x, statistics collection is bound with major compactions. Therefore, after you import data, you must initiate a major compaction before you collect statistics. + +In OceanBase Database V4.x, after you import data, you can directly call the [DBMS\_STAT package](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001134124) to collect statistics. + + + +Set a Hint +------- + +Make sure that the **maximum** degree of parallelism (DOP) of an SQL statement does not exceed 1.5 times the number of physical CPU cores. + +Generally, if you do not need to execute multiple SQL statements in parallel, you can set the DOP of a single SQL statement to **the number of CPU cores**. + +For example, if the system has 32 physical CPU cores, you can set the hint as `/*+ PARALLEL(32) */`. + +Tune the Performance +---- + +1. Run the `top -H` command to view the CPU utilization of the current tenant. +2. If the performance of a single SQL statement is not as expected, contact OceanBase Technical Support to query the `sql_plan_monitor` view for the performance report and contact R&D engineers for further analysis. + +FAQ +---- + +1. What do I do if the query performance is not as expected while the CPU resources are not fully used? + +> Execute the `show variables like 'parallel_servers_target` statement and check whether the value of `parallel_servers_target` is not less than MIN_CPU × 20. + +2. What do I do if the PDML performance is not as expected? + +Execute the `EXPLAIN EXTENDED` statement to verify whether parallel DML (PDML) is used. If PDML is not used, the `Note` field at the bottom of the plan describes the reason. Generally, if the target table contains triggers, foreign keys, or local unique indexes, PDML will not be used. + +> Keywords such as `DISTRIBUTED INSERT`, `DISTRIBUTED UPDATE`, and `DISTRIBUTED DELETE` indicate that PDML is not used. + + +3. What do I do when the error `-4138 OB_SNAPSHOT_DISCARDED` is returned upon a PDML timeout? + +Set the `undo_retention` parameter to a value that is not less than the maximum execution time of a PDML statement. The default value of `undo_retention` is 30 minutes. If the execution time of a PDML statement exceeds 30 minutes, this error may be returned and the statement will be aborted and retried until it times out. + + + +4. How do I enable parallel execution for business SQL statements without making any modifications to the business? 
+ +OceanBase Database Proxy (ODP) provides a web UI for you to modify connection configurations to enable parallel execution. For example, you can set the DOP of all SQL statements in a read/write splitting connection to 2. +![1705633920](/img/blogs/tech/parallel-execution-VII/image/1705633920006.png) + +The web UI was iterated in April 2023 and released in early May 2023. Make sure that the version of ODP is V3211bp1 or later. \ No newline at end of file diff --git a/docs/blogs/tech/partition-create.md b/docs/blogs/tech/partition-create.md new file mode 100644 index 000000000..79f3b5d68 --- /dev/null +++ b/docs/blogs/tech/partition-create.md @@ -0,0 +1,236 @@ +--- +slug: partition-create +title: 'A Brief Analysis of Frequently Asked Questions about Partition Creation' +--- + +# A Brief Analysis of Frequently Asked Questions about Partition Creation + +In [OceanBase Discord Community](https://discord.gg/74cF8vbNEs), a new module of the OceanBase community forum, a post has introduced how to set partitioning strategies and manage partitioning plans in OceanBase Developer Center (ODC). In that post, I noticed several users asking questions about partition management. + +I have encountered one of the mentioned restrictions on partition creation many times but never stopped to consider why it exists. Since a user has raised this question, I will take the opportunity to briefly analyze it and share my insights. + +## Why Must the Primary Key of a Partitioned Table Include All Partitioning Keys of the Table? + +The first user question is as follows: I have a large order table and want to partition the data by year. Currently, the primary key contains only the ID column. I tried to partition the data by date but failed. Must I combine the date and ID columns into a composite primary key? + +The answer is yes. The primary key of a partitioned table must include all partitioning keys of the table. The primary key uniqueness check is performed within each partition. If the primary key does not include all partitioning keys, the check may fail. This is why MySQL and other databases also have this restriction. + +```sql + -- If the primary key does not include all partitioning keys, the table creation operation fails with a clear error message. + create table t1(c1 int, + c2 int, + c3 int, + primary key (c1)) + partition by range (c2) + (partition p1 values less than(3), + partition p1 values less than(6)); + + ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function +``` + +Here is an example: + +```sql + create table t1(c1 int, + c2 int, + c3 int, + primary key (c1, c2)) + partition by range (c2) + (partition p0 values less than(3), + partition p1 values less than(6)); + Query OK, 0 rows affected (0.146 sec) + + obclient [test]> insert into t1 values(1, 2, 3); + Query OK, 1 row affected (0.032 sec) + + obclient [test]> insert into t1 values(1, 5, 3); + Query OK, 1 row affected (0.032 sec) + + obclient [test]> select * from t1; + +----+----+------+ + | c1 | c2 | c3 | + +----+----+------+ + | 1 | 2 | 3 | + | 1 | 5 | 3 | + +----+----+------+ + 2 rows in set (0.032 sec) +``` + +We created a table, with the c1 and c2 columns as the primary key and the c2 column as the partitioning key. Values smaller than 3 are in the p0 partition, while values greater than or equal to 3 but smaller than 6 are in the p1 partition. We then inserted two rows, with the first row in the p0 partition and the second row in the p1 partition. 
+ +```sql + obclient [test]> select * from t1 PARTITION(p0); + +----+----+------+ + | c1 | c2 | c3 | + +----+----+------+ + | 1 | 2 | 3 | + +----+----+------+ + 1 row in set (0.033 sec) + + obclient [test]> select * from t1 PARTITION(p1); + +----+----+------+ + | c1 | c2 | c3 | + +----+----+------+ + | 1 | 5 | 3 | + +----+----+------+ + 1 row in set (0.034 sec) +``` + +If the primary key includes only c1, the uniqueness check for c1 passes in both p0 and p1 because the c1 values are unique within each partition. As a result, the inserted data is considered to meet the primary key constraint. In reality, duplicate values exist across partitions, and the data violates the primary key constraint. That is why all databases require the primary key to include all partitioning keys during table partitioning. + + + +## Why Does Partitioning Speed Up Queries? + + +The second user question is as follows: Does partitioning by date speed up queries? + +My personal understanding is that partitioning not only balances data from large tables across different database nodes but also speeds up queries. During the execution of a query, the partitioning key in the filter condition is used for partition pruning. Here are two examples. + +If a partitioning key is included in the filter condition, you can find partitions(p0) in the plan, indicating only data in p0 is scanned. + +```sql + obclient [test]> explain select * from t1 where c2 = 1; + +------------------------------------------------------------------------------------+ + | Query Plan | + +------------------------------------------------------------------------------------+ + | =============================================== | + | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | + | ----------------------------------------------- | + | |0 |TABLE FULL SCAN|t1 |1 |3 | | + | =============================================== | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([t1.c1], [t1.c2], [t1.c3]), filter([t1.c2 = 1]), rowset=16 | + | access([t1.c1], [t1.c2], [t1.c3]), partitions(p0) | + | is_index_back=false, is_global_index=false, filter_before_indexback[false], | + | range_key([t1.c1], [t1.c2]), range(MIN,MIN ; MAX,MAX)always true | + +------------------------------------------------------------------------------------+ + 11 rows in set (0.034 sec) +``` + +If no partitioning key is included in the filter condition, you can find partitions(p\[0-1\]) in the plan, indicating data in p0 and p1 is scanned. The PX PARTITION ITERATOR operator is used to iterate through and scan all partitions. 
+ +```sql + obclient [test]> explain select * from t1 where c3 = 1; + +------------------------------------------------------------------------------------+ + | Query Plan | + +------------------------------------------------------------------------------------+ + | ============================================================= | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | ------------------------------------------------------------- | + | |0 |PX COORDINATOR | |1 |6 | | + | |1 |└─EXCHANGE OUT DISTR |:EX10000|1 |6 | | + | |2 | └─PX PARTITION ITERATOR| |1 |5 | | + | |3 | └─TABLE FULL SCAN |t1 |1 |5 | | + | ============================================================= | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([INTERNAL_FUNCTION(t1.c1, t1.c2, t1.c3)]), filter(nil), rowset=16 | + | 1 - output([INTERNAL_FUNCTION(t1.c1, t1.c2, t1.c3)]), filter(nil), rowset=16 | + | dop=1 | + | 2 - output([t1.c1], [t1.c2], [t1.c3]), filter(nil), rowset=16 | + | force partition granule | + | 3 - output([t1.c1], [t1.c2], [t1.c3]), filter([t1.c3 = 1]), rowset=16 | + | access([t1.c1], [t1.c2], [t1.c3]), partitions(p[0-1]) | + | is_index_back=false, is_global_index=false, filter_before_indexback[false], | + | range_key([t1.c1], [t1.c2]), range(MIN,MIN ; MAX,MAX)always true | + +------------------------------------------------------------------------------------+ + 19 rows in set (0.038 sec) +``` + +## RANGE Partitioning Does Not Support the DATETIME Type. What Should I Do? + +The third user question is as follows: RANGE partitioning does not support the DATETIME type. What should I do? + +```sql + CREATE TABLE ff01 (a datetime , b timestamp) + PARTITION BY RANGE(UNIX_TIMESTAMP(a))( + PARTITION p0 VALUES less than (UNIX_TIMESTAMP('2000-2-3 00:00:00')), + PARTITION p1 VALUES less than (UNIX_TIMESTAMP('2001-2-3 00:00:00')), + PARTITION pn VALUES less than MAXVALUE); + + ERROR 1486 (HY000): Constant or random or timezone-dependent expressions in (sub)partitioning function are not allowed +``` + +I tested the MySQL mode of OceanBase Database and found that it has imposed some restrictions on random expressions for compatibility with MySQL. I first considered using generated columns as a workaround, only to find that OceanBase Database disallows the use of the UNIX\_TIMESTAMP expression in them, which is also for compatibility with MySQL. + +```sql + CREATE TABLE ff01 (a datetime , b timestamp as (UNIX_TIMESTAMP(a))) + PARTITION BY RANGE(b)( + PARTITION p0 VALUES less than (UNIX_TIMESTAMP('2000-2-3 00:00:00')), + PARTITION p1 VALUES less than (UNIX_TIMESTAMP('2001-2-3 00:00:00')), + PARTITION pn VALUES less than MAXVALUE + ); + + ERROR 3102 (HY000): Expression of generated column contains a disallowed function +``` + +UNIX\_TIMESTAMP is disallowed in generated columns probably because it is [a nondeterministic system function](https://dev.mysql.com/doc/refman/8.4/en/function-optimization.html). As a nondeterministic function, UNIX\_TIMESTAMP() may return different results when executed at different times, even one second apart. Therefore, nondeterministic functions are not allowed in expressions for partitions, expressions for generated columns, or expressions in check constraints. 
+ +Here is a simple example to clarify the meaning of "random" shown in ERROR 1486 and the meaning of "nondeterministic:" + +```sql + obclient [test]> select UNIX_TIMESTAMP(); + +------------------+ + | UNIX_TIMESTAMP() | + +------------------+ + | 1725008180 | + +------------------+ + 1 row in set (0.042 sec) + + obclient [test]> select UNIX_TIMESTAMP(); + +------------------+ + | UNIX_TIMESTAMP() | + +------------------+ + | 1725008419 | + +------------------+ + 1 row in set (0.041 sec) + + -- Now you see why UNIX_TIMESTAMP is so special that it is disallowed almost everywhere. +``` + +It is undeniable that OceanBase Database offers impressive compatibility with MySQL, accommodating not only usage restrictions but also bugs. While this may cause some inconvenience, migration from MySQL should be much smoother. + +After checking the [documentation of OceanBase Database](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001974641) on the official website, I found RANGE COLUMNS partitioning is similar to RANGE partitioning. Compared to RANGE partitioning, RANGE COLUMNS partitioning supports more data types, including DATETIME, but does not support the use of expressions in partition definitions. + +Therefore, users can replace RANGE partitioning with RANGE COLUMNS partitioning to use nondeterministic functions such as UNIX\_TIMESTAMP. Here is an example: + +```sql + CREATE TABLE ff01 (a datetime , b timestamp) + PARTITION BY RANGE COLUMNS(a)( + PARTITION p0 VALUES less than ('2023-01-01'), + PARTITION p1 VALUES less than ('2023-01-02'), + PARTITION pn VALUES less than MAXVALUE); + + Query OK, 0 rows affected (0.101 sec) +``` + +Actually, I never realized the difference between RANGE partitioning and RANGE COLUMNS partitioning until today. + +If you are interested, see [MySQL official documentation](https://dev.mysql.com/doc/refman/8.4/en/partitioning-columns-range.html) for a better understanding. + +![1725007840](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-08/1725007840101.png) + + + +## What Else? + + +In terms of the third user question, a colleague has suggested replacing the UNIX\_TIMESTAMP function with the to\_days function. This eliminates the need to switch from RANGE partitioning to RANGE COLUMNS partitioning. Here is an example: + +```sql + ## Create a RANGE-based partitioned table. + -- The partitioning key column is start_time, whose data type is DATETIME. + CREATE TABLE dba_test_range_1 ( + id bigint UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary key', + `name` varchar(50) NOT NULL COMMENT 'name', + start_time datetime NOT NULL COMMENT 'start time', + PRIMARY KEY (id, start_time) + ) AUTO_INCREMENT = 1 CHARSET = utf8mb4 COMMENT 'test range' + PARTITION BY RANGE (to_days(start_time)) ( + PARTITION M202301 VALUES LESS THAN (to_days('2023-02-01')), + PARTITION M202302 VALUES LESS THAN (to_days('2023-03-01')), + PARTITION M202303 VALUES LESS THAN (to_days('2023-04-01')) + ); +``` diff --git a/docs/blogs/tech/practices-binlog.md b/docs/blogs/tech/practices-binlog.md new file mode 100644 index 000000000..05ae37aa3 --- /dev/null +++ b/docs/blogs/tech/practices-binlog.md @@ -0,0 +1,267 @@ +--- +slug: practices-binlog +title: 'Practices of OceanBase Binlog Service' +--- + +The OceanBase team has recently released an open source edition of the binlog service, which converts clogs in OceanBase into binlogs for downstream tools such as Canal and Flink CDC to consume. 
Today, we will try the binlog service and check its correctness by using the MySQL binlog tool named mysqlbinlog. + +The binlog service is a service mode provided by oblogproxy. When binlog\_mode is set to true, the service compatible with native MySQL binlogs is enabled, providing functionalities such as the generation of binlog files that contain SQL statements and binlog dump. To use the binlog service, download oblogproxy in the [software center](https://www.oceanbase.com/softwarecenter) of the OceanBase official website. + +Currently, the binlog service documentation is unavailable on the OceanBase official website. You can view it on GitHub: [Binlog Service Documentation](https://github.com/oceanbase/oblogproxy/blob/dev/docs/binlog_service.md). + +### Introduction + +* **ODP**: OceanBase Database Proxy (ODP) provides SQL statements and binlogs with unified access to OceanBase. The service for binlogs consists of binlog commands such as show binlog events and the binlog replication protocol. +* **oblogproxy**: oblogproxy ensures compatibility with MySQL binlogs, including compatibility with binlog commands and the binlog replication protocol. +* **bc**: Binlog converter (bc) is a sub-process of oblogproxy. It pulls and parses clogs through libobcdc to convert clogs into binlogs. +* **bd**: Binlog dumper (bd) is a sub-process of oblogproxy. It provides binlog event subscription service for binlog dump requests from downstream tools such as Canal and Flink CDC. +* **bcm**: Binlog converter manager (bcm) is the bc management module of oblogproxy. +* **bdm**: Binlog dumper manager (bdm) is the bd management module of oblogproxy. + +As shown in the following architecture, oblogproxy connects to ODP in the cluster to obtain cluster logs and converts them into binlogs. Downstream tools such as Canal and Flink CDC also connect to ODP to consume binlogs. + +![1702608197](/img/blogs/tech/practices-binlog/image/1702608197778.png) + +### Limitations + +* Currently, the binlog service requires OBServer and ODP V4.2.1 or later. +* The extended semantics for the ENUM and SET types in OceanBase are not supported. For example, the extended semantics support more than 64 SET definitions, duplication, and insertion of undefined data (such as ") into ENUM. +* Varchar(65536) definitions are not supported. +* Geographic information system (GIS) data types are not supported. +* Differences from some MySQL DDL operations may cause incompatibility between parsed binlogs and MySQL. However, OBServer has resolved this issue. We recommend that you set init\_sql in ODP to enable \_show\_ddl\_in\_compat\_mode at the tenant level. After you do this, the SHOW CREATE TABLE results output by OBServer will be fully compatible with MySQL syntax. + +### Environment + +The environment consists of four servers, with three servers hosting a 3-node OceanBase cluster and one server hosting the oblogproxy service. + +| IP address | Role | +| ---------- | ----------- | +| 172.24.255.54 | oblogproxy | +| 172.24.255.56 | OBServer and ODP | +| 172.24.255.57 | OBServer | +| 172.24.255.58 | OBServer | + +### Installation and Configuration + +#### Configure ODP + +As shown in the preceding architecture, oblogproxy, Canal, and Flink CDC all interact with ODP. Canal and Flink CDC are unaware of oblogproxy, the actual binlog service provider. ODP forwards downstream requests to OBServer or oblogproxy. + +As a result, you must configure the oblogproxy service address in ODP. 
+```sql + # Connect to ODP to access the cluster or interact with ODP. + [root@OB1 ~]# obclient -h172.24.255.56 -P2883 -uroot@sys#myoceanbase -pxxx -Doceanbase -A + + # Query the IP address of the binlog server. Currently, it is empty. + obclient [oceanbase]> show proxyconfig like 'binlog_service_ip'; + +-------------------+-------+-----------------------------------------+-------------+---------------+ + | name | value | info | need_reboot | visible_level | + +-------------------+-------+-----------------------------------------+-------------+---------------+ + | binlog_service_ip | | binlog service ip, format ip1:sql_port1 | false | SYS | + +-------------------+-------+-----------------------------------------+-------------+---------------+ + 1 row in set (0.001 sec) + + # Configure the binlog server address in the format of ip:port. + obclient [oceanbase]> alter proxyconfig set binlog_service_ip="172.24.255.54:2983"; + Query OK, 0 rows affected (0.004 sec) + + # Enable forwarding for the binlog service. + obclient [oceanbase]> alter proxyconfig set enable_binlog_service='True'; + + # Configure init_sql to set session-level system variables for all sessions passing through the ODP. + obclient [oceanbase]> alter proxyconfig set init_sql='set _show_ddl_in_compat_mode = 1;'; +``` + +#### Install and start the binlog service + +Download the installation package, upload it to the server, and start the installation. +```bash + [root@OB2 ~]# rpm -ivh oblogproxy-2.0.0-100000012023111521.el7.x86_64.rpm +``` + +By default, oblogproxy is installed in the /usr/local/oblogproxy directory. + + + +Modify the conf/conf.json file in the installation directory. +```bash + [root@OB2 ~]# cd /usr/local/oblogproxy + [root@OB2 oblogproxy]# vim conf/conf.json + # Modify the following parameters to enable binlogs and specify absolute paths. + "binlog_mode": true + "oblogreader_path": "/usr/local/oblogproxy/run" + "bin_path": "/usr/local/oblogproxy/bin" + "oblogreader_obcdc_ce_path_template": "/usr/local/oblogproxy/obcdc/obcdc-ce-%d.x-access/libobcdc.so" + "binlog_log_bin_basename": "/usr/local/oblogproxy/run" + "binlog_obcdc_ce_path_template": "/usr/local/oblogproxy/obcdc/obcdc-ce-%d.x-access/libobcdc.so" +``` + +For more information about the configurations, see [Binlog Service Documentation](https://github.com/oceanbase/oblogproxy/blob/dev/docs/binlog_service.md). + + + +Configure an account. As the username and a password cannot be written in plaintext in the configuration file, call the config\_sys function to encrypt them and replace the values of ob\_sys\_username and ob\_sys\_password in the conf.json file with the encrypted username and password. +```bash + [root@OB2 oblogproxy]# ./run.sh config_sys ${sys_usr} ${sys_pwd} + # Enter y when a prompt appears to confirm whether to update ob_sys_username and ob_sys_password in the conf.json file. + DEPLOY_PATH : /usr/local/oblogproxy + + !!DANGER!! About to update logproxy conf/conf.json, Please confirm? [Y/n] y +``` + +Note that the username and password you specify must belong to the sys tenant, such as root@sys#cluster\_name, and must be enclosed in double quotation marks ("). + + + +Start the binlog service. +```bash + # Start oblogproxy. + [root@OB2 oblogproxy]# ./run.sh start +``` + +Check the log file in the log/ directory for errors. 
+``` + [root@OB2 oblogproxy]# cat log/logproxy.log + [2023-12-06 15:51:34] [info] environmental.cpp(27): Max file descriptors: 655350 + [2023-12-06 15:51:34] [info] environmental.cpp(34): Max processes/threads: 655360 + [2023-12-06 15:51:34] [info] environmental.cpp(41): Core dump size: 18446744073709551615 + [2023-12-06 15:51:34] [info] environmental.cpp(48): Maximum number of pending signals: 252872 + [2023-12-06 15:51:34] [info] binlog_server.cpp(66): Start pull up all BC processes + [2023-12-06 15:51:34] [info] binlog_server.cpp(76): The current binlog converter [myoceanbase,obtest]is alive and the pull action is terminated + [2023-12-06 15:51:34] [info] binlog_server.cpp(76): The current binlog converter [myoceanbase,obtest]is alive and the pull action is terminated + [2023-12-06 15:51:34] [info] binlog_server.cpp(89): Finish to pull up 1 BC processes + [2023-12-06 15:51:34] [info] event_wrapper.cpp(43): Succeed to listen socket with port: 2983 + [2023-12-06 15:51:34] [info] binlog_server.cpp(47): Start OceanBase binlog server on port 2983 +``` + + + + + +#### Configure the binlog service + +After the binlog service starts successfully, specify the tenant whose binlogs you want to obtain. Log in to oblogproxy to create a binlog subscription sub-process using the following official syntax: +```sql + CREATE BINLOG [IF NOT EXISTS] FOR TENANT `cluster`. `tenant` [TO USER `user` PASSWORD `pwd`] [FROM ] WITH CLUSTER URL ``[, INITIAL_TRX_XID `ob_txn_id`, INITIAL_TRX_GTID_SEQ `gtid_seq`]; + + -- You can specify a username and password for the binlog service, which are used for subscribing to OceanBase logs. In version 4.x, business tenants are allowed. + [TO USER `user` PASSWORD `pwd`] + + -- You can map an OceanBase transaction ID to the global transaction ID (GTID) of a binlog. + -- INITIAL_TRX_XID: The OceanBase transaction ID. + -- INITIAL_TRX_GTID_SEQ: The GTID to map to. + [, INITIAL_TRX_XID `ob_txn_id`, INITIAL_TRX_GTID_SEQ `gtid_seq`] + + -- Example: + CREATE BINLOG FOR TENANT `cluster`. `tenant` TO USER `user` PASSWORD `pwd` FROM 1668339370000000 WITH CLUSTER URL 'cluster_url', SERVER UUID '2340778c-7464-11ed-a721-7cd30abc99b4', INITIAL_TRX_XID '97984179', INITIAL_TRX_GTID_SEQ '31'; +``` + + + + + +To obtain the preceding parameters, connect to the target OBServer cluster and run the following commands: + +```sql +timestamp: select time\_to\_usec(NOW()); + +cluster\_url:show parameters like '%url%' + +SERVER UUID: show global variables like '%uuid'; + +INITIAL\_TRX\_XID: select \* from GV$OB\_TRANSACTION\_PARTICIPANTS; + +INITIAL\_TRX\_GTID\_SEQ: 1 # For the first startup, you can specify any number. +``` + + + +For the first startup, you do not need to specify the timestamp, INITIAL\_TRX\_XID, or INITIAL\_TRX\_GTID\_SEQ because the system configures them automatically. The following code executes the startup command and queries the status: + +```sql + [root@OB2 oblogproxy]# mysql -A -c -h 127.0.0.1 -P 2983 + MySQL [(none)]> CREATE BINLOG FOR TENANT `myoceanbase`. `obtest` TO USER `root` PASSWORD `xxxx` WITH CLUSTER URL 'http://172.24.255.53:8080/services?Action=ObRootServiceInfo&User_ID=alibaba&UID=ocpmaster&ObRegion=myoceanbase', SERVER UUID 'xxxx-xxx-xx-xx-xxxxxx'; + MySQL [(none)]> SHOW BINLOG STATUS\G; + *************************** 1. 
row *************************** + cluster: myoceanbase + tenant: obtest + status: { + "binlog_files" : + [ + { + "binlog_name" : "mysql-bin.000001", + "binlog_size" : 178 + } + ], + "client_id" : "/usr/local/oblogproxy/run/myoceanbase/obtest", + "cpu_status" : + { + "cpu_count" : 16, + "cpu_used_ratio" : 0.12666244804859161 + }, + "disk_status" : + { + "disk_total_size_mb" : 503837, + "disk_usage_size_process_mb" : 0, + "disk_used_ratio" : 0.45975583791732788, + "disk_used_size_mb" : 231642 + }, + "memory_status" : + { + "mem_total_size_mb" : 63238, + "mem_used_ratio" : 0.0, + "mem_used_size_mb" : 735 + }, + "network_status" : + { + "network_rx_bytes" : 0, + "network_wx_bytes" : 0 + }, + "pid" : 7605 + } + 1 row in set (0.00 sec) +``` + + + + + +After startup, the subscription sub-process binlog\_converter is enabled. + +![1702608225](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2023-12/1702608226013.png) + + + +In the run/myoceanbase/obtest/data/ directory under the home directory, mysql-bin files are automatically generated. + +![1702608245](/img/blogs/tech/practices-binlog/image/1702608245245.png) + + + +Now, the binlog service is configured and started, and all create, read, update, and delete (CRUD) operations on the source OceanBase cluster can be captured and written to the mysql-bin files. + + + +### Parse Binlogs + +Lastly, write data to the source cluster and use the mysqlbinlog tool to check whether the parsing is correct. + +In this example, mysqlbinlog 3.4 is used. The corresponding MySQL version is about 5.7.9. + +![1702608266](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2023-12/1702608266385.png) + + + +Create a table and insert data in the source cluster to check whether the parsing succeeds. + +![1702608280](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2023-12/1702608280371.png) + + + +The parsing results are as follows. + +![1702608301](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2023-12/1702608301790.png) + + + +This is a simple test of the binlog service of OceanBase. If you are interested, download, test, and use it. For any questions, contact OceanBase Technical Support. For more information about the commands, view the binlog service documentation on GitHub or stay tuned to the official website. \ No newline at end of file diff --git a/docs/blogs/tech/row-to-vector.md b/docs/blogs/tech/row-to-vector.md new file mode 100644 index 000000000..a4c111323 --- /dev/null +++ b/docs/blogs/tech/row-to-vector.md @@ -0,0 +1,264 @@ +--- +slug: row-to-vector +title: 'From Rows to Vectors: The Evolution of the Execution Engine of OceanBase Database' +--- + +# From Rows to Vectors: The Evolution of the Execution Engine of OceanBase Database + +> This article introduces database system concepts without diving into the detailed design and implementation of vectorized operators and expressions in OceanBase Database. + +Background +== + +The OceanBase team has launched [OceanBase DBA: From Basics to Practices](https://youtube.com/live/3iwhQ4lAqgg), an official course series, to help users resolve issues more efficiently with OceanBase Database Community Edition. 
However, after the seventh live streaming, many users had difficulty understanding what terms such as `rowset=16` or `rowset=256` mean in a plan similar to the following one: + +```sql + obclient [test]> create table t1(c1 int, c2 int); + Query OK, 0 rows affected (0.203 sec) + + obclient [test]> explain select count(*) from t1 where c1 = 1000; + +------------------------------------------------------------------------------------+ + | Query Plan | + +------------------------------------------------------------------------------------+ + | ================================================= | + | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | + | ------------------------------------------------- | + | |0 |SCALAR GROUP BY | |1 |4 | | + | |1 |└─TABLE FULL SCAN|t1 |1 |4 | | + | ================================================= | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil), rowset=16 | + | group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]) | + | 1 - output([T_FUN_COUNT(*)]), filter([t1.c1 = 1000]), rowset=16 | + | access([t1.c1]), partitions(p0) | + | is_index_back=false, is_global_index=false, filter_before_indexback[false], | + | range_key([t1.__pk_increment]), range(MIN ; MAX)always true | + +------------------------------------------------------------------------------------+ + 14 rows in set (0.033 sec) +``` + +The rowset information in the plan is related to vectorized execution of the OceanBase Database execution engine. This article, the second one in the analytical processing (AP) performance series, answers the question and introduces the vectorized execution technology of OceanBase Database. + +Execution Engine Built on the Volcano Model +======== + +The vectorized execution engine is one of the key tools for boosting AP performance and played an important role in the championship of OceanBase Database in the 2021 TPC-H test. However, to better understand the vectorized execution engine, it is essential to learn about the Volcano model for conventional database execution engines. + +The Volcano model, also known as the Iterator model, is the most renowned query execution model. It was first introduced in the 1990 paper [Volcano—An Extensible and Parallel Query Evaluation System](https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf). Most conventional relational databases, including Oracle, MySQL, Db2, and SQL Server, are built on this model. + +In the Volcano model, a query plan is divided into multiple operators. Each operator is an iterator that implements the next() interface, typically in the following three steps: + +* Calls the next() method of the child operator to obtain its calculation result. +* Performs the calculation operation corresponding to the current operator on the calculation result returned by the child operator to obtain a result. +* Returns the result to the parent operator. + + + +> **Note**: +>    The next() interface of operators in the paper is named ObOperator::get\_next\_row() in the code of OceanBase Database. + +The Volcano model enables the query execution engine to elegantly assemble any operators without the need to consider the specific processing logic of each operator. During the execution of a query, nested get\_next\_row() methods in the query tree are called from the top down while data is pulled and processed from the bottom up. That is why the Volcano model is also called a pull-based model. 
To better understand the pull-based execution process of the Volcano model, let's continue with the preceding aggregation example: +```sql + select count(*) from t1 where c1 = 1000; +``` + +![11](/img/blogs/tech/row-to-vector/image/11.png) + + + +**Note**: + +Each tuple in the preceding figure is a result row returned by a lower-level operator to a higher-level operator. + +The process in the preceding figure is described as follows: + +* **Steps 1‒3:** The AGGREGATE operator first calls the get\_next\_row() method so that lower-level operators can call the get\_next\_row() method of their child operators level by level. +* **Steps 4‒6:** After obtaining data from the storage layer, the TABLE SCAN operator returns the result row to the FILTER operator. After calculating data based on the filter condition `c1 = 1000`, the FILTER operator returns the result row to the AGGREGATE operator. +* **Step 7:** The AGGREGATE operator repeatedly calls the next() method to retrieve the required data, completes the aggregation, and returns the result. + +If you disable vectorization in OceanBase Database, you can find execution plan trees similar to the one in the preceding figure. +```sql + -- Disable vectorization to force subsequent SQL queries to use the default single-row calculation mode, which is similar to that of the Volcano model. + alter system set _rowsets_enabled = false; + + -- You can observe that no rowset value exists in the following plan. + explain select count(*) from t1 where c1 = 1000; + +------------------------------------------------------------------------------------+ + | Query Plan | + +------------------------------------------------------------------------------------+ + | ================================================= | + | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | + | ------------------------------------------------- | + | |0 |SCALAR GROUP BY | |1 |6 | | + | |1 |└─TABLE FULL SCAN|t1 |1 |6 | | + | ================================================= | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil) | + | group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]) | + | 1 - output([T_FUN_COUNT(*)]), filter([t1.c1 = 1000]) | + | access([t1.c1]), partitions(p0) | + | is_index_back=false, is_global_index=false, filter_before_indexback[false], | + | range_key([t1.__pk_increment]), range(MIN ; MAX)always true | + +------------------------------------------------------------------------------------+ + 14 rows in set (0.010 sec) +``` + +The plan in OceanBase Database contains only two operators and is simpler than that in the preceding figure. As every operator in OceanBase Database contains the functionality of the FILTER operator, no separate FILTER operator is needed. As shown in the preceding plan, the TABLE SCAN operator contains `filter([t1.c1 = 1000])`. The SCALAR GROUP BY operator in the plan corresponds to the AGGREGATE operator in the figure. It performs aggregations in scenarios where GROUP BY is not used. + +The Volcano model has clear processing logic, where operators are decoupled so that each operator focuses only on its own tasks. However, the model has two obvious drawbacks: + +* The virtual function get\_next\_row() is called for each row processed by every operator, and excessive calls can waste CPU resources. This issue is especially apparent in online analytical processing (OLAP) queries with a large data volume. 
+* Processing data row by row does not fully unleash the potential of modern CPUs. + +Vectorized Execution Engine and Its Benefits +=========== + +Vectorized models were first introduced in the paper [MonetDB/X100: Hyper-Pipelining Query Execution](http://cs.brown.edu/courses/cs227/archives/2008/Papers/ColumnStores/MonetDB.pdf). Unlike the Volcano model which iterates data row by row, a vectorized model adopts batch iterations, allowing a batch of data to be passed between operators at a time. Due to their effective use of CPU resources and modern CPU features, vectorized models have been widely adopted in the design of modern database engines. + +![1](/img/blogs/tech/row-to-vector/image/1.png) + +As shown in the preceding figure, the vectorized model pulls data from the root node of an operator tree level by level in a similar way as the traditional Volcano model. The difference is that the vectorized engine calls the get\_next\_batch() function to pass a batch of data at a time and keeps the batch as compact as possible in the memory, rather than calling the get\_next\_row() function to pass one row at a time. + +Reduce the Overhead of Virtual Function Calls +---------- + +The vectorized engine drastically reduces the number of function calls. Assuming that you want to query a table with 100 million rows of data. In a database based on the Volcano model, each operator must call the get\_next\_row() function 100 million times to complete the query. If you use the vectorized engine and set the vector size to 1,024 rows, the number of calls to the get\_next\_batch() function for the same query, which is calculated by dividing 100 million by 1,024, is 97,657. This greatly decreases the number of virtual function calls and reduces CPU overhead. + +![2](/img/blogs/tech/row-to-vector/image/2.png) + +In terms of the user question mentioned at the start of this article, the rowset in the plan indicates the number of rows in a batch or vector. +```sql + -- Enable vectorization. + alter system set _rowsets_enabled = true; + + -- Set the vector size to 16 rows. + alter system set _rowsets_max_rows = 16; + + -- The rowset information (rowset = 16) in the plan indicates that the vector size is 16 rows. 
+ explain select count(*) from t1 where c1 = 1000; + +------------------------------------------------------------------------------------+ + | Query Plan | + +------------------------------------------------------------------------------------+ + | ================================================= | + | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| | + | ------------------------------------------------- | + | |0 |SCALAR GROUP BY | |1 |4 | | + | |1 |└─TABLE FULL SCAN|t1 |1 |4 | | + | ================================================= | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil), rowset=16 | + | group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]) | + | 1 - output([T_FUN_COUNT(*)]), filter([t1.c1 = 1000]), rowset=16 | + | access([t1.c1]), partitions(p0) | + | is_index_back=false, is_global_index=false, filter_before_indexback[false], | + | range_key([t1.__pk_increment]), range(MIN ; MAX)always true | + +------------------------------------------------------------------------------------+ + 14 rows in set (0.021 sec) +``` + +Unleash the Potential of Modern CPUs +-------------- + +### Compact data layout for better cache efficiency + +During vectorized execution, OceanBase Database compactly stores batch data in memory, with intermediate data organized in columns. For example, if a batch contains 256 rows, the 256 rows of data of the c1 column are stored contiguously in memory, followed by those of the c2 column, which are also stored contiguously. For the `concat(c1, c2)` expression, calculation is performed on the 256 rows at a time, with the result stored in the memory space pre-allocated to the expression. + +![3](/img/blogs/tech/row-to-vector/image/3.png) + +Since the intermediate data is contiguous, the CPU can quickly load the data into the L2 cache through the prefetch instruction to reduce memory stalls and improve CPU utilization. Inside an operator function, data is processed in batches rather than row by row, enhancing the efficiency of data cache (DCache) and instruction cache (ICache) in the CPU while reducing cache misses. + +### Reduced impact of branch mispredictions on the CPU pipeline + +The paper [DBMSs On A Modern Processor: Where Does Time Go?](http://www.cs.cmu.edu/afs/cs/academic/class/15740-f03/public/doc/discussions/uniprocessors/databases/wisc_vldb99.pdf) discusses the impact of branch mispredictions on database performance. Branch mispredictions have a serious impact on the database performance because the CPU halts the execution of an instruction stream and refreshes the pipeline upon a misprediction. The paper [Micro Adaptivity in Vectorwise](https://15721.courses.cs.cmu.edu/spring2018/papers/03-compilation/p1231-raducanu.pdf) released on the 2013 ACM SIGMOD Conference on Management of Data (SIGMOD'13) also elaborates on the execution efficiency of branching at different levels of selectivity. A figure is provided below for your information. + +![4](https://gw.alipayobjects.com/zos/oceanbase/e4bdecb0-8536-4713-8b30-8d9218007eb2/image/2022-09-28/5ca51137-a512-4cb7-9244-502b4b6aadb7.png) + +The logic of the SQL engine of a database is complicated. Therefore, conditionals appear frequently in the Volcano model. 
+
+```c++
+    // The following pseudocode outlines the single-row calculation process, where the IF statement is executed 256 times to process 256 rows of data:
+    for (auto row_no : 256) {
+      get_next_row() {
+        if (A) {
+          eval_func_A();
+        } else if (B) {
+          eval_func_B();
+        }
+      }
+    }
+```
+
+In vectorized execution, conditionals are minimized within operators and expressions. For example, no IF statement appears inside any FOR loop, thus protecting the CPU pipeline from branch mispredictions and greatly improving CPU efficiency.
+
+```c++
+    // The following pseudocode outlines the vectorized calculation process, where the IF statement is executed only once to process 256 rows of data:
+    get_next_batch() {
+      if (A) {
+        for (auto row_no : 256) {
+          eval_func_A();
+        }
+      } else if (B) {
+        for (auto row_no : 256) {
+          eval_func_B();
+        }
+      }
+    }
+```
+
+### Accelerated computation through SIMD instructions
+
+The vectorized engine handles contiguous data in the memory, and hence can easily load a batch of data into a vector register. It then sends a single instruction, multiple data (SIMD) instruction to perform vector computation instead of using the traditional scalar algorithm. The SIMD instruction enables the CPU to perform the same computation on the batch of data in parallel, reducing the number of CPU cycles required for processing the data.
+
+![5](/img/blogs/tech/row-to-vector/image/5.png)
+
+The right side of the preceding figure shows a typical SIMD computation, where two sets of four contiguous data elements are processed in parallel. The CPU simultaneously performs the same operation on each pair of data elements (A1 and B1, A2 and B2, A3 and B3, and A4 and B4) based on the SIMD instruction. The results of the four parallel operations are also stored contiguously.
+
+If a processor supports 4-element SIMD multiplication, it has vector registers that can simultaneously store four integers. As OceanBase Database stores data contiguously during vectorized execution, SIMD code can be written as follows:
+
+* **Load data (\_mm\_loadu\_si128):** First, load the vector with the A1, A2, A3, and A4 elements and the vector with the B1, B2, B3, and B4 elements into two SIMD registers.
+* **Perform SIMD multiplication (\_mm\_mullo\_epi32):** Next, use the SIMD multiplication instruction to simultaneously multiply all elements in both registers.
+* **Store data (\_mm\_storeu\_si128):** Last, store the results from the SIMD registers in the allocated memory to form the result vector.
+
+```c++
+    // The sample C++ pseudocode based on Streaming SIMD Extensions (SSE) for x86 performs element-wise multiplication of integer vectors by using the SIMD technology.
+    #include <immintrin.h> // Include the SIMD intrinsics (SSE) header file.
+    #include <cassert>     // For assert().
+    #include <cstddef>     // For size_t.
+
+    // Use the function to perform element-wise multiplication of two integer vectors.
+    void simdIntVectorMultiply(const int* vec1, const int* vec2, int* result, size_t length) {
+      // As SSE registers process four 32-bit integers at a time, make sure that the vector length is a multiple of four.
+      assert(length % 4 == 0);
+
+      // Execute the loop that uses SSE instructions for optimization.
+      for (size_t i = 0; i < length; i += 4) {
+
+        // Load four integers into the 128-bit XMM register.
+        __m128i vec1_simd = _mm_loadu_si128(reinterpret_cast<const __m128i*>(vec1 + i));
+        __m128i vec2_simd = _mm_loadu_si128(reinterpret_cast<const __m128i*>(vec2 + i));
+
+        // Perform vector multiplication.
+        __m128i product_simd = _mm_mullo_epi32(vec1_simd, vec2_simd);
+
+        // Store the results in the memory.
+ _mm_storeu_si128(reinterpret_cast<__m128i*>(result + i), product_simd); + } + } +``` + +TPC-H Performance Test +========= + +In the TPC-H test based on the TPC-H 30 TB dataset on OceanBase Database, vectorized execution outperforms single-row execution by 2.48 times. For compute-intensive Q1 queries, performance is improved by over 10 times. + +![1](/img/blogs/tech/row-to-vector/image/image.png) + +In OceanBase Database V4.3, the OceanBase team has optimized and restructured the vectorized execution engine, which has been supported since OceanBase Database V3.x. + +Summary +== + +This article was inspired by the question about the meaning of `rowset=16` in a plan, which was raised during the seventh live streaming of [OceanBase DBA: From Basics to Practices](https://youtube.com/live/3iwhQ4lAqgg). After answering the question, this article also briefly introduces the vectorized execution technology of OceanBase Database. + +I hope both database administrators (DBAs) and kernel developers find this helpful. For any questions, feel free to leave a comment. \ No newline at end of file diff --git a/docs/blogs/tech/ticket-olap.md b/docs/blogs/tech/ticket-olap.md new file mode 100644 index 000000000..3b97d1671 --- /dev/null +++ b/docs/blogs/tech/ticket-olap.md @@ -0,0 +1,590 @@ +--- +slug: ticket-olap +title: 'Columnar Storage Engine: Your Ticket to OLAP' +--- + +# Columnar Storage Engine: Your Ticket to OLAP + +> Recently, the OceanBase team has rolled out OceanBase Database V4.3.0. According to the official website, "This update leverages the log-structured merge-tree (LSM-tree) architecture of OceanBase Database to combine row-based and columnar storage, introducing a new vectorized engine and a cost evaluation model based on columnar storage. These enhancements significantly boost the efficiency of processing wide tables, improve query performance in AP scenarios, and cater to TP business requirements." +> This article kicks off a series on AP performance. It covers basic tests of the columnar storage feature based on the official description and extracts practical usage tips from the test results. + +## Background + +Let's begin with the concepts of "baseline data" and "incremental data" in the storage architecture of OceanBase Database. + +The storage architecture of OceanBase Database is as follows. + +![1](/img/blogs/tech/ticket-olap/image/1.png) + +The storage engine of OceanBase Database, built on the LSM-tree structure, divides data into static baseline data and dynamic incremental data. + +Data manipulated by DML operations, such as `INSERT`, `UPDATE`, and `DELETE`, is first written into MemTables. After reaching the specified size, the MemTables are compacted into SSTables on the disk. **Data in the MemTables and SSTables is referred to as incremental data**. When the incremental data reaches a certain size, it is compacted with the baseline data of the old version to form the baseline data of a new version, namely, the baseline SSTables of a new version. In addition, the system performs a daily compaction during idle hours every night. + +When receiving a user query, OceanBase Database queries both the incremental data and the baseline data, merges the query results, and then returns the results to the SQL layer. OceanBase Database implements both block cache and row cache in the memory to avoid random read of the baseline data. 
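+
+Because the columnstore formats described in the next section apply only to the baseline data, it can be useful to trigger a major compaction manually instead of waiting for the nightly schedule, for example right after bulk-loading test data. The following is a minimal sketch, assuming a MySQL-mode user tenant of OceanBase Database V4.x:
+
+```sql
+    -- Compact the incremental data of the current tenant into a new version of the baseline data.
+    ALTER SYSTEM MAJOR FREEZE;
+
+    -- Watch the progress; STATUS returns to IDLE once the major compaction has finished.
+    SELECT * FROM oceanbase.DBA_OB_MAJOR_COMPACTION;
+```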
+ +## Overall Columnar Storage Architecture + +Columnar storage greatly boosts the performance of AP queries and is key to the excellence of OceanBase Database in hybrid transactional/analytical processing (HTAP). Data used for AP is typically static and rarely updated in place. The baseline data in the LSM-tree architecture of OceanBase Database is also static, making it ideal for implementing columnar storage. Incremental data is dynamic. Even in columnstore tables, the incremental data and synchronized logs at the storage layer remain rowstore, which avoids impacts on TP, log synchronization, and backup and restore. This enables OceanBase Database to balance the performance of both TP and AP queries. + +When creating tables in OceanBase Database V4.3, you can select among the rowstore, columnstore, and hybrid rowstore-columnstore formats. Whatever format you choose, the incremental data in the tables remains rowstore. Therefore, the DML operations, transactions, and upstream-downstream data synchronization of columnstore tables are not affected. + +The key difference between columnstore and rowstore tables at the storage layer is the format of the baseline data. Based on the storage format that you specify when creating a table, the baseline data is stored by row, column, or both row and column (with redundancy). + +**In rowstore mode**, the baseline data is stored by row, as shown in the following figure. + +![](/img/blogs/tech/ticket-olap/image/2.png) + +**In columnstore mode**, each column of the baseline data is stored as an independent baseline SSTable, as shown in the following figure. + +![](/img/blogs/tech/ticket-olap/image/3.png) + +**In hybrid rowstore-columnstore mode**, the baseline data is stored as both columnstore SSTables and rowstore SSTables, as shown in the following figure. + +![](/img/blogs/tech/ticket-olap/image/4.png) + +In this mode, the optimizer automatically chooses whether to scan columnstore or rowstore SSTables based on access costs. + +Take creating a hybrid rowstore-columnstore table named t\_column\_row as an example. In the `CREATE TABLE` statement, `with column group (all columns, each column)` specifies the hybrid rowstore-columnstore mode for the table, where `each column` represents columnstore and `all columns` represents rowstore. + +```sql + create table tt_column_row( + c1 int primary key, c2 int , c3 int) + with column group (all columns, each column); +``` + +If we query all the data in a column without specifying any filter conditions, the following execution plan will be generated. The `COLUMN TABLE FULL SCAN` operator in the execution plan indicates that the optimizer has chosen to scan the columnstore baseline data based on the cost model. Compared to scanning rowstore data, scanning columnstore data reduces the additional I/O overhead at the storage layer for the c2 and c3 columns. 
+ +```sql + explain select c1 from t_column_row; + +-----------------------------------------------------------------+ + | Query Plan | + +-----------------------------------------------------------------+ + | ============================================================== | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | -------------------------------------------------------------- | + | |0 |COLUMN TABLE FULL SCAN|t_column_row|1 |3 | | + | ============================================================== | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([t_column_row.c1]), filter(nil), rowset=16 | + | access([t_column_row.c1]), partitions(p0) | + | is_index_back=false, is_global_index=false, | + | range_key([t_column_row.c1]), range(MIN ; MAX)always true | + +-----------------------------------------------------------------+ +``` + +If we query all the data in a table without specifying any filter conditions, the following execution plan will be generated. The `TABLE FULL SCAN` operator in the execution plan indicates that the optimizer has chosen to scan the rowstore baseline data. When both incremental data and baseline data are rowstore, merging them is faster. In this case, the optimizer produces an execution plan that scans rowstore data. + +```sql + explain select * from t_column_row; + +-----------------------------------------------------------------------------------------------+ + | Query Plan | + +-----------------------------------------------------------------------------------------------+ + | ======================================================= | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | ------------------------------------------------------- | + | |0 |TABLE FULL SCAN|t_column_row|1 |3 | | + | ======================================================= | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([t_column_row.c1], [t_column_row.c2], [t_column_row.c3]), filter(nil), rowset=16 | + | access([t_column_row.c1], [t_column_row.c2], [t_column_row.c3]), partitions(p0) | + | is_index_back=false, is_global_index=false, | + | range_key([t_column_row.c1]), range(MIN ; MAX)always true | + +-----------------------------------------------------------------------------------------------+ +``` + +## Basic Performance Testing on Columnar Storage + +We compared the compression ratios between columnar storage and row-based storage and tested the query performance of columnar storage based on the TPC-H 100 GB test set. The OceanBase Database version we used is Community Edition V4.3.0.1. + +### Compression Ratio Test + +We first tested the compression ratio of columnstore tables in OceanBase Database V4.3.0 and compared it with that of rowstore tables. + +We imported test set data respectively into a set of pure rowstore tables and a set of pure columnstore tables, and chose lineitem, the largest table, to calculate the storage overhead. The imported lineitem.tbl data is about 76 GB in size, occupying a storage space of 22.5 GB as a rowstore table and a storage space of 15 GB as a columnstore table. 
+ +```sql + -- Definition of the columnstore table lineitem + CREATE TABLE lineitem ( + l_orderkey BIGINT NOT NULL, + l_partkey BIGINT NOT NULL, + l_suppkey INTEGER NOT NULL, + l_linenumber INTEGER NOT NULL, + l_quantity DECIMAL(15,2) NOT NULL, + l_extendedprice DECIMAL(15,2) NOT NULL, + l_discount DECIMAL(15,2) NOT NULL, + l_tax DECIMAL(15,2) NOT NULL, + l_returnflag char(1) DEFAULT NULL, + l_linestatus char(1) DEFAULT NULL, + l_shipdate date NOT NULL, + l_commitdate date DEFAULT NULL, + l_receiptdate date DEFAULT NULL, + l_shipinstruct char(25) DEFAULT NULL, + l_shipmode char(10) DEFAULT NULL, + l_comment varchar(44) DEFAULT NULL, + PRIMARY KEY(l_orderkey, l_linenumber)) + row_format = condensed + partition by key (l_orderkey) partitions 4 + with column group(each column); +``` + +**For the lineitem table, the storage space occupied in columnstore mode is about two-thirds of that in rowstore mode.** The reason is simple. Compared to rowstore tables, columnstore tables store data of the same type in each column, allowing for more efficient compression. + +![2](/img/blogs/tech/ticket-olap/image/101.png) + +Why is the compression ratio of columnstore tables not as high as expected compared to rowstore tables? This is because OceanBase Database already excels in compressing rowstore tables. However, even though we have optimized compression for rowstore tables, compressing columnstore tables is slightly more effective than compressing rowstore tables. The larger the number of columns in a columnstore table, the more noticeable the compression effect for the table. + +### Query Performance Test + +We conducted all subsequent tests on three machines, each with 6 CPU cores and 35 GB of memory, to compare the query performance between rowstore and columnstore tables. + +![3](/img/blogs/tech/ticket-olap/image/5.png) + +We created one rowstore table named lineitem\_row and one columnstore table named lineitem\_column and imported the TPC-H 100 GB test set to both tables. + +### Point queries with a primary key + +We tested the performance of the point queries with a primary key on the columnstore and rowstore tables by executing the following SQL statements: + +```sql + -- Columnstore table + select * from lineitem_column where l_orderkey = 7 and l_linenumber = 1; + 1 row in set (0.035 sec) + + select * from lineitem_column where l_orderkey = 7; + 7 rows in set (0.036 sec) + + -- Rowstore table + select * from lineitem_row where l_orderkey = 7 and l_linenumber = 1; + 1 row in set (0.044 sec) + + select * from lineitem_row where l_orderkey = 7; + 7 rows in set (0.044 sec) +``` + +The execution duration of the point queries with a primary key on the columnstore table and that on the rowstore table were both about 0.04 seconds, demonstrating similar performance. For brevity, only the execution duration of the SQL statements was displayed. + +### Full table scans without any index + +We respectively executed the following SQL statements without specifying a primary key or index on the columnstore and rowstore tables: +```sql + -- Columnstore table + select * from lineitem_column where l_extendedprice = 13059.24; + 102 rows in set (0.467 sec) + + -- Rowstore table + select * from lineitem_row where l_extendedprice = 13059.24; + 102 rows in set (2.306 sec) +``` + +Each statement returned 102 rows. The execution duration for the SQL statement on the rowstore table was about 2.31 seconds, and that on the columnstore table was about 0.47 seconds. 
For lineitem, a wide table with 16 columns, the performance of the columnstore table was about five times that of the rowstore table.

Without a primary key or index as a filter condition, a columnstore table actually spends more CPU time per row than a rowstore table, because the values of the individual columns must be stitched back together into rows. However, it incurs far less I/O overhead: the full scan of a rowstore table reads every column, whereas the columnstore table reads only the columns that the query touches.

If you use a single column as the filter condition without specifying a primary key or index, the more columns a columnstore table has, the greater its performance advantage over a rowstore table. In this example, to improve the performance of the rowstore table by reducing the I/O overhead, you would have to create an index on the l\_extendedprice column. In such scenarios, a columnstore table spares you the overhead of creating and maintaining these indexes.

Without specifying a primary key or index, we made the filter condition more complex by including the calculation of multiple columns in it.
```sql
 -- Columnstore table
 select * from lineitem_column where l_partkey + l_suppkey = 20999999;
 7 rows in set (5.091 sec)
 
 -- Rowstore table
 select * from lineitem_row where l_partkey + l_suppkey = 20999999;
 7 rows in set (6.254 sec)
```

The columnstore table still outperformed the rowstore table, but only slightly.

We continued to include the calculation of more columns in the filter condition.
```sql
 -- Columnstore table
 select * from lineitem_column where l_partkey + l_suppkey +
 l_extendedprice + l_discount + l_tax = 19173494.34;
 1 row in set (15.675 sec)
 
 -- Rowstore table
 select * from lineitem_row where l_partkey + l_suppkey +
 l_extendedprice + l_discount + l_tax = 19173494.34;
 1 row in set (15.837 sec)
```

We can observe a pattern: as the number of columns involved in the filter condition increases, the performance of a rowstore table gradually approaches that of a columnstore table.

In simple terms, in a columnstore table, the corresponding rows from different columns must be combined based on the primary key value before any calculations between columns can be performed. When the additional overhead incurred by the combination and calculation operations approaches the additional column I/O overhead of a rowstore table, the difference in performance between the columnstore and rowstore tables diminishes.

### Aggregations

In terms of simple aggregations, the columnstore table outperformed the rowstore table. As shown in the following figure, the columnstore table lineitem\_column is on the left and the rowstore table lineitem\_row is on the right.
+ +![4](/img/blogs/tech/ticket-olap/image/102.png) + +In terms of complex aggregations such as max(l\_partkey + l\_suppkey), the performance of the columnstore and rowstore tables is as follows: +```sql + -- Columnstore table + select max(l_partkey + l_suppkey) from lineitem_column; + +----------------------------+ + | max(l_partkey + l_suppkey) | + +----------------------------+ + | 20999999 | + +----------------------------+ + 1 row in set (19.302 sec) + + -- Rowstore table + select max(l_partkey + l_suppkey) from lineitem_row; + +----------------------------+ + | max(l_partkey + l_suppkey) | + +----------------------------+ + | 20999999 | + +----------------------------+ + 1 row in set (4.833 sec) +``` + +In tests involving aggregations with expressions, the performance of the columnstore stable was inferior to that of the rowstore table. One reason, as mentioned earlier, is that in a columnstore table, the rows corresponding to the l\_partkey and l\_suppkey columns must be combined before any addition operation can be performed. The combination and calculation operations between columns incur additional overhead for the columnstore table. Another reason is that V4.3.0.1 focuses on optimizing vectorized execution for columnstore expression filtering, while optimizations on columnstore expression aggregations are planned for future versions. + +We can conclude that **multi-column expression calculations in aggregate functions are not a strength of the current columnar storage version**. However, as a wider columnstore table saves more I/O overhead compared to a rowstore table than the lineitem table in the same scenario, the test results may differ if we use a table with hundreds or thousands of columns. + +### Impact of updating columnstore data at different percentages on query performance + +A columnstore table uses different formats for baseline and incremental data. In scenarios involving substantial incremental data, querying a columnstore table requires format conversion and integration for column and row data during the merge of incremental and baseline data, and thus inevitably incurs more overhead than querying a rowstore table. + +![5](/img/blogs/tech/ticket-olap/image/6.png) + +We tested updates to continuous columnstore data at different percentages. As the l\_orderkey column was evenly distributed, we controlled the percentages by using different l\_orderkey ranges as follows: + +```sql + -- l_orderkey is evenly distributed from 1 to 600,000,000. + -- Update 1% of the data by using the condition "where l_orderkey <= 6000000." + -- As l_orderkey is the primary key column, the data to update is continuous. + update lineitem_column set + l_partkey = l_partkey + 1, + l_suppkey = l_suppkey - 1, + l_quantity = l_quantity + 1, + l_extendedprice = l_extendedprice + 1, + l_discount = l_discount + 0.01, + l_tax = l_tax + 0.01, + l_returnflag = lower(l_returnflag), + l_linestatus = lower(l_linestatus), + l_shipdate = date_add(l_shipdate, interval 1 day), + l_commitdate = date_add(l_commitdate, interval 1 day), + l_receiptdate = date_add(l_receiptdate, interval 1 day), + l_shipinstruct = lower(l_shipinstruct), + l_shipmode = lower(l_shipmode), + l_comment = upper(l_comment) + where l_orderkey <= 6000000; + Query OK, 6001215 rows affected (4 min 2.397 sec) + Rows matched: 6001215 Changed: 6001215 Warnings: 0 + + -- Execute queries with a primary key multiple times. 
+ select * from lineitem_column where l_orderkey = 7; + select * from lineitem_column where l_orderkey = 600000000; + (0.036 sec) + + -- Execute queries without a primary key multiple times. + select * from lineitem_column where l_suppkey = 825656; + (31.722 sec) +``` + +The following table displays the execution duration of queries with and without a primary key after we updated different percentages of data of the columnstore table. + +| Update percentage (%) | Execution duration for the query with a primary key (s) | Execution duration for the query without a primary key (s) | +| ---------- | ----------- | ----------- | +| 0 | 0.03 | 0.5 | +| 1 | 0.03 | 32 | +| 2 | 0.03 | 54 | +| 3 | 0.03 | 80 | +| 5 | 0.03 | 126 | +| 10 | 0.03 | 245 | +| 20 | 0.03 | 495 | +| 30 | 0.03 | 733 | +| 40 | 0.03 | 1075 | +| 50 | 0.04 | 1453 | +| 60 | 0.04 | 1636 | +| 70 | 0.04 | 1916 | +| 80 | 0.04 | 2195 | +| 90 | 0.04 | 2468 | +| 100 | 0.04 | 2793 | + +Based on the preceding table, we created the following line chart. The x-axis shows the percentages of incremental data, and the y-axis represents the execution duration of queries without a primary key or index. In the columnar storage scenario, updating different percentages of incremental data without compacting data during the tests resulted in a query performance curve that was almost straight. + +![6](/img/blogs/tech/ticket-olap/image/7.png) + +It is important to note that all the preceding tests updated continuous data. + +If you randomly update a certain percentage of discontinuous data, the performance will deteriorate compared to updating continuous data. In OceanBase Database, the smallest I/O unit for reading data files is a variable-length data block of 16 KB, which we refer to as a microblock. If the data is discontinuous, even a small update to the table can cause changes to a large number of microblocks. Therefore, modifying 10% of the table data may impact 100% of the table microblocks. In this case, the query performance does not differ much from the performance of modifying 100% of the table data. + +From the preceding tests, we can draw the following conclusion: **If you perform a large number of update operations on a columnstore table without performing a major compaction in a timely manner, the query performance will be compromised. Therefore, we recommend that you initiate a major compaction after batch data import to achieve optimal query performance.** + +## Scenarios of Columnar Storage + +Based on the preceding test results, we can conclude that columnstore tables in OceanBase Database are suited to the following two types of scenarios: + +* Wide table scenarios + +* When a query scans only a single column or a few columns of a wide table, a columnstore table can significantly reduce disk I/O overhead. For a wide rowstore table, an index needs to be created on specific columns so that the query can scan the index rather than the primary table with more columns. Compared to a rowstore table, a columnstore table eliminates the overhead of creating, storing, and maintaining indexes for specific columns. + +* Read-intensive AP data warehouse scenarios + +* In data warehouse scenarios, complex analytical queries are frequently executed but often involve only specific columns. Storing data by column, columnstore tables can efficiently support such AP queries and reduce unnecessary I/O overhead. 
+* To support frequent small transaction writes in columnstore tables and avoid significant impact of data updates on the performance of columnstore tables, OceanBase Database stores the incremental data of columnstore tables in the rowstore format. In other words, a columnstore table uses different formats for baseline and incremental data. In scenarios involving substantial incremental data, querying a columnstore table requires format conversion and integration for column and row data during the merge of incremental and baseline data, and thus incurs more overhead than querying a rowstore table. The time required for scanning column data increases proportionally to the amount of incremental data, making columnstore tables more suitable for read-intensive scenarios.

## Basic Syntax of Columnar Storage

This section introduces the columnar storage syntax, which is also well-documented on the official website of OceanBase Database.

### Set the Relevant Tenant-level Parameter

For OLAP scenarios, we recommend that you set the parameter to `column` as follows so that columnstore tables are created by default:
```sql
 -- Modify the parameter, which takes effect for the current tenant.
 alter system set default_table_store_format = "column"; -- Columnstore tables are created by default.
 alter system set default_table_store_format = "row"; -- Rowstore tables are created by default.
 alter system set default_table_store_format = "compound"; -- Hybrid rowstore-columnstore tables are created by default.
 
 -- View the value of the parameter. The default value is row.
 show parameters like 'default_table_store_format';
```

### Create a Columnstore Table

The new syntax for creating a columnstore table is `with column group`. If you specify `with column group (each column)` at the end of a `CREATE TABLE` statement, a columnstore table will be created.
```sql
 -- Create a columnstore table.
 create table t1 (c1 int, c2 int) with column group (each column);
 
 -- Create a partitioned columnstore table.
 create table t2(
   pk int,
   c1 int,
   c2 int,
   primary key (pk)
 ) partition by hash(pk) partitions 4
 with column group (each column);
```

If you want to balance between AP business and TP business and can accept a specific degree of data redundancy, you can add `all columns` in the `with column group` syntax to enable rowstore redundancy.

```sql
 -- Create a hybrid rowstore-columnstore table.
 create table t2 (c1 int, c2 int) with column group(all columns, each column);
```

The options in the `with column group` syntax are described as follows:

* all columns: groups all columns together as a single wide column to store data by row.
* each column: stores data by column.
* all columns, each column: stores data both by row and by column, with each replica storing two sets of baseline data.

### Create a Columnstore Index

You can also use the `with column group` syntax to specify the attribute of an index the same way you create a columnstore table. Note that creating an index for a columnstore table differs from creating a columnstore index. For a columnstore index, the index table is in the columnstore format. Compared to rowstore indexes, columnstore indexes reduce the I/O overhead at the storage layer.
```sql
 -- Create a columnstore index on the c1 and c2 columns of the t1 table.
 create index idx1 on t1(c1, c2) with column group(each column);
 
 -- Create a hybrid rowstore-columnstore index on the c1 column of the t1 table.
+ create index idx2 on t1(c1) with column group(all columns, each column); + + -- Create a columnstore index on the c2 column of the t1 table and store the data of the non-indexed c1 column in the index. + alter table t1 add index idx3 (c2) storing(c1) with column group(each column); +``` + +In the preceding example, the purpose of specifying `storing(c1)` to store an additional column in the index is to optimize the performance of specific queries. This avoids retrieving c1 values from the primary table and reduces the cost of indexing and sorting the c1 column. As the c1 column is redundantly stored in the idx3 index but not indexed, only the c2 column needs to be sorted. If the c1 column is indexed, both the c1 and c2 columns need to be sorted. +```sql + explain select c1 from t1 order by c2; + +------------------------------------------------------------------------------------+ + | Query Plan | + +------------------------------------------------------------------------------------+ + | ========================================================== | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | ---------------------------------------------------------- | + | |0 |COLUMN TABLE FULL SCAN|t1(idx3)|1 |5 | | + | ========================================================== | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([t1.c1]), filter(nil), rowset=16 | + | access([t1.c1]), partitions(p0) | + | is_index_back=false, is_global_index=false, | + | range_key([t1.c2], [t1.__pk_increment]), range(MIN,MIN ; MAX,MAX)always true | + +------------------------------------------------------------------------------------+ +``` + +In the preceding SQL execution plan, no SORT operator is assigned because the idx3 index eliminates the need to sort the c2 column. As the non-indexed c1 column is redundantly stored in the index, table access by index primary key (is\_index\_back=false) is not required. + +### Conversion between Rowstore and Columnstore Tables + +The syntax for conversions between storage formats is complex. + +Convert a table from the rowstore format to the columnstore format: +```sql + create table t1(c1 int, c2 int); + + -- This syntax is somewhat confusing because the add keyword gives the impression that it converts a table from the rowstore format to the hybrid rowstore-columnstore format. + alter table t1 add column group(each column); +``` + +Convert a table from the columnstore format to the rowstore format: +```sql + alter table t1 drop column group(each column); +``` + +Convert a table from the rowstore format to the hybrid rowstore-columnstore format: +```sql + create table t1(c1 int, c2 int); + + alter table t1 add column group(all columns, each column); +``` + +Convert a table from the hybrid rowstore-columnstore format to the rowstore format: +```sql + alter table t1 drop column group(all columns, each column); +``` + +> **Note**: After `drop column group(all columns, each column);` is executed, all columns will be put in the default group named `DEFAULT COLUMN GROUP` for storing data. The storage format of `DEFAULT COLUMN GROUP` is determined by the value of the tenant-level parameter `default_table_store_format`, which defaults to `row`. If you do not modify the default value, the t1 table is converted into a rowstore table after the statement is executed. 
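Because the outcome of these conversions depends on `default_table_store_format`, it is worth verifying what a table ends up as. A minimal way to check (a sketch; the exact output layout may vary by version) is to inspect the table definition and the tenant parameter after the `ALTER TABLE` statement completes:

```sql
-- After dropping the explicit column groups, inspect the table definition; if the
-- table is still columnstore, the printed definition is expected to end with a
-- "WITH COLUMN GROUP" clause.
show create table t1;

-- Check the tenant-level default that decides the format of DEFAULT COLUMN GROUP.
show parameters like 'default_table_store_format';
```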
+ +Convert a table from the columnstore format to the hybrid rowstore-columnstore format: +```sql + create table t1(c1 int, c2 int) with column group(each column); + + alter table t1 add column group(all columns); +``` + +Convert a table from the hybrid rowstore-columnstore format to the columnstore format: +```sql + alter table t1 drop column group(all columns); +``` + +### Hints Related to Columnar Storage + +For a hybrid rowstore-columnstore table, the optimizer determines whether to perform a rowstore or columnstore scan based on costs. You can also forcibly perform a columnstore scan by specifying the USE\_COLUMN\_TABLE hint or forcibly perform a rowstore scan by specifying the NO\_USE\_COLUMN\_TABLE hint. +```sql + explain select /*+ USE_COLUMN_TABLE(tt_column_row) */ * from tt_column_row; + +--------------------------------------------------------------------------------------------------+ + | Query Plan | + +--------------------------------------------------------------------------------------------------+ + | =============================================================== | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | --------------------------------------------------------------- | + | |0 |COLUMN TABLE FULL SCAN|tt_column_row|1 |7 | | + | =============================================================== | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([tt_column_row.c1], [tt_column_row.c2], [tt_column_row.c3]), filter(nil), rowset=16 | + | access([tt_column_row.c1], [tt_column_row.c2], [tt_column_row.c3]), partitions(p0) | + | is_index_back=false, is_global_index=false, | + | range_key([tt_column_row.c1]), range(MIN ; MAX)always true | + +--------------------------------------------------------------------------------------------------+ + + explain select /*+ NO_USE_COLUMN_TABLE(tt_column_row) */ c2 from tt_column_row; + +------------------------------------------------------------------+ + | Query Plan | + +------------------------------------------------------------------+ + | ======================================================== | + | |ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)| | + | -------------------------------------------------------- | + | |0 |TABLE FULL SCAN|tt_column_row|1 |3 | | + | ======================================================== | + | Outputs & filters: | + | ------------------------------------- | + | 0 - output([tt_column_row.c2]), filter(nil), rowset=16 | + | access([tt_column_row.c2]), partitions(p0) | + | is_index_back=false, is_global_index=false, | + | range_key([tt_column_row.c1]), range(MIN ; MAX)always true | + +------------------------------------------------------------------+ +``` + +To check whether a columnstore scan is performed in an execution plan, view the output of the `explain` command. If `TABLE FULL SCAN` is displayed, a rowstore scan has been performed. If `COLUMN TABLE FULL SCAN` is displayed, a columnstore scan has been performed. + +## Suggestions on Using Columnar Storage + +After testing the columnar storage feature of OceanBase Database and learning about its basic syntax, we have several suggestions for using the feature. + +1. For a newly created cluster of OceanBase Database V4.3.0 or later used in OLAP data warehouse scenarios, we recommend that you change the tenant-level parameter `default\_table\_store\_format` from its default value `row` to `column`. +2. 
For a cluster upgraded from an earlier version to OceanBase Database V4.3.0 or later, you can use the new columnar storage feature to optimize old rowstore tables in one of the following ways: + +* Create a columnstore index. + +* Advantage: As creating a columnstore index is an online DDL operation, you can create a columnstore index on some columns of a wide table without affecting business. +* Disadvantage: Incremental data is written to both the original and index tables, which increases memory and disk usage. + +* Use the `ALTER TABLE` statement to change the storage format of the original table. + +* Advantage: As incremental data is in the rowstore format, it is written only to the original table. +* Disadvantage: Changing the storage format is an offline DDL operation, during which the table is locked and cannot be updated. + +3. Hybrid rowstore-columnstore tables are suited only to HTAP scenarios. The optimizer determines, based on estimated costs, whether to scan a hybrid rowstore-columnstore table by row or by column. In AP scenarios, we recommend that you use columnstore tables. +4. If you perform a large number of update operations on a columnstore table without performing a major compaction in a timely manner, the query performance will be compromised. Therefore, we recommend that you initiate a major compaction after batch data import to achieve optimal query performance. The major compaction speed of columnstore tables is lower than that of rowstore tables. To initiate a major compaction, execute `alter system major freeze;` in the current tenant. To check whether a major compaction is completed, execute `select STATUS from CDB_OB_MAJOR_COMPACTION where TENANT_ID = Tenant ID;` in the sys tenant. If the `STATUS` value becomes `IDLE`, the major compaction is completed. You can also complete a major compaction by using OceanBase Cloud Platform (OCP). +5. We recommend that you collect statistics once after a major compaction. You can [collect statistics](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001107473) in the following way: + +* Execute the following command to start 16 threads to concurrently collect all table statistics of a tenant: + +```sql + CALL DBMS_STATS.GATHER_SCHEMA_STATS ('db', granularity=>'auto', degree=>16); +``` + +* Check the `GV$OB_OPT_STAT_GATHER_MONITOR` view to observe the collection progress. + +6\. You can use direct load to batch import data to a table. This allows you to achieve the optimal columnstore scan performance of the table without performing a major compaction. The obloader tool and the native `load data` command support full direct load. + +7\. For large tables, hot runs outperform cold runs in most cases. + +8\. In scenarios involving no wide tables, you can achieve comparable performance even if you do not use columnar storage. This is because the row-based storage versions of OceanBase Database adopt a hybrid row-column storage architecture at the microblock level. + +9\. Here are some practical suggestions to further increase the performance of columnstore tables in AP scenarios. + +* If appropriate, use binary instead of utf8mb4 as the character set when creating a table. Here is a sample statement: + +```sql + create table t1(c1 int, c2 int) CHARSET=binary with column group (each column); +``` + +* If the character set must be utf8mb4 or if appropriate, use the utf8mb4\_bin collation when creating a MySQL tenant by specifying, for example, `locality = 'F@z1', collate = utf8mb4_bin`. 
Alternatively, specify utf8mb4\_bin as the character set when creating a table by adding `CHARSET = utf8mb4 collate=utf8mb4_bin` to the `CREATE TABLE` statement. +* Recommended columnar storage configurations for PoC testing: + + ```sql + -- Use the utf8mb4_bin collation. + set global collation_connection = utf8mb4_bin; + set global collation_server = utf8mb4_bin; + + set global ob_query_timeout=10000000000; + set global ob_trx_timeout=100000000000; + alter system set_tp tp_no = 2100, error_code = 4001, frequency = 1; + alter system set _trace_control_info='' + alter system set _rowsets_enabled=true; + alter system set _bloom_filter_enabled=1; + alter system set _px_message_compression=1; + set global _nlj_batching_enabled=true; + set global ob_sql_work_area_percentage=70; + set global max_allowed_packet=67108864; + set global parallel_servers_target=1000; -- We recommend that you set the value of this parameter to 10 times the number of CPU cores. + set global parallel_degree_policy = auto; + set global parallel_min_scan_time_threshold = 10; + set global parallel_degree_limit = 0; + + alter system set _pushdown_storage_level = 4; + alter system set _enable_skip_index=true; + alter system set _enable_column_store=true; + alter system set compaction_low_thread_score = cpu_count; + alter system set compaction_mid_thread_score = cpu_count; + ``` + +## Vision for the Future + +OceanBase Database V4.3.x will support columnstore replicas to reduce storage overhead from hybrid rowstore-columnstore tables in HTAP scenarios. + +As shown in the following figure, read-only columnstore replicas can be deployed in a separate zone. This deployment mode ensures physical resource isolation between TP and AP workloads and enables independent major compactions between columnstore and rowstore tables, making it ideal for HTAP scenarios with highly concurrent reads and writes. + +![7](/img/blogs/tech/ticket-olap/image/8.png) + +For most users, learning about the preceding content is all that is needed to effectively use the columnar storage feature of OceanBase Database. + +I had planned to cover the complex technical principles behind the columnar storage feature. Given the fact that I worked on only one optimization called decimal int, I had to abandon this unrealistic idea. + +Special thanks to my colleagues Xiaochu and Hanhui. Without their help, I would not have completed this first AP article on columnar storage. + +## References + +* [Columnar storage](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001230786) +* [Create an index](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001230704) +* [Modify a table](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001230726) \ No newline at end of file diff --git a/docs/blogs/users/Beike-Dict-service.md b/docs/blogs/users/Beike-Dict-service.md index 2c614c418..7552324b6 100644 --- a/docs/blogs/users/Beike-Dict-service.md +++ b/docs/blogs/users/Beike-Dict-service.md @@ -138,31 +138,11 @@ The following table shows the test results. First, let's compare the batch read throughput (unit: row/s). 
- - -Stress - -HBase - -OceanBase Database - -Level I - -83109.45 - -158579.1 - -Level II - -84355.54 - -264192.8 - -Level III - -76857.87 - -329107.3 +| Stress | HBase | OceanBase Database | +| ---------- | ----------- | ----------- | +| Level I | 83109.45 | 158579.1 | +| Level II | 84355.54 | 264192.8 | +| Level III | 76857.87 | 329107.3 | ![1686714194](/img/blogs/users/Beike-Dict-service/image/1686714194863.png) @@ -170,31 +150,11 @@ As mentioned above, the batch size was set to 100 for HBase and 500 for OceanBas **Now, let's compare the batch write throughput (unit: row/s)**. - - -Stress - -HBase - -OceanBase Database - -Level I - -43256.6 - -249612.5 - -Level II - -64339.58 - -326436.7 - -Level III - -77805.46 - -358716.2 +| Stress | HBase | OceanBase Database | +| ---------- | ----------- | ----------- | +| Level I | 43256.6 | 249612.5 | +| Level II | 64339.58 | 326436.7 | +| Level III | 77805.46 | 358716.2 | ![1686714222](/img/blogs/users/Beike-Dict-service/image/1686714221976.png) @@ -204,31 +164,11 @@ To ensure the uniqueness of keys, HBase uses the checkAndPut method to write one **Now, let's look at the average time, in milliseconds, that each database system took to finish a complete processing cycle**. - - -Stress - -HBase - -OceanBase Database - -Level I - -657.52 - -307.45 - -Level II - -1000.85 - -386.42 - -Level III - -1279.63 - -474.59 +| Stress | HBase | OceanBase Database | +| ---------- | ----------- | ----------- | +| Level I | 657.52 | 307.45 | +| Level II | 1000.85 | 386.42 | +| Level III | 1279.63 | 474.59 | ![1686714234](/img/blogs/users/Beike-Dict-service/image/1686714234502.png) @@ -244,31 +184,11 @@ The comparison indicates that: **At last, let's compare the average throughput (unit: row/s)**. - - -Stress - -HBase - -OceanBase Database - -Level I - -25033.94 - -57429.03 - -Level II - -33161.58 - -91582.48 - -Level III - -35500.47 - -112002.3 +| Stress | HBase | OceanBase Database | +| ---------- | ----------- | ----------- | +| Level I | 25033.94 | 57429.03 | +| Level II | 33161.58 | 91582.48 | +| Level III | 35500.47 | 112002.3 | ![1686714264](/img/blogs/users/Beike-Dict-service/image/1686714264396.png) diff --git a/docs/blogs/users/E-mind.md b/docs/blogs/users/E-mind.md index ef3341365..5bcacbac9 100644 --- a/docs/blogs/users/E-mind.md +++ b/docs/blogs/users/E-mind.md @@ -71,37 +71,15 @@ Running through its documentation, we had no problem with its high availability We deployed OceanBase Database in a test environment to test its functionality and compatibility. With the help of its official documentation, we deployed the test environment on a local virtual machine of average specifications by simply executing a few lines of commands. -Item - -Description - -OS - -CentOS Linux 7.6 - -CPU - -4 cores - -Memory - -8 GB - -Disk type - -SSD - -Disk size - -100 GB - -File system - -XFS - -All-in-one package - -V4.1.0 or later +| Item | Description | +| ---------- | ----------- | +| OS | CentOS Linux 7.6 | +| CPU | 4 cores | +| Memory | 8 GB | +| Disk type | SSD | +| Disk size | 100 GB | +| File system | XFS | +| All-in-one package | V4.1.0 or later | Then, we imported the test data and launched the reporting system to execute a test task. To our surprise, OceanBase Database generated a perfect report before we adapted it to our specific business requirements. This indicated that OceanBase Database was fully compatible with our BI system and the thousands of reports in it. 
We were so excited that we finally found a feasible solution that supported our complex statistical statements. diff --git a/docs/blogs/users/Hybrid-Storage-Deploy.md b/docs/blogs/users/Hybrid-Storage-Deploy.md index 6d104ec5f..c59b2e4c6 100644 --- a/docs/blogs/users/Hybrid-Storage-Deploy.md +++ b/docs/blogs/users/Hybrid-Storage-Deploy.md @@ -42,36 +42,12 @@ Kirin ARM servers × 3, each equipped with ARM-based Kirin CPUs (32 cores × 2), ### **Scenario 1: batch writes** - -``` -8C16GB - -16C32GB - -32C64GB - -64C128GB - -All-flash - -54903 - -85870 - -178158 - -221443 - -Hybrid - -38622 - -61772 - -121918 - -164340 -``` +| | All-flash | Hybrid | +| ---------- | ----------- | ----------- | +| 8C16GB | 54903 | 38622 | +| 16C32GB | 85870 | 61772 | +| 32C64GB | 178158 | 121918 | +| 64C128GB | 221443 | 164340 | ![1732082836](/img/blogs/users/Hybrid-Storage-Deploy/image/7c6096f8-9432-4cbe-9882-9abd79648977.png) @@ -79,36 +55,12 @@ Conclusion: In the batch write scenario, performance increases linearly with spe ### **Scenario 2: regular writes** - -``` -8C16GB - -16C32GB - -32C64GB - -64C128GB - -All-flash - -24490 - -48079 - -87440 - -91702 - -Hybrid - -20513 - -40768 - -56511 - -56352 -``` +| | All-flash | Hybrid | +| ---------- | ----------- | ----------- | +| 8C16GB | 24490 | 20513 | +| 16C32GB | 48079 | 40768 | +| 32C64GB | 87440 | 56511 | +| 64C128GB | 91702 | 56352 | ![1732082862](/img/blogs/users/Hybrid-Storage-Deploy/image/3692d7da-604c-4961-9a6d-4b1f80798e7a.png) @@ -116,36 +68,12 @@ Conclusion: In the regular write scenario, performance increases linearly with s ### **Scenario 3: regular read-only** - -``` -8C16GB - -16C32GB - -32C64GB - -64C128GB - -All-flash - -57283 - -106498 - -173308 - -198595 - -Hybrid - -53010 - -96702 - -166397 - -188010 -``` +| | All-flash | Hybrid | +| ---------- | ----------- | ----------- | +| 8C16GB | 57283 | 53010 | +| 16C32GB | 106498 | 96702 | +| 32C64GB | 173308 | 166397 | +| 64C128GB | 198595 | 188010 | ![1732082878](/img/blogs/users/Hybrid-Storage-Deploy/image/0f461c7e-0c8a-46d1-82f6-e82d31df8aa6.png) @@ -153,56 +81,13 @@ Conclusion: In the regular read-only scenario, performance increases linearly wi ### **Scenario 4: regular reads/writes** - -``` -8C16GB - -16C32GB - -32C64GB - -64C128GB - -All-flash (reads) - -36029 - -69911 - -131861 - -148802 - -All-flash (writes) - -1801 - -3495 - -6593 - -7440 - -Hybrid (reads) - -34102 - -61301 - -111449 +| | All-flash (reads) | All-flash (writes) | Hybrid (reads) | Hybrid (writes) | +| ---------- | ----------- | ----------- | ----------- | ----------- | +| 8C16GB | 36029 | 1801 | 34102 | 1705 | +| 16C32GB | 69911 | 3495 | 61301 | 3065 | +| 32C64GB | 131861 | 6593 | 111449 | 5572 | +| 64C128GB | 148802 | 7440 | 125255 | 6262 | -125255 - -Hybrid (writes) - -1705 - -3065 - -5572 - -6262 -``` ![1732082891](/img/blogs/users/Hybrid-Storage-Deploy/image/156ae12a-7fd7-4439-bde1-c7520bdbe92a.png) @@ -210,36 +95,12 @@ Conclusion: In the regular read/write scenario, performance increases linearly w ### **Scenario 5: TPC-C benchmark on TP performance** - -``` -8C16GB - -16C32GB - -32C64GB - -64C128GB - -All-flash - -35127 - -82656 - -137854 - -158995 - -Hybrid - -28042 - -67327 - -118766 - -157774 -``` +| | All-flash | Hybrid | +| ---------- | ----------- | ----------- | +| 8C16GB | 35127 | 28042 | +| 16C32GB | 82656 | 67327 | +| 32C64GB | 137854 | 118766 | +| 64C128GB | 158995 | 157774 | ![1732082904](/img/blogs/users/Hybrid-Storage-Deploy/image/298265e0-0b5f-4bbe-9741-3cf39bc8129c.png) diff --git 
a/docs/blogs/users/Loong-Airlines.md b/docs/blogs/users/Loong-Airlines.md new file mode 100644 index 000000000..2c021ac70 --- /dev/null +++ b/docs/blogs/users/Loong-Airlines.md @@ -0,0 +1,94 @@ +--- +slug: Loong-Airlines +title: 'HTAP Practice of Loong Airlines: Unified Technology Stack for Efficient Online Services and A Lightweight Real-Time Data Warehouse' +tags: + - User Case +--- + +> **About the author:** Lu Qiuxiao, System Operations Engineer at Loong Airlines + +## Database HTAP Capabilities Required for the Aviation Business + +Zhejiang Loong Airlines Co., Ltd., the only airline headquartered in Zhejiang Province, China, provides public passenger and cargo services. It holds all domestic and international air transport licenses, and has developed into a medium-large sized airline ranking among the largest private airlines in China. Since its founding in 2011, it has operated up to 600 domestic and international passenger and cargo routes, covering the entire Chinese mainland and reaching over 170 cities in regions along the "Belt and Road" routes, such as Hong Kong, Macau, Japan, South Korea, Southeast Asia, and Central Asia. + +Efficient data management and a reliable database system are key factors for successful modern aviation operations. Loong Airlines needs to process a significant amount of data, including flight information, ticket sales data, customer information, and seat allocation data. Its business not only has high demands for online transaction processing (OLTP) but also requires advanced online analytical processing (OLAP) of T+N, T+1, and even T+0 data, emphasizing real-time data handling and reliable data analysis. Our original database, based on a master-slave mode, posed risks to business continuity in case of faults and could not process or analyze data in real time. This drove us to seek new database solutions. + +## Research on the HTAP Capabilities of OceanBase Database + +Our market research results indicated that OceanBase Database offers excellent performance and reliability for both OLTP and OLAP without the need to build two systems. So, we delved into key factors such as scalability, performance, data security, and reliability based on our business system requirements. + +**1. Distributed architecture and scalability** + +Our extensive aviation business requires the database system to handle massive amounts of data and concurrent requests. OceanBase Database can be deployed in a distributed architecture, which stores data across multiple nodes. We can easily add more nodes to cope with growing data volume. + +![1704793000](/img/blogs/users/Loong-Airlines/image/1704793000832.png) + +**2. High performance and complex queries** + +We often need to retrieve and analyze data based on multiple conditions and metrics to provide accurate flight information and sales data. The SQL layer of OceanBase Database efficiently handles complex queries and returns results in a short time, thanks to its SQL optimizer and execution engine. + +The SQL optimizer of OceanBase Database rewrites SQL queries based on rules and cost models, generating and selecting optimal query rewrite plans. It also optimizes various plans in distributed processing scenarios. + +![1704793014](/img/blogs/users/Loong-Airlines/image/1704793014642.png) + +The SQL execution engine of OceanBase Database supports parallel execution and vectorized computing. Using the parallel execution framework, it adaptively handles both parallel execution on a standalone server and distributed parallel execution. 
While serial execution suffices for small business, OceanBase Database supports parallel execution on a standalone server when a large amount of data is involved. Many open source standalone databases lack this capability. However, with sufficient CPU resources, the processing time of an SQL query can be linearly reduced by parallel execution in OceanBase Database. For distributed execution plans in the same form, OceanBase Database can execute them in parallel on multiple servers to process larger amounts of data. It breaks the performance bottleneck limited by the number of CPU cores of a single server, allowing us to scale up to hundreds or even thousands of CPU cores. + +![1704793030](/img/blogs/users/Loong-Airlines/image/1704793030493.png) + +Furthermore, OceanBase Database handles both OLAP and OLTP requests within a single cluster, and resource isolation therefore is crucial. OceanBase Database provides various resource isolation methods, such as physical isolation of multiple zones and isolation of database connections based on CPU resource groups. It also automatically identifies and isolates slow queries to prevent them from affecting the overall transaction response time. + +![1704793067](/img/blogs/users/Loong-Airlines/image/1704793067111.png) + +**3. Data security and reliability** + +Airlines have stringent requirements for data security and reliability, and must ensure data safety and system reliability to avoid downtime. OceanBase Database ensures data integrity and reliability through multi-layered security measures, such as data backup, fault recovery, and fault tolerance mechanisms. + +To better support the storage, queries, and modifications of our ticket data, we decided to build the core database system of our new business based on OceanBase Database. + +## Aviation Business Efficiency Improved by HTAP Capabilities + +### 1. Benefits: real-time, smooth, reliable, and cost-effective data processing + +a. Unified technology stack with enhanced real-time analysis capabilities + +As shown in the following figure, OceanBase Database is everywhere in the system, serving multiple roles within the ticketing architecture and providing robust data management and analysis capabilities. + +![1704793089](/img/blogs/users/Loong-Airlines/image/1704793089912.png) + +The data collection layer collects data from multiple sources in real time and aggregates the data in OceanBase Database. The preprocessing layer cleanses business data, while the operational data store (ODS) layer performs data modeling and stores the results in OceanBase Database. Finally, the service and publishing layer provides AP and TP results from OceanBase Database to various business applications through APIs for cross-system data calls. + +The high-performance query engine and distributed computing framework of OceanBase Database provide exceptional data processing and analysis capabilities, allowing us to swiftly conduct large-scale data analysis and respond to complex queries in real time. As AP and TP requests are handled in a single database system to produce accurate outcomes in real time, we can make more informed strategic decisions. From passenger behavior analysis to flight scheduling and resource management, OceanBase Database has streamlined our data processing workflows, making them more efficient and stable. + +b. Stable and reliable operations with zero downtime + +Since the launch of OceanBase Database, our system has been running with zero downtime. 
OceanBase Database not only ensures system stability and reliability but also facilitates seamless business operations by enabling smooth integration between different business units, allowing for uninterrupted data flow. This has significantly reduced potential downtime and business risks. + +c. 70% reduction in storage costs + +Thanks to the outstanding data processing and compression capabilities of OceanBase Database, we achieved a 70% reduction in storage costs after migrating business from MySQL to OceanBase Database. + +In addition to lower storage costs, OceanBase Database offers other benefits, such as high scalability of its distributed architecture, data security, and simplified O&M due to its hybrid transaction/analytical processing (HTAP) capabilities. It provides us with a reliable data management solution, thus enhancing our operational efficiency and competitive edge. + +### 2. Lessons learned: disk IOPS optimization + +We deployed and comprehensively tested OceanBase Database based on the official documentation and, with satisfactory results, introduced it into our production environment. However, lacking familiarity with the database product, we began experiencing a regular spike in disk IOPS at 22:00 every day after a few months of operation, when a considerable amount of data had been accumulated. + +![1704793121](/img/blogs/users/Loong-Airlines/image/1704793121478.png) + +When troubleshooting with OceanBase Technical Support, we identified several issues with our cluster deployment. + +* We stored the /redo, /log, and /data directories on the same disk, rather than physically isolating them. The /redo directory in particular has high I/O performance requirements. +* Instead of SSDs, we installed HDDs, which provided unsatisfactory disk I/O capabilities. OceanBase Technical Support recommended SSDs for production environments because HDDs might result in performance bottlenecks during a major compaction in the backend. +* We set the primary\_zone parameter of tenants to zone1, so that all leader replicas were stored on the same server, which became a read/write hotspot. + +With the help of OceanBase Technical Support, we physically separated the /redo, /log, and /data directories by migrating them to different disks. This avoids contention of disk I/O resources and ensures optimal I/O performance for each directory. + +We also optimized parameter settings, especially the connection timeout, single-transaction time, and log level parameters, and managed accounts of our self-built operating system in OceanBase Cloud Platform (OCP). + +The optimization measures brought a significant improvement in system performance with stable and efficient operations. + +## Vision for the Future + +As our aviation business continues to grow and data volume increases, we will further expand the application scope of OceanBase Database. We will leverage its high scalability to enhance the system capacity and performance by adding more nodes as needed. To access the latest features and optimizations, we will also keep a close eye on the technological updates of OceanBase Database, and will collaborate with the OceanBase community. + +We believe that OceanBase Database, as an efficient and reliable distributed database product developed by a Chinese company, its successful application in our aviation ticketing system has set an example for companies in other industries. 
By optimizing our business data management and analysis capabilities based on OceanBase Database, we can further reduce O&M and storage costs while boosting business processing efficiency. \ No newline at end of file diff --git a/docs/blogs/users/NetEase-Games.md b/docs/blogs/users/NetEase-Games.md new file mode 100644 index 000000000..774bb2e82 --- /dev/null +++ b/docs/blogs/users/NetEase-Games.md @@ -0,0 +1,215 @@ +--- +slug: NetEase-Games +title: 'NetEase Games: Why We Chose OceanBase Database?' +tags: + - User Case +--- + +As one of **China's leading** game development companies, NetEase Games invests heavily in the R&D of online games in order to keep ahead of the curve. Our company has many game products and derivatives, requiring different data processing products to meet diversified business requirements. Our database team mainly serves the internal departments and provides comprehensive database services in private cloud environments. + +NetEase Games boasts a mature database product matrix, but every product has its own pros and cons, and the existing database products are not flexible enough to cover all internal business scenarios. To solve this problem, we finally decided to introduce OceanBase Database. + +## Business Architecture and Requirements + + +The figure below shows the MySQL database architecture used by a business platform of NetEase Games. As more business data and requests need to be processed on the platform, the original MySQL database architecture gradually evolves into one that consists of a primary cluster and a dozen standby clusters, in which the standby clusters are created to process read requests. + +![1711436205](/img/blogs/users/NetEase-Games/image/1711436205907.png) + + + +However, the following paint points are becoming obvious during our use of this database architecture. + +* **High concurrency and sensitive to latency:** During peak hours, the queries per second (QPS) in the primary cluster can reach 100,000, and the number of read requests of a single standby cluster can reach 10,000 or more, with a total of millions of QPS in all standby clusters. The overall concurrency is high, but our business is latency-sensitive, which means that performance jitters are intolerable. +* **Heavy storage pressure on the primary cluster:** The storage space of a single node has exceeded 10 TB, which exerts heavy pressure on the MySQL database architecture in business scenarios involving high-concurrency transaction processing (TP). +* **Poor real-time performance of standby clusters:** The standby clusters demand high real-time performance. Latency caused by slow queries or other issues in the standby clusters severely affects our business. +* **Difficult O&M:** Traffic surges bring big challenges to database O&M. The MySQL database architecture, even though supplied with resources of the highest specifications, can deal with such a situation only by adding instances. What's worse, instance recreation will be required if a read-only standby cluster fails. Due to the huge amount of data stored in MySQL instances, both the storage space scale-outs of standby clusters and backup and restore are time-consuming, which is unacceptable for business. + +To solve this pain point, we are crying out for a distributed database that supports smooth horizontal scale-outs and ensures stable performance. Besides, the MySQL database architecture requires large storage space for data archiving and involves high QPS during the archiving. 
In consideration of this, we want the distributed database to be able to process large queries while scaling out and ensure that business modules in the same cluster do not affect each other. Of course, it would be better if the distributed database is also cost-effective. + +We analyze our requirements for a distributed database and sort them in the following priorities from high to low: + +* **Stable performance:** Our business is sensitive to query latency, which requires no jitters or slow queries with the database. +* **High concurrency:** The database can process highly concurrent requests in peak hours, during which nearly 100,000 QPS are generated in the primary cluster, and millions of read requests are generated in standby clusters. +* **Smooth scale-outs:** The storage space of a standalone server needs to be enough to support continuous data growth. +* **Low latency:** The database must allow data synchronization to the primary cluster with a second-level latency. +* **Low costs:** Due to a large amount of business data, we hope that the database operational costs can be minimized. + +## Why We Chose OceanBase Database + + +After investigation, we find that OceanBase Database can help us solve the preceding business pain points, and it has the following advantages: + +* First, stability. Database stability is critical to an online payment system in the gaming industry. A distributed three-replica OceanBase cluster supports automatic failovers and ensures a recovery point objective (RPO) of 0 and a recovery time objective (RTO) of 8s when a minority of OBServer nodes are down. Serving Alipay for years, the high stability of OceanBase Database has been testified by core financial transaction systems in ultra-large scenarios. + +* Second, transparency and scalability. An OceanBase database has high scalability and supports smooth online scaling. After a scale-out, the database automatically performs load balancing, which is transparent to applications without the need for changing business code and does not affect system continuity. This perfectly satisfies the horizontal storage scale-out requirement of NetEase Games. + +![1711436259](/img/blogs/users/NetEase-Games/image/1711436259038.png) + +* Third, real-time data synchronization. Our business requires high real-time performance of data. Therefore, we spent one month testing the stress and batch processing capability of OceanBase Database in the early stage after introducing it. After we use OceanBase Migration Service (OMS), a migration tool provided by OceanBase, no obvious latency occurs during data synchronization, ensuring real-time queries of business data. + +* Fourth, tenant-level resource isolation. OceanBase Database supports the multitenancy architecture, where each tenant can be seen as an instance in the original MySQL database architecture. Within an OceanBase cluster, multiple tenants can be created to serve different business modules. To ensure business stability, OceanBase Database isolates resources, such as CPU, memory, and IOPS, between tenants to avoid interference among business modules and ensure quick payment. + +* Last one, low storage costs with a high data compression ratio. OceanBase Database adopts an advanced log-structured merge-tree (LSM-tree)-based storage engine that is developed in-house. This engine automatically encodes and compresses microblocks when they are stored in disks. 
Unlike the B+ tree used by the original MySQL database architecture, the OceanBase storage engine reduces the storage costs by 70%–90% and improves the query efficiency by storing more data in each microblock. + +Regardless of the three-replica architecture, OceanBase Database still helps us significantly lower the storage costs while ensuring high performance after data migration. + +In addition to the preceding advantages, we also find the following strengths of OceanBase Database: + +* **Compatibility with MySQL.** This ensures smooth business data migration without the need to change business code or invest too much manpower in adaptation tests. +* **Hybrid transactional and analytical processing (HTAP).** With this capability, we can try using a single OceanBase cluster to replace the original two systems for business modules involving both TP and AP. + +## Tests on OceanBase Database in the Early Stage + + +Before putting a new database into formal use, we need to perform strict benchmark tests, business tests, and grayscale tests on the database to ensure it is stable, reliable, and adapted to our business. In this stage, we compared OceanBase Database with MySQL from various aspects, including the architecture, high availability (HA), consistency, compatibility, storage cost, and performance. The test details are as follows: + +### 1. Test environment + +![1711436386](/img/blogs/users/NetEase-Games/image/1711436386200.png) + +![1711436404](/img/blogs/users/NetEase-Games/image/1711436404918.png) + +![1711436412](/img/blogs/users/NetEase-Games/image/1711436412103.png) + + + +### 2. Test tool + +We used sysbench to perform tests in read-write hybrid (read-write ratio: 8:2) scenarios, read-only scenarios, and write-only scenarios. + +### 3. Test results + +* Online transaction processing (OLTP) capability: In a small-scale data scenario where only 10 tables need to be processed and each table stores 20 million data records, OceanBase Database V4.0 is on a par with MySQL, and OceanBase Database V4.1 outperforms a standalone MySQL database. However, in a large-scale data scenario where more than 100 million data records need to be processed, the OLTP capability of a scaled-out OceanBase database is far beyond that of a standalone MySQL database. + +![1711436426](/img/blogs/users/NetEase-Games/image/1711436426618.png) + +* Online analytical processing (OLAP) capability: When aggregate analysis and joined queries are involved for multiple large tables (storing more than 100 million data records), the overall performance of OceanBase Database V4.x is stable without serious jitters, and the time it spends on queries is much less than that MySQL spends. Actually, it's of little significance to compare the OLAP capability of the two databases since MySQL does not apply to AP-related business. +* Data compression for storage: We exported 5 TB of data from the upstream MySQL cluster to the OceanBase database, and the total size of the generated replicas is only 2.1 TB, with 700 GB of each replica. The compression ratio for a single replica is almost 86%. Due to space fragments in the upstream MySQL cluster, the calculation result may have a slight deviation. + +**Note:** We performed this test only to compare the data compression capability, without considering other costs such as CPU overhead. The total costs may vary according to business scenarios. + + +### 1. 
Tenant-level resource isolation test

We performed a performance stress test on two tenants to check whether the response to a tenant's requests is affected when the other tenant uses up all resources. The test process is as follows:

#### Step 1: Create two tenants.

(1) CPU: 2 cores; memory: 2 GB

```sql
create resource unit test_unit max_cpu 2, max_memory '2G', max_iops 128, max_disk_size '2G', max_session_num 128, MIN_CPU=2, MIN_MEMORY='2G', MIN_IOPS=128;
create resource pool test_pool unit = 'test_unit', unit_num = 1, zone_list=('zone1','zone2','zone3');
create tenant test_tenant resource_pool_list=('test_pool'), charset=utf8mb4, replica_num=3, zone_list('zone1', 'zone2', 'zone3'), primary_zone=RANDOM, locality='F@zone1,F@zone2,F@zone3';
```

(2) CPU: 4 cores; memory: 4 GB

```sql
create resource unit test_unit2 max_cpu 4, max_memory '4G', max_iops 1280, max_disk_size 53687091200, max_session_num 128, MIN_CPU=4, MIN_MEMORY='4G', MIN_IOPS=1280;
create resource pool sysbench_pool unit = 'test_unit2', unit_num = 1, zone_list=('zone1','zone2','zone3');
create tenant sysbench_tenant resource_pool_list=('sysbench_pool'), charset=utf8mb4, replica_num=3, zone_list('zone1', 'zone2', 'zone3'), primary_zone=RANDOM, locality='F@zone1,F@zone2,F@zone3';
```

In Tenant 2, perform the stress test on 2,048 threads concurrently.

![1711436493](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-03/1711436493542.png)

Check the response to requests from Tenant 1 before and during the stress test on Tenant 2, as well as the CPU usage of Tenant 2 during the test.

The following figure shows the performance of Tenant 1 before the stress test on Tenant 2.

![1711436505](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-03/1711436505287.png)

The following figure shows the performance of Tenant 1 during the stress test on Tenant 2.

![1711436518](https://obcommunityprod.oss-cn-shanghai.aliyuncs.com/prod/blog/2024-03/1711436518050.png)

#### Step 2: Check the resource usage of Tenant 2.

* CPU:

![1711436542](/img/blogs/users/NetEase-Games/image/1711436542465.png)

* Memory:

![1711436553](/img/blogs/users/NetEase-Games/image/1711436553061.png)

The test results are as follows:

* The resource usage stability meets the expectation. During the high-concurrency stress test on multiple threads, Tenant 2 uses CPU and memory resources stably, without exceeding the upper limits.
* No reciprocal impact is found between the two tenants, and the expected isolation effect is achieved. During the stress test on Tenant 2, the number of requests from Tenant 1 slightly decreases, and its QPS drops by about 20%, probably because I/O resource usage was not thoroughly limited, resulting in resource contention. It is worth noting that OceanBase Database did not support I/O resource limits until V4.x, which means that this feature is not available in V3.x. Overall, the two tenants have no significant impact on each other in terms of CPU and memory usage.

To sum up, the test results prove the superiority of OceanBase Database in terms of performance, stability, resource isolation, and cost reduction. So, we finally decided to use it in actual business scenarios.

## Benefits Brought by OceanBase Database

OceanBase Database helps us build a new business architecture (as shown in the following figure), which effectively solves our business pain points.
Currently, OMS synchronizes all data in the primary MySQL cluster to an OceanBase cluster. This helps the upstream MySQL cluster regularly clear redundant data based on its business logic and release more storage space. OMS also allows us to set the related parameter for ignoring the DML and DDL operations in MySQL data cleanup. By doing this, a complete data replica is always available in the OceanBase cluster for business queries. + +![1711436608](/img/blogs/users/NetEase-Games/image/1711436608218.png) + +The benefits of this new architecture are as follows: + +* **Stable business queries:** OceanBase Database shares 15% QPS of the original read-only MySQL standby cluster and delivers stable performance with few jitters. +* **Flexible scaling under high concurrency stress:** The high scalability of OceanBase Database can easily handle a large number of concurrent requests. By migrating a part of QPS in the original read-only MySQL standby cluster to an OceanBase cluster, the high concurrency stress and risk are effectively controlled. +* **High real-time performance of data:** The logical replication feature provided by OMS reads the MySQL binlogs and replays the replica in the OceanBase cluster, minimizing the primary/standby switchover latency. OceanBase Database generates monitoring reports if a data synchronization latency or slow query occurs. +* **Lower storage costs for a single standby cluster:** Compared with the single-replica storage in MySQL, the OceanBase Database storage solution lowers the cost by over 80% and compresses data in the upstream MySQL cluster to about 30% of its original size for storage and archiving, greatly reducing the storage pressure. We also take control of the risks caused by excessive data by using OMS to migrate data to an OceanBase cluster and clearing business data in MySQL on a regular basis. +* **Easier O&M:** In the event of traffic surges, OceanBase Database can dynamically adjust the available resources of tenants and clusters. OceanBase Database also provides a GUI-based SQL throttling feature for you to deal with traffic surges or slow SQL queries. What's more, both the Paxos-based HA mode in failure scenarios and horizontal scaling are almost imperceptible to applications. By compressing backup data to about 30% of its original size, OceanBase Database restores less data within a shorter time than MySQL when replacing a failed node, improving the backup and restore efficiency by three times. + +## Best Practices + + +The use and operations of a native distributed database are different from those of a distributed database middleware product. We have summarized some best practices during the use of OceanBase Database. I hope these practices are useful for enterprises with the same pain points as NetEase Games. + +### Best practice 1: Optimize the data synchronization performance + +We encountered a series of challenges in the early stage of using OMS to migrate data from MySQL to OceanBase Database. The biggest challenge is the high latency in incremental migration, resulting in poor migration performance. By querying logs, we find that the REPLACE INTO statement is executed to write data to OceanBase Database, making data migration slow. We run the SQL diagnostics tool and identify an issue, that is, the OB\_GAIS\_PUSH\_AUTO\_INC\_RE function increases the time that the system spends processing remote procedure call (RPC) requests for auto-increment columns. + +Then, we consult the OceanBase Database team about this issue. 
The team tells us that the migrated table is a partitioned table in OceanBase Database and that the auto-increment column was created with the ORDER attribute. In this case, the auto-increment column is fully scanned for synchronization across all partitions when OceanBase Database executes the REPLACE INTO and UPDATE statements, which increases the RPC cost. Data synchronization performance is greatly affected, especially when the data volume is large.

**🧰 Solutions:**

* On the premise that business requirements are still met, remove the ORDER attribute from the auto-increment column to avoid a global scan of the column upon an attribute update. We finally adopted this solution, and the upstream MySQL cluster continues to ensure ID uniqueness.
* Change the attribute of the auto-increment column from ORDER to NOORDER and leave the values of the auto-increment column unspecified in the REPLACE INTO statement. In other words, use the auto-increment values generated by autoincrement\_service in this statement.

### Best practice 2: Design reasonable partitioned tables

Considering the scalability of OceanBase Database as a distributed system, we planned to divide a large table that stores billions of rows of data in the MySQL cluster into partitions when migrating it to OceanBase Database. In the early tests, we took into account both the current query performance and future scalability, and designed a table schema with 512 hash partitions based on transaction IDs (a common filter condition for most business queries) for migration, trying to strike a balance between storage and performance.

When it came to the grayscale test, columns other than the transaction ID column were used as the filter conditions for thousands of QPS, which accounted for a small proportion of traffic. These queries do not use the partitioning key, so data in all 512 partitions on all OBServer nodes is scanned upon each request, leading to multiple RPC requests. Consequently, high network latency occurs frequently, and SQL statements are executed slowly.

**🧰 Solutions:**

* Select appropriate columns as the filter conditions for queries that do not use the partitioning key, and create global indexes on them to reduce the amount of data to scan. Note that if many such queries with different filter conditions are sent, the partitioned table may need multiple global indexes, which incurs extra maintenance costs. Therefore, we recommend that you do not create excessive global indexes on a single table.
* Reduce the number of partitions in the table from 512 to about 10. This solution maintains horizontal balance and lowers the RPC latency even if all partitions are scanned. After discussion, we chose this solution.

### Best practice 3: Ensure the atomicity of the transactions from the upstream MySQL cluster to an OceanBase cluster

There was a special business scenario during the test run. In this scenario, the seller sells products to the buyer in batches and hundreds of orders are generated accordingly. The data of these orders, plus various cash settlement records, may cause hundreds of different DML statements to be executed in one transaction. The DML statements are executed in the MySQL cluster, while the business query requests are served by an OceanBase cluster. Sometimes, the results of some SQL statements in a transaction, when read from OceanBase Database, appeared to violate transaction atomicity.
+ +We check OMS logs and find that the "maxRecords is 64, cut it" message is displayed in the incremental synchronization link to indicate that a large transaction is divided into several small transactions by default. This is because when OMS is used to synchronize the upstream binlogs, a default parameter splitThreshold is used to control the number of parsed upstream transaction records. Once the transaction size exceeds the value specified by splitThreshold, the transaction is divided into small pieces for execution. As a result, only statements in a part of these small transactions have been executed when data is read from OceanBase Database. For business modules, only the intermediate state of a small transaction in MySQL is read. + +**🧰 Solution:** + +Set JDBCWriter.sourceFile.splitThreshold in OMS to a greater value, for example, 1024. This can ensure that a transaction is completed once data has been synchronized to OceanBase Database. Note that a larger value of this parameter indicates more resources occupied by OMS. Proceed with caution. + +### Best practice 4: Specify a primary key or unique key to ensure data consistency in synchronization + +Repeated rows of data are found when we query data in an OceanBase table migrated from MySQL using OMS. After troubleshooting, we identify that this partitioned table does not have a primary key or unique key for data consistency checks. As a result, data conflicts may occur if a restart or retry happens during the synchronization with OMS. + +**🧰 Solution:** + +Specify a primary key or unique key for each OceanBase table to ensure data consistency in synchronization. + +## Summary and Vision for the Future + + +Since the introduction of OceanBase Database, its performance has been stable and reliable without jitters or synchronization latency, effectively helping us solve business pain points. + +In the future, we will keep exploring the applications of OceanBase Database and gradually replace MySQL standby clusters with OceanBase clusters. + +Meanwhile, we are striving to incorporate the OceanBase ecosystem into the SaaS DB platform of NetEase Games. We believe that this action will enhance our service capabilities and enable us to provide database support for more products and business modules. \ No newline at end of file diff --git a/docs/blogs/users/Yoka.md b/docs/blogs/users/Yoka.md new file mode 100644 index 000000000..cee892d40 --- /dev/null +++ b/docs/blogs/users/Yoka.md @@ -0,0 +1,159 @@ +--- +slug: Yoka +title: 'Yoka Games: Migrating Business from MySQL to OceanBase Database in Only One Month' +tags: + - User Case +--- + +> Editor's note: As one of the earliest tabletop game developers in the Chinese mainland, Yoka Games started testing OceanBase Database in September 2023 and spent only two months migrating three core business modules to OceanBase Database. Why does Yoka Games discard the universal MySQL solution used in the gaming industry and choose OceanBase Database? In this article, Yu Zhenjia, O&M owner at Yoka Games, shares his practical experience in database replacement. + +> **About the author:** Yu Zhenjia, head of the O&M department in the support center of Hangzhou Yoka Network Technology Co., Ltd. + +Architecture Features and Paint Points of the Gaming Business +----------- + +As one of the earliest tabletop game developers in the Chinese mainland, Yoka Games provides both offline board games and online digital games. 
The core business of our company is to offer derivative games and products of "War of the Three Kingdoms". As time goes by, Yoka Games has also been exploring and developing other phenomenal games in recent years. One typical game is "Monkey King: Arena of Heroes", whose revenue exceeded CNY 200 million in one month after the game was released. + +MySQL is one of the most popular database products used in the gaming industry. However, it is not a distributed database and has poor scalability, hindering the development of the industry. The database cluster architecture of Yoka Games has three features, as shown in Figure 1. + +![1701411042](/img/blogs/users/Yoka/image/1701411042089.png) + +
_Figure 1: Database cluster architecture of Yoka Games_
+ +**Feature 1: two IDCs across three regions, meeting the standards of Multi-Level Protection Scheme (MLPS) Level 3.** As illustrated in Figure 1, it is a typical conventional database architecture in primary/standby mode. Yoka Games deploys an IDC in primary/standby mode in Hangzhou, with an IDC in Shanghai for disaster recovery, and an IDC in Jiangsu for offline data backups. + +**Feature 2: hybrid cloud deployment, with data stored on the local servers.** Yoka Games deploys business on the cloud, but all data is stored in IDCs. Connections are established between the IDCs and Alibaba Cloud through an enterprise leased line. This method is widely adopted by enterprises to migrate their business to the cloud. Most of them think data is private, and they must take control of it. + +**Feature 3: at least one database cluster created for each project.** Yoka Games has a lot of game projects, and each of them requires at least one MySQL cluster regardless of the data volume involved. As a result, the database cluster architecture is complicated. + +#### Due to the preceding architecture features, the following pain points arise accordingly: + +**Pain point 1: poor usability of MySQL Master High Availability (MHA) Manager, making automatic business switchover difficult.** Here's an example. The primary MySQL cluster was down in an accident. Due to a 1–2s latency, the standby cluster failed to change to the primary one. The attribute data of game roles was written to databases in real time, and a large amount of data was concurrently written every second. Therefore, even though the primary/standby switchover had been successful, data loss still occurred. + +**Pain point 2: difficulty in scale-outs and high maintenance costs.** Figure 2 shows the growth trend of average log space occupied per month by "War of the Three Kingdoms" on mobile devices since the log system was put into use in 2015. We can see from the figure that the log space of the game has been multiplied over the years. MySQL can rely on only migration or database/table sharding to increase the database capacity, resulting in higher maintenance costs. Especially when ads are placed in games, a huge amount of data needs to be processed. If we invest substantial manpower in sharding and subsequent maintenance, the ad performance will definitely be affected. + +![1701411092](/img/blogs/users/Yoka/image/1701411092754.png) + +
_Figure 2: Log space occupied by the mobile game "War of the Three Kingdoms"_
**Pain point 3: uneven resource utilization.** Only a few games can be best-sellers, and the rest are average. Those mediocre games occupy excessive CPU and memory resources on the servers, leaving insufficient CPU and memory resources for the best-selling games.

**Pain point 4: difficulty in data migration.** Game data needs to be migrated frequently, but the mysqldump client utility provided by MySQL cannot display the migration progress or speed, and its performance is poor. On this account, we had to look for substitutes.

The preceding pain points have been haunting our company for a long time. After investigations, we decided to choose OceanBase Database, a native distributed database. Next, I'll elaborate on the architecture transformation and business benefits brought by OceanBase Database.

**Architecture Transformation Brought by OceanBase Database**
--------------------

### Deployment: configure memory and disk space settings

Currently, we use OceanBase Database only for several business modules and have created a primary cluster that consists of three servers, each with 48 CPU cores, 256 GB of memory, and 80 TB of disk space. Since we spent only one month testing and deploying OceanBase Database, some issues occurred in this process.

**Issue 1: maximum memory settings for an OceanBase database**

The [maximum memory for an OceanBase database](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001103404) is defined by related parameters, such as [memory\_limit\_percentage](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001105533) and [memory\_limit](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001105475). However, beginners of OceanBase Database may not modify the default settings of these parameters. Like most beginners, we retained the default value (80%) of memory\_limit\_percentage when using OceanBase Database V4.2.0 for the first time, leaving 20% of the OBServer memory unusable.

In addition, the [system\_memory](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001105522) parameter needs to be set to reserve memory for the virtual SYS500 tenant. If the parameter value is not specified, the system automatically adjusts the memory usage strategy based on the current memory usage. The SYS500 tenant is a special virtual tenant. In OceanBase Database, the memory for the SYS500 tenant covers the memory shared by physical tenants and the memory consumed by virtual tenants. When configuring the deployment environment for the first time, we found in OceanBase Cloud Platform (OCP) that the remaining memory allocated to user tenants was only 180 GB (approximately 256 GB × 80% (memory\_limit\_percentage) – 30 GB (system\_memory)) after the servers with 256 GB of memory were put online. We finally resolved the issue by modifying the values of the corresponding parameters: memory\_limit\_percentage and system\_memory.

**Issue 2: disk space settings**

If clogs and data are stored on the same disk, OceanBase Database reserves 40% of the disk space for clogs by default. Consequently, the percentage of disk space available for data files is only 60%. This percentage can be adjusted by changing the value of [datafile\_disk\_percentage](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001105552). After learning that, we stored data files and clogs on different disks and adjusted the percentage of disk space reserved for data files.
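For reference, all three parameters mentioned above are cluster-level parameters that can be adjusted from the sys tenant. The following is a minimal sketch rather than our production configuration, and the values are purely illustrative:

```sql
-- Connect to the sys tenant; the values below are illustrative only.
ALTER SYSTEM SET memory_limit_percentage = 90;    -- share of server memory usable by the observer process
ALTER SYSTEM SET system_memory = '30G';           -- memory reserved for the virtual SYS500 tenant
ALTER SYSTEM SET datafile_disk_percentage = 90;   -- share of the data disk reserved for data files

-- Verify the effective values
SHOW PARAMETERS LIKE 'memory_limit_percentage';
SHOW PARAMETERS LIKE 'datafile_disk_percentage';
```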
For beginners of OceanBase Database, the preceding two issues may be overlooked. Based on our experience, we recommend following the suggestions on the official OceanBase Database website when configuring the memory and disk space settings. Mainstream OceanBase Database servers are configured with 384 GB or 512 GB of memory. On the official website, the suggested percentage of memory allocated to the database is 80% for servers with 384 GB of memory and 90% for those with 512 GB. It is also recommended that enterprises store data files and clogs on different disks, in which case the percentage of disk space available for data files is 90% by default. For more information, refer to the [relevant document](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001106069) on the official website.

The business architecture of Yoka Games has not changed much since we replaced MySQL with OceanBase Database, as shown in Figure 3. The difference is that the primary/standby nodes for disaster recovery in the MySQL environment are replaced by three OBServer nodes, and each IDC is deployed with OceanBase Database Proxy (ODP). However, due to the limitations of OceanBase Database Community Edition, we had to work out a substitute for a dedicated ODP load balancing setup: we use the Network Load Balancing (NLB) service provided by Alibaba Cloud to balance traffic across two ODPs. By doing this, we can ensure business availability in the production environment when any node in an IDC fails.

![1701411231](/img/blogs/users/Yoka/image/1701411231895.png)
_Figure 3: Database cluster architecture of Yoka Games after business migration to OceanBase Database_
+ +Along with ODP, we also use many other OceanBase ecosystem tools, including OCP, OceanBase Migration Service (OMS), OceanBase Agent (OBAgent), and OceanBase Developer Center (ODC). Efficient and easy to operate, these tools lighten the workload of our O&M personnel and reduce the O&M costs. + +### **OCP: easier O&M at a lower cost** + +Firstly, the cost of learning arises inevitably each time when a new product is put into use. OCP provides users with both CLI and GUI tools, greatly shortening the learning curve of O&M engineers. These tools lower the threshold for the database administrators (DBAs) of conventional centralized databases to learn knowledge of distributed databases and make database management easier. Secondly, our company has a plentiful of game products, and each product needs to be deployed with a database cluster, which increases the O&M workload of DBAs. However, OCP slims down the management scale and allows DBAs to manage these clusters in a unified manner. What's more, OCP centralizes operations that need to be done in different systems. These operations are managing accounts, backups, resources, and disaster recovery tasks, analyzing slow logs, and monitoring databases. + +### **OMS: simple, efficient, and visualized data migration** + +We have tested the data migration performance of OMS and compared it with mysqldump. The test result shows that data migration with mysqldump is time-consuming and involves multiple steps such as data export, compression, transmission, and recovery. What's worse, the migration process is not visualized, and no data verification tool is provided by MySQL. Unlike mysqldump, OMS simplifies data migration and supports process-based migration and post-migration verification. As shown in Figure 4, OMS is more efficient than mysqldump in data migration and visualizes the migration progress and speed for users to check in real time. By now, we have migrated more than 20 TB of data, and a large part of these migration tasks were done by OMS. So, we're heavily dependent on OMS at this stage. + +![1701411268](/img/blogs/users/Yoka/image/1701411268411.png) + +
_Figure 4: Comparison between OMS and mysqldump in data migration_
+ +### **ODP, OBAgent, and ODC** + +In addition to OCP and OMS, other OceanBase ecosystem tools also outperform their MySQL counterparts. + +* Unlike MySQL Proxy as an independent tool, ODP has been integrated into OCP and can be used together with load balancing (LB) features on the cloud to simply achieve high availability (HA). +* OBAgent can replace MySQL Exporter to be integrated into the monitoring mid-end of Yoka Games and visualize monitoring data by working with Prometheus and Grafana. +* More than a development and O&M tool, ODC can also substitute Navicat to provide ticket creation and review features. + +On top of revenue, we hope that OceanBase Database can make optimizations in the following aspects: One is the interconnection between ecosystem tools. For example, our O&M personnel need to log in to different platforms when using OCP, OMS, and ODC. We want these tools to be integrated into the same platform that requires only one account-password pair for login. The other is the barrier among different steps that hinders notifications of ticket results. We hope that related features can be added, such as modification of table schemas, modification of stored procedure, and data archiving. + +Benefits from OceanBase Database: Lower Costs and Higher Efficiency +---------------- + +As we've talked about, the introduction of a new technology can make O&M easier. However, an enterprise may care more about how this technology helps it increase efficiency at lower costs. In this regard, OceanBase Database not only helps Yoka Games improve resource utilization, but also reduces storage and hardware costs. + +### **1. Higher utilization of storage resources** + +In the beginning, we chose OceanBase Database because of its high data compression ratio for compact storage. In the test stage, OceanBase Database compressed game data to 19%–37% of its original size while maintaining the performance and CPU overhead, as shown in Figure 5. + +![1701411370](/img/blogs/users/Yoka/image/1701411370429.png) + +
_Figure 5: Storage space before and after migration from MySQL to OceanBase Database_
How about the reduction in hardware costs? Currently, Yoka Games adopts a solid-state storage solution. The cost of this solution is about CNY 450 per TB and rises to about CNY 900 per TB with RAID 10. This year, about 20 TB of data is to be migrated, and it will occupy only about 7 TB after migration. This way, the saved cost will be (20 TB – 7 TB) × CNY 900/TB × 3 = CNY 35,100. As mentioned above, the available storage space of our OceanBase cluster is about 50 TB. If the data file storage percentage is 80%, we can migrate about 125 TB of data. So, about CNY 200,000 can be saved by creating such a cluster.

### **2. Higher utilization of CPU and memory resources**

First of all, let's have a look at the database performance indicators typical of the gaming industry.

In this industry, the CPU utilization of the primary cluster hardly reaches 10% in most cases, let alone that of a standby cluster or disaster recovery cluster. Only when a game publishes its daily quests, usually at 02:00 or 10:00, does the CPU utilization peak, and the peak hours vary from game to game. Meanwhile, the O&M personnel conduct data analysis mainly between 03:00 and 04:00.

Considering the preceding features of the gaming industry, we optimize the cluster settings in OceanBase Database, such as the zone priorities for tenants and CPU resource overallocation (when appropriate), to improve resource utilization. To be specific, the CPU resources allocated to tenants now exceed the 48 physical CPU cores of each server in the primary cluster. I'll give you an example. If the peak hours for both games A and B are 00:00, set the zone priorities for the tenants corresponding to the two games to ZONE1 and ZONE2, respectively. If the peak hours for game A and game C are 00:00 and 04:00, respectively, set the zone priorities for both games to ZONE1.

By doing this, our resource utilization greatly increases and exceeds 10%, as shown in Figure 6.

![1701411395](/img/blogs/users/Yoka/image/1701411394934.jpg)

![1701411402](/img/blogs/users/Yoka/image/1701411402873.png)
_Figure 6: Resource utilization before and after OceanBase Database is used_
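The zone-priority tuning described above comes down to a few tenant-level statements. Here is a minimal sketch with hypothetical tenant names, assuming the peak hours from the example:

```sql
-- game_a and game_b both peak at 00:00, so their leader replicas are pinned
-- to different zones; game_c peaks at 04:00 and can share zone1 with game_a.
-- Tenant and zone names are hypothetical.
ALTER TENANT game_a PRIMARY_ZONE = 'zone1';
ALTER TENANT game_b PRIMARY_ZONE = 'zone2';
ALTER TENANT game_c PRIMARY_ZONE = 'zone1';
```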
+ +From Figure 7, we can also see that OceanBase Database significantly improves the CPU utilization of hosts. + +![1701411413](/img/blogs/users/Yoka/image/1701411413339.png) + +
_Figure 7: Host CPU utilization comparison between MySQL and OceanBase Database_
+ +### **3. Lower server cost** + +After using OceanBase Database, we discard a MySQL cluster and an ApsaraDB instance on Alibaba Cloud. Figure 8 shows the host specifications of the discarded MySQL cluster and the ApsaraDB instance on Alibaba Cloud. This indicates the cost saved. + +![1701411526](/img/blogs/users/Yoka/image/1701411526196.png) + +
_Figure 8: Cost saved after OceanBase Database is used_
+ +Our company started testing and using OceanBase Database nearly three months ago. In such a short period, we adopted a relatively aggressive strategy, but the result was remarkable. I believe that OceanBase Database will unlock its potential in more business scenarios in the future. + +Exploration of OceanBase Database in Diverse Scenarios +-------------------- + +### **Scenario 1: optimize the performance of a large table that contains hundreds of millions of data records** + +A mobile game of our company generates about 400 to 500 million data records every day. In the past, we used MySQL to store these data records in 14 tables, encompassing 164 data types and occupying 60–100 GB of storage space. But now, OceanBase Database helps us combine these tables and convert them into 14 partitions. To test the OceanBase database performance, we randomly inserted 100,000 data records in the database and selected 10,000 accounts to query data types. The test result shows that almost the same time was spent on a query in the OceanBase database and in a MySQL table. As more data records and data types are generated, the advantages of OceanBase Database partitions get increasingly obvious, which means that we no longer need to worry about the overhead brought by sharding for large tables. + +### **Scenario 2: reduce the storage cost of a history database** + +We want to set up a history database for cold log data with a cost-effective storage solution, that is, a large-capacity low-RPM HDD. In compliance with the industry requirements, game companies must be able to track the consumer behavior data of players in the past 12 months. However, the fact is that game companies seldom query players' behavior data in the past three months, not to mention earlier data. If such data is stored in the primary cluster, it will result in a waste of storage resources. To avoid this, we need to synchronize history cold data to a history cluster for a lower storage cost. With OceanBase Database, it is expected that the cost of a history database can be reduced by 50% to 95%. We are looking forward to verifying this prediction in real-world scenarios. + +Migration Schedule and Planning +--------- + +The business that needs to be migrated involves three types of business modules. First, three business modules that have used OceanBase Database, which include the core product of Yoka Games — "War of the Three Kingdoms". Second, four internal IT-based business modules, including monitoring, auditing, and configuration management database (CMDB). Last, projects under testing and development, such as the instant messaging (IM) project of Yoka Games. The migration process is full of twists and turns, but we have received tremendous support. I'd like to extend my thanks to the internal project teams for their trust and willingness to accept new technologies, as well as to the OceanBase open source community for professional technical support. + +By now, we have migrated all data involved in the first type of business modules, solved paint points in using MySQL databases (such as complex HA configuration and difficulties in disaster recovery and scale-outs), greatly improved the utilization of server resources, and reduced the costs of computing resources. I'm fully assured that OceanBase Database will be applied to more business modules of Yoka Games in the future. Wish the prosperity of OceanBase in the long term. 
\ No newline at end of file diff --git a/docs/blogs/users/game-company.md b/docs/blogs/users/game-company.md new file mode 100644 index 000000000..af6064cc9 --- /dev/null +++ b/docs/blogs/users/game-company.md @@ -0,0 +1,181 @@ +--- +slug: game-company +title: 'A Game Company‘s Database Transformation: Replace ClickHouse and Hive with OceanBase Database' +tags: + - User Case +--- + +Introduction: A complex architecture is a silent hazard that keeps growing, especially in massive data processing scenarios, and results in system bottlenecks, module interference, data redundancy, and maintenance nightmares. A game company found itself in this exact pickle and partnered with OceanBase to build an integrated data warehouse that supports both data storage and real-time analysis. In this article, the company's big data platform leader walks us through their challenges, solution selection process, and the bumpy road of solution implementation. + +**Background: Data Analysis, Processing, and Operations Based on a Complex Data Warehouse Architecture** + +We're a game company that has shifted our priorities from game development to operations. Data analysis is crucial for game companies, so the capabilities of an analytical system are extremely important to us. We primarily use data warehouse tools to analyze user behaviors (downloads, registrations, and payments), advertising and marketing data, and game data such as user levels and battle parameters. + +Like most companies, our data warehouse was built in a typical Lambda architecture, as shown in Figure 1. We'd collect data from sources, preprocess it (including data quality control and cleaning), and then cache it in Kafka. Then, some data was sent to a Hive data warehouse for offline processing and some other data was sent to a ClickHouse data warehouse for real-time analysis by scheduled tasks. The analysis results were fed to various application systems, such as the business intelligence system, user system, and marketing system, as well as third-party platforms like Baidu, Tencent, Toutiao, and Douyin. + + + +![](/img/blogs/users/game-company/image/57c9fbdd-3ad6-4096-a0c8-9c41baa3c97f.png) + +Figure 1 Architecture of our original data warehouse + + + +Our original data warehouse would perform data parsing and quality control after data collection, and then trigger alerts against the collected data with quality issues, such as missing fields and incorrect field types. A unique aspect of data processing in the gaming industry is data attribution, which essentially means to analyze the data collection process in detail to identify the channels and ad slots that generated specific data. Future advertising and marketing strategies depend on data attribution. Our data processing also involved data broadening, a common data warehouse scenario, where IP addresses were parsed to display user location, and other user details such as their mobile device model, age, and gender could be obtained and fed to both offline and real-time data warehouses to support user profiling. + + + + + +**Challenges: Real-time Performance, Data Consistency, Maintainability, and Query Efficiency** + + + +Our original data warehouse architecture consisted of multiple layers, such as the operational data store (ODS) layer, data warehouse detail (DWD) layer, data warehouse middle (DWM) layer, data warehouse service (DWS) layer, and data mart. After quality checks, raw data was written into the ODS layer of the Hive and ClickHouse data warehouses. 
Kafka and the ODS layer contained the same data, and Kafka was technically a part of the ODS layer. Then, the task scheduling system would perform data broadening, store data details in the DWD layer, and carry out metric aggregation in the DWM and DWS layers before sending the results to data marts built on PostgreSQL and Redis. The in-house task scheduling system was quite powerful. It could perform, for example, source dictionary and data quality management, task rerunning, task priority adjustment, and quality issue alerting. The original architecture was quite advanced back then. However, we encountered significant challenges. + + + +**Challenge 1: real-time performance**. While many companies adopted the T+1 data warehouse strategy, we optimized the Hive data warehouse, and could get analysis results 30 minutes after data generation. In other words, we would load the data once every 30 minutes, write it to Hive, and then execute the INSERT OVERWRITE statement to store the data to the partition of that day. This method could reduce data fragmentation. The real-time ClickHouse data warehouse, on the other hand, could output results within 1 minute after data generation. However, we needed to see results in milliseconds in some scenarios, which was far beyond what Hive or ClickHouse could achieve. + + + +**Challenge 2: data consistency**. Lambda architecture users know that ClickHouse and Hive often generate inconsistent data. The same issue bothered us despite our data deduplication measures. As a result, we used the data from ClickHouse for real-time queries, and that from Hive for final data consumption. + + + +**Challenge 3: maintainability**. Apparently, it's not that easy to maintain two code systems in the same architecture. + + + +**Challenge 4: query efficiency**. Hive took about 10 minutes or more to return query results, while ClickHouse took from a few seconds to a few minutes. Such performance was fine in most cases, but would be unacceptable in the following two scenarios: + +* Federated queries for user identity. Users may associate their accounts with their identity card numbers. For queries of accounts by identity card number, the query results should be returned in a few milliseconds. We stored user information in a MySQL database, which had no problem meeting that response time if a small amount of data was queried. However, the MySQL database became sluggish or even unavailable if millions or billions of data records were involved. +* Federated queries for advertising channels. In this scenario, we needed to perform federated queries on the order data, user data, and advertising information. The original architecture took 30 minutes to generate the advertising result, while we wanted to view the result within 1 second. + + + +These challenges pushed us to explore new data warehouse solutions. + + + + + +**Database Selection: A Significant Performance Boost Brought by OceanBase Database** + + + +We researched Hudi and Doris. From data writes to returning the result of a JOIN query, Hudi took at least 60 seconds, while Doris took 10-60 seconds. Compared to ClickHouse, which took about 66 seconds to return the query result, as shown in Figure 2, the performance of Hudi or Doris was not a remarkable improvement, and could hardly meet our business needs. 
+ + + +![](https://gw.alipayobjects.com/zos/oceanbase/3f94c232-97af-44e4-9298-1db56fab7117/image/2022-11-30/c8ed05e9-ab41-4308-83a3-d6857d3503bf.png) + +Figure 2 ClickHouse took about 66 seconds to return the query result + + + +During our tool research, we learned about OceanBase Database, a database system that is capable of hybrid transaction and analytical processing (HTAP), and tested its query speed of retrieving user account IDs by identity card number. We only created indexes on the tables under test instead of creating partitions, and we performed a total of 120 million queries on 3.4 billion data rows. As shown in Figure 3, the first test returned the query results in 0.23 seconds, meaning that the performance was improved by 286 times. The query results were returned even in 0.01 seconds after the data was preloaded. A quite thrilling performance boost, right? + + + +![](https://gw.alipayobjects.com/zos/oceanbase/e0f5da4f-15b2-462c-978d-dc97545dbff4/image/2022-11-30/6e585a5e-5ec2-4090-848a-535f20221b8a.png) + +Figure 3 OceanBase Database returned the query result within a few milliseconds + + + +The test result immediately convinced us to deploy OceanBase Database for our key business needs, such as user account ID retrievals by identity card number, user ID-based advertising information retrievals, and real-time tracking of marketing results. + + + + + +**Production Deployment: Data Write Optimization and Challenges to Data Import** + + + +We manage historical and real-time data separately in OceanBase Database. + +* Historical data: Using DataX, we exported historical data into CSV files, and then imported the CSV files into OceanBase Database. +* Real-time data: Expecting a query response in milliseconds, we selected Flink SQL to extract real-time data. We performed a test and the test result showed that Flink SQL can deliver data to OceanBase Database within 1 second from data generation. + + + +As a first-time user, we encountered some difficulties during historical data import, and many were resolved with the assistance of OceanBase technical experts on DingTalk (group ID: 33254054). Personally, I suggest connecting to OceanBase Database directly through port 2881 if you export data into CSV files. If you use port 2883, OceanBase Database is connected through OceanBase Database Proxy (ODP), which may distribute commands to a server where DataX is not deployed and CSV files are not stored. + + + +We considered using Spark for real-time data writes. Spark writes data in micro-batches with an inter-batch latency of up to 300 ms, while Flink supports real-time data writes to OceanBase Database. So, we selected Flink SQL to do the job. + + + +The following three screenshots show how Flink performs the extract-transform-load (ETL) process and writes data to OceanBase Database. 
+ + + +![](/img/blogs/users/game-company/image/73ed5780-4fab-4da1-8a57-ebc9bfbad4f6.png) + +Figure 4 Extracting real-time data from Kafka + + + +![](/img/blogs/users/game-company/image/bc2f4bc6-1c53-4fc4-b748-9484ed3939d6.png) + +Figure 5 Performing the ETL process of real-time data + + + +![](/img/blogs/users/game-company/image/0a9c3202-71ca-4e0e-b0d2-5ba23506561f.png) + +Figure 6 Loading processed data into OceanBase Database in real time + + + +I converted the process into a batch commit script, which enables Flink to synchronize data to our new real-time data warehouse based on OceanBase Database from multiple sources, such as Kafka, MySQL, Oracle, OceanBase Database, MongoDB, PostgreSQL, and SQL Server. + + + +The preceding code has been implemented in our production environment to support two scenarios: user account ID retrievals by identity card number, and data attribution, so that we can learn about, for example, the advertising channel that attracted a user. The following figure shows the position of OceanBase Database in our business system. + + + +![](/img/blogs/users/game-company/image/d8cf8805-bfff-410c-bd6c-5260601c9c77.png) + +Figure 7 Architecture of our data warehouse with OceanBase Database + + + + + + + +**Summary: An All-in-one System Supporting Both TP and AP** + + + +OceanBase Database has solved the aforesaid challenges to our business systems. + +* **Real-time performance**: The real-time performance of data write and export is no longer a problem. Flink SQL extracts real-time data from Kafka and writes it into OceanBase Database in real time. We hope OceanBase can offer better versions of the OceanBase Change Data Capture (obcdc) tool and improve the flink-sql-connector-OceanBase-cdc tool to better support the reprocessing of historical data. We are also looking forward to an OceanBase-specific Flink connector, which writes data to OceanBase Database efficiently without data duplication or loss. This way, we can process data in the second and third layers of the data warehouse and extend the OceanBase ecosystem to big data, achieving storage/computing splitting in a big data environment. +* **Data consistency**: OceanBase Database has been working greatly with all historical and real-time data of our business system with zero data duplication and loss. +* **Query efficiency**: In the database selection test, we only created indexes on the tables under test without creating table partitions. A test that involved 120 million queries on 3.4 billion data rows returned the results in 0.23 seconds, meaning that the performance was improved by 286 times. After the data was preloaded, the query results were returned even in 0.01 seconds. +* **Maintainability**: We will phase out ClickHouse and Hive and gradually migrate all our core systems to OceanBase Database, making use of both TP and AP capabilities in a simplified architecture. + + + +Next, we will migrate our user system, advertising system, data analysis system, and marketing and channel management system to OceanBase Database, as shown in Figure 8. We have already started code development and data adaptation. The ideal solution is to preserve and analyze all business data in OceanBase Database, handling all needs in one database system. + + + +![](/img/blogs/users/game-company/image/61d3df81-e5a6-40ba-a75b-8d669a24c530.png) + +Figure 8 Migrating more business systems to OceanBase Database + + + +This journey with OceanBase Database has brought numerous surprises. 
As the saying goes, "A journey of a thousand miles begins with a single step." Only by constantly trying can we reach ambitious goals previously thought impossible. To conclude, we sincerely wish OceanBase Database a better future. + +Scan the QR code below to join the OceanBase Discord community. You can find answers to all your technical questions there. + + + +![ob-discord](/img/blogs/users/game-company/image/ob-discord.png) \ No newline at end of file diff --git a/docs/blogs/users/iFLYTEK-htap.md b/docs/blogs/users/iFLYTEK-htap.md new file mode 100644 index 000000000..371a3e8dd --- /dev/null +++ b/docs/blogs/users/iFLYTEK-htap.md @@ -0,0 +1,147 @@ +--- +slug: iFLYTEK-htap +title: 'iFLYTEK Leverages HTAP Capabilities to Improve Query Performance by 40 Times for Large Tables with Hundreds of Millions of Records' +tags: + - User Case +--- + +> Editor's note: iFLYTEK Co., Ltd. (iFLYTEK) became interested in native distributed databases in 2021 and deployed OceanBase Database in 2023 to empower their core business systems. The new database architecture achieves stable business operations, supports automatic scaling, and handles both transaction processing (TP) and analytical processing (AP) tasks without mutual interference. Unexpectedly, it also reduces the storage costs by 50% and greatly simplifies O&M. In this article, Li Mengjia, head of iFLYTEK's database team, shares their experience in database upgrades. + +Exploring Native Distributed Databases +----------- + +iFLYTEK is a listed tech company well-known in the Asia-Pacific region. Since its founding, the company has been engaged in technological research. Our core technologies, such as intelligent voice, natural language understanding, and computer vision, have maintained the edge on the international market. In 2023, we launched a promising business line, which initially experienced a trough period. As more services were released, the business volume welcomed explosive growth after a promotional campaign in September of that year. + +![1702293646](/img/blogs/users/iFLYTEK-htap/image/1702293646599.png) + +The business data was stored in a MySQL database, and as the business grew, the data volume and disk usage increased drastically. + +* The business system generated a record for each user interaction. About 700 million records were squeezed into the core table in just half a year. +* The data volume grew rapidly, with an estimated annual increment of around 5 TB. + +![1701691209](/img/blogs/users/iFLYTEK-htap/image/1701691209757.png) + +We soon realized that MySQL could no longer effectively support our multi-dimensional, real-time report analysis for business decision-making due to the massive and rapidly growing data volume. A native distributed database might be a good solution. + +Some vulnerabilities of MySQL further spurred our determination to replace it. A MySQL database cluster can be horizontally scaled out by sharding to improve its overall read/write performance. Many of our business systems were running on MySQL databases in a well-maintained architecture. However, MySQL was invasive to the business systems, so business adaptation was necessary. If we adhered to the sharding strategy, we would have to spend significant extra energy and time to modify the new business system because of frequent large table creation and update operations. Besides, it was in a critical stage, so we hoped to minimize the adaptation costs. 
+ +After comprehensive consideration, we decided to replace MySQL with a native distributed database solution for three benefits: + +1. Scalability. Highly scalable data storage and processing capabilities handle drastic data growth and high-concurrency access with ease. + +2. Maintainability. A comprehensive ecosystem of tools helps simplify database management and maintenance, reducing maintenance costs and complexity. + +3. Hybrid transaction and analytical processing (HTAP) capabilities. With HTAP and read/write splitting, we can handle TP and AP tasks in a single architecture. + +So, why did we pick OceanBase Database, and what can it do to help us? + +Why OceanBase Database +------------------- + +OceanBase Database has caught our attention since 2021. After two years of research, we were clear that it provides what we need. + +**1. Scalability** + +OceanBase Database can be deployed in a distributed, scalable architecture with a cluster of equivalent nodes. We can deploy a cluster across multiple zones to ensure fault isolation and rapid recovery. + +We can scale out an OceanBase cluster in two ways. When a cluster runs out of resources to maintain its performance due to rapid business volume growth, we can enhance its service capacity by adding more nodes within each zone. This method is called intra-zone scaling. Another method is horizontal scaling. OceanBase Database distributes multiple replicas of the same set of data across different zones. If a minority of zones fails, the remaining replicas ensure that the cluster continues to provide service. We can improve the overall disaster recovery performance of a cluster by adding more zones. + +Either way, the cluster can be scaled out when it is running, with zero business interruptions. + +![1701691238](/img/blogs/users/iFLYTEK-htap/image/1701691238252.png) + +**2. Maintainability** + +OceanBase Database is backed by a grand ecosystem of more than 400 tools. Specifically, OceanBase Cloud Platform (OCP), an in-house O&M tool, offers a range of capabilities: + +* IaaS resource management: such as region, IDC, and host management +* Tenant management: database, session, parameter, and zone priority management, as well as creation, deletion, and scaling of tenants +* Software package management: package upload, download, and storage +* OceanBase Database Proxy (ODP) management: creation, takeover, deletion, upgrade, and scaling of ODP clusters, as well as parameter management +* Backup and restore: data and log backup, backup cleanup, second backup, restore, and sampling +* Database cluster management: creation, deletion, upgrade, scaling, and monitoring of database clusters, fault alerting, as well as management of compaction tasks, parameters, and resource units + +OCP allows us to execute arguably all O&M tasks on a GUI instead of a command line interface. This greatly reduces the overall O&M workload. + +![1701691284](/img/blogs/users/iFLYTEK-htap/image/1701691284063.png) + +OceanBase Database also surprised us with its performance in DDL operations. If you are a MySQL user, you may have experienced the awkwardness when performing DDL operations on a large table in your MySQL database. MySQL 5.6 and later have been optimized, but still cannot meet business requirements in many scenarios. Therefore, we used three tools to help perform DDL operations on large tables in MySQL. Typically, we did that job at night because it took a very long time for tables with hundreds of millions of records. 
+ +**3. HTAP capabilities** + +Most databases handle AP and TP requests separately: data is written to an online TP system and is then extracted to an AP system for analysis. OceanBase Database provides an engine that supports both TP and AP capabilities, so data can be analyzed immediately after it is inserted. Resources for TP and AP requests are isolated to avoid business interference. + +![1701691295](/img/blogs/users/iFLYTEK-htap/image/1701691295206.png) + +Furthermore, the online transaction processing (OLTP) capabilities of OceanBase Database have withstood the huge traffic peaks of Alipay during the Double 11 shopping festival for ten years in a row. Its online analytical processing (OLAP) capabilities also bring many benefits, such as complex query optimization, low-latency response (within seconds), and horizontal linear scaling (to handle JOIN queries on tens or even hundreds of millions of data records). + +To make sure that OceanBase Database could meet our business requirements, we tested its performance. + +**4. Performance test** + +Using common TPC-C benchmark tools, we measured the tpmC value (transactions per minute) of a three-node OceanBase cluster, a standalone MySQL database, and a sharded MySQL database in a production environment with 96 CPU cores, 384 GB of memory, and SSDs. The test result showed that the MySQL database slightly outperformed the OceanBase cluster when the concurrency was below 64. However, the OceanBase cluster gained a significant advantage when the concurrency was set to 128 or higher. As the concurrency grew, the performance of the OceanBase cluster kept improving, while that of MySQL peaked at a concurrency of 256. + +![1701691306](/img/blogs/users/iFLYTEK-htap/image/1701691306742.png) + +We also compared the performance of MySQL and OceanBase Database in handling the most time-consuming statistical queries of the system. Results indicated that, depending on SQL complexity, OceanBase Database outperformed MySQL by 7 to 40 times. + +![1701691313](/img/blogs/users/iFLYTEK-htap/image/1701691313876.png) + +The performance stress test also demonstrated the high data compression ratio of OceanBase Database. The compressed data volume of the three-replica OceanBase cluster was about 50% smaller than that of the MySQL cluster, significantly reducing the storage cost. + +**5. Protocol compatibility and data migration** + +We were highly concerned about two issues. First, we hoped that OceanBase Database would be compatible with MySQL protocols, so that we could migrate data to it without extensive modifications. In fact, OceanBase Database is fully compatible with MySQL protocols, allowing us to switch business systems to it without complicated code modifications. Second, the new business system provides 24/7 service, leaving a narrow time window for data migration. OceanBase Migration Service (OMS) supports near-real-time data synchronization from MySQL to OceanBase Database. Using OMS together with the traffic switching capabilities of the intermediate layer, we could quickly switch the business traffic from MySQL to OceanBase Database, shortening the expected downtime. + +**6. Conclusion** + +In short, based on our research and test results, OceanBase Database fully meets our requirements: + +* Scalability: OceanBase Database can be vertically and horizontally scaled based on business fluctuations. +* Maintainability: OCP allows us to perform O&M on a GUI instead of a command-line interface, reducing the O&M workload. A DDL operation is completed within seconds. +* HTAP capabilities: OceanBase Database supports both TP and AP workloads. Resources for TP and AP requests are isolated to avoid business interference. +* Others: OceanBase Database is fully compatible with MySQL protocols, and we can use OMS to smoothly migrate data from MySQL to OceanBase Database with zero business code modifications. Compared with MySQL, OceanBase Database cuts the storage cost by half.
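To illustrate the protocol-compatibility point above: because OceanBase Database accepts connections from standard MySQL drivers, the application-side change can be reduced to connection settings. The sketch below is hypothetical (hostnames, credentials, and database names are made up; 2883 is the conventional OBProxy port):

```python
import pymysql

# Hypothetical settings: only the endpoint differs between the two backends.
MYSQL_PRIMARY = {"host": "mysql-vip.example.com", "port": 3306,
                 "user": "app", "password": "***", "database": "app_db"}
OCEANBASE_VIA_OBPROXY = {"host": "ob-vip.example.com", "port": 2883,
                         "user": "app", "password": "***", "database": "app_db"}

def get_connection(settings: dict) -> pymysql.connections.Connection:
    # The business code keeps using the same driver and the same SQL either way.
    return pymysql.connect(**settings)

# Cutover then becomes a configuration (or virtual IP) switch, not a code change.
conn = get_connection(OCEANBASE_VIA_OBPROXY)
```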
+ +Switching Business from MySQL to OceanBase Database +---------------------- + +We were fully prepared before switching to OceanBase Database. + +- First, we built a test environment and tested OceanBase Database to verify its adaptability and compatibility, and to make sure that its OLAP and OLTP performance met online business requirements. + +- Second, we verified the OMS-based synchronization method, synchronized data from MySQL to OceanBase Database, and verified data consistency. To address unknown migration risks, we designed a rollback plan. We also rehearsed management operations, and verified the high availability solution and the emergency plan. + +- Finally, we switched the business system from MySQL to OceanBase Database, which proceeded smoothly. After that, we carried out routine O&M work, such as performance optimization, monitoring, and backup and restore, to ensure continuous and stable operation of the OceanBase cluster. + +The following figure shows the switching procedure. Our original MySQL architecture consisted of a master instance and a slave instance to ensure high availability, and it received requests forwarded by ProxySQL. The new architecture consists of a three-node OceanBase cluster, which receives requests forwarded by HAProxy; HAProxy also serves as the load balancer. We set the same configuration for HAProxy and ProxySQL, and migrated data from the MySQL cluster to the OceanBase cluster using OMS before the switching. At the same time, we verified data consistency between the two clusters and synchronized their user information (a simplified sketch of such a consistency check is shown at the end of this section). This way, we only needed to change the virtual IP address for the business connection of the MySQL cluster to that of the OceanBase cluster, and then switch the business traffic with negligible impact on the upper-layer business applications. + +![1701691368](/img/blogs/users/iFLYTEK-htap/image/1701691368737.png) + +After the OceanBase cluster went live, we kept the MySQL cluster for some time and enabled reverse synchronization from the OceanBase cluster to the production MySQL cluster, as shown in the following figure. This was a safeguard against incompatibility issues: if one surfaced, we could quickly switch the business traffic back to MySQL. The good news was that nothing bad happened. + +![1701691376](/img/blogs/users/iFLYTEK-htap/image/1701691376501.png) + +To further improve the overall system availability, we set up a standby OceanBase cluster for disaster recovery. If the production cluster failed, we could quickly switch to the standby cluster and keep system services available. + +As first-time OceanBase Database users, we drew confidence from this exhaustive preparation despite the limited time window for the business switching. + +Nevertheless, we did encounter some problems during the process. At first, we deployed OceanBase Database V4.1. Our developers reported errors in the result set after the switching. We contacted OceanBase Technical Support, who fixed the issue by adding hints. We also noticed exceptions during the synchronization of large tables and DDL operations using OMS. Those exceptions disappeared after we upgraded OceanBase Database to V4.2.1.
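For reference, the consistency check mentioned above can be as simple as comparing per-table row counts (and, for key tables, aggregate checksums) between the source and target through the same MySQL driver. This is a simplified, hypothetical sketch, not the exact tooling we used:

```python
import pymysql

# Hypothetical endpoints and credentials for the source and target clusters.
SOURCE = {"host": "mysql-vip.example.com", "port": 3306,
          "user": "check", "password": "***", "database": "app_db"}
TARGET = {"host": "ob-vip.example.com", "port": 2883,
          "user": "check", "password": "***", "database": "app_db"}

def count_rows(settings: dict, table: str) -> int:
    conn = pymysql.connect(**settings)
    try:
        with conn.cursor() as cur:
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            return cur.fetchone()[0]
    finally:
        conn.close()

# Hypothetical table names; in practice the list would come from the schema.
for table in ("user_interaction_log", "user_profile"):
    src, dst = count_rows(SOURCE, table), count_rows(TARGET, table)
    status = "OK" if src == dst else "MISMATCH"
    print(f"{table}: source={src}, target={dst} -> {status}")
```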
+ +Summary +---------------------- + +As of this writing, OceanBase Database has been running stably for six months and has brought us many benefits: + +* Stable business operations. The native distributed architecture of OceanBase Database is based on the Paxos protocol. It avoids single points of failure, providing stable service for upper-layer applications. + +* Automatic scaling with zero business interruptions. OceanBase Database supports vertical and horizontal scaling in response to business changes, causing zero business interruptions to upper-layer applications. + +* Improved maintainability and simplified O&M. We can perform all O&M tasks in OCP. This management platform has dramatically reduced our O&M workload. + +* Great storage cost reduction. As indicated by our test results, OceanBase Database provides a data compression ratio two times that of MySQL, saving half of the storage cost. + +* Smooth business migration. OceanBase Database is fully compatible with MySQL protocols, requiring minimal code modifications and thus saving considerable development costs. In addition, OMS helped us smoothly migrate data from MySQL to OceanBase Database. + +* Convenient HTAP capabilities. OceanBase Database provides an engine that supports both OLTP and OLAP, saving the costs of building a separate OLAP system. + +Going forward, iFLYTEK will migrate more systems to OceanBase Database and deepen its partnership with OceanBase. 
\ No newline at end of file diff --git a/static/img/blogs/tech/OpenStack-ob/image/1725507686709.png b/static/img/blogs/tech/OpenStack-ob/image/1725507686709.png new file mode 100644 index 000000000..b67300901 Binary files /dev/null and b/static/img/blogs/tech/OpenStack-ob/image/1725507686709.png differ diff --git a/static/img/blogs/tech/OpenStack-ob/image/1725507686709.psd b/static/img/blogs/tech/OpenStack-ob/image/1725507686709.psd new file mode 100644 index 000000000..9719d4015 Binary files /dev/null and b/static/img/blogs/tech/OpenStack-ob/image/1725507686709.psd differ diff --git a/static/img/blogs/tech/hive-to-ob/image/c3e0979c-0903-4890-8735-e27a4bf9285a.png b/static/img/blogs/tech/hive-to-ob/image/c3e0979c-0903-4890-8735-e27a4bf9285a.png new file mode 100644 index 000000000..db6239607 Binary files /dev/null and b/static/img/blogs/tech/hive-to-ob/image/c3e0979c-0903-4890-8735-e27a4bf9285a.png differ diff --git a/static/img/blogs/tech/hive-to-ob/image/c3e0979c-0903-4890-8735-e27a4bf9285a.psd b/static/img/blogs/tech/hive-to-ob/image/c3e0979c-0903-4890-8735-e27a4bf9285a.psd new file mode 100644 index 000000000..6a8407855 Binary files /dev/null and b/static/img/blogs/tech/hive-to-ob/image/c3e0979c-0903-4890-8735-e27a4bf9285a.psd differ diff --git a/static/img/blogs/tech/parallel-execution-VII/image/1705633920006.png b/static/img/blogs/tech/parallel-execution-VII/image/1705633920006.png new file mode 100644 index 000000000..adb14cf04 Binary files /dev/null and b/static/img/blogs/tech/parallel-execution-VII/image/1705633920006.png differ diff --git a/static/img/blogs/tech/parallel-execution-VII/image/1705633920006.psd b/static/img/blogs/tech/parallel-execution-VII/image/1705633920006.psd new file mode 100644 index 000000000..21a84da83 Binary files /dev/null and b/static/img/blogs/tech/parallel-execution-VII/image/1705633920006.psd differ diff --git a/static/img/blogs/tech/practices-binlog/image/1702608197778.png b/static/img/blogs/tech/practices-binlog/image/1702608197778.png new file mode 100644 index 000000000..09e3c62c0 Binary files /dev/null and b/static/img/blogs/tech/practices-binlog/image/1702608197778.png differ diff --git a/static/img/blogs/tech/practices-binlog/image/1702608197778.psd b/static/img/blogs/tech/practices-binlog/image/1702608197778.psd new file mode 100644 index 000000000..26e72a78b Binary files /dev/null and b/static/img/blogs/tech/practices-binlog/image/1702608197778.psd differ diff --git a/static/img/blogs/tech/practices-binlog/image/1702608245245.png b/static/img/blogs/tech/practices-binlog/image/1702608245245.png new file mode 100644 index 000000000..61635486f Binary files /dev/null and b/static/img/blogs/tech/practices-binlog/image/1702608245245.png differ diff --git a/static/img/blogs/tech/practices-binlog/image/1702608245245.psd b/static/img/blogs/tech/practices-binlog/image/1702608245245.psd new file mode 100644 index 000000000..de84b041f Binary files /dev/null and b/static/img/blogs/tech/practices-binlog/image/1702608245245.psd differ diff --git a/static/img/blogs/tech/row-to-vector/image/1.png b/static/img/blogs/tech/row-to-vector/image/1.png new file mode 100644 index 000000000..ae8a560dc Binary files /dev/null and b/static/img/blogs/tech/row-to-vector/image/1.png differ diff --git a/static/img/blogs/tech/row-to-vector/image/11.png b/static/img/blogs/tech/row-to-vector/image/11.png new file mode 100644 index 000000000..c4968f4cb Binary files /dev/null and b/static/img/blogs/tech/row-to-vector/image/11.png differ diff --git 
a/static/img/blogs/tech/row-to-vector/image/2.png b/static/img/blogs/tech/row-to-vector/image/2.png new file mode 100644 index 000000000..9b8d17c67 Binary files /dev/null and b/static/img/blogs/tech/row-to-vector/image/2.png differ diff --git a/static/img/blogs/tech/row-to-vector/image/3.png b/static/img/blogs/tech/row-to-vector/image/3.png new file mode 100644 index 000000000..6075f1efa Binary files /dev/null and b/static/img/blogs/tech/row-to-vector/image/3.png differ diff --git a/static/img/blogs/tech/row-to-vector/image/5.png b/static/img/blogs/tech/row-to-vector/image/5.png new file mode 100644 index 000000000..4e3283354 Binary files /dev/null and b/static/img/blogs/tech/row-to-vector/image/5.png differ diff --git a/static/img/blogs/tech/row-to-vector/image/image.png b/static/img/blogs/tech/row-to-vector/image/image.png new file mode 100644 index 000000000..1b74d5790 Binary files /dev/null and b/static/img/blogs/tech/row-to-vector/image/image.png differ diff --git a/static/img/blogs/tech/row-to-vector/image/image.psd b/static/img/blogs/tech/row-to-vector/image/image.psd new file mode 100644 index 000000000..babd3cad0 Binary files /dev/null and b/static/img/blogs/tech/row-to-vector/image/image.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/1.png b/static/img/blogs/tech/ticket-olap/image/1.png new file mode 100644 index 000000000..62037d21a Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/1.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/1.psd b/static/img/blogs/tech/ticket-olap/image/1.psd new file mode 100644 index 000000000..2f1b086f2 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/1.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/101.png b/static/img/blogs/tech/ticket-olap/image/101.png new file mode 100644 index 000000000..8999e923e Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/101.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/102.png b/static/img/blogs/tech/ticket-olap/image/102.png new file mode 100644 index 000000000..8834f2cee Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/102.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/1716370902473.png b/static/img/blogs/tech/ticket-olap/image/1716370902473.png new file mode 100644 index 000000000..36d6460f2 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/1716370902473.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/1716370902473.psd b/static/img/blogs/tech/ticket-olap/image/1716370902473.psd new file mode 100644 index 000000000..61ef35c52 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/1716370902473.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/1716533392671.png b/static/img/blogs/tech/ticket-olap/image/1716533392671.png new file mode 100644 index 000000000..1e19280cc Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/1716533392671.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/1716533392671.psd b/static/img/blogs/tech/ticket-olap/image/1716533392671.psd new file mode 100644 index 000000000..4d67af162 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/1716533392671.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/1716533410796.png b/static/img/blogs/tech/ticket-olap/image/1716533410796.png new file mode 100644 index 000000000..1c1135430 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/1716533410796.png differ 
diff --git a/static/img/blogs/tech/ticket-olap/image/1716533410796.psd b/static/img/blogs/tech/ticket-olap/image/1716533410796.psd new file mode 100644 index 000000000..3893d8ee5 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/1716533410796.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/2.png b/static/img/blogs/tech/ticket-olap/image/2.png new file mode 100644 index 000000000..4ec9cab7d Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/2.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/2.psd b/static/img/blogs/tech/ticket-olap/image/2.psd new file mode 100644 index 000000000..ad5be70a6 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/2.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/3.png b/static/img/blogs/tech/ticket-olap/image/3.png new file mode 100644 index 000000000..abc284d2f Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/3.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/3.psd b/static/img/blogs/tech/ticket-olap/image/3.psd new file mode 100644 index 000000000..afa16026f Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/3.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/4.png b/static/img/blogs/tech/ticket-olap/image/4.png new file mode 100644 index 000000000..22c51631c Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/4.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/4.psd b/static/img/blogs/tech/ticket-olap/image/4.psd new file mode 100644 index 000000000..ed4e4cacc Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/4.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/5.png b/static/img/blogs/tech/ticket-olap/image/5.png new file mode 100644 index 000000000..890ee3024 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/5.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/5.psd b/static/img/blogs/tech/ticket-olap/image/5.psd new file mode 100644 index 000000000..26743675d Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/5.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/6.png b/static/img/blogs/tech/ticket-olap/image/6.png new file mode 100644 index 000000000..62019bd3d Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/6.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/6.psd b/static/img/blogs/tech/ticket-olap/image/6.psd new file mode 100644 index 000000000..9b50e4995 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/6.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/7.png b/static/img/blogs/tech/ticket-olap/image/7.png new file mode 100644 index 000000000..339a1da92 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/7.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/7.psd b/static/img/blogs/tech/ticket-olap/image/7.psd new file mode 100644 index 000000000..f620f61fa Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/7.psd differ diff --git a/static/img/blogs/tech/ticket-olap/image/8.png b/static/img/blogs/tech/ticket-olap/image/8.png new file mode 100644 index 000000000..1ed709924 Binary files /dev/null and b/static/img/blogs/tech/ticket-olap/image/8.png differ diff --git a/static/img/blogs/tech/ticket-olap/image/8.psd b/static/img/blogs/tech/ticket-olap/image/8.psd new file mode 100644 index 000000000..3d71f4ff5 Binary files /dev/null and 
b/static/img/blogs/tech/ticket-olap/image/8.psd differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793000832.png b/static/img/blogs/users/Loong-Airlines/image/1704793000832.png new file mode 100644 index 000000000..66044bc55 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793000832.png differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793000832.psd b/static/img/blogs/users/Loong-Airlines/image/1704793000832.psd new file mode 100644 index 000000000..5c6ea00c5 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793000832.psd differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793014642.png b/static/img/blogs/users/Loong-Airlines/image/1704793014642.png new file mode 100644 index 000000000..72b88d041 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793014642.png differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793014642.psd b/static/img/blogs/users/Loong-Airlines/image/1704793014642.psd new file mode 100644 index 000000000..82ca5d026 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793014642.psd differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793030493.png b/static/img/blogs/users/Loong-Airlines/image/1704793030493.png new file mode 100644 index 000000000..43744fa4b Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793030493.png differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793030493.psd b/static/img/blogs/users/Loong-Airlines/image/1704793030493.psd new file mode 100644 index 000000000..053a829e0 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793030493.psd differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793067111.png b/static/img/blogs/users/Loong-Airlines/image/1704793067111.png new file mode 100644 index 000000000..7535d967a Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793067111.png differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793067111.psd b/static/img/blogs/users/Loong-Airlines/image/1704793067111.psd new file mode 100644 index 000000000..50f7312c1 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793067111.psd differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793089912.png b/static/img/blogs/users/Loong-Airlines/image/1704793089912.png new file mode 100644 index 000000000..9bb13dd37 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793089912.png differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793089912.psd b/static/img/blogs/users/Loong-Airlines/image/1704793089912.psd new file mode 100644 index 000000000..638931824 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793089912.psd differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793121478.png b/static/img/blogs/users/Loong-Airlines/image/1704793121478.png new file mode 100644 index 000000000..83ac9a227 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793121478.png differ diff --git a/static/img/blogs/users/Loong-Airlines/image/1704793121478.psd b/static/img/blogs/users/Loong-Airlines/image/1704793121478.psd new file mode 100644 index 000000000..7b697db84 Binary files /dev/null and b/static/img/blogs/users/Loong-Airlines/image/1704793121478.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436205907.png 
b/static/img/blogs/users/NetEase-Games/image/1711436205907.png new file mode 100644 index 000000000..32913fdd3 Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436205907.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436205907.psd b/static/img/blogs/users/NetEase-Games/image/1711436205907.psd new file mode 100644 index 000000000..28c69228a Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436205907.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436259038.png b/static/img/blogs/users/NetEase-Games/image/1711436259038.png new file mode 100644 index 000000000..b92528d56 Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436259038.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436259038.psd b/static/img/blogs/users/NetEase-Games/image/1711436259038.psd new file mode 100644 index 000000000..a2118339d Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436259038.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436386200.png b/static/img/blogs/users/NetEase-Games/image/1711436386200.png new file mode 100644 index 000000000..622ce0a6c Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436386200.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436386200.psd b/static/img/blogs/users/NetEase-Games/image/1711436386200.psd new file mode 100644 index 000000000..256710bd3 Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436386200.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436404918.png b/static/img/blogs/users/NetEase-Games/image/1711436404918.png new file mode 100644 index 000000000..aa2229071 Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436404918.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436404918.psd b/static/img/blogs/users/NetEase-Games/image/1711436404918.psd new file mode 100644 index 000000000..89df51dca Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436404918.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436412103.png b/static/img/blogs/users/NetEase-Games/image/1711436412103.png new file mode 100644 index 000000000..2d01384fe Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436412103.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436412103.psd b/static/img/blogs/users/NetEase-Games/image/1711436412103.psd new file mode 100644 index 000000000..a1f1da61e Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436412103.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436426618.png b/static/img/blogs/users/NetEase-Games/image/1711436426618.png new file mode 100644 index 000000000..1d1b29897 Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436426618.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436426618.psd b/static/img/blogs/users/NetEase-Games/image/1711436426618.psd new file mode 100644 index 000000000..accf95082 Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436426618.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436542465.png b/static/img/blogs/users/NetEase-Games/image/1711436542465.png new file mode 100644 index 000000000..44f4c8f7f Binary files /dev/null and 
b/static/img/blogs/users/NetEase-Games/image/1711436542465.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436542465.psd b/static/img/blogs/users/NetEase-Games/image/1711436542465.psd new file mode 100644 index 000000000..7f51630f9 Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436542465.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436553061.png b/static/img/blogs/users/NetEase-Games/image/1711436553061.png new file mode 100644 index 000000000..2985fa415 Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436553061.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436553061.psd b/static/img/blogs/users/NetEase-Games/image/1711436553061.psd new file mode 100644 index 000000000..64827fc3a Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436553061.psd differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436608218.png b/static/img/blogs/users/NetEase-Games/image/1711436608218.png new file mode 100644 index 000000000..6526f15ec Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436608218.png differ diff --git a/static/img/blogs/users/NetEase-Games/image/1711436608218.psd b/static/img/blogs/users/NetEase-Games/image/1711436608218.psd new file mode 100644 index 000000000..a48e630fd Binary files /dev/null and b/static/img/blogs/users/NetEase-Games/image/1711436608218.psd differ diff --git a/static/img/blogs/users/Yoka/image/1701411042089.png b/static/img/blogs/users/Yoka/image/1701411042089.png new file mode 100644 index 000000000..27414a859 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411042089.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411042089.psd b/static/img/blogs/users/Yoka/image/1701411042089.psd new file mode 100644 index 000000000..0de07c84f Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411042089.psd differ diff --git a/static/img/blogs/users/Yoka/image/1701411092754.png b/static/img/blogs/users/Yoka/image/1701411092754.png new file mode 100644 index 000000000..acac082c1 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411092754.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411092754.psd b/static/img/blogs/users/Yoka/image/1701411092754.psd new file mode 100644 index 000000000..5f58e1a45 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411092754.psd differ diff --git a/static/img/blogs/users/Yoka/image/1701411231895.png b/static/img/blogs/users/Yoka/image/1701411231895.png new file mode 100644 index 000000000..458dd1521 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411231895.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411231895.psd b/static/img/blogs/users/Yoka/image/1701411231895.psd new file mode 100644 index 000000000..a70404a92 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411231895.psd differ diff --git a/static/img/blogs/users/Yoka/image/1701411268411.png b/static/img/blogs/users/Yoka/image/1701411268411.png new file mode 100644 index 000000000..744a39548 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411268411.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411268411.psd b/static/img/blogs/users/Yoka/image/1701411268411.psd new file mode 100644 index 000000000..2ea6a9e9b Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411268411.psd differ diff --git 
a/static/img/blogs/users/Yoka/image/1701411370429.png b/static/img/blogs/users/Yoka/image/1701411370429.png new file mode 100644 index 000000000..a7e71eb04 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411370429.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411370429.psd b/static/img/blogs/users/Yoka/image/1701411370429.psd new file mode 100644 index 000000000..b9a6c8a7f Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411370429.psd differ diff --git a/static/img/blogs/users/Yoka/image/1701411394934.jpg b/static/img/blogs/users/Yoka/image/1701411394934.jpg new file mode 100644 index 000000000..15d49f46b Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411394934.jpg differ diff --git a/static/img/blogs/users/Yoka/image/1701411394934.png b/static/img/blogs/users/Yoka/image/1701411394934.png new file mode 100644 index 000000000..91f33b2dd Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411394934.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411394934.psd b/static/img/blogs/users/Yoka/image/1701411394934.psd new file mode 100644 index 000000000..fb7a46c28 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411394934.psd differ diff --git a/static/img/blogs/users/Yoka/image/1701411402873.png b/static/img/blogs/users/Yoka/image/1701411402873.png new file mode 100644 index 000000000..6949164e9 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411402873.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411413339.png b/static/img/blogs/users/Yoka/image/1701411413339.png new file mode 100644 index 000000000..077e20dd0 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411413339.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411413339.psd b/static/img/blogs/users/Yoka/image/1701411413339.psd new file mode 100644 index 000000000..1770cb7fc Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411413339.psd differ diff --git a/static/img/blogs/users/Yoka/image/1701411526196.png b/static/img/blogs/users/Yoka/image/1701411526196.png new file mode 100644 index 000000000..7bcc44a25 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411526196.png differ diff --git a/static/img/blogs/users/Yoka/image/1701411526196.psd b/static/img/blogs/users/Yoka/image/1701411526196.psd new file mode 100644 index 000000000..50829d3c5 Binary files /dev/null and b/static/img/blogs/users/Yoka/image/1701411526196.psd differ diff --git a/static/img/blogs/users/game-company/image/0a9c3202-71ca-4e0e-b0d2-5ba23506561f.png b/static/img/blogs/users/game-company/image/0a9c3202-71ca-4e0e-b0d2-5ba23506561f.png new file mode 100644 index 000000000..8236c1f8f Binary files /dev/null and b/static/img/blogs/users/game-company/image/0a9c3202-71ca-4e0e-b0d2-5ba23506561f.png differ diff --git a/static/img/blogs/users/game-company/image/0a9c3202-71ca-4e0e-b0d2-5ba23506561f.psd b/static/img/blogs/users/game-company/image/0a9c3202-71ca-4e0e-b0d2-5ba23506561f.psd new file mode 100644 index 000000000..79952b979 Binary files /dev/null and b/static/img/blogs/users/game-company/image/0a9c3202-71ca-4e0e-b0d2-5ba23506561f.psd differ diff --git a/static/img/blogs/users/game-company/image/57c9fbdd-3ad6-4096-a0c8-9c41baa3c97f.png b/static/img/blogs/users/game-company/image/57c9fbdd-3ad6-4096-a0c8-9c41baa3c97f.png new file mode 100644 index 000000000..a5462272c Binary files /dev/null and 
b/static/img/blogs/users/game-company/image/57c9fbdd-3ad6-4096-a0c8-9c41baa3c97f.png differ diff --git a/static/img/blogs/users/game-company/image/57c9fbdd-3ad6-4096-a0c8-9c41baa3c97f.psd b/static/img/blogs/users/game-company/image/57c9fbdd-3ad6-4096-a0c8-9c41baa3c97f.psd new file mode 100644 index 000000000..e58b5476a Binary files /dev/null and b/static/img/blogs/users/game-company/image/57c9fbdd-3ad6-4096-a0c8-9c41baa3c97f.psd differ diff --git a/static/img/blogs/users/game-company/image/61d3df81-e5a6-40ba-a75b-8d669a24c530.png b/static/img/blogs/users/game-company/image/61d3df81-e5a6-40ba-a75b-8d669a24c530.png new file mode 100644 index 000000000..1a24ce5b2 Binary files /dev/null and b/static/img/blogs/users/game-company/image/61d3df81-e5a6-40ba-a75b-8d669a24c530.png differ diff --git a/static/img/blogs/users/game-company/image/61d3df81-e5a6-40ba-a75b-8d669a24c530.psd b/static/img/blogs/users/game-company/image/61d3df81-e5a6-40ba-a75b-8d669a24c530.psd new file mode 100644 index 000000000..5d805a8af Binary files /dev/null and b/static/img/blogs/users/game-company/image/61d3df81-e5a6-40ba-a75b-8d669a24c530.psd differ diff --git a/static/img/blogs/users/game-company/image/73ed5780-4fab-4da1-8a57-ebc9bfbad4f6.png b/static/img/blogs/users/game-company/image/73ed5780-4fab-4da1-8a57-ebc9bfbad4f6.png new file mode 100644 index 000000000..5e0f36e6a Binary files /dev/null and b/static/img/blogs/users/game-company/image/73ed5780-4fab-4da1-8a57-ebc9bfbad4f6.png differ diff --git a/static/img/blogs/users/game-company/image/73ed5780-4fab-4da1-8a57-ebc9bfbad4f6.psd b/static/img/blogs/users/game-company/image/73ed5780-4fab-4da1-8a57-ebc9bfbad4f6.psd new file mode 100644 index 000000000..f83b57db2 Binary files /dev/null and b/static/img/blogs/users/game-company/image/73ed5780-4fab-4da1-8a57-ebc9bfbad4f6.psd differ diff --git a/static/img/blogs/users/game-company/image/bc2f4bc6-1c53-4fc4-b748-9484ed3939d6.png b/static/img/blogs/users/game-company/image/bc2f4bc6-1c53-4fc4-b748-9484ed3939d6.png new file mode 100644 index 000000000..5bc5c5549 Binary files /dev/null and b/static/img/blogs/users/game-company/image/bc2f4bc6-1c53-4fc4-b748-9484ed3939d6.png differ diff --git a/static/img/blogs/users/game-company/image/bc2f4bc6-1c53-4fc4-b748-9484ed3939d6.psd b/static/img/blogs/users/game-company/image/bc2f4bc6-1c53-4fc4-b748-9484ed3939d6.psd new file mode 100644 index 000000000..1496e8a4f Binary files /dev/null and b/static/img/blogs/users/game-company/image/bc2f4bc6-1c53-4fc4-b748-9484ed3939d6.psd differ diff --git a/static/img/blogs/users/game-company/image/d8cf8805-bfff-410c-bd6c-5260601c9c77.png b/static/img/blogs/users/game-company/image/d8cf8805-bfff-410c-bd6c-5260601c9c77.png new file mode 100644 index 000000000..0ad054adc Binary files /dev/null and b/static/img/blogs/users/game-company/image/d8cf8805-bfff-410c-bd6c-5260601c9c77.png differ diff --git a/static/img/blogs/users/game-company/image/d8cf8805-bfff-410c-bd6c-5260601c9c77.psd b/static/img/blogs/users/game-company/image/d8cf8805-bfff-410c-bd6c-5260601c9c77.psd new file mode 100644 index 000000000..ab172276a Binary files /dev/null and b/static/img/blogs/users/game-company/image/d8cf8805-bfff-410c-bd6c-5260601c9c77.psd differ diff --git a/static/img/blogs/users/game-company/image/ob-discord.png b/static/img/blogs/users/game-company/image/ob-discord.png new file mode 100644 index 000000000..607d40887 Binary files /dev/null and b/static/img/blogs/users/game-company/image/ob-discord.png differ diff --git 
a/static/img/blogs/users/iFLYTEK-htap/image/1701691209757.png b/static/img/blogs/users/iFLYTEK-htap/image/1701691209757.png new file mode 100644 index 000000000..2159073dc Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691209757.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691209757.psd b/static/img/blogs/users/iFLYTEK-htap/image/1701691209757.psd new file mode 100644 index 000000000..128f17b5f Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691209757.psd differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691238252.png b/static/img/blogs/users/iFLYTEK-htap/image/1701691238252.png new file mode 100644 index 000000000..9858313c8 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691238252.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691238252.psd b/static/img/blogs/users/iFLYTEK-htap/image/1701691238252.psd new file mode 100644 index 000000000..a86f3cce3 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691238252.psd differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691284063.png b/static/img/blogs/users/iFLYTEK-htap/image/1701691284063.png new file mode 100644 index 000000000..9e4cc2584 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691284063.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691284063.psd b/static/img/blogs/users/iFLYTEK-htap/image/1701691284063.psd new file mode 100644 index 000000000..6f28c1023 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691284063.psd differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691295206.png b/static/img/blogs/users/iFLYTEK-htap/image/1701691295206.png new file mode 100644 index 000000000..3326b8e0b Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691295206.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691295206.psd b/static/img/blogs/users/iFLYTEK-htap/image/1701691295206.psd new file mode 100644 index 000000000..2ee6faf22 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691295206.psd differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691306742.png b/static/img/blogs/users/iFLYTEK-htap/image/1701691306742.png new file mode 100644 index 000000000..5e7e2f539 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691306742.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691306742.psd b/static/img/blogs/users/iFLYTEK-htap/image/1701691306742.psd new file mode 100644 index 000000000..953ef094d Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691306742.psd differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691313876.png b/static/img/blogs/users/iFLYTEK-htap/image/1701691313876.png new file mode 100644 index 000000000..64086f763 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691313876.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691313876.psd b/static/img/blogs/users/iFLYTEK-htap/image/1701691313876.psd new file mode 100644 index 000000000..77893acf0 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691313876.psd differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691368737.png b/static/img/blogs/users/iFLYTEK-htap/image/1701691368737.png new file mode 100644 index 000000000..c0fcd9cb6 Binary files /dev/null and 
b/static/img/blogs/users/iFLYTEK-htap/image/1701691368737.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691368737.psd b/static/img/blogs/users/iFLYTEK-htap/image/1701691368737.psd new file mode 100644 index 000000000..dea14b1c2 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691368737.psd differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691376501.png b/static/img/blogs/users/iFLYTEK-htap/image/1701691376501.png new file mode 100644 index 000000000..ba0c90c33 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691376501.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1701691376501.psd b/static/img/blogs/users/iFLYTEK-htap/image/1701691376501.psd new file mode 100644 index 000000000..1c9bce17d Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1701691376501.psd differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1702293646599.png b/static/img/blogs/users/iFLYTEK-htap/image/1702293646599.png new file mode 100644 index 000000000..289079e4d Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1702293646599.png differ diff --git a/static/img/blogs/users/iFLYTEK-htap/image/1702293646599.psd b/static/img/blogs/users/iFLYTEK-htap/image/1702293646599.psd new file mode 100644 index 000000000..fe066d7c9 Binary files /dev/null and b/static/img/blogs/users/iFLYTEK-htap/image/1702293646599.psd differ