Commit 9c7fd1c

Epic: rename old names for ASF brand Compliance
This commit renames "Cloudberry Database" to "Apache Cloudberry" in the documentation and ensures clarity across references.
1 parent 25e285f commit 9c7fd1c

328 files changed, 1915 additions & 1915 deletions
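The equal totals (1915 additions, 1915 deletions) are consistent with a line-for-line name substitution, where each edited line is replaced by exactly one new line. The Python sketch below is a hypothetical illustration of such a substitution on Markdown text, not the tool used for this commit. `rename_brand`, `OLD_NAME`, and `NEW_NAME` are illustrative names. The commit also contains edits a plain substitution would not make, such as the repo link moving to `cloudberry-contrib/postgis` and the reworded "PostGIS for Apache Cloudberry project repo" sentence.

```python
import re
from typing import Tuple

# Hypothetical constants for illustration only; the actual commit may have
# been produced by a different tool or by hand.
OLD_NAME = "Cloudberry Database"
NEW_NAME = "Apache Cloudberry"

# \b on both ends restricts matches to whole words, so a form such as
# "Cloudberry Databases" is left untouched for manual review.
_OLD_NAME_RE = re.compile(r"\b" + re.escape(OLD_NAME) + r"\b")


def rename_brand(text: str) -> Tuple[str, int]:
    """Replace whole-word occurrences of OLD_NAME with NEW_NAME.

    Returns the rewritten text and the number of replacements made.
    Neither name contains a newline, so line breaks are preserved and
    every edited line maps to exactly one new line.
    """
    return _OLD_NAME_RE.subn(NEW_NAME, text)


if __name__ == "__main__":
    before = ("Before you get started, ensure that the Cloudberry Database "
              "is correctly installed on your machine.")
    after, count = rename_brand(before)
    print(after)
    print(count)
```

Applied to the sentence above, the sketch yields "ensure that the Apache Cloudberry is correctly installed", which matches the corresponding added line in `docs/advanced-analytics/postgis.md`, including the leftover article "the" that a purely mechanical rename keeps.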

docs/advanced-analytics/directory-tables.md

Lines changed: 5 additions & 5 deletions
@@ -4,11 +4,11 @@ title: Directory Table

 # Directory Table

-Cloudberry Database has introduced *directory tables* in v1.5.3 for unified management of unstructured data on local or object storage.
+Apache Cloudberry has introduced *directory tables* in v1.5.3 for unified management of unstructured data on local or object storage.

 In the context of large-scale AI, AI applications have generated the need to manage unstructured multi-modal corpora. There is a need to continuously prepare a large amount of high-quality curated unstructured corpora, train large models through data iteration, and summarize rich knowledge bases. Therefore, there are technical challenges in the management and processing of structured corpora.

-To address these challenges, Cloudberry Database introduces directory tables for managing multiple types of unstructured data. Developer users can use simple SQL statements to take advantage of the capabilities of multiple computing engines to achieve one-stop data processing and application development.
+To address these challenges, Apache Cloudberry introduces directory tables for managing multiple types of unstructured data. Developer users can use simple SQL statements to take advantage of the capabilities of multiple computing engines to achieve one-stop data processing and application development.

 Directory tables store, manage, and analyze unstructured data objects. They reside within tablespaces. When unstructured data files are imported, a directory table record (file metadata) is created, and the file itself is loaded into object storage. The table metadata remains associated with the corresponding object storage file.

@@ -40,7 +40,7 @@ CREATE DIRECTORY TABLE <table_name>;

 To create a directory table in an external storage, you first need to create a tablespace in that storage. You'll need to provide connection information of the external storage server, such as server IP address, protocol, and access credentials. The following examples show how to create directory tables on QingCloud Object Storage and HDFS.

-1. Create server objects and define connection methods for external data sources. Cloudberry Database supports protocols for multiple storage options, including S3 object storage and HDFS. The following examples create server objects named `oss_server` and `hdfs_server` on QingCloud and HDFS, respectively.
+1. Create server objects and define connection methods for external data sources. Apache Cloudberry supports protocols for multiple storage options, including S3 object storage and HDFS. The following examples create server objects named `oss_server` and `hdfs_server` on QingCloud and HDFS, respectively.

 - For QingCloud:

@@ -58,7 +58,7 @@ To create a directory table in an external storage, you first need to create a t

 - `protocol`: The protocol used to connect to the external data source. In the examples above, `'qingstor'` indicates using the QingCloud object storage service protocol, and `'hdfs'` indicates using the HDFS storage service protocol.
 - `prefix`: Sets the path prefix when accessing object storage. If this prefix is set, all operations will be limited to this specific path, such as `prefix '/rose-oss-test4/usf1'`. This is typically used to organize and isolate data stored in the same bucket.
-- `endpoint`: Specifies the network address of the external object storage service. For example, `'pek3b.qingstor.com'` is a specific regional node of the QingCloud service. Through this endpoint, Cloudberry Database can access external data.
+- `endpoint`: Specifies the network address of the external object storage service. For example, `'pek3b.qingstor.com'` is a specific regional node of the QingCloud service. Through this endpoint, Apache Cloudberry can access external data.
 - `https`: Specifies whether to connect to the object storage service using the HTTPS protocol. In this command, `'false'` indicates using an unencrypted HTTP connection. This setting might be influenced by data transmission security requirements, and it is generally recommended to use HTTPS to ensure data security.
 - `virtual_host`: Determines whether to access the bucket using virtual hosting. `'false'` means that bucket access is not done in virtual host style (which means that the bucket name is not included in the URL). This option is typically dependent on the URL format support provided by the storage service provider.
 - `namenode`: Represents the IP of the HDFS node. You need to replace `<HDFS node IP:port>` with the actual IP address and port number, such as `'192.168.51.106:8020'`.

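The `endpoint`, `https`, `virtual_host`, and `prefix` options in the hunk above correspond to the general way S3-compatible services address objects. In virtual-hosted style the bucket name is part of the hostname; in path style it moves into the URL path instead. The Python sketch below illustrates that general convention under stated assumptions. It is not Apache Cloudberry's implementation, and this diff does not show how the server combines these options internally. `object_url`, the bucket `my-bucket`, and the object key are hypothetical, and treating `prefix` as a key prefix inside the bucket is an assumption.

```python
from urllib.parse import quote


def object_url(endpoint: str, bucket: str, key: str, prefix: str = "",
               https: bool = True, virtual_host: bool = True) -> str:
    """Build an object URL in the usual S3-compatible addressing styles.

    virtual_host=True:  <scheme>://<bucket>.<endpoint>/<prefix>/<key>
    virtual_host=False: <scheme>://<endpoint>/<bucket>/<prefix>/<key>

    Hypothetical helper; prefix is assumed to be a key prefix in the bucket.
    """
    scheme = "https" if https else "http"
    # Join prefix and key with single slashes, dropping empty parts.
    parts = [p.strip("/") for p in (prefix, key)]
    # Percent-encode everything except the "/" separators.
    path = quote("/".join(p for p in parts if p), safe="/")
    if virtual_host:
        return f"{scheme}://{bucket}.{endpoint}/{path}"
    return f"{scheme}://{endpoint}/{bucket}/{path}"


if __name__ == "__main__":
    # Hypothetical bucket and object; the endpoint is the one quoted above.
    for vh in (True, False):
        print(object_url("pek3b.qingstor.com", "my-bucket", "images/cat.png",
                         prefix="/usf1", https=False, virtual_host=vh))
```

With `virtual_host=True` the sketch prints `http://my-bucket.pek3b.qingstor.com/usf1/images/cat.png`. With `virtual_host=False` it prints `http://pek3b.qingstor.com/my-bucket/usf1/images/cat.png`, so the bucket name still appears in the URL, but in the path rather than the hostname.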
@@ -134,7 +134,7 @@ In general, the fields of a directory table are as follows:

 ### Upload file into directory table

-After uploading a file to a directory table, Cloudberry Database manages the file's upload to local storage or object storage and stores the file's metadata in the directory table. In Cloudberry Database v1.5.3, users cannot directly manage object storage directory files.
+After uploading a file to a directory table, Apache Cloudberry manages the file's upload to local storage or object storage and stores the file's metadata in the directory table. In Apache Cloudberry v1.5.3, users cannot directly manage object storage directory files.

 Upload files from local storage to database object storage:

docs/advanced-analytics/postgis.md

Lines changed: 12 additions & 12 deletions
@@ -4,17 +4,17 @@ title: Geospatial Analytics

 # Geospatial Analytics

-[PostGIS](https://postgis.net/) extends the capabilities of the PostgreSQL by adding support for storing, indexing, and querying geospatial data. Cloudberry Database supports PostGIS for geospatial analytics.
+[PostGIS](https://postgis.net/) extends the capabilities of the PostgreSQL by adding support for storing, indexing, and querying geospatial data. Apache Cloudberry supports PostGIS for geospatial analytics.

-This document introduces how to compile and build PostGIS for your Cloudberry Database cluster.
+This document introduces how to compile and build PostGIS for your Apache Cloudberry cluster.

-You can access the Cloudberry Database PostGIS project repo at [`cloudberrydb/postgis`](https://github.com/cloudberrydb/postgis). The PostGIS code in this repo is dedicated to Cloudberry Database. The compilation and building method introduced in this document is based on the code of this repo.
+You can access the PostGIS for Apache Cloudberry project repo at [`cloudberry-contrib/postgis`](https://github.com/cloudberry-contrib/postgis). The PostGIS code in this repo is dedicated to Apache Cloudberry. The compilation and building method introduced in this document is based on the code of this repo.

-## Compile PostGIS for Cloudberry Database
+## Compile PostGIS for Apache Cloudberry

-Before installing PostGIS for Cloudberry Database, install the required dependencies and compile several components. This process is currently supported only on CentOS, with plans to support Rocky Linux in the future.
+Before installing PostGIS for Apache Cloudberry, install the required dependencies and compile several components. This process is currently supported only on CentOS, with plans to support Rocky Linux in the future.

-Before you get started, ensure that the Cloudberry Database is correctly installed on your machine. If it is not installed, see the [documentation](https://cloudberrydb.org/docs/) for installation instructions.
+Before you get started, ensure that the Apache Cloudberry is correctly installed on your machine. If it is not installed, see the [documentation](https://cloudberry.apache.org/docs/) for installation instructions.

 1. Install the pre-requested dependencies.

@@ -93,10 +93,10 @@ Before you get started, ensure that the Cloudberry Database is correctly install

 3. Build and install PostGIS.

-1. Download the `cloudberrydb/postgis` repo to your `/home/gpadmin` directory:
+1. Download the `cloudberry-contrib/postgis` repo to your `/home/gpadmin` directory:

 ```bash
-git clone https://github.com/cloudberrydb/postgis.git /home/gpadmin/postgis
+git clone https://github.com/cloudberry-contrib/postgis.git /home/gpadmin/postgis
 chown -R gpadmin:gpadmin /home/gpadmin/postgis
 ```

@@ -105,8 +105,8 @@ Before you get started, ensure that the Cloudberry Database is correctly install
 Before starting the compilation process, run the following commands to make sure the environment variables are set ready:

 ```bash
-source /usr/local/cloudberrydb/greenplum_path.sh
-source /home/gpadmin/cloudberrydb/gpAux/gpdemo/gpdemo-env.sh
+source /usr/local/cloudberry/greenplum_path.sh
+source /home/gpadmin/cloudberry/gpAux/gpdemo/gpdemo-env.sh
 scl enable devtoolset-10 bash
 source /opt/rh/devtoolset-10/enable
 ```
@@ -120,9 +120,9 @@ Before you get started, ensure that the Cloudberry Database is correctly install
 make && make install
 ```

-## Use PostGIS in Cloudberry Database
+## Use PostGIS in Apache Cloudberry

-After you have compiled and built PostGIS and the supporting extensions successfully on your Cloudberry Database cluster and have started the demo cluster, you can run the following commands to enable PostGIS and the supporting extensions:
+After you have compiled and built PostGIS and the supporting extensions successfully on your Apache Cloudberry cluster and have started the demo cluster, you can run the following commands to enable PostGIS and the supporting extensions:

 ```sql
 $ psql -p 7000 postgres

docs/basic-query-syntax.md

Lines changed: 4 additions & 4 deletions
@@ -2,11 +2,11 @@
 title: Basic Query Syntax
 ---

-# Basic Queries of Cloudberry Database
+# Basic Queries of Apache Cloudberry

-This document introduce the basic queries of Cloudberry Database.
+This document introduce the basic queries of Apache Cloudberry.

-Cloudberry Database is a high-performance, highly parallel data warehouse developed based on PostgreSQL and Greenplum. Here are some examples of the basic query syntax.
+Apache Cloudberry is a high-performance, highly parallel data warehouse developed based on PostgreSQL and Greenplum. Here are some examples of the basic query syntax.

 - `SELECT`: Used to retrieve data from databases & tables.

@@ -59,7 +59,7 @@ Cloudberry Database is a high-performance, highly parallel data warehouse develo
 WHERE department_id IN (SELECT id FROM departments WHERE location = 'New York'); -- Queries all employees working in New York.
 ```

-The above is just a brief overview of the basic query syntax in Cloudberry Database. Cloudberry Database also provides more advanced queries and functions to help developers perform complex data operations and analyses.
+The above is just a brief overview of the basic query syntax in Apache Cloudberry. Apache Cloudberry also provides more advanced queries and functions to help developers perform complex data operations and analyses.

 ## See also

docs/cbdb-architecture.md

Lines changed: 15 additions & 15 deletions
@@ -2,37 +2,37 @@
 title: Architecture
 ---

-# Cloudberry Database Architecture
+# Apache Cloudberry Architecture

-This document introduces the product architecture and the implementation mechanism of the internal modules in Cloudberry Database.
+This document introduces the product architecture and the implementation mechanism of the internal modules in Apache Cloudberry.

-In most cases, Cloudberry Database is similar to PostgreSQL in terms of SQL support, features, configuration options, and user functionalities. Users can interact with Cloudberry Database in a similar way to how they interact with a standalone PostgreSQL system.
+In most cases, Apache Cloudberry is similar to PostgreSQL in terms of SQL support, features, configuration options, and user functionalities. Users can interact with Apache Cloudberry in a similar way to how they interact with a standalone PostgreSQL system.

-Cloudberry Database uses MPP (Massively Parallel Processing) architecture to store and process large volumes of data, by distributing data and computing workloads across multiple servers or hosts.
+Apache Cloudberry uses MPP (Massively Parallel Processing) architecture to store and process large volumes of data, by distributing data and computing workloads across multiple servers or hosts.

-MPP, known as the shared-nothing architecture, refers to systems with multiple hosts that work together to perform a task. Each host has its own processor, memory, disk, network resources, and operating system. Cloudberry Database uses this high-performance architecture to distribute data loads and can use all system resources in parallel to process queries.
+MPP, known as the shared-nothing architecture, refers to systems with multiple hosts that work together to perform a task. Each host has its own processor, memory, disk, network resources, and operating system. Apache Cloudberry uses this high-performance architecture to distribute data loads and can use all system resources in parallel to process queries.

-From users' view, Cloudberry Database is a complete relational database management system (RDBMS). In a physical view, it contains multiple PostgreSQL instances. To make these independent PostgreSQL instances work together, Cloudberry Database performs distributed cluster processing at different levels for data storage, computing, communication, and management. Cloudberry Database hides the complex details of the distributed system, giving users a single logical database view. This greatly eases the work of developers and operational staff.
+From users' view, Apache Cloudberry is a complete relational database management system (RDBMS). In a physical view, it contains multiple PostgreSQL instances. To make these independent PostgreSQL instances work together, Apache Cloudberry performs distributed cluster processing at different levels for data storage, computing, communication, and management. Apache Cloudberry hides the complex details of the distributed system, giving users a single logical database view. This greatly eases the work of developers and operational staff.

-The architecture diagram of Cloudberry Database is as follows:
+The architecture diagram of Apache Cloudberry is as follows:

-![Cloudberry Database Architecture](./media/cbdb-arch.png)
+![Apache Cloudberry Architecture](./media/cbdb-arch.png)

-- **Coordinator node** (or control node) is the gateway to the Cloudberry Database system, which accepts client connections and SQL queries, and allocates tasks to data node instances. Users interact with Cloudberry Database by connecting to the coordinator node using a client program (such as psql) or an application programming interface (API) (such as JDBC, ODBC, or libpq PostgreSQL C API).
-- The coordinator node acts as the global system directory, containing a set of system tables that record the metadata of Cloudberry Database.
+- **Coordinator node** (or control node) is the gateway to the Apache Cloudberry system, which accepts client connections and SQL queries, and allocates tasks to data node instances. Users interact with Apache Cloudberry by connecting to the coordinator node using a client program (such as psql) or an application programming interface (API) (such as JDBC, ODBC, or libpq PostgreSQL C API).
+- The coordinator node acts as the global system directory, containing a set of system tables that record the metadata of Apache Cloudberry.
 - The coordinator node does not store user data. User data is stored only in data node instances.
 - The coordinator node performs authentication for client connections, processes SQL commands, distributes workload among segments, coordinates the results returned by each segment, and returns the final results to the client program.
-- Cloudberry Database uses Write Ahead Logging (WAL) for coordinator/standby mirroring. In WAL-based logging, all modifications are first written to a log before being written to the disk, which ensures the data integrity of in-process operations.
+- Apache Cloudberry uses Write Ahead Logging (WAL) for coordinator/standby mirroring. In WAL-based logging, all modifications are first written to a log before being written to the disk, which ensures the data integrity of in-process operations.

 - **Segment** (or data node) instances are individual Postgres processes, each storing a portion of the data and executing the corresponding part of the query. When a user connects to the database through the coordinator node and submits a query request, a process is created on each segment node to handle the query. User-defined tables and their indexes are distributed across the available segments, and each segment node contains distinct portions of the data. The processes of data processing runs in the corresponding segment. Users interact with segments through the coordinator, and the segment operate on servers known as the segment host.

-Typically, a segment host runs 2 to 8 data nodes, depending on the processor, memory, storage, network interface, and workload. The configuration of the segment host needs to be balanced, because evenly distributing the data and workload among segments is the key to achieving optimal performance with Cloudberry Database, which allows all segments to start processing a task and finish the work at the same time.
+Typically, a segment host runs 2 to 8 data nodes, depending on the processor, memory, storage, network interface, and workload. The configuration of the segment host needs to be balanced, because evenly distributing the data and workload among segments is the key to achieving optimal performance with Apache Cloudberry, which allows all segments to start processing a task and finish the work at the same time.

-- **Interconnect** is the network layer in the Cloudberry Database system architecture. Interconnect refers to the network infrastructure upon which the communication between the coordinator node and the segments relies, which uses a standard Ethernet switching structure.
+- **Interconnect** is the network layer in the Apache Cloudberry system architecture. Interconnect refers to the network infrastructure upon which the communication between the coordinator node and the segments relies, which uses a standard Ethernet switching structure.

-For performance reasons, a 10 GB or faster network is recommended. By default, the Interconnect module uses the UDP protocol with flow control (UDPIFC) for communication to send messages through the network. The data packet verification performed by Cloudberry Database exceeds the scope provided by UDP, which means that its reliability is equivalent to using the TCP protocol, and its performance and scalability surpass the TCP protocol. If the Interconnect is changed to the TCP protocol instead, the scalability of Cloudberry Database is limited to 1000 segments. This limit does not apply when UDPIFC is used as the default protocol.
+For performance reasons, a 10 GB or faster network is recommended. By default, the Interconnect module uses the UDP protocol with flow control (UDPIFC) for communication to send messages through the network. The data packet verification performed by Apache Cloudberry exceeds the scope provided by UDP, which means that its reliability is equivalent to using the TCP protocol, and its performance and scalability surpass the TCP protocol. If the Interconnect is changed to the TCP protocol instead, the scalability of Apache Cloudberry is limited to 1000 segments. This limit does not apply when UDPIFC is used as the default protocol.

-- Cloudberry Database uses Multiversion Concurrency Control (MVCC) to ensure data consistency. When querying the database, each transaction only sees a snapshot of the data, ensuring that current transactions do not see modifications made by other transactions on the same records. In this way, MVCC provides transaction isolation in the database.
+- Apache Cloudberry uses Multiversion Concurrency Control (MVCC) to ensure data consistency. When querying the database, each transaction only sees a snapshot of the data, ensuring that current transactions do not see modifications made by other transactions on the same records. In this way, MVCC provides transaction isolation in the database.

 MVCC minimizes lock contention to ensure performance in a multi-user environment. This is done by avoiding explicit locking for database transactions.
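The segment paragraphs in this diff stress that evenly distributing data lets all segments finish a task at the same time. The Python sketch below illustrates why, using a simple hash-modulo placement of rows by distribution key. It is hypothetical: `segment_for`, `distribute`, and `skew_ratio` are illustrative names. SHA-256 modulo the segment count is a stand-in, not the hash function Apache Cloudberry actually uses. The max-over-mean skew ratio is one common way to express imbalance.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment ID (illustrative hash-modulo scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute(keys: Iterable[str], num_segments: int) -> Dict[int, int]:
    """Count how many rows each segment receives, including empty segments."""
    counts = Counter(segment_for(k, num_segments) for k in keys)
    return {seg: counts.get(seg, 0) for seg in range(num_segments)}


def skew_ratio(rows_per_segment: Dict[int, int]) -> float:
    """Return max load / mean load; 1.0 means a perfectly even spread.

    If every segment scans only its own rows in parallel, the most heavily
    loaded segment determines when the whole step finishes.
    """
    loads = list(rows_per_segment.values())
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


if __name__ == "__main__":
    unique_ids = [f"order-{i}" for i in range(100_000)]
    print(round(skew_ratio(distribute(unique_ids, 8)), 3))
    two_states = ["NY", "CA"] * 50_000
    print(skew_ratio(distribute(two_states, 8)))
```

With many distinct key values the ratio stays close to 1.0. A key with only two distinct values on eight segments leaves at least six segments idle and gives a ratio of 4.0 or 8.0, meaning the busiest segment does at least four times the average work while the others wait.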
