
[Bug] Inserting data into Cloudberry concurrently via COPY from Spark: "could not find segment file to use" errors may occur randomly when the data volume is extremely large #1494

@cleverxiao001

Description

Apache Cloudberry version

apache-cloudberry-2.0.0-incubating

What happened

The database is configured with 1 coordinator node and 24 segment nodes, with no standby or mirror nodes deployed. It uses NVMe drives for storage, and both limits.conf and sysctl.conf have been modified in accordance with the documentation requirements.
We are now importing paper data into the database. Each paper has multiple authors and multiple references: one paper includes about 10 authors and 50 references, so with 10 million papers this amounts to roughly 100 million author rows and 500 million reference rows. The database consists of three tables: a basic paper information table, an author table, and a reference table. Each table contains about 30 fields of varchar, text, int, and text[] types, all stored append-optimized with column orientation and zstd compression.
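For reference, the reference table described above would look roughly like the following sketch. The column names, the distribution key, and the GP7-style appendoptimized spelling are assumptions for illustration; the actual schema is not included in this issue, and only a few of the ~30 fields are shown.

CREATE TABLE paper_reference (
    paper_id   varchar(64),
    ref_title  text,
    ref_order  int,
    ref_labels text[]
    -- remaining fields omitted
)
WITH (appendoptimized = true, orientation = column, compresstype = zstd)
DISTRIBUTED BY (paper_id);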
Data import is performed with Spark. After multiple tests, the basic paper data and author data are always imported successfully; the random errors occur only when loading the reference table.

[Screenshot: Spark-side error]
[Screenshot: database-side error]

What you think should happen instead

No response

How to reproduce

The copy function is:
import java.io.StringReader
import java.sql.DriverManager
import java.util.Properties
import org.apache.spark.sql.DataFrame
import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection

def apply(df: DataFrame, pgUrl: String, tableName: String, connectionProperties: Properties): Unit = {
  df.rdd.foreachPartition { iter =>
    // One JDBC connection and one COPY statement per Spark partition
    val conn = DriverManager.getConnection(pgUrl, connectionProperties)
    try {
      val copyManager = new CopyManager(conn.asInstanceOf[BaseConnection])
      val sql = s"COPY $tableName FROM STDIN WITH (FORMAT csv, NULL '\\N')"
      // Buffer the whole partition as CSV text, then stream it into COPY
      val sb = new StringBuilder
      iter.foreach { row => sb.append(rowToCsv(row)).append('\n') }
      copyManager.copyIn(sql, new StringReader(sb.toString))
    } finally {
      conn.close()
    }
  }
}
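The rowToCsv helper is not shown in the issue; a hypothetical sketch, assuming plain CSV quoting and \N for nulls to match the COPY options above, could look like this:

import org.apache.spark.sql.Row

// Hypothetical helper, not the original: render nulls as \N and quote fields
// that contain the delimiter, quotes, or newlines. text[] columns may need
// extra array formatting (e.g. {a,b}) that is omitted here.
def rowToCsv(row: Row): String =
  (0 until row.length).map { i =>
    if (row.isNullAt(i)) "\\N"
    else {
      val v = row.get(i).toString
      if (v.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
        "\"" + v.replace("\"", "\"\"") + "\""
      else v
    }
  }.mkString(",")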
This function is used to import roughly 1 billion rows concurrently.
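For context, a driver along these lines could call it. The object name CopyWriter, the partition count, the JDBC URL, and the table and path names are assumptions for illustration, not taken from the issue; each Spark partition opens its own connection and runs its own COPY, so 200 partitions means up to 200 concurrent COPY sessions.

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("reference-import").getOrCreate()
val props = new Properties()
props.setProperty("user", "gpadmin")        // hypothetical credentials
props.setProperty("password", "changeme")

val refs = spark.read.parquet("/data/references")          // hypothetical source
CopyWriter.apply(refs.repartition(200),
  "jdbc:postgresql://coordinator:5432/paperdb",            // hypothetical URL
  "paper_reference", props)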

Operating System

Rocky Linux 9.7

Anything else

A temporary workaround is to submit the data in multiple smaller batches, which allows all of the data to be imported successfully.
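A minimal sketch of that batching workaround, assuming the rowToCsv helper above and a batch size chosen by trial (both are assumptions, not from the issue): instead of buffering an entire Spark partition, issue a separate COPY every batchSize rows.

import java.io.StringReader
import java.sql.DriverManager
import java.util.Properties
import org.apache.spark.sql.DataFrame
import org.postgresql.copy.CopyManager
import org.postgresql.core.BaseConnection

def applyBatched(df: DataFrame, pgUrl: String, tableName: String,
                 connectionProperties: Properties, batchSize: Int = 50000): Unit = {
  df.rdd.foreachPartition { iter =>
    val conn = DriverManager.getConnection(pgUrl, connectionProperties)
    try {
      val copyManager = new CopyManager(conn.asInstanceOf[BaseConnection])
      val sql = s"COPY $tableName FROM STDIN WITH (FORMAT csv, NULL '\\N')"
      // One COPY per batch instead of one COPY per partition
      iter.grouped(batchSize).foreach { batch =>
        val csv = batch.map(rowToCsv).mkString("", "\n", "\n")
        copyManager.copyIn(sql, new StringReader(csv))
      }
    } finally {
      conn.close()
    }
  }
}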

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct
