WIP: [R] Verify CRAN release-22.0.0#47938

Closed
thisisnic wants to merge 21 commits into main from maint-22.0.0-r

Conversation

@thisisnic
Member

Do not merge - just needed to check the R release

thisisnic and others added 19 commits October 8, 2025 12:57
…rted image (#47730)

### Rationale for this change

Old image fails due to debian update

### What changes are included in this PR?

Use newer image

### Are these changes tested?

Will submit crossbow run

### Are there any user-facing changes?

No
* GitHub Issue: #47705

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
### Rationale for this change

#45964 changed paths of pre-built Apache Arrow C++ binaries for R. But we forgot to update the nightly upload job.

### What changes are included in this PR?

Update paths in the nightly upload job.

### Are these changes tested?

No...

### Are there any user-facing changes?

Yes.
* GitHub Issue: #47704

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
…47743)

### Rationale for this change

Valgrind would report memory leaks induced by protobuf initialization on library load, for example:
```
==14628== 414 bytes in 16 blocks are possibly lost in loss record 22 of 26
==14628==    at 0x4914EFF: operator new(unsigned long) (vg_replace_malloc.c:487)
==14628==    by 0x8D0B6CA: void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] (in /opt/conda/envs/arrow/lib/libprotobuf.so.25.3.0)
==14628==    by 0x8D33E62: google::protobuf::DescriptorPool::Tables::Tables() (in /opt/conda/envs/arrow/lib/libprotobuf.so.25.3.0)
==14628==    by 0x8D340E2: google::protobuf::DescriptorPool::DescriptorPool(google::protobuf::DescriptorDatabase*, google::protobuf::DescriptorPool::ErrorCollector*) (in /opt/conda/envs/arrow/lib/libprotobuf.so.25.3.0)
==14628==    by 0x8D341A2: google::protobuf::DescriptorPool::internal_generated_pool() (in /opt/conda/envs/arrow/lib/libprotobuf.so.25.3.0)
==14628==    by 0x8D34277: google::protobuf::DescriptorPool::InternalAddGeneratedFile(void const*, int) (in /opt/conda/envs/arrow/lib/libprotobuf.so.25.3.0)
==14628==    by 0x8D9C56F: google::protobuf::internal::AddDescriptorsRunner::AddDescriptorsRunner(google::protobuf::internal::DescriptorTable const*) (in /opt/conda/envs/arrow/lib/libprotobuf.so.25.3.0)
==14628==    by 0x40D147D: call_init.part.0 (dl-init.c:70)
==14628==    by 0x40D1567: call_init (dl-init.c:33)
==14628==    by 0x40D1567: _dl_init (dl-init.c:117)
==14628==    by 0x40EB2C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
```

This was triggered by the `libprotobuf` upgrade on conda-forge from 3.21.12 to 4.25.3.

### What changes are included in this PR?

Add a Valgrind suppression for these leak reports, as there is probably not much we can do about them.

### Are these changes tested?

Yes, by existing CI test.

### Are there any user-facing changes?

No.

* GitHub Issue: #47742

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…Parquet data (#47741)

### Rationale for this change

Fix issues found by OSS-Fuzz when invalid Parquet data is fed to the Parquet reader:
* https://issues.oss-fuzz.com/issues/447262173
* https://issues.oss-fuzz.com/issues/447480433
* https://issues.oss-fuzz.com/issues/447490896
* https://issues.oss-fuzz.com/issues/447693724
* https://issues.oss-fuzz.com/issues/447693728
* https://issues.oss-fuzz.com/issues/449498800

### Are these changes tested?

Yes, using the updated fuzz regression files from apache/arrow-testing#115

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".** (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

* GitHub Issue: #47740

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change
By default, mimalloc generates LSE atomic instructions, which only work on Armv8.1 and later. This causes illegal-instruction crashes on Armv8.0 platforms such as the Raspberry Pi 4. This PR sets the mimalloc build flag -DMI_NO_OPT_ARCH=ON to disable LSE instructions.
Please note that even with the flag set, the compiler and libc will replace the atomic call with an ifunc that best matches the hardware at runtime. That means LSE is used only if the running platform supports it.

### What changes are included in this PR?
Force mimalloc build flag -DMI_NO_OPT_ARCH=ON.

### Are these changes tested?
Manually tested.

### Are there any user-facing changes?
No.

**This PR contains a "Critical Fix".**
Fixes crashes on Armv8.0 platform.
* GitHub Issue: #47229

Lead-authored-by: Yibo Cai <cyb70289@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

According to microsoft/mimalloc#1073, mimalloc v3 is preferred over v2 for production usage.

There are reports of higher than expected memory consumption with mimalloc 2.2.x, notably when reading Parquet data (example: GH-47266).

### What changes are included in this PR?

Bump to mimalloc 3.1.5, which is the latest mimalloc 3.1.x release as of this writing.

### Are these changes tested?

Yes, by existing tests and CI.

### Are there any user-facing changes?

Hopefully not, besides a potential reduction in memory usage due to improvements in mimalloc v3.

* GitHub Issue: #47588

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

There are link errors with build options for JNI on macOS.

### What changes are included in this PR?

`ARROW_BUNDLED_STATIC_LIBS` holds the CMake target names defined in Apache Arrow, not the `find_package()`-ed target names. So we should use `aws-c-common`, not `AWS::aws-c-common`.

Recent aws-c-common, or a related library, uses the Network framework. So add `Network` to the `Arrow::arrow_bundled_dependencies` dependencies.

Don't use `compute/kernels/temporal_internal.cc` in both `libarrow.dylib` and `libarrow_compute.dylib`, to avoid duplicate symbol errors.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #47748

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
### Rationale for this change

This prevents breaking the Apache Arrow Java JNI use case on Linux.

### What changes are included in this PR?

* Add a CI job that uses build options for JNI use case
* Install more packages in manylinux image that is also used by JNI build 

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47632

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

`archery docker push` doesn't support a custom Docker registry such as ghcr.io.

### What changes are included in this PR?

Parse the Docker image tag and pass the Docker registry name to `docker push` if the tag specifies one.

Docker image tag format: `[HOST[:PORT]/]NAMESPACE/REPOSITORY[:TAG]`

See also: https://docs.docker.com/reference/cli/docker/image/tag/#description
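As a rough illustration of the tag format above, the registry part can be pulled out of an image reference as sketched below. This is not archery's actual code: the function name `registry_from_image` is hypothetical, and the rule for recognising a host (the first component contains `.` or `:`, or is `localhost`) is an assumption modelled on how Docker distinguishes a registry from a namespace.

```python
from typing import Optional


def registry_from_image(image: str) -> Optional[str]:
    """Return the HOST[:PORT] part of an image reference, or None if absent.

    Hypothetical helper for illustration only; not taken from archery.
    """
    first, sep, _rest = image.partition("/")
    if not sep:
        # No slash at all, e.g. "ubuntu:22.04": there is no registry part.
        return None
    # Assumption: the first component is a registry host only if it looks
    # like a hostname (has "." or ":") or is exactly "localhost".
    if "." in first or ":" in first or first == "localhost":
        return first
    # Otherwise the first component is a namespace, e.g. "apache/arrow-dev".
    return None


if __name__ == "__main__":
    for ref in ("ghcr.io/apache/arrow-dev:tag", "apache/arrow-dev:tag"):
        print(ref, "->", registry_from_image(ref))
```

With a helper like this, a caller could decide whether to target a specific registry or fall back to the default one when no host is present in the tag.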

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47795

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…47616)

### Rationale for this change

Python 3.14 is currently in a prerelease status and is expected to have a final release in October this year (https://peps.python.org/pep-0745/).

We should ensure we are fully ready to support Python 3.14 for the PyArrow 22 release.

### What changes are included in this PR?

This PR updates wheels for Python 3.14.

### Are these changes tested?

Tested in the CI and with extended builds.

### Are there any user-facing changes?

No, but users will be able to use PyArrow with Python 3.14.

* GitHub Issue: #47438

---

Todo:

- Update the image revision name in `.env`
- Add 3.14 conda build ([arrow/dev/tasks/tasks.yml](https://github.com/apache/arrow/blob/d803afcc43f5d132506318fd9e162d33b2c3d4cd/dev/tasks/tasks.yml#L809)) when conda-forge/pyarrow-feedstock#156 is merged 

Follow-ups:

- #47437

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: AlenkaF <frim.alenka@gmail.com>
…47804)

Found by OSS-Fuzz, should fix https://issues.oss-fuzz.com/issues/451150486.

Ensure RLE run is within bounds before reading it.
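The general pattern of checking bounds before reading can be sketched as follows. This is a generic illustration, not the actual C++ change: the function name `read_fixed_width_value` and its parameters are hypothetical, and it only models reading one fixed-width little-endian value from a byte buffer.

```python
from typing import Tuple


def read_fixed_width_value(buffer: bytes, offset: int, width: int) -> Tuple[int, int]:
    """Read `width` bytes at `offset` as a little-endian integer.

    Returns (value, new_offset). Hypothetical helper for illustration only.
    """
    # Validate the requested range *before* touching the buffer, so that
    # malformed input is rejected instead of being read out of bounds.
    if offset < 0 or width < 0 or offset + width > len(buffer):
        raise ValueError("run extends past the end of the buffer")
    value = int.from_bytes(buffer[offset:offset + width], "little")
    return value, offset + width
```

In Python an out-of-range slice would silently return fewer bytes rather than crash, which is why the explicit check is still needed to reject invalid data.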

Yes, by fuzz regression test in ASAN/UBSAN build.

No.

**This PR contains a "Critical Fix".** (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

* GitHub Issue: #47803

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

Summarise changes for release

### What changes are included in this PR?

Update NEWS file

### Are these changes tested?

No

### Are there any user-facing changes?

No
* GitHub Issue: #47738

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…l patch from conda (#47810)

### Rationale for this change

Our verify-rc-source Windows job is failing due to patch not being available for Windows.

### What changes are included in this PR?

Move patch requirement from `conda_env_cpp.txt` to `conda_env_unix.txt`

### Are these changes tested?

Yes via CI and archery.

### Are there any user-facing changes?

No

* GitHub Issue: #47809

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
… release branch push (#47826)

### Rationale for this change

We require the Linux package jobs to be triggered on RC tag creation. For example for 22.0.0, we currently push the tag `apache-arrow-22.0.0-rc0` and the release branch `release-22.0.0-rc0`. Those events are triggering builds over the same commit and the tag event gets cancelled due to a "high priority task" triggering the same jobs. This causes jobs to fail on the branch because the ARROW_VERSION is not generated. If we manually re-trigger the jobs on the tag they are successful.

### What changes are included in this PR?

Remove the `release-*` branches from triggering the event to allow only the tag to run the jobs so they don't get cancelled.

### Are these changes tested?

No

### Are there any user-facing changes?

No

* GitHub Issue: #47819

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…ign with the variant spec (#47835)

### Rationale for this change
According to the [Variant specification](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md), the specification_version field must be set to 1 to indicate Variant encoding version 1. Currently, this field defaults to 0, which violates the specification. Parquet readers that strictly enforce specification version validation will fail to read files containing Variant types.
<img width="624" height="185" alt="image" src="https://github.com/user-attachments/assets/b0f1deb9-0301-4b94-a472-17fd9cc0df5d" />

### What changes are included in this PR?
The change includes defaulting the specification version to 1.
### Are these changes tested?
The change is covered by unit test.
### Are there any user-facing changes?
The Parquet files produced now carry the variant logical type annotation `VARIANT(1)`.

```
Schema:
message schema {
  optional group V (VARIANT(1)) = 1 {
    required binary metadata;
    required binary value;
  }
}
```

* GitHub Issue: #47838

Lead-authored-by: Aihua <aihua.xu@snowflake.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@thisisnic
Member Author

@github-actions crossbow submit --group r

@github-actions

Revision: 2423882

Submitted crossbow builds: ursacomputing/crossbow @ actions-f3d12f3b41

| Task | Status |
|------|--------|
| r-binary-packages | GitHub Actions |
| r-recheck-most | GitHub Actions |
| test-r-arrow-backwards-compatibility | GitHub Actions |
| test-r-depsource-bundled | Azure |
| test-r-depsource-system | GitHub Actions |
| test-r-dev-duckdb | GitHub Actions |
| test-r-devdocs | GitHub Actions |
| test-r-extra-packages | GitHub Actions |
| test-r-gcc-11 | GitHub Actions |
| test-r-gcc-12 | GitHub Actions |
| test-r-install-local | GitHub Actions |
| test-r-install-local-minsizerel | GitHub Actions |
| test-r-linux-as-cran | GitHub Actions |
| test-r-linux-rchk | GitHub Actions |
| test-r-linux-sanitizers | GitHub Actions |
| test-r-linux-valgrind | GitHub Actions |
| test-r-m1-san | GitHub Actions |
| test-r-macos-as-cran | GitHub Actions |
| test-r-minimal-build | Azure |
| test-r-offline-maximal | GitHub Actions |
| test-r-offline-minimal | Azure |
| test-r-rhub-debian-gcc-devel-lto-latest | Azure |
| test-r-rhub-ubuntu-gcc12-custom-ccache | Azure |
| test-r-rhub-ubuntu-release-latest | Azure |
| test-r-rocker-r-ver-latest | Azure |
| test-r-rstudio-r-base-4.1-focal | Azure |
| test-r-rstudio-r-base-4.2-focal | Azure |
| test-r-ubuntu-22.04 | GitHub Actions |
| test-r-versions | GitHub Actions |

@thisisnic
Member Author

@github-actions crossbow submit test-r-linux-as-cran

@thisisnic
Member Author

thisisnic commented Oct 25, 2025

r-recheck-most is a space issue: https://github.com/ursacomputing/crossbow/actions/runs/18805117680/job/53658341467#step:4:343

test-r-linux-as-cran looks like a bad download, hence retriggering it

@github-actions

Revision: bacd0e3

Submitted crossbow builds: ursacomputing/crossbow @ actions-160b1f7f96

| Task | Status |
|------|--------|
| test-r-linux-as-cran | GitHub Actions |

### Rationale for this change

`r/tools/nixlibs.R` assumes that `arrow.repo` ends with `/`, but `dev/tasks/macros.jinja` uses `arrow.repo` without the trailing `/`.

### What changes are included in this PR?

Add the missing trailing `/` and the `/libarrow` subdirectory.
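The failure mode here is plain string concatenation against a base URL that lacks its trailing `/`. A minimal sketch of the problem and of a defensive join follows; it is not the code in `r/tools/nixlibs.R` or `dev/tasks/macros.jinja`, and the helper name `join_repo_path` and the `example.org` URLs are made up for illustration.

```python
def join_repo_path(repo: str, relative: str) -> str:
    """Join a repository base URL and a relative path with exactly one "/".

    Hypothetical helper for illustration only.
    """
    return repo.rstrip("/") + "/" + relative.lstrip("/")


if __name__ == "__main__":
    repo = "https://example.org/libarrow"  # made-up URL, no trailing slash
    # Naive concatenation silently fuses the two path components.
    print(repo + "bin/pkg.zip")
    # The defensive join gives the same result with or without the slash.
    print(join_repo_path(repo, "bin/pkg.zip"))
    print(join_repo_path(repo + "/", "bin/pkg.zip"))
```

Normalising on the consumer side is only one option; the change described above instead fixes the value supplied by the producer.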

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #47821

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@thisisnic
Member Author

I cherry-picked the CI fix in so we can test here and make sure there are no other issues with the binaries

@thisisnic
Member Author

@github-actions crossbow submit r-binary-packages

@github-actions

Revision: ba57c36

Submitted crossbow builds: ursacomputing/crossbow @ actions-7b9ce5176d

| Task | Status |
|------|--------|
| r-binary-packages | GitHub Actions |

@thisisnic
Member Author

OK, given all the failures are explainable, this looks g2g!

@thisisnic thisisnic closed this Oct 25, 2025

7 participants