Commit ed37b14

apacheGH-49422: [CI][Integration][Ruby] Add the Ruby implementation (apache#49423)
### Rationale for this change

There are some missing features in the Ruby implementation for now, but the integration tests can still pass if we skip the tests that depend on them.

### What changes are included in this PR?

Archery:

* Add `--with-ruby` to `archery integration`
* Add `archery.integration.tester_ruby.RubyTester`
* Add `no_map_field_names_validate` quirk for apacheGH-49415
* Show environment variables too on external command failure, because the Ruby tester passes information to the integration tester through environment variables, not command line arguments
* Use `ARCHERY_INTEGRATION_WITH_CPP=1` instead of `ARROW_INTEGRATION_CPP=ON`, like other implementations such as `ARCHERY_INTEGRATION_WITH_GO`

Ruby:

* Add `red-arrow-format-integration-test` as the test driver
  * This is not included in `.gem` because this is only for development
* Add `ruby/red-arrow-format/lib/arrow-format/integration/` as helpers of the test driver
  * This is not included in `.gem` because this is only for development
* Add `ArrowFormat::Array#empty?`
* Add `ArrowFormat::RecordBatch#empty?`
* Add `ArrowFormat::NullArray#n_nulls`
* `ArrowFormat::*Array#to_a`: Add support for empty case
* Fix Apache Arrow decimal <-> `BigDecimal` conversion
* `ArrowFormat::Bitmap#each`: Fix a bug that one bit is ignored
* Move dictionary ID to `ArrowFormat::DictionaryType` from `ArrowFormat::Field`
* Add support for V4 union that has validity bitmap
* Add support for no continuation token message for backward compatibility
* `ArrowFormat::StreamingReader`: Add support for reading schema without calling `#each`
* `ArrowFormat::MapType`: Add support for keys sorted
* `ArrowFormat::MapType`: Always use "key"/"value"/"entries" for field names

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* GitHub Issue: apache#49422

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
1 parent 8c2d93c commit ed37b14

28 files changed, +1612 -127 lines changed

.github/workflows/integration.yml

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,7 @@ on:
       - 'integration/**'
       - 'cpp/**'
       - 'format/**'
+      - 'ruby/red-arrow-format/**'
   pull_request:
     paths:
       - '.dockerignore'
@@ -43,6 +44,7 @@ on:
       - 'integration/**'
       - 'cpp/**'
       - 'format/**'
+      - 'ruby/red-arrow-format/**'
 
 concurrency:
   group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}

ci/docker/conda-integration.dockerfile

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ RUN mamba install -q -y \
         nodejs=${node} \
         yarn=${yarn} \
         openjdk=${jdk} \
+        ruby \
         zstd && \
     mamba clean --yes --all --force-pkgs-dirs
 

ci/scripts/integration_arrow.sh

Lines changed: 6 additions & 4 deletions
@@ -24,9 +24,14 @@ build_dir=${2}
 
 gold_dir=$arrow_dir/testing/data/arrow-ipc-stream/integration
 
+# For backward compatibility.
 : "${ARROW_INTEGRATION_CPP:=ON}"
+: "${ARCHERY_INTEGRATION_WITH_CPP:=$([ "${ARROW_INTEGRATION_CPP}" = "ON" ] && echo "1" || echo "0")}"
+export ARCHERY_INTEGRATION_WITH_CPP
+: "${ARCHERY_INTEGRATION_WITH_RUBY:=1}"
+export ARCHERY_INTEGRATION_WITH_RUBY
 
-: "${ARCHERY_INTEGRATION_TARGET_IMPLEMENTATIONS:=cpp}"
+: "${ARCHERY_INTEGRATION_TARGET_IMPLEMENTATIONS:=cpp,ruby}"
 export ARCHERY_INTEGRATION_TARGET_IMPLEMENTATIONS
 
 . "${arrow_dir}/ci/scripts/util_log.sh"
@@ -57,14 +62,11 @@ export PYTHONFAULTHANDLER=1
 export GOMEMLIMIT=200MiB
 export GODEBUG=gctrace=1,clobberfree=1
 
-ARCHERY_WITH_CPP=$([ "$ARROW_INTEGRATION_CPP" == "ON" ] && echo "1" || echo "0")
-
 # Rust can be enabled by exporting ARCHERY_INTEGRATION_WITH_RUST=1
 time archery integration \
     --run-c-data \
     --run-ipc \
     --run-flight \
-    --with-cpp="${ARCHERY_WITH_CPP}" \
     --gold-dirs="$gold_dir/0.14.1" \
     --gold-dirs="$gold_dir/0.17.1" \
     --gold-dirs="$gold_dir/1.0.0-bigendian" \
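The backward-compatibility defaulting in the script above can be restated as a small Python sketch. The helper name `resolve_with_cpp` is hypothetical and not part of Archery. It assumes the usual `${VAR:=default}` semantics, where an unset or empty variable takes the default.

```python
def resolve_with_cpp(environ):
    """Hypothetical helper mirroring the shell defaulting on a plain dict."""
    # ${ARROW_INTEGRATION_CPP:=ON}: fall back to "ON" when unset or empty.
    legacy = environ.get("ARROW_INTEGRATION_CPP") or "ON"
    # ${ARCHERY_INTEGRATION_WITH_CPP:=...}: an explicit non-empty value wins.
    explicit = environ.get("ARCHERY_INTEGRATION_WITH_CPP")
    if explicit:
        return explicit
    # Otherwise derive the new variable from the legacy one.
    return "1" if legacy == "ON" else "0"


if __name__ == "__main__":
    print(resolve_with_cpp({"ARROW_INTEGRATION_CPP": "OFF"}))
```

Under these assumptions, existing callers that only set `ARROW_INTEGRATION_CPP=OFF` still disable C++, while callers that set `ARCHERY_INTEGRATION_WITH_CPP` directly take precedence.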

ci/scripts/integration_arrow_build.sh

Lines changed: 10 additions & 1 deletion
@@ -22,7 +22,10 @@ set -e
 arrow_dir=${1}
 build_dir=${2}
 
+# For backward compatibility.
 : "${ARROW_INTEGRATION_CPP:=ON}"
+: "${ARCHERY_INTEGRATION_WITH_CPP:=$([ "${ARROW_INTEGRATION_CPP}" = "ON" ] && echo "1" || echo "0")}"
+: "${ARCHERY_INTEGRATION_WITH_RUBY:=1}"
 
 . "${arrow_dir}/ci/scripts/util_log.sh"
 
@@ -41,7 +44,7 @@ fi
 github_actions_group_end
 
 github_actions_group_begin "Integration: Build: C++"
-if [ "${ARROW_INTEGRATION_CPP}" == "ON" ]; then
+if [ "${ARCHERY_INTEGRATION_WITH_CPP}" -gt "0" ]; then
     "${arrow_dir}/ci/scripts/cpp_build.sh" "${arrow_dir}" "${build_dir}"
 fi
 github_actions_group_end
@@ -69,3 +72,9 @@ if [ "${ARCHERY_INTEGRATION_WITH_JS}" -gt "0" ]; then
     cp -a "${arrow_dir}/js" "${build_dir}/js"
 fi
 github_actions_group_end
+
+github_actions_group_begin "Integration: Build: Ruby"
+if [ "${ARCHERY_INTEGRATION_WITH_RUBY}" -gt "0" ]; then
+    rake -C "${arrow_dir}/ruby/red-arrow-format" install
+fi
+github_actions_group_end

dev/archery/archery/cli.py

Lines changed: 5 additions & 1 deletion
@@ -667,7 +667,8 @@ def _set_default(opt, default):
 @click.option('--random-seed', type=int, default=12345,
               help="Seed for PRNG when generating test data")
 @click.option('--with-cpp', type=bool, default=False,
-              help='Include C++ in integration tests')
+              help='Include C++ in integration tests',
+              envvar="ARCHERY_INTEGRATION_WITH_CPP")
 @click.option('--with-dotnet', type=bool, default=False,
               help='Include .NET in integration tests',
               envvar="ARCHERY_INTEGRATION_WITH_DOTNET")
@@ -683,6 +684,9 @@ def _set_default(opt, default):
 @click.option('--with-nanoarrow', type=bool, default=False,
               help='Include nanoarrow in integration tests',
               envvar="ARCHERY_INTEGRATION_WITH_NANOARROW")
+@click.option('--with-ruby', type=bool, default=False,
+              help='Include Ruby in integration tests',
+              envvar="ARCHERY_INTEGRATION_WITH_RUBY")
 @click.option('--with-rust', type=bool, default=False,
               help='Include Rust in integration tests',
               envvar="ARCHERY_INTEGRATION_WITH_RUST")

dev/archery/archery/integration/datagen.py

Lines changed: 9 additions & 2 deletions
@@ -1937,13 +1937,15 @@ def get_generated_json_files(tempdir=None):
         .skip_tester('Java')
         .skip_tester('JS')
         .skip_tester('nanoarrow')
+        .skip_tester('Ruby')
         .skip_tester('Rust')
         .skip_tester('Go'),
 
         generate_decimal64_case()
         .skip_tester('Java')
         .skip_tester('JS')
         .skip_tester('nanoarrow')
+        .skip_tester('Ruby')
         .skip_tester('Rust')
         .skip_tester('Go'),
 
@@ -1993,32 +1995,37 @@ def get_generated_json_files(tempdir=None):
         .skip_tester('nanoarrow')
         .skip_tester('Java')  # TODO(ARROW-7779)
         # TODO(https://github.com/apache/arrow/issues/38045)
-        .skip_format(SKIP_FLIGHT, '.NET'),
+        .skip_format(SKIP_FLIGHT, '.NET')
+        .skip_tester('Ruby'),
 
         generate_run_end_encoded_case()
         .skip_tester('.NET')
         .skip_tester('JS')
         # TODO(https://github.com/apache/arrow-nanoarrow/issues/618)
         .skip_tester('nanoarrow')
+        .skip_tester('Ruby')
         .skip_tester('Rust'),
 
         generate_binary_view_case()
         .skip_tester('JS')
         # TODO(https://github.com/apache/arrow-nanoarrow/issues/618)
         .skip_tester('nanoarrow')
+        .skip_tester('Ruby')
         .skip_tester('Rust'),
 
         generate_list_view_case()
         .skip_tester('.NET')  # Doesn't support large list views
         .skip_tester('JS')
         # TODO(https://github.com/apache/arrow-nanoarrow/issues/618)
         .skip_tester('nanoarrow')
+        .skip_tester('Ruby')
         .skip_tester('Rust'),
 
         generate_extension_case()
         .skip_tester('nanoarrow')
         # TODO(https://github.com/apache/arrow/issues/38045)
-        .skip_format(SKIP_FLIGHT, '.NET'),
+        .skip_format(SKIP_FLIGHT, '.NET')
+        .skip_tester('Ruby'),
     ]
 
     generated_paths = []

dev/archery/archery/integration/runner.py

Lines changed: 9 additions & 3 deletions
@@ -196,9 +196,11 @@ def _gold_tests(self, gold_dir):
                 skip_testers.add(".NET")
                 skip_testers.add("Java")
                 skip_testers.add("JS")
+                skip_testers.add("Ruby")
                 skip_testers.add("Rust")
             if prefix == '2.0.0-compression':
                 skip_testers.add("JS")
+                skip_testers.add("Ruby")
             if prefix == '2.0.0-compression' and 'lz4' in name:
                 # https://github.com/apache/arrow-nanoarrow/issues/621
                 skip_testers.add("nanoarrow")
@@ -590,9 +592,9 @@ def get_static_json_files():
 
 
 def select_testers(with_cpp=True, with_java=True, with_js=True,
-                   with_dotnet=True, with_go=True, with_rust=False,
-                   with_nanoarrow=False, target_implementations="",
-                   **kwargs):
+                   with_dotnet=True, with_go=True, with_ruby=False,
+                   with_rust=False, with_nanoarrow=False,
+                   target_implementations="", **kwargs):
     target_implementations = (target_implementations.split(",")
                               if target_implementations else [])
 
@@ -629,6 +631,10 @@ def append_tester(implementation, tester):
         from .tester_nanoarrow import NanoarrowTester
         append_tester("nanoarrow", NanoarrowTester(**kwargs))
 
+    if with_ruby:
+        from .tester_ruby import RubyTester
+        append_tester("ruby", RubyTester(**kwargs))
+
     if with_rust:
         from .tester_rust import RustTester
         append_tester("rust", RustTester(**kwargs))
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+
+from .tester import Tester
+from .util import run_cmd, log
+from ..utils.source import ARROW_ROOT_DEFAULT
+
+
+_EXE_PATH = os.path.join(
+    ARROW_ROOT_DEFAULT, "ruby/red-arrow-format/bin/red-arrow-format-integration-test")
+
+
+class RubyTester(Tester):
+    PRODUCER = True
+    CONSUMER = True
+
+    name = "Ruby"
+
+    def _run(self, env):
+        command_line = [_EXE_PATH]
+        if self.debug:
+            command_line_string = ""
+            for key, value in env.items():
+                command_line_string += f"{key}={value} "
+            command_line_string += " ".join(command_line)
+            log(command_line_string)
+        run_cmd(command_line, env=os.environ | env)
+
+    def validate(self, json_path, arrow_path, quirks=None):
+        env = {
+            "ARROW": arrow_path,
+            "COMMAND": "validate",
+            "JSON": json_path,
+        }
+        if quirks:
+            for quirk in quirks:
+                env[f"QUIRK_{quirk.upper()}"] = "true"
+        self._run(env)
+
+    def json_to_file(self, json_path, arrow_path):
+        env = {
+            "ARROW": arrow_path,
+            "COMMAND": "json-to-file",
+            "JSON": json_path,
+        }
+        self._run(env)
+
+    def stream_to_file(self, stream_path, file_path):
+        env = {
+            "ARROW": file_path,
+            "ARROWS": stream_path,
+            "COMMAND": "stream-to-file",
+        }
+        self._run(env)
+
+    def file_to_stream(self, file_path, stream_path):
+        env = {
+            "ARROW": file_path,
+            "ARROWS": stream_path,
+            "COMMAND": "file-to-stream",
+        }
+        self._run(env)
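The tester above hands its inputs to the driver through environment variables rather than command line arguments. The following is a minimal, self-contained sketch of that pattern, not the Archery implementation. `quirk_env` and `run_with_env` are hypothetical names, and a small Python child process stands in for the Ruby driver so the sketch runs without Ruby installed.

```python
import os
import subprocess
import sys


def quirk_env(quirks):
    """Map quirk names to QUIRK_<NAME>=true entries, as validate() does."""
    return {f"QUIRK_{quirk.upper()}": "true" for quirk in (quirks or [])}


def run_with_env(extra_env):
    """Run a stand-in child that echoes COMMAND from its environment."""
    # Assumption: this child replaces the real Ruby driver for illustration.
    child = [sys.executable, "-c", "import os; print(os.environ['COMMAND'])"]
    # Overlay the extra variables on the inherited environment; this is
    # equivalent to `os.environ | extra_env` on Python 3.9 and later.
    merged = {**os.environ, **extra_env}
    result = subprocess.run(child, env=merged, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    env = {"COMMAND": "validate", **quirk_env(["no_map_field_names_validate"])}
    print(run_with_env(env))
```

Merging onto the inherited environment matters here: passing only the extra variables as `env` would drop `PATH` and similar settings that the child process may need.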

dev/archery/archery/integration/util.py

Lines changed: 8 additions & 1 deletion
@@ -17,6 +17,7 @@
 
 import contextlib
 import io
+import os
 import random
 import socket
 import subprocess
@@ -137,7 +138,13 @@ def run_cmd(cmd, **kwargs):
     except subprocess.CalledProcessError as e:
         # this avoids hiding the stdout / stderr of failed processes
         sio = io.StringIO()
-        print('Command failed:', " ".join(cmd), file=sio)
+        command_line_string = ''
+        env = kwargs.get('env', {})
+        for key in env.keys() - os.environ.keys():
+            value = env[key]
+            command_line_string += f'{key}={value} '
+        command_line_string += ' '.join(cmd)
+        print(f'Command failed: {command_line_string}', file=sio)
         print('With output:', file=sio)
         print('--------------', file=sio)
         print(frombytes(e.output), file=sio)
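The failure message built in `run_cmd` above lists only the variables whose names are absent from the parent environment. A rough sketch of that key-difference step follows. `format_failed_command` is a hypothetical helper, and it sorts the keys for a stable order, which the diff above does not do.

```python
def format_failed_command(cmd, env, base_env):
    """Prefix cmd with the KEY=value pairs that env adds on top of base_env."""
    # dict key views support set difference; sorting is an addition made
    # here so the output order is deterministic.
    added = sorted(env.keys() - base_env.keys())
    prefix = "".join(f"{key}={env[key]} " for key in added)
    return prefix + " ".join(cmd)


if __name__ == "__main__":
    base = {"PATH": "/bin"}
    env = {**base, "COMMAND": "validate", "ARROW": "a.arrow"}
    print(format_failed_command(["driver"], env, base))
```

Because the comparison looks only at key names, a variable that already exists in the parent environment and is overridden with a new value would not appear in the message.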
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+#!/usr/bin/env ruby
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+require_relative "../lib/arrow-format"
+require_relative "../lib/arrow-format/integration/options"
+
+options = ArrowFormat::Integration::Options.singleton
+case options.command
+when "validate"
+  require_relative "../lib/arrow-format/integration/validate"
+when "json-to-file"
+  require_relative "../lib/arrow-format/integration/json-reader"
+  File.open(options.json, "r") do |input|
+    reader = ArrowFormat::Integration::JSONReader.new(input)
+    File.open(options.arrow, "wb") do |output|
+      writer = ArrowFormat::FileWriter.new(output)
+      writer.start(reader.schema)
+      reader.each do |record_batch|
+        writer.write_record_batch(record_batch)
+      end
+      writer.finish
+    end
+  end
+when "stream-to-file"
+  File.open(options.arrows, "rb") do |input|
+    reader = ArrowFormat::StreamingReader.new(input)
+    File.open(options.arrow, "wb") do |output|
+      writer = ArrowFormat::FileWriter.new(output)
+      writer.start(reader.schema)
+      reader.each do |record_batch|
+        writer.write_record_batch(record_batch)
+      end
+      writer.finish
+    end
+  end
+when "file-to-stream"
+  File.open(options.arrow, "rb") do |input|
+    reader = ArrowFormat::FileReader.new(input)
+    File.open(options.arrows, "wb") do |output|
+      writer = ArrowFormat::StreamingWriter.new(output)
+      writer.start(reader.schema)
+      reader.each do |record_batch|
+        writer.write_record_batch(record_batch)
+      end
+      writer.finish
+    end
+  end
+end
