Conversation
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the data CI image build process to use depsets, which is a good step towards better dependency management. It also includes several dependency version upgrades. I've found a couple of critical issues in the Wanda configuration files that will likely cause build failures. There's also a minor improvement that can be made to the new Dockerfile for clarity and best practices.
ci/docker/data.base.build.wanda.yaml
Outdated
| @@ -0,0 +1,9 @@ | |||
| name: "databuild-py$PYTHON" | |||
| froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"] | |||
| dockerfile: ci/docker/data.build.Dockerfile | |||
There was a problem hiding this comment.
| - name: databuild-multipy | ||
| label: "wanda: databuild-py{{matrix}}" | ||
| wanda: ci/docker/data.build.wanda.yaml | ||
| wanda: ci/docker/data.base.build.wanda.yaml |
There was a problem hiding this comment.
ci/docker/data.base.build.Dockerfile
Outdated
|
|
||
| SHELL ["/bin/bash", "-ice"] | ||
|
|
||
| COPY . . |
There was a problem hiding this comment.
The COPY . . command seems redundant as the required lock file is already explicitly copied on line 10. This broad COPY could unintentionally bring in other files from the build context, potentially increasing image size or causing unexpected behavior. For better clarity, hermeticity, and caching, it's best to remove this line if it's not strictly necessary for other files not declared in the wanda.yaml srcs.
ci/docker/datamongo.build.wanda.yaml
Outdated
| name: "datamongobuild-py$PYTHON" | ||
| froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"] | ||
| dockerfile: ci/docker/data.build.Dockerfile | ||
| dockerfile: t |
There was a problem hiding this comment.
Dockerfile path set to invalid value t
High Severity
The dockerfile field in datamongo.build.wanda.yaml is changed to t, which is not a valid Dockerfile path. This appears to be an accidental edit (possibly partial deletion). The previous value was ci/docker/data.build.Dockerfile. This will cause the datamongobuild-multipy CI build step to fail when trying to locate the Dockerfile.
| srcs: | ||
| - python/deplocks/ci/data_base_depset_py$PYTHON.lock | ||
| build_args: | ||
| - DOCKER_IMAGE_BASE_BUILD=cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON |
There was a problem hiding this comment.
Missing PYTHON_DEPSET build argument in wanda config
Medium Severity
The data.base.build.wanda.yaml does not pass PYTHON_DEPSET as a build argument, so the Dockerfile would use its default value. However, the default python/deplocks/ci/data_base_depset_py$PYTHON.lock references $PYTHON which is never defined as a Dockerfile ARG, causing it to expand to an empty string. The resulting path data_base_depset_py.lock (without version) won't match the actual lock file. The similar datatfds.build.wanda.yaml correctly passes PYTHON_DEPSET explicitly.
Additional Locations (1)
| - python/requirements/ml/data-test-requirements.txt | ||
| - python/requirements/data/data-base.txt | ||
| constraints: | ||
| - /tmp/ray-deps/requirements_compiled_py3.13.txt |
There was a problem hiding this comment.
Hardcoded Python 3.13 constraints for 3.10/3.12 builds
Medium Severity
The constraints path hardcodes requirements_compiled_py3.13.txt but the depset builds target Python 3.10 and 3.12 (via build_arg_sets). All other depset configs in rayimg.depsets.yaml use ${PYTHON_VERSION} to match the constraints file to the target Python version. Using 3.13 constraints for 3.10/3.12 builds could result in dependency versions that are incompatible with those Python versions.
| if [[ -n "$ARROW_MONGO_VERSION" ]]; then | ||
| # Older versions of Arrow Mongo require an older version of NumPy. | ||
| pip install numpy==1.23.5 | ||
| fi |
There was a problem hiding this comment.
Dead code conditional branches never execute
Low Severity
The conditional blocks for ARROW_MONGO_VERSION and RAY_CI_JAVA_BUILD will never execute because the corresponding data.base.build.wanda.yaml doesn't pass these build args. Both ARGs default to empty, so the if [[ -n "$ARROW_MONGO_VERSION" ]] and if [[ $RAY_CI_JAVA_BUILD == 1 ]] conditions always evaluate to false. This appears to be code copied from data.build.Dockerfile that isn't needed for this simpler depset-based build.
Additional Locations (1)
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
ci/docker/data.base.build.Dockerfile
Outdated
|
|
||
| ARG ARROW_MONGO_VERSION= | ||
| ARG RAY_CI_JAVA_BUILD= | ||
| ARG PYTHON_DEPSET=python/deplocks/ci/data_base_depset_py$PYTHON.lock |
There was a problem hiding this comment.
Missing ARG declaration causes empty variable expansion
High Severity
The PYTHON_DEPSET ARG default value uses $PYTHON, but PYTHON is never declared as an ARG in this Dockerfile. Without an ARG PYTHON declaration, the variable expands to an empty string, causing the path to resolve to python/deplocks/ci/data_base_depset_py.lock (missing the version number). The wanda file also doesn't pass PYTHON or PYTHON_DEPSET as a build argument to override this default.
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
aslonnie
left a comment
There was a problem hiding this comment.
let me know when this is ready for review.
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…/ray-project/ray into elliot-barn-data-ci-dep-upgrades
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Fixes PerReplica -> tf.Tensor conversion error with TF 2.20 / Keras 3 by forcing tf.keras to use the Keras 2 API via tf-keras. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pytest 8.2+ removed support for pytest.warns(None). Use warnings.catch_warnings(record=True) instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
downloads.apache.org is a CDN mirror redirector that inconsistently serves old releases like Hadoop 3.2.4, causing CI Docker builds to fail silently. archive.apache.org is the permanent canonical archive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
With set -eo pipefail, `yes` receives SIGPIPE (exit 141) when hadoop closes the pipe after finishing. Replace with `echo "Y"` which sends one confirmation and exits cleanly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The GPU doctest target was missing this env var, causing TF 2.16+ to initialize Keras 3 which conflicts with tf_keras (Keras 2) imports in batch_inference.rst and transforming-data.rst. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>


Generating data depsets for data ci images