Closed
48 commits
6bbe0bd
Kafka Consumer Group Properties
talatuyarer Mar 18, 2025
4241b31
Need to remove properties prefix from properties to apply on Kafka
talatuyarer Mar 18, 2025
b2ef3b0
Expose add-modules JVM option in SDKHarnessOptions (#34289)
Abacn Mar 18, 2025
1003194
[AnomalyDetection] Refactor and improve Specifiable (#34310)
shunping Mar 18, 2025
33fd2bb
Bump golang.org/x/oauth2 from 0.26.0 to 0.28.0 in /sdks (#34190)
dependabot[bot] Mar 18, 2025
c3c10d2
Support directory separator for Python local filesystem (#34318)
liferoad Mar 18, 2025
10c22cf
Bump github.com/nats-io/nats.go from 1.39.0 to 1.39.1 in /sdks (#34029)
dependabot[bot] Mar 18, 2025
09e9993
Make yaml provider an abstract class.
robertwb Feb 25, 2025
1c0a362
Add support for python package dependencies on UDFs.
robertwb Feb 22, 2025
7126636
Add dependency support for delegating providers.
robertwb Feb 25, 2025
a47fa9b
Allow dependency declarations for java.
robertwb Feb 25, 2025
56b3ff6
Make yapf, lint, and mypy happy, fix test.
robertwb Mar 3, 2025
c088abe
[AnomalyDetection] Support offline detectors (#34311)
shunping Mar 18, 2025
00d9f00
add option to disable metrics (#34303)
Naireen Mar 19, 2025
b7ec790
initial schema transform provider implementation for SQS read functio…
prodriguezdefino Mar 17, 2025
e0df796
fixing checkstyle errors.
prodriguezdefino Mar 17, 2025
ea7980f
removing region configuration since its not needed
prodriguezdefino Mar 18, 2025
db4c87d
adding package info for the new providers
prodriguezdefino Mar 18, 2025
2392a7d
added tests to the provider
prodriguezdefino Mar 18, 2025
76a1374
[Dataflow Streaming] WindmillTimerInternals: Use a single map to stor…
arunpandianp Mar 19, 2025
16fb2fb
add equals hashCode to BoundedToUnboundedSourceAdapter (#34057)
scwhittle Mar 19, 2025
fbead7f
Remove cancelled tasks from ReadOperation queue when shutting down (#…
damccorm Mar 19, 2025
cb67604
[Prism] Refactor stageState to a behavior interface to reduce branch …
lostluck Mar 19, 2025
29785ca
More precise binary operation inference. (#34305)
robertwb Mar 19, 2025
d24cbf6
Fix precommit flink container (#34250)
akashorabek Mar 19, 2025
04e9744
Aggregation option in Kinesis Writer Python sdk (#34323)
Nakachi-S Mar 19, 2025
c6a8caa
Add an experiment named enableLineageRollup which when passed to Java…
rohitsinha54 Mar 16, 2025
0af1ede
Allow boundedtrie emission outside of lineage usecase
rohitsinha54 Mar 17, 2025
b2372b2
Update Lineage query result to support BoundedTrie and StringSet both
rohitsinha54 Mar 18, 2025
1f68c49
update CHANGES with Read bugs/improvements (#34344)
scwhittle Mar 19, 2025
6c557fe
Revert Skip BoundedTrie on Dataflow till service is have BoundedTrie …
rohitsinha54 Mar 14, 2025
2710ba8
Do not runner UsesBoundedTrie test on Dataflow runner as the test ver…
rohitsinha54 Mar 19, 2025
00cb04b
[#31438] Add changes.md message for Trigger support in Prism. (#34346)
lostluck Mar 19, 2025
a437dc4
Bump cloud.google.com/go/storage from 1.50.0 to 1.51.0 in /sdks (#34340)
dependabot[bot] Mar 19, 2025
55a6a67
Bump serialize-javascript and mocha in /sdks/typescript (#34012)
dependabot[bot] Mar 19, 2025
667b045
Bump github.com/fsouza/fake-gcs-server from 1.52.1 to 1.52.2 in /sdks…
dependabot[bot] Mar 19, 2025
457fe82
Bump golang.org/x/net from 0.23.0 to 0.36.0 in /learning/katas/go (#3…
dependabot[bot] Mar 19, 2025
6c156c5
Fix pandas doctests sensitive to NumpyExtensionArray formatting. (#34…
robertwb Mar 19, 2025
5ad150b
Bump google.golang.org/api from 0.221.0 to 0.226.0 in /sdks (#34293)
dependabot[bot] Mar 19, 2025
65fab8c
Fix relative path yaml provider error when no base provided.
robertwb Mar 19, 2025
ded696c
Fixed infinite loop
talatuyarer Mar 20, 2025
c73226a
Avoid holding data elements alive via stack frame gc roots. (#33086)
scwhittle Mar 20, 2025
3bd8e02
Remove finalizers from sdb-cluster before deleting singlestoreio name…
akashorabek Mar 20, 2025
9864fd6
Bump google.golang.org/api from 0.226.0 to 0.227.0 in /sdks (#34356)
dependabot[bot] Mar 20, 2025
0e0bc35
Bump github.com/nats-io/nats-server/v2 from 2.10.25 to 2.11.0 in /sdk…
dependabot[bot] Mar 20, 2025
a90e411
[IcebergIO] Filter out data files that have already been committed (#…
ahmedabu98 Mar 20, 2025
b7d6e88
[Python] File staging to user worker support (#34208)
stankiewicz Mar 20, 2025
e96dff4
Fix dockerfile version (#34352)
damccorm Mar 20, 2025
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
{
"comment": "Modify this file in a trivial way to cause this test suite to run",
- "modification": 1
+ "modification": 2,
+ "https://github.com/apache/beam/pull/34294": "noting that PR #34294 should run this test"
}
33 changes: 30 additions & 3 deletions .github/workflows/beam_PreCommit_Flink_Container.yml
@@ -77,6 +77,10 @@ env:
HARNESS_IMAGES_TO_PULL: gcr.io/apache-beam-testing/beam-sdk/beam_go_sdk:latest
JOB_SERVER_IMAGE: gcr.io/apache-beam-testing/beam_portability/beam_flink1.17_job_server:latest
ARTIFACTS_DIR: gs://beam-flink-cluster/beam-precommit-flink-container-${{ github.run_id }}
+ DOCKER_REGISTRY: gcr.io
+ DOCKER_REPOSITORY_ROOT: ${{ github.event_name == 'pull_request_target' && 'gcr.io/apache-beam-testing/beam-sdk-pr' || 'gcr.io/apache-beam-testing/beam-sdk' }}
+ PYTHON_VERSION: 3.9
+ PYTHON_SDK_IMAGE_TAG: latest

jobs:
beam_PreCommit_Flink_Container:
@@ -87,7 +91,7 @@ jobs:
github.event_name == 'pull_request_target' ||
github.event.comment.body == 'Run Flink Container PreCommit'
runs-on: [self-hosted, ubuntu-20.04, main]
- timeout-minutes: 45
+ timeout-minutes: 90
name: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
strategy:
matrix:
@@ -105,6 +109,24 @@ jobs:
uses: ./.github/actions/setup-environment-action
with:
python-version: default
+ - name: GCloud Docker credential helper
+ if: ${{ github.event_name == 'pull_request_target' }}
+ run: |
+ gcloud auth configure-docker ${{ env.DOCKER_REGISTRY }}
+ - name: Set PYTHON_SDK_IMAGE_TAG unique variable based on timestamp
+ if: ${{ github.event_name == 'pull_request_target' }}
+ run: echo "PYTHON_SDK_IMAGE_TAG=$(date +'%Y%m%d-%H%M%S%N')" >> $GITHUB_ENV
+ - name: Build and push to registry
+ if: ${{ github.event_name == 'pull_request_target' }}
+ uses: ./.github/actions/gradle-command-self-hosted-action
+ with:
+ gradle-command: :sdks:python:container:py39:docker
+ arguments: |
+ -PpythonVersion=${{ env.PYTHON_VERSION }} \
+ -Pdocker-repository-root=${{ env.DOCKER_REPOSITORY_ROOT }} \
+ -Pdocker-tag=${{ env.PYTHON_SDK_IMAGE_TAG }} \
+ -PuseBuildx \
+ -Ppush-containers
- name: Prepare test arguments
uses: ./.github/actions/test-arguments-action
with:
@@ -141,11 +163,11 @@ jobs:
arguments: |
-PloadTest.mainClass=apache_beam.testing.load_tests.combine_test \
-Prunner=FlinkRunner \
- '-PloadTest.args=${{ env.beam_PreCommit_Flink_Container_test_arguments_2 }} --job_name=flink-tests-python-${{env.NOW_UTC}}'
+ '-PloadTest.args=${{ env.beam_PreCommit_Flink_Container_test_arguments_2 }} --environment_config=${{ env.DOCKER_REPOSITORY_ROOT }}/beam_python${{ env.PYTHON_VERSION }}_sdk:${{ env.PYTHON_SDK_IMAGE_TAG }} --job_name=flink-tests-python-${{env.NOW_UTC}}'

# Run a Java Combine load test to verify the Flink container
- name: Run Flink Container Test with Java Combine
- timeout-minutes: 10
+ timeout-minutes: 20
uses: ./.github/actions/gradle-command-self-hosted-action
with:
gradle-command: :sdks:java:testing:load-tests:run
@@ -158,3 +180,8 @@ jobs:
if: always()
run: |
${{ github.workspace }}/.test-infra/dataproc/flink_cluster.sh delete

+ - name: Cleanup Python SDK Container
+ if: ${{ always() && github.event_name == 'pull_request_target' }}
+ run: |
+ gcloud container images delete ${{ env.DOCKER_REPOSITORY_ROOT }}/beam_python${{ env.PYTHON_VERSION }}_sdk:${{ env.PYTHON_SDK_IMAGE_TAG }} --force-delete-tags --quiet
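
Note on the image reference used above: the same string is assembled in two places (the `--environment_config` argument and the cleanup step), so it may help to see the composition in isolation. The following is a minimal sketch, not part of the PR; `compose_image_ref` is a hypothetical helper, and its first argument stands in for `github.event_name`. The repository roots and the `beam_python<version>_sdk` naming are copied from the workflow.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the PR): mirrors how the workflow combines
# DOCKER_REPOSITORY_ROOT, PYTHON_VERSION and PYTHON_SDK_IMAGE_TAG.
compose_image_ref() {
  local event_name="$1" python_version="$2" tag="$3"
  # Default root, used for every trigger other than pull_request_target.
  local root="gcr.io/apache-beam-testing/beam-sdk"
  if [[ "$event_name" == "pull_request_target" ]]; then
    # PR runs push to a separate repository so they never overwrite :latest.
    root="gcr.io/apache-beam-testing/beam-sdk-pr"
  fi
  printf '%s/beam_python%s_sdk:%s\n' "$root" "$python_version" "$tag"
}

compose_image_ref pull_request_target 3.9 20250319-101500123456789
# prints gcr.io/apache-beam-testing/beam-sdk-pr/beam_python3.9_sdk:20250319-101500123456789
compose_image_ref schedule 3.9 latest
# prints gcr.io/apache-beam-testing/beam-sdk/beam_python3.9_sdk:latest
```

The timestamp tag in the workflow relies on `%N` (nanoseconds), which is a GNU `date` extension; that appears fine for the Ubuntu self-hosted runners used here but would not carry over to BSD/macOS `date`.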
Original file line number Diff line number Diff line change
@@ -18,6 +18,5 @@
--parallelism=2
--job_endpoint=localhost:8099
--environment_type=DOCKER
- --environment_config=gcr.io/apache-beam-testing/beam-sdk/beam_python3.9_sdk:latest
--top_count=10
--runner=PortableRunner
2 changes: 1 addition & 1 deletion .test-infra/tools/stale_dataflow_prebuilt_image_cleaner.sh
@@ -22,7 +22,7 @@ set -euo pipefail
# Clean up private registry (us.gcr.io)
# Images more than 5 day old and not the latest (either has latest label or newest)

- PUBLIC_REPOSITORIES=(beam-sdk beam_portability beamgrafana beammetricssyncjenkins beammetricssyncgithub)
+ PUBLIC_REPOSITORIES=(beam-sdk beam-sdk-pr beam_portability beamgrafana beammetricssyncjenkins beammetricssyncgithub)
PRIVATE_REPOSITORIES=(java-postcommit-it python-postcommit-it jenkins github-actions)
# set as the same as 6-week release period
if [[ $OSTYPE == "linux-gnu"* ]]; then
11 changes: 7 additions & 4 deletions .test-infra/tools/stale_k8s_workload_cleaner.sh
@@ -25,6 +25,7 @@ set -euo pipefail
PROJECT=apache-beam-testing
LOCATION=us-central1-a
CLUSTER=io-datastores
+ MEMSQL_CLUSTER_RESOURCE="memsqlclusters.memsql.com/sdb-cluster"

function should_teardown() {
if [[ $1 =~ ^([0-9]+)([a-z]) ]]; then
@@ -43,10 +44,12 @@ function should_teardown() {
gcloud container clusters get-credentials io-datastores --zone us-central1-a --project apache-beam-testing

while read NAME STATUS AGE; do
- # Regex has temporary workaround to avoid trying to delete beam-performancetests-singlestoreio-* to avoid getting stuck in a terminal state
- # See https://github.com/apache/beam/pull/33545 for context.
- # This may be safe to remove if https://cloud.google.com/knowledge/kb/deleted-namespace-remains-in-terminating-status-000004867 has been resolved, just try it before checking in :)
- if [[ $NAME =~ ^beam-.+(test|-it)(?!s-singlestoreio) ]] && should_teardown $AGE; then
+ if [[ $NAME =~ ^beam-.+(test|-it) ]] && should_teardown $AGE; then
+ # For namespaces containing "-singlestoreio-", remove the finalizers from the sdb-cluster resource
+ # to ensure it can be fully deleted and not block namespace removal.
+ if [[ $NAME == *-singlestoreio-* ]]; then
+ kubectl patch $MEMSQL_CLUSTER_RESOURCE -n $NAME -p '[{"op": "remove", "path": "/metadata/finalizers"}]' --type=json
+ fi
kubectl delete namespace $NAME
fi
done < <( kubectl get namespaces --context=gke_${PROJECT}_${LOCATION}_${CLUSTER} )
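
Note on the new matching logic: bash's `=~` operator uses POSIX extended regular expressions, which do not define lookahead, so the removed `(?!s-singlestoreio)` pattern was not a reliable way to exclude those namespaces. The change replaces it with a plain regex plus a glob test. The sketch below illustrates the resulting decision for a few sample names; `classify_namespace` is a hypothetical helper, the namespace names are made up, and the `should_teardown` age check from the script is deliberately left out.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the PR): prints the action the cleanup loop
# would take for a namespace name, ignoring the should_teardown age check.
classify_namespace() {
  local name="$1"
  if [[ $name =~ ^beam-.+(test|-it) ]]; then
    if [[ $name == *-singlestoreio-* ]]; then
      # The script first strips finalizers from the sdb-cluster resource.
      echo "patch-then-delete"
    else
      echo "delete"
    fi
  else
    echo "skip"
  fi
}

classify_namespace beam-performancetests-singlestoreio-123  # prints patch-then-delete
classify_namespace beam-loadtests-kafka-456                 # prints delete
classify_namespace kube-system                              # prints skip
```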
12 changes: 9 additions & 3 deletions CHANGES.md
@@ -66,13 +66,17 @@

* Support for X source added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
* [Java] Use API compatible with both com.google.cloud.bigdataoss:util 2.x and 3.x in BatchLoads ([#34105](https://github.com/apache/beam/pull/34105))
* [IcebergIO] Address edge case where bundle retry following a successful data commit results in data duplication ([#34264](https://github.com/apache/beam/pull/34264))

## New Features / Improvements

* Support custom coders in Reshuffle ([#29908](https://github.com/apache/beam/issues/29908), [#33356](https://github.com/apache/beam/issues/33356)).
* [Python] Support custom coders in Reshuffle ([#29908](https://github.com/apache/beam/issues/29908), [#33356](https://github.com/apache/beam/issues/33356)).
* [Java] Upgrade SLF4J to 2.0.16. Update default Spark version to 3.5.0. ([#33574](https://github.com/apache/beam/pull/33574))
* [Java] Support for `--add-modules` JVM option is added through a new pipeline option `JdkAddRootModules`. This allows extending the module graph with optional modules such as SDK incubator modules. Sample usage: `<pipeline invocation> --jdkAddRootModules=jdk.incubator.vector` ([#30281](https://github.com/apache/beam/issues/30281)).
* X feature added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
* Managed API for [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/managed/Managed.html) and [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.managed.html#module-apache_beam.transforms.managed) supports [key I/O connectors](https://beam.apache.org/documentation/io/connectors/) Iceberg, Kafka, and BigQuery.
* Prism now supports event time triggers for most common cases. ([#31438](https://github.com/apache/beam/issues/31438))
* Prism does not yet support triggered side inputs, or triggers on merging windows (such as session windows).

## Breaking Changes

@@ -91,7 +95,9 @@
* (Java) Fixed TIME field encodings for BigQuery Storage API writes on GenericRecords ([#34059](https://github.com/apache/beam/pull/34059)).
* (Java) Fixed a race condition in JdbcIO which could cause hangs trying to acquire a connection ([#34058](https://github.com/apache/beam/pull/34058)).
* (Java) Fix BigQuery Storage Write compatibility with Avro 1.8 ([#34281](https://github.com/apache/beam/pull/34281)).
* Fixed checkpoint recovery and streaming behavior in Spark Classic and Portable runner's Flatten transform by replacing queueStream with SingleEmitInputDStream ([#34080](https://github.com/apache/beam/pull/34080), [#18144](https://github.com/apache/beam/issues/18144), [#20426](https://github.com/apache/beam/issues/20426)).
* Fixed checkpoint recovery and streaming behavior in Spark Classic and Portable runner's Flatten transform by replacing queueStream with SingleEmitInputDStream ([#34080](https://github.com/apache/beam/pull/34080), [#18144](https://github.com/apache/beam/issues/18144), [#20426](https://github.com/apache/beam/issues/20426))
* (Java) Fixed Read caching of UnboundedReader objects to effectively cache across multiple DoFns and avoid checkpointing unstarted reader. [#34146](https://github.com/apache/beam/pull/34146) [#33901](https://github.com/apache/beam/pull/33901)


## Security Fixes
* Fixed (CVE-YYYY-NNNN)[https://www.cve.org/CVERecord?id=CVE-YYYY-NNNN] (Java/Python/Go) ([#X](https://github.com/apache/beam/issues/X)).
@@ -133,7 +139,7 @@
* [Dataflow Streaming] Enable Windmill GetWork Response Batching by default ([#33847](https://github.com/apache/beam/pull/33847)).
* With this change user workers will request batched GetWork responses from backend and backend will send multiple WorkItems in the same response proto.
* The feature can be disabled by passing `--windmillRequestBatchedGetWorkResponse=false`

* Added support for staging arbitrary files via the `--files_to_stage` flag (Python) ([#34208](https://github.com/apache/beam/pull/34208)).
## Breaking Changes

* AWS V1 I/Os have been removed (Java). As part of this, x-lang Python Kinesis I/O has been updated to consume the V2 IO and it also no longer supports setting producer_properties ([#33430](https://github.com/apache/beam/issues/33430)).
Original file line number Diff line number Diff line change
@@ -573,7 +573,7 @@ class BeamModulePlugin implements Plugin<Project> {
}

project.ext.useBuildx = {
- return project.containerArchitectures() != [project.nativeArchitecture()]
+ return (project.containerArchitectures() != [project.nativeArchitecture()]) || project.rootProject.hasProperty("useBuildx")
}

/** ***********************************************************************************************/
4 changes: 2 additions & 2 deletions examples/notebooks/beam-ml/run_inference_vllm.ipynb
@@ -338,9 +338,9 @@
"RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10 && pip install --upgrade pip\n",
"\n",
"# Copy the Apache Beam worker dependencies from the Beam Python 3.10 SDK image.\n",
- "COPY --from=apache/beam_python3.10_sdk:2.60.0 /opt/apache/beam /opt/apache/beam\n",
+ "COPY --from=apache/beam_python3.10_sdk:2.61.0 /opt/apache/beam /opt/apache/beam\n",
"\n",
- "RUN pip install --no-cache-dir -vvv apache-beam[gcp]==2.60.0\n",
+ "RUN pip install --no-cache-dir -vvv apache-beam[gcp]==2.61.0\n",
"RUN pip install openai>=1.52.2 vllm>=0.6.3 triton>=3.1.0\n",
"\n",
"RUN apt install libcairo2-dev pkg-config python3-dev -y\n",
4 changes: 2 additions & 2 deletions learning/katas/go/go.mod
@@ -19,7 +19,7 @@ go 1.14

require (
github.com/apache/beam/sdks/v2 v2.40.0
- github.com/google/go-cmp v0.5.8
- golang.org/x/net v0.23.0 // indirect
+ github.com/google/go-cmp v0.6.0
+ golang.org/x/net v0.36.0 // indirect
google.golang.org/genproto v0.0.0-20220815135757-37a418bb8959 // indirect
)