
Commit c082ab9

damccorm, mohamedawnallah, dependabot[bot], chamikaramj, and Abacn authored
Properly add milvus as extra dependency (#35986)
* sdks/python: properly make milvus as extra dependency
* sdks/python: update image requirements
* .github: trigger postcommit python
* sdks/python: fix linting issues
* sdks/python: fix formatting issues
* .github: trigger beam postcommit python
* sdks/python: revert milvus version in itests
* sdks/python: update image requirements
* trigger_files: trigger postcommit python
* Bump github.com/docker/go-connections from 0.5.0 to 0.6.0 in /sdks (#35906)
  Bumps [github.com/docker/go-connections](https://github.com/docker/go-connections) from 0.5.0 to 0.6.0.
  - [Commits](docker/go-connections@v0.5.0...v0.6.0)
  ---
  updated-dependencies:
  - dependency-name: github.com/docker/go-connections
    dependency-version: 0.6.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add the readme link to new YAML examples (#35941)
* Bump google.golang.org/api from 0.247.0 to 0.248.0 in /sdks (#35969)
* Remove mysql-connector-python dependency (#35932)
* Fix typos and update test implementation from #35656 (#35958)
* implement lambda name pickling in cloudpickle
* add enable_lambda_name to __init__
* fix formatting and lint
* fix typo
* fix code paths in test
* fix tests
* fix lint
* fix formatting and failing test
* fix formatting again
* remove cloudpickle implementation to leave only typo fixes and fixing test structure.
* fix _make_function typo
* revert regex
* fix failing tests
* fix formatting
* update prefix to not hardcode
* feat(mongodb): upgrade MongoDB Java driver to version 5.5.0 (#35946)
* feat(mongodb): upgrade MongoDB Java driver to version 5.5.0
  Update MongoDB Java driver from 3.12.11 to 5.5.0 and refactor code to use new API
  Add mongo-bson dependency required by new driver version
  Replace deprecated MongoClient with MongoClients and update GridFS implementation
* refactor(mongodb): update MongoDB client usage to modern API
  Replace deprecated MongoClient with MongoClients.create() and update database drop method
* build(dependencies): add mongodb driver core dependency
  Add mongodb-driver-core to support MongoDB Java driver functionality. Also mark mongo_java_driver as permitUnusedDeclared and add testImplementation.
* fix(mongodb): update embedded mongo version and fix split key filtering
  Update embedded MongoDB test dependency to version 3.5.4 and simplify split key filtering logic by using BsonObjectId for range queries. This ensures proper type handling when filtering MongoDB documents by _id field.
* build: add mongodb-driver-core dependency
  Add mongodb-driver-core version 5.5.0 to support MongoDB Java driver functionality
* use version
* refactor: simplify mongo client creation logic
  Remove redundant null check and consolidate uri handling in MongoDbGridFSIO
* Bump github.com/aws/aws-sdk-go-v2/credentials in /sdks (#35974)
  Bumps [github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2) from 1.18.6 to 1.18.7.
  - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
  - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/config/v1.18.7/CHANGELOG.md)
  - [Commits](aws/aws-sdk-go-v2@config/v1.18.6...config/v1.18.7)
  ---
  updated-dependencies:
  - dependency-name: github.com/aws/aws-sdk-go-v2/credentials
    dependency-version: 1.18.7
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump google.golang.org/grpc from 1.74.2 to 1.75.0 in /sdks (#35971)
  Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.74.2 to 1.75.0.
  - [Release notes](https://github.com/grpc/grpc-go/releases)
  - [Commits](grpc/grpc-go@v1.74.2...v1.75.0)
  ---
  updated-dependencies:
  - dependency-name: google.golang.org/grpc
    dependency-version: 1.75.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Override localhost endpoint when a worker is running in docker on mac (#35964)
* fix(parquetio): handle missing nullable fields in row conversion (#35948)
* fix(parquetio): handle missing nullable fields in row conversion
  Add null value handling when converting rows to Arrow tables for nullable fields that are missing from input data. This fixes KeyError when writing to Parquet with missing nullable fields, addressing issue #35791.
* fix lint
* Bump cloud.google.com/go/storage from 1.56.0 to 1.56.1 in /sdks (#35980)
  Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.56.0 to 1.56.1.
  - [Release notes](https://github.com/googleapis/google-cloud-go/releases)
  - [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
  - [Commits](googleapis/google-cloud-go@spanner/v1.56.0...storage/v1.56.1)
  ---
  updated-dependencies:
  - dependency-name: cloud.google.com/go/storage
    dependency-version: 1.56.1
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [Prism] Fix segv when docker container self-terminated. (#35977)
* Fix segv when docker container is self-terminated
* Add some debug logging for docker and process env.
* add a jinja % include/import pipeline example to docs (#35931)
* add a jinja include pipeline example
* update yaml doc with import example
* address gemini and other comments
* fix table of contents for readme
* add link to jinja pipeline examples
* Bump github.com/aws/aws-sdk-go-v2/config from 1.31.2 to 1.31.3 in /sdks (#35983)
* Add a security GCP log analyzer (#35922)
* Add the base log_analyzer
* Add github action for security logging
* Enhance LogAnalyzer to filter logs by time range and include file names in event summary
* Add dry-run option for weekly email report generation in LogAnalyzer
* Better error handling for timezones and missing details
* Refactor LogAnalyzer to use SinkCls for type consistency and enhance bucket permission management for log sinks
* update py containers (#35982)
* [YAML]: add import jinja pipeline example (#35945)
* add import jinja pipeline example
* revert name change
* update overall examples readme
* fix lint issue
* fix gemini small issue
* Update sdks/python/apache_beam/yaml/examples/transforms/jinja/import/README.md
---------
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
* workflows: capture DinD tests in PreCommit Py Coverage workflow
* workflows: temporarily removing `ubuntu-latest` till resolving deps
* workflows: add `matrix.os` label to `beam_PreCommit_Python_Coverage`
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Mohamed Awnallah <mohamedmohey2352@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chamikara Jayalath <chamikaramj@gmail.com>
Co-authored-by: Yi Hu <yathu@google.com>
Co-authored-by: kristynsmith <kristynsmith@google.com>
Co-authored-by: liferoad <huxiangqian@gmail.com>
Co-authored-by: Shunping Huang <shunping@google.com>
Co-authored-by: Derrick Williams <derrickaw@google.com>
Co-authored-by: Enrique Calderon <71863693+ksobrenat32@users.noreply.github.com>
Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
1 parent 2dcf5ff commit c082ab9

47 files changed (+1493 -509 lines changed)

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {
   "comment": "Modify this file in a trivial way to cause this test suite to run.",
-  "modification": 33
+  "modification": 27
 }

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# This workflow works with the GCP security log analyzer to
+# generate weekly security reports and initialize log sinks
+
+name: GCP Security Log Analyzer
+
+on:
+  workflow_dispatch:
+  schedule:
+    # Once a week at 9:00 AM on Monday
+    - cron: '0 9 * * 1'
+  push:
+    paths:
+      - 'infra/security/config.yml'
+
+# This allows a subsequently queued workflow run to interrupt previous runs
+concurrency:
+  group: '${{ github.workflow }} @ ${{ github.sha || github.head_ref || github.ref }}-${{ github.event.schedule || github.event.sender.login }}'
+  cancel-in-progress: true
+
+#Setting explicit permissions for the action to avoid the default permissions which are `write-all` in case of pull_request_target event
+permissions:
+  contents: read
+
+jobs:
+  beam_GCP_Security_LogAnalyzer:
+    name: GCP Security Log Analysis
+    runs-on: [self-hosted, ubuntu-20.04, main]
+    timeout-minutes: 30
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.13'
+
+      - name: Install Python dependencies
+        working-directory: ./infra/security
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt
+
+      - name: Setup gcloud
+        uses: google-github-actions/setup-gcloud@v2
+
+      - name: Initialize Log Sinks
+        if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
+        working-directory: ./infra/security
+        run: python log_analyzer.py --config config.yml initialize
+
+      - name: Generate Weekly Security Report
+        if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
+        working-directory: ./infra/security
+        env:
+          SMTP_SERVER: smtp.gmail.com
+          SMTP_PORT: 465
+          EMAIL_ADDRESS: ${{ secrets.ISSUE_REPORT_SENDER_EMAIL_ADDRESS }}
+          EMAIL_PASSWORD: ${{ secrets.ISSUE_REPORT_SENDER_EMAIL_PASSWORD }}
+          EMAIL_RECIPIENT: "dev@beam.apache.org"
+        run: python log_analyzer.py --config config.yml generate-report --dry-run
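
Review note: the two `if:` conditions in this workflow decide which step runs for each trigger. The sketch below restates that gating in Python so it can be checked at a glance. The function name `steps_for_event` is hypothetical and exists only for illustration; it is not part of the workflow or of `log_analyzer.py`.

```python
def steps_for_event(event_name: str) -> list[str]:
    """Return the gated steps that would run for a given github.event_name.

    Mirrors the two `if:` expressions in the workflow:
      - Initialize Log Sinks: push or workflow_dispatch
      - Generate Weekly Security Report: schedule or workflow_dispatch
    """
    steps = []
    if event_name in ("push", "workflow_dispatch"):
        steps.append("Initialize Log Sinks")
    if event_name in ("schedule", "workflow_dispatch"):
        steps.append("Generate Weekly Security Report")
    return steps


if __name__ == "__main__":
    for event in ("push", "schedule", "workflow_dispatch"):
        print(event, "->", steps_for_event(event))
```

A manual `workflow_dispatch` is the only trigger that runs both steps. As committed, the report step passes `--dry-run`.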

.github/workflows/beam_PreCommit_Python_Coverage.yml

Lines changed: 27 additions & 10 deletions
@@ -58,35 +58,45 @@ env:

 jobs:
   beam_PreCommit_Python_Coverage:
-    name: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
-    runs-on: [self-hosted, ubuntu-20.04, highmem]
+    name: ${{ matrix.job_name }} (${{ matrix.job_phrase }} ${{ matrix.python_version }}) (${{ join(matrix.os, ', ') }})
+    runs-on: ${{ matrix.os }}
     strategy:
+      fail-fast: false
       matrix:
         job_name: [beam_PreCommit_Python_Coverage]
         job_phrase: [Run Python_Coverage PreCommit]
+        python_version: ['3.9']
+        # Run on both self-hosted and GitHub-hosted runners.
+        # Some tests (marked require_docker_in_docker) can't run on Beam's
+        # self-hosted runners due to Docker-in-Docker environment constraint.
+        # These tests will only execute on ubuntu-latest (GitHub-hosted).
+        # Context: https://github.com/apache/beam/pull/35585
+        # Temporary removed the ubuntu-latest env till resolving deps issues.
+        os: [[self-hosted, ubuntu-20.04, highmem]]
     timeout-minutes: 180
     if: |
       github.event_name == 'push' ||
       github.event_name == 'pull_request_target' ||
       (github.event_name == 'schedule' && github.repository == 'apache/beam') ||
       github.event_name == 'workflow_dispatch' ||
-      github.event.comment.body == 'Run Python_Coverage PreCommit'
+      startswith(github.event.comment.body, 'Run Python_Coverage PreCommit 3.')
     steps:
       - uses: actions/checkout@v4
       - name: Setup repository
         uses: ./.github/actions/setup-action
         with:
-          comment_phrase: ${{ matrix.job_phrase }}
+          comment_phrase: ${{ matrix.job_phrase }} ${{ matrix.python_version }}
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
+          github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }} ${{ matrix.python_version }}) (${{ join(matrix.os, ', ') }})
       - name: Setup environment
         uses: ./.github/actions/setup-environment-action
         with:
           java-version: default
-          python-version: default
+          python-version: ${{ matrix.python_version }}
       - name: Start DinD
         uses: ./.github/actions/dind-up-action
         id: dind
+        if: contains(matrix.os, 'self-hosted')
         with:
           # Enable all the new features
           cleanup-dind-on-start: "true"
@@ -97,9 +107,9 @@ jobs:
           export-gh-env: "true"
       - name: Run preCommitPyCoverage
         env:
-          DOCKER_HOST: ${{ steps.dind.outputs.docker-host }}
+          DOCKER_HOST: ${{ contains(matrix.os, 'self-hosted') && steps.dind.outputs.docker-host || '' }}
           TOX_TESTENV_PASSENV: "DOCKER_*,TESTCONTAINERS_*,TC_*,BEAM_*,GRPC_*,OMP_*,OPENBLAS_*,PYTHONHASHSEED,PYTEST_*"
-          TESTCONTAINERS_HOST_OVERRIDE: ${{ env.DIND_IP }}
+          TESTCONTAINERS_HOST_OVERRIDE: ${{ contains(matrix.os, 'self-hosted') && env.DIND_IP || '' }}
           TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE: "/var/run/docker.sock"
           TESTCONTAINERS_RYUK_DISABLED: "false"
           TESTCONTAINERS_RYUK_CONTAINER_PRIVILEGED: "true"
@@ -110,6 +120,12 @@ jobs:
         uses: ./.github/actions/gradle-command-self-hosted-action
         with:
           gradle-command: :sdks:python:test-suites:tox:py39:preCommitPyCoverage
+          arguments: |
+            -Pposargs="${{
+              contains(matrix.os, 'self-hosted') &&
+              '-m (not require_docker_in_docker)' ||
+              '-m require_docker_in_docker'
+            }}"
       - uses: codecov/codecov-action@v3
         with:
           flags: python
@@ -118,7 +134,7 @@ jobs:
         uses: actions/upload-artifact@v4
         if: failure()
         with:
-          name: Python Test Results
+          name: Python ${{ matrix.python_version }} Test Results (${{ join(matrix.os, ', ') }})
           path: '**/pytest*.xml'
       - name: Publish Python Test Results
         env:
@@ -129,4 +145,5 @@ jobs:
           commit: '${{ env.prsha || env.GITHUB_SHA }}'
           comment_mode: ${{ github.event_name == 'issue_comment' && 'always' || 'off' }}
           files: '**/pytest*.xml'
-          large_files: true
+          large_files: true
+          check_name: "Python ${{ matrix.python_version }} Test Results (${{ join(matrix.os, ', ') }})"
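
Review note: several expressions in this diff use the `cond && a || b` idiom as a ternary. The Python sketch below mirrors it, since Python's `and`/`or` also return operand values. It models the pytest marker selection and shows the idiom's known edge case: if `a` is falsy (for example an empty string), the result falls through to `b`. The helper names `gha_ternary` and `pytest_marker_args` are hypothetical and used only for illustration.

```python
def gha_ternary(cond, a, b):
    """Model `cond && a || b`: yields a only when cond and a are both truthy."""
    return (cond and a) or b


def pytest_marker_args(os_labels: list[str]) -> str:
    """Model the -Pposargs selection from the workflow's `arguments:` block."""
    return gha_ternary(
        "self-hosted" in os_labels,           # contains(matrix.os, 'self-hosted')
        "-m (not require_docker_in_docker)",  # self-hosted runners skip DinD tests
        "-m require_docker_in_docker",        # other runners run only DinD tests
    )


if __name__ == "__main__":
    print(pytest_marker_args(["self-hosted", "ubuntu-20.04", "highmem"]))
    print(pytest_marker_args(["ubuntu-latest"]))
    # Edge case: an empty middle operand falls through to the fallback.
    print(repr(gha_ternary(True, "", "fallback")))
```

For `DOCKER_HOST` and `TESTCONTAINERS_HOST_OVERRIDE` the fallback is itself `''`, so the fall-through is harmless there.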

buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy

Lines changed: 3 additions & 1 deletion
@@ -840,7 +840,9 @@ class BeamModulePlugin implements Plugin<Project> {
         log4j2_log4j12_api : "org.apache.logging.log4j:log4j-1.2-api:$log4j2_version",
         mockito_core : "org.mockito:mockito-core:4.11.0",
         mockito_inline : "org.mockito:mockito-inline:4.11.0",
-        mongo_java_driver : "org.mongodb:mongo-java-driver:3.12.11",
+        mongo_java_driver : "org.mongodb:mongodb-driver-sync:5.5.0",
+        mongo_bson : "org.mongodb:bson:5.5.0",
+        mongodb_driver_core : "org.mongodb:mongodb-driver-core:5.5.0",
         nemo_compiler_frontend_beam : "org.apache.nemo:nemo-compiler-frontend-beam:$nemo_version",
         netty_all : "io.netty:netty-all:$netty_version",
         netty_handler : "io.netty:netty-handler:$netty_version",

infra/security/README.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# GCP Security Analyzer
+
+This document describes the implementation of a security analyzer for Google Cloud Platform (GCP) resources. The analyzer is designed to enhance security monitoring within our GCP environment by capturing critical events and generating alerts for specific security-sensitive actions.
+
+## How It Works
+
+1. **Log Sinks**: The system uses [GCP Log Sinks](https://cloud.google.com/logging/docs/export/configure_export_v2) to capture specific security-related log entries. These sinks are configured to filter for events like IAM policy changes or service account key creation.
+2. **Log Storage**: The filtered logs are routed to a dedicated Google Cloud Storage (GCS) bucket for persistence and analysis.
+3. **Report Generation**: A scheduled job runs weekly, executing the `log_analyzer.py` script.
+4. **Email Alerts**: The script analyzes the logs from the past week, compiles a summary of security events, and sends a report to a configured email address.
+
+## Configuration
+
+The behavior of the log analyzer is controlled by a `config.yml` file. Here’s an overview of the configuration options:
+
+- `project_id`: The GCP project ID where the resources are located.
+- `bucket_name`: The name of the GCS bucket where logs will be stored.
+- `logging`: Configures the logging level and format for the script.
+- `sinks`: A list of log sinks to be created. Each sink has the following properties:
+  - `name`: A unique name for the sink.
+  - `description`: A brief description of what the sink monitors.
+  - `filter_methods`: A list of GCP API methods to include in the filter (e.g., `SetIamPolicy`).
+  - `excluded_principals`: A list of service accounts or user emails to exclude from monitoring, such as CI/CD service accounts.
+
+### Example Configuration (`config.yml`)
+
+```yaml
+project_id: your-gcp-project-id
+bucket_name: your-log-storage-bucket
+
+sinks:
+  - name: iam-policy-changes
+    description: Monitors changes to IAM policies.
+    filter_methods:
+      - "SetIamPolicy"
+    excluded_principals:
+      - "ci-cd-account@your-project.iam.gserviceaccount.com"
+```
+
+## Usage
+
+The `log_analyzer.py` script provides two main commands for managing the security analyzer.
+
+### Initializing Sinks
+
+To create or update the log sinks in GCP based on your `config.yml` file, run the following command:
+
+```bash
+python log_analyzer.py --config config.yml initialize
+```
+
+This command ensures that the log sinks are correctly configured to capture the desired security events.
+
+### Generating Weekly Reports
+
+To generate and send the weekly security report, run this command:
+
+```bash
+python log_analyzer.py --config config.yml generate-report
+```
+
+This is typically run as a scheduled job (GitHub Action) to automate the delivery of weekly security reports.
+
+
+
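
Review note: the README says each sink filters on `filter_methods` and skips `excluded_principals`, but this part of the diff does not show how `log_analyzer.py` turns those lists into a sink filter. The sketch below is one plausible construction, not the script's actual code. It assumes the standard audit-log fields `protoPayload.methodName` and `protoPayload.authenticationInfo.principalEmail`, and assumes a substring match (`:`) on the method name; the function name `build_sink_filter` is hypothetical.

```python
def build_sink_filter(filter_methods: list[str], excluded_principals: list[str]) -> str:
    """Build a Cloud Logging filter string from one sink's config entry.

    Assumption: methods are OR-ed together and every excluded principal
    is AND-ed in as a NOT clause. The real script may differ.
    """
    if not filter_methods:
        raise ValueError("a sink needs at least one entry in filter_methods")

    # Match any of the listed API methods (":" is the substring operator).
    method_clause = " OR ".join(
        f'protoPayload.methodName:"{method}"' for method in filter_methods
    )
    parts = [f"({method_clause})"]

    # Drop entries produced by approved principals such as CI/CD accounts.
    for principal in excluded_principals:
        parts.append(
            f'NOT protoPayload.authenticationInfo.principalEmail="{principal}"'
        )
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_sink_filter(
        ["SetIamPolicy"],
        ["ci-cd-account@your-project.iam.gserviceaccount.com"],
    ))
```

The example call uses the values from the README's sample `config.yml`.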

infra/security/config.yml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+project_id: testing-me-460223
+
+# Logging
+logging:
+  level: DEBUG
+  format: "[%(asctime)s] %(levelname)s: %(message)s"
+
+# gcloud storage bucket
+bucket_name: "testing-me-460223-tfstate"
+
+# GCP Log sinks
+sinks:
+  - name: iam-policy-changes
+    description: Monitors changes to IAM policies, excluding approved CI/CD service accounts.
+    filter_methods:
+      - "SetIamPolicy"
+    excluded_principals:
+      - beam-github-actions@apache-beam-testing.iam.gserviceaccount.com
+      - github-self-hosted-runners@apache-beam-testing.iam.gserviceaccount.com
+
+  - name: sa-key-management
+    description: Monitors creation and deletion of service account keys.
+    filter_methods:
+      - "google.iam.admin.v1.IAM.CreateServiceAccountKey"
+      - "google.iam.admin.v1.IAM.DeleteServiceAccountKey"
+    excluded_principals:
+      - beam-github-actions@apache-beam-testing.iam.gserviceaccount.com
+      - github-self-hosted-runners@apache-beam-testing.iam.gserviceaccount.com
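
Review note: the `logging` section of this config carries a level name and a `%`-style format string. The sketch below shows how such a section could be applied with Python's standard `logging` module. The function `configure_logger` and the logger name are hypothetical; the dictionary simply mirrors the `logging:` block above, and the real `log_analyzer.py` may wire this up differently.

```python
import logging

# Mirrors the `logging:` block of config.yml (values copied from the diff).
LOGGING_CFG = {
    "level": "DEBUG",
    "format": "[%(asctime)s] %(levelname)s: %(message)s",
}


def configure_logger(name: str, logging_cfg: dict) -> logging.Logger:
    """Create a logger whose level and format come from the config section."""
    level_name = str(logging_cfg.get("level", "INFO")).upper()
    # getLevelName maps a known level name to its integer value;
    # for an unknown name it returns a string instead.
    level = logging.getLevelName(level_name)
    if not isinstance(level, int):
        raise ValueError(f"unknown logging level: {level_name!r}")

    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(logging_cfg.get("format")))
    logger.handlers = [handler]  # replace handlers so repeated calls do not stack
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = configure_logger("log_analyzer_sketch", LOGGING_CFG)
    log.debug("sinks loaded from config")
```

Rejecting unknown level names up front turns a typo in `config.yml` into an immediate error rather than a silently misconfigured logger.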
