
Commit 0f35873
Update S3KeySensor docs
1 parent 84f5e47

File tree
  • providers/amazon

3 files changed: +34 -5 lines changed

providers/amazon/docs/operators/s3/s3.rst
Lines changed: 17 additions & 4 deletions

@@ -204,8 +204,8 @@ Wait on an Amazon S3 key
 To wait for one or multiple keys to be present in an Amazon S3 bucket you can use
 :class:`~airflow.providers.amazon.aws.sensors.s3.S3KeySensor`.
 For each key, it calls
-`head_object <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.head_object>`__
-API (or `list_objects_v2 <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2>`__
+`head_object <https://docs.aws.amazon.com/boto3/latest/reference/services/s3/client/head_object.html>`__
+API (or `list_objects_v2 <https://docs.aws.amazon.com/boto3/latest/reference/services/s3/client/list_objects_v2.html>`__
 API if ``wildcard_match`` is ``True``) to check whether it is present or not.
 Please keep in mind, especially when used to check a large volume of keys, that it makes one API call per key.

@@ -217,6 +217,7 @@ To check one file:
     :start-after: [START howto_sensor_s3_key_single_key]
     :end-before: [END howto_sensor_s3_key_single_key]

+
 To check multiple files:

 .. exampleinclude:: /../../amazon/tests/system/amazon/aws/example_s3.py

@@ -225,6 +226,7 @@ To check multiple files:
     :start-after: [START howto_sensor_s3_key_multiple_keys]
     :end-before: [END howto_sensor_s3_key_multiple_keys]

+
 To check a file with regular expression:

 .. exampleinclude:: /../../amazon/tests/system/amazon/aws/example_s3.py

@@ -233,6 +235,7 @@ To check a file with regular expression:
     :start-after: [START howto_sensor_s3_key_regex]
     :end-before: [END howto_sensor_s3_key_regex]

+
 To check with an additional custom check you can define a function which receives a list of matched S3 object
 attributes and returns a boolean:

@@ -241,11 +244,13 @@ attributes and returns a boolean:

 This function is called for each key passed as parameter in ``bucket_key``.
 The reason why the parameter of this function is a list of objects is when ``wildcard_match`` is ``True``,
-multiple files can match one key. The list of matched S3 object attributes contain only the size and is this format:
+multiple files can match one key. The list of matched S3 object attributes contains the associated S3 key as well as
+any valid keys in the ``head_object`` response. If no ``metadata_keys`` are given, it defaults to the size
+and is in this format:

 .. code-block:: python

-    [{"Size": int}]
+    [{"Key": str, "Size": int}]

 .. exampleinclude:: /../../amazon/tests/system/amazon/aws/example_s3.py
     :language: python

@@ -259,6 +264,14 @@ multiple files can match one key. The list of matched S3 object attributes conta
     :start-after: [START howto_sensor_s3_key_function]
     :end-before: [END howto_sensor_s3_key_function]

+To filter by S3 key:
+
+.. exampleinclude:: /../../amazon/tests/system/amazon/aws/example_s3.py
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_sensor_s3_key_function_filter_definition]
+    :end-before: [END howto_sensor_s3_key_function_filter_definition]
+
 You can also run this operator in deferrable mode by setting the parameter ``deferrable`` to True.
 This will lead to efficient utilization of Airflow workers as polling for job status happens on
 the triggerer asynchronously. Note that this will need triggerer to be available on your Airflow deployment.
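The updated docs say a custom ``check_fn`` receives a list of dicts in the format ``[{"Key": str, "Size": int}]`` and returns a boolean. A minimal standalone sketch of such a function (names and sample payload are illustrative; no Airflow import is needed to try the shape):

```python
# Sketch of a custom check function matching the documented payload shape;
# with wildcard_match=True the sensor may pass several matched objects at once.
def check_fn(files: list, **kwargs) -> bool:
    """Return True only if every matched S3 object is non-empty."""
    return all(f.get("Size", 0) > 0 for f in files)


# Hypothetical payload in the documented [{"Key": str, "Size": int}] format.
matched = [
    {"Key": "data/part-0000.csv", "Size": 1024},
    {"Key": "data/part-0001.csv", "Size": 0},
]
print(check_fn(matched))  # one empty object -> False
```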

providers/amazon/src/airflow/providers/amazon/aws/sensors/s3.py
Lines changed: 2 additions & 1 deletion

@@ -66,6 +66,7 @@ def check_fn(files: List, **kwargs) -> bool:
     :param deferrable: Run operator in the deferrable mode
     :param use_regex: whether to use regex to check bucket
     :param metadata_keys: List of head_object attributes to gather and send to ``check_fn``.
+        Contains the associated S3 key along with list of given attributes.
         Acceptable values: Any top level attribute returned by s3.head_object. Specify * to return
         all available attributes.
         Default value: "Size".

@@ -113,7 +114,7 @@ def _check_key(self, key, context: Context):
         """
         Set variable `files` which contains a list of dict which contains attributes defined by the user
         Format: [{
-            'Size': int
+            'Key': str, 'Size': int
         }]
         """
         if self.wildcard_match:
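The docstring change says each entry in ``files`` now carries the S3 key alongside the ``head_object`` attributes requested via ``metadata_keys`` (default ``"Size"``, ``*`` for everything). A rough, hypothetical illustration of assembling one such entry; this is a sketch of the described behavior, not the provider's actual implementation:

```python
# Hypothetical helper: build one `files` entry from a head_object-style
# response, keeping the S3 key plus the requested metadata_keys.
def build_entry(key: str, head_object_response: dict, metadata_keys=None) -> dict:
    metadata_keys = metadata_keys or ["Size"]  # documented default is "Size"
    if "*" in metadata_keys:
        # "*" means gather all available top-level attributes
        entry = dict(head_object_response)
    else:
        entry = {k: v for k, v in head_object_response.items() if k in metadata_keys}
    entry["Key"] = key  # the associated S3 key is always included
    return entry


response = {"Size": 42, "ContentType": "text/csv", "ETag": "abc123"}
print(build_entry("data/file.csv", response))  # {'Size': 42, 'Key': 'data/file.csv'}
```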

providers/amazon/tests/system/amazon/aws/example_s3.py
Lines changed: 15 additions & 0 deletions

@@ -88,6 +88,21 @@ def check_fn(files: list, **kwargs) -> bool:

     # [END howto_sensor_s3_key_function_definition]

+    # [START howto_sensor_s3_key_function_filter_definition]
+    def check_fn_with_filter(files: list, **kwargs) -> bool:
+        """
+        Example of custom check: check if one file is bigger than ``20 bytes``
+
+        :param files: List of S3 object attributes.
+        :return: true if the criteria is met
+        """
+        for f in files:
+            if "hadoop" in f.get("Key", ""):
+                return f.get("Size", 0) > 20
+        return True
+
+    # [END howto_sensor_s3_key_function_filter_definition]
+
     # [START howto_operator_s3_create_bucket]
     create_bucket = S3CreateBucketOperator(
         task_id="create_bucket",
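The new ``check_fn_with_filter`` can be exercised without Airflow. The sketch below repeats its logic verbatim and shows the outcomes on made-up payloads; note that the first entry whose key contains ``"hadoop"`` decides the result, and a payload with no matching key passes by default:

```python
# Same logic as the committed example: only objects whose key contains
# "hadoop" are size-checked; everything else is ignored.
def check_fn_with_filter(files: list, **kwargs) -> bool:
    for f in files:
        if "hadoop" in f.get("Key", ""):
            return f.get("Size", 0) > 20
    return True


# Hypothetical payloads in the [{"Key": str, "Size": int}] format.
big = [{"Key": "logs/hadoop-01.log", "Size": 128}]
small = [{"Key": "logs/hadoop-01.log", "Size": 5}]
unrelated = [{"Key": "logs/spark-01.log", "Size": 5}]
print(check_fn_with_filter(big))        # True  (128 > 20)
print(check_fn_with_filter(small))      # False (5 <= 20)
print(check_fn_with_filter(unrelated))  # True  (no "hadoop" key matched)
```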
