@@ -380,6 +380,102 @@ location for the data set to be saved there.
From here you can train a model using this data set and then perform
inference.

.. rubric:: Using the Offline Store SDK: Getting Started
   :name: bCe9CA61b79

The Feature Store Offline SDK provides the ability to quickly and easily
build ML-ready datasets for use in ML model training or pre-processing.
The SDK makes it easy to build datasets from SQL joins, point-in-time
accurate joins, and event time ranges, all without writing any SQL code.
This functionality is accessed through the ``DatasetBuilder`` class,
which is the primary entry point for the SDK.

.. code:: python

   from sagemaker.feature_store.feature_store import FeatureStore

   feature_store = FeatureStore(sagemaker_session=feature_store_session)

.. code:: python

   base_feature_group = identity_feature_group
   target_feature_group = transaction_feature_group

You can create a dataset using the ``create_dataset`` method of the
Feature Store API. ``base`` can be either a feature group or a pandas
DataFrame.

.. code:: python

   result_df, query = feature_store.create_dataset(
       base=base_feature_group,
       output_path=f"s3://{s3_bucket_name}"
   ).to_dataframe()

If you want to join another feature group, you can specify it using the
``with_feature_group`` method.

.. code:: python

   dataset_builder = feature_store.create_dataset(
       base=base_feature_group,
       output_path=f"s3://{s3_bucket_name}"
   ).with_feature_group(target_feature_group, record_identifier_name)

   result_df, query = dataset_builder.to_dataframe()

.. rubric:: Using the Offline Store SDK: Configuring the DatasetBuilder
   :name: bCe9CA61b80

How the ``DatasetBuilder`` produces the resulting dataframe can be
configured in various ways.

By default, the Python SDK excludes all deleted and duplicate records.
However, if you need either of them in the returned dataset, you can
call ``include_duplicated_records`` or ``include_deleted_records`` when
creating the dataset builder.

.. code:: python

   dataset_builder.include_duplicated_records()
   dataset_builder.include_deleted_records()
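
To make the default behavior concrete, the following is a conceptual
pandas sketch (not the Feature Store API) of filtering an append-only
offline store. The ``record_id``, ``event_time``, ``write_time``, and
``is_deleted`` columns here are illustrative assumptions, not the exact
offline store schema.

```python
import pandas as pd

# Hypothetical raw offline-store contents: an append-only table that
# may contain duplicate writes and soft-delete markers (illustrative
# schema, not the real offline store layout).
raw = pd.DataFrame(
    {
        "record_id": [1, 1, 2, 3],
        "event_time": [100, 100, 110, 120],
        "write_time": [100, 105, 110, 120],
        "is_deleted": [False, False, False, True],
        "value": ["a", "a-rewrite", "b", "c"],
    }
)

# Sketch of the default behavior: drop soft-deleted records, then keep
# only the most recently written row per (record_id, event_time) pair.
cleaned = (
    raw[~raw["is_deleted"]]
    .sort_values("write_time")
    .drop_duplicates(subset=["record_id", "event_time"], keep="last")
    .reset_index(drop=True)
)
```

Calling ``include_duplicated_records`` or ``include_deleted_records``
corresponds to skipping one of the two filtering steps above.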

The ``DatasetBuilder`` provides the
``with_number_of_records_from_query_results`` and
``with_number_of_recent_records_by_record_identifier`` methods to limit
the number of records returned in the offline snapshot.

``with_number_of_records_from_query_results`` limits the total number of
records in the output. For example, when N = 100, at most 100 records
are returned in the resulting CSV file or dataframe.

.. code:: python

   dataset_builder.with_number_of_records_from_query_results(number_of_records=N)

On the other hand, ``with_number_of_recent_records_by_record_identifier``
deals with records that share the same record identifier: for each
identifier, the records are sorted by ``event_time`` and at most the N
most recent records are returned in the output.

.. code:: python

   dataset_builder.with_number_of_recent_records_by_record_identifier(number_of_recent_records=N)
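
As a concrete illustration of this behavior, here is a conceptual pandas
sketch (not the Feature Store API): sort each identifier's history by
event time and keep at most the N most recent rows. The ``record_id``
and ``event_time`` columns are illustrative assumptions.

```python
import pandas as pd

# Illustrative history for two record identifiers.
history = pd.DataFrame(
    {
        "record_id": [1, 1, 1, 2, 2],
        "event_time": [100, 110, 120, 100, 130],
        "value": ["a1", "a2", "a3", "b1", "b2"],
    }
)

N = 2  # keep at most the N most recent records per identifier

# Sort newest-first, take the first N rows of each group, then restore
# a readable ordering.
recent = (
    history.sort_values("event_time", ascending=False)
    .groupby("record_id")
    .head(N)
    .sort_values(["record_id", "event_time"])
    .reset_index(drop=True)
)
```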

Because each of these methods returns the dataset builder, the calls can
be chained.

.. code:: python

   result_df, query = (
       dataset_builder
       .with_number_of_records_from_query_results(number_of_records=N)
       .include_duplicated_records()
       .with_number_of_recent_records_by_record_identifier(number_of_recent_records=N)
       .to_dataframe()
   )

There are additional configurations for other use cases, such as time
travel and point-in-time accurate joins. These are outlined in the
Feature Store `DatasetBuilder API Reference
<https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#dataset-builder>`__.

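As a conceptual illustration of a point-in-time accurate join (a sketch
in plain pandas, not the Feature Store API), ``pandas.merge_asof``
matches each base event with the most recent feature value at or before
that event's timestamp, so no future feature values leak into a training
row. The column names below are illustrative assumptions.

```python
import pandas as pd

# Base events (for example, labeled transactions) and a feature history
# (both hypothetical data for illustration).
events = pd.DataFrame(
    {"record_id": [1, 2], "event_time": [115, 125], "label": [0, 1]}
)
features = pd.DataFrame(
    {
        "record_id": [1, 1, 2],
        "event_time": [100, 120, 110],
        "feature": ["old", "new", "x"],
    }
)

# For each event, take the latest feature whose event_time is <= the
# event's timestamp, matched per record_id. Both inputs must be sorted
# on the join key.
joined = pd.merge_asof(
    events.sort_values("event_time"),
    features.sort_values("event_time"),
    on="event_time",
    by="record_id",
    direction="backward",
)
```

Note that the event at time 115 picks up the feature written at time
100, not the "future" value written at time 120.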
.. rubric:: Delete a feature group
   :name: bCe9CA61b78

@@ -395,3 +491,4 @@ The following code example is from the fraud detection example.

   identity_feature_group.delete()
   transaction_feature_group.delete()