Commit c4890de (parent b74042b)

feat: s3 integration docs added

2 files changed: +50 −3 lines
docs/data_engineering/data_lakehouse/delta_lake/rust/02_insert_data.md

Lines changed: 6 additions & 3 deletions

```rust
        Field::new("name", ArrowDataType::Utf8, true),
    ]));

    // Create employee records
    let ids = Int32Array::from(vec![1, 2, 3]);
    let names = StringArray::from(vec!["Tom", "Tim", "Titus"]);
    let employee_record = RecordBatch::try_new(schema, vec![Arc::new(ids), Arc::new(names)]).unwrap();

    // Insert records
    let table = DeltaOps(table).write(vec![employee_record]).await.expect("Insert failed");
}
```

> The Arrow Rust array primitives are _very_ fickle, so creating a direct transformation is quite tricky in Rust, whereas in Python or another loosely typed language it might be simpler.

[(source)](https://github.com/delta-io/delta-rs/blob/99e39ca1ca372211cf7b90b62d33878fa961881c/crates/deltalake/examples/recordbatch-writer.rs#L156)

## Overwrite

The default save mode for the `delta_ops.write` function is `SaveMode::Append`. To overwrite existing data instead of appending, use `SaveMode::Overwrite`:
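A minimal sketch of what that could look like, reusing the `table` and `employee_record` values from the insert example above and delta-rs's `with_save_mode` builder method (the exact import path for `SaveMode` may vary between delta-rs versions):

```rust
use deltalake::protocol::SaveMode;
use deltalake::DeltaOps;

// Replace the table's contents instead of appending to them.
// Assumes `table` and `employee_record` from the insert example above.
let table = DeltaOps(table)
    .write(vec![employee_record])
    .with_save_mode(SaveMode::Overwrite)
    .await
    .expect("Overwrite failed");
```

Note that an overwrite still creates a new table version, so earlier data remains reachable through Delta Lake's time travel rather than being physically deleted right away.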
Lines changed: 44 additions & 0 deletions

# S3 storage integration

```rust
use deltalake::{DeltaOps, DeltaTableBuilder, DeltaTableError};
use std::env;

fn configure_s3() {
    // Set S3 configuration options using environment variables
    env::set_var("AWS_ENDPOINT_URL", "http://localhost:5561");
    env::set_var("AWS_REGION", "us-east-1");
    env::set_var("AWS_ACCESS_KEY_ID", "admin");
    env::set_var("AWS_SECRET_ACCESS_KEY", "password");
    env::set_var("AWS_ALLOW_HTTP", "true");
    env::set_var("AWS_S3_ALLOW_UNSAFE_RENAME", "true");

    // Register AWS S3 handlers for Delta Lake operations
    deltalake::aws::register_handlers(None);
}

/// Builds a `DeltaOps` instance for the specified Delta table,
/// enabling operations such as creating, reading, and writing data in the Delta Lake format.
async fn get_delta_ops(table_name: &str, load_state: bool) -> Result<DeltaOps, DeltaTableError> {
    let delta_table_builder = DeltaTableBuilder::from_uri(format!("s3://data-lakehouse/{}", table_name));
    let delta_table = match load_state {
        // Load the existing table state
        true => delta_table_builder.load().await?,
        // Build the table handle without loading existing state
        false => delta_table_builder.build()?,
    };

    Ok(DeltaOps::from(delta_table))
}

#[tokio::main]
async fn main() {
    configure_s3();

    let table_name = "employee";
    let load_state = false;
    let delta_ops = get_delta_ops(table_name, load_state).await.expect("Failed to create DeltaOps object");
}
```

If the table doesn't exist yet, the `load_state` parameter in `get_delta_ops` should be set to `false`, since setting it to `true` would attempt to read non-existent state and return an error. Conversely, to read from an existing table, `load_state` must be set to `true` so the table state is loaded; otherwise, the load operation will fail.
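The choice can also be automated by trying to load first and falling back to a fresh handle when no table exists at the URI. This is a hedged sketch, not part of the commit: `get_delta_ops_or_new` is a hypothetical helper, and matching on `DeltaTableError::NotATable` assumes that is the variant delta-rs returns for a missing table:

```rust
use deltalake::{DeltaOps, DeltaTableBuilder, DeltaTableError};

// Hypothetical helper: try to load existing table state, and fall back to a
// bare (unloaded) handle when nothing exists at the URI yet.
async fn get_delta_ops_or_new(table_uri: &str) -> Result<DeltaOps, DeltaTableError> {
    match DeltaTableBuilder::from_uri(table_uri).load().await {
        // Existing table: state loaded successfully
        Ok(table) => Ok(DeltaOps::from(table)),
        // No Delta table at this location yet: build without loading state
        Err(DeltaTableError::NotATable(_)) => {
            Ok(DeltaOps::from(DeltaTableBuilder::from_uri(table_uri).build()?))
        }
        // Propagate any other error (credentials, network, ...)
        Err(e) => Err(e),
    }
}
```

With a helper like this, callers no longer need to know in advance whether the table exists, at the cost of one extra round trip to S3 when it does not.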
