---
pcx_content_type: concept
title: Partitions, Filenames and Filepaths
sidebar:
  order: 11
---

## Partitions

Partitioning organizes data into directory-like key prefixes based on the values of specific fields. Queries that filter on those fields can skip partitions that do not match, which reduces the amount of data scanned and makes reads faster. By default, Pipelines partitions data by event date and hour (`event_date` and `hr`). Support for customizing the partition fields is planned.

For example, the output from a pipeline in your R2 bucket might look like this:

```txt
event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
```
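
To make the pruning benefit concrete, here is a minimal POSIX shell sketch. It assumes you already have a listing of object keys (one per line, from whatever tool you use to list the bucket) that follow the default `event_date=YYYY-MM-DD/hr=HH/` layout shown above. The `keys_in_partition` function and the sample keys are hypothetical, for illustration only; they are not part of Pipelines or Wrangler.

```sh
# Hypothetical helper, not part of Pipelines or Wrangler.
# keys_in_partition DATE HOUR
#   Reads object keys from stdin (one per line) and prints only those under
#   event_date=DATE/hr=HOUR/. Matching is done on the key alone, so objects in
#   other partitions never need to be downloaded or scanned.
keys_in_partition() {
  # The trailing slash keeps hr=1 from matching hr=15, and so on.
  want="event_date=$1/hr=$2/"
  while IFS= read -r key; do
    case "$key" in
      "$want"*) printf '%s\n' "$key" ;;
    esac
  done
}

# Usage with hypothetical keys; only the hr=15 key on 2024-09-06 is printed.
printf '%s\n' \
  'event_date=2024-09-05/hr=23/a.json.gz' \
  'event_date=2024-09-06/hr=15/b.json.gz' \
  'event_date=2024-09-06/hr=16/c.json.gz' |
  keys_in_partition 2024-09-06 15
```

A query engine that reads partitioned data applies the same idea: it compares the filter against the `event_date` and `hr` values encoded in each prefix and reads only the objects that can contain matching records.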

## Filepath

Customizing the filepath lets you store data under a specific prefix inside your R2 bucket. Data under the prefix is still partitioned the same way, by event date and hour.

To change the prefix for a pipeline using Wrangler:

```sh
wrangler pipelines update <pipeline-name> --filepath "test"
```

All output records generated by the pipeline are then stored under the `test` prefix and look like this:

```txt
test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
```
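
The sketch below shows how a configured prefix combines with the default partitions to form the key prefix a record is written under. It is a hypothetical helper (`partition_path` is not a Pipelines or Wrangler command) and it assumes the partition values come from a UTC ISO 8601 event timestamp with a zero-padded hour; check the actual keys in your bucket before relying on it.

```sh
# Hypothetical helper, not part of Pipelines or Wrangler.
# partition_path FILEPATH TIMESTAMP
#   FILEPATH   configured prefix, or "" when none is set
#   TIMESTAMP  assumed UTC ISO 8601, for example 2024-09-06T15:42:10Z
# Prints the key prefix a record with that timestamp would be written under.
partition_path() {
  prefix="${1%/}"        # drop one trailing slash, if any
  day="${2%%T*}"         # date part, e.g. 2024-09-06
  clock="${2#*T}"        # time part, e.g. 15:42:10Z
  hour="${clock%%:*}"    # hour, e.g. 15
  if [ -n "$prefix" ]; then
    printf '%s/event_date=%s/hr=%s/\n' "$prefix" "$day" "$hour"
  else
    printf 'event_date=%s/hr=%s/\n' "$day" "$hour"
  fi
}

partition_path test 2024-09-06T15:42:10Z   # prints test/event_date=2024-09-06/hr=15/
partition_path ""   2024-09-06T15:42:10Z   # prints event_date=2024-09-06/hr=15/
```

Because the prefix is simply prepended, anything that locates data by partition only has to add `test/` in front of the `event_date=.../hr=.../` path.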