
[Streams] be more descriptive about when to partition and when not to partition #3991

@LucaWintergerst


Modify https://github.com/elastic/docs-content/blob/main/solutions/observability/streams/management/partitioning.md

I want us to add something along the lines of:


When should I partition my data? How fine-grained should partitioning be?
Aim for tens of partitions rather than hundreds.

Partitioning your data makes it a little easier to manage. If you have lots of different systems logging into a single stream, it can make sense to partition a subset of the data into a separate stream. For example, maybe you would like all your custom logs for a certain team or department in one stream.

Each partition comes with a cost, as it creates a data stream in Elasticsearch under the hood. You can have many of them, but not an unlimited number, so don't be too aggressive.

Partition the data by team or by overarching technology (webservers in one stream, custom app logs in another). Don't partition by high-cardinality attributes. Even partitioning on service.name is usually too much.
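To make the cardinality guidance concrete, here is a minimal sketch that counts the distinct values of each candidate field in a sample of log documents and flags fields that would produce too many partitions. The budget of 50, the function names, and the sample data are all assumptions chosen for illustration; the only guidance from this issue is "tens, not hundreds".

```python
# Hypothetical budget: the issue suggests tens of partitions, not hundreds.
MAX_PARTITIONS = 50


def distinct_values(docs, field):
    """Return the set of distinct values seen for `field` in the sample."""
    return {doc[field] for doc in docs if field in doc}


def partition_candidates(docs, fields, max_partitions=MAX_PARTITIONS):
    """Map each field to (distinct count, whether it fits the budget)."""
    result = {}
    for field in fields:
        count = len(distinct_values(docs, field))
        result[field] = (count, 0 < count <= max_partitions)
    return result


# Made-up sample: 3 teams and 200 services across 1000 documents.
sample = [
    {"team": f"team-{i % 3}", "service.name": f"svc-{i % 200}"}
    for i in range(1000)
]

report = partition_candidates(sample, ["team", "service.name"])
for field, (count, ok) in report.items():
    verdict = "reasonable partition key" if ok else "too many partitions"
    print(f"{field}: {count} distinct values, {verdict}")
```

In this made-up sample, the team field stays well inside the budget while service.name does not, which mirrors the advice above.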

When do I need a partition?
Technically speaking, the only good reason why you need a partition is if you want to control the lifecycle of a subset of your data separately from the rest.
Let's say you have two sources logging into logs: a noisy firewall and a quiet custom application. You don't need to keep the firewall logs for long, and they take up a lot of disk space, so you would prefer to delete them sooner.

In this case it can make a lot of sense to partition and then assign a different ILM policy or retention setting to each partition.

logs
- logs.firewall [7d]
- logs.custom-app [30d]
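The retention layout above could be expressed roughly as follows. This is only a sketch: it assumes each child stream is backed by an Elasticsearch data stream of the same name and that retention is set through the data stream lifecycle endpoint (`PUT /_data_stream/<name>/_lifecycle`) rather than through an ILM policy or the Streams UI. It only builds the requests and does not send anything; the `lifecycle_request` helper is hypothetical.

```python
import json

# Retention per child stream, taken from the example in this issue.
RETENTION = {"logs.firewall": "7d", "logs.custom-app": "30d"}


def lifecycle_request(data_stream, retention):
    """Build (method, path, body) for a data stream lifecycle update.

    Assumption: the stream name is also the backing data stream name.
    """
    path = f"/_data_stream/{data_stream}/_lifecycle"
    return ("PUT", path, {"data_retention": retention})


requests_to_send = [
    lifecycle_request(name, retention) for name, retention in RETENTION.items()
]

# Print the requests so they can be reviewed before being sent with any client.
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

Whether retention should be managed this way or from within Streams itself is a question for the docs page; the sketch is only meant to show that each partition ends up with its own lifecycle setting.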

Labels: Team:Experience (issues owned by the Experience Docs Team)
