Skip to content

Conversation

kkrik-es
Copy link
Contributor

Related to #109334

@kkrik-es kkrik-es added >docs General docs changes auto-backport Automatically create backport pull requests when merged Team:StorageEngine :StorageEngine/Logs You know, for Logs v8.18.0 labels Jan 23, 2025
@kkrik-es kkrik-es self-assigned this Jan 23, 2025
Copy link
Contributor

Documentation preview:

@kkrik-es kkrik-es requested a review from martijnvg January 23, 2025 14:37
@kkrik-es kkrik-es marked this pull request as ready for review January 23, 2025 15:23
@elasticsearchmachine elasticsearchmachine added the Team:Docs Meta label for docs team label Jan 23, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@marciw
Copy link
Contributor

marciw commented Jan 23, 2025

hi @kkrik-es, thanks for these additions.

I need to do an edit for clarity and style before you merge this, but I think it makes sense for @martijnvg to review first. I'll do an edit as soon as I can after that.

@marciw marciw self-requested a review January 23, 2025 22:18
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kkrik-es
Copy link
Contributor Author

I need to do an edit for clarity and style before you merge this, but I think it makes sense for @martijnvg to review first. I'll do an edit as soon as I can after that.

Please do, thanks!

Copy link
Contributor

@marciw marciw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I added some suggestions and comments. I'm happy to commit the edits directly to your branch if that's easier.

If I've introduced inaccuracies, please correct me. 🙂 And feel free to adjust line breaks etc. 👍

Comment on lines 79 to 81
NOTE: If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as an object
field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`, a single
`host.name` field is mapped as a `keyword` field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the restructuring I suggested in the preceding lines, delete this note. I folded it into my suggestion above.

Comment on lines 128 to 131
Using a custom sort configuration is required to minimize the possibility of creating hotspots, in case of a
logging spike producing documents that all get routed to a single shard. To prevent this, and to improve storage
efficiency, it is recommended to use a few fields that have a rather low cardinality and don't co-vary
(e.g. `host.name` and `host.id` are likely a bad choice).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Using a custom sort configuration is required to minimize the possibility of creating hotspots, in case of a
logging spike producing documents that all get routed to a single shard. To prevent this, and to improve storage
efficiency, it is recommended to use a few fields that have a rather low cardinality and don't co-vary
(e.g. `host.name` and `host.id` are likely a bad choice).
Logging spikes can cause hotspots by producing documents that all get routed to a single
shard. To prevent hotspots and improve storage efficiency, your configuration should use a few sort fields that have a relatively low cardinality and don't co-vary (for example, `host.name` and `host.id` are not optimal).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic requires a custom sort config to reduce the likelihood of hotspots, as opposed to working with the default sort config. I think the updated text (and my version, possibly) missed this part. Maybe we can clarify this better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks, I see what you mean. I'd suggest losing the logging spikes sentence -- WDYT of this?

Suggested change
Using a custom sort configuration is required to minimize the possibility of creating hotspots, in case of a
logging spike producing documents that all get routed to a single shard. To prevent this, and to improve storage
efficiency, it is recommended to use a few fields that have a rather low cardinality and don't co-vary
(e.g. `host.name` and `host.id` are likely a bad choice).
A custom sort configuration is required, to minimize hotspots and improve storage efficiency. For best results, use a few sort fields that have a relatively low cardinality and don't co-vary
(for example, `host.name` and `host.id` are not optimal).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, though it'd be nice to explain what leads to hotspots - I don't think this is mentioned elsewhere in this page. Another possibility is to include such a note above, where we describe the option for custom sort config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK one more try 🙂

A custom sort configuration is required, to improve storage efficiency and to
minimize hotspots from logging spikes that route documents to a single shard. 
For best results, use a few sort fields that have a relatively low cardinality and
don't co-vary (for example, `host.name` and `host.id` are not optimal).

Copy link
Contributor

@marciw marciw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks for applying my changes 👍

@kkrik-es kkrik-es merged commit 3532d0b into elastic:main Jan 27, 2025
5 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Jan 27, 2025
…lastic#120721)

* [DOCS] Update documentation for index sorting and routing for logsdb

* update

* Apply suggestions from code review

Co-authored-by: Marci W <[email protected]>

* Update logs.asciidoc

* Update docs/reference/data-streams/logs.asciidoc

Co-authored-by: Marci W <[email protected]>

* Update logs.asciidoc

---------

Co-authored-by: Marci W <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Jan 27, 2025
…120721) (#120904)

* [DOCS] Update documentation for index sorting and routing for logsdb

* update

* Apply suggestions from code review



* Update logs.asciidoc

* Update docs/reference/data-streams/logs.asciidoc



* Update logs.asciidoc

---------

Co-authored-by: Marci W <[email protected]>
@kkrik-es kkrik-es deleted the logsdb/routing-doc branch January 29, 2025 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >docs General docs changes :StorageEngine/Logs You know, for Logs Team:Docs Meta label for docs team Team:StorageEngine v8.18.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants