Commit 37fd83d

committed
Added links
1 parent ac038f2 commit 37fd83d

1 file changed

+18
-19
lines changed

docs/internal/DistributedArchitectureGuide.md

Lines changed: 18 additions & 19 deletions
@@ -145,15 +145,14 @@ Some concepts are applicable to both cluster and project scopes, e.g. [persisten
145145
### Translog
146146

147147
[Basic write model]:https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-replication.html
148-
[Translog]:https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/translog/Translog.java
149148
[History retention]:https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-history-retention.html
150149

151150
It is important to first understand the [Basic write model] of documents:
152151
documents are written to Lucene in-memory buffers, then "refreshed" to searchable segments which may not be persisted on disk, and finally "flushed" to a durable Lucene commit on disk.
153152
This means that newly ingested data may be lost if there is an outage before the next persisted commit on disk.
154-
For this reason, newly ingested data is also written to a shard's [Translog], whose main purpose is to persist uncommitted operations (e.g., document insertions or deletions), so they can be replayed in the event of ephemeral failures such as a crash or power loss.
153+
For this reason, newly ingested data is also written to a shard's [`Translog`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/translog/Translog.java), whose main purpose is to persist uncommitted operations (e.g., document insertions or deletions), so they can be replayed in the event of ephemeral failures such as a crash or power loss.
155154
The translog is always persisted and fsync'ed on disk before acknowledging writes back to the user.
156-
This can be seen in `InternalEngine` which calls the `add()` method of the translog to append operations, e.g., its `index()` method at some point adds a document insertion operation to the translog.
155+
This can be seen in [`InternalEngine`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java), which calls the `add()` method of the translog to append operations, e.g., its `index()` method at some point adds a document insertion operation to the translog.
157156
The translog ultimately truncates operations once they have been flushed to disk by a Lucene commit.
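The write model above can be sketched with a small in-memory model. This is a hypothetical illustration, not Elasticsearch code: `WriteAheadSketch` and all of its methods are invented names, the "translog" list is simply assumed to be durable, and the "commit" list stands in for a Lucene commit on disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of the write path; not the real engine or translog. */
final class WriteAheadSketch {
    private final List<String> buffer = new ArrayList<>();    // in-memory indexing buffer, lost on a crash
    private final List<String> translog = new ArrayList<>();  // uncommitted operations, assumed durable
    private final List<String> committed = new ArrayList<>(); // stands in for a durable Lucene commit

    /** Index an operation: buffer it and log it before the write would be acknowledged. */
    void index(String op) {
        buffer.add(op);
        translog.add(op);
    }

    /** Flush: persist the buffered operations as a commit, then truncate the translog. */
    void flush() {
        committed.addAll(buffer);
        buffer.clear();
        translog.clear();
    }

    /** Crash: the in-memory buffer is lost; the commit and the translog survive. */
    void crash() {
        buffer.clear();
    }

    /** Recovery: replay the operations that were logged but never committed. */
    void recover() {
        buffer.addAll(translog);
    }

    /** All operations currently known to the engine, committed ones first. */
    List<String> operations() {
        List<String> all = new ArrayList<>(committed);
        all.addAll(buffer);
        return all;
    }

    int translogSize() {
        return translog.size();
    }
}
```

In this model, an operation indexed after the last `flush()` disappears from `operations()` after `crash()` and reappears after `recover()`, which is the replay behaviour the translog exists to provide.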
158157

159158
Main usages of the translog are:
@@ -169,12 +168,12 @@ Main usages of the translog are:
169168
Translog files are automatically truncated when they are no longer needed, specifically after all their operations have been persisted by Lucene commits on disk.
170169
Lucene commits are initiated by flushes (e.g., with the index [Flush API]).
171170

172-
Flushes may also be automatically initiated by Elasticsearch, e.g., if the translog exceeds a configurable size (`INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING`) or age (`INDEX_TRANSLOG_FLUSH_THRESHOLD_AGE_SETTING`), which ultimately truncates the translog as well.
171+
Flushes may also be automatically initiated by Elasticsearch, e.g., if the translog exceeds a configurable size ([`INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING`](https://github.com/elastic/elasticsearch/blob/dd1db5031ee7fdac284753c0c3b096b0e981d71a/server/src/main/java/org/elasticsearch/index/IndexSettings.java#L352)) or age ([`INDEX_TRANSLOG_FLUSH_THRESHOLD_AGE_SETTING`](https://github.com/elastic/elasticsearch/blob/dd1db5031ee7fdac284753c0c3b096b0e981d71a/server/src/main/java/org/elasticsearch/index/IndexSettings.java#L370)), which ultimately truncates the translog as well.
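The size-or-age trigger can be pictured as a simple predicate. The sketch below is illustrative only: `FlushPolicySketch`, its constructor parameters, and the threshold values used in any example are assumptions, and the real decision logic in the engine may consider more than these two inputs.

```java
import java.time.Duration;

/** Hypothetical flush trigger: flush when the translog is too large or too old. */
final class FlushPolicySketch {
    private final long thresholdBytes;   // stands in for the configured size threshold
    private final Duration thresholdAge; // stands in for the configured age threshold

    FlushPolicySketch(long thresholdBytes, Duration thresholdAge) {
        this.thresholdBytes = thresholdBytes;
        this.thresholdAge = thresholdAge;
    }

    /** Returns true when either threshold is exceeded. */
    boolean shouldFlush(long uncommittedBytes, Duration oldestUncommittedAge) {
        return uncommittedBytes > thresholdBytes
            || oldestUncommittedAge.compareTo(thresholdAge) > 0;
    }
}
```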
173172

174173
#### Acknowledging writes
175174

176-
A bulk request will ultimately and repeatedly call Engine methods such as `index()` or `delete()`, which add operations to the Translog.
177-
Finally, the AfterWrite action of the `TransportWriteAction` will call `indexShard.syncAfterWrite()`, which puts the last written translog `Location` of the bulk request into an `AsyncIOProcessor` that is responsible for gradually fsync'ing the Translog and notifying any waiters.
175+
A bulk request will ultimately and repeatedly call Engine methods such as [`index()` or `delete()`](https://github.com/elastic/elasticsearch/blob/591fa87e43a509d3eadfdbbb296cdf08453ea91a/server/src/main/java/org/elasticsearch/index/engine/Engine.java#L546-L564), which add operations to the Translog.
176+
Finally, the AfterWrite action of the [`TransportWriteAction`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/replication/TransportWriteAction.java) will call [`indexShard.syncAfterWrite()`](https://github.com/elastic/elasticsearch/blob/387eef070c25ed57e4139158e7e7e0ed097c8c98/server/src/main/java/org/elasticsearch/action/support/replication/TransportWriteAction.java#L548), which puts the last written translog `Location` of the bulk request into an [`AsyncIOProcessor`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/common/util/concurrent/AsyncIOProcessor.java) that is responsible for gradually fsync'ing the Translog and notifying any waiters.
178177
Ultimately, the bulk request is notified that the translog has been fsync'ed past the requested location, and the request can then be acknowledged.
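The idea of queuing locations and notifying waiters after a single fsync can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: `BatchedSyncSketch`, `Waiter`, and `drain()` are hypothetical names, a location is reduced to a plain `long` offset, and the fsync is only counted rather than performed. It does not reproduce the actual `AsyncIOProcessor` API or its concurrency handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical sketch of batched syncing: many waiters, one simulated fsync. */
final class BatchedSyncSketch {
    private static final class Waiter {
        final long location;          // offset the waiter needs to be durable
        final LongConsumer onSynced;  // callback invoked once that offset is durable

        Waiter(long location, LongConsumer onSynced) {
            this.location = location;
            this.onSynced = onSynced;
        }
    }

    private final List<Waiter> pending = new ArrayList<>();
    private long syncedUpTo = 0; // highest offset known to be durable
    private int fsyncCount = 0;  // number of simulated fsync calls

    /** Register the last written location of a request and a callback to run once it is durable. */
    void syncAfterWrite(long location, LongConsumer onSynced) {
        if (location <= syncedUpTo) {
            onSynced.accept(syncedUpTo); // already durable, notify immediately
        } else {
            pending.add(new Waiter(location, onSynced));
        }
    }

    /** Cover every queued location with one simulated fsync, then notify all waiters. */
    void drain() {
        if (pending.isEmpty()) {
            return;
        }
        long max = syncedUpTo;
        for (Waiter w : pending) {
            max = Math.max(max, w.location);
        }
        fsyncCount++; // a single fsync makes everything up to max durable
        syncedUpTo = max;
        List<Waiter> batch = new ArrayList<>(pending);
        pending.clear();
        for (Waiter w : batch) {
            w.onSynced.accept(syncedUpTo);
        }
    }

    int fsyncCount() {
        return fsyncCount;
    }
}
```

Batching this way means several concurrent bulk requests can share the cost of one fsync, while each request is still only acknowledged once its own location is durable.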
179178

180179
#### Translog internals
@@ -184,32 +183,32 @@ Ultimately the bulk request is notified that the translog has fsync'ed passed th
184183
Each translog is a sequence of files, each identified by a translog generation ID, each containing a sequence of operations, with the last file open for writes.
185184
The last file has a part which has been fsync'ed to disk, and a part which has been written but not necessarily fsync'ed yet to disk.
186185
Each operation is identified by a sequence number (`seqno`), which is monotonically increased by the engine's ingestion functionality.
187-
A `Checkpoint` file is also maintained, which contains, among other information, the current translog generation ID and its last fsync'ed operation and location, the minimum translog generation ID, and the minimum and maximum sequence numbers of the operations that the sequence of translog generations includes.
186+
A [`Checkpoint`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/translog/Checkpoint.java) file is also maintained, which contains, among other information, the current translog generation ID and its last fsync'ed operation and location, the minimum translog generation ID, and the minimum and maximum sequence numbers of the operations that the sequence of translog generations includes.
188187
When the translog rolls over, e.g., upon the translog file exceeding a configurable size, a new file in the sequence is created for writes, and the last one becomes read-only.
189188
A new commit flushed to the disk will also induce a translog rollover, since the operations in the translog so far will become eligible for truncation.
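Generations, rollover, and truncation can be modelled in a few lines. The sketch below is a hypothetical model, not the real `Translog` class: `GenerationsSketch` and `trimCommitted()` are invented names, each generation is just a list of sequence numbers, and the checkpoint is reduced to the minimum and current generation IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of translog generations, rollover, and truncation. */
final class GenerationsSketch {
    // generation ID -> sequence numbers of the operations stored in that generation
    private final TreeMap<Long, List<Long>> generations = new TreeMap<>();
    private long currentGeneration = 1;

    GenerationsSketch() {
        generations.put(currentGeneration, new ArrayList<>());
    }

    /** Append an operation to the generation that is open for writes. */
    void add(long seqNo) {
        generations.get(currentGeneration).add(seqNo);
    }

    /** Roll over: the current generation becomes read-only and a new one is opened. */
    void rollGeneration() {
        currentGeneration++;
        generations.put(currentGeneration, new ArrayList<>());
    }

    /** Drop the oldest generations whose operations are all covered by a commit. */
    void trimCommitted(long committedSeqNo) {
        while (generations.size() > 1) {
            Map.Entry<Long, List<Long>> oldest = generations.firstEntry();
            boolean fullyCommitted = oldest.getValue().stream().allMatch(s -> s <= committedSeqNo);
            if (!fullyCommitted) {
                break;
            }
            generations.remove(oldest.getKey());
        }
    }

    long minGeneration() {
        return generations.firstKey();
    }

    long currentGeneration() {
        return currentGeneration;
    }
}
```

The generation that is open for writes is never removed in this model, which mirrors the description above that only read-only files whose operations have been committed become eligible for truncation.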
190189

191190
A few more words on terminology and classes used around the translog Java package.
192-
A `Location` of an operation is defined by the translog generation file it is contained in, the offset of the operation in that file, and the number of bytes that encode that operation.
193-
An `Operation` can be a document indexing operation, a document deletion, or a no-op.
194-
A `Snapshot` iterator can be created to iterate over a range of requested operation sequence numbers read from the translog files.
191+
A [`Location`](https://github.com/elastic/elasticsearch/blob/693f3bfe30271d77a6b3147e4519b4915cbb395d/server/src/main/java/org/elasticsearch/index/translog/Translog.java#L977) of an operation is defined by the translog generation file it is contained in, the offset of the operation in that file, and the number of bytes that encode that operation.
192+
An [`Operation`](https://github.com/elastic/elasticsearch/blob/693f3bfe30271d77a6b3147e4519b4915cbb395d/server/src/main/java/org/elasticsearch/index/translog/Translog.java#L1087) can be a document indexing operation, a document deletion, or a no-op.
193+
A [`Snapshot`](https://github.com/elastic/elasticsearch/blob/693f3bfe30271d77a6b3147e4519b4915cbb395d/server/src/main/java/org/elasticsearch/index/translog/Translog.java#L711) iterator can be created to iterate over a range of requested operation sequence numbers read from the translog files.
195194
A retention lock can be acquired for [History retention] purposes, e.g., for potentially facilitating a replica shard's recovery, which prohibits truncating the translog files.
196-
The `sync()` method is the one that fsync's the current translog generation file to disk, and updates the checkpoint file with the last fsync'ed operation and location.
197-
The `rollGeneration()` method is the one that rolls the translog, creating a new translog generation, e.g., called during an index flush.
198-
The `createEmptyTranslog()` method creates a new translog, e.g., for a new empty index shard.
199-
Each translog file starts with a `TranslogHeader` that is followed by translog operations.
195+
The [`sync()`](https://github.com/elastic/elasticsearch/blob/693f3bfe30271d77a6b3147e4519b4915cbb395d/server/src/main/java/org/elasticsearch/index/translog/Translog.java#L813) method is the one that fsync's the current translog generation file to disk, and updates the checkpoint file with the last fsync'ed operation and location.
196+
The [`rollGeneration()`](https://github.com/elastic/elasticsearch/blob/693f3bfe30271d77a6b3147e4519b4915cbb395d/server/src/main/java/org/elasticsearch/index/translog/Translog.java#L1656) method is the one that rolls the translog, creating a new translog generation, e.g., called during an index flush.
197+
The [`createEmptyTranslog()`](https://github.com/elastic/elasticsearch/blob/693f3bfe30271d77a6b3147e4519b4915cbb395d/server/src/main/java/org/elasticsearch/index/translog/Translog.java#L1929) method creates a new translog, e.g., for a new empty index shard.
198+
Each translog file starts with a [`TranslogHeader`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/translog/TranslogHeader.java) that is followed by translog operations.
200199

201200
Some internal classes used for reading from and writing to the translog are the following.
202-
A `TranslogReader` can be used to read operation bytes from a translog file.
203-
A `TranslogSnapshot` can be used to iterate operations from a translog reader.
204-
A `MultiSnapshot` can be used to iterate operations over multiple `TranslogSnapshot`s.
205-
A `TranslogWriter` can be used to write operations to the translog.
201+
A [`TranslogReader`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/translog/TranslogReader.java) can be used to read operation bytes from a translog file.
202+
A [`TranslogSnapshot`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/translog/TranslogSnapshot.java) can be used to iterate operations from a translog reader.
203+
A [`MultiSnapshot`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/translog/MultiSnapshot.java) can be used to iterate operations over multiple `TranslogSnapshot`s.
204+
A [`TranslogWriter`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/translog/TranslogWriter.java) can be used to write operations to the translog.
206205

207206
#### Real-time GETs from the translog
208207

209208
[Get API]:https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-get.html
210209

211210
The [Get API] (and by extension, the multi-get API) supports a real-time mode, which can query documents by ID, even recently ingested documents that have not yet been refreshed and are therefore not yet searchable.
212-
This capability is facilitated by another data structure, the `LiveVersionMap`, which maps recently ingested documents by their ID to the translog location that encodes their indexing operation.
211+
This capability is facilitated by another data structure, the [`LiveVersionMap`](https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/index/engine/LiveVersionMap.java), which maps recently ingested documents by their ID to the translog location that encodes their indexing operation.
213212
That way, we can return the document by reading the translog operation.
214213

215214
The tracking in the version map is not enabled by default.
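The lookup path for a real-time GET can be sketched as a map from document ID to a translog position. This is a hypothetical illustration only: `RealTimeGetSketch` and its methods are invented names, the translog is a single in-memory list whose index plays the role of a location, and the behaviour of the real `LiveVersionMap` (including when tracking is enabled) is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: serve real-time GETs from the translog via a version map. */
final class RealTimeGetSketch {
    private final List<String> translog = new ArrayList<>();          // list index stands in for a location
    private final Map<String, Integer> versionMap = new HashMap<>();  // doc ID -> translog position
    private final Map<String, String> searchable = new HashMap<>();   // documents visible after a refresh

    /** Index a document: append it to the translog and remember where it was written. */
    void index(String id, String source) {
        translog.add(source);
        versionMap.put(id, translog.size() - 1);
    }

    /** Refresh: make recent documents searchable and drop their version map entries. */
    void refresh() {
        for (Map.Entry<String, Integer> e : versionMap.entrySet()) {
            searchable.put(e.getKey(), translog.get(e.getValue()));
        }
        versionMap.clear();
    }

    /** Get by ID; in real-time mode, unrefreshed documents are read back from the translog. */
    Optional<String> get(String id, boolean realtime) {
        if (realtime) {
            Integer position = versionMap.get(id);
            if (position != null) {
                return Optional.of(translog.get(position));
            }
        }
        return Optional.ofNullable(searchable.get(id));
    }
}
```

In this model a document indexed but not yet refreshed is returned by `get(id, true)` and missed by `get(id, false)`, which is the distinction between real-time and non-real-time GETs described above.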
