Conversation

@julian-elastic
Contributor

No description provided.

julian-elastic and others added 30 commits June 24, 2025 08:43
…ect mismatch (elastic#129600)

For Flattened fields with synthetic source, if a key has a scalar value and a duplicate of that key has an object value, one of the values is left out of the produced synthetic source. This fixes the issue by replacing the object with paths to each of its keys. Each path is the concatenation of all keys going down to a given scalar, joined by periods, for example foo.bar.baz. This applies recursively, so that every value within the object, no matter how deeply nested, is accessible through a fully specified path.
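
A minimal sketch of the path construction described above. `FlattenedPaths` and its methods are hypothetical names for illustration and are not the actual Elasticsearch code; the sketch only shows how nested keys become dotted paths, not the duplicate-key handling in synthetic source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the Elasticsearch implementation.
final class FlattenedPaths {

    // Replaces every nested object with dotted paths to its scalar leaves.
    static Map<String, Object> flatten(Map<String, Object> source) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect("", source, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            // Join the keys seen so far with a period, e.g. foo + bar -> foo.bar
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                // Recurse so that arbitrarily nested values get a full path.
                collect(path, (Map<String, Object>) entry.getValue(), out);
            } else {
                out.put(path, entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> bar = new LinkedHashMap<>();
        bar.put("baz", 1);
        Map<String, Object> foo = new LinkedHashMap<>();
        foo.put("bar", bar);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("foo", foo);
        doc.put("id", "a");
        System.out.println(flatten(doc)); // prints {foo.bar.baz=1, id=a}
    }
}
```
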
…elastic#129601)

TransportStartDatafeedAction previously tried to validate remote index cluster
names in datafeed jobs before checking whether the local cluster had the
remote_cluster_client role. Because this role enables retrieval of the remote
cluster names, the validation step would always fail with a no-such-cluster
exception. This was confusing. This change moves the remote_cluster_client check
ahead of cluster name validation, and adds a test.
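
The reordering can be sketched roughly as follows. `DatafeedStartChecks`, its parameters, and the exception messages are hypothetical stand-ins and do not mirror the real TransportStartDatafeedAction code; the point is only that the role check runs before cluster names are validated, so a missing role is reported as such.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the validation order; names are illustrative only.
final class DatafeedStartChecks {

    static void validate(Set<String> localRoles, Set<String> knownRemoteClusters, List<String> indices) {
        // Assumption for this sketch: a remote index is written as cluster:index.
        boolean usesRemote = indices.stream().anyMatch(index -> index.indexOf(':') > 0);
        if (!usesRemote) {
            return;
        }
        // Step 1: check the role first. Without it, remote cluster names cannot be
        // retrieved, so name validation would fail with a misleading error.
        if (!localRoles.contains("remote_cluster_client")) {
            throw new IllegalStateException("remote indices require the remote_cluster_client role");
        }
        // Step 2: only now validate the remote cluster names.
        for (String index : indices) {
            int sep = index.indexOf(':');
            if (sep > 0 && !knownRemoteClusters.contains(index.substring(0, sep))) {
                throw new IllegalArgumentException("no such remote cluster: " + index.substring(0, sep));
            }
        }
    }

    public static void main(String[] args) {
        try {
            validate(Set.of("data"), Set.of(), List.of("remote1:logs"));
        } catch (IllegalStateException e) {
            // The missing role is reported, not a no-such-cluster error.
            System.out.println(e.getMessage());
        }
    }
}
```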

Closes ES-11841
Closes elastic#121149
This commit upgrades to Lucene 10.2.2.

With the release of 10.2.2, we no longer need to work around the Lucene bug mentioned in 128671.
With this change, we first create the tmp file and the posting list, and once that file is deleted we merge the vectors into the vec file. Therefore, we only have two copies of the vectors at the same time.
…kSpaceTests testAbortingOrRunningMergeTaskHoldsUpBudget elastic#129823
…lastic#129538)

Because of the way stored fields get flushed when index sorting is active, we can encounter significant page cache faults when memory is scarce. To mitigate some of the resulting slowness, we're planning to no longer mmap the fdt temp file. This is initially behind a feature flag, to check for unforeseen side effects.

Typically, always using the mmap directory is better than using the niofs directory, provided sufficient memory is available to the OS for filesystem caching. When that isn't the case, indexing performance can vary a lot (and is often very slow). This is especially true for the tmp files that stored fields create during flushing. These files exist only briefly, to sort stored fields in the order of the configured index sorting, and are then removed. If these tmp files are mmapped, there is a risk of thrashing the file system cache.

This change only avoids using mmap for the fdt tmp file. This is the file that actually contains the data, and it can be large compared to the other files that get flushed. The fdm (metadata) and fdi (stored field index) files remain mmapped.
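
As a rough illustration, the decision could look like the predicate below. `MmapPolicy`, the method name, and the file-name convention (a name ending in `.tmp` and containing `fdt`) are assumptions made for this sketch, as are the example file names; it does not use or reproduce any Lucene or Elasticsearch API.

```java
// Hypothetical predicate; the file-name convention is an assumption for illustration.
final class MmapPolicy {

    // Returns true only for the stored-fields data (fdt) tmp file, and only
    // when the feature flag is enabled. All other files are left alone.
    static boolean avoidMmapForFdtTmp(String fileName, boolean featureFlagEnabled) {
        boolean isTmp = fileName.endsWith(".tmp");
        boolean isFdt = fileName.contains("fdt");
        return featureFlagEnabled && isTmp && isFdt;
    }

    public static void main(String[] args) {
        // Made-up file names, used only to exercise the predicate.
        System.out.println(avoidMmapForFdtTmp("_0_fdt_1.tmp", true));
        System.out.println(avoidMmapForFdtTmp("_0_fdm_1.tmp", true));
        System.out.println(avoidMmapForFdtTmp("_0_fdt_1.tmp", false));
    }
}
```
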
This change makes the GeoIp persistent task executor/downloader multi-project aware. 
- the database downloader persistent task will be at the project level, meaning there will be a downloader instance per project
- persistent task id is prefixed with project id, namely `<project-id>/geoip-downloader` for cluster in MP mode
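
The task id scheme above could be sketched like this. `GeoIpTaskIds` is a hypothetical helper, and leaving the id unprefixed outside MP mode is an assumption of the sketch rather than something stated in this change.

```java
// Hypothetical helper for illustration; not the actual downloader code.
final class GeoIpTaskIds {

    static final String TASK_NAME = "geoip-downloader";

    // In MP mode the id is <project-id>/geoip-downloader.
    // Assumption: outside MP mode the plain task name is used.
    static String taskId(String projectId, boolean multiProject) {
        return multiProject ? projectId + "/" + TASK_NAME : TASK_NAME;
    }

    public static void main(String[] args) {
        System.out.println(taskId("project-a", true));  // prints project-a/geoip-downloader
        System.out.println(taskId("project-a", false)); // prints geoip-downloader
    }
}
```
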
Splits up bc and pr upgrade tests, as they tend to be a bottleneck in intake and pr builds nowadays.
This commit ports the IndexVersions.UPGRADE_TO_LUCENE_9_12_2 constant to the main branch.

This is required after the update to Lucene 9.12.2 in the 8.19 branch, see elastic#129555.
…#129227)

This follow-up to elastic#128440 introduces a new `managed_by` field (`<1>`) that will be returned in the response of the Authenticate API.

Besides the `managed_by` field, it also captures an additional `internal` field (`<2>`) for cloud API key authentication and exposes it as part of the `api_key` fields.

```json
{
  "username": "omSAd5YBK3gZiBcD-GvX", 
  "roles": [ "viewer" ],
  "metadata": {
    ...
  },
  "enabled": true,
  "authentication_realm": {
    "name": "_cloud_api_key",
    "type": "_cloud_api_key"
  },
  "lookup_realm": {
    "name": "_cloud_api_key",
    "type": "_cloud_api_key"
  },
  "authentication_type": "api_key",
  "api_key": { 
    "id": "omSAd5YBK3gZiBcD-GvX",
    "name": "my cloud API key",
    "managed_by": "cloud", <1>
    "internal": false <2>
  }
}

```


- Additionally, it implements `Authentication#canAccessResourcesOf` for cloud API keys. The ownership check allows access only to the same cloud API key.

- And lastly, adds a consistency check for cloud API keys in `Authentication#checkConsistencyForApiKeyAuthenticationType`.
Add a new logical plan optimization:

When there is a Project (KEEP/DROP/RENAME/renaming EVALs) in a LOOKUP JOIN's left child (the "main" side), perform the Project after the LOOKUP JOIN. This prevents premature field extractions when the lookup join happens on data nodes.
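
A toy sketch of the rewrite, under heavy simplification. The `Relation`, `Project`, and `LookupJoin` records and the `apply` method are hypothetical and far simpler than ES|QL's real logical plan classes; the model tracks field names only and ignores renames, renaming EVALs, and name collisions between hidden main-side fields and lookup fields.

```java
import java.util.ArrayList;
import java.util.List;

// Toy plan nodes for illustration only; not ES|QL's logical plan classes.
final class PushProjectPastJoin {

    interface Plan {
        List<String> output();
    }

    record Relation(String name, List<String> fields) implements Plan {
        public List<String> output() {
            return fields;
        }
    }

    record Project(Plan child, List<String> keep) implements Plan {
        public List<String> output() {
            return keep;
        }
    }

    record LookupJoin(Plan left, Relation lookup, String key) implements Plan {
        public List<String> output() {
            // Left fields first, then any lookup fields not already present.
            List<String> out = new ArrayList<>(left.output());
            for (String field : lookup.fields()) {
                if (!out.contains(field)) {
                    out.add(field);
                }
            }
            return out;
        }
    }

    // Rewrites LookupJoin(Project(x), lookup) into Project(LookupJoin(x, lookup)),
    // keeping the same output fields as the original plan.
    static Plan apply(Plan plan) {
        if (plan instanceof LookupJoin join && join.left() instanceof Project project) {
            List<String> keep = join.output();
            Plan pushedDown = new LookupJoin(project.child(), join.lookup(), join.key());
            return new Project(pushedDown, keep);
        }
        return plan;
    }

    public static void main(String[] args) {
        Relation main = new Relation("main", List.of("a", "b", "key"));
        Relation lookup = new Relation("lookup", List.of("key", "c"));
        Plan before = new LookupJoin(new Project(main, List.of("a", "key")), lookup, "key");
        Plan after = apply(before);
        System.out.println(before.output());
        System.out.println(after.output());
    }
}
```

In this model the Project ends up on top of the join, so the join reads directly from the main relation and the projection is applied afterwards, while the plan's output stays the same.
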
Now that elastic#128589 has been merged, we do not need to use the serverless FF to skip the check.
This PR removes it.

Relates to https://elasticco.atlassian.net/browse/ES-12004
@julian-elastic
Contributor Author

please ignore
