Skip to content

Conversation

@slobodanadamovic
Copy link
Contributor

@slobodanadamovic slobodanadamovic commented Jan 30, 2025

The #120323 PR enabled a feature that now eagerly creates the .security index on cluster formation. Most of our YAML test execute all REST calls using a test user with _es_test_root role which grants all permissions. Because we use this almighty user, this now caused many test failures since they do not account for .security index in their assertions. This PR adjust some test assertions, introduces a dedicated user per test, or for simplicity deletes the .security index before the test.

Resolves #121130
Resolves #121186
Resolves #121238
Resolves #121242
Resolves #121246
Resolves #121131
Resolves #121290
Resolves #120890
Resolves #120920
Resolves #121014
Resolves #121128
Resolves #120965

- gte: { number_of_data_nodes: 1 }
- match: { active_primary_shards: 0 }
- match: { active_shards: 0 }
- gte: { active_primary_shards: 0 }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test only cares about reaching a timeout and not how many shards are there.
In some cases it can be 0 and in some 1, depends if .security index is created before test executes.

allowed_warnings:
- "this request accesses system indices: [.security-7], but in a future major version, direct access to system indices will be prevented by default"
indices.delete:
index: .security-*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easier to keep this test as is and simply remove .security index before the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised deleting that index doesn't cause any problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is still a chance that the setup runs before the security index is created and the index will thus not be deleted. I guess that chance is pretty small but are we willing to take that chance?

get_alias_test_role:
cluster: [ ]
indices:
- names: ["test*", "myindex", "non-existent", "another-non-existent", "foo"]
Copy link
Contributor Author

@slobodanadamovic slobodanadamovic Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granting only necessary privileges for get alias YAML tests. This also includes privileges for missing names in order to avoid changing tests that expect 404 instead of 403. The 403 is returned when user requests index for which it has not permissions (regardless if index exists or not). This test is executed both with security enabled and disabled, hence changing it would cause failures when security is disabled.

.setting("xpack.license.self_generated.type", "trial")
.setting("xpack.security.autoconfiguration.enabled", "false")
.user(USER, PASS)
.user("get_alias_test_user", "x-pack-test-password", "get_alias_test_role", false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The users and roles must be defined in the test suit in order to avoid causing failures for tests which do not run with the security enabled.

@slobodanadamovic slobodanadamovic added :Security/Security Security issues without another label Team:Data Management Meta label for data/management team Team:Security Meta label for security team :Data Management/Data streams Data streams and their lifecycles >test Issues or PRs that are addressing/adding tests auto-backport Automatically create backport pull requests when merged labels Jan 30, 2025
@slobodanadamovic slobodanadamovic requested review from a team January 30, 2025 16:57
@slobodanadamovic slobodanadamovic marked this pull request as ready for review January 30, 2025 16:59
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)


- do:
headers:
Authorization: Basic ${login_credentials}
Copy link
Contributor Author

@slobodanadamovic slobodanadamovic Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could simply hardcode precomputed base64 header value, but I wanted to avoid that as it's not human friendly and makes it harder to debug.

Other options I explored were to introduce new field user_credentials, but that turned our to be more complex and not compatible with rest clients.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. It seems like a good balance.

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Jan 31, 2025
…nd-unmute-tests

# Conflicts:
#	muted-tests.yml
.user("x_pack_rest_user", "x-pack-test-password");
.keystore("bootstrap.password", PASS)
.user("x_pack_rest_user", PASS)
.user("data_stream_test_user", PASS, "data_stream_alias_test_role", true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same data stream YAML tests are running in Serveless. Marked data_stream_test_user as operator to allow setting index.number_of_replicas index setting.

@masseyke masseyke self-requested a review January 31, 2025 19:00
Copy link
Member

@masseyke masseyke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@nielsbauman nielsbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this so soon, @slobodanadamovic! I think I'm a little bit on the fence about this approach. Here are my thoughts:

  • I agree that we should try to move away from using an almighty user to make our tests more realistic from a security POV. At the same time, we don't have to address that in this PR. If we can keep it into account, great. But I think I would be more inclined to design a solution that works across the test suite -- defining the roles and users as we do in this PR probably doesn't scale well if we need a bunch of different users/permissions.
  • I'm not a huge fan of having to specify the authorization in every request and generate the login credentials in every test -- that's easy to forget. I do know I am more averse to boilerplate code than most people at ES, so I'm willing to let that go if no one else is bothered by it.
  • Deleting the security index before the test feels like a (rare) failure waiting to happen (see my comment below).
  • There are some other tests failing and I assume there will be more.

I assume it's not an option to make the default user not have access to system indices? That way, we'd switch the default testing behavior to have a more realistic set of permissions and make tests require almighty permissions only when they need it (which would make test writers more conscious about wielding almighty power). Alternatively, I liked the idea that you mentioned on Slack; to extend the test framework to allow tests to declare required permissions. It's unfortunate that that turned out to be more complicated than you expected. Would it be worth exploring that again?

If it weren't for my last bullet point, I'd be ok with merging this as-is to at least unmute these tests and work on a long-term solution in the meantime. But because of that last bullet point, I'm a little worried that this will only be a partial fix and we'll be plugging holes for the coming weeks.

What do you think?

allowed_warnings:
- "this request accesses system indices: [.security-7], but in a future major version, direct access to system indices will be prevented by default"
indices.delete:
index: .security-*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is still a chance that the setup runs before the security index is created and the index will thus not be deleted. I guess that chance is pretty small but are we willing to take that chance?

private static final String PASS = "x-pack-test-password";

private static final String ROLES = """
data_stream_alias_test_role:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
data_stream_alias_test_role:
data_stream_test_role:

Or rename the test user to data_stream_alias_test_user for consistency.

@slobodanadamovic
Copy link
Contributor Author

slobodanadamovic commented Feb 3, 2025

Thanks for working on this so soon, @slobodanadamovic! I think I'm a little bit on the fence about this approach. Here are my thoughts:

  • I agree that we should try to move away from using an almighty user to make our tests more realistic from a security POV. At the same time, we don't have to address that in this PR. If we can keep it into account, great. But I think I would be more inclined to design a solution that works across the test suite -- defining the roles and users as we do in this PR probably doesn't scale well if we need a bunch of different users/permissions.
  • I'm not a huge fan of having to specify the authorization in every request and generate the login credentials in every test -- that's easy to forget. I do know I am more averse to boilerplate code than most people at ES, so I'm willing to let that go if no one else is bothered by it.

I'm not a fan of it either, but there is no way around adding boilerplate if we don't want to use a single superuser for each test. Whichever way we choose to implement it, we have to be able to express "run this test with this user" to all downstream clients, and that simply have to be done per test. Every do section executes one REST request. For each REST request we would have to specify user credentials to use (i.e. set Authorization header). Defining users and their roles can be done in setup section even now, but for some tests (like rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.get_alias/10_basic.yml), this is not an option because they are executed both with security enabled and disabled. When security is disabled, all security APIs are unavailable.

  • Deleting the security index before the test feels like a (rare) failure waiting to happen (see my comment below).

++ agreed. I'll revert those changes.

I assume it's not an option to make the default user not have access to system indices? That way, we'd switch the default testing behavior to have a more realistic set of permissions and make tests require almighty permissions only when they need it (which would make test writers more conscious about wielding almighty power).

It's an option, but not a pragmatic one. As much as I'd like to do this, the effort is just not justified at the moment. If security was a first class citizen (and not just a plugin), we could make many enforcements. I think it's fair to leave this to developers for now.

Alternatively, I liked the idea that you mentioned on Slack; to extend the test framework to allow tests to declare required permissions. It's unfortunate that that turned out to be more complicated than you expected. Would it be worth exploring that again?

It's something worth exploring for sure. But based on the initial look at it, it's non-trivial effort to introduce. Being non-trivial is not a reason not to do it, it's just not where prioritise are.
Even adding the #base64EncodeInput is something I very much underestimated. This change requires adjusting all language specific YAML runners (php, C#, python, go, java, etc...) in order to understand it. Hence, I will revert that change.

  • There are some other tests failing and I assume there will be more.

If it weren't for my last bullet point, I'd be ok with merging this as-is to at least unmute these tests and work on a long-term solution in the meantime. But because of that last bullet point, I'm a little worried that this will only be a partial fix and we'll be plugging holes for the coming weeks.

What do you think?

Having the .security index created on cluster "startup" was expected to disrupt a lot of tests that did not account for it previously. There is no single solution to fit all of them. I think we have to look at each test individually and make appropriate adjustments where needed.

After giving it some more thoughts I'd be leaning towards disabling the feature (for now) that creates .security index for these core YAML tests. The auto-creation of the .security index is only relevant for testing that built-in roles are queryable via Query Role API. There is no benefit (coverage-wise) in having it enabled for all YAML tests, especially for the ones that simply want to have more controlled environment when testing specific feature (e.g. get alias API).

With all that said, I'll be closing this PR and addressing these failures by disabling the security queryable feature.

@nielsbauman and @masseyke Thank you both for the feedback and sorry for !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Data Management/Data streams Data streams and their lifecycles :Security/Security Security issues without another label serverless-linked Added by automation, don't add manually Team:Data Management Meta label for data/management team Team:Security Meta label for security team >test Issues or PRs that are addressing/adding tests v8.18.1 v8.19.0 v9.0.0 v9.1.0

Projects

None yet

4 participants