Skip to content

Comments

Config Structural Validation using a declarative JSON Schema Part 0: Scripts, Global#1862

Merged
Zash merged 2 commits intomainfrom
ka/jsonschema
Mar 27, 2024
Merged

Config Structural Validation using a declarative JSON Schema Part 0: Scripts, Global#1862
Zash merged 2 commits intomainfrom
ka/jsonschema

Conversation

@Zash
Copy link
Contributor

@Zash Zash commented Nov 10, 2023

Warning

This is public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request
  • business confidential information, such as customer names

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • kind/adr

Release notes

The configuration files are now validated against a JSON Schema.

Platform Administrator notice

When validating config against the schema you may encounter cases where they disagree on what type a field should be and it may be necessary to tweak either the config or the schema to match the intent.

What does this PR do / why do we need this PR?

...

Additional information to reviewers

This adds a huge JSON Schema file (in YAML format) that describes the structure of the config. Feel free to ignore it when reviewing, it does not need to be perfect immediately but can be incrementally perfected over time.

A new tool, yajsv, written in Go, is added as requirement which needs to be installed (bin/install-requirements.bash). Since JSON Schema is an an open specification with multiple implementations, it could be replaced in the future. We could even write our own using one of the available libraries.

To try this, run e.g.:

./bin/ck8s validate sc

Information about JSON schema can be found at https://json-schema.org/

The specifications themselves are found at

Notes about certain common schema fields:

  • default can be used to describe a default value, which could be used in future tooling to actually fill in default values, but for now serves as documentation.
  • example and examples can contain one or many example values. This could be shown in future tooling to improve UX.
  • format allows specifying what kind of string to expect. Common usable values include hostname, uri (for URLs etc), email. Unknown values are ignored by the validator, so common formats could be described and potentially implemented in future tooling.
  • additionalProperties defines schema for a property of an object that is not covered by properties, setting this to false rejects unknown values, and thus detects things that might have been missed.

Screenshots

$ ./bin/ck8s validate sc
[ck8s] Validating sc config
[ck8s] Failed schema validation:
config-sc.yaml: nodeLocalDns.customConfig: Invalid type. Expected: string, given: null
config-sc.yaml: falco.driver.ebpf.path: Invalid type. Expected: string, given: null
config-sc.yaml: certmanager.tolerations: Invalid type. Expected: array, given: object
config-sc.yaml: certmanager.cainjector.tolerations: Invalid type. Expected: array, given: object
config-sc.yaml: certmanager.webhook.tolerations: Invalid type. Expected: array, given: object
config-sc.yaml: certmanager: Must validate all the schemas (allOf)
config-sc.yaml: networkPolicies.global.externalLoadBalancer: Invalid type. Expected: string, given: boolean
config-sc.yaml: networkPolicies.global.ingressUsingHostNetwork: Invalid type. Expected: string, given: boolean
1 of 1 failed validation
[ck8s] Do you want to abort? (y/n): 

Can provide validation and help messages in e.g. VS Code:

image

Could also be used to generate documentation:

image

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts are not affected)
    • The metrics names did change (Grafana dashboards and Prometheus alerts were fixed)
  • Logs checks:
    • The logs do not show any errors after the change
  • Network Policy checks:
    • Any changed pod is covered by Network Policies
    • The change does not cause any dropped packages in the NetworkPolicy Dashboard
  • Pod Security Policy checks:
    • Any changed pod is covered by Pod Security Admission
    • Any changed pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any pods to be blocked by Pod Security Admission or Policies
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Bug checks:
    • The bug fix is covered by regression tests

@Zash Zash requested review from Eliastisys, aarnq, crssnd, davidumea, robinAwallace and viktor-f and removed request for robinAwallace November 10, 2023 10:50
@Zash Zash marked this pull request as ready for review November 10, 2023 10:51
Copy link
Contributor

@robinAwallace robinAwallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the schema very long I have not really gone though it. But it looks promising 🙂 I will try to try it out next week.

Copy link
Contributor

@robinAwallace robinAwallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have manged to try it out now and it is very nice to get error on specific values when they are "wrong" 👍
But I have some thoughts. 🙂

@Zash
Copy link
Contributor Author

Zash commented Nov 28, 2023

In the pipeline, where does this falco setting come from?

[ck8s] Failed schema validation:
wc-config.yaml: falco.alerts.enabled: Invalid type. Expected: boolean, given: string
1 of 1 failed validation

@Zash Zash force-pushed the ka/jsonschema branch 2 times, most recently from ad0978c to d5ffa16 Compare November 30, 2023 13:42
@Zash Zash force-pushed the ka/jsonschema branch 3 times, most recently from 06eb73a to 9d74eab Compare December 14, 2023 13:01
@aarnq
Copy link
Contributor

aarnq commented Jan 17, 2024

Please add yajsv into check_tools and into the tests/Dockerfile.

@aarnq
Copy link
Contributor

aarnq commented Feb 1, 2024

I've created a PR with a helper to regenerate the full diff that is used for the update-ips tests.

@Zash
Copy link
Contributor Author

Zash commented Feb 13, 2024

thoughts on split vs single schema file

split schema into multiple files

pro:

  • easier review with smaller files
  • easy to validate individual sections

con:

  • some tools don't support external references, e.g. that documentation generation tool
  • some tools have poor handling of multi-file schema, e.g. doesn't report which file to look in or reports a 404 error
  • breaks the symmetry-ish between config and schema
  • reference by URL seems needlessly complicated
  • tools may try to download resources over the internet

single schema

pro:

  • simpler references, just $ref: "#/path/somewhere" instead of a full URL and needing $id
  • a larger schema may work as a pressure towards smaller schema, and thus smaller config with more reuse

con:

  • harder to review
  • harder to navigate?

conclusion

tools vs tools tradeoff? I prefer single file. can always switch later.

@aarnq
Copy link
Contributor

aarnq commented Feb 13, 2024

thoughts on split vs single schema file

Yeah, the single schema sounds better in this case.

pro:

  • a larger schema may work as a pressure towards smaller schema, and thus smaller config with more reuse

And this I especially like 😁

I assume the idea is to still have a separate one for secrets, as we don't merge that in?

@robinAwallace
Copy link
Contributor

thoughts on split vs single schema file

tools vs tools tradeoff? I prefer single file. can always switch later.

Very nice summary of the pros and cons. 👍 I agree stick with a single file.

@Zash
Copy link
Contributor Author

Zash commented Feb 14, 2024

I assume the idea is to still have a separate one for secrets, as we don't merge that in?

Yeah, assuming secrets is its own thing. Different kind of thing having its own schema seems sensible. And given the sensitive nature, keeping them separate as long as possible to minimize exposure is probably good.

@Zash Zash marked this pull request as draft March 4, 2024 10:37
@Zash
Copy link
Contributor Author

Zash commented Mar 4, 2024

✅ All checks have passed

🎉

Now to add more tests specifically for schema...

@Eliastisys Eliastisys removed their request for review March 5, 2024 10:02
@Zash Zash marked this pull request as ready for review March 5, 2024 10:46
@Zash
Copy link
Contributor Author

Zash commented Mar 6, 2024

Gonna rebase, reorder and squash a bit.

@Zash Zash requested a review from robinAwallace March 6, 2024 13:24
Copy link
Contributor

@robinAwallace robinAwallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🥳 Im happy to merge this. Just one thought, could you mention some were how to setup vscode to work with the validator? 🙂

@Zash
Copy link
Contributor Author

Zash commented Mar 26, 2024

could you mention some were how to setup vscode to work with the validator? 🙂

As in, enabling it here or enabling it somewhere else where you have config?

I can write some extra words about it in .../schema/README.md, it should mainly be having an already recommended plugin and some references in .vscode/settings.json

@robinAwallace
Copy link
Contributor

I can write some extra words about it in .../schema/README.md, it should mainly be having an already recommended plugin and some references in .vscode/settings.json

Yeah, I think it is worth mentioning that, that plugin would enable validation/help in vscode. As you basically already done in this PR description.

JSON Schema is a format for describing the structure of a JSON value,
meant to support validation of data.

This makes it usable for validating the structure of the cluster config
files since JSON and YAML have overlapping data models.

Being an open standard, there are multiple tools and libraries available
for doing validation, as well as generating documentation or boilerplate
code for type-aware languages.

The initial schema only covers the top level 'global' entry for easier
review, additional entries will be added in subsequent PRs

Schema validation for secrets.yaml currently acts on the sops-encrypted
file and thus only sees the basic structure.

Includes a tool to generate basic json (but yaml) schemas, although it
requires some post-processing so it is mostly useful to get a starting
point.

To validate, run `./bin/ck8s validate sc|wc`. If something is wrong it
should print the path and value that violates the schema to make it easy
to fix.

Would have been nice if yajsv could output the the values that fail
validation itself.

suggestions from code review

Co-authored-by: André Arnqvist <58822152+aarnq@users.noreply.github.com>
@Zash
Copy link
Contributor Author

Zash commented Mar 27, 2024

The remaining parts of config/schemas/config.yaml are now in

Each one adds a few more or less related sections to the config schema.

Five is a good number.

@Zash Zash changed the title Validate config against a declarative JSON Schema Config Structural Validation using a declarative JSON Schema Part 0: Scripts, Global Mar 27, 2024
@Zash Zash merged commit 7a75fc3 into main Mar 27, 2024
@Zash Zash deleted the ka/jsonschema branch March 27, 2024 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[5]Add structural config validation

3 participants