docs: Add new architectural walkthrough doc #987

Open
doshitan wants to merge 4 commits into main from doshitan/understanding-infra-docs


@doshitan

It has felt like the docs are missing a middle ground between conceptualizing the infra template and applying those concepts. This doc is an attempt to fill that gap. It tries to strike a balance: enough detail to serve the purpose, but not so much that it overwhelms or duplicates other documentation.

The visualization is the main point. The rest of the doc captures details that don't fit in the graphic and serves as somewhat of a basis for a script if you were to walk someone through it. The PR also includes a few other misc. tweaks and fixes to related docs.

Although this is in the AWS template repo, it's largely applicable to the current Azure template, so it felt reasonable to just have a couple of callouts on the differences between the AWS and Azure setups, rather than duplicate the doc entirely.

@doshitan marked this pull request as ready for review December 11, 2025 20:15
@doshitan requested a review from a team as a code owner December 11, 2025 20:15

@sean-navapbc left a comment


Approve with minor changes - The documentation is valuable and well-structured. Just a few typos to fix before merging.

- Layer: Corresponds to root modules and actual resources.

![Visualization of infra code
layers](https://lucid.app/publicSegments/view/623affad-8b51-4482-86e2-f1a3ad1bd623/image.png)

The visualization is hosted on lucid.app. Consider:
- Is the image publicly accessible long-term?
- Should there be a fallback or the image committed to the repo?
- What if the Lucidchart account changes ownership?

Reply from the PR author (@doshitan):

This is the general approach we've taken before, as with the system architecture diagram (and that image in the README.md). It is a bit awkward when non-Nava staff use the template, since they can't easily make a copy for themselves, but that's not a new problem.

> Is the image publicly accessible long-term?

Yes.

> Should there be a fallback or the image committed to the repo?

I'm generally against committing a copy, but we could.

> What if the Lucidchart account changes ownership?

Nava will have a huge migration effort if as an org we move off of Lucidchart, so this would just need to be a part of those plans.


@lorenyu left a comment


Overall this looks good and I support merging as is. The slice concept doesn't quite gel for me; maybe we can user-test this a bit with some engineers?


Nice improvement


Should we link to this doc from `infra/README.md` or somewhere similar?

Comment on lines +33 to +35
- Slice: Conceptual or organizational grouping, it does not directly correspond
to infrastructure resources itself, though may be expressed in the file
structure.

I don't know the right solution, but it feels like the definition of Slice can be a little more precise and narrow. Conceptual grouping feels like it can potentially refer to a lot of things, so until you see examples there isn't really a rigorous definition of what it means.

Also, I think "app environment" as a slice seems analogous to different accounts or different networks as a slice e.g. "nonprod network" or "prod network", but that doesn't seem captured here. Is it because "env-config" is in a folder but network configs are all in one file?

I wonder if there's a concept like "layer instance" to represent the difference between sets of deployed resources within a layer that are each backed by a different tfbackend file. Would that be helpful? It doesn't seem to be the same as "slice", but it may also be a useful concept.


Re: configs. Conceptually, network configs could have been structured the same way as env-config, or vice versa: we could have put all env configs in one big file. For now, I think network configs are in one file because so far they're simple enough to fit in one.

The way I think about it logically is:

| config | depends on | referenced by |
| --- | --- | --- |
| project config | | all layers |
| *(there are no account-level configs per account because we treat all accounts the same)* | | |
| network configs per network | project config | network layer |
| app configs per app | network config, project config | build repository layer, database layer, service layer |
| environment config per app environment | app config, network config, project config | database layer, service layer, and network layer |

note: the network layer actually references both app config and app env configs in these lines
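
As a rough illustration only (this is not code from the template), the "depends on" relationships listed above can be written as a small graph and sorted so that each config comes after everything it depends on. The dictionary keys are just the config names used in this comment, and `resolution_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each config maps to the set of configs it depends on, as listed above.
# The names are labels from this discussion, not identifiers in the template.
CONFIG_DEPENDS_ON = {
    "project config": set(),
    "network config": {"project config"},
    "app config": {"network config", "project config"},
    "environment config": {"app config", "network config", "project config"},
}


def resolution_order(depends_on):
    """Return configs ordered so each one appears after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = resolution_order(CONFIG_DEPENDS_ON)
print(order)
```

Because each config here depends on every config before it, only one ordering is valid: project, then network, then app, then environment.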

Comment on lines +38 to +39
![Visualization of infra code
layers](https://lucid.app/publicSegments/view/623affad-8b51-4482-86e2-f1a3ad1bd623/image.png)

This could be something we address later, but the items in the diagram are spread out, which makes the text small. If there's a way to make the diagram more space-efficient, it would be more readable.

Comment on lines +15 to +31
TL;DR for the why: group resources with a similar lifecycle and scope, reduce
code duplication and drift between environments, help streamline receiving
updates from upstream

It's important to understand most of the actual resource code [lives in
re-usable modules under `/infra/modules/`](/docs/infra/module-architecture.md).
Everything else largely exists to call that re-usable code with the correct
parameters at the correct time, [in a well structured
way](/docs/infra/module-dependencies.md). That "everything else" can be broken
down into three high-level "slices" (some with further internal divisions).

One last bit of exposition, as yet more new terminology has been introduced with
"slice". If you've read other documentation already you'll have likely seen
references to infrastructure "layers", but there are some conceptual
pseudo-layers that, for clarity, that this document calls a "slice". Some slices
directly correspond to a layer, other slices correspond to multiple layers. In
summary:

nit: This part feels just a bit too conversational in tone? I wonder if we could formalize it just a tad.
