Commit 042679a

Add blog on building scalable multi-tenant systems with aws cdk
1 parent bd472d7 commit 042679a

4 files changed

+278
-0
lines changed

@@ -0,0 +1,278 @@
1+
---
2+
title: "Building Scalable Multi-Tenant Systems with AWS CDK: An IAC Approach"
3+
authorId: "mufaddal"
4+
date: 2024-10-30
5+
draft: false
6+
featured: true
7+
weight: 1
8+
---
9+
In this blog, I will take you through how we built a scalable and efficient IAC solution for our multi-tenant system. We are not going to debate why we chose CDK; that discussion deserves a blog of its own. Instead, this blog covers how we approached the problem using AWS CDK. Even if you are not very familiar with CDK, it can help you build a mental model of how to think when writing the infrastructure code for such a complex system.
10+
11+
## What are Multi-tenant Systems?
12+
13+
A multi-tenancy architecture uses a single instance of a software application to serve multiple customers. Each customer is referred to as a tenant. Tenants can customize certain aspects of the application, such as the color of the user interface or business rules, but they cannot change the application's code.
14+
15+
There are three main types of multi-tenant architecture:
16+
17+
1. One Application, One Database: All tenants share a single database.
18+
2. One Application, Multiple Databases: Each tenant has its own database while sharing the same application instance.
19+
3. Multiple Applications and Databases: This is the most complex architecture where multiple services and databases are deployed for each tenant.
20+
21+
In this blog, we will focus on the third architecture, which provides greater flexibility and isolation.
22+
23+
## What is AWS CDK?
24+
25+
The AWS Cloud Development Kit (CDK) is an open-source software development framework that allows us to define and provision cloud infrastructure resources with a variety of programming languages.
26+
27+
AWS CDK, whose core is written in TypeScript, synthesizes CloudFormation templates and deploys them through AWS CloudFormation, which lets it leverage CloudFormation's strengths in infrastructure state management. In fact, CDK keeps no separate state of its own; CloudFormation tracks the deployed resources, which makes cloud resources easier to manage.
28+
29+
## Understanding Our Requirements
30+
31+
Our use case involves the linear growth of services alongside the exponential growth of tenants. A critical requirement is that each tenant must have database isolation to ensure robust tenant data integrity and confidentiality. This led us to choose an architecture where the services and databases for each tenant are deployed in isolation.
32+
33+
Key requirements include:
34+
35+
1. Quick Tenant Onboarding: The onboarding process for new tenants should be streamlined.
36+
2. Service Types: We will differentiate between internal platform services (used internally) and product services (used by end-users), ensuring that platform services can communicate with product services across all tenants.
37+
38+
## Architectural Overview
39+
40+
![multi-tenant-architecture](images/blog/multi-tenant-system-with-aws-cdk/multi-tenant-architecture.png)
41+
42+
To visualize our architecture, consider the following components:
43+
44+
1. Platform Services: These are internal services that interact with product services across all tenants. For example, if SVC 1 is deployed for three tenants, User1, User2, and User3, platform services will connect with these isolated instances.
45+
2. Product Services: These services address specific business needs and are deployed individually for each tenant, complete with their own databases.
46+
3. Tenants: The end users of these services. Database isolation ensures that each tenant can access only its own data.
47+
48+
## What do we know?
49+
50+
Now let’s briefly look at what we already have and what is expected from IAC.
51+
52+
As we were using AWS as our cloud provider, we started by finalizing the architecture for our system. After our R&D, we decided to go with the multi-VPC architecture that is one of the recommendations from AWS, and yes, its reference implementation is written in AWS CDK. Building on that CDK solution, we were able to give each tenant its own VPC, which solved our isolation problem and also gave us connectivity between the platform VPC and the tenant VPCs. We will look at this in detail later in this blog.
53+
54+
With the networking infrastructure settled, for applications we are going to use ECS services on Fargate, RDS for databases, SSM for application environment variables, Secrets Manager for application secrets, and Route 53 for maintaining DNS records.
55+
56+
For continuous integration and continuous deployment, we are going to use GitHub Actions. From these decisions, you might notice that we are avoiding anything self-hosted for now.
57+
58+
Before we look at any CDK code, note that I will only walk through the configuration files with you, not the actual code. CDK differs from most other IAC tools in that the infrastructure is written imperatively in a general-purpose language. That lets us make the configuration file the public-facing part and keep the actual code as an abstraction behind it. Each member of the org then only needs to learn how to change the configuration file, not the code, which makes infrastructure changes easy, quick, and scalable.
59+
60+
## IAC of Networking
61+
62+
Let’s first look at how we broke down the [recommended](https://github.com/aws-samples/aws-vpc-builder-cdk/tree/main) networking architecture to fit our solution.
63+
64+
We used this [config](https://github.com/aws-samples/aws-vpc-builder-cdk/blob/main/config/sample-firewall-blog.vpcBuilder.yaml) file as our reference. The diagram below shows how to visualize this configuration file and what the resulting infrastructure looks like.
65+
66+
![multi-vpc](images/blog/multi-tenant-system-with-aws-cdk/multi-vpc.png)
67+
68+
Let’s briefly go through the components. Most of them are self-explanatory, so we will start with the transit gateway.
69+
70+
Transit Gateway: It is not shown in the diagram, but to communicate between VPCs we used a central transit gateway and added the required routes under `dynamicRoutes` and `defaultRoutes`.
71+
72+
```yaml
transitGateways:
  central:
    style: transitGateway
    tgwDescription: Central Transit Gateway
    dynamicRoutes:
      - vpcName: CentralIngress
        routesTo: PlatformVpc
        inspectedBy: inspectionVpc
      - vpcName: CentralIngress
        routesTo: TenantVpcA
        inspectedBy: inspectionVpc
      - vpcName: CentralIngress
        routesTo: TenantVpcB
        inspectedBy: inspectionVpc
      - vpcName: PlatformVpc
        routesTo: TenantVpcA
        inspectedBy: inspectionVpc
      - vpcName: PlatformVpc
        routesTo: TenantVpcB
        inspectedBy: inspectionVpc
    defaultRoutes:
      - vpcName: inspectionVpc
        routesTo: centralEgress
      - vpcName: PlatformVpc
        routesTo: centralEgress
        inspectedBy: inspectionVpc
      - vpcName: TenantVpcA
        routesTo: centralEgress
        inspectedBy: inspectionVpc
      - vpcName: TenantVpcB
        routesTo: centralEgress
        inspectedBy: inspectionVpc
```
106+
107+
Inspection VPC: This is our firewall VPC, which sits in the middle of all cross-VPC communication.
108+
109+
```yaml
providers:
  firewall:
    inspectionVpc:
      vpcCidr: 100.64.0.0/16
      useTransit: central
      style: awsNetworkFirewall
      firewallDescription: For Inspection Vpc
      firewallName: InspectionEgress
```
119+
120+
For incoming traffic from the internet, `CentralIngress` acts as the ingress VPC; similarly, `centralEgress` is the VPC through which all traffic goes out to the internet.
121+
122+
```yaml
providers:
  internet:
    centralEgress:
      vpcCidr: 10.10.0.0/16
      useTransit: central
      style: natEgress

vpcs:
  CentralIngress:
    style: workloadPublic
    vpcCidr: 10.1.0.0/19
    subnets:
      loadBalancerSubnet:
        cidrMask: 22
  PlatformVpc:
    style: workloadIsolated
    vpcCidr: 10.3.0.0/16
    providerInternet: centralEgress
    subnets:
      workloadSubnet:
        cidrMask: 24
      databaseSubnet:
        cidrMask: 24
      loadBalancerSubnet:
        cidrMask: 24
```
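Routing through a shared transit gateway generally relies on the VPC CIDR blocks not overlapping, so it can be worth checking the configured ranges before deploying. The following is a minimal sketch of such a check, not something the referenced VPC builder provides: `parseCidr` and `findOverlaps` are hypothetical helper names, and the CIDR values are copied from the configuration above.

```typescript
// Hypothetical pre-flight check; not part of aws-vpc-builder-cdk.
interface NamedCidr {
  name: string;
  cidr: string;
}

interface AddressRange {
  start: number; // first address in the block, as a plain number
  end: number; // last address in the block
}

function parseCidr(cidr: string): AddressRange {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(cidr);
  if (match === null) {
    throw new Error("Invalid IPv4 CIDR: " + cidr);
  }
  const octets = match.slice(1, 5).map(Number);
  const mask = Number(match[5]);
  if (octets.some((octet) => octet > 255) || mask > 32) {
    throw new Error("Invalid IPv4 CIDR: " + cidr);
  }
  // Plain arithmetic sidesteps JavaScript's signed 32-bit bitwise operators.
  const address = octets.reduce((acc, octet) => acc * 256 + octet, 0);
  const size = Math.pow(2, 32 - mask);
  const start = Math.floor(address / size) * size; // align to the network boundary
  return { start, end: start + size - 1 };
}

// Returns every pair of names whose address ranges intersect.
function findOverlaps(blocks: NamedCidr[]): Array<[string, string]> {
  const ranges = blocks.map((block) => ({ name: block.name, range: parseCidr(block.cidr) }));
  const overlaps: Array<[string, string]> = [];
  ranges.forEach((a, index) => {
    ranges.slice(index + 1).forEach((b) => {
      if (a.range.start <= b.range.end && b.range.start <= a.range.end) {
        overlaps.push([a.name, b.name]);
      }
    });
  });
  return overlaps;
}

// CIDRs copied from the configuration shown so far.
const configuredCidrs: NamedCidr[] = [
  { name: "inspectionVpc", cidr: "100.64.0.0/16" },
  { name: "centralEgress", cidr: "10.10.0.0/16" },
  { name: "CentralIngress", cidr: "10.1.0.0/19" },
  { name: "PlatformVpc", cidr: "10.3.0.0/16" },
];
const overlappingPairs = findOverlaps(configuredCidrs);
```

Running a check like this whenever a tenant VPC is added would catch a clashing `vpcCidr` before it reaches the transit gateway.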
150+
151+
The platform VPC has connectivity with the tenant VPCs, while the tenant VPCs have no connectivity with each other, as we can verify from `dynamicRoutes`.
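That verification can also be automated. The sketch below assumes tenant VPCs can be recognised by a `TenantVpc` name prefix, which matches the names in this post but is only a convention; `findTenantToTenantRoutes` and `findUninspectedRoutes` are hypothetical helpers, and the route list is copied from the `dynamicRoutes` shown earlier.

```typescript
interface DynamicRoute {
  vpcName: string;
  routesTo: string;
  inspectedBy?: string;
}

// Assumption: tenant VPCs are identified purely by this naming convention.
function isTenantVpc(name: string): boolean {
  return name.indexOf("TenantVpc") === 0;
}

// Any route whose two ends are both tenant VPCs would break tenant isolation.
function findTenantToTenantRoutes(routes: DynamicRoute[]): DynamicRoute[] {
  return routes.filter((route) => isTenantVpc(route.vpcName) && isTenantVpc(route.routesTo));
}

// Routes that skip the inspection VPC are worth flagging as well.
function findUninspectedRoutes(routes: DynamicRoute[]): DynamicRoute[] {
  return routes.filter((route) => route.inspectedBy === undefined);
}

// The dynamicRoutes from the transit gateway configuration in this post.
const dynamicRoutes: DynamicRoute[] = [
  { vpcName: "CentralIngress", routesTo: "PlatformVpc", inspectedBy: "inspectionVpc" },
  { vpcName: "CentralIngress", routesTo: "TenantVpcA", inspectedBy: "inspectionVpc" },
  { vpcName: "CentralIngress", routesTo: "TenantVpcB", inspectedBy: "inspectionVpc" },
  { vpcName: "PlatformVpc", routesTo: "TenantVpcA", inspectedBy: "inspectionVpc" },
  { vpcName: "PlatformVpc", routesTo: "TenantVpcB", inspectedBy: "inspectionVpc" },
];

const isolationViolations = findTenantToTenantRoutes(dynamicRoutes);
const uninspectedRoutes = findUninspectedRoutes(dynamicRoutes);
```

For the configuration in this post, both lists come back empty, which is exactly the property we want to keep as tenants are added.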
152+
153+
This setup was the first milestone of our infrastructure: to onboard a new tenant, we now just need to add a small block of configuration and the routes, like below.
154+
155+
```yaml
vpcs:
  TenantVpcC:
    style: workloadIsolated
    vpcCidr: 10.9.0.0/16
    providerInternet: centralEgress
    subnets:
      workloadSubnet:
        cidrMask: 24
      databaseSubnet:
        cidrMask: 24
      loadBalancerSubnet:
        cidrMask: 24

transitGateways:
  central:
    style: transitGateway
    tgwDescription: Central Transit Gateway
    dynamicRoutes:
      - vpcName: CentralIngress
        routesTo: TenantVpcC
        inspectedBy: inspectionVpc
      - vpcName: PlatformVpc
        routesTo: TenantVpcC
        inspectedBy: inspectionVpc
    defaultRoutes:
      - vpcName: TenantVpcC
        routesTo: centralEgress
        inspectedBy: inspectionVpc
```
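Because the onboarding block follows a fixed pattern, it could be generated from just a VPC name and a CIDR. The sketch below shows one way to do that; `buildTenantOnboarding` is a hypothetical helper, not part of the VPC builder, and it returns a plain object that mirrors the YAML above, which would still need to be merged into the real configuration file.

```typescript
interface SubnetSpec {
  cidrMask: number;
}

interface TenantVpcSpec {
  style: string;
  vpcCidr: string;
  providerInternet: string;
  subnets: Record<string, SubnetSpec>;
}

interface RouteSpec {
  vpcName: string;
  routesTo: string;
  inspectedBy?: string;
}

interface TenantOnboardingBlock {
  vpcs: Record<string, TenantVpcSpec>;
  dynamicRoutes: RouteSpec[];
  defaultRoutes: RouteSpec[];
}

// Hypothetical generator that mirrors the onboarding YAML for one tenant.
function buildTenantOnboarding(tenantVpcName: string, vpcCidr: string): TenantOnboardingBlock {
  const vpcs: Record<string, TenantVpcSpec> = {};
  vpcs[tenantVpcName] = {
    style: "workloadIsolated",
    vpcCidr: vpcCidr,
    providerInternet: "centralEgress",
    subnets: {
      workloadSubnet: { cidrMask: 24 },
      databaseSubnet: { cidrMask: 24 },
      loadBalancerSubnet: { cidrMask: 24 },
    },
  };
  return {
    vpcs: vpcs,
    // Ingress and platform traffic may reach the tenant, always via inspection.
    dynamicRoutes: [
      { vpcName: "CentralIngress", routesTo: tenantVpcName, inspectedBy: "inspectionVpc" },
      { vpcName: "PlatformVpc", routesTo: tenantVpcName, inspectedBy: "inspectionVpc" },
    ],
    // Outbound internet traffic leaves through the central egress VPC.
    defaultRoutes: [
      { vpcName: tenantVpcName, routesTo: "centralEgress", inspectedBy: "inspectionVpc" },
    ],
  };
}

const tenantC = buildTenantOnboarding("TenantVpcC", "10.9.0.0/16");
```

Note that the generator never emits a route from one tenant VPC to another, so tenant isolation is preserved by construction.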
186+
187+
Moving on from networking to the application layer was a little tricky: with the networking already set up in CDK, we had to make sure the networking and application infrastructure code stayed consistent.
188+
189+
So we had two options: either extend the same code to also support the application, or create a new CDK project that only cares about the application and assumes the networking part is already set up.
190+
191+
We chose the second approach because:
192+
193+
1. Application-related configuration changes far more frequently than networking configuration.
193+
2. For developers to manage application configuration themselves, the file should contain as little data that is unfamiliar to them as possible.
194+
3. Changes in networking configuration can impact the entire ecosystem, so its maintenance should belong to specific teams such as SRE/DevOps and should not be easy for just any member of the organization to change.
195+
4. Keeping application IAC separate also helps in automating CI/CD, which is another topic we can discuss in a future blog.
197+
198+
## IAC of Application
199+
200+
The basic idea when writing AWS CDK code is to bundle a unit of deployment into a single stack. A CDK stack represents a single CloudFormation stack, which is a collection of resources that are deployed together. So here, I created each stack from a collection of resources that are linked and are going to be deployed together.
201+
202+
The most important thing to decide upfront is how much control you want to expose through the configuration file. If you write the CDK code too generically, it ends up looking like a CloudFormation template; if you keep everything tightly coupled, decoupling it later becomes a challenge.
203+
204+
For example, by identifying the business need, I created one type of service stack, `ecsWithAlbNlbEfs`, which deploys the ECS service along with its log group, ALB, NLB, and EFS.
205+
206+
```yaml
services:
  servicea:
    type: ecsWithAlbNlbEfs
    image: 123456789012.dkr.ecr.ap-south-1.amazonaws.com/servicea:v1.0.0
    desiredCount: 2
    memoryLimitMiB: 512
    cpu: 1024
    ephemeralStorageGiB: 10
```
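Since the configuration file is what most people in the organization will touch, it helps to validate it before any stack is synthesized. The following is only a sketch under assumptions: the field names come from the YAML above, while `validateService`, the list of known types, and the specific rules are hypothetical and not taken from our actual code.

```typescript
interface ServiceConfig {
  type: string;
  image: string;
  desiredCount: number;
  memoryLimitMiB: number;
  cpu: number;
  ephemeralStorageGiB: number;
}

// Assumption: the set of supported service types is maintained alongside the stacks.
const KNOWN_SERVICE_TYPES: string[] = ["ecsWithAlbNlbEfs"];

// Returns a list of human-readable problems; an empty list means the entry looks sane.
function validateService(name: string, config: ServiceConfig): string[] {
  const errors: string[] = [];
  if (KNOWN_SERVICE_TYPES.indexOf(config.type) === -1) {
    errors.push(name + ": unknown service type '" + config.type + "'");
  }
  if (config.image.trim() === "") {
    errors.push(name + ": image must not be empty");
  }
  const numericFields: Array<[string, number]> = [
    ["desiredCount", config.desiredCount],
    ["memoryLimitMiB", config.memoryLimitMiB],
    ["cpu", config.cpu],
    ["ephemeralStorageGiB", config.ephemeralStorageGiB],
  ];
  numericFields.forEach(([field, value]) => {
    // Every sizing field is expected to be a positive whole number.
    if (!isFinite(value) || Math.floor(value) !== value || value <= 0) {
      errors.push(name + ": " + field + " must be a positive integer");
    }
  });
  return errors;
}

const serviceaErrors = validateService("servicea", {
  type: "ecsWithAlbNlbEfs",
  image: "123456789012.dkr.ecr.ap-south-1.amazonaws.com/servicea:v1.0.0",
  desiredCount: 2,
  memoryLimitMiB: 512,
  cpu: 1024,
  ephemeralStorageGiB: 10,
});
```

A check like this keeps mistakes in the public-facing file from surfacing only at deploy time.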
216+
217+
A bundled type like `ecsWithAlbNlbEfs` helps you quickly deploy similar kinds of services, but what if a new service does not need EFS or the NLB? Then what? Either you update that stack and make the creation of the NLB and EFS dynamic, or you create another stack.
218+
219+
AWS CDK is imperative, and making a stack dynamic in this way can backfire in the future: a change to that one service type will affect every service of that type across all tenants. To avoid such an incident, I suggest creating a separate stack bundle for each service use case, and when a requirement for a new type of service comes in, creating a new service stack instead of updating the existing one.
220+
221+
```yaml
services:
  servicea:
    type: ecsWithAlbNlbEfs
    image: 123456789012.dkr.ecr.ap-south-1.amazonaws.com/servicea:v1.0.0
    desiredCount: 2
    memoryLimitMiB: 512
    cpu: 1024
    ephemeralStorageGiB: 10
  serviceb:
    type: ecsWithAlbNlbWithoutEfs
    image: 123456789012.dkr.ecr.ap-south-1.amazonaws.com/serviceb:v1.0.0
    desiredCount: 2
    memoryLimitMiB: 512
    cpu: 1024
    ephemeralStorageGiB: 10
```
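One way to picture the "new type, new stack" rule is a lookup table from the `type` field to a dedicated builder. The sketch below is illustrative only: the builders return resource names instead of instantiating real CDK stacks, and `stackBuilders`, `planService`, and the resource naming are hypothetical.

```typescript
interface ServiceSpec {
  type: string;
  image: string;
}

// In real code each builder would create a CDK stack; here it only lists what it would contain.
type StackBuilder = (serviceName: string, spec: ServiceSpec) => string[];

const stackBuilders: Record<string, StackBuilder> = {
  ecsWithAlbNlbEfs: (name) => [name + "-ecs", name + "-logs", name + "-alb", name + "-nlb", name + "-efs"],
  // A new requirement gets its own builder instead of a flag on the existing one.
  ecsWithAlbNlbWithoutEfs: (name) => [name + "-ecs", name + "-logs", name + "-alb", name + "-nlb"],
};

function planService(serviceName: string, spec: ServiceSpec): string[] {
  if (!Object.prototype.hasOwnProperty.call(stackBuilders, spec.type)) {
    throw new Error("No stack builder registered for type '" + spec.type + "'");
  }
  return stackBuilders[spec.type](serviceName, spec);
}

const serviceaResources = planService("servicea", {
  type: "ecsWithAlbNlbEfs",
  image: "123456789012.dkr.ecr.ap-south-1.amazonaws.com/servicea:v1.0.0",
});
const servicebResources = planService("serviceb", {
  type: "ecsWithAlbNlbWithoutEfs",
  image: "123456789012.dkr.ecr.ap-south-1.amazonaws.com/serviceb:v1.0.0",
});
```

Adding a third service type then means adding a third entry to the table, and services already deployed with the existing types are left untouched.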
238+
239+
So the boundary of what to keep together in a stack and what to split out should be identified carefully; otherwise, moving resources from one stack to another later will take real effort.
240+
241+
![cdk-application-infra](images/blog/multi-tenant-system-with-aws-cdk/cdk-application-infra.png)
242+
243+
The diagram above is an overview of the application infrastructure written in AWS CDK. Tracing it from configuration file to visualization helps us understand how to write the CDK stacks so that tenant and service onboarding becomes easier.
244+
245+
We created a set of stacks, each one addressing a problem we had identified.
246+
247+
### Common Infrastructure Stack
248+
249+
This is the first stack that we wrote. It creates the common IAM roles that are used globally, such as the ECS task execution role and the GitHub Actions role.
250+
251+
### ECS Service Stack (with EFS)
252+
253+
This stack creates the ECS service in the tenant's cluster, which it identifies from the configuration file, along with the ALB, NLB, EFS, and security groups.
254+
255+
### RDS Stack
256+
257+
Keeping stateful resources separate is one of the best practices that we followed, so RDS is created in its own stack, together with the KMS key, the security groups, and the step that updates Secrets Manager with the RDS credentials.
258+
259+
### Public ALB
260+
261+
This is one of the common stacks we identified. It creates a public-facing application load balancer separately, following practices such as attaching an ACM certificate and a proper security group.
262+
263+
### Internal ALB
264+
265+
This CDK stack creates a separate internal ALB, such as the one in the platform VPC, to communicate with the services in the tenant VPCs.
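To see how these stacks multiply as tenants and services grow, the sketch below expands a list of tenants and services into stack identifiers. The identifiers, the `planStacks` helper, and the choice to list the RDS stack before the ECS service stack are all assumptions made for illustration; the load balancer stacks are left out because their placement depends on the VPC layout.

```typescript
// Hypothetical stack identifiers; this post does not show the real naming scheme.
function planStacks(tenants: string[], services: string[]): string[] {
  const stacks: string[] = ["common-infra"]; // shared IAM roles, created once
  tenants.forEach((tenant) => {
    services.forEach((service) => {
      // Assumption: the stateful RDS stack is listed before the service that uses it.
      stacks.push(tenant + "-" + service + "-rds");
      stacks.push(tenant + "-" + service + "-ecs");
    });
  });
  return stacks;
}

const plannedStacks = planStacks(["tenanta", "tenantb"], ["servicea", "serviceb"]);
```

With two tenants and two services this already yields nine stacks, which is why onboarding needs to be driven by configuration rather than by hand-written stack code.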
266+
267+
## Conclusion
268+
269+
In conclusion, building a scalable and efficient multi-tenant system on AWS requires careful planning and design. By using AWS CDK, we were able to define and provision our cloud infrastructure resources in a flexible and scalable way. Our approach to separating IAC code for networking and applications allowed us to maintain consistency and make changes more easily. We hope that this blog post has provided a useful example of how to use AWS CDK to build a multi-tenant system.
270+
271+
We look forward to sharing more of our experiences in future blog posts, covering follow-up questions like the ones below.
272+
273+
1. How to automate the provisioning and updating of CDK infrastructure using GitHub Actions?
274+
2. What are the key factors to consider when deciding between two popular IAC tools, Terraform and AWS CDK?
275+
3. What are the downsides of choosing multiple applications and databases for a multi-tenant system with a multi-VPC AWS architecture?
276+
4. As the system grows, how to design an application deployment pipeline that accommodates multiple services and tenants?
277+
5. How to ensure a complex system is resilient in the face of disasters or outages?
278+
6. How to normalize our complex multi-VPC system to reduce costs and improve efficiency?