Server Concepts
The diagram above gives a brief overview of the Backend.AI server-side architecture; the components it shows are the ones you need to install and configure.
Each border-connected group of components is intended to be run on the same server, but you may separate them or merge different groups as you need. For example, you can use separate servers for the nginx reverse-proxy and the Backend.AI manager or run both on a single server. In the development setup, all these components run on a single PC such as your laptop.
You may use your own on-premise server farm or a public cloud service such as AWS, GCP, or Azure. The primary requirements are:
- The manager server (HTTPS, port 443) should be exposed to the public Internet or to a network that your clients can access.
- The manager, agents, and all other database/storage servers should reside on the same local private network, where all traffic between them is allowed.
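As a rough way to confirm that the private network allows the traffic described above, the following sketch probes TCP reachability from one server to the others. The host names are hypothetical placeholders; the port numbers are only the usual defaults for PostgreSQL (5432), Redis (6379), and etcd (2379), so adjust them to your actual deployment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Probe each named endpoint and report whether it is reachable."""
    return {
        name: can_connect(host, port)
        for name, (host, port) in endpoints.items()
    }


if __name__ == "__main__":
    # Hypothetical host names; replace them with your own servers.
    endpoints = {
        "postgresql": ("db.internal.example", 5432),
        "redis": ("redis.internal.example", 6379),
        "etcd": ("etcd.internal.example", 2379),
    }
    for name, ok in check_endpoints(endpoints).items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

Running this from the manager and from each agent server gives a quick indication of whether a firewall rule is blocking one of the internal services. It checks only that a TCP connection can be opened, not that the service behind it is healthy.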
In Backend.AI, we generally call the containers spawned upon user requests kernels. More precisely, what the user requests is a compute session (with user-provided options), and kernels are the members of that session. This means that a single compute session may have multiple kernels across different agent servers for parallel and distributed processing.
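The relationship between a compute session and its kernels can be pictured with a small data model. This is only an illustrative sketch: the class and field names below are hypothetical and do not reflect Backend.AI's actual database schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """One container; hypothetical fields for illustration only."""
    kernel_id: str
    agent_id: str  # the agent server hosting this container


@dataclass
class ComputeSession:
    """What the user requests; it owns one or more kernels."""
    session_id: str
    kernels: list[Kernel] = field(default_factory=list)

    def agents(self) -> set[str]:
        """Return the set of agent servers this session spans."""
        return {k.agent_id for k in self.kernels}


# A single session whose two kernels run on different agent servers.
session = ComputeSession(
    session_id="sess-1",
    kernels=[Kernel("k-0", "agent-a"), Kernel("k-1", "agent-b")],
)
print(sorted(session.agents()))
```

The point of the sketch is the one-to-many relationship: the session is the unit the user asks for, while each kernel is placed on exactly one agent.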
Redis and PostgreSQL are used to keep track of the liveness of agents and compute sessions (which may be composed of one or more kernels). They also store user metadata such as keypairs and resource usage statistics. You can follow the standard installation procedures for both. To spin up your Backend.AI cluster for the first time, you need to load the SQL schema into the PostgreSQL server, but the Redis server needs no initial data. Please check out the installation guides for details.
etcd is used to share configuration across all the manager and agent servers. To spin up your Backend.AI cluster for the first time, you need to preload some data into etcd. Please check out the installation guides for details.
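Because etcd is a flat key-value store, shared settings are typically written as path-like keys. The sketch below shows one generic way to flatten a nested configuration into such keys before preloading them. The key names and values are hypothetical and are not Backend.AI's actual etcd layout; the installation guides describe the real data to load.

```python
def flatten(config: dict, prefix: str = "") -> dict[str, str]:
    """Flatten a nested dict into slash-separated keys with string values."""
    flat: dict[str, str] = {}
    for key, value in config.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-sections, extending the key path.
            flat.update(flatten(value, path))
        else:
            flat[path] = str(value)
    return flat


# Hypothetical shared settings, for illustration only.
shared = {
    "redis": {"addr": "10.0.0.5:6379"},
    "storage": {"mount": "/mnt/vfolders"},
}

for key, value in flatten(shared, "config").items():
    print(key, "=", value)
```

Each resulting pair could then be written with whichever etcd client or command-line tool you already use.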
The network storage is used for providing "virtual folder" functions. The client users may create their own virtual folders to copy data files and shared library files, and then mount the virtual folder when spawning a new compute session to access them like local files.
The implementation can be anything that provides a local mount point on every server, including both the manager and the agents. Backend.AI only requires a known local UNIX path as configuration, and that path must be the same across all manager and agent servers. Common setups use a dedicated NFS or SMB server. For more scalability, you might use a distributed file system such as GlusterFS or Alluxio, whose local agents run on each Backend.AI agent server and provide a fast in-memory cache while being backed by another storage server or service such as AWS S3.
For a local development setup, you may simply use an empty local directory for this.
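Since the only hard requirement is a local UNIX path that is identical on every server, a small sanity check run on each machine can catch a missing or mistyped mount early. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the idea that each virtual folder is a subdirectory directly under the mount root is an assumed layout, not a statement about Backend.AI's internal one.

```python
from pathlib import Path


def check_mount(root: Path) -> None:
    """Raise if root is not an absolute path to an existing directory."""
    if not root.is_absolute():
        raise ValueError(f"mount path must be absolute: {root}")
    if not root.is_dir():
        raise FileNotFoundError(f"mount path is not a directory: {root}")


def vfolder_path(root: Path, name: str) -> Path:
    """Resolve a folder under root, rejecting names that escape root.

    Assumes (hypothetically) one subdirectory per virtual folder.
    Requires Python 3.9+ for Path.is_relative_to.
    """
    base = root.resolve()
    candidate = (base / name).resolve()
    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError(f"invalid virtual folder name: {name!r}")
    return candidate


if __name__ == "__main__":
    # Hypothetical mount point; use the path from your own configuration.
    root = Path("/mnt/vfolders")
    check_mount(root)
    print(vfolder_path(root, "mydata"))
```

For a development machine, pointing `root` at an empty local directory is enough to make the check pass.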