Skip to content

Add markdown guide for data connect internals #8831

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
121 changes: 121 additions & 0 deletions docs/data-connect.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
Firebase Data Connect, is a feature that enables developers to build GraphQL-based data APIs backed by existing SQL databases.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comma after 'Firebase Data Connect' is grammatically incorrect and slightly hinders readability.

Suggested change
Firebase Data Connect, is a feature that enables developers to build GraphQL-based data APIs backed by existing SQL databases.
Firebase Data Connect is a feature that enables developers to build GraphQL-based data APIs backed by existing SQL databases.


### **Product Overview**

Data Connect acts as a bridge between a project's frontend (web or mobile app) and its backend Cloud SQL for PostgreSQL database. Developers define their data model and business logic using a GraphQL schema (`*.gql`). The Firebase CLI then bundles these definitions and deploys them as a managed Data Connect service. This service exposes a secure GraphQL endpoint that applications can query. The system automatically handles aspects like identity and access management (IAM) integration, schema validation, and database migrations. For local development, the Emulator Suite includes a Data Connect emulator that uses an in-memory PostgreSQL instance (pg-lite) to simulate the production environment, allowing for rapid iteration without needing a live database.

### **Project Structure**

The `firebase-tools` codebase is structured to separate core functionalities, command definitions, and emulator logic.

```
firebase-firebase-tools/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The repository name appears to be duplicated here. For accuracy, it should be firebase-tools/.

Suggested change
firebase-firebase-tools/
firebase-tools/

└── src/
├── commands/ # Defines user-facing CLI commands (e.g., deploy, init).
│ ├── dataconnect-*.ts # Specific commands for Data Connect.
├── dataconnect/ # Core logic for Data Connect feature.
│ ├── client.ts # Client for interacting with the Data Connect backend API.
│ ├── load.ts # Logic for loading and parsing dataconnect.yaml and connector.yaml.
│ ├── schemaMigration.ts # Handles SQL schema diffing and migration.
│ └── build.ts # Logic for building schema and connectors from source.
├── emulator/
│ ├── auth/ # Auth Emulator implementation.
│ ├── dataconnect/ # Data Connect Emulator specific logic.
│ │ └── pgliteServer.ts # Manages the in-memory pglite Postgres instance.
│ ├── storage/ # Storage Emulator implementation.
│ ├── controller.ts # Main controller to start, stop, and manage emulators.
│ ├── downloadableEmulators.ts # Handles downloading of JAR-based emulators.
│ ├── hub.ts # The central discovery service for all emulators.
│ └── registry.ts # A static registry for running emulators to discover each other.
Comment on lines +23 to +29
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The project structure diagram is missing the dataconnectEmulator.ts file, which is a key component mentioned later in the document. The description for the dataconnect/ directory could be more precise.

Suggested change
│ ├── dataconnect/ # Data Connect Emulator specific logic.
│ │ └── pgliteServer.ts # Manages the in-memory pglite Postgres instance.
│ ├── storage/ # Storage Emulator implementation.
│ ├── controller.ts # Main controller to start, stop, and manage emulators.
│ ├── downloadableEmulators.ts # Handles downloading of JAR-based emulators.
│ ├── hub.ts # The central discovery service for all emulators.
│ └── registry.ts # A static registry for running emulators to discover each other.
│ ├── dataconnect/ # Core logic for Data Connect feature.
│ │ ├── client.ts # Client for interacting with the Data Connect backend API.
│ │ ├── load.ts # Logic for loading and parsing dataconnect.yaml and connector.yaml.
│ │ ├── schemaMigration.ts # Handles SQL schema diffing and migration.
│ │ └── build.ts # Logic for building schema and connectors from source.
│ ├── emulator/
│ ├── auth/ # Auth Emulator implementation.
│ ├── dataconnect/ # Data Connect Emulator specific logic.
│ │ └── pgliteServer.ts # Manages the in-memory pglite Postgres instance.
│ │ └── dataconnectEmulator.ts # The Data Connect emulator.
│ ├── storage/ # Storage Emulator implementation.
│ ├── controller.ts # Main controller to start, stop, and manage emulators.
│ ├── downloadableEmulators.ts # Handles downloading of JAR-based emulators.
│ ├── hub.ts # The central discovery service for all emulators.
│ └── registry.ts # A static registry for running emulators to discover each other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would expect those FDC code be there.

  • src/deploy/dataconnect
  • src/mcp/tools/dataconnect
  • template/init/dataconnect
  • src/init/dataconnect

├── api.ts # Defines production API origins and client configurations.
├── apiv2.ts # A modern, fetch-based HTTP client for making API requests.
├── auth.ts # Handles user authentication, including multi-account login.
├── command.ts # Base class for all CLI commands.
├── config.ts # Logic for parsing and handling firebase.json.
├── index.ts # Main entry point for the CLI.
└── requireAuth.ts # Middleware for commands that require authentication.
```

### **Core Components and Interactions**

#### **1. Command Framework**

The CLI is built on a command framework that simplifies defining and executing commands.

- **`src/command.ts`**: The `Command` class is the foundation for all CLI commands. It handles parsing command-line options, executing pre-action hooks (`before` functions), and running the main action logic.
- **`src/index.ts`**: This is the main entry point. It initializes `commander`, registers all commands from the `src/commands/` directory, and handles command dispatch.
- **`src/options.ts`**: Defines the `Options` interface, which is a common object passed to every command, containing parsed flags, configuration, and project information.

**Interaction Flow:**

1. `src/index.ts` parses the command-line arguments.
2. The corresponding command from `src/commands/` is located and its `runner` function is executed.
3. The `Command` class in `src/command.ts` first runs all `before` hooks (e.g., `requireAuth`, `requireConfig`).
4. It then executes the main `action` function, passing in the `options` object.

#### **2. Authentication**

Authentication is managed through a system that supports multiple logged-in Google accounts.

- **`src/auth.ts`**: This file contains the core logic for OAuth 2.0 flows, managing refresh and access tokens, and handling multiple user accounts. It interacts with `configstore` to persist credentials. `getGlobalDefaultAccount()` and `getAdditionalAccounts()` are key functions for retrieving user information.
- **`src/requireAuth.ts`**: A `before` hook used by commands that need the user to be authenticated. It ensures that a valid access token is available, refreshing it if necessary.
- **`src/defaultCredentials.ts`**: Manages Application Default Credentials (ADC) for interoperability with other Google Cloud tools.

**Interaction Flow:**

1. A command requiring authentication uses the `requireAuth` hook.
2. `requireAuth` calls functions from `auth.ts` to get the active user's tokens.
3. If tokens are expired, `auth.ts` uses the refresh token to get a new access token from the Google Auth API.
4. The new token is set in `apiv2.ts` for subsequent API calls.

#### **3. Emulator Suite**

The Emulator Suite provides local simulation of Firebase services.

- **`src/emulator/controller.ts`**: This is the main orchestrator. The `startAll` function is responsible for determining which emulators to start based on `firebase.json` and the `--only` flag.
- **`src/emulator/registry.ts`**: `EmulatorRegistry` is a static class that acts as a service discovery mechanism. Each emulator registers itself upon startup, allowing other emulators and the CLI to know its host and port.
- **`src/emulator/hub.ts`**: The `EmulatorHub` is the central endpoint for the Emulator UI and other tools to discover which emulators are running.
- **`src/emulator/downloadableEmulators.ts`**: Manages the download and execution of JAR-based emulators like Firestore and Database.
- **Individual Emulators (`src/emulator/<service>`):** Each emulator has its own directory and logic. For example, `src/emulator/auth/` contains the full implementation of the Auth emulator, including a lightweight Express server that mimics the production Identity Toolkit API.

**Interaction Flow:**

1. `firebase emulators:start` calls `controller.startAll`.
2. The controller determines which emulators to run and starts them in a specific order (e.g., Hub first).
3. Each emulator instance, upon starting, registers itself with the `EmulatorRegistry`.
4. The `EmulatorHub` uses the `EmulatorRegistry` to provide a `/emulators` endpoint for the Emulator UI.
5. Services like the Functions emulator use the registry to discover other emulators (e.g., to set `FIRESTORE_EMULATOR_HOST`).

#### **4. Data Connect**

Data Connect functionality is spread across its core logic directory, command files, and emulator files.

- **`src/dataconnect/load.ts`**: Responsible for reading and parsing `dataconnect.yaml` and the `connector.yaml` files within a project's source directory. It constructs the `ServiceInfo` object that represents the complete configuration of a Data Connect service.
- **`src/dataconnect/client.ts`**: An API client built on `apiv2.ts` for making requests to the production Data Connect service API (e.g., for creating services, and deploying schemas and connectors).
- **`src/dataconnect/schemaMigration.ts`**: Contains the logic for comparing a local GraphQL schema with a remote Cloud SQL database schema (`diffSchema`) and for applying the necessary SQL migrations (`migrateSchema`).
- **`src/emulator/dataconnectEmulator.ts`**: The Data Connect emulator. It uses the `fdc` binary to build the GraphQL schema and connectors into a format that can be served locally.
- **`src/emulator/dataconnect/pgliteServer.ts`**: Manages an in-memory PostgreSQL instance using `pg-lite`. This is used by the Data Connect emulator to provide a local database for development and testing.
- **`src/commands/dataconnect-*.ts`**: These files define the user-facing commands for Data Connect, such as `dataconnect:sdk:generate` and `dataconnect:sql:migrate`. They orchestrate calls to the core logic in `src/dataconnect/`.

**Interaction Flow (Local Emulation):**

1. `firebase emulators:start` with `dataconnect` in `firebase.json` triggers `DataConnectEmulator.start()`.
2. The emulator calls `DataConnectEmulator.build()`, which invokes the `fdc` binary to compile the schema and connectors.
3. If `autoconnectToPostgres` is enabled, it starts a `PostgresServer` (pg-lite).
4. The emulator then sends a request to its own internal endpoint to configure itself with the connection string for the local Postgres instance.

**Interaction Flow (Deployment):**

1. `firebase deploy --only dataconnect` is run.
2. The command first calls `migrateSchema` from `src/dataconnect/schemaMigration.ts` to ensure the Cloud SQL database schema is compatible. This may prompt the user to apply SQL changes.
3. If the schema is compatible, it then uses the `src/dataconnect/client.ts` to `upsertSchema` and `upsertConnector` for each connector, deploying the bundled schema and connector logic to the managed Data Connect service.

### **How to Contribute**

To contribute to the Firebase CLI, especially the Data Connect feature, follow these general steps:

1. **Understand the Core Components:** Familiarize yourself with the core components outlined above, particularly the command framework, authentication flow, and the emulator suite architecture.
2. **Trace a Command:** Pick a command (e.g., `firebase dataconnect:sdk:generate`) and trace its execution from `src/index.ts` through `src/command.ts` to the action function in the corresponding file in `src/commands/`.
3. **Study Data Connect Logic:** For Data Connect changes, spend time in the `src/dataconnect/` directory. `load.ts` and `client.ts` are central to how the CLI interacts with both local files and the backend service.
4. **Emulator Changes:** If you are working on the emulator, look at `src/emulator/controller.ts` to see how emulators are started and `src/emulator/registry.ts` for how they communicate. For the Data Connect emulator specifically, `src/emulator/dataconnectEmulator.ts` is the main entry point.
5. **Maintain Consistency:** When adding new commands or functionality, follow the existing patterns. Use the `Command` class for new commands, and leverage the shared `apiv2.ts` client for API requests. Ensure that new features are integrated with the Emulator Suite where applicable.
Loading