feat: Add Data Sync Plugin for external database synchronization #75
Open

onyedikachi-david wants to merge 4 commits into outerbase:main from onyedikachi-david:feature/data-sync
Commits (4 total; the changes shown below are from 1 commit):

- b9fb9f8: feat: Add Data Sync Plugin for external database synchronization (onyedikachi-david)
- 34be2b6: refactor: abstract database-specific code from data sync plugin for m… (onyedikachi-david)
- 40023d1: docs: add documentation for data sync plugins (onyedikachi-david)
- 7ab1edb: chore: update meta.json files (onyedikachi-david)

@@ -0,0 +1,111 @@

# Data Sync Plugin

The Data Sync plugin enables automatic synchronization of data from external data sources (like PostgreSQL) to StarbaseDB's internal SQLite database. This plugin is useful for creating a close-to-edge replica of your data that can be queried as an alternative to querying the external database directly.

## Features

- Automatic synchronization of specified tables from external to internal database
- Configurable sync interval
- Incremental updates based on timestamps and IDs
- Automatic schema mapping from PostgreSQL to SQLite types
- Persistent tracking of sync state
- Graceful handling of connection issues and errors
- Query interception hooks for monitoring and modification
- Debug endpoints for monitoring sync status

## Installation

The plugin is included in the StarbaseDB core package. To use it, simply configure it in your `wrangler.toml` file:

```toml
[plugins.data-sync]
sync_interval = 300 # Sync interval in seconds (default: 300)
tables = ["users", "products"] # List of tables to synchronize
```

## Configuration Options

| Option          | Type     | Description                                     | Default |
| --------------- | -------- | ----------------------------------------------- | ------- |
| `sync_interval` | number   | The interval in seconds between sync operations | 300     |
| `tables`        | string[] | Array of table names to synchronize             | []      |
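
For reference, here is a minimal TypeScript sketch of how these options and their documented defaults might be modeled. The interface and helper names are illustrative assumptions, not the plugin's actual exports:

```typescript
// Hypothetical shape of the plugin options described above.
interface DataSyncOptions {
    /** Interval in seconds between sync operations (default: 300). */
    sync_interval?: number
    /** Table names to synchronize (default: none). */
    tables?: string[]
}

// Apply the documented defaults to a partial configuration.
function withDefaults(options: DataSyncOptions): Required<DataSyncOptions> {
    return {
        sync_interval: options.sync_interval ?? 300,
        tables: options.tables ?? [],
    }
}
```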

## How It Works

1. The plugin creates a metadata table in the internal database to track sync state
2. For each configured table, it:
    - Retrieves the table schema from the external database
    - Creates a corresponding table in the internal database
    - Periodically checks for new or updated records based on the `created_at` timestamp and `id` (see the sketch after this list)
    - Syncs new data to the internal database
    - Updates the sync state in the metadata table
3. Provides hooks for query interception:
    - `beforeQuery`: For monitoring or modifying queries before execution
    - `afterQuery`: For processing results after query execution
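
To make the incremental check concrete, here is a simplified sketch of what one sync pass over a table could look like. It assumes hypothetical `externalQuery`/`internalExec` handles and a metadata shape invented for illustration; the plugin's actual implementation may differ:

```typescript
// Stand-ins for the external (PostgreSQL) and internal (SQLite) database
// handles; the real plugin wires these up itself.
declare function externalQuery(sql: string, params: unknown[]): Promise<Record<string, any>[]>
declare function internalExec(sql: string, params: unknown[]): Promise<void>

// Sync position tracked per table in the metadata table (hypothetical shape).
interface SyncState {
    table: string
    lastCreatedAt: string | null
    lastId: number | string | null
}

// Fetch rows newer than the last (created_at, id) pair seen, copy them into
// the internal table, and return the advanced sync state.
async function syncTable(state: SyncState): Promise<SyncState> {
    const rows = await externalQuery(
        `SELECT * FROM ${state.table}
         WHERE created_at > $1 OR (created_at = $1 AND id > $2)
         ORDER BY created_at, id`,
        [state.lastCreatedAt ?? '1970-01-01', state.lastId ?? 0]
    )

    for (const row of rows) {
        const columns = Object.keys(row)
        await internalExec(
            `INSERT OR REPLACE INTO ${state.table} (${columns.join(', ')})
             VALUES (${columns.map(() => '?').join(', ')})`,
            columns.map((column) => row[column])
        )
    }

    const last = rows[rows.length - 1]
    return last
        ? { ...state, lastCreatedAt: last.created_at, lastId: last.id }
        : state
}
```

Ordering on the `(created_at, id)` pair rather than the timestamp alone avoids re-reading rows that share the same timestamp.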

## Requirements

- The external database tables must have:
    - A `created_at` timestamp column for tracking changes
    - An `id` column (numeric or string) for tracking record identity
- The external database must support the `information_schema` for retrieving table metadata
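
As an illustration of that last requirement, the schema lookup can be pictured as an `information_schema.columns` query along these lines (the helper name and exact column selection are assumptions, not the plugin's code):

```typescript
// Stand-in for the external PostgreSQL query handle (hypothetical).
declare function externalQuery(sql: string, params: unknown[]): Promise<Record<string, any>[]>

// Discover a table's columns so a matching SQLite table can be created.
async function fetchExternalColumns(table: string, schema = 'public') {
    return externalQuery(
        `SELECT column_name, data_type, is_nullable
         FROM information_schema.columns
         WHERE table_schema = $1 AND table_name = $2
         ORDER BY ordinal_position`,
        [schema, table]
    )
}
```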

## Type Mapping

The plugin automatically maps PostgreSQL types to SQLite types:

| PostgreSQL Type                          | SQLite Type |
| ---------------------------------------- | ----------- |
| integer, bigint                          | INTEGER     |
| text, varchar, char                      | TEXT        |
| boolean                                  | INTEGER     |
| timestamp, date                          | TEXT        |
| numeric, decimal, real, double precision | REAL        |
| json, jsonb                              | TEXT        |
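
Expressed in code, the mapping above could look roughly like this sketch (the function name and fallback behavior are illustrative assumptions):

```typescript
// Map a PostgreSQL data type to the SQLite storage type used for the
// internal copy, following the table above.
function mapPostgresTypeToSqlite(pgType: string): 'INTEGER' | 'TEXT' | 'REAL' {
    const type = pgType.toLowerCase()
    if (type === 'integer' || type === 'bigint' || type === 'boolean') {
        return 'INTEGER'
    }
    if (
        type === 'numeric' ||
        type === 'decimal' ||
        type === 'real' ||
        type === 'double precision'
    ) {
        return 'REAL'
    }
    // text, varchar, char, timestamp, date, json, jsonb, and anything else
    // fall back to TEXT.
    return 'TEXT'
}
```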

## Example Usage

```typescript
import { DataSyncPlugin } from '@starbasedb/plugins/data-sync'

// Initialize the plugin
const dataSyncPlugin = new DataSyncPlugin({
    sync_interval: 300, // 5 minutes
    tables: ['users', 'orders'],
})

// Add to your StarbaseDB configuration
const config = {
    plugins: [dataSyncPlugin],
    // ... other config options
}
```

## Demo

A complete demo implementation is available in the `demo` directory. The demo shows:

- Setting up the plugin with PostgreSQL
- Using query hooks for monitoring
- Testing sync functionality
- Debugging and monitoring endpoints

See the [Demo README](./demo/README.md) for detailed instructions.

## Limitations

- The plugin currently assumes the presence of `created_at` and `id` columns
- Large tables may take longer to sync initially
- Deleted records in the external database are not automatically removed from the internal database
- The sync operation is pull-based and runs on a fixed interval

## Security Notes

- Always use secure, randomly generated tokens for authentication
- Store sensitive credentials in environment variables
- In production, enable authentication and use secure database credentials
- The demo uses example tokens (like "ABC123") for illustration only
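
As a small, hedged illustration of the last two points, tokens can be read from environment bindings rather than hard-coded. The binding names mirror the demo's `.dev.vars` example; the check itself is an assumption about how you might wire up auth, not the plugin's built-in behavior:

```typescript
// Hypothetical environment bindings, mirroring the demo's .dev.vars names.
interface Env {
    ADMIN_TOKEN: string
    CLIENT_TOKEN: string
    DB_USER: string
    DB_PASSWORD: string
}

// Reject requests that do not present the expected admin token.
function isAuthorized(request: Request, env: Env): boolean {
    const token = request.headers.get('Authorization')?.replace(/^Bearer\s+/i, '')
    return token !== undefined && token === env.ADMIN_TOKEN
}
```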

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

@@ -0,0 +1,133 @@

# Data Sync Plugin Demo

This demo shows how to use the StarbaseDB Data Sync Plugin to synchronize data between an external PostgreSQL database and StarbaseDB.

## Setup

1. Install dependencies:

```bash
pnpm install
```

2. Set up environment variables:

```bash
# Create a .dev.vars file in the demo directory
cat > plugins/data-sync/demo/.dev.vars << EOL
# Replace these with your own secure tokens - these are just examples
ADMIN_TOKEN=your_admin_token_here # e.g., a random string like "ABC123"
CLIENT_TOKEN=your_client_token_here # e.g., a random string like "DEF456"
DB_USER=postgres
DB_PASSWORD=postgres
EOL
```

3. Use the existing PostgreSQL Docker container:

```bash
# The container should already be running; if not, start it with:
docker run --name starbasedb-postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=demo -p 5432:5432 -d postgres:15
```

4. Load test data into the Docker container:

```bash
# Copy the setup file into the container
docker cp setup.sql starbasedb-postgres:/setup.sql

# Execute the setup file in the container
docker exec -i starbasedb-postgres psql -U postgres -d demo -f /setup.sql
```

## Running the Demo

1. Start the development server:

```bash
pnpm wrangler dev --config plugins/data-sync/demo/wrangler.toml
```

2. Test the available endpoints:

### Basic Status and Data

```bash
# Check sync status
curl http://localhost:8787/sync-status

# View synced data
curl http://localhost:8787/sync-data
```

### Testing Query Hooks

```bash
# Test query interception
curl -X POST http://localhost:8787/test-query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM users", "params": []}'
```

### Force Sync

```bash
# Trigger manual sync
curl -X POST http://localhost:8787/force-sync
```

### Debug Information

```bash
# View plugin debug information
curl http://localhost:8787/debug
```

## How it Works

The demo plugin showcases these key aspects of the StarbaseDB plugin system:

1. **Plugin Registration**: The demo plugin registers itself and the Data Sync plugin with StarbaseDB.

2. **HTTP Endpoints**:

    - `/sync-status`: Shows the current sync status and configured tables
    - `/sync-data`: Shows the synchronized data
    - `/test-query`: Tests query interception hooks
    - `/force-sync`: Triggers manual synchronization
    - `/debug`: Shows plugin configuration and state

3. **Query Hooks** (a sketch follows this list):
    - `beforeQuery`: Logs and intercepts queries before execution
    - `afterQuery`: Processes results after query execution
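
A minimal sketch of what those two hooks might look like in TypeScript. The hook names come from the documentation above, but the exact signatures are assumptions and may differ from the real plugin interface:

```typescript
// Assumed shape of the payload passed to the query hooks.
interface QueryEvent {
    sql: string
    params: unknown[]
}

const loggingHooks = {
    // Called before a query executes; may inspect or rewrite it.
    async beforeQuery(event: QueryEvent): Promise<QueryEvent> {
        console.log('beforeQuery:', event.sql, event.params)
        return event
    },

    // Called after a query executes; may inspect or transform the results.
    async afterQuery(event: QueryEvent, results: unknown[]): Promise<unknown[]> {
        console.log('afterQuery:', event.sql, `${results.length} row(s)`)
        return results
    },
}
```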

## Configuration

The demo uses the following configuration in `wrangler.toml`:

- PostgreSQL connection details:
    - Host: localhost
    - Port: 5432
    - User: postgres
    - Password: postgres
    - Database: demo
    - Schema: public
- Sync interval: 30 seconds
- Tables to sync: `users` and `posts`

## Testing

1. The demo automatically syncs data from the PostgreSQL database
2. You can monitor the sync process through the `/sync-status` endpoint
3. View the synced data through the `/sync-data` endpoint
4. Test query hooks using the `/test-query` endpoint
5. Trigger manual syncs using the `/force-sync` endpoint
6. Monitor plugin state using the `/debug` endpoint

## Notes

- This is a demo setup with authentication disabled for simplicity
- In production, you should enable authentication and use secure database credentials
- The sync interval is set to 30 seconds for demo purposes; adjust as needed
- The demo includes mock data for testing without a real database connection
- Query hooks are demonstrated with simulated queries

@@ -0,0 +1,28 @@

-- Create a test table
CREATE TABLE IF NOT EXISTS users (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Insert some test data
INSERT INTO users (name, email) VALUES
    ('Alice Smith', 'alice@example.com'),
    ('Bob Jones', 'bob@example.com'),
    ('Charlie Brown', 'charlie@example.com');

-- Create another test table
CREATE TABLE IF NOT EXISTS posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    title TEXT NOT NULL,
    content TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Insert some test posts
INSERT INTO posts (user_id, title, content) VALUES
    (1, 'First Post', 'Hello World!'),
    (2, 'Testing', 'This is a test post'),
    (3, 'Another Post', 'More test content');
What happens when a Postgres table name contains both a schema and a table name (e.g. `users.profile`), given that SQLite only supports tables without schemas? Would the name of the table become `users.profile`? Would we want users to query that table with the `${schema}.${table}` notation moving forward?

I assume that for any Postgres `public` schema tables we would just create them with their plain table name (e.g. `${table}`), without a schema prefix, correct?

Lastly, if the user did decide to write `public.users`, would we have a `beforeQuery` hook in this plugin smart enough to know we could omit `public.` from it, since that table lives at the SQLite root?
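
For what it's worth, a `beforeQuery` rewrite along those lines could be as simple as stripping a leading `public.` qualifier before the query reaches SQLite. This is only a sketch of the idea (the hook signature and regex are assumptions, and a real implementation would need to respect quoted identifiers and string literals):

```typescript
interface QueryEvent {
    sql: string
    params: unknown[]
}

// Strip the default "public." schema prefix so queries written against the
// Postgres source still resolve against the schema-less SQLite copy.
async function beforeQuery(event: QueryEvent): Promise<QueryEvent> {
    return {
        ...event,
        sql: event.sql.replace(/\bpublic\./gi, ''),
    }
}
```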