4 changes: 3 additions & 1 deletion package.json
@@ -35,6 +35,7 @@
"husky": "^9.1.7",
"lint-staged": "^15.2.11",
"prettier": "3.4.2",
"tsx": "^4.19.2",
"typescript": "^5.7.2",
"vitest": "^2.1.8",
"wrangler": "^3.96.0"
@@ -53,5 +54,6 @@
"*.{js,jsx,ts,tsx,json,css,md}": [
"prettier --write"
]
}
},
"packageManager": "pnpm@9.12.3+sha512.cce0f9de9c5a7c95bef944169cc5dfe8741abfb145078c0d508b868056848a87c81e626246cb60967cbd7fd29a6c062ef73ff840d96b3c86c40ac92cf4a813ee"
}
111 changes: 111 additions & 0 deletions plugins/data-sync/README.md
@@ -0,0 +1,111 @@
# Data Sync Plugin

The Data Sync plugin enables automatic synchronization of data from external data sources (like PostgreSQL) to StarbaseDB's internal SQLite database. This plugin is useful for creating a close-to-edge replica of your data that can be queried as an alternative to querying the external database directly.

## Features

- Automatic synchronization of specified tables from external to internal database
- Configurable sync interval
- Incremental updates based on timestamps and IDs
- Automatic schema mapping from PostgreSQL to SQLite types
- Persistent tracking of sync state
- Graceful handling of connection issues and errors
- Query interception hooks for monitoring and modification
- Debug endpoints for monitoring sync status

## Installation

The plugin is included in the StarbaseDB core package. To use it, simply configure it in your `wrangler.toml` file:

```toml
[plugins.data-sync]
sync_interval = 300 # Sync interval in seconds (default: 300)
tables = ["users", "products"] # List of tables to synchronize
```

## Configuration Options

| Option | Type | Description | Default |
| --------------- | -------- | ----------------------------------------------- | ------- |
| `sync_interval` | number | The interval in seconds between sync operations | 300 |
| `tables` | string[] | Array of table names to synchronize | [] |
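
For reference, the options map onto a TypeScript shape roughly like the one below; the interface name is an assumption for illustration, not an exported type from the plugin.

```typescript
// Hypothetical options shape; the interface name is assumed.
interface DataSyncPluginConfig {
    /** Interval in seconds between sync operations (default: 300). */
    sync_interval?: number
    /** Table names to synchronize (default: []). */
    tables?: string[]
}
```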

## How It Works

1. The plugin creates a metadata table in the internal database to track sync state
2. For each configured table:
- Retrieves the table schema from the external database
- Creates a corresponding table in the internal database
> **Review comment (Member):** What happens when a Postgres table name includes a schema (e.g. `users.profile`), given that SQLite only supports tables without schemas? Would the table be created as `users.profile`, and would users then be expected to query it with `${schema}.${table}` notation going forward?
>
> I assume that for tables in the Postgres `public` schema we would simply create them under their plain table name (e.g. `${table}`), without a schema prefix, correct?
>
> Lastly, if a user did write `public.users`, would this plugin's `beforeQuery` hook be smart enough to drop the `public.` prefix, since that table lives at the root of our SQLite database?

- Periodically checks for new or updated records based on `created_at` timestamp and `id` (see the sketch after this list)
> **Review comment (Member):** Is there a mechanism we can put in place for tables that don't contain a `created_at` or `id` column? Perhaps, as part of the sync config where users list the tables they want to sync, they could also specify which column to use for identifying the latest entry.

- Syncs new data to the internal database
- Updates the sync state in the metadata table
3. Provides hooks for query interception:
- `beforeQuery`: For monitoring or modifying queries before execution
- `afterQuery`: For processing results after query execution
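
As a rough illustration of the incremental fetch and state tracking in step 2, a single sync pass could look something like the sketch below. The interfaces, state shape, and SQL are assumptions for illustration, not the plugin's actual implementation.

```typescript
// A rough sketch only: interface names, the state shape, and the SQL are
// assumptions for illustration, not the plugin's actual code.
interface ExternalDb {
    query(sql: string, params?: unknown[]): Promise<Record<string, unknown>[]>
}

interface InternalDb {
    exec(sql: string, params?: unknown[]): Promise<void>
}

interface SyncState {
    lastCreatedAt: string | null
    lastId: number | null
}

async function syncTable(
    external: ExternalDb,
    internal: InternalDb,
    table: string, // assumed to come from trusted plugin configuration
    state: SyncState
): Promise<SyncState> {
    // Fetch only rows newer than the last synced (created_at, id) pair.
    const rows = await external.query(
        `SELECT * FROM ${table}
         WHERE created_at > $1 OR (created_at = $1 AND id > $2)
         ORDER BY created_at, id`,
        [state.lastCreatedAt ?? '1970-01-01', state.lastId ?? 0]
    )

    // Upsert each new row into the internal SQLite copy of the table.
    for (const row of rows) {
        const columns = Object.keys(row)
        const placeholders = columns.map(() => '?').join(', ')
        await internal.exec(
            `INSERT OR REPLACE INTO ${table} (${columns.join(', ')}) VALUES (${placeholders})`,
            columns.map((c) => row[c])
        )
    }

    // Persist the new high-water mark so the next run only sees newer rows.
    const last = rows[rows.length - 1]
    return last
        ? { lastCreatedAt: String(last.created_at), lastId: Number(last.id) }
        : state
}
```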

## Requirements

- The external database tables must have:
- A `created_at` timestamp column for tracking changes
- An `id` column (numeric or string) for tracking record identity
- The external database must support the `information_schema` for retrieving table metadata
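
As an illustration of the last requirement, column metadata can be read with a query along these lines (the exact statement the plugin issues may differ):

```typescript
// Hypothetical metadata query; the plugin's actual SQL may differ.
const columnMetadataQuery = `
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = $1 AND table_name = $2
    ORDER BY ordinal_position
`
```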

## Type Mapping

The plugin automatically maps PostgreSQL types to SQLite types:

| PostgreSQL Type | SQLite Type |
| ---------------------------------------- | ----------- |
| integer, bigint | INTEGER |
| text, varchar, char | TEXT |
| boolean | INTEGER |
| timestamp, date | TEXT |
| numeric, decimal, real, double precision | REAL |
| json, jsonb | TEXT |
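
Expressed as a TypeScript helper, the mapping might look roughly like this (the function name and the fallback to `TEXT` are assumptions for illustration):

```typescript
// Hypothetical helper; unrecognised types fall back to TEXT.
function mapPostgresTypeToSqlite(pgType: string): string {
    const type = pgType.toLowerCase()
    if (['integer', 'bigint', 'boolean'].includes(type)) return 'INTEGER'
    if (['numeric', 'decimal', 'real', 'double precision'].includes(type)) {
        return 'REAL'
    }
    // text, varchar, char, timestamp, date, json, and jsonb all map to TEXT
    return 'TEXT'
}
```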

## Example Usage

```typescript
import { DataSyncPlugin } from '@starbasedb/plugins/data-sync'

// Initialize the plugin
const dataSyncPlugin = new DataSyncPlugin({
sync_interval: 300, // 5 minutes
tables: ['users', 'orders'],
})

// Add to your StarbaseDB configuration
const config = {
plugins: [dataSyncPlugin],
// ... other config options
}
```

## Demo

A complete demo implementation is available in the `demo` directory. The demo shows:

- Setting up the plugin with PostgreSQL
- Using query hooks for monitoring
- Testing sync functionality
- Debugging and monitoring endpoints

See [Demo README](./demo/README.md) for detailed instructions.

## Limitations

- The plugin currently assumes the presence of `created_at` and `id` columns
- Large tables may take longer to sync initially
- Deleted records in the external database are not automatically removed from the internal database
- The sync operation is pull-based and runs on a fixed interval

## Security Notes

- Always use secure, randomly generated tokens for authentication
- Store sensitive credentials in environment variables
- In production, enable authentication and use secure database credentials
- The demo uses example tokens (like "ABC123") for illustration only

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
133 changes: 133 additions & 0 deletions plugins/data-sync/demo/README.md
@@ -0,0 +1,133 @@
# Data Sync Plugin Demo

This demo shows how to use the StarbaseDB Data Sync Plugin to synchronize data between an external PostgreSQL database and StarbaseDB.

## Setup

1. Install dependencies:

```bash
pnpm install
```

2. Set up environment variables:

```bash
# Create a .dev.vars file in the demo directory
cat > plugins/data-sync/demo/.dev.vars << EOL
# Replace these with your own secure tokens - these are just examples
ADMIN_TOKEN=your_admin_token_here # e.g., a random string like "ABC123"
CLIENT_TOKEN=your_client_token_here # e.g., a random string like "DEF456"
DB_USER=postgres
DB_PASSWORD=postgres
EOL
```

3. Start the PostgreSQL Docker container (skip this step if it is already running):

```bash
# Start the container if it is not already running:
docker run --name starbasedb-postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=demo -p 5432:5432 -d postgres:15
```

4. Load test data into the Docker container:

```bash
# Copy the setup file into the container
docker cp plugins/data-sync/demo/setup.sql starbasedb-postgres:/setup.sql

# Execute the setup file in the container
docker exec -i starbasedb-postgres psql -U postgres -d demo -f /setup.sql
```

## Running the Demo

1. Start the development server:

```bash
pnpm wrangler dev --config plugins/data-sync/demo/wrangler.toml
```

2. Test the available endpoints:

### Basic Status and Data

```bash
# Check sync status
curl http://localhost:8787/sync-status

# View synced data
curl http://localhost:8787/sync-data
```

### Testing Query Hooks

```bash
# Test query interception
curl -X POST http://localhost:8787/test-query \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT * FROM users", "params": []}'
```

### Force Sync

```bash
# Trigger manual sync
curl -X POST http://localhost:8787/force-sync
```

### Debug Information

```bash
# View plugin debug information
curl http://localhost:8787/debug
```

## How It Works

The demo plugin showcases these key aspects of the StarbaseDB plugin system:

1. **Plugin Registration**: The demo registers its own plugin and the Data Sync plugin with StarbaseDB.

2. **HTTP Endpoints**:

    - `/sync-status`: Shows the current sync status and configured tables
    - `/sync-data`: Shows the synchronized data
    - `/test-query`: Tests query interception hooks
    - `/force-sync`: Triggers a manual synchronization
    - `/debug`: Shows plugin configuration and state

3. **Query Hooks** (see the sketch below):
    - `beforeQuery`: Logs and intercepts queries before execution
    - `afterQuery`: Processes results after query execution
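
A minimal sketch of what those hooks could look like follows; the context shape and logging are assumptions for illustration, not the demo's exact code.

```typescript
// Hypothetical hook implementations; the shapes are assumed for illustration.
interface QueryHookContext {
    sql: string
    params: unknown[]
}

const hooks = {
    async beforeQuery(ctx: QueryHookContext): Promise<QueryHookContext> {
        // Log (or rewrite) the query before it is executed.
        console.log('beforeQuery:', ctx.sql, ctx.params)
        return ctx
    },
    async afterQuery(ctx: QueryHookContext, result: unknown): Promise<unknown> {
        // Inspect or transform the result before it is returned to the caller.
        console.log(
            'afterQuery rows:',
            Array.isArray(result) ? result.length : 0
        )
        return result
    },
}
```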

## Configuration

The demo uses the following configuration in `wrangler.toml`:

- PostgreSQL connection details:
    - Host: localhost
    - Port: 5432
    - User: postgres
    - Password: postgres
    - Database: demo
    - Schema: public
- Sync interval: 30 seconds
- Tables to sync: `users` and `posts`

## Testing

1. The demo automatically syncs data from the PostgreSQL database
2. You can monitor the sync process through the `/sync-status` endpoint
3. View the synced data through the `/sync-data` endpoint
4. Test query hooks using the `/test-query` endpoint
5. Trigger manual syncs using the `/force-sync` endpoint
6. Monitor plugin state using the `/debug` endpoint

## Notes

- This is a demo setup with authentication disabled for simplicity
- In production, you should enable authentication and use secure database credentials
- The sync interval is set to 30 seconds for demo purposes; adjust as needed
- The demo includes mock data for testing without a real database connection
- Query hooks are demonstrated with simulated queries
28 changes: 28 additions & 0 deletions plugins/data-sync/demo/setup.sql
@@ -0,0 +1,28 @@
-- Create a test table
CREATE TABLE IF NOT EXISTS users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Insert some test data
INSERT INTO users (name, email) VALUES
('Alice Smith', 'alice@example.com'),
('Bob Jones', 'bob@example.com'),
('Charlie Brown', 'charlie@example.com');

-- Create another test table
CREATE TABLE IF NOT EXISTS posts (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
title TEXT NOT NULL,
content TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Insert some test posts
INSERT INTO posts (user_id, title, content) VALUES
(1, 'First Post', 'Hello World!'),
(2, 'Testing', 'This is a test post'),
(3, 'Another Post', 'More test content');