A web application to store and retrieve immutable data, organized into folders.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
See deployment for notes on how to deploy the project on a live system (Coming soon).
- Python 3.10 (required; the project pins `>=3.10,<3.11`)
- Poetry (dependency manager)
- Node.js and Yarn (for the React frontend)
- Redis (Celery broker and result backend)
- Docker (optional — only needed if you want to test file uploads via MiniStack)
| Layer | Technologies |
|---|---|
| Database | SQLite (local dev), PostgreSQL (production), SQLAlchemy |
| Backend/API | Python 3.10, Flask, Connexion (Swagger/OpenAPI), Celery/Redis |
| Object storage | AWS S3 (production), MiniStack (optional, local dev) |
| Frontend | React, TypeScript, Webpack, Yarn |
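
The backend exposes its API through Connexion, which maps operations in a Swagger/OpenAPI spec to Python handler functions. As a rough illustration only (the spec path and layout below are hypothetical, not necessarily Taiga's):

```python
# Minimal Connexion wiring sketch (connexion 2.x style); paths are hypothetical.
import connexion

app = connexion.App(__name__, specification_dir="swagger/")
app.add_api("swagger.yaml")  # routes each operationId to a Python function
flask_app = app.app  # the underlying Flask application, if needed directly

if __name__ == "__main__":
    app.run(port=5000)
```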
- Install Python dependencies:

  ```bash
  poetry install
  ```

- Install frontend dependencies:

  ```bash
  cd react_frontend && yarn install && cd ..
  ```

- Copy the sample settings file (if you don't already have one):

  ```bash
  cp settings.cfg.sample settings.cfg
  ```

- Create the dev database:

  ```bash
  poetry run bash -c 'source setup_env.sh && flask recreate-dev-db'
  ```
Run `./dev.sh`. This single script handles setup and launches all services via mprocs:
- Starts Redis (if not already running)
- Starts the MiniStack Docker container for local S3 (if `settings.cfg` is configured for it)
- Creates the S3 bucket and dev database if they don't exist
- Launches the Webpack dev server, Flask app, and Celery worker via mprocs
mprocs gives you a TUI where you can switch between process outputs with `j`/`k`, restart individual processes with `r`, and quit everything with `q`.
Prerequisite: install mprocs with `brew install mprocs`.
Open your browser to: http://127.0.0.1:5000/taiga/
```bash
# Terminal 1 — Redis (skip if already running; check with `redis-cli ping`)
redis-server

# Terminal 2 — Webpack dev server (frontend hot reload)
poetry run bash -c 'source setup_env.sh && flask webpack'

# Terminal 3 — Flask app server
poetry run bash -c 'source setup_env.sh && flask run'

# Terminal 4 — Celery worker (async file conversion tasks)
poetry run bash -c 'source setup_env.sh && flask run-worker'
```

Open your browser to: http://127.0.0.1:5000/taiga/
You are automatically logged in as the seeded admin user (`admin@broadinstitute.org`) via the `DEFAULT_USER_EMAIL` setting.
Without S3 configured, you can browse/search the seeded data, create folders, and work with the UI. File uploads require either MiniStack or real AWS credentials (see below).
MiniStack is a free, open-source AWS emulator that runs 33 AWS services (including S3 and STS) in a single Docker container. It lets you test the full upload pipeline locally without an AWS account.
- Start MiniStack:

  ```bash
  docker run -d --name ministack -p 4566:4566 nahuelnucera/ministack
  ```

- Create the local S3 bucket (run once):

  ```bash
  python -c "import boto3; boto3.client('s3', endpoint_url='http://localhost:4566', aws_access_key_id='test', aws_secret_access_key='test').create_bucket(Bucket='taiga-dev')"
  ```

- In `settings.cfg`, uncomment the MiniStack block (Option A) and comment out Option B:

  ```
  S3_ENDPOINT_URL = 'http://localhost:4566'
  AWS_ACCESS_KEY_ID = 'test'
  AWS_SECRET_ACCESS_KEY = 'test'
  S3_BUCKET = 'taiga-dev'
  ```

- Restart Flask and the Celery worker to pick up the new settings.
```bash
docker start ministack   # start (if previously stopped)
docker stop ministack    # stop
docker rm ministack      # remove entirely
```

Warning: File uploads may fail with MiniStack because MiniStack omits the `ETag` header that boto3's high-level `Bucket.copy()` uses for validation. When `S3_ENDPOINT_URL` is set in `settings.cfg`, `aws.copy_object()` (in `taiga2/third_party_clients/aws.py`) automatically uses the low-level client API, which does not require ETags. When `S3_ENDPOINT_URL` is empty or unset, the standard resource-level `Bucket.copy()` is used.

If you see upload failures locally (errors mentioning ETag or copy validation), make sure `S3_ENDPOINT_URL` is set to your MiniStack endpoint (`http://localhost:4566`).

Note on tests: The test suite does not set `S3_ENDPOINT_URL`, so tests always exercise the `Bucket.copy()` path. If you add `S3_ENDPOINT_URL` to the test config, `imp_conv_test.py` will fail because `MockS3Client` does not implement `copy_object`. This is intentional: tests validate the production (real AWS) code path.
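
For orientation, here is a rough sketch of what that endpoint-based fallback can look like. This is illustrative only, not the actual code in `taiga2/third_party_clients/aws.py`; the function name matches, but the body and the way the config is read are simplified assumptions.

```python
import boto3

def copy_object(src_bucket, src_key, dst_bucket, dst_key, endpoint_url=None):
    """Copy an S3 object, tolerating emulators that omit the ETag header.

    Illustrative sketch; the real implementation reads S3_ENDPOINT_URL
    from the app config rather than taking it as a parameter.
    """
    if endpoint_url:
        # Emulator (e.g. MiniStack): a single low-level CopyObject call,
        # which performs no ETag validation.
        client = boto3.client("s3", endpoint_url=endpoint_url)
        client.copy_object(
            Bucket=dst_bucket,
            Key=dst_key,
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )
    else:
        # Real AWS: the resource-level managed copy, which handles
        # multipart copies for large objects but validates ETags.
        s3 = boto3.resource("s3")
        s3.Bucket(dst_bucket).copy({"Bucket": src_bucket, "Key": src_key}, dst_key)
```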
Set `S3_ENDPOINT_URL = ''` and clear the AWS keys in `settings.cfg`. The app runs fine without S3; you just can't upload files.
We need two IAM users. The main user is used by the app in general to read from and write to S3. The second (uploader) has its rights delegated via STS on a short-term basis; this user should only have access to upload to a single location within S3.
Permissions for the main user:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::taiga2",
        "arn:aws:s3:::taiga2/*"
      ]
    }
  ]
}
```

Permissions for the "upload" user:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1482441362000",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:HeadObject"
      ],
      "Resource": [
        "arn:aws:s3:::taiga2/upload/*"
      ]
    }
  ]
}
```
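
To illustrate how the short-term STS delegation can work: the app can hand out temporary credentials whose effective permissions are the intersection of an inline policy and the calling IAM user's own rights. The sketch below is an assumption about the mechanism (using `get_federation_token`), not Taiga's actual code; the session name and duration are hypothetical.

```python
import json
import boto3

def get_upload_credentials(upload_prefix: str) -> dict:
    # Hypothetical sketch: issue short-lived credentials scoped to one
    # S3 prefix. Effective permissions are the intersection of this
    # inline policy and the calling IAM user's own policy.
    sts = boto3.client("sts")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:HeadObject"],
                "Resource": [f"arn:aws:s3:::taiga2/{upload_prefix}/*"],
            }
        ],
    }
    response = sts.get_federation_token(
        Name="taiga-upload",      # hypothetical session name
        Policy=json.dumps(policy),
        DurationSeconds=900,      # the minimum lifetime STS allows here
    )
    return response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken
```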
Because we use S3 to store the files, we need to configure the S3 service and its buckets correctly. Please follow this tutorial from Amazon on how to create a bucket.

We now need to be able to access this bucket programmatically and through CORS (Cross-Origin Resource Sharing):
For our case, it is pretty simple:
- Select your bucket in your Amazon S3 console
- Click on `Properties`
- Click on the `Permissions` accordion
- Click on `Edit CORS Configuration`
- Paste the following configuration into the editor that appears (CORS Configuration Editor):
```json
[
  {
    "AllowedOrigins": ["*"],
    "AllowedMethods": ["GET", "POST", "PUT"],
    "ExposeHeaders": ["ETag"],
    "AllowedHeaders": ["*"]
  }
]
```

Warning: Be careful not to override your existing configuration!
- Edit `settings.cfg` and set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
- Set `S3_BUCKET` to the bucket created above
- Remove `S3_ENDPOINT_URL` (or leave it empty) so the app connects to real AWS
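
With real credentials in place, a quick way to confirm the configured user can reach the bucket is a small boto3 smoke test. This is a hypothetical check, not part of the project; the bucket and key names are placeholders.

```python
import boto3

# Picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment
# or your AWS config; bucket and key names are placeholders.
s3 = boto3.client("s3")
s3.put_object(Bucket="taiga2", Key="upload/smoke-test.txt", Body=b"hello")
head = s3.head_object(Bucket="taiga2", Key="upload/smoke-test.txt")
print("upload OK, size:", head["ContentLength"])
```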
For example, to add a user to group 1 directly in the database:

```sql
INSERT INTO group_user_association (group_id, user_id)
SELECT 1, id FROM users WHERE name = 'pmontgom';
```

```bash
# Run all tests
poetry run pytest

# Run a specific test file
poetry run pytest taiga2/tests/datafile_test.py

# Run with verbose output (shows each test name)
poetry run pytest -v
```

Every push triggers the GitHub Actions workflow (`.github/workflows/build-docker.yaml`), which:
- Builds the Docker image from the root `Dockerfile`
- Runs `pytest` inside the image
- Pushes to `us.gcr.io/cds-docker-containers/taiga:ga-build-<run_number>`
- On `main`, also tags and pushes as `us.gcr.io/cds-docker-containers/taiga:latest`
Once the workflow completes, SSH into `ubuntu@cds.team`:
- Pull the latest image:

  ```bash
  GOOGLE_APPLICATION_CREDENTIALS=/etc/google/auth/docker-pull-creds.json docker pull us.gcr.io/cds-docker-containers/taiga
  ```

- Tag the image with `taiga-prod` and `taiga-staging`. For example:

  ```bash
  GOOGLE_APPLICATION_CREDENTIALS=/etc/google/auth/docker-pull-creds.json docker tag us.gcr.io/cds-docker-containers/taiga:ga-build-68 us.gcr.io/cds-docker-containers/taiga:taiga-staging
  GOOGLE_APPLICATION_CREDENTIALS=/etc/google/auth/docker-pull-creds.json docker tag us.gcr.io/cds-docker-containers/taiga:ga-build-68 us.gcr.io/cds-docker-containers/taiga:taiga-prod
  ```

- Restart the service:

  ```bash
  sudo systemctl restart taiga
  ```
If there's any problem, you can look for information in the logs (stored at `/var/log/taiga`) or ask journald for the service output (`journalctl -u taiga`).
If your SQLAlchemy models change and you already have a database you can't drop and recreate, you can use Alembic to manage the migrations.

Example (adapt it to the state of your database; see the Alembic docs):

(Note: this requires that you put the config for the production database at `../prod_settings.cfg` so that Alembic can find the current schema to compare against. You can test against a snapshot of a database by going to the AWS console, opening RDS, going to "Snapshots", and selecting "Restore Snapshot". You can then place the new database's endpoint into `prod_settings.cfg`.)
```bash
TAIGA_SETTINGS_FILE=prod_settings.cfg ./flask db migrate
```
Review the generated migration. I've found I've had to re-order tables to ensure foreign-key references are created successfully. Before applying the migration, take a snapshot of the current Taiga db.
Depending on your changes, you may be able to apply them without stopping the service, minimizing downtime. You can apply online changes if they are compatible with both the old and new versions of the code being deployed. In general, changes that migrate data are not safe, but trivial changes like adding new nullable fields or new tables are safe. See "Applying online changes" below for updating the DB without stopping the service.
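
As a concrete illustration of a change that is safe to apply online, here is a hypothetical Alembic migration that adds a nullable column (the table and column names are made up, and the usual revision identifiers are omitted for brevity):

```python
# Hypothetical Alembic migration: adding a nullable column is compatible
# with both old code (which ignores it) and new code (which uses it).
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column("datasets", sa.Column("description", sa.Text(), nullable=True))

def downgrade():
    op.drop_column("datasets", "description")
```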
With downtime:

- Stop the service:

  ```bash
  ssh ubuntu@cds.team sudo systemctl stop taiga
  ```

- Apply changes to the DB:

  ```bash
  TAIGA_SETTINGS_FILE=prod_settings.cfg ./flask db upgrade
  ```

- Then pull and start the new code:

  ```bash
  ssh ubuntu@cds.team
  GOOGLE_APPLICATION_CREDENTIALS=/etc/google/auth/docker-pull-creds.json docker pull us.gcr.io/cds-docker-containers/taiga
  sudo systemctl start taiga
  ```
Applying online changes:

- Apply changes to the DB:

  ```bash
  TAIGA_SETTINGS_FILE=prod_settings.cfg ./flask db upgrade
  ```

- Then pull the new code:

  ```bash
  ssh ubuntu@cds.team
  GOOGLE_APPLICATION_CREDENTIALS=/etc/google/auth/docker-pull-creds.json docker pull us.gcr.io/cds-docker-containers/taiga
  ```

- Restart the service to run the new code:

  ```bash
  sudo systemctl restart taiga
  ```
Users are able to delete datasets through the UI. We do not allow undeletion directly, but in some extreme cases we have a way of un-deleting:

The API has a deprecation endpoint (`/datasetVersion/{datasetVersionId}/deprecate`) which can be used to turn a deleted dataset version into a deprecated one.
You can use a curl request, e.g.:

```bash
curl -d '{"deprecationReason":"notNeeded"}' -H "Content-Type: application/json" -X POST http://cds.team/taiga/api/datasetVersion/{datasetVersionId_here}/deprecate
```
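
The same call from Python, if you prefer (a sketch using the third-party `requests` library; the ID is a placeholder):

```python
import requests

dataset_version_id = "<datasetVersionId_here>"  # placeholder
resp = requests.post(
    f"http://cds.team/taiga/api/datasetVersion/{dataset_version_id}/deprecate",
    json={"deprecationReason": "notNeeded"},
)
resp.raise_for_status()
```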
Feel free to make a contribution and then submit a Pull Request!
We use Git for versioning! If you don't know how to use it, we strongly recommend doing this tutorial.
- Philip Montgomery - Initial work + advising on the current development + data processing
- Remi Marenco - Prototype + current development
- Cancer Data Science
- Broad Institute