A comprehensive platform that empowers researchers, organizations, and environmental advocates to share critical environmental datasets as interactive, accessible websites. Built on Datasette with a custom administrative panel for seamless data publishing and portal customization.
The EDGI Cloud Portal serves the critical mission of democratizing environmental data access and supporting evidence-based environmental policy. Every dataset shared brings us closer to a more informed, sustainable future.
- Full GitHub Flavored Markdown - Complete implementation with headers, lists, links, bold, italic, code blocks
- Dynamic Rendering - Custom render_links.py plugin for runtime markdown processing
- Protected URLs - Markdown processing that leaves URLs in the text intact
- Metadata Generation - Added generate_metadata.py for dynamic configuration
- Forced Password Change - New users must change password on first login
- Enhanced Authentication - Improved session management and password policies
- User Activity Tracking - Comprehensive logging of user actions
- Unpublished Database Preview - Owners can preview databases before publishing
- Portal Homepage Preview - Test customizations before going live
- Access Verification - Improved permission checking for preview features
- JSONL Support - Added JSON Lines format for streaming data
- Progress Tracking - Real-time upload progress with cancellation
- Null Value Handling - Three configurable strategies for empty cells
- File Validation - Pre-upload size and format checking
- Cancel Feature - Abort long-running uploads gracefully
- Excel Processing - Fixed Excel file handling with proper null processing
- Name Validation - Check database name availability before import
- Size Validation - Verify file size limits before processing
- Error Messages - User-friendly feedback for validation failures
- Multi-Source Upload - CSV, Excel (.xlsx, .xls), JSONL, Google Sheets, and Web CSV
- Advanced Null Handling - Three strategies: empty string, preserve NULL, or skip rows
- Data Quality Analysis - Automated assessment and reporting
- Large File Support - Efficient processing with progress tracking
- Real-time Progress - Live upload progress with cancellation
- Connection Testing - Pre-upload validation for remote sources
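The upload features above mention real-time progress with cancellation. The sketch below shows one way to do that: a chunked copy that reports whole-number percentages and checks a cancel flag before every read. It assumes a byte stream and a `threading.Event` set by the UI. The names `copy_with_progress` and `UploadCancelled` are hypothetical, and `upload_table.py` may implement this differently.

```python
import io
import threading
from typing import BinaryIO, Callable


class UploadCancelled(Exception):
    """Raised when the user aborts an upload part-way through (hypothetical)."""


def copy_with_progress(
    src: BinaryIO,
    dst: BinaryIO,
    total_bytes: int,
    on_progress: Callable[[int], None],
    cancel_event: threading.Event,
    chunk_size: int = 64 * 1024,
) -> int:
    """Copy src to dst chunk by chunk, reporting integer percentages.

    The cancel flag is checked before every read, so a cancellation
    takes effect within one chunk.
    """
    copied = 0
    last_pct = -1
    while True:
        if cancel_event.is_set():
            raise UploadCancelled(f"cancelled after {copied} bytes")
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        pct = 100 if total_bytes <= 0 else min(100, copied * 100 // total_bytes)
        if pct != last_pct:  # report only when the percentage changes
            on_progress(pct)
            last_pct = pct
    return copied


if __name__ == "__main__":
    payload = b"x" * 300_000
    copy_with_progress(io.BytesIO(payload), io.BytesIO(), len(payload), print, threading.Event())
```

Checking the flag between chunks, rather than killing a worker, lets the upload stop at a clean boundary so partial output can be discarded gracefully.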
- GitHub Flavored Markdown - Full markdown support for rich content: `# Headers`, `**Bold text**`, `_italic text_`, `[Links](https://example.com)`, `## Lists`, `- Bullet points`, `1. Numbered lists`, `` `code blocks` ``, and more
- Dynamic Rendering - Markdown processed at runtime, not just at startup
- Custom Homepages - Database-specific branding and descriptions
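One common way to keep URLs intact while rendering markdown is to swap every link and bare URL for an opaque placeholder, apply the emphasis rules, and then restore the links. The sketch below covers only a small inline subset (links, bare URLs, code, bold, italic). It is not the actual `render_links.py` code, and `render_inline` is a hypothetical name.

```python
import html
import re

LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")  # [text](url)
BARE_URL_RE = re.compile(r"https?://[^\s<>()]+")              # plain http(s) URLs
CODE_RE = re.compile(r"`([^`]+)`")
BOLD_RE = re.compile(r"\*\*(.+?)\*\*")
ITALIC_RE = re.compile(r"(?<![\w*])_(.+?)_(?!\w)")


def render_inline(text: str) -> str:
    """Render inline markdown while keeping URLs byte-for-byte intact."""
    protected: list[str] = []

    def stash(fragment: str) -> str:
        # NUL-delimited index; assumes user text never contains NUL characters.
        protected.append(fragment)
        return f"\x00{len(protected) - 1}\x00"

    # 1. Replace markdown links, then bare URLs, with placeholders.
    text = LINK_RE.sub(
        lambda m: stash(f'<a href="{html.escape(m.group(2), quote=True)}">{html.escape(m.group(1))}</a>'),
        text,
    )
    text = BARE_URL_RE.sub(
        lambda m: stash(f'<a href="{html.escape(m.group(0), quote=True)}">{html.escape(m.group(0))}</a>'),
        text,
    )
    # 2. Escape the remaining text, then apply emphasis rules to it.
    text = html.escape(text)
    text = CODE_RE.sub(r"<code>\1</code>", text)
    text = BOLD_RE.sub(r"<strong>\1</strong>", text)
    text = ITALIC_RE.sub(r"<em>\1</em>", text)
    # 3. Restore the untouched link markup.
    return re.sub(r"\x00(\d+)\x00", lambda m: protected[int(m.group(1))], text)
```

Because the italic rule never sees the URL, an address such as `https://example.com/a_b_c` keeps its underscores instead of being split by `<em>` tags.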
- Custom Branding - Professional portals with organization identity
- Instant Publishing - Share data with preview before publishing
- Advanced Search - Filter, sort, and explore datasets
- API Access - Programmatic data access for developers
- Trash System - Safe deletion with recovery options
- Role-Based Access - System admin and user roles
- Password Policies - Forced change on first login
- Activity Monitoring - Comprehensive logging
- User Management - Complete account lifecycle
- System Configuration - Runtime settings via admin interface
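The preview and role rules above (owners preview their unpublished databases, system admins can preview and manage all databases, published data is public) can be sketched as a single permission check. The role strings `"system_admin"`/`"system_user"`, the `"Draft"`/`"Published"` status values, and the `User`/`DatabaseRecord`/`can_view` names are assumptions for illustration, not the portal's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: str
    role: str  # assumed values: "system_admin" or "system_user"


@dataclass(frozen=True)
class DatabaseRecord:
    db_name: str
    owner_id: str
    status: str  # assumed values: "Draft" or "Published"


def can_view(user: Optional[User], db: DatabaseRecord) -> bool:
    """Published databases are public; anything else needs the owner or an admin."""
    if db.status == "Published":
        return True
    if user is None:  # anonymous visitors never see unpublished content
        return False
    return user.user_id == db.owner_id or user.role == "system_admin"
```

Keeping the rule in one function means preview pages, APIs, and homepage previews can all call the same check rather than re-implementing it.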
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend      │    │   Data Layer    │
│ (19 Templates)  │    │   (9 Plugins)    │    │                 │
│                 │    │                  │    │                 │
│ • Upload UI     │◄──►│ • Upload Engine  │◄──►│ • SQLite DBs    │
│ • Management UI │    │ • Database Mgmt  │    │ • Portal DB     │
│ • Admin Panel   │    │ • Markdown Render│    │ • User Data     │
│ • Preview Mode  │    │ • Auth System    │    │ • File Storage  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
- Backend: Python 3.11+ with Datasette framework
- Database: SQLite with sqlite-utils
- Data Processing: Pandas, openpyxl for Excel
- Frontend: HTML5, Tailwind CSS 3.4
- Markdown: Custom GitHub Flavored Markdown processor
- Authentication: bcrypt with session management
- Deployment: Docker containers on Fly.io
edgi-cloud/
├── 📄 README.md # This documentation
├── 📄 requirements.txt # Python dependencies
├── 📄 Dockerfile # Container configuration
├── 📄 fly.toml # Fly.io deployment
├── 📄 init_db.py # Database initialization
├── 📄 migrate_db.py # Database migrations
│
├── 📁 plugins/ # Datasette plugins (9 modules)
│ ├── 📄 upload_table.py # Multi-source upload with progress
│ ├── 📄 common_utils.py # Shared utilities & markdown processor
│ ├── 📄 manage_databases.py # Database lifecycle management
│ ├── 📄 admin_panel.py # System administration
│ ├── 📄 create_database.py # Database creation workflows
│ ├── 📄 delete_db.py # Deletion with trash system
│ ├── 📄 user_profile.py # User account management
│ ├── 📄 render_links.py # Dynamic markdown rendering
│ └── 📄 generate_metadata.py # Metadata generation
│
├── 📁 templates/ # User interfaces (19 templates)
│ └── [Template files for all UI components]
│
├── 📁 static/ # Static assets
│ ├── 📄 styles.css # Custom styles
│ └── 📁 js/ # JavaScript modules
│
└── 📁 data/ # Data storage
├── 📄 portal.db # Main portal database
└── 📁 {user_id}/ # User databases
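Combining the `data/{user_id}/` layout above with the name-availability check from the upload features, a pre-import validation might look like the sketch below. It also shows a parameterized query, so the name is bound rather than pasted into SQL. The naming rule, the `databases` table with a `db_name` column, the `{db_name}.db` file name, and both function names are assumptions.

```python
import re
import sqlite3
from pathlib import Path

# Assumed rule: 3-50 chars, lowercase letter first, then lowercase letters, digits, underscores.
DB_NAME_RE = re.compile(r"[a-z][a-z0-9_]{2,49}")


def validate_db_name(conn: sqlite3.Connection, db_name: str) -> list[str]:
    """Return a list of problems; an empty list means the name is usable."""
    errors = []
    if not DB_NAME_RE.fullmatch(db_name):
        errors.append("Use 3-50 lowercase letters, digits, or underscores, starting with a letter.")
    # Parameterized query: db_name is bound by sqlite3, never spliced into the SQL text.
    row = conn.execute("SELECT 1 FROM databases WHERE db_name = ? LIMIT 1", (db_name,)).fetchone()
    if row is not None:
        errors.append(f"A database named '{db_name}' already exists.")
    return errors


def user_db_path(data_dir: Path, user_id: str, db_name: str) -> Path:
    """Build data/{user_id}/{db_name}.db following the project layout."""
    return data_dir / user_id / f"{db_name}.db"
```

Returning all errors at once supports the user-friendly feedback described earlier, since the form can list every problem together.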
- Forced Password Change - New users must set their own password
- Preview Access Control - Strict permission checking for unpublished content
- Upload Validation - Pre-upload checks for file size and format
- Session Management - Enhanced cookie-based authentication
- Password Security: bcrypt hashing
- CSRF Protection: Token validation
- Input Validation: Comprehensive sanitization
- SQL Injection Prevention: Parameterized queries
- XSS Protection: HTML sanitization
- CSV/TSV - With intelligent delimiter detection
- Excel - .xlsx and .xls with null handling
- JSONL - JSON Lines for streaming data
- Google Sheets - Public sheets import
- Web CSV - Direct URL import
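For the text-based formats above, delimiter detection and JSON Lines reading can both be done with the standard library. The sketch uses `csv.Sniffer` as one possible heuristic (the portal may rely on pandas instead) and falls back to a comma when detection fails. The candidate delimiters and all function names are assumptions.

```python
import csv
import io
import json
from typing import Iterable, Iterator


def sniff_delimiter(sample: str, candidates: str = ",\t;|") -> str:
    """Guess the delimiter from a text sample; fall back to a comma."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:  # Sniffer raises csv.Error when it cannot decide
        return ","


def read_delimited(text: str) -> list[dict[str, str]]:
    """Parse CSV/TSV text into dicts keyed by the header row."""
    delimiter = sniff_delimiter(text[:4096])
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-blank line, so large files are never loaded whole."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        yield record
```

Passing an open file object to `iter_jsonl` streams it line by line, which is what makes JSON Lines suited to large datasets.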
# Option 1: Convert to empty string (default)
NULL, N/A, nan → ""
# Option 2: Preserve as database NULL
NULL, N/A, nan → NULL
# Option 3: Skip problematic rows
Rows with >80% empty → Skip
- Progress Tracking - Real-time percentage display
- Cancellation - Abort uploads mid-process
- Size Validation - Check before processing
- Quality Analysis - Data quality scoring
- Error Recovery - Graceful failure handling
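The three null-handling options listed above can be sketched as one function over rows of cell values. It treats `NULL`, `N/A`, `nan` (case-insensitive), blank strings, `None`, and float NaN as missing, and uses the ">80% empty" rule for skipping. The strategy names are hypothetical. Leaving the remaining empty cells unchanged in rows kept by the skip strategy is also an assumption.

```python
import math
from typing import Any, Iterable

NULL_MARKERS = {"", "null", "n/a", "nan"}  # compared after strip() and lower()
SKIP_THRESHOLD = 0.8                        # skip rows with more than 80% missing cells


def is_missing(value: Any) -> bool:
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() in NULL_MARKERS


def apply_null_strategy(rows: Iterable[list[Any]], strategy: str = "empty_string") -> list[list[Any]]:
    """strategy: "empty_string" (default), "preserve_null", or "skip_rows"."""
    out = []
    for row in rows:
        missing = [is_missing(v) for v in row]
        if strategy == "empty_string":
            out.append(["" if m else v for v, m in zip(row, missing)])
        elif strategy == "preserve_null":
            out.append([None if m else v for v, m in zip(row, missing)])  # stored as SQL NULL
        elif strategy == "skip_rows":
            if row and sum(missing) / len(row) > SKIP_THRESHOLD:
                continue  # mostly empty: drop the row
            out.append(list(row))  # kept rows are left unchanged (assumption)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return out
```

Note that the threshold is strict: a row with exactly 4 of 5 cells empty (80%) is kept.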
- Python 3.11+
- Git
- Docker (for deployment)
- Clone the repository
    git clone https://github.com/edgi-govdata-archiving/edgi-cloud.git
    cd edgi-cloud
- Set up a Python environment, for example:
    python -m venv .venv
    source .venv/bin/activate  # Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  If you want to start with a completely blank database, also take these steps:
    rm data/portal.db
    export DEFAULT_PASSWORD=[some-password]
  You can use the script generate_admin_password.py to generate a moderately secure password:
    script/generate_admin_password.py
- Initialize the database
    python init_db.py
    python migrate_db.py  # Apply latest schema updates
- Generate metadata
    python plugins/generate_metadata.py
- Start the development server
    datasette serve data/portal.db \
      --metadata metadata.json \
      --plugins-dir=plugins \
      --template-dir=templates \
      --static static:static
- Access the portal
  - Navigate to http://localhost:8001
  - Default login: admin / resette2025! (may differ; if you exported DEFAULT_PASSWORD, use that value)
  - Change the password on first login
- Deploy to Fly.io
    fly deploy
You will likely get a warning that looks like this:
WARNING The app is not listening on the expected address and will not be reachable by fly-proxy.
You can fix this by configuring your app to listen on the following addresses:
- 0.0.0.0:8001
Found these processes inside the machine with open listening sockets:
PROCESS | ADDRESSES
-----------------*----------------------------------------
/.fly/hallpass | [fdaa:2c:9baa:a7b:17a:e4d4:785a:2]:22
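The only listening socket shown belongs to /.fly/hallpass, Fly.io's own SSH service on port 22, not to the portal, so this warning can be ignored.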
NOTE: Currently, this is set up to deploy only via the command line. To stop the Fly.io servers from running, scale down to 0 machines using:
fly scale count 0
- Set secrets
    fly secrets set CSRF_SECRET_KEY="your-secret-key"
    fly secrets set DEFAULT_PASSWORD="secure-password"
- User Management - Create users with forced password change
- Portal Customization - Full markdown editor for homepage
- Database Oversight - Preview and manage all databases
- System Settings - Configure limits and policies
- Activity Monitoring - Track user actions
- Trash Management - Recover deleted databases
- Database Creation - Import or create new databases
- Upload Data - Multiple formats with progress tracking
- Markdown Content - Rich text descriptions with GitHub Flavored Markdown
- Preview Mode - Test before publishing
- Publishing Control - Draft and published states
- Profile Management - Password changes and settings
CSRF_SECRET_KEY=your-secret-key
PORTAL_DB_PATH=/data/portal.db
RESETTE_DATA_DIR=/data
DEFAULT_PASSWORD=initial-admin-password
APP_URL=https://your-domain.fly.dev
- max_file_size - Upload size limit
- max_databases_per_user - Database quota
- trash_retention_days - Recovery period
- allowed_extensions - Permitted file types
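The runtime settings above feed the pre-upload size and format checks. The sketch below shows how `max_file_size`, `allowed_extensions`, and `max_databases_per_user` might drive one validation pass that collects every failure. The setting names come from the list above, but their default values, the `UploadSettings` class, and `precheck_upload` are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class UploadSettings:
    # Setting names follow the list above; default values are illustrative only.
    max_file_size: int = 50 * 1024 * 1024  # bytes
    max_databases_per_user: int = 10
    trash_retention_days: int = 30
    allowed_extensions: frozenset[str] = field(
        default_factory=lambda: frozenset({".csv", ".tsv", ".xlsx", ".xls", ".jsonl"})
    )


def precheck_upload(filename: str, size_bytes: int, existing_db_count: int, settings: UploadSettings) -> list[str]:
    """Collect every validation failure before any data is processed.

    existing_db_count only matters when the upload would create a new database.
    """
    errors = []
    ext = Path(filename).suffix.lower()
    if ext not in settings.allowed_extensions:
        errors.append(f"File type '{ext or '(none)'}' is not allowed.")
    if size_bytes > settings.max_file_size:
        limit_mb = settings.max_file_size / (1024 * 1024)
        errors.append(f"File is larger than the {limit_mb:.0f} MB limit.")
    if existing_db_count >= settings.max_databases_per_user:
        errors.append("Database quota reached.")
    return errors
```

Running these checks before reading the file avoids spending time parsing an upload that would be rejected anyway.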
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
- Enhanced markdown support implementation
- Upload system improvements
- Security enhancements
- Preview functionality
- v2.5.0 - Full GitHub Flavored Markdown support
- v2.4.0 - Forced password change for new users
- v2.3.0 - Preview mode for unpublished content
- v2.2.0 - JSONL upload support
- v2.1.0 - Enhanced null value handling
- v2.0.0 - Upload cancellation and progress tracking
MIT License - see LICENSE file for details
Built by the Environmental Data & Governance Initiative (EDGI) to democratize environmental data access.
Special thanks to:
- The Datasette community
- Environmental researchers and activists
- Open source contributors
- Issues: GitHub Issues
- Documentation: Wiki
- Contact: EDGI
Democratizing environmental data access, one dataset at a time.