The EDGI Cloud Portal is a platform that lets researchers, organizations, and environmental advocates publish environmental datasets as interactive, accessible websites. It is built on Datasette, with a custom administrative panel for publishing data and customizing portals.
The EDGI Cloud Portal serves the critical mission of democratizing environmental data access and supporting evidence-based environmental policy. Every dataset shared brings us closer to a more informed, sustainable future.
- Full GitHub Flavored Markdown - Complete implementation with headers, lists, links, bold, italic, code blocks
- Dynamic Rendering - Custom render_links.py plugin for runtime markdown processing
- Protected URLs - Smart markdown processing that preserves URL integrity
- Metadata Generation - Added generate_metadata.py for dynamic configuration
- Forced Password Change - New users must change password on first login
- Enhanced Authentication - Improved session management and password policies
- User Activity Tracking - Comprehensive logging of user actions
- Unpublished Database Preview - Owners can preview databases before publishing
- Portal Homepage Preview - Test customizations before going live
- Access Verification - Improved permission checking for preview features
- JSONL Support - Added JSON Lines format for streaming data
- Progress Tracking - Real-time upload progress with cancellation
- Null Value Handling - Three configurable strategies for empty cells
- File Validation - Pre-upload size and format checking
- Cancel Feature - Abort long-running uploads gracefully
- Excel Processing - Fixed Excel file handling with proper null processing
- Name Validation - Check database name availability before import
- Size Validation - Verify file size limits before processing
- Error Messages - User-friendly feedback for validation failures
- Multi-Source Upload - CSV, Excel (.xlsx, .xls), JSONL, Google Sheets, and Web CSV
- Advanced Null Handling - Three strategies: empty string, preserve NULL, or skip rows
- Data Quality Analysis - Automated assessment and reporting
- Large File Support - Efficient processing with progress tracking
- Real-time Progress - Live upload progress with cancellation
- Connection Testing - Pre-upload validation for remote sources
- GitHub Flavored Markdown - Full markdown support for rich content
    # Headers
    **Bold text** and *italic text*
    [Links](https://example.com)
    ## Lists
    - Bullet points
    1. Numbered lists
    `code blocks` and more!
- Dynamic Rendering - Markdown processed at runtime, not just at startup
- Custom Homepages - Database-specific branding and descriptions
- Custom Branding - Professional portals with organization identity
- Instant Publishing - Share data with preview before publishing
- Advanced Search - Filter, sort, and explore datasets
- API Access - Programmatic data access for developers
- Trash System - Safe deletion with recovery options
- Role-Based Access - System admin and user roles
- Password Policies - Forced change on first login
- Activity Monitoring - Comprehensive logging
- User Management - Complete account lifecycle
- System Configuration - Runtime settings via admin interface
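The API Access item above builds on Datasette's JSON endpoints, which serve any table at `/<database>/<table>.json`. The following is a minimal sketch, assuming a portal running locally at `http://localhost:8001`; the database and table names (`water_quality`, `samples`) and the helper functions are hypothetical and only illustrate the URL shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def table_json_url(base_url, database, table, **params):
    """Build a Datasette JSON URL that returns rows as a list of objects."""
    query = {"_shape": "array", **params}
    return f"{base_url.rstrip('/')}/{database}/{table}.json?{urlencode(query)}"


def fetch_rows(url):
    """Download and decode the rows (needs a running portal, so not called here)."""
    with urlopen(url) as response:
        return json.load(response)


# Hypothetical database and table names; replace with ones from your portal.
url = table_json_url("http://localhost:8001", "water_quality", "samples", _size=5)
print(url)
```

Calling `fetch_rows(url)` against a live portal should return a list of dictionaries, one per row, limited by `_size`.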
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend      │    │   Data Layer    │
│ (19 Templates)  │    │   (9 Plugins)    │    │                 │
│                 │    │                  │    │                 │
│ • Upload UI     │───►│ • Upload Engine  │───►│ • SQLite DBs    │
│ • Management UI │    │ • Database Mgmt  │    │ • Portal DB     │
│ • Admin Panel   │    │ • Markdown Render│    │ • User Data     │
│ • Preview Mode  │    │ • Auth System    │    │ • File Storage  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
- Backend: Python 3.11+ with Datasette framework
- Database: SQLite with sqlite-utils
- Data Processing: Pandas, openpyxl for Excel
- Frontend: HTML5, Tailwind CSS 3.4
- Markdown: Custom GitHub Flavored Markdown processor
- Authentication: bcrypt with session management
- Deployment: Docker containers on Fly.io
edgi-cloud/
├── README.md                 # This documentation
├── requirements.txt          # Python dependencies
├── Dockerfile                # Container configuration
├── fly.toml                  # Fly.io deployment
├── init_db.py                # Database initialization
├── migrate_db.py             # Database migrations
│
├── plugins/                  # Datasette plugins (9 modules)
│   ├── upload_table.py       # Multi-source upload with progress
│   ├── common_utils.py       # Shared utilities & markdown processor
│   ├── manage_databases.py   # Database lifecycle management
│   ├── admin_panel.py        # System administration
│   ├── create_database.py    # Database creation workflows
│   ├── delete_db.py          # Deletion with trash system
│   ├── user_profile.py       # User account management
│   ├── render_links.py       # Dynamic markdown rendering
│   └── generate_metadata.py  # Metadata generation
│
├── templates/                # User interfaces (19 templates)
│   └── [Template files for all UI components]
│
├── static/                   # Static assets
│   ├── styles.css            # Custom styles
│   └── js/                   # JavaScript modules
│
└── data/                     # Data storage
    ├── portal.db             # Main portal database
    └── {user_id}/            # User databases
- Forced Password Change - New users must set their own password
- Preview Access Control - Strict permission checking for unpublished content
- Upload Validation - Pre-upload checks for file size and format
- Session Management - Enhanced cookie-based authentication
- Password Security: bcrypt hashing
- CSRF Protection: Token validation
- Input Validation: Comprehensive sanitization
- SQL Injection Prevention: Parameterized queries
- XSS Protection: HTML sanitization
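To make the "parameterized queries" item concrete, here is a small sketch using Python's built-in `sqlite3` module. The `databases` table and the `find_databases_by_owner` helper are hypothetical and are not taken from the portal's actual schema; the point is only that user input is bound with `?` placeholders rather than formatted into the SQL string.

```python
import sqlite3


def find_databases_by_owner(conn, owner):
    """Look up databases for one owner using a bound parameter."""
    cursor = conn.execute(
        "SELECT name FROM databases WHERE owner = ? ORDER BY name",
        (owner,),
    )
    return [row[0] for row in cursor.fetchall()]


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE databases (name TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO databases (name, owner) VALUES (?, ?)",
    [("air_quality", "alice"), ("rivers", "bob")],
)

# A hostile value is treated as plain data, so it matches no rows.
print(find_databases_by_owner(conn, "alice"))
print(find_databases_by_owner(conn, "alice' OR '1'='1"))
```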
- CSV/TSV - With intelligent delimiter detection
- Excel - .xlsx and .xls with null handling
- JSONL - JSON Lines for streaming data
- Google Sheets - Public sheets import
- Web CSV - Direct URL import
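Delimiter detection for CSV/TSV files can be approximated with the standard library's `csv.Sniffer`. This is a sketch of one possible approach, not necessarily what `upload_table.py` does; the `detect_delimiter` helper, the candidate delimiter set, and the comma fallback are assumptions.

```python
import csv


def detect_delimiter(sample, candidates=",\t;|", default=","):
    """Guess the delimiter from a text sample, falling back to a default."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide (for example, a single-column file).
        return default


tsv_sample = "site\tph\ttemp\nA\t7.1\t12\nB\t6.8\t14\n"
semicolon_sample = "site;ph\nA;7.1\nB;6.8\n"

print(repr(detect_delimiter(tsv_sample)))
print(repr(detect_delimiter(semicolon_sample)))
```

In practice the sample would be the first few kilobytes of the uploaded file, so detection stays cheap even for large uploads.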
# Option 1: Convert to empty string (default)
NULL, N/A, nan → ""
# Option 2: Preserve as database NULL
NULL, N/A, nan → NULL
# Option 3: Skip problematic rows
Rows with >80% empty → Skip
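The three options above can be sketched in plain Python as follows. The strategy names (`empty_string`, `preserve_null`, `skip_rows`), the token list, and the `apply_null_strategy` function are hypothetical; only the behaviour (empty string, NULL, or skipping rows that are more than 80% empty) comes from the description above.

```python
# Assumed list of null-like tokens, compared case-insensitively.
NULL_TOKENS = {"", "null", "n/a", "nan"}


def is_empty(value):
    """True for None and for strings that look like a null marker."""
    return value is None or str(value).strip().lower() in NULL_TOKENS


def apply_null_strategy(rows, strategy="empty_string", skip_threshold=0.8):
    """Return cleaned rows using one of the three strategies."""
    cleaned = []
    for row in rows:
        empties = [is_empty(value) for value in row]
        if strategy == "skip_rows":
            # Drop rows where more than 80% of the cells are empty.
            if row and sum(empties) / len(row) > skip_threshold:
                continue
            cleaned.append(list(row))
        elif strategy == "preserve_null":
            cleaned.append([None if e else v for v, e in zip(row, empties)])
        elif strategy == "empty_string":
            cleaned.append(["" if e else v for v, e in zip(row, empties)])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return cleaned


rows = [
    ["site-1", "7.1", "N/A", "12", "ok"],
    ["NULL", "nan", "", "N/A", "NaN"],
    ["site-3", "NULL", "14", "8", "ok"],
]
print(apply_null_strategy(rows, "preserve_null"))
```

With `preserve_null`, the `None` values become SQL `NULL` when the rows are inserted through `sqlite3`.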
- Progress Tracking - Real-time percentage display
- Cancellation - Abort uploads mid-process
- Size Validation - Check before processing
- Quality Analysis - Data quality scoring
- Error Recovery - Graceful failure handling
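Progress tracking and cancellation, as listed above, are commonly implemented by importing rows in batches and checking a cancellation flag between batches. The sketch below assumes that design; `import_rows`, its callbacks, and the batch size are hypothetical and do not describe the portal's actual upload engine.

```python
import threading


def import_rows(rows, insert_batch, on_progress, cancel_event, batch_size=1000):
    """Insert rows in batches, report percent done, and stop if cancelled."""
    total = len(rows)
    done = 0
    for start in range(0, total, batch_size):
        if cancel_event.is_set():
            return {"status": "cancelled", "rows": done}
        batch = rows[start:start + batch_size]
        insert_batch(batch)
        done += len(batch)
        on_progress(round(100 * done / total))
    return {"status": "completed", "rows": done}


# Full run: collect progress updates in a list.
stored, progress = [], []
result = import_rows(list(range(2500)), stored.extend, progress.append, threading.Event())

# Cancelled run: the progress callback requests cancellation after the first batch.
cancel = threading.Event()
partial = []
cancelled = import_rows(list(range(2500)), partial.extend, lambda pct: cancel.set(), cancel)

print(result, progress)
print(cancelled)
```

Checking the flag only between batches keeps each batch atomic, at the cost of a short delay before a cancellation takes effect.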
- Python 3.11+
- Git
- Docker (for deployment)
- Clone the repository
    git clone https://github.com/edgi-govdata-archiving/edgi-cloud.git
    cd edgi-cloud
- Set up Python environment
    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt
- Initialize database
    python init_db.py
    python migrate_db.py  # Apply latest schema updates
- Generate metadata
    python plugins/generate_metadata.py
- Start development server
    datasette serve data/portal.db \
      --metadata metadata.json \
      --plugins-dir=plugins \
      --template-dir=templates \
      --static static:static \
      --reload
- Access the portal
  - Navigate to http://localhost:8001
  - Default login: admin / edgi2025!
  - Change password on first login
- Deploy to Fly.io
    fly deploy
- Set secrets
    fly secrets set CSRF_SECRET_KEY="your-secret-key"
    fly secrets set DEFAULT_PASSWORD="secure-password"
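The `your-secret-key` placeholder should be replaced with a long random value. One way to produce such a value is Python's standard `secrets` module; the `generate_secret_key` helper and the 32-byte length below are assumptions, not a project requirement.

```python
import secrets


def generate_secret_key(num_bytes=32):
    """Return a URL-safe random string suitable for use as a secret value."""
    return secrets.token_urlsafe(num_bytes)


print(generate_secret_key())
```

The printed value can then be passed to `fly secrets set CSRF_SECRET_KEY=...` in place of the placeholder.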
- User Management - Create users with forced password change
- Portal Customization - Full markdown editor for homepage
- Database Oversight - Preview and manage all databases
- System Settings - Configure limits and policies
- Activity Monitoring - Track user actions
- Trash Management - Recover deleted databases
- Database Creation - Import or create new databases
- Upload Data - Multiple formats with progress tracking
- Markdown Content - Rich text descriptions with GitHub Flavored Markdown
- Preview Mode - Test before publishing
- Publishing Control - Draft and published states
- Profile Management - Password changes and settings
CSRF_SECRET_KEY=your-secret-key
PORTAL_DB_PATH=/data/portal.db
RESETTE_DATA_DIR=/data
DEFAULT_PASSWORD=initial-admin-password
APP_URL=https://your-domain.fly.dev
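As an illustration of how these variables might be consumed, the sketch below reads them into a small config object. The `PortalConfig` class and `load_config` function are hypothetical, and the choice of which variables are required and which fall back to defaults is an assumption rather than the portal's documented behaviour.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PortalConfig:
    csrf_secret_key: str
    portal_db_path: str
    data_dir: str
    default_password: str
    app_url: str


def load_config(env=os.environ):
    """Read the variables listed above and fail early if a secret is missing."""
    required = ("CSRF_SECRET_KEY", "DEFAULT_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return PortalConfig(
        csrf_secret_key=env["CSRF_SECRET_KEY"],
        portal_db_path=env.get("PORTAL_DB_PATH", "/data/portal.db"),
        data_dir=env.get("RESETTE_DATA_DIR", "/data"),
        default_password=env["DEFAULT_PASSWORD"],
        app_url=env.get("APP_URL", "http://localhost:8001"),
    )


# Example with an explicit mapping instead of the real process environment.
config = load_config({"CSRF_SECRET_KEY": "example", "DEFAULT_PASSWORD": "example"})
print(config.portal_db_path, config.app_url)
```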
- max_file_size - Upload size limit
- max_databases_per_user - Database quota
- trash_retention_days - Recovery period
- allowed_extensions - Permitted file types
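A pre-upload check driven by `max_file_size` and `allowed_extensions` could look like the sketch below. The setting values, the `validate_upload` function, and the error wording are assumptions for illustration; the real limits come from the admin interface.

```python
from pathlib import Path

# Assumed example values, not the portal's actual defaults.
SETTINGS = {
    "max_file_size": 50 * 1024 * 1024,  # bytes
    "allowed_extensions": {".csv", ".tsv", ".xlsx", ".xls", ".jsonl"},
}


def validate_upload(filename, size_bytes, settings=SETTINGS):
    """Return user-facing error messages; an empty list means the file passes."""
    errors = []
    extension = Path(filename).suffix.lower()
    if extension not in settings["allowed_extensions"]:
        errors.append(f"File type '{extension or 'none'}' is not allowed.")
    if size_bytes > settings["max_file_size"]:
        limit_mb = settings["max_file_size"] / (1024 * 1024)
        errors.append(f"File exceeds the {limit_mb:.0f} MB upload limit.")
    return errors


print(validate_upload("samples.CSV", 1024))
print(validate_upload("archive.exe", 60 * 1024 * 1024))
```

Returning a list of messages rather than raising on the first problem lets the upload form show every validation failure at once.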
# Run tests
python -m pytest tests/
# With coverage
coverage run -m pytest
coverage report
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
- Enhanced markdown support implementation
- Upload system improvements
- Security enhancements
- Preview functionality
- v2.5.0 - Full GitHub Flavored Markdown support
- v2.4.0 - Forced password change for new users
- v2.3.0 - Preview mode for unpublished content
- v2.2.0 - JSONL upload support
- v2.1.0 - Enhanced null value handling
- v2.0.0 - Upload cancellation and progress tracking
MIT License - see LICENSE file for details
Built by the Environmental Data & Governance Initiative (EDGI) to democratize environmental data access.
Special thanks to:
- The Datasette community
- Environmental researchers and activists
- Open source contributors
- Issues: GitHub Issues
- Documentation: Wiki
- Contact: EDGI
Democratizing environmental data access, one dataset at a time.