This project provides a FastAPI-based service for data ingestion, backup, and restoration using PySpark and MySQL. It includes functionality to migrate CSV data to a MySQL database, perform backups, and restore data from backups. The API also includes secure authentication using JWT tokens.
- Data Ingestion: Insert data into MySQL tables with schema validation.
- Backup: Create backups of MySQL tables.
- Restore: Restore tables from specific backup timestamps.
- Schema Management: Load table schemas dynamically from JSON files.
- Secure Authentication: OAuth2 with JWT for secure access to endpoints.
- Reports and Metrics: Generate visual and tabular reports for hiring data and department performance.
- Python 3.8+
- MySQL database
- PySpark
- FastAPI
- MySQL Connector JAR file
-
Clone the repository:
git clone <repository-url> cd <repository-folder>
-
Install dependencies:
pip install -r requirements.txt
-
Set up the
.env
file in the root directory with the following variables:DATABASE_URL=jdbc:mysql://<host>:<port>/<database> DB_USER=<your-database-username> DB_PASS=<your-database-password> SPARK_TEMP_DIR=./temp SPARK_MYSQL_JAR=C:\spark-3.5.5-bin-hadoop3\jars\mysql-connector-j-8.0.33.jar SCHEMA_PATH=data/schema CSV_PATH=data/csv SECRET_KEY=<your-secret-key> ALGORITHM=HS256 ACCESS_TOKEN_EXPIRE_MINUTES=30 USER=<your-username> PASSWORD=<your-password>
-
Place your schema JSON files in the
data/schema
directory and your CSV files in thedata/csv
directory.
Start the FastAPI server:
uvicorn api.main:app --reload
The API will be available at http://127.0.0.1:8000
.
- POST
/token
- Description: Obtain a JWT token for authentication.
- Request Body (form-data example):
curl -X POST http://l27.0.0.1:8000/token \ -F "username=<username>" \ -F "password=<password>>"
- Response:
{ "access_token": "your-jwt-token", "token_type": "bearer" }
- POST
/ingest
- Description: Insert data into a MySQL table.
- Request Body:
{ "table": "table_name", "rows": [ { "field1": "value1", "field2": "value2" } ] }
- Authentication: Requires a valid JWT token in the
Authorization
header.
- GET
/backup
- Description: Create backups for all tables.
- Authentication: Requires a valid JWT token in the
Authorization
header.
- GET
/backup/{table_name}/timestamps
- Description: List available backup timestamps for a table.
- Authentication: Requires a valid JWT token in the
Authorization
header.
- POST
/restore/{table_name}/{timestamp}
- Description: Restore a table to a specific backup timestamp.
- Authentication: Requires a valid JWT token in the
Authorization
header.
- GET
/metrics/hiring_quarterly/{target_year}
- Description: Retrieve the number of hires per quarter by department and job for a specific year.
- Response:
{ "year": 2021, "data": [ { "department": "Engineering", "job": "Developer", "quarter": 1, "count": 10 }, { "department": "HR", "job": "Recruiter", "quarter": 2, "count": 5 } ] }
- GET
/reports/hiring.html
- Description: Generate a visual report (stacked bar chart) of hires per quarter by department for a specific year.
- Query Parameters:
year
: The year to report on (default: 2021).
- Response: HTML page with an embedded chart.
- GET
/metrics/departments_above_average/{year_int}
- Description: Retrieve departments with above-average hires for a specific year.
- Response:
{ "year": 2021, "data": [ { "id": 1, "department": "Engineering", "count": 50 }, { "id": 2, "department": "HR", "count": 30 } ] }
- GET
/reports/departments_above_average.html
- Description: Generate a visual report (pie chart) of departments with above-average hires for a specific year.
- Query Parameters:
year
: The year to report on (default: 2021).
- Response: HTML page with an embedded chart.
The spark/migration.py
script migrates CSV data to MySQL tables. Run it as follows:
python spark/migration.py
- Authentication: All endpoints (except
/token
) require a valid JWT token for access. - Environment Variables: Sensitive information such as database credentials and secret keys are stored in the
.env
file and loaded securely.
.
├── api/
│ ├── main.py # FastAPI application
├── spark/
│ ├── migration.py # Data migration script
│ ├── backup_service.py # Backup and restore utilities
├── data/
│ ├── schema/ # JSON schema files
│ ├── csv/ # CSV data files
├── .env # Environment variables
├── requirements.txt # Python dependencies
└── README.md # Project documentation
This project is licensed under the MIT License.