# DSCI551 | Group 48 Job Posting Platform

**Group 48:** David Tovmasyan, Jinyang Du, Wenjing Huang

A comprehensive Django-based database management system for job postings. It streamlines the job search and recruitment process by providing a user-friendly, efficient, and interactive job posting platform built on a distributed database architecture.
## Table of Contents

- Overview
- Features
- Architecture
- Prerequisites
- Installation
- Configuration
- Usage
- Project Structure
- API Endpoints
- Contributing
- License
## Overview

This project implements a distributed job posting management system using Django with MySQL database partitioning. The system features:
- Distributed Database Architecture: Uses hash-based partitioning across 3 MySQL databases
- Web Scraping: Automated LinkedIn job data collection using Selenium and BeautifulSoup
- User Authentication: Complete user registration, login, and profile management
- Job Management: CRUD operations for job postings with advanced search capabilities
- Responsive UI: Modern Bootstrap-based interface with interactive features
## Features

- **Advanced Job Search**: Search by title, location, company, and filters
- **Database Partitioning**: Hash-based distribution across multiple databases
- **Automated Data Collection**: LinkedIn scraping with Selenium WebDriver
- **User Management**: Registration, authentication, and profile management
- **Data Import/Export**: CSV-based data management with custom commands
- **Responsive Design**: Mobile-friendly Bootstrap interface
- **Favorites System**: Save and manage favorite job postings
- **Date Range Operations**: Bulk delete jobs within specified time periods
## Architecture

The system uses a distributed database architecture with:
- Primary Database: SQLite for user management and session data
- Partitioned Databases: 3 MySQL databases for job data distribution
- Hash Function: Custom partitioning algorithm based on job location state codes
- Django ORM: Database-agnostic data access layer
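The partitioning described above might look like the following sketch. The function name, the state-code hashing scheme, and the router class are illustrative assumptions, not the project's exact code:

```python
# Sketch: hash-based partitioning by job location state code.
# The hashing scheme and names here are assumptions, not the
# project's actual implementation.

PARTITIONS = ['first', 'second', 'third']  # Django database aliases

def partition_for_state(state_code: str) -> str:
    """Map a two-letter state code to one of the three MySQL databases."""
    index = sum(ord(c) for c in state_code.upper()) % len(PARTITIONS)
    return PARTITIONS[index]

class JobRouter:
    """Hypothetical Django database router sending Job rows to
    their partition; other models fall back to 'default'."""

    def db_for_read(self, model, **hints):
        instance = hints.get('instance')
        if model.__name__ == 'Job' and instance is not None:
            return partition_for_state(instance.state)
        return None  # None lets Django use the 'default' database

    db_for_write = db_for_read
```

A router like this would be registered in `settings.py` via `DATABASE_ROUTERS`; because the hash depends only on the state code, every query for a given state is consistently routed to the same partition.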
## Prerequisites

Before running this project, ensure you have:
- Python 3.8 or higher
- MySQL 8.0 or higher
- pip (Python package installer)
- Git
## Installation

Clone the repository:

```bash
git clone https://github.com/Jinyangd/DSCI551_Group48_Project.git
cd DSCI551_Group48_Project
```

Install the Python dependencies:

```bash
pip install django
pip install mysqlclient
pip install crispy-forms
pip install crispy-bootstrap4
pip install selenium
pip install beautifulsoup4
pip install requests
```

Create three MySQL databases for job data partitioning:

```sql
CREATE DATABASE db_one;
CREATE DATABASE db_two;
CREATE DATABASE db_three;
```

## Configuration

Edit `django_project/django_project/settings.py` and update the database configurations:
```python
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'jobs',
    },
    'first': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'db_one',
        'USER': 'your_username',
        'PASSWORD': 'your_password',
        'HOST': 'localhost',
        'PORT': '3306',
    },
    'second': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'db_two',
        'USER': 'your_username',
        'PASSWORD': 'your_password',
        'HOST': 'localhost',
        'PORT': '3306',
    },
    'third': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'db_three',
        'USER': 'your_username',
        'PASSWORD': 'your_password',
        'HOST': 'localhost',
        'PORT': '3306',
    },
}
```

Apply migrations to the default database and to each partition:

```bash
python manage.py makemigrations
python manage.py migrate
python manage.py migrate --database=first
python manage.py migrate --database=second
python manage.py migrate --database=third
```

Create a `.env` file in the project root (optional):
```
SECRET_KEY=your-secret-key-here
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1
```

Collect static files for production:
```bash
python manage.py collectstatic
```

## Usage

Download the `example.csv` file with pre-processed job data, or run the scraper to collect fresh data:

```bash
python linkedin_scrape.py
```

**Note:** Scraping may take 2+ hours depending on the number of jobs and network conditions.
Import the job data:

```bash
python manage.py import_jobs "path/to/your/jobs.csv"
```

Start the development server:

```bash
python manage.py runserver
```

Access the application at http://127.0.0.1:8000/.
Bulk-delete jobs within a specified date range:

```bash
python manage.py remove_jobs "all" "2024-01-01" "2024-12-31"
```

Create an admin account:

```bash
python manage.py createsuperuser
```

## Project Structure

```
dsci551_group48/
├── django_project/           # Main Django project
│   ├── django_project/       # Project settings and configuration
│   ├── blog/                 # Job posting application
│   │   ├── management/       # Custom Django commands
│   │   ├── templates/        # HTML templates
│   │   ├── static/           # CSS, JS, and static files
│   │   └── models.py         # Job model and database logic
│   ├── users/                # User authentication app
│   └── manage.py             # Django management script
├── linkedin_scrape.py        # LinkedIn data scraper
├── jobs                      # SQLite database file
└── media/                    # User-uploaded files
```
Key components:

- **Hash Function Implementation**: Database partitioning logic
- **LinkedIn Scraper**: Automated data collection using Selenium
- **Job Model**: Database schema and business logic
- **Django Settings**: Project configuration
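As an illustration of the scraper's parsing step only: the project uses Selenium with BeautifulSoup, but the sketch below swaps in Python's stdlib `HTMLParser` so it runs standalone, and the `job-title` class name is an assumption about the page markup:

```python
from html.parser import HTMLParser

# Illustrative sketch of extracting job titles from scraped HTML.
# The real scraper uses Selenium + BeautifulSoup; this uses the
# stdlib parser, and the 'job-title' class name is assumed.
class JobCardParser(HTMLParser):
    """Collect the text of elements whose class contains 'job-title'."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get('class') or ''
        if 'job-title' in classes:
            self._in_title = True

    def handle_endtag(self, tag):
        self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())
```

In the actual scraper, Selenium renders the JavaScript-heavy LinkedIn page first and the rendered HTML is then handed to the parsing layer.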
## API Endpoints

### Job Management

- `GET /` - Home page with job listings
- `GET /job/<int:pk>/` - Job detail view
- `GET /job/new/` - Create new job posting
- `POST /job/<int:pk>/update/` - Update job posting
- `POST /job/<int:pk>/delete/` - Delete job posting

### User Management

- `GET /register/` - User registration
- `GET /login/` - User login
- `GET /profile/` - User profile
- `GET /logout/` - User logout

### Search and Favorites

- `GET /search/` - Job search interface
- `GET /favorites/` - User's favorite jobs
- `GET /user/<str:username>/` - User's job postings
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is part of the DSCI551 course at USC. All rights reserved.
## Team

- David Tovmasyan - Backend Development & Database Architecture
- Jinyang Du - Frontend Development & UI/UX Design
- Wenjing Huang - Data Scraping & System Integration
For questions or support, please contact the development team or create an issue in the repository.
Built with ❤️ for DSCI551 Database Systems