This project demonstrates an automated human scream detection system built with machine learning and deep learning techniques. Scream detection plays a crucial role in public safety, surveillance, and emergency response systems. Audio recordings are processed, converted into meaningful feature representations (MFCCs), and classified to determine whether a scream is present. I developed this system as my final-year project, focusing on real-time threat detection and audio-based crime prevention.
- Project Overview
- Why I Chose This Project
- Problem This Project Solves
- Dataset
- Flow of the Project
- Files in This Repository
- Tech Stack Used and Why
- Usage Instructions
- Results and Insights
- Author
- Contact
The objective of this project is to automatically detect human screams from audio samples. Using a deep learning classification model, the system identifies whether a given audio segment contains a scream or not. The project includes steps like audio preprocessing, MFCC feature extraction, model development, evaluation, and prediction. This pipeline enables efficient and accurate acoustic-based threat detection.
Human scream detection is crucial for applications such as public safety monitoring, surveillance, emergency response, and crime prevention. I chose this project because it involves real-world problem solving using deep learning techniques on audio data. Working on this project helped me understand audio signal processing, feature extraction with MFCC, and building classification models. It also strengthened my ability to develop systems that can turn raw audio data into actionable insights.
In emergency or high-risk situations, recognizing human screams quickly is critical. Manual monitoring of audio feeds is inefficient and prone to errors. This project provides an automated solution to detect screams, enabling faster responses to potential emergencies, improving public safety, and assisting law enforcement agencies in real-time threat detection.
The dataset used consists of labeled audio recordings, categorized as scream and non-scream. It contains a variety of environmental sounds, human vocal expressions, and background noise to make the model robust across different scenarios.
Dataset Link: Download Human Scream Detection Dataset from Kaggle
For feature extraction, each audio file is preprocessed (trimmed, resampled, converted to mono) and then transformed into MFCC (Mel-Frequency Cepstral Coefficients) features. These features serve as input to the deep learning classification model.
The workflow of this project is designed to transform raw audio into meaningful predictions. The steps include:
1. Load Dataset
Audio files are loaded from scream and non-scream folders.
2. Audio Preprocessing
- Noise reduction
- Resampling to a fixed rate
- Conversion to mono
- Normalization
3. Feature Extraction (MFCC)
MFCCs are extracted from each audio file to convert waveform signals into frequency-based feature representations.
4. Splitting the Dataset
The dataset is divided into training and testing subsets to evaluate model performance.
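A sketch of the split using scikit-learn. The 80/20 ratio and stratification are illustrative assumptions; the random feature matrix merely stands in for the real MFCC features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in feature matrix (e.g., 200 clips x 40 averaged MFCCs) and binary labels
X = np.random.rand(200, 40)
y = np.random.randint(0, 2, size=200)

# A stratified 80/20 split keeps the scream/non-scream ratio in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```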
5. Model Building
Machine learning models such as SVM and Random Forest are trained as baselines, and a deep learning CNN classifier is built using layers such as:
- Dense layers
- Dropout for regularization
- Softmax/Sigmoid output layer (depending on binary/multi-class setup)
6. Model Training
The model is trained on MFCC feature vectors for optimal classification.
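A hedged sketch of steps 5–6 in Keras. The layer sizes, 1-D convolution settings, input shape (44 MFCC frames × 40 coefficients), and epoch count below are illustrative assumptions, not the notebook's exact configuration; the random arrays stand in for real training data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in training data: 200 clips, each a (44 frames x 40 MFCCs) matrix
X_train = np.random.rand(200, 44, 40).astype("float32")
y_train = np.random.randint(0, 2, size=200)

model = keras.Sequential([
    layers.Input(shape=(44, 40)),
    layers.Conv1D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),                    # regularization
    layers.Dense(1, activation="sigmoid"),  # binary output: scream probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=2, batch_size=32, verbose=0)
```

A sigmoid output with binary cross-entropy fits the two-class (scream vs. non-scream) setup; a softmax output would be used instead for a multi-class variant.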
7. Evaluation
Accuracy, loss curves, and confusion matrix are generated to measure model performance.
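The evaluation metrics can be computed with scikit-learn along these lines; the hard-coded labels and predictions below are a toy stand-in for the model's real test-set output.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Stand-in labels and predictions for 10 test clips (1 = scream)
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 1])

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)   # rows: true class, cols: predicted class
print(f"Accuracy: {acc:.2f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=["non-scream", "scream"]))
```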
8. Prediction
The trained model predicts whether new audio samples contain a scream.
- scream_detection.ipynb – Jupyter Notebook containing the complete implementation
- IEEE Format Paper/ – project documentation in IEEE paper format
- Project Report/ – elaborated project report
- requirements.txt – list of dependencies for the project
- README.md – project documentation
- Python: Core programming language for audio analysis and model development
- Librosa: Audio processing and MFCC feature extraction
- NumPy: Numerical computations
- Pandas: Data handling and metadata organization
- Matplotlib & Seaborn: Visualization of waveforms, spectrograms, and metrics
- TensorFlow / Keras: Deep learning model creation and training
- Scikit-learn: Dataset splitting, evaluation metrics, and confusion matrix
These tools provide a complete ecosystem for audio processing, feature extraction, deep learning, and evaluation.
1. Clone the repository
git clone https://github.com/JAHNAVIDINGARI/Human-Scream-Detection-for-controlling-Crime-Rate.git
2. Navigate to the project directory
cd Human-Scream-Detection-for-controlling-Crime-Rate
3. Install dependencies
pip install -r requirements.txt
4. Run the notebook
Open scream_detection.ipynb, update dataset path if required, and execute all cells to train and evaluate the model.
The deep learning model trained on the Human Scream Detection dataset achieved good performance in classifying scream and non-scream audio clips. The model was able to learn distinguishing features from the audio signals after preprocessing and MFCC feature extraction.
Key results from the model include:
- The CNN model outperformed the classical baselines (SVM and Random Forest), achieving an accuracy of approximately 95%.
- The loss curve showed steady convergence during training, indicating stable learning without signs of overfitting.
- Precision, recall, and F1-score indicate that the model reliably detects human screams while minimizing false positives.
- The confusion matrix shows clear separation between the scream and non-scream classes, demonstrating that the model can identify screams even in noisy environments.
These results confirm that the approach is robust for real-world applications such as public safety monitoring, emergency detection systems, and surveillance. By deploying this model, authorities can detect distress signals in real-time and respond promptly.
Jahnavi Dingari
Sandeep Pulata
Sountharrajan S
For queries, collaboration, or further discussion regarding this project, please reach out via LinkedIn or email:
LinkedIn: https://www.linkedin.com/in/jahnavi-dingari
Email: jahnavidingari04@gmail.com