security & IOT Questions analysis from stack overflow #347

Closed
13,404 changes: 13,404 additions & 0 deletions stack_overflow_security_questions_analysis/IoT-Security-Dataset.csv

Large diffs are not rendered by default.

48 changes: 48 additions & 0 deletions stack_overflow_security_questions_analysis/app.py
@@ -0,0 +1,48 @@
import streamlit as st
import pandas as pd
import joblib
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the dataset
df = pd.read_csv('IoT-Security-Dataset.csv')

# Load the saved Random Forest model
rf_model_loaded = joblib.load('random_forest_model.pkl')

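# Fit a TF-IDF vectorizer on the dataset.
# NOTE: the vocabulary must match the one used when the model was trained;
# saving the fitted vectorizer alongside the model would be more robust.
tfidf_vectorizer = TfidfVectorizer(max_features=5000)
tfidf_vectorizer.fit(df['Cleaned Sentence'])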

# Normalize input text: lowercase, strip punctuation and digits,
# drop isolated single letters, and collapse whitespace.
def preprocess_text(text):
    text = text.lower()
    text = re.sub(r'\W', ' ', text)           # non-word characters -> space
    text = re.sub(r'\d', ' ', text)           # digits -> space
    text = re.sub(r'\s+[a-z]\s+', ' ', text)  # isolated single letters -> space
    text = re.sub(r'\s+', ' ', text).strip()  # collapse runs of whitespace
    return text

# Predict the class label for a question using the given model and vectorizer
def predict_security(question, model, vectorizer):
    clean_question = preprocess_text(question)
    question_tfidf = vectorizer.transform([clean_question])
    prediction = model.predict(question_tfidf)
    return prediction[0]

# Streamlit app
st.title("Security text Predictor")

st.write("Enter your question below to determine if it is related to security.")

user_question = st.text_area("Your Question")

if st.button("Predict"):
    if user_question.strip() != "":
        prediction = predict_security(user_question, rf_model_loaded, tfidf_vectorizer)
        # Label 0 is treated as the security-related class here
        if prediction == 0:
            st.success("This question is security-related.")
        else:
            st.info("This question is not security-related.")
    else:
        st.error("Please enter a question.")
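
To see what `preprocess_text` in `app.py` actually feeds to the vectorizer, the function can be run on its own with only the standard library. The sketch below copies its steps unchanged; the sample question is made up for illustration. It suggests two things worth checking: digits are dropped entirely (so tokens such as `ESP8266` or `TLS 1.2` lose their numeric part), and an input made only of digits or punctuation passes the app's `strip() != ""` check but becomes an empty string after preprocessing.

```python
import re


def preprocess_text(text):
    """Same steps as preprocess_text in app.py."""
    text = text.lower()
    text = re.sub(r'\W', ' ', text)           # non-word characters -> space
    text = re.sub(r'\d', ' ', text)           # digits -> space
    text = re.sub(r'\s+[a-z]\s+', ' ', text)  # isolated single letters -> space
    text = re.sub(r'\s+', ' ', text).strip()  # collapse runs of whitespace
    return text


# Illustrative input, not taken from the dataset
sample = "How do I enable TLS 1.2 on my ESP8266?"
print(preprocess_text(sample))   # how do enable tls on my esp

# Digits and punctuation only: non-empty input, empty output
print(repr(preprocess_text("1234?")))   # ''
```

If the empty-output case matters, one option is to run the emptiness check on the preprocessed text rather than on the raw input.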
42 changes: 42 additions & 0 deletions stack_overflow_security_questions_analysis/readme.md
@@ -0,0 +1,42 @@
# Stack Overflow IoT Security Question Analysis and Predictor

## Models Used

- Logistic Regression
- Random Forest
- Support Vector Machine (SVM)
- Gradient Boosting Machine (GBM)

## Libraries Used

1. **joblib**: For saving and loading the trained model.
2. **plotly**: For interactive (zoomable) and 3D visualizations.
3. **Matplotlib**: For plotting and visualizing the results.
4. **Pandas**: For loading and manipulating the dataset.
5. **NumPy**: For efficient numerical operations.
6. **Streamlit**: For building the web app GUI.

## Download the Model from Drive

https://drive.google.com/file/d/12h_fU5WI3KQvXH_qG7RoIKnteceb2fLw/view?usp=sharing
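
`app.py` loads `random_forest_model.pkl` and `IoT-Security-Dataset.csv` by relative path, so both files need to be in the directory the app is started from. The sketch below is one way to check this before launching; the helper name `missing_files` is hypothetical, and it is an assumption that the file behind the Drive link is saved under the name `app.py` expects.

```python
from pathlib import Path

# File names taken from app.py. The name of the downloaded Drive file is an
# assumption; rename it to random_forest_model.pkl if it differs.
REQUIRED_FILES = ("random_forest_model.pkl", "IoT-Security-Dataset.csv")


def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


missing = missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```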

## How to Use

1. **Clone the Repository**:
```sh
git clone url_to_this_repository
```

2. **Install Dependencies**:
```sh
pip install -r requirements.txt
```

3. **Run the App**:
```sh
streamlit run app.py
```

4. **View Results**: The app predicts whether a Stack Overflow question is security-related.

