This repo contains a Colab-style notebook NN_project.ipynb that builds an Intrusion Detection System (IDS) using classic KDD Cup 1999 network traffic data. It compares a simple baseline (Logistic Regression) with a 1D Convolutional Neural Network (CNN) implemented in TensorFlow/Keras.
## Tech Stack
- Python (notebook)
- Data/ML: pandas, numpy, scikit-learn
- Deep learning: tensorflow/keras
- Visualization: matplotlib, seaborn
## Dataset
- Source: KDD Cup 1999 (the 10% subset is used in the notebook).
- File expected by the notebook: `kddcup.data_10_percent.gz`. You will be prompted in Colab to upload this file.
- Targets: the raw `target` labels are mapped into 5 attack categories via the notebook's `attacks_types` mapping: `normal`, `dos` (denial of service), `probe`, `r2l` (remote to local), `u2r` (user to root).
## Preprocessing
- Load the gzipped CSV with KDD features and a `target` column.
- Map raw attack strings to the 5-category `Attack Type` label.
- Drop the raw `target` column; use `Attack Type` as `Y`.
- Scale features with `MinMaxScaler`.
- Train/test split with `train_test_split(test_size=0.33, random_state=42)`.

Note: the CNN is trained on an input shape of `(30, 1)`, meaning the notebook prepares 30 numeric features and reshapes them into length-30 sequences for the 1D convolutions.
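The preprocessing steps above can be sketched as follows. This is a minimal stand-in, not the notebook's code: it uses a tiny synthetic frame instead of `kddcup.data_10_percent.gz`, and the `attacks_types` dict here is a truncated example of the notebook's full mapping.

```python
# Minimal sketch of the preprocessing pipeline, run on synthetic rows in place
# of the real KDD file. Column names and the mapping below are illustrative.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Truncated stand-in for the notebook's attacks_types mapping.
attacks_types = {"normal.": "normal", "smurf.": "dos", "ipsweep.": "probe"}

df = pd.DataFrame({
    "duration":  [0, 1, 2, 0, 5, 3],
    "src_bytes": [181, 239, 235, 219, 217, 212],
    "target": ["normal.", "smurf.", "ipsweep.", "normal.", "smurf.", "normal."],
})

df["Attack Type"] = df["target"].map(attacks_types)   # map raw labels to categories
df = df.drop(columns=["target"])                      # drop the raw target column

X = MinMaxScaler().fit_transform(df.drop(columns=["Attack Type"]))  # scale to [0, 1]
Y = df["Attack Type"].values
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42)
print(X_train.shape, X_test.shape)
```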
## Baseline: Logistic Regression
- `sklearn.linear_model.LogisticRegression(max_iter=500, random_state=42)`
- Evaluated with accuracy, a classification report, and a confusion matrix.
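The baseline fit/evaluate step looks roughly like this. Random arrays stand in for the preprocessed KDD split; the variable names mirror the notebook's, but the data here is synthetic.

```python
# Baseline sketch: fit LogisticRegression and report the same three metrics
# the notebook uses. The data below is random, not the real KDD split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

rng = np.random.default_rng(0)
X_train, X_test = rng.random((200, 30)), rng.random((100, 30))
Y_train, Y_test = rng.integers(0, 5, 200), rng.integers(0, 5, 100)

clf = LogisticRegression(max_iter=500, random_state=42)
clf.fit(X_train, Y_train)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(Y_test, pred))
print(classification_report(Y_test, pred, zero_division=0))
print(confusion_matrix(Y_test, pred))
```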
## CNN Model

The model defined in the notebook (simplified): three parallel `Conv1D` branches read the same input; two flattened branches are merged with a dense head before the final softmax classifier.

```python
from tensorflow.keras.layers import (Input, Conv1D, MaxPooling1D, Flatten,
                                     Dropout, Dense, Concatenate)
from tensorflow.keras.models import Model

inputs = Input(shape=(30, 1))

# Branch 1: conv -> pool -> flatten -> dropout
y = Conv1D(62, 3, padding="same", activation="relu")(inputs)
y = MaxPooling1D(pool_size=2)(y)
y1 = Dropout(0.5)(Flatten()(y))

# Branch 2: same shape as branch 1
y = Conv1D(62, 3, padding="same", activation="relu")(inputs)
y = MaxPooling1D(pool_size=2)(y)
y2 = Dropout(0.5)(Flatten()(y))

# Branch 3: wider conv, followed by a dense head
y = Conv1D(124, 3, padding="same", activation="relu")(inputs)
y = MaxPooling1D(pool_size=2)(y)
y = Flatten()(y)
y = Dropout(0.5)(y)
y = Dense(256, activation="relu")(y)
y = Dropout(0.5)(y)
y = Dense(5, activation="softmax")(y)

# Merge the dense head with the two flattened branches, then classify
y = Concatenate()([y, y1, y2])
outputs = Dense(5, activation="softmax")(y)

cnn_model = Model(inputs=inputs, outputs=outputs)
```

- Loss: `sparse_categorical_crossentropy`
- Optimizer: `adam`
- Metric: `accuracy`
- Training: `epochs=10`, `batch_size=32`
- Evaluation mirrors the baseline (accuracy, classification report, confusion matrix).
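The compile/train step with those settings can be sketched as below. To keep the sketch fast, it uses a tiny random batch, a reduced single-branch model, and one epoch; the notebook trains the full model on the real split with `epochs=10`.

```python
# Compile/fit sketch with the settings listed above (sparse categorical
# cross-entropy, adam, accuracy). Data and model here are reduced stand-ins.
import numpy as np
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.models import Model

X_train = np.random.rand(64, 30, 1).astype("float32")   # 30 features -> (30, 1)
Y_train = np.random.randint(0, 5, 64)                   # integer labels, so
                                                        # sparse_* loss applies
inputs = Input(shape=(30, 1))
y = Conv1D(62, 3, padding="same", activation="relu")(inputs)
y = MaxPooling1D(2)(y)
y = Flatten()(y)
outputs = Dense(5, activation="softmax")(y)
model = Model(inputs, outputs)                          # stand-in for cnn_model

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam", metrics=["accuracy"])
history = model.fit(X_train, Y_train, epochs=1, batch_size=32, verbose=0)
print(history.history["accuracy"])
```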
## Running in Colab
- Upload `NN_project.ipynb` to Colab.
- Run cells in order. When prompted, upload `kddcup.data_10_percent.gz`.
- The notebook will:
  - preprocess the data,
  - train Logistic Regression and the CNN,
  - print reports and show confusion matrices.
## Running Locally
- Create and activate a virtual environment.
- Install dependencies: `pip install tensorflow scikit-learn pandas numpy matplotlib seaborn`
- Launch Jupyter and open the notebook: `jupyter lab` (or `jupyter notebook`).
- Run cells; place `kddcup.data_10_percent.gz` next to the notebook or adjust the path.
## Repository Structure

```
.
├── NN_project.ipynb   # Main notebook
└── README.md          # You are here
```
## Tips
- If you change the feature set, update the CNN input shape `(30, 1)` accordingly.
- For GPU training in Colab, enable Runtime → Change runtime type → GPU.
## Results
- Classification model: Logistic Regression
- Classification model: CNN
- Feature correlation matrix
- Logistic Regression confusion matrix
- CNN confusion matrix
## Credits
- KDD Cup 1999 Dataset (Intrusion Detection)
- TensorFlow/Keras and scikit-learn documentation