SRIJA DE CHOWDHURY edited this page Jan 4, 2026 · 1 revision

❓ Frequently Asked Questions (FAQ)

Quick Answers to Common Questions



πŸ“‘ Table of Contents

General Questions

Technical Questions

Performance Questions

Implementation Questions

Learning Questions

Best Practices

🎯 General Questions

What is Logistic Regression?

Answer:

Logistic Regression is a classification algorithm (not regression!) used for binary classification problems. It predicts the probability that an instance belongs to a particular class.

Key Points:

  • βœ… Outputs probabilities between 0 and 1
  • βœ… Uses sigmoid activation function
  • βœ… Optimized with gradient descent
  • βœ… Simple, interpretable, and effective

Example Use Cases:

  • Email: Spam or Not Spam
  • Medical: Disease or No Disease
  • Finance: Fraud or Legitimate
  • Marketing: Click or No Click
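The key points above fit in a few lines of NumPy; `sigmoid` here is a minimal illustrative helper, not this repo's exact implementation:

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Raw linear scores can be any real number...
z = np.array([-5.0, 0.0, 5.0])

# ...but the sigmoid maps them to probabilities between 0 and 1
probs = sigmoid(z)
print(probs)                         # roughly [0.0067, 0.5, 0.9933]
print((probs >= 0.5).astype(int))    # predicted classes: [0, 1, 1]
```

Thresholding the probability at 0.5 is what turns the probabilistic output into a hard class label.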

When should I use Logistic Regression?

Answer:

Use Logistic Regression when:

βœ… Good For

  • Binary classification (2 classes)
  • Linearly separable data
  • Need probability estimates
  • Want interpretable model
  • Small to medium datasets
  • Baseline model comparison
  • Real-time predictions

❌ Not Ideal For

  • Multi-class without modification
  • Non-linear decision boundaries
  • Complex feature interactions
  • Very large datasets
  • Image/text classification (use deep learning)
  • When maximum predictive accuracy matters more than interpretability (more flexible models usually win)

What's the difference between Linear and Logistic Regression?

Answer:

| Aspect | Linear Regression | Logistic Regression |
|---|---|---|
| Type | Regression | Classification |
| Output | Continuous values (-∞ to +∞) | Probabilities (0 to 1) |
| Activation | None (identity) | Sigmoid function |
| Cost Function | Mean Squared Error | Binary Cross-Entropy |
| Use Case | Predict house prices | Predict spam/not spam |
| Example Output | 250,000 (price in $) | 0.85 (85% spam) |

Visual Difference:

Linear Regression              Logistic Regression
      y                              y
      β”‚    /                         β”‚        β”Œβ”€β”€β”€β”€
      β”‚   /                          β”‚       /
      β”‚  /                           β”‚      /
      β”‚ /                            β”‚     /
      └────────── x                  └────────── x
   (Continuous line)            (S-shaped curve)

Is this implementation suitable for production?

Answer:

**For Learning: YES!** βœ…
**For Production: Use scikit-learn** ⚠️

Why this implementation:

  • βœ… Learn algorithm internals
  • βœ… Understand mathematics
  • βœ… Educational purposes
  • βœ… Small projects/prototypes

For production, use sklearn:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

Reasons:

  • ⚑ Highly optimized (C/Cython)
  • πŸ›‘οΈ Battle-tested and robust
  • πŸ“š Well-documented
  • πŸ”§ More features (multiclass, regularization, etc.)
  • πŸ› Bug fixes and maintenance

πŸ”§ Technical Questions

How do I choose the learning rate?

Answer:

Rule of Thumb: Start with 0.01 and adjust based on results.

Method 1: Trial and Error

learning_rates = [0.001, 0.01, 0.1, 1.0]

for lr in learning_rates: 
    model = LogisticRegression(learning_rate=lr, n_iterations=1000)
    model.fit(X_train, y_train)
    
    print(f"LR = {lr}:")
    print(f"  Final cost: {model.cost_history[-1]:.4f}")
    print(f"  Test accuracy: {model.score(X_test, y_test):.4f}\n")

Method 2: Visual Inspection

for lr in [0.001, 0.01, 0.1, 1.0]:
    model = LogisticRegression(learning_rate=lr, n_iterations=200)
    model.fit(X_train, y_train)
    plt.plot(model.cost_history, label=f'LR={lr}')

plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Finding Optimal Learning Rate')
plt.legend()
plt.show()

Guidelines:

| Learning Rate | Behavior | Recommendation |
|---|---|---|
| < 0.0001 | Very slow convergence | ❌ Too slow |
| 0.001 - 0.01 | Smooth, steady decrease | βœ… Good default |
| 0.1 - 0.5 | Fast convergence | ⚠️ Monitor carefully |
| > 1.0 | Oscillation or divergence | ❌ Too high |

How many iterations do I need?

Answer:

Typical Range: 500 - 2000 iterations

Method 1: Plot Cost History

model = LogisticRegression(learning_rate=0.01, n_iterations=2000, verbose=True)
model.fit(X_train, y_train)

plt.plot(model.cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost History - Check Convergence')
plt.grid(True, alpha=0.3)
plt.show()

Look for:

  • βœ… Cost plateaus (converged) β†’ Can stop
  • ⚠️ Still decreasing β†’ Need more iterations
  • ❌ Oscillating β†’ Reduce learning rate

Method 2: Early Stopping

class LogisticRegressionEarlyStopping(LogisticRegression):
    def fit(self, X, y, patience=50, min_delta=1e-4):
        # ... gradient descent loop, appending each cost to self.cost_history ...

        # After each iteration i, check whether the cost has plateaued:
        if i > patience:
            recent_costs = self.cost_history[-patience:]
            if max(recent_costs) - min(recent_costs) < min_delta:
                print(f"Early stopping at iteration {i}")
                break

Guidelines:

  • Small dataset (< 1000 samples): 500-1000 iterations
  • Medium dataset (1000-10000): 1000-2000 iterations
  • Large dataset: Use mini-batch + early stopping
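The mini-batch option in the last bullet can be sketched as follows; `fit_minibatch` and `sigmoid` are illustrative helpers for this sketch, not part of this repo's class:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_minibatch(X, y, lr=0.1, n_epochs=50, batch_size=32, seed=42):
    """Mini-batch gradient descent for binary logistic regression."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)            # reshuffle each epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            error = sigmoid(Xb @ w + b) - yb          # Ε· - y on the batch
            w -= lr * (Xb.T @ error) / len(idx)       # gradient step per batch
            b -= lr * error.mean()
    return w, b

# Toy data: class is determined by the sign of the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

w, b = fit_minibatch(X, y)
acc = ((sigmoid(X @ w + b) >= 0.5).astype(int) == y).mean()
print(f"Train accuracy: {acc:.2f}")
```

Each epoch touches every sample but updates the weights many times, which is why mini-batches scale better to large datasets than full-batch gradient descent.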

Why do I need to scale features?

Answer:

Without Scaling:

Feature 1 (Age):      20 - 80
Feature 2 (Income): 20,000 - 200,000

β†’ Income dominates gradient updates!

With Scaling:

Feature 1 (Age):    -1.5 to 1.5
Feature 2 (Income): -1.5 to 1.5

β†’ Equal contribution to learning!

Visual Impact:

Without Scaling              With Scaling
                            
Cost Function Contours:    

      Income                    Feature 2
        β”‚                          β”‚
     β”‚β”‚β”‚β”‚β”‚                       ───────
     β”‚β”‚β”‚β”‚β”‚                       ───────
     β”‚β”‚β”‚β”‚β”‚                       ───────
        └──── Age                └──── Feature 1

 (Elongated ellipse)         (Circular)
  Slow convergence          Fast convergence

Code Example:

from sklearn.preprocessing import StandardScaler

# Before scaling
model = LogisticRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X_train, y_train)  # May not converge! 

# After scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X_train_scaled, y_train)  # Converges smoothly! 

Bottom Line: Always scale features for faster, more stable training! βœ…


What's the difference between fit_transform and transform?

Answer:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# fit_transform: Learn parameters AND transform
X_train_scaled = scaler.fit_transform(X_train)
# This computes mean and std from X_train, then scales it

# transform: Only transform (use learned parameters)
X_test_scaled = scaler.transform(X_test)
# This uses the mean and std from X_train to scale X_test

CRITICAL RULE:

βœ… CORRECT

# Fit on TRAIN only
scaler.fit(X_train)

# Transform both
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

❌ WRONG

# DON'T fit on test! 
X_test_scaled = scaler.fit_transform(X_test)

# This causes data leakage!

**Why?** Test data should simulate "unseen" data. If you fit on test data, you're "cheating"!


How do I handle imbalanced datasets?

Answer:

Problem: 95% class 0, 5% class 1 β†’ Model predicts all class 0 and gets 95% accuracy!
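That accuracy trap is easy to reproduce; a tiny sketch with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95% class 0, 5% class 1
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

print("Accuracy:", accuracy_score(y_true, y_pred))                 # 0.95 -- looks great!
print("Recall:", recall_score(y_true, y_pred, zero_division=0))    # 0.0 -- catches no positives
```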

Solutions:

1. Class Weights

model = LogisticRegressionWeighted(
    learning_rate=0.01,
    n_iterations=1000,
    class_weight='balanced'
)

2. Resampling

from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

print("Original:", np.bincount(y_train))
print("Resampled:", np.bincount(y_resampled))

3. Adjust Threshold

# Instead of default 0.5
probabilities = model.predict_proba(X_test)
predictions = (probabilities >= 0.3).astype(int)  # Lower threshold

4. Use Different Metrics

from sklearn.metrics import f1_score, precision_score, recall_score

# Don't rely on accuracy alone!
print("F1 Score:", f1_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))

What is regularization and when should I use it?

Answer:

Regularization adds a penalty to prevent overfitting by discouraging large weights.

When to Use:

| Symptom | Solution |
|---|---|
| Train accuracy >> Test accuracy | βœ… Add L2 regularization |
| Model too complex | βœ… Add L2 regularization |
| Many features, few samples | βœ… Add L2 regularization |
| Want feature selection | βœ… Add L1 regularization |

L2 Example:

model = LogisticRegressionL2(
    learning_rate=0.01,
    n_iterations=1000,
    lambda_reg=0.1  # Start here, tune between 0.001 and 10
)
model.fit(X_train_scaled, y_train)

How to Choose Ξ» (lambda):

lambdas = [0.001, 0.01, 0.1, 1, 10]
best_lambda = None
best_score = 0

for lam in lambdas:
    model = LogisticRegressionL2(learning_rate=0.01, n_iterations=1000, lambda_reg=lam)
    model.fit(X_train_scaled, y_train)
    score = model.score(X_val_scaled, y_val)
    
    if score > best_score: 
        best_score = score
        best_lambda = lam

print(f"Best lambda: {best_lambda}")

Can Logistic Regression handle multi-class classification?

Answer:

**Yes, with modifications!**

Method 1: One-vs-Rest (OvR)

# Train 3 binary classifiers for 3 classes
# Class 0 vs (1,2)
# Class 1 vs (0,2)
# Class 2 vs (0,1)

# Predict using highest probability

Method 2: Use Scikit-Learn

from sklearn.linear_model import LogisticRegression

# Automatically handles multi-class
model = LogisticRegression(multi_class='ovr')  # or 'multinomial'
model.fit(X_train, y_train)

This Repository:

  • βœ… Focuses on binary classification
  • βœ… Educational implementation
  • ⚠️ For multi-class, use sklearn

πŸ“Š Performance Questions

What accuracy should I expect?

Answer:

**It depends on the dataset!**

Baselines:

  • Random guessing (balanced): 50%
  • Random guessing (90% class 0): 90% (but useless!)
  • Majority class baseline: Predict most common class

Typical Performance:

  • βœ… Good model: 75-90% accuracy
  • 🌟 Excellent model: 90-95% accuracy
  • ⚠️ > 99%: Check for data leakage or very easy problem

Better Metrics for Classification:

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

Focus on:

  • Precision: Of predicted positives, how many are correct?
  • Recall: Of actual positives, how many did we catch?
  • F1-Score: Harmonic mean of precision and recall
  • ROC-AUC: Overall performance across thresholds
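A quick way to compute those metrics end-to-end; `make_classification` is just a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]   # ROC-AUC needs probabilities, not labels

print("F1:", round(f1_score(y_test, y_pred), 3))
print("ROC-AUC:", round(roc_auc_score(y_test, y_proba), 3))
```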

My model has 99% accuracy but doesn't work. Why?

Answer:

Likely Causes:

1. Data Leakage ⚠️

# WRONG: Fit scaler on all data
scaler = StandardScaler()
X_all_scaled = scaler.fit_transform(X)  # Includes test data!
X_train, X_test = train_test_split(X_all_scaled, ...)

# CORRECT: Fit only on training
X_train, X_test = train_test_split(X, ...)
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

2. Target Leakage

# Features that shouldn't be available at prediction time
# Example: Using "purchase_amount" to predict "will_purchase"

3. Class Imbalance

# 99% class 0, 1% class 1
# Model predicts all class 0 β†’ 99% accuracy but useless! 

# Check: 
print(np.bincount(y_test))
print(np.bincount(y_pred))

4. Training on Test Data

# WRONG
model.fit(X_test, y_test)
accuracy = model.score(X_test, y_test)  # Of course it's high! 

How to Detect:

  • Check if test accuracy >> typical for problem
  • Look at confusion matrix
  • Verify data pipeline
  • Check feature engineering

How do I improve model performance?

Answer:

Checklist to Improve Performance:

1. Data Quality 🧹

# Remove duplicates
df = df.drop_duplicates()

# Handle missing values
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_filled = imputer.fit_transform(X)

# Remove outliers
from scipy import stats
z_scores = np.abs(stats.zscore(X))
X_clean = X[(z_scores < 3).all(axis=1)]

2. Feature Engineering πŸ”§

# Add polynomial features
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Feature interactions
X['age_income'] = X['age'] * X['income']

# Domain-specific features

3. Hyperparameter Tuning βš™οΈ

# Grid search
param_grid = {
    'learning_rate': [0.001, 0.01, 0.1],
    'n_iterations': [500, 1000, 2000],
    'lambda_reg': [0.01, 0.1, 1.0]
}

best_score = 0
for lr in param_grid['learning_rate']: 
    for iters in param_grid['n_iterations']:
        for lam in param_grid['lambda_reg']: 
            model = LogisticRegressionL2(lr, iters, lam)
            model.fit(X_train, y_train)
            score = model.score(X_val, y_val)
            
            if score > best_score: 
                best_score = score
                best_params = {'lr': lr, 'iters': iters, 'lambda': lam}

4. Handle Class Imbalance βš–οΈ

# Use class weights
model = LogisticRegressionWeighted(class_weight='balanced')

# Or resample
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X, y)

5. Ensemble Methods 🎯

# Train multiple models and vote
models = [model1, model2, model3]
predictions = [m.predict(X_test) for m in models]
final_pred = np.round(np.mean(predictions, axis=0))

6. Get More Data πŸ“Š

  • More samples β†’ Better generalization
  • Data augmentation (for images/text)

πŸ› οΈ Implementation Questions

Why use NumPy instead of pure Python?

Answer:

**Speed!** NumPy is 10-100x faster.

Comparison:

import numpy as np
import time

# Pure Python (slow ❌)
def python_dot(X, weights):
    result = []
    for i in range(len(X)):
        total = 0
        for j in range(len(weights)):
            total += X[i][j] * weights[j]
        result.append(total)
    return result

# NumPy (fast βœ…)
def numpy_dot(X, weights):
    return np.dot(X, weights)

# Test
X = np.random.rand(10000, 50)
weights = np.random.rand(50)

# Python
start = time.time()
python_dot(X.tolist(), weights.tolist())
python_time = time.time() - start

# NumPy
start = time.time()
numpy_dot(X, weights)
numpy_time = time.time() - start

print(f"Python time: {python_time:.4f}s")
print(f"NumPy time: {numpy_time:.4f}s")
print(f"Speedup: {python_time / numpy_time:.1f}x")

Example Output (timings vary by machine):

Python time: 0.8234s
NumPy time: 0.0012s
Speedup: 686.2x

**Why so fast?**

  • βœ… Written in C
  • βœ… Vectorized operations
  • βœ… Optimized memory access
  • βœ… SIMD instructions

Can I save my trained model?

Answer:

Yes! Use pickle.

import pickle

# Save model
with open('logistic_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Also save scaler! 
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

print("βœ… Model saved!")

# Load model
with open('logistic_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

with open('scaler.pkl', 'rb') as f:
    loaded_scaler = pickle.load(f)

# Use loaded model
X_new_scaled = loaded_scaler.transform(X_new)
predictions = loaded_model.predict(X_new_scaled)

print("βœ… Model loaded and used!")

Complete Save/Load Function:

def save_model(model, scaler, filename='model.pkl'):
    """Save model and scaler together"""
    model_data = {
        'model': model,
        'scaler': scaler,
        'weights': model.weights,
        'bias': model.bias,
        'learning_rate': model.learning_rate,
        'n_iterations': model.n_iterations
    }
    
    with open(filename, 'wb') as f:
        pickle.dump(model_data, f)
    
    print(f"βœ… Model saved to {filename}")

def load_model(filename='model.pkl'):
    """Load model and scaler"""
    with open(filename, 'rb') as f:
        model_data = pickle.load(f)
    
    print(f"βœ… Model loaded from {filename}")
    return model_data['model'], model_data['scaler']

# Usage
save_model(model, scaler, 'my_model.pkl')
model, scaler = load_model('my_model.pkl')

How do I use this model in a web app?

Answer:

Example Flask App:

from flask import Flask, request, jsonify
import pickle
import numpy as np

app = Flask(__name__)

# Load model at startup
with open('model.pkl', 'rb') as f:
    model_data = pickle.load(f)
    model = model_data['model']
    scaler = model_data['scaler']

@app.route('/predict', methods=['POST'])
def predict():
    """
    Endpoint for predictions
    
    Input JSON:
    {
        "features": [1.5, 2.3, 0.8, ...]
    }
    """
    try:
        # Get data from request
        data = request.get_json()
        features = np.array(data['features']).reshape(1, -1)
        
        # Scale features
        features_scaled = scaler.transform(features)
        
        # Make prediction
        prediction = model.predict(features_scaled)[0]
        probability = model.predict_proba(features_scaled)[0]
        
        # Return result
        return jsonify({
            'prediction': int(prediction),
            'probability': float(probability),
            'status': 'success'
        })
    
    except Exception as e:
        return jsonify({
            'status': 'error',
            'message': str(e)
        }), 400

if __name__ == '__main__':
    app.run(debug=True)

Test the API:

import requests

response = requests.post('http://localhost:5000/predict',
                        json={'features': [1.5, 2.3, 0.8, 1.2]})

print(response.json())
# Output: {'prediction': 1, 'probability': 0.8765, 'status': 'success'}

πŸŽ“ Learning Questions

I'm new to machine learning. Where should I start?

Answer:

Learning Path:

1. Prerequisites πŸ“š

  • βœ… Python basics
  • βœ… NumPy fundamentals
  • βœ… Basic linear algebra (vectors, matrices)
  • βœ… Basic calculus (derivatives)

2. Start Here πŸš€

  1. Read Getting Started
  2. Understand Mathematical Foundation
  3. Study Implementation Guide
  4. Practice with notebooks

3. Resources πŸ“–

  • Andrew Ng's Machine Learning Course (Coursera)
  • "Introduction to Statistical Learning" (free book)
  • GeeksforGeeks tutorials
  • This repository's wiki!

4. Practice Projects πŸ’ͺ

  • Iris dataset classification
  • Titanic survival prediction
  • Credit card fraud detection
  • Customer churn prediction

What math do I need to know?

Answer:

Essential Math:

1. Linear Algebra πŸ“

# Dot product
z = w₁x₁ + wβ‚‚xβ‚‚ + ... + wβ‚™xβ‚™ + b

# Matrix form
z = Xw + b

# NumPy: 
z = np.dot(X, weights) + bias

2. Calculus πŸ“Š

# Derivative of sigmoid
Οƒ'(z) = Οƒ(z)(1 - Οƒ(z))

# Gradient (partial derivatives)
βˆ‚J/βˆ‚w = (1/m) * X^T * (Ε· - y)

3. Probability 🎲

# Sigmoid outputs probability
P(y=1|x) = Οƒ(w^T x + b)
P(y=0|x) = 1 - P(y=1|x)

4. Logarithms πŸ“‰

# Used in cost function
cost = -[y*log(Ε·) + (1-y)*log(1-Ε·)]
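The four pieces above combine into one vectorized training step; a minimal sketch with illustrative toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy batch: m=4 samples, n=2 features (illustrative values)
X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.5]])
y = np.array([1, 0, 1, 1])
w, b = np.zeros(2), 0.0
m = len(y)

y_hat = sigmoid(X @ w + b)                  # P(y=1|x) for every sample (probability)
cost = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # binary cross-entropy (logs)
grad_w = X.T @ (y_hat - y) / m              # βˆ‚J/βˆ‚w = (1/m) Xα΅€ (Ε· - y)  (calculus + linear algebra)
grad_b = np.mean(y_hat - y)                 # βˆ‚J/βˆ‚b

w -= 0.1 * grad_w                           # one gradient descent step
b -= 0.1 * grad_b
print(f"Cost before step: {cost:.4f}")      # log(2) β‰ˆ 0.6931 when all weights start at zero
```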

**Don't worry!** You can still use the implementation and learn math gradually.


How is this different from sklearn?

Answer:

| Aspect | This Repo | Scikit-Learn |
|---|---|---|
| Purpose | Learning & understanding | Production use |
| Speed | Slower (pure Python/NumPy) | Faster (C/Cython) |
| Features | Basic implementation | Full-featured |
| Customization | Easy to modify | Harder to modify |
| Documentation | Educational | Production-focused |
| Use For | Learning, teaching, prototypes | Real applications |

When to use each:

Use This Repo:

  • πŸ“š Learning how algorithms work
  • πŸŽ“ Teaching machine learning
  • πŸ”¬ Experimenting with modifications
  • πŸš€ Quick prototypes

Use Scikit-Learn:

  • 🏭 Production systems
  • ⚑ Performance-critical apps
  • πŸ›‘οΈ Need reliability
  • πŸ“Š Complex ML pipelines

πŸ’‘ Best Practices

What are the most common mistakes?

Answer:

Top 10 Mistakes:

1. Not Scaling Features ❌

# WRONG
model.fit(X_train, y_train)

# RIGHT
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
model.fit(X_train_scaled, y_train)

2. Fitting Scaler on Test Data ❌

# WRONG
X_test_scaled = scaler.fit_transform(X_test)

# RIGHT
X_test_scaled = scaler.transform(X_test)

3. Using Wrong Metrics ❌

# WRONG: Only accuracy for imbalanced data
print("Accuracy:", accuracy_score(y_test, y_pred))

# RIGHT: Multiple metrics
print("F1:", f1_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))

4. Ignoring Data Leakage ❌

# WRONG: Scale before split
X_scaled = scaler.fit_transform(X)
X_train, X_test = train_test_split(X_scaled, y)

# RIGHT: Split then scale
X_train, X_test = train_test_split(X, y)
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

5. Not Setting Random Seed ❌

# WRONG: Results not reproducible
X_train, X_test = train_test_split(X, y)

# RIGHT: Reproducible results
X_train, X_test = train_test_split(X, y, random_state=42)

6. Learning Rate Too High ❌

# WRONG: Cost explodes
model = LogisticRegression(learning_rate=10.0)

# RIGHT
model = LogisticRegression(learning_rate=0.01)

7. Not Checking for NaN ❌

# WRONG: Train with NaN values
model.fit(X_train, y_train)  # May have NaN!

# RIGHT
assert not np.isnan(X_train).any(), "Data has NaN!"
assert not np.isnan(y_train).any(), "Labels have NaN!"

8. Testing on Training Data ❌

# WRONG
model.fit(X_train, y_train)
accuracy = model.score(X_train, y_train)  # Overly optimistic!

# RIGHT
accuracy = model.score(X_test, y_test)

9. Ignoring Class Imbalance ❌

# WRONG: Ignore 95-5 split
model.fit(X_train, y_train)

# RIGHT: Use class weights
model = LogisticRegressionWeighted(class_weight='balanced')

10. Not Monitoring Training ❌

# WRONG: Train blindly
model.fit(X_train, y_train)

# RIGHT: Check convergence
model.plot_cost_history()

πŸ†˜ Still Have Questions?

Where can I get more help?

Answer:

πŸ“š Documentation

Read the Wiki Pages

πŸ› Issues

Open a GitHub Issue

πŸ’¬ Community

Ask on Stack Overflow


πŸŽ‰ Happy Learning!

← Troubleshooting | Back to Home
