SRIJA DE CHOWDHURY edited this page Jan 4, 2026 · 1 revision

❓ Frequently Asked Questions (FAQ)

Quick Answers to Common Questions



πŸ“‘ Table of Contents

General Questions

Technical Questions

Performance Questions

Implementation Questions

Learning Questions

Best Practices

🎯 General Questions

What is Logistic Regression?

Answer:

Logistic Regression is a classification algorithm (not regression!) used for binary classification problems. It predicts the probability that an instance belongs to a particular class.

Key Points:

  • βœ… Outputs probabilities between 0 and 1
  • βœ… Uses sigmoid activation function
  • βœ… Optimized with gradient descent
  • βœ… Simple, interpretable, and effective

Example Use Cases:

  • Email: Spam or Not Spam
  • Medical: Disease or No Disease
  • Finance: Fraud or Legitimate
  • Marketing: Click or No Click
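The key points above fit in a few lines of NumPy; `sigmoid` here is a minimal illustrative helper, not this repo's exact implementation:

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Raw linear scores can be any real number...
z = np.array([-5.0, 0.0, 5.0])

# ...but the sigmoid maps them to probabilities between 0 and 1
probs = sigmoid(z)
print(probs)                         # roughly [0.0067, 0.5, 0.9933]
print((probs >= 0.5).astype(int))    # predicted classes: [0, 1, 1]
```

Thresholding the probability at 0.5 is what turns the probabilistic output into a hard class label.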

When should I use Logistic Regression?

Answer:

Use Logistic Regression when:

βœ… Good For

  • Binary classification (2 classes)
  • Linearly separable data
  • Need probability estimates
  • Want interpretable model
  • Small to medium datasets
  • Baseline model comparison
  • Real-time predictions

❌ Not Ideal For

  • Multi-class without modification
  • Non-linear decision boundaries
  • Complex feature interactions
  • Very large datasets
  • Image/text classification (use deep learning)
  • When maximum predictive accuracy matters more than interpretability (more flexible models usually win)

What's the difference between Linear and Logistic Regression?

Answer:

| Aspect | Linear Regression | Logistic Regression |
|---|---|---|
| Type | Regression | Classification |
| Output | Continuous values (-∞ to +∞) | Probabilities (0 to 1) |
| Activation | None (identity) | Sigmoid function |
| Cost Function | Mean Squared Error | Binary Cross-Entropy |
| Use Case | Predict house prices | Predict spam/not spam |
| Example Output | 250,000 (price in $) | 0.85 (85% spam) |

Visual Difference:

Linear Regression              Logistic Regression
      y                              y
      β”‚    /                         β”‚        β”Œβ”€β”€β”€β”€
      β”‚   /                          β”‚       /
      β”‚  /                           β”‚      /
      β”‚ /                            β”‚     /
      └────────── x                  └────────── x
   (Continuous line)            (S-shaped curve)

Is this implementation suitable for production?

Answer:

**For Learning: YES!** βœ…
**For Production: Use scikit-learn** ⚠️

Why this implementation:

  • βœ… Learn algorithm internals
  • βœ… Understand mathematics
  • βœ… Educational purposes
  • βœ… Small projects/prototypes

For production, use sklearn:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

Reasons:

  • ⚑ Highly optimized (C/Cython)
  • πŸ›‘οΈ Battle-tested and robust
  • πŸ“š Well-documented
  • πŸ”§ More features (multiclass, regularization, etc.)
  • πŸ› Bug fixes and maintenance

πŸ”§ Technical Questions

How do I choose the learning rate?

Answer:

Rule of Thumb: Start with 0.01 and adjust based on results.

Method 1: Trial and Error

learning_rates = [0.001, 0.01, 0.1, 1.0]

for lr in learning_rates: 
    model = LogisticRegression(learning_rate=lr, n_iterations=1000)
    model.fit(X_train, y_train)
    
    print(f"LR = {lr}:")
    print(f"  Final cost: {model.cost_history[-1]:.4f}")
    print(f"  Test accuracy: {model.score(X_test, y_test):.4f}\n")

Method 2: Visual Inspection

for lr in [0.001, 0.01, 0.1, 1.0]:
    model = LogisticRegression(learning_rate=lr, n_iterations=200)
    model.fit(X_train, y_train)
    plt.plot(model.cost_history, label=f'LR={lr}')

plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Finding Optimal Learning Rate')
plt.legend()
plt.show()

Guidelines:

| Learning Rate | Behavior | Recommendation |
|---|---|---|
| < 0.0001 | Very slow convergence | ❌ Too slow |
| 0.001 - 0.01 | Smooth, steady decrease | βœ… Good default |
| 0.1 - 0.5 | Fast convergence | ⚠️ Monitor carefully |
| > 1.0 | Oscillation or divergence | ❌ Too high |

How many iterations do I need?

Answer:

Typical Range: 500 - 2000 iterations

Method 1: Plot Cost History

model = LogisticRegression(learning_rate=0.01, n_iterations=2000, verbose=True)
model.fit(X_train, y_train)

plt.plot(model.cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost History - Check Convergence')
plt.grid(True, alpha=0.3)
plt.show()

Look for:

  • βœ… Cost plateaus (converged) β†’ Can stop
  • ⚠️ Still decreasing β†’ Need more iterations
  • ❌ Oscillating β†’ Reduce learning rate

Method 2: Early Stopping

class LogisticRegressionEarlyStopping(LogisticRegression):
    def fit(self, X, y, patience=50, min_delta=1e-4):
        # ... gradient descent loop, appending each cost to self.cost_history ...

        # After each iteration i, check whether the cost has plateaued:
        if i > patience:
            recent_costs = self.cost_history[-patience:]
            if max(recent_costs) - min(recent_costs) < min_delta:
                print(f"Early stopping at iteration {i}")
                break

Guidelines:

  • Small dataset (< 1000 samples): 500-1000 iterations
  • Medium dataset (1000-10000): 1000-2000 iterations
  • Large dataset: Use mini-batch + early stopping
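The mini-batch option in the last bullet can be sketched as follows; `fit_minibatch` and `sigmoid` are illustrative helpers for this sketch, not part of this repo's class:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_minibatch(X, y, lr=0.1, n_epochs=50, batch_size=32, seed=42):
    """Mini-batch gradient descent for binary logistic regression."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)            # reshuffle each epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            error = sigmoid(Xb @ w + b) - yb          # Ε· - y on the batch
            w -= lr * (Xb.T @ error) / len(idx)       # gradient step per batch
            b -= lr * error.mean()
    return w, b

# Toy data: class is determined by the sign of the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

w, b = fit_minibatch(X, y)
acc = ((sigmoid(X @ w + b) >= 0.5).astype(int) == y).mean()
print(f"Train accuracy: {acc:.2f}")
```

Each epoch touches every sample but updates the weights many times, which is why mini-batches scale better to large datasets than full-batch gradient descent.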

Why do I need to scale features?

Answer:

Without Scaling:

Feature 1 (Age):      20 - 80
Feature 2 (Income): 20,000 - 200,000

β†’ Income dominates gradient updates!

With Scaling:

Feature 1 (Age):    -1.5 to 1.5
Feature 2 (Income): -1.5 to 1.5

β†’ Equal contribution to learning!

Visual Impact:

Without Scaling              With Scaling
                            
Cost Function Contours:    

      Income                    Feature 2
        β”‚                          β”‚
     β”‚β”‚β”‚β”‚β”‚                       ───────
     β”‚β”‚β”‚β”‚β”‚                       ───────
     β”‚β”‚β”‚β”‚β”‚                       ───────
        └──── Age                └──── Feature 1

 (Elongated ellipse)         (Circular)
  Slow convergence          Fast convergence

Code Example:

from sklearn.preprocessing import StandardScaler

# Before scaling
model = LogisticRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X_train, y_train)  # May not converge! 

# After scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X_train_scaled, y_train)  # Converges smoothly! 

Bottom Line: Always scale features for faster, more stable training! βœ…


What's the difference between fit_transform and transform?

Answer:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# fit_transform: Learn parameters AND transform
X_train_scaled = scaler.fit_transform(X_train)
# This computes mean and std from X_train, then scales it

# transform: Only transform (use learned parameters)
X_test_scaled = scaler.transform(X_test)
# This uses the mean and std from X_train to scale X_test

CRITICAL RULE:

βœ… CORRECT

# Fit on TRAIN only
scaler.fit(X_train)

# Transform both
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

❌ WRONG

# DON'T fit on test! 
X_test_scaled = scaler.fit_transform(X_test)

# This causes data leakage!

**Why?** Test data should simulate "unseen" data. If you fit on test data, you're "cheating"!


How do I handle imbalanced datasets?

Answer:

Problem: 95% class 0, 5% class 1 β†’ Model predicts all class 0 and gets 95% accuracy!
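That accuracy trap is easy to reproduce; a tiny sketch with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95% class 0, 5% class 1
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

print("Accuracy:", accuracy_score(y_true, y_pred))                 # 0.95 -- looks great!
print("Recall:", recall_score(y_true, y_pred, zero_division=0))    # 0.0 -- catches no positives
```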

Solutions:

1. Class Weights

model = LogisticRegressionWeighted(
    learning_rate=0.01,
    n_iterations=1000,
    class_weight='balanced'
)

2. Resampling

from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

print("Original:", np.bincount(y_train))
print("Resampled:", np.bincount(y_resampled))

3. Adjust Threshold

# Instead of default 0.5
probabilities = model.predict_proba(X_test)
predictions = (probabilities >= 0.3).astype(int)  # Lower threshold

4. Use Different Metrics

from sklearn.metrics import f1_score, precision_score, recall_score

# Don't rely on accuracy alone!
print("F1 Score:", f1_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))

What is regularization and when should I use it?

Answer:

Regularization adds a penalty to prevent overfitting by discouraging large weights.

When to Use:

| Symptom | Solution |
|---|---|
| Train accuracy >> Test accuracy | βœ… Add L2 regularization |
| Model too complex | βœ… Add L2 regularization |
| Many features, few samples | βœ… Add L2 regularization |
| Want feature selection | βœ… Add L1 regularization |

L2 Example:

model = LogisticRegressionL2(
    learning_rate=0.01,
    n_iterations=1000,
    lambda_reg=0.1  # Start here, tune between 0.001 and 10
)
model.fit(X_train_scaled, y_train)

How to Choose Ξ» (lambda):

lambdas = [0.001, 0.01, 0.1, 1, 10]
best_lambda = None
best_score = 0

for lam in lambdas:
    model = LogisticRegressionL2(learning_rate=0.01, n_iterations=1000, lambda_reg=lam)
    model.fit(X_train_scaled, y_train)
    score = model.score(X_val_scaled, y_val)
    
    if score > best_score: 
        best_score = score
        best_lambda = lam

print(f"Best lambda: {best_lambda}")

Can Logistic Regression handle multi-class classification?

Answer:

**Yes, with modifications!**

Method 1: One-vs-Rest (OvR)

# Train 3 binary classifiers for 3 classes
# Class 0 vs (1,2)
# Class 1 vs (0,2)
# Class 2 vs (0,1)

# Predict using highest probability

Method 2: Use Scikit-Learn

from sklearn.linear_model import LogisticRegression

# Automatically handles multi-class
model = LogisticRegression(multi_class='ovr')  # or 'multinomial'
model.fit(X_train, y_train)

This Repository:

  • βœ… Focuses on binary classification
  • βœ… Educational implementation
  • ⚠️ For multi-class, use sklearn

πŸ“Š Performance Questions

What accuracy should I expect?

Answer:

**It depends on the dataset!**

Baselines:

  • Random guessing (balanced): 50%
  • Random guessing (90% class 0): 90% (but useless!)
  • Majority class baseline: Predict most common class

Typical Performance:

  • βœ… Good model: 75-90% accuracy
  • 🌟 Excellent model: 90-95% accuracy
  • ⚠️ > 99%: Check for data leakage or very easy problem

Better Metrics for Classification:

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

Focus on:

  • Precision: Of predicted positives, how many are correct?
  • Recall: Of actual positives, how many did we catch?
  • F1-Score: Harmonic mean of precision and recall
  • ROC-AUC: Overall performance across thresholds
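A quick way to compute those metrics end-to-end; `make_classification` is just a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]   # ROC-AUC needs probabilities, not labels

print("F1:", round(f1_score(y_test, y_pred), 3))
print("ROC-AUC:", round(roc_auc_score(y_test, y_proba), 3))
```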

My model has 99% accuracy but doesn't work. Why?

Answer:

Likely Causes:

1. Data Leakage ⚠️

# WRONG: Fit scaler on all data
scaler = StandardScaler()
X_all_scaled = scaler.fit_transform(X)  # Includes test data!
X_train, X_test = train_test_split(X_all_scaled, ...)

# CORRECT: Fit only on training
X_train, X_test = train_test_split(X, ...)
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

2. Target Leakage

# Features that shouldn't be available at prediction time
# Example: Using "purchase_amount" to predict "will_purchase"

3. Class Imbalance

# 99% class 0, 1% class 1
# Model predicts all class 0 β†’ 99% accuracy but useless! 

# Check: 
print(np.bincount(y_test))
print(np.bincount(y_pred))

4. Training on Test Data

# WRONG
model.fit(X_test, y_test)
accuracy = model.score(X_test, y_test)  # Of course it's high! 

How to Detect:

  • Check if test accuracy >> typical for problem
  • Look at confusion matrix
  • Verify data pipeline
  • Check feature engineering

How do I improve model performance?

Answer:

Checklist to Improve Performance:

1. Data Quality 🧹

# Remove duplicates
df = df.drop_duplicates()

# Handle missing values
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_filled = imputer.fit_transform(X)

# Remove outliers
from scipy import stats
z_scores = np.abs(stats.zscore(X))
X_clean = X[(z_scores < 3).all(axis=1)]

2. Feature Engineering πŸ”§

# Add polynomial features
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Feature interactions
X['age_income'] = X['age'] * X['income']

# Domain-specific features

3. Hyperparameter Tuning βš™οΈ

# Grid search
param_grid = {
    'learning_rate': [0.001, 0.01, 0.1],
    'n_iterations': [500, 1000, 2000],
    'lambda_reg': [0.01, 0.1, 1.0]
}

best_score = 0
for lr in param_grid['learning_rate']: 
    for iters in param_grid['n_iterations']:
        for lam in param_grid['lambda_reg']: 
            model = LogisticRegressionL2(lr, iters, lam)
            model.fit(X_train, y_train)
            score = model.score(X_val, y_val)
            
            if score > best_score: 
                best_score = score
                best_params = {'lr': lr, 'iters': iters, 'lambda': lam}

4. Handle Class Imbalance βš–οΈ

# Use class weights
model = LogisticRegressionWeighted(class_weight='balanced')

# Or resample
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X, y)

5. Ensemble Methods 🎯

# Train multiple models and vote
models = [model1, model2, model3]
predictions = [m.predict(X_test) for m in models]
final_pred = np.round(np.mean(predictions, axis=0))

6. Get More Data πŸ“Š

  • More samples β†’ Better generalization
  • Data augmentation (for images/text)

πŸ› οΈ Implementation Questions

Why use NumPy instead of pure Python?

Answer:

**Speed!** NumPy is 10-100x faster.

Comparison:

import numpy as np
import time

# Pure Python (slow ❌)
def python_dot(X, weights):
    result = []
    for i in range(len(X)):
        total = 0
        for j in range(len(weights)):
            total += X[i][j] * weights[j]
        result.append(total)
    return result

# NumPy (fast βœ…)
def numpy_dot(X, weights):
    return np.dot(X, weights)

# Test
X = np.random.rand(10000, 50)
weights = np.random.rand(50)

# Python
start = time.time()
python_dot(X.tolist(), weights.tolist())
python_time = time.time() - start

# NumPy
start = time.time()
numpy_dot(X, weights)
numpy_time = time.time() - start

print(f"Python time: {python_time:.4f}s")
print(f"NumPy time: {numpy_time:.4f}s")
print(f"Speedup: {python_time / numpy_time:.1f}x")

Example Output (timings vary by machine):

Python time: 0.8234s
NumPy time: 0.0012s
Speedup: 686.2x

**Why so fast?**

  • βœ… Written in C
  • βœ… Vectorized operations
  • βœ… Optimized memory access
  • βœ… SIMD instructions

Can I save my trained model?

Answer:

Yes! Use pickle.

import pickle

# Save model
with open('logistic_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Also save scaler! 
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

print("βœ… Model saved!")

# Load model
with open('logistic_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

with open('scaler.pkl', 'rb') as f:
    loaded_scaler = pickle.load(f)

# Use loaded model
X_new_scaled = loaded_scaler.transform(X_new)
predictions = loaded_model.predict(X_new_scaled)

print("βœ… Model loaded and used!")

Complete Save/Load Function:

def save_model(model, scaler, filename='model.pkl'):
    """Save model and scaler together"""
    model_data = {
        'model': model,
        'scaler': scaler,
        'weights': model.weights,
        'bias': model.bias,
        'learning_rate': model.learning_rate,
        'n_iterations': model.n_iterations
    }
    
    with open(filename, 'wb') as f:
        pickle.dump(model_data, f)
    
    print(f"βœ… Model saved to {filename}")

def load_model(filename='model.pkl'):
    """Load model and scaler"""
    with open(filename, 'rb') as f:
        model_data = pickle.load(f)
    
    print(f"βœ… Model loaded from {filename}")
    return model_data['model'], model_data['scaler']

# Usage
save_model(model, scaler, 'my_model.pkl')
model, scaler = load_model('my_model.pkl')

How do I use this model in a web app?

Answer:

Example Flask App:

from flask import Flask, request, jsonify
import pickle
import numpy as np

app = Flask(__name__)

# Load model at startup
with open('model.pkl', 'rb') as f:
    model_data = pickle.load(f)
    model = model_data['model']
    scaler = model_data['scaler']

@app.route('/predict', methods=['POST'])
def predict():
    """
    Endpoint for predictions
    
    Input JSON:
    {
        "features": [1.5, 2.3, 0.8, ...]
    }
    """
    try:
        # Get data from request
        data = request.get_json()
        features = np.array(data['features']).reshape(1, -1)
        
        # Scale features
        features_scaled = scaler.transform(features)
        
        # Make prediction
        prediction = model.predict(features_scaled)[0]
        probability = model.predict_proba(features_scaled)[0]
        
        # Return result
        return jsonify({
            'prediction': int(prediction),
            'probability': float(probability),
            'status': 'success'
        })
    
    except Exception as e:
        return jsonify({
            'status': 'error',
            'message': str(e)
        }), 400

if __name__ == '__main__':
    app.run(debug=True)

Test the API:

import requests

response = requests.post('http://localhost:5000/predict',
                        json={'features': [1.5, 2.3, 0.8, 1.2]})

print(response.json())
# Output: {'prediction': 1, 'probability': 0.8765, 'status': 'success'}

πŸŽ“ Learning Questions

I'm new to machine learning. Where should I start?

Answer:

Learning Path:

1. Prerequisites πŸ“š

  • βœ… Python basics
  • βœ… NumPy fundamentals
  • βœ… Basic linear algebra (vectors, matrices)
  • βœ… Basic calculus (derivatives)

2. Start Here πŸš€

  1. Read Getting Started
  2. Understand Mathematical Foundation
  3. Study Implementation Guide
  4. Practice with notebooks

3. Resources πŸ“–

  • Andrew Ng's Machine Learning Course (Coursera)
  • "Introduction to Statistical Learning" (free book)
  • GeeksforGeeks tutorials
  • This repository's wiki!

4. Practice Projects πŸ’ͺ

  • Iris dataset classification
  • Titanic survival prediction
  • Credit card fraud detection
  • Customer churn prediction

What math do I need to know?

Answer:

Essential Math:

1. Linear Algebra πŸ“

# Dot product
z = w₁x₁ + wβ‚‚xβ‚‚ + ... + wβ‚™xβ‚™ + b

# Matrix form
z = Xw + b

# NumPy: 
z = np.dot(X, weights) + bias

2. Calculus πŸ“Š

# Derivative of sigmoid
Οƒ'(z) = Οƒ(z)(1 - Οƒ(z))

# Gradient (partial derivatives)
βˆ‚J/βˆ‚w = (1/m) * X^T * (Ε· - y)

3. Probability 🎲

# Sigmoid outputs probability
P(y=1|x) = Οƒ(w^T x + b)
P(y=0|x) = 1 - P(y=1|x)

4. Logarithms πŸ“‰

# Used in cost function
cost = -[y*log(Ε·) + (1-y)*log(1-Ε·)]
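The four pieces above combine into one vectorized training step; a minimal sketch with illustrative toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy batch: m=4 samples, n=2 features (illustrative values)
X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.5]])
y = np.array([1, 0, 1, 1])
w, b = np.zeros(2), 0.0
m = len(y)

y_hat = sigmoid(X @ w + b)                  # P(y=1|x) for every sample (probability)
cost = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # binary cross-entropy (logs)
grad_w = X.T @ (y_hat - y) / m              # βˆ‚J/βˆ‚w = (1/m) Xα΅€ (Ε· - y)  (calculus + linear algebra)
grad_b = np.mean(y_hat - y)                 # βˆ‚J/βˆ‚b

w -= 0.1 * grad_w                           # one gradient descent step
b -= 0.1 * grad_b
print(f"Cost before step: {cost:.4f}")      # log(2) β‰ˆ 0.6931 when all weights start at zero
```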

**Don't worry!** You can still use the implementation and learn math gradually.


How is this different from sklearn?

Answer:

| Aspect | This Repo | Scikit-Learn |
|---|---|---|
| Purpose | Learning & understanding | Production use |
| Speed | Slower (pure Python/NumPy) | Faster (C/Cython) |
| Features | Basic implementation | Full-featured |
| Customization | Easy to modify | Harder to modify |
| Documentation | Educational | Production-focused |
| Use For | Learning, teaching, prototypes | Real applications |

When to use each:

Use This Repo:

  • πŸ“š Learning how algorithms work
  • πŸŽ“ Teaching machine learning
  • πŸ”¬ Experimenting with modifications
  • πŸš€ Quick prototypes

Use Scikit-Learn:

  • 🏭 Production systems
  • ⚑ Performance-critical apps
  • πŸ›‘οΈ Need reliability
  • πŸ“Š Complex ML pipelines

πŸ’‘ Best Practices

What are the most common mistakes?

Answer:

Top 10 Mistakes:

1. Not Scaling Features ❌

# WRONG
model.fit(X_train, y_train)

# RIGHT
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
model.fit(X_train_scaled, y_train)

2. Fitting Scaler on Test Data ❌

# WRONG
X_test_scaled = scaler.fit_transform(X_test)

# RIGHT
X_test_scaled = scaler.transform(X_test)

3. Using Wrong Metrics ❌

# WRONG: Only accuracy for imbalanced data
print("Accuracy:", accuracy_score(y_test, y_pred))

# RIGHT: Multiple metrics
print("F1:", f1_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))

4. Ignoring Data Leakage ❌

# WRONG: Scale before split
X_scaled = scaler.fit_transform(X)
X_train, X_test = train_test_split(X_scaled, y)

# RIGHT: Split then scale
X_train, X_test = train_test_split(X, y)
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

5. Not Setting Random Seed ❌

# WRONG: Results not reproducible
X_train, X_test = train_test_split(X, y)

# RIGHT: Reproducible results
X_train, X_test = train_test_split(X, y, random_state=42)

6. Learning Rate Too High ❌

# WRONG: Cost explodes
model = LogisticRegression(learning_rate=10.0)

# RIGHT
model = LogisticRegression(learning_rate=0.01)

7. Not Checking for NaN ❌

# WRONG: Train with NaN values
model.fit(X_train, y_train)  # May have NaN!

# RIGHT
assert not np.isnan(X_train).any(), "Data has NaN!"
assert not np.isnan(y_train).any(), "Labels have NaN!"

8. Testing on Training Data ❌

# WRONG
model.fit(X_train, y_train)
accuracy = model.score(X_train, y_train)  # Overly optimistic!

# RIGHT
accuracy = model.score(X_test, y_test)

9. Ignoring Class Imbalance ❌

# WRONG: Ignore 95-5 split
model.fit(X_train, y_train)

# RIGHT: Use class weights
model = LogisticRegressionWeighted(class_weight='balanced')

10. Not Monitoring Training ❌

# WRONG: Train blindly
model.fit(X_train, y_train)

# RIGHT: Check convergence
model.plot_cost_history()

πŸ†˜ Still Have Questions?

Where can I get more help?

Answer:

πŸ“š Documentation

Read the Wiki Pages

πŸ› Issues

Open a GitHub Issue

πŸ’¬ Community

Ask on Stack Overflow


πŸŽ‰ Happy Learning!

← Troubleshooting | Back to Home
