FAQ
**Q: What is Logistic Regression?**

Answer:
Logistic Regression is a classification algorithm (not a regression algorithm, despite the name) used for binary classification problems. It predicts the probability that an instance belongs to a particular class.
Key Points:
- Outputs probabilities between 0 and 1
- Uses the sigmoid activation function
- Optimized with gradient descent
- Simple, interpretable, and effective
Example Use Cases:
- Email: Spam or Not Spam
- Medical: Disease or No Disease
- Finance: Fraud or Legitimate
- Marketing: Click or No Click
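The probability output comes from the sigmoid function. Here is a minimal sketch of how a linear score becomes a class probability (the function and variable names are illustrative, not this repository's code):

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A linear score z = w.x + b is squashed into a probability
scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(scores)
print(probs)   # roughly [0.018, 0.269, 0.5, 0.731, 0.982]

# Classify with a 0.5 threshold
labels = (probs >= 0.5).astype(int)
print(labels)  # [0 0 1 1 1]
```

Large negative scores map near 0, large positive scores near 1, and a score of 0 maps to exactly 0.5.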
**Q: When should I use Logistic Regression?**

Answer:
Use Logistic Regression when:
- The target is binary (two classes)
- You need calibrated probability estimates, not just labels
- The classes are roughly linearly separable in feature space
- You want a simple, interpretable baseline model
**Q: What's the difference between Linear and Logistic Regression?**

Answer:
| Aspect | Linear Regression | Logistic Regression |
|---|---|---|
| Type | Regression | Classification |
| Output | Continuous values (−∞ to +∞) | Probabilities (0 to 1) |
| Activation | None (identity) | Sigmoid function |
| Cost Function | Mean Squared Error | Binary Cross-Entropy |
| Use Case | Predict house prices | Predict spam/not spam |
| Example Output | 250,000 (price in $) | 0.85 (85% spam) |
Visual Difference: Linear Regression fits a straight, continuous line, while Logistic Regression fits an S-shaped (sigmoid) curve bounded between 0 and 1.
**Q: Should I use this implementation instead of scikit-learn?**

Answer:
**For learning: yes.** For production: use scikit-learn.
Why this implementation:
- Learn algorithm internals
- Understand the mathematics
- Educational purposes
- Small projects/prototypes
For production, use sklearn:

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
```

Reasons:
- Highly optimized (C/Cython)
- Battle-tested and robust
- Well-documented
- More features (multiclass, regularization, etc.)
- Bug fixes and maintenance
**Q: How do I choose the learning rate?**

Answer:
Rule of thumb: start with 0.01 and adjust based on results.
Method 1: Trial and Error
learning_rates = [0.001, 0.01, 0.1, 1.0]
for lr in learning_rates:
model = LogisticRegression(learning_rate=lr, n_iterations=1000)
model.fit(X_train, y_train)
print(f"LR = {lr}:")
print(f" Final cost: {model.cost_history[-1]:.4f}")
print(f" Test accuracy: {model.score(X_test, y_test):.4f}\n")Method 2: Visual Inspection
for lr in [0.001, 0.01, 0.1, 1.0]:
model = LogisticRegression(learning_rate=lr, n_iterations=200)
model.fit(X_train, y_train)
plt.plot(model.cost_history, label=f'LR={lr}')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Finding Optimal Learning Rate')
plt.legend()
plt.show()Guidelines:
| Learning Rate | Behavior | Recommendation |
|---|---|---|
| < 0.0001 | Very slow convergence | Too slow |
| 0.001 - 0.01 | Smooth, steady decrease | Good default |
| 0.1 - 0.5 | Fast convergence | Try, but watch for oscillation |
| > 1.0 | Oscillation or divergence | Too high |
**Q: How many iterations do I need?**

Answer:
Typical range: 500 - 2000 iterations
Method 1: Plot Cost History

```python
model = LogisticRegression(learning_rate=0.01, n_iterations=2000, verbose=True)
model.fit(X_train, y_train)

plt.plot(model.cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost History - Check Convergence')
plt.grid(True, alpha=0.3)
plt.show()
```

Look for:
- Cost plateaus (converged) → you can stop
- Still decreasing → you need more iterations
- Oscillating → reduce the learning rate
Method 2: Early Stopping

```python
class LogisticRegressionEarlyStopping(LogisticRegression):
    def fit(self, X, y, patience=50, min_delta=1e-4):
        # ... training loop ...
        if i > patience:
            recent_costs = self.cost_history[-patience:]
            if max(recent_costs) - min(recent_costs) < min_delta:
                print(f"Early stopping at iteration {i}")
                break
```

Guidelines:
- Small dataset (< 1000 samples): 500-1000 iterations
- Medium dataset (1000-10000): 1000-2000 iterations
- Large dataset: Use mini-batch + early stopping
**Q: Why do I need feature scaling?**

Answer:
Without scaling:
- Feature 1 (Age): 20 - 80
- Feature 2 (Income): 20,000 - 200,000
- → Income dominates the gradient updates!

With scaling:
- Feature 1 (Age): -1.5 to 1.5
- Feature 2 (Income): -1.5 to 1.5
- → Equal contribution to learning!
Visual impact: without scaling, the cost-function contours form an elongated ellipse, so gradient descent zig-zags and converges slowly; with scaling, the contours are roughly circular and convergence is fast and direct.
Code Example:

```python
from sklearn.preprocessing import StandardScaler

# Before scaling
model = LogisticRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X_train, y_train)  # May not converge!

# After scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X_train_scaled, y_train)  # Converges smoothly!
```

Bottom line: always scale features for faster, more stable training.
**Q: What's the difference between fit_transform and transform?**

Answer:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# fit_transform: learn parameters AND transform
X_train_scaled = scaler.fit_transform(X_train)
# This computes mean and std from X_train, then scales it

# transform: only transform (use the learned parameters)
X_test_scaled = scaler.transform(X_test)
# This uses the mean and std from X_train to scale X_test
```

CRITICAL RULE:

```python
# Fit on TRAIN only
scaler.fit(X_train)

# Transform both
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# DON'T fit on the test set!
# X_test_scaled = scaler.fit_transform(X_test)  # This causes data leakage!
```

**Why?** Test data should simulate "unseen" data. If you fit on the test set, you're "cheating"!
**Q: How do I handle imbalanced classes?**

Answer:
Problem: 95% class 0, 5% class 1 → the model predicts all class 0 and gets 95% accuracy!
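This accuracy trap is easy to reproduce in a few lines of NumPy (hypothetical labels, for illustration):

```python
import numpy as np

# Hypothetical imbalanced labels: 95% class 0, 5% class 1
y_true = np.array([0] * 95 + [1] * 5)

# A useless model that always predicts the majority class
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
recall_class_1 = (y_pred[y_true == 1] == 1).mean()

print(f"Accuracy: {accuracy:.2f}")                # 0.95 -- looks great
print(f"Recall (class 1): {recall_class_1:.2f}")  # 0.00 -- catches nothing
```

High accuracy, zero recall on the minority class: exactly the failure mode the solutions below address.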
Solutions:

1. Class Weights

```python
model = LogisticRegressionWeighted(
    learning_rate=0.01,
    n_iterations=1000,
    class_weight='balanced'
)
```

2. Resampling

```python
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

print("Original:", np.bincount(y_train))
print("Resampled:", np.bincount(y_resampled))
```

3. Adjust the Threshold

```python
# Instead of the default 0.5
probabilities = model.predict_proba(X_test)
predictions = (probabilities >= 0.3).astype(int)  # Lower threshold
```

4. Use Different Metrics

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Don't rely on accuracy alone!
print("F1 Score:", f1_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
```

**Q: What is regularization and when should I use it?**

Answer:
Regularization adds a penalty to prevent overfitting by discouraging large weights.
When to Use:
| Symptom | Solution |
|---|---|
| Train accuracy >> Test accuracy | Add L2 regularization |
| Model too complex | Add L2 regularization |
| Many features, few samples | Add L2 regularization |
| Want feature selection | Add L1 regularization |
L2 Example:

```python
model = LogisticRegressionL2(
    learning_rate=0.01,
    n_iterations=1000,
    lambda_reg=0.1  # Start here, tune between 0.001 and 10
)
model.fit(X_train_scaled, y_train)
```

How to choose λ (lambda):

```python
lambdas = [0.001, 0.01, 0.1, 1, 10]
best_lambda = None
best_score = 0

for lam in lambdas:
    model = LogisticRegressionL2(learning_rate=0.01, n_iterations=1000, lambda_reg=lam)
    model.fit(X_train_scaled, y_train)
    score = model.score(X_val_scaled, y_val)
    if score > best_score:
        best_score = score
        best_lambda = lam

print(f"Best lambda: {best_lambda}")
```

**Q: Can Logistic Regression handle more than two classes?**

Answer:
**Yes, with modifications!**

Method 1: One-vs-Rest (OvR)

```python
# Train 3 binary classifiers for 3 classes:
#   Class 0 vs (1, 2)
#   Class 1 vs (0, 2)
#   Class 2 vs (0, 1)
# Predict using the highest probability
```

Method 2: Use Scikit-Learn

```python
from sklearn.linear_model import LogisticRegression

# Automatically handles multi-class
model = LogisticRegression(multi_class='ovr')  # or 'multinomial'
model.fit(X_train, y_train)
```

This repository:
- Focuses on binary classification
- Educational implementation
- For multi-class, use sklearn
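For the curious, the one-vs-rest scheme can be sketched on top of a minimal binary classifier. This is an illustrative standalone sketch with toy data, not this repository's API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, lr=0.1, n_iter=2000):
    """Minimal binary logistic regression via gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    m = len(y)
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / m
        b -= lr * (p - y).mean()
    return w, b

def fit_ovr(X, y):
    """One-vs-rest: train one binary classifier per class."""
    return {c: fit_binary(X, (y == c).astype(float)) for c in np.unique(y)}

def predict_ovr(models, X):
    """Pick the class whose binary classifier gives the highest probability."""
    classes = sorted(models)
    scores = np.column_stack(
        [sigmoid(X @ models[c][0] + models[c][1]) for c in classes]
    )
    return np.array(classes)[scores.argmax(axis=1)]

# Toy 3-class data: three well-separated clusters (hypothetical)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(20, 2)) for loc in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)

models = fit_ovr(X, y)
print("Train accuracy:", (predict_ovr(models, X) == y).mean())
```

Each binary model only distinguishes "my class" from "everything else"; the argmax over their probabilities resolves the final label.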
**Q: What accuracy should I expect?**

Answer:
**It depends on the dataset!**

Baselines:
- Random guessing (balanced): 50%
- Random guessing (90% class 0): 90% (but useless!)
- Majority class baseline: predict the most common class
Typical performance:
- Good model: 75-90% accuracy
- Excellent model: 90-95% accuracy
- Above 99%: check for data leakage or a very easy problem
Better metrics for classification:

```python
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))
```

Focus on:
- Precision: of predicted positives, how many are correct?
- Recall: of actual positives, how many did we catch?
- F1-Score: harmonic mean of precision and recall
- ROC-AUC: overall performance across thresholds
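Precision, recall, and F1 can also be computed by hand from true/false positive counts (toy labels for illustration):

```python
import numpy as np

# Hypothetical predictions vs. ground truth
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tp = ((y_pred == 1) & (y_true == 1)).sum()  # true positives
fp = ((y_pred == 1) & (y_true == 0)).sum()  # false positives
fn = ((y_pred == 0) & (y_true == 1)).sum()  # false negatives

precision = tp / (tp + fp)   # of predicted positives, how many are correct
recall = tp / (tp + fn)      # of actual positives, how many did we catch
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```

Computing these once by hand makes `classification_report`'s output much easier to read.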
**Q: Why is my accuracy suspiciously high?**

Answer:
Likely causes:
1. Data Leakage

```python
# WRONG: fit the scaler on all data
scaler = StandardScaler()
X_all_scaled = scaler.fit_transform(X)  # Includes test data!
X_train, X_test = train_test_split(X_all_scaled, ...)

# CORRECT: fit only on training data
X_train, X_test = train_test_split(X, ...)
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

2. Target Leakage

```python
# Features that shouldn't be available at prediction time
# Example: using "purchase_amount" to predict "will_purchase"
```

3. Class Imbalance

```python
# 99% class 0, 1% class 1
# Model predicts all class 0 → 99% accuracy but useless!
# Check:
print(np.bincount(y_test))
print(np.bincount(y_pred))
```

4. Training on Test Data

```python
# WRONG
model.fit(X_test, y_test)
accuracy = model.score(X_test, y_test)  # Of course it's high!
```

How to detect:
- Check whether test accuracy is far above what's typical for the problem
- Look at the confusion matrix
- Verify the data pipeline
- Check the feature engineering
**Q: How can I improve my model's accuracy?**

Answer:
Checklist to improve performance:
1. Data Quality

```python
# Remove duplicates
df = df.drop_duplicates()

# Handle missing values
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_filled = imputer.fit_transform(X)

# Remove outliers
from scipy import stats
z_scores = np.abs(stats.zscore(X))
X_clean = X[(z_scores < 3).all(axis=1)]
```

2. Feature Engineering

```python
# Add polynomial features
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Feature interactions
X['age_income'] = X['age'] * X['income']

# Domain-specific features
```

3. Hyperparameter Tuning

```python
# Grid search
param_grid = {
    'learning_rate': [0.001, 0.01, 0.1],
    'n_iterations': [500, 1000, 2000],
    'lambda_reg': [0.01, 0.1, 1.0]
}

best_score = 0
for lr in param_grid['learning_rate']:
    for iters in param_grid['n_iterations']:
        for lam in param_grid['lambda_reg']:
            model = LogisticRegressionL2(lr, iters, lam)
            model.fit(X_train, y_train)
            score = model.score(X_val, y_val)
            if score > best_score:
                best_score = score
                best_params = {'lr': lr, 'iters': iters, 'lambda': lam}
```

4. Handle Class Imbalance

```python
# Use class weights
model = LogisticRegressionWeighted(class_weight='balanced')

# Or resample
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X, y)
```

5. Ensemble Methods

```python
# Train multiple models and vote
models = [model1, model2, model3]
predictions = [m.predict(X_test) for m in models]
final_pred = np.round(np.mean(predictions, axis=0))
```

6. Get More Data
- More samples → better generalization
- Data augmentation (for images/text)
**Q: Why use NumPy instead of pure Python?**

Answer:
**Speed!** NumPy is 10-100x faster.
Comparison:

```python
import numpy as np
import time

# Pure Python (slow)
def python_dot(X, weights):
    result = []
    for i in range(len(X)):
        total = 0
        for j in range(len(weights)):
            total += X[i][j] * weights[j]
        result.append(total)
    return result

# NumPy (fast)
def numpy_dot(X, weights):
    return np.dot(X, weights)

# Test
X = np.random.rand(10000, 50)
weights = np.random.rand(50)

# Python
start = time.time()
python_dot(X.tolist(), weights.tolist())
python_time = time.time() - start

# NumPy
start = time.time()
numpy_dot(X, weights)
numpy_time = time.time() - start

print(f"Python time: {python_time:.4f}s")
print(f"NumPy time: {numpy_time:.4f}s")
print(f"Speedup: {python_time / numpy_time:.1f}x")
```

Output (timings will vary by machine):

```
Python time: 0.8234s
NumPy time: 0.0012s
Speedup: 686.2x
```
**Why so fast?**
- Written in C
- Vectorized operations
- Optimized memory access
- SIMD instructions
**Q: Can I save a trained model?**

Answer:
Yes! Use pickle.

```python
import pickle

# Save the model
with open('logistic_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Also save the scaler!
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)
print("Model saved!")

# Load the model
with open('logistic_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
with open('scaler.pkl', 'rb') as f:
    loaded_scaler = pickle.load(f)

# Use the loaded model
X_new_scaled = loaded_scaler.transform(X_new)
predictions = loaded_model.predict(X_new_scaled)
print("Model loaded and used!")
```

Complete Save/Load Function:

```python
def save_model(model, scaler, filename='model.pkl'):
    """Save model and scaler together"""
    model_data = {
        'model': model,
        'scaler': scaler,
        'weights': model.weights,
        'bias': model.bias,
        'learning_rate': model.learning_rate,
        'n_iterations': model.n_iterations
    }
    with open(filename, 'wb') as f:
        pickle.dump(model_data, f)
    print(f"Model saved to {filename}")

def load_model(filename='model.pkl'):
    """Load model and scaler"""
    with open(filename, 'rb') as f:
        model_data = pickle.load(f)
    print(f"Model loaded from {filename}")
    return model_data['model'], model_data['scaler']

# Usage
save_model(model, scaler, 'my_model.pkl')
model, scaler = load_model('my_model.pkl')
```

**Q: How do I deploy the model as an API?**

Answer:
Example Flask App:

```python
from flask import Flask, request, jsonify
import pickle
import numpy as np

app = Flask(__name__)

# Load the model at startup
with open('model.pkl', 'rb') as f:
    model_data = pickle.load(f)
model = model_data['model']
scaler = model_data['scaler']

@app.route('/predict', methods=['POST'])
def predict():
    """
    Endpoint for predictions

    Input JSON:
    {
        "features": [1.5, 2.3, 0.8, ...]
    }
    """
    try:
        # Get data from the request
        data = request.get_json()
        features = np.array(data['features']).reshape(1, -1)

        # Scale features
        features_scaled = scaler.transform(features)

        # Make a prediction
        prediction = model.predict(features_scaled)[0]
        probability = model.predict_proba(features_scaled)[0]

        # Return the result
        return jsonify({
            'prediction': int(prediction),
            'probability': float(probability),
            'status': 'success'
        })
    except Exception as e:
        return jsonify({
            'status': 'error',
            'message': str(e)
        }), 400

if __name__ == '__main__':
    app.run(debug=True)
```

Test the API:

```python
import requests

response = requests.post('http://localhost:5000/predict',
                         json={'features': [1.5, 2.3, 0.8, 1.2]})
print(response.json())
# Output: {'prediction': 1, 'probability': 0.8765, 'status': 'success'}
```

**Q: I'm new to machine learning. Where should I start?**

Answer:
Learning Path:
1. Prerequisites
- Python basics
- NumPy fundamentals
- Basic linear algebra (vectors, matrices)
- Basic calculus (derivatives)
2. Start Here
- Read Getting Started
- Understand Mathematical Foundation
- Study Implementation Guide
- Practice with notebooks

3. Resources
- Andrew Ng's Machine Learning Course (Coursera)
- "Introduction to Statistical Learning" (free book)
- GeeksforGeeks tutorials
- This repository's wiki!

4. Practice Projects
- Iris dataset classification
- Titanic survival prediction
- Credit card fraud detection
- Customer churn prediction
**Q: What math do I need to know?**

Answer:
Essential math:

1. Linear Algebra

```python
# Dot product: z = w1*x1 + w2*x2 + ... + wn*xn + b
# Matrix form:  z = Xw + b
# NumPy:
z = np.dot(X, weights) + bias
```

2. Calculus

```
# Derivative of the sigmoid
σ'(z) = σ(z)(1 - σ(z))

# Gradient (partial derivatives)
∂J/∂w = (1/m) · X^T · (ŷ - y)
```

3. Probability

```
# The sigmoid outputs a probability
P(y=1|x) = σ(w^T x + b)
P(y=0|x) = 1 - P(y=1|x)
```

4. Logarithms

```
# Used in the cost function
cost = -[y·log(ŷ) + (1-y)·log(1-ŷ)]
```

**Don't worry!** You can still use the implementation and learn the math gradually.
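The sigmoid derivative identity from the calculus section, σ'(z) = σ(z)(1 − σ(z)), can be checked numerically with a central difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Check the identity sigma'(z) = sigma(z) * (1 - sigma(z)) numerically
z = np.linspace(-5, 5, 101)
h = 1e-6

numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
analytic = sigmoid(z) * (1 - sigmoid(z))

print("Max error:", np.abs(numeric - analytic).max())  # close to zero
```

This identity is what makes the gradient of the cross-entropy cost collapse to the clean form (1/m) · Xᵀ(ŷ − y) shown above.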
**Q: How does this repository compare to scikit-learn?**

Answer:
| Aspect | This Repo | Scikit-Learn |
|---|---|---|
| Purpose | Learning & understanding | Production use |
| Speed | Slower (pure Python/NumPy) | Faster (C/Cython) |
| Features | Basic implementation | Full-featured |
| Customization | Easy to modify | Harder to modify |
| Documentation | Educational | Production-focused |
| Use For | Learning, teaching, prototypes | Real applications |
When to use each:

Use this repo for:
- Learning how algorithms work
- Teaching machine learning
- Experimenting with modifications
- Quick prototypes

Use scikit-learn for:
- Production systems
- Performance-critical apps
- Reliability requirements
- Complex ML pipelines
**Q: What are the most common mistakes?**

Answer:
Top 10 mistakes:
1. Not Scaling Features

```python
# WRONG
model.fit(X_train, y_train)

# RIGHT
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
model.fit(X_train_scaled, y_train)
```

2. Fitting the Scaler on Test Data

```python
# WRONG
X_test_scaled = scaler.fit_transform(X_test)

# RIGHT
X_test_scaled = scaler.transform(X_test)
```

3. Using the Wrong Metrics

```python
# WRONG: only accuracy for imbalanced data
print("Accuracy:", accuracy_score(y_test, y_pred))

# RIGHT: multiple metrics
print("F1:", f1_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
```

4. Ignoring Data Leakage

```python
# WRONG: scale before the split
X_scaled = scaler.fit_transform(X)
X_train, X_test = train_test_split(X_scaled, y)

# RIGHT: split, then scale
X_train, X_test = train_test_split(X, y)
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

5. Not Setting a Random Seed

```python
# WRONG: results not reproducible
X_train, X_test = train_test_split(X, y)

# RIGHT: reproducible results
X_train, X_test = train_test_split(X, y, random_state=42)
```

6. Learning Rate Too High

```python
# WRONG: cost explodes
model = LogisticRegression(learning_rate=10.0)

# RIGHT
model = LogisticRegression(learning_rate=0.01)
```

7. Not Checking for NaN

```python
# WRONG: train with NaN values
model.fit(X_train, y_train)  # May have NaN!

# RIGHT
assert not np.isnan(X_train).any(), "Data has NaN!"
assert not np.isnan(y_train).any(), "Labels have NaN!"
```

8. Testing on Training Data

```python
# WRONG
model.fit(X_train, y_train)
accuracy = model.score(X_train, y_train)  # Overly optimistic!

# RIGHT
accuracy = model.score(X_test, y_test)
```

9. Ignoring Class Imbalance

```python
# WRONG: ignore a 95-5 split
model.fit(X_train, y_train)

# RIGHT: use class weights
model = LogisticRegressionWeighted(class_weight='balanced')
```

10. Not Monitoring Training

```python
# WRONG: train blindly
model.fit(X_train, y_train)

# RIGHT: check convergence
model.plot_cost_history()
```

**Q: Where can I get help?**

Answer:
- Read the wiki pages
- Open a GitHub issue
- Ask on Stack Overflow