Feat-CNN #231
Conversation
Pull Request Overview
This PR adds new machine learning and dynamic programming algorithms in R, with a focus on introducing a CNN implementation using Keras. It also includes a from-scratch k-NN and Kadane’s algorithm with examples.
- Add a CNN model definition using Keras with summary output
- Implement a from-scratch k-NN (classification and regression) with normalization, weighting, and examples
- Implement Kadane’s algorithm (including circular variant) with examples and timing
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| machine_learning/cnn.r | Introduces a CNN architecture and prints model summary; provides comments on complexity and usage. |
| machine_learning/k-NN.r | Adds a complete k-NN implementation (train/predict), utilities, and example runs for classification and regression. |
| dynamic_programming/kadane's_algo.r | Adds Kadane’s algorithm (max subarray) plus a circular variant and multiple example tests. |
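For readers skimming the review, here is a minimal self-contained nearest-neighbour classifier in base R. This is an illustrative sketch only; the function and variable names are not the PR's API, and the PR's implementation is more complete (normalization, weighting, regression).

```r
# Minimal k-NN classifier in base R (illustrative; not the PR's code)
knn_predict_simple <- function(train_x, train_y, test_x, k = 3) {
  apply(test_x, 1, function(q) {
    # Euclidean distances from query point q to every training row
    d <- sqrt(rowSums((train_x - matrix(q, nrow(train_x), ncol(train_x),
                                        byrow = TRUE))^2))
    nn <- order(d)[seq_len(k)]           # indices of the k nearest rows
    names(which.max(table(train_y[nn]))) # majority vote among neighbours
  })
}

# Toy example: two well-separated clusters
set.seed(1)
tr  <- rbind(matrix(rnorm(20, 0), ncol = 2), matrix(rnorm(20, 5), ncol = 2))
lab <- rep(c("A", "B"), each = 10)
knn_predict_simple(tr, lab, rbind(c(0, 0), c(5, 5)))  # expected: "A" "B"
```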
```r
filters = 32, kernel_size = c(3, 3), activation = "relu",
input_shape = c(28, 28, 1), padding = "same"
```
**Copilot AI** · Oct 18, 2025
The code hard-codes input_shape = c(28, 28, 1), which supports MNIST-like grayscale, but the PR description claims compatibility with CIFAR-10 (32×32×3). Either generalize input_shape via parameters (e.g., img_height, img_width, channels) or update the comments to reflect the currently supported input.
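One way to generalize, sketched with the {keras} package: pass the input dimensions as parameters. `build_cnn` and its arguments are illustrative names, not part of this PR.

```r
library(keras)

# Hypothetical sketch: parameterize the input shape instead of hard-coding 28x28x1
build_cnn <- function(img_height = 28, img_width = 28, channels = 1,
                      num_classes = 10) {
  keras_model_sequential() %>%
    layer_conv_2d(
      filters = 32, kernel_size = c(3, 3), activation = "relu",
      input_shape = c(img_height, img_width, channels), padding = "same"
    ) %>%
    layer_max_pooling_2d(pool_size = c(2, 2)) %>%
    layer_flatten() %>%
    layer_dense(units = 64, activation = "relu") %>%
    layer_dense(units = num_classes, activation = "softmax")
}

# MNIST-style input:    build_cnn(28, 28, 1)
# CIFAR-10-style input: build_cnn(32, 32, 3)
```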
```diff
@@ -0,0 +1,272 @@
+# k-Nearest Neighbors implementation in R
```
**Copilot AI** · Oct 18, 2025
File naming is inconsistent with repository conventions; avoid uppercase and punctuation in filenames. Please rename machine_learning/k-NN.r to a lowercase, underscore-based name such as machine_learning/k_nearest_neighbors.r or machine_learning/knn.r.
```diff
@@ -0,0 +1,162 @@
+# Kadane's Algorithm in R
```
**Copilot AI** · Oct 18, 2025
The filename contains an apostrophe, which is problematic for tooling and inconsistent with repository naming patterns. Please rename dynamic_programming/kadane's_algo.r to dynamic_programming/kadane_algorithm.r or dynamic_programming/kadane.r (lowercase, underscores, no special characters).
Please remove the `'` (apostrophe) from the filename.
```r
# ---------------------------
knn_accuracy <- function(y_true, y_pred) {
  if (length(y_true) != length(y_pred)) stop("Lengths mismatch.")
  mean(y_true == y_pred)
```
**Copilot AI** · Oct 18, 2025
Comparing factors with differing levels can yield warnings and NAs, breaking accuracy computation. Coerce to character (or the same factor levels) before comparison, e.g., mean(as.character(y_true) == as.character(y_pred), na.rm = TRUE).
```diff
- mean(y_true == y_pred)
+ mean(as.character(y_true) == as.character(y_pred), na.rm = TRUE)
```
```r
# We build matrix: outer(A_sq, rep(1,n)) - 2*cross + outer(rep(1,m), B_sq)
outer(A_sq, rep(1, length(B_sq))) - 2 * cross + outer(rep(1, length(A_sq)), B_sq)
```
**Copilot AI** · Oct 18, 2025
Using outer with rep(1, ...) allocates intermediate matrices and is slower for large inputs. Consider constructing the two additive matrices via recycling to reduce overhead: matrix(A_sq, nrow = length(A_sq), ncol = length(B_sq)) - 2 * cross + matrix(B_sq, nrow = length(A_sq), ncol = length(B_sq), byrow = TRUE).
```diff
- # We build matrix: outer(A_sq, rep(1,n)) - 2*cross + outer(rep(1,m), B_sq)
- outer(A_sq, rep(1, length(B_sq))) - 2 * cross + outer(rep(1, length(A_sq)), B_sq)
+ # We build matrix via recycling: matrix(A_sq, ...) - 2*cross + matrix(B_sq, ..., byrow = TRUE)
+ matrix(A_sq, nrow = length(A_sq), ncol = length(B_sq)) - 2 * cross + matrix(B_sq, nrow = length(A_sq), ncol = length(B_sq), byrow = TRUE)
```
```r
    # weights: 1 / (dist + eps)
    w <- 1 / (neighbor_dists + eps)
    # if any dist==0, set weight large for exact matches
    if (any(neighbor_dists == 0)) {
      w <- as.numeric(neighbor_dists == 0) * 1e12  # very large weight for exact matches
    }
    tab <- tapply(w, neighbor_labels, sum)
  } else {
```
**Copilot AI** · Oct 18, 2025
Instead of substituting a very large weight (1e12) for exact matches, short-circuit and return the exact-match class when any neighbor_dists == 0; this avoids extreme weights and simplifies logic. Example: if (any(neighbor_dists == 0)) { preds[i] <- neighbor_labels[which(neighbor_dists == 0)[1]]; probs[i, ] <- as.numeric(model$classes == preds[i]); next }.
```diff
- # weights: 1 / (dist + eps)
- w <- 1 / (neighbor_dists + eps)
- # if any dist==0, set weight large for exact matches
- if (any(neighbor_dists == 0)) {
-   w <- as.numeric(neighbor_dists == 0) * 1e12  # very large weight for exact matches
- }
- tab <- tapply(w, neighbor_labels, sum)
+ # Short-circuit: if any neighbor has zero distance, assign its label directly
+ if (any(neighbor_dists == 0)) {
+   # Pick the first exact match
+   preds[i] <- neighbor_labels[which(neighbor_dists == 0)[1]]
+   probs[i, ] <- as.numeric(model$classes == preds[i])
+   next
+ }
+ # weights: 1 / (dist + eps)
+ w <- 1 / (neighbor_dists + eps)
+ tab <- tapply(w, neighbor_labels, sum)
  } else {
```
```r
w <- 1 / (neighbor_dists + eps)
if (any(neighbor_dists == 0)) {
  w <- as.numeric(neighbor_dists == 0) * 1e12
}
preds_reg[i] <- sum(w * neighbor_vals) / sum(w)
```
**Copilot AI** · Oct 18, 2025
For regression, if any neighbor has zero distance, directly return the corresponding neighbor value (or the mean if there are multiple zero-distance neighbors) to avoid using extreme weights. Example: if (any(neighbor_dists == 0)) { z <- neighbor_vals[neighbor_dists == 0]; preds_reg[i] <- mean(z); next }.
```diff
- w <- 1 / (neighbor_dists + eps)
- if (any(neighbor_dists == 0)) {
-   w <- as.numeric(neighbor_dists == 0) * 1e12
- }
- preds_reg[i] <- sum(w * neighbor_vals) / sum(w)
+ if (any(neighbor_dists == 0)) {
+   # If any neighbor has zero distance, use the mean of their values
+   z <- neighbor_vals[neighbor_dists == 0]
+   preds_reg[i] <- mean(z)
+ } else {
+   w <- 1 / (neighbor_dists + eps)
+   preds_reg[i] <- sum(w * neighbor_vals) / sum(w)
+ }
```
```r
if (max_wrap > normal) {
  return(list(max_sum = max_wrap, start = NA, end = NA, subarray = NA))  # indices for wrap-around not computed here
} else {
  return(list(max_sum = normal, start = kadane(arr)$start, end = kadane(arr)$end, subarray = kadane(arr)$subarray))
```
**Copilot AI** · Oct 18, 2025
Avoid repeated calls to kadane(arr) in the else-branch; call it once, store the result, and reuse it. This prevents redundant O(n) passes.
```diff
- return(list(max_sum = normal, start = kadane(arr)$start, end = kadane(arr)$end, subarray = kadane(arr)$subarray))
+ normal_res <- kadane(arr)
+ return(list(max_sum = normal_res$max_sum, start = normal_res$start, end = normal_res$end, subarray = normal_res$subarray))
```
```r
# Optional: function to get maximum circular subarray (Kadane + total sum trick)
kadane_circular <- function(arr) {
  #' Finds max subarray sum for circular arrays (wrap-around allowed)
  #' If all elements are negative, returns max element (non-wrap).
```
**Copilot AI** · Oct 18, 2025
Document that when the wrap-around case wins, start/end indices and subarray are returned as NA (not computed). This clarifies API behavior for consumers.
```diff
  #' If all elements are negative, returns max element (non-wrap).
+ #' When the wrap-around case wins, start/end indices and subarray are returned as NA (not computed).
```
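As a companion to the review note above, a self-contained sketch of the circular variant's logic, with the Kadane pass inlined so the example runs on its own. Names here are illustrative, not the PR's.

```r
# Illustrative sketch of Kadane plus the total-sum trick (not the PR's code)
max_circular_sum <- function(arr) {
  # Standard Kadane: best non-wrapping subarray sum
  best <- cur <- arr[1]
  for (x in arr[-1]) { cur <- max(x, cur + x); best <- max(best, cur) }

  if (best < 0) return(best)  # all elements negative: answer is the max element

  # Wrap-around case: total sum minus the minimum subarray sum
  min_best <- min_cur <- arr[1]
  for (x in arr[-1]) { min_cur <- min(x, min_cur + x); min_best <- min(min_best, min_cur) }
  max(best, sum(arr) - min_best)
}

max_circular_sum(c(5, -3, 5))    # 10 (wraps around: 5 + 5)
max_circular_sum(c(-3, -1, -2))  # -1 (all negative, non-wrap)
```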
```diff
@@ -0,0 +1,162 @@
+# Kadane's Algorithm in R
```
Please remove the `'` (apostrophe) from the filename.
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This PR was closed because it has been stalled for 7 days with no activity.
Overview
This implementation defines a Convolutional Neural Network (CNN) in R using the Keras library. The CNN is designed to automatically extract spatial and hierarchical features from image data for tasks such as image classification, object detection, and visual recognition. The model consists of convolutional, pooling, flatten, and dense layers, with ReLU and Softmax activations.
Features
- Automatically extracts local spatial patterns from images using convolutional layers.
- Reduces spatial dimensions and retains important features with max-pooling layers.
- Adds non-linearity using ReLU activations to capture complex patterns.
- Converts 2D feature maps into 1D vectors for fully connected processing.
- Dense layers combine extracted features for final classification.
- Output layer provides class probabilities using Softmax activation.
- Compatible with image datasets such as MNIST and CIFAR-10.
- Fully modular architecture that can be expanded with additional layers.
Complexity
- Time Complexity: O(E × N × F × K²), where E = epochs, N = number of samples, F = number of filters, K = kernel size
- Space Complexity: O(parameters + feature maps)
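The parameter count of a single conv layer can be checked directly in base R, using the standard formula filters × (K² × input channels + 1 bias). The helper name is illustrative.

```r
# Trainable parameters in one 2D conv layer: filters * (K^2 * in_channels + 1 bias)
conv_params <- function(filters, k, in_channels) {
  filters * (k^2 * in_channels + 1)
}
conv_params(32, 3, 1)  # first layer on a 28x28x1 input: 32 * (9 + 1) = 320
```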
Demonstration
- The included R script defines the CNN model and displays its summary.
- To train the model, use `model %>% compile()` followed by `model %>% fit()` on an image dataset.
- Example datasets: MNIST (28×28 grayscale images) or CIFAR-10 (32×32 RGB images).
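A hedged end-to-end sketch of that workflow with the {keras} package, where `model` is the network defined in `machine_learning/cnn.r`; the preprocessing and hyperparameters here are illustrative, not taken from the PR.

```r
library(keras)

mnist <- dataset_mnist()
# Scale pixels to [0, 1] and add the channel dimension the conv layer expects
x_train <- array_reshape(mnist$train$x / 255,
                         c(dim(mnist$train$x)[1], 28, 28, 1))
y_train <- to_categorical(mnist$train$y, 10)

model %>% compile(
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)

model %>% fit(x_train, y_train, epochs = 5, batch_size = 128,
              validation_split = 0.1)
```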
Summary
This implementation provides a fully documented CNN architecture in R for image-based tasks. It is suitable for automatic feature extraction and classification and serves as a foundation for building more complex deep learning models in computer vision.