
Conversation

@chenghuaWang (Collaborator) commented Oct 22, 2025

  • Implement Conv2DOp and LayerNorm2DOp for CPU backend
  • Add ARM NEON optimized kernels for conv2d and layernorm2d
  • Integrate new ops into IR and op registration system
  • Extend Tensor API with size() and flatten() methods
  • Update RTTI and OpTypes to support new operations

Summary by CodeRabbit

  • New Features

    • Added Conv2D and LayerNorm2D operations with CPU backend support, including ARM-optimized kernels.
    • Integrated DeepSeek OCR model with vision encoder components.
    • Added tensor utility methods and scaled dot-product attention function.
  • Tests

    • Added Conv2D kernel tests with reference implementation for validation.

@coderabbitai bot commented Oct 22, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds Conv2D and LayerNorm2D operations to the CPU backend with ARM-optimized kernels, introduces DeepSeek OCR model architecture including vision encoders and tokenizers, extends the Tensor API with utility methods, integrates new operations into the IR/RTTI system, and provides comprehensive test coverage for new kernels.

Changes

Cohort / File(s) Summary
Conv2D CPU Backend
mllm/backends/cpu/CPUBackend.cpp, mllm/backends/cpu/ops/Conv2DOp.hpp, mllm/backends/cpu/ops/Conv2DOp.cpp
Registers CPU Conv2D operation factory and implements CPU Conv2D op with load/forward logic, im2col-based weight packing, and architecture-specific BLAS or custom matmul paths
Conv2D ARM Kernels
mllm/backends/cpu/kernels/arm/conv2d.hpp, mllm/backends/cpu/kernels/arm/conv2d.cpp
Implements ARM-optimized NEON-accelerated im2col transformation functions for input and weight data with padding, stride, and dilation support
Conv2D Core Operations
mllm/core/aops/Conv2DOp.hpp, mllm/core/aops/Conv2DOp.cpp
Defines Conv2D abstract operation with parameter loading, IR tracing, shape inference, and parameter export; includes options struct and impl type enum
Conv2D Neural Network Layer
mllm/nn/layers/Conv2D.hpp, mllm/nn/layers/Conv2D.cpp
Wraps Conv2D aop as a reusable neural network layer with parameterized constructor and weight/bias accessors
LayerNorm2D CPU Backend
mllm/backends/cpu/CPUBackend.cpp, mllm/backends/cpu/ops/LayerNorm2DOp.hpp, mllm/backends/cpu/ops/LayerNorm2DOp.cpp
Registers CPU LayerNorm2D operation factory and implements CPU LayerNorm2D op with architecture-conditional dispatch (ARM vs x86 NYI)
LayerNorm2D ARM Kernels
mllm/backends/cpu/kernels/arm/layernorm2d.hpp, mllm/backends/cpu/kernels/arm/layernorm2d.cpp
Implements ARM-optimized NEON-accelerated 2D layer normalization with channel-wise mean/variance computation and fused affine transformation
LayerNorm2D Core Operations
mllm/core/aops/LayerNorm2DOp.hpp, mllm/core/aops/LayerNorm2DOp.cpp
Defines LayerNorm2D abstract operation with parameter loading, IR tracing, shape-preserving reshape, and parameter export
LayerNorm2D Neural Network Layer
mllm/nn/layers/LayerNorm2D.hpp, mllm/nn/layers/LayerNorm2D.cpp
Wraps LayerNorm2D aop as a reusable layer with multiple constructor variants
IR/RTTI Updates
mllm/compile/ir/GeneratedRTTIKind.hpp, mllm/compile/ir/NodeRTTIClassOfImpl.hpp, mllm/compile/ir/linalg/Op.hpp, mllm/compile/ir/linalg/Op.cpp, mllm/compile/ir/rtti_kind_gen.py
Adds LayerNorm2DOp to IR node kinds, RTTI macros, and linalg op declarations; updates code generator
Core Type Definitions
mllm/core/OpTypes.hpp
Adds kLayerNorm2D enum value (60) and string mapping
Kernel Include Centralization
mllm/backends/cpu/kernels/Kernels.hpp
Exports Conv2D and LayerNorm2D ARM kernel headers
Tensor API Extensions
mllm/core/Tensor.hpp, mllm/core/Tensor.cpp
Adds size(int32_t id) accessor for dimension lookup and flatten(int32_t dim) method for shape flattening with support for negative indices
Functional API Extensions
mllm/nn/Functional.hpp, mllm/nn/Functional.cpp
Adds scaledDotProductAttention() computing softmax(Q·K^T / √d)·V with optional masking (see the formula after this table)
Module Forward Declarations
mllm/nn/Module.hpp
Renames ModuleLists/ModuleListsSuffix forward declarations to ModuleList/ModuleListSuffix (singular)
Neural Network Module Exports
mllm/nn/Nn.hpp
Exports Conv2D.hpp and LayerNorm2D.hpp for public API
DeepSeek OCR Model Configuration
mllm/models/deepseek_ocr/configuration_deepseek_ocr.hpp
Defines DpskOcrConfig struct with nested LanguageConfig, ProjectorConfig, VisionConfig; parses model hyperparameters, token IDs, layer counts, and init types from configuration data
DeepSeek OCR Vision Encoders
mllm/models/deepseek_ocr/deepencoder.hpp
Implements CLIP-style vision embeddings, Vision Transformer components (attention, MLP blocks, transformer layers), and SAM-style image encoder with positional embeddings and relative position handling
DeepSeek OCR Tokenizer
mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp
Defines DpskOcrTokenizer stub with BPE backend; tokenization methods currently TODO/placeholder
DeepSeek OCR Documentation
mllm/models/deepseek_ocr/README.md, mllm/models/deepseek_ocr/conversation.hpp, mllm/models/deepseek_ocr/modeling_deepseek_ocr.hpp
Adds model README with branding/links and licensing headers
Conv2D Kernel Tests
tests/cpu/Conv2DKernelTest.hpp, tests/cpu/KernelTest.cpp
Implements naive reference conv2d, Conv2DModule wrapper, and Conv2DKernelTest harness; adds im2col test case (224×224 input, 14×14 kernel, stride 14)
Test Infrastructure
tests/cpu/FlashAttentionKernelTest.hpp, tests/cpu/KernelTest.cpp
Removes PrefixCache.hpp dependency; integrates Conv2D tests into main test launcher with RUN_ALL_TESTS and resource reporting
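
For reference, the scaled dot-product attention the new functional API computes (per the summary row above; the mask term is optional) is the standard form:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V
$$

where d_k is the head dimension and M is the optional additive mask.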

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CPUConv2DOp
    participant ARM Kernels
    participant BLAS
    participant Output

    User->>CPUConv2DOp: load(weights, bias)
    CPUConv2DOp->>ARM Kernels: conv2d_fp32_im2col_weight(src, packed_weights)
    ARM Kernels-->>CPUConv2DOp: packed_weights

    User->>CPUConv2DOp: forward(input)
    CPUConv2DOp->>ARM Kernels: conv2d_fp32_im2col_input(input → col_data)
    ARM Kernels-->>CPUConv2DOp: col_data
    
    CPUConv2DOp->>BLAS: matmul(packed_weights, col_data)
    BLAS-->>CPUConv2DOp: output_gemm
    
    CPUConv2DOp->>CPUConv2DOp: apply_bias(output_gemm)
    CPUConv2DOp-->>Output: final_output
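
To make the im2col step in the diagram concrete, here is a minimal scalar sketch of the transform the ARM kernels vectorize (illustrative names and CHW layout assumed; not the exact mllm kernel signatures):

// Expand a CHW input into a [C*KH*KW, OH*OW] column matrix so that
// conv2d reduces to a single GEMM: output = weight[OC, C*KH*KW] x col.
void im2col_ref(const float* input, int channels, int height, int width,
                int kernel_h, int kernel_w, int pad_h, int pad_w,
                int stride_h, int stride_w, int dilation_h, int dilation_w,
                float* col) {
  const int out_h = (height + 2 * pad_h - dilation_h * (kernel_h - 1) - 1) / stride_h + 1;
  const int out_w = (width + 2 * pad_w - dilation_w * (kernel_w - 1) - 1) / stride_w + 1;
  for (int c = 0; c < channels; ++c)
    for (int kh = 0; kh < kernel_h; ++kh)
      for (int kw = 0; kw < kernel_w; ++kw) {
        const int row = (c * kernel_h + kh) * kernel_w + kw;
        for (int oh = 0; oh < out_h; ++oh) {
          const int ih = oh * stride_h - pad_h + kh * dilation_h;
          for (int ow = 0; ow < out_w; ++ow) {
            const int iw = ow * stride_w - pad_w + kw * dilation_w;
            const bool inside = ih >= 0 && ih < height && iw >= 0 && iw < width;
            // Zero-fill padded taps; otherwise copy the dilated input sample.
            col[row * out_h * out_w + oh * out_w + ow] =
                inside ? input[(c * height + ih) * width + iw] : 0.0f;
          }
        }
      }
}

With the im2col test configuration noted above (224×224 input, 14×14 kernel, stride 14, no padding assumed), out_h = out_w = (224 − 14)/14 + 1 = 16, so each channel contributes 16×16 = 256 columns.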
sequenceDiagram
    participant Input
    participant LayerNorm2DOp
    participant ARM NEON
    participant Output

    Input->>LayerNorm2DOp: forward(x, weight, bias)
    Note over LayerNorm2DOp: Iterate per (batch, spatial_pos)
    
    LayerNorm2DOp->>ARM NEON: compute_mean_channels(x)
    ARM NEON-->>LayerNorm2DOp: mean_per_pos
    
    LayerNorm2DOp->>ARM NEON: compute_variance_channels(x, mean)
    ARM NEON-->>LayerNorm2DOp: variance_per_pos
    
    LayerNorm2DOp->>LayerNorm2DOp: inv_std = 1/sqrt(var + eps)
    
    LayerNorm2DOp->>ARM NEON: normalize_affine((x-mean)*inv_std*weight+bias)
    ARM NEON-->>LayerNorm2DOp: normalized
    
    LayerNorm2DOp-->>Output: y
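
A scalar reference for the per-position, channel-wise normalization the NEON kernel vectorizes (illustrative sketch, assuming NCHW layout; not the exact mllm kernel signature):

void layernorm2d_ref(const float* x, const float* weight, const float* bias,
                     float* y, int batch, int channels, int height, int width,
                     float eps) {
  const int plane = height * width;
  for (int b = 0; b < batch; ++b) {
    for (int p = 0; p < plane; ++p) {
      // Mean and variance are taken across channels at one spatial position.
      float mean = 0.0f, var = 0.0f;
      for (int c = 0; c < channels; ++c) mean += x[(b * channels + c) * plane + p];
      mean /= channels;
      for (int c = 0; c < channels; ++c) {
        const float d = x[(b * channels + c) * plane + p] - mean;
        var += d * d;
      }
      const float inv_std = 1.0f / std::sqrt(var / channels + eps);
      // Fused normalize + per-channel affine, matching the diagram above.
      for (int c = 0; c < channels; ++c) {
        const int idx = (b * channels + c) * plane + p;
        y[idx] = (x[idx] - mean) * inv_std * weight[c] + bias[c];
      }
    }
  }
}

Each (batch, h, w) position gets its own statistics, which is what makes this "LayerNorm2D" rather than a per-channel BatchNorm-style reduction.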

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: This PR spans heterogeneous components requiring distinct reasoning paths: ARM NEON kernel optimizations with vectorized intrinsics (medium density), multi-layer operation integration across aops/backends/nn (structural consistency), DeepSeek model architecture with transformer blocks and attention mechanisms (high logic density), and IR/RTTI registration patterns (straightforward but multi-file). The change count exceeds 30 files with mixed complexity (simple includes, intricate kernels, architecture definitions). While many changes follow established patterns, the NEON optimizations, Conv2D im2col logic, LayerNorm2D channel-wise statistics, and Vision Transformer blocks require careful verification.

Suggested reviewers

  • oreomaker
  • yirongjie

Poem

🐰 Convolutions bloom in ARM's embrace,
LayerNorms dance through channels' grace,
DeepSeek's eyes see pixels whole,
Tensors flatten, sizes scroll—
The warren grows, computation flows!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Title Check: ⚠️ Warning
Explanation: The pull request title "feat(deepseek-ocr): deepseek-ocr support(On working)" is partially related to the changeset. While the PR does include DeepSeek OCR model support files (configuration, encoder, tokenizer), the primary technical changes are the implementation of Conv2DOp and LayerNorm2DOp operations for the CPU backend with ARM NEON optimization, along with IR integration, Tensor API extensions (size() and flatten()), and RTTI updates. These core infrastructure changes represent the main focus of the changeset and would benefit other models beyond deepseek-ocr. The title captures the use case but obscures the fundamental operational additions that enable that support.
Resolution: Consider revising the title to emphasize the primary technical contribution, such as "feat: add Conv2D and LayerNorm2D operations with ARM NEON kernels and deepseek-ocr model support" or "feat(cpu): implement Conv2D and LayerNorm2D ops for deepseek-ocr". This would clarify that the PR introduces new operations to the backend infrastructure, with deepseek-ocr being the initial consumer model.

Docstring Coverage: ⚠️ Warning
Explanation: Docstring coverage is 2.63%, which is insufficient; the required threshold is 80.00%.
Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)

Comment @coderabbitai help to get the list of available commands and usage tips.

@chenghuaWang (Collaborator, Author) commented:

@coderabbitai review

@coderabbitai bot commented Oct 22, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai bot left a comment


Actionable comments posted: 30

🧹 Nitpick comments (32)
mllm/models/deepseek_ocr/modeling_deepseek_ocr.hpp (1)

1-2: Consider deferring empty placeholder files.

This file contains only the copyright header with no implementation. Since the PR is marked as "On working", consider deferring the commit of empty placeholder files until they contain actual code to reduce noise in the codebase.

mllm/models/deepseek_ocr/conversation.hpp (1)

1-2: Consider deferring empty placeholder files.

This file contains only the copyright header with no implementation. Consider deferring the commit of empty placeholder files until they contain actual code.

mllm/core/Tensor.hpp (2)

479-485: Consider improving the documentation for size().

The method declaration is fine, but the doc comment @brief is empty. Consider adding a description such as:

/**
 * @brief Gets the size of a specific dimension.
 *
 * @param id Dimension index (supports negative indexing)
 * @return int32_t Size of the specified dimension
 */

533-539: Consider improving the documentation for flatten().

The method declaration is fine and follows the existing pattern of using 0x7fffffff as a sentinel value (consistent with other methods in this file). However, the doc comment @brief is empty. Consider adding a description such as:

/**
 * @brief Flattens the tensor from the specified dimension onward.
 *
 * @param dim Starting dimension for flattening (default: flatten all dimensions)
 * @return Tensor Flattened tensor view
 */
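
A hedged usage sketch, assuming the negative-index and flatten-from-dim semantics described in these two comments (illustrative only):

auto t = Tensor::random({2, 3, 4, 5}, -1, 1, kFloat32, kCPU);
t.size(-1);    // 5, equivalent to t.size(3)
t.flatten(1);  // flattens dim 1 onward: shape becomes {2, 60}
t.flatten();   // default sentinel flattens all dims: shape {120}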
mllm/nn/layers/LayerNorm2D.hpp (1)

17-17: Remove ineffective const on parameter and avoid magic number default.

Const on by-value param is noise, and 1e-6 should be a named default.

Apply this diff:

-  explicit LayerNorm2D(const int32_t num_channels, float eps = 1e-6);
+  static constexpr float kDefaultEps = 1e-6f;
+  explicit LayerNorm2D(int32_t num_channels, float eps = kDefaultEps);
mllm/backends/cpu/ops/LayerNorm2DOp.cpp (1)

4-4: Remove unused include.

<cstring> isn’t used. Drop it to speed up builds slightly.

-#include <cstring>
mllm/nn/layers/Conv2D.hpp (2)

12-16: Prefer fixed-size containers for 2D parameters.

kernel_size/stride/padding/dilation are always 2D. Using std::array<int32_t, 2> conveys intent, avoids heap allocs, and prevents size bugs.

-  Conv2D(int32_t in_channels, int32_t out_channels, const std::vector<int32_t>& kernel_size,
-         const std::vector<int32_t>& stride_size, const std::vector<int32_t>& padding_size,
-         const std::vector<int32_t>& dilation_size, bool bias = true,
+  Conv2D(int32_t in_channels, int32_t out_channels, const std::array<int32_t,2>& kernel_size,
+         const std::array<int32_t,2>& stride_size, const std::array<int32_t,2>& padding_size,
+         const std::array<int32_t,2>& dilation_size, bool bias = true,
          aops::Conv2DOpImplType impl_type = aops::Conv2DOpImplType::kDefault);

19-21: Mark accessors noexcept.

These are non-throwing in normal operation; add noexcept to signal and enable minor optimizations.

-  [[nodiscard]] Tensor weight() const;
+  [[nodiscard]] Tensor weight() const noexcept;
 
-  [[nodiscard]] Tensor bias() const;
+  [[nodiscard]] Tensor bias() const noexcept;
mllm/backends/cpu/ops/Conv2DOp.cpp (3)

4-4: Remove unused include.

<cstring> isn’t referenced.

-#include <cstring>

86-97: Out_h/out_w derived from output; compute/validate explicitly.

Use input + params to compute out_h/out_w and verify against output.shape() to prevent silent shape mismatches.

-          int MATMUL_M = options_.out_channels;
-          int MATMUL_K = options_.in_channels * kernel_size[0] * kernel_size[1];
-          int MATMUL_N = output.shape()[2] * output.shape()[3];
+          const int MATMUL_M = options_.out_channels;
+          const int MATMUL_K = options_.in_channels * kernel_size[0] * kernel_size[1];
+          const int in_h = input.shape()[2], in_w = input.shape()[3];
+          const int out_h = (in_h + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) / stride[0] + 1;
+          const int out_w = (in_w + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1) / stride[1] + 1;
+          MLLM_CHECK(output.shape()[2] == out_h && output.shape()[3] == out_w, "Conv2D output shape mismatch");
+          int MATMUL_N = out_h * out_w; // batch handled below

105-126: Bias/arch paths fine; improve error messages and ensure batch-aware matmul.

  • After packing B batches, set MATMUL_N = B * out_h * out_w and pass that to matmul.
  • Minor: “Pls” → “Please”.
-            case aops::MatMulOpType::kBLAS: {
+            case aops::MatMulOpType::kBLAS: {
 #if defined(MLLM_USE_BLAS)
-              blas::matmul_fp32(weight_.ptr<mllm_fp32_t>(), packed_inputs.ptr<mllm_fp32_t>(), output.ptr<mllm_fp32_t>(),
-                                options_.bias ? bias_.ptr<mllm_fp32_t>() : nullptr, MATMUL_M, MATMUL_N, MATMUL_K, false, false);
+              blas::matmul_fp32(weight_.ptr<mllm_fp32_t>(), packed_inputs.ptr<mllm_fp32_t>(), output.ptr<mllm_fp32_t>(),
+                                options_.bias ? bias_.ptr<mllm_fp32_t>() : nullptr,
+                                MATMUL_M, /*N=*/B * MATMUL_N, MATMUL_K, false, false);
 #else
-              NYI("BLAS not supported. Pls set MLLM_USE_BLAS=ON to enable BLAS supports in cmake.");
+              NYI("BLAS not supported. Please set MLLM_USE_BLAS=ON to enable BLAS in CMake.");
 #endif
               break;
             }
             case aops::MatMulOpType::kMllmBlas: {
               auto thread_count = options_.getThreads();
 #if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
-              arm::mllm_blas_matmul_fp32(MATMUL_M, MATMUL_K, MATMUL_N, output.ptr<mllm_fp32_t>(), weight_.ptr<mllm_fp32_t>(),
-                                         packed_inputs.ptr<mllm_fp32_t>(), options_.bias ? bias_.ptr<mllm_fp32_t>() : nullptr,
+              arm::mllm_blas_matmul_fp32(MATMUL_M, MATMUL_K, /*N=*/B * MATMUL_N,
+                                         output.ptr<mllm_fp32_t>(), weight_.ptr<mllm_fp32_t>(),
+                                         packed_inputs.ptr<mllm_fp32_t>(),
+                                         options_.bias ? bias_.ptr<mllm_fp32_t>() : nullptr,
                                          false, false, thread_count);
mllm/backends/cpu/ops/Conv2DOp.hpp (1)

20-25: Minor: mark factory method noexcept/final.

Creation shouldn’t throw; also mark final for clarity.

-class CPUConv2DOpFactory : public TypedOpFactory<OpTypes::kConv2D, aops::Conv2DOpOptions> {
+class CPUConv2DOpFactory final : public TypedOpFactory<OpTypes::kConv2D, aops::Conv2DOpOptions> {
  public:
-  std::shared_ptr<BaseOp> createOpImpl(const aops::Conv2DOpOptions& options) override {
+  std::shared_ptr<BaseOp> createOpImpl(const aops::Conv2DOpOptions& options) noexcept override {
     return std::make_shared<CPUConv2DOp>(options);
   }
 };
mllm/backends/cpu/kernels/arm/conv2d.hpp (1)

21-31: Parameter types and const/noexcept polish.

Use fixed-width ints for dims and add noexcept; keeps ABI explicit and hints compiler.

-void conv2d_fp32_im2col_input(const float* input_data, const int channels, const int height, const int width,
-                              const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, const int stride_h,
-                              const int stride_w, const int dilation_h, const int dilation_w, float* col_data);
+void conv2d_fp32_im2col_input(const float* input_data,
+                              int32_t channels, int32_t height, int32_t width,
+                              int32_t kernel_h, int32_t kernel_w,
+                              int32_t pad_h, int32_t pad_w,
+                              int32_t stride_h, int32_t stride_w,
+                              int32_t dilation_h, int32_t dilation_w,
+                              float* col_data) noexcept;
@@
-void conv2d_fp32_im2col_weight(const float* src_weight, float* packed_weight, int out_channels, int in_channels, int kernel_h,
-                               int kernel_w);
+void conv2d_fp32_im2col_weight(const float* src_weight, float* packed_weight,
+                               int32_t out_channels, int32_t in_channels,
+                               int32_t kernel_h, int32_t kernel_w) noexcept;
mllm/core/aops/LayerNorm2DOp.hpp (2)

11-14: Initialize options and avoid magic number.

Give num_channels a safe default and name eps.

-struct LayerNorm2DOpOptions : public BaseOpOptions<LayerNorm2DOpOptions> {
-  int32_t num_channels;
-  float eps = 1e-6;
-};
+struct LayerNorm2DOpOptions : public BaseOpOptions<LayerNorm2DOpOptions> {
+  static constexpr float kDefaultEps = 1e-6f;
+  int32_t num_channels{-1};
+  float eps{kDefaultEps};
+};

32-36: Add const accessors to avoid unintended mutation.

Expose const refs alongside mutable ones.

-  inline Tensor& weight() { return weight_; }
-  inline Tensor& bias() { return bias_; }
-  inline LayerNorm2DOpOptions& options() { return options_; }
+  inline Tensor& weight() { return weight_; }
+  inline const Tensor& weight() const { return weight_; }
+  inline Tensor& bias() { return bias_; }
+  inline const Tensor& bias() const { return bias_; }
+  inline LayerNorm2DOpOptions& options() { return options_; }
+  inline const LayerNorm2DOpOptions& options() const { return options_; }
mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp (1)

45-50: Initialize BPE explicitly (clarity).

Make construction explicit and consider wiring file_path when implemented.

-  preprocessor::BPE bpe_;
+  preprocessor::BPE bpe_{};
mllm/backends/cpu/kernels/arm/layernorm2d.cpp (2)

60-79: Optional: readability of loop index 'c'.

Declare 'int c = 0;' once before NEON sections to avoid reusing scope across separated blocks.

-#if defined(__ARM_NEON)
-      float32x4_t inv_std_vec = vdupq_n_f32(inv_std);
-      c = 0;
+#if defined(__ARM_NEON)
+      int c = 0;
+      float32x4_t inv_std_vec = vdupq_n_f32(inv_std);

11-85: Optional: aliasing/perf hints.

If your toolchain supports it, annotate pointers with restrict (project macro) and consider a fused mean/var pass to reduce memory traffic. Not required for functionality.

mllm/core/aops/LayerNorm2DOp.cpp (2)

16-21: Validate pulled tensors before view; enforce channel count.

Protect against missing params and mismatched sizes before reshaping.

 void LayerNorm2DOp::load(const ParameterFile::ptr_t& ploader) {
-  weight_ = ploader->pull(getName() + ".weight");
-  weight_ = weight_.view({options_.num_channels});
-  bias_ = ploader->pull(getName() + ".bias");
-  bias_ = bias_.view({options_.num_channels});
+  weight_ = ploader->pull(getName() + ".weight");
+  bias_   = ploader->pull(getName() + ".bias");
+
+  if (!weight_) { MLLM_ERROR_EXIT(ExitCode::kCoreError, "Missing parameter: {}.weight", getName()); }
+  if (!bias_)   { MLLM_ERROR_EXIT(ExitCode::kCoreError, "Missing parameter: {}.bias", getName()); }
+
+  MLLM_RT_ASSERT_EQ(weight_.numel(), options_.num_channels);
+  MLLM_RT_ASSERT_EQ(bias_.numel(),   options_.num_channels);
+  weight_ = weight_.view({options_.num_channels});
+  bias_   = bias_.view({options_.num_channels});
 }

49-54: Avoid pushing empty params.

Skip pushing absent tensors.

-  p->push(getName() + ".weight", weight_);
-  p->push(getName() + ".bias", bias_);
+  if (weight_) p->push(getName() + ".weight", weight_);
+  if (bias_)   p->push(getName() + ".bias", bias_);
mllm/core/aops/Conv2DOp.cpp (3)

52-53: Remove unused local _op.

Avoid unused-variable warnings.

-  auto _op = ir_ctx->create<ir::linalg::Conv2DOp>(shared_from_this(), i_irs, o_irs);
+  ir_ctx->create<ir::linalg::Conv2DOp>(shared_from_this(), i_irs, o_irs);

63-68: Unreachable fallback after fatal error macro.

MLLM_ERROR_EXIT likely terminates; fallback tensor creation is dead code. Prefer one behavior.

-  if (ishape.size() != 4) {
-    MLLM_ERROR_EXIT(ExitCode::kCoreError, "Conv2DOp expects 4D input, got {} D", ishape.size());
-    outputs.emplace_back(Tensor::empty(i.shape(), i.dtype(), i.device()));
-    return;
-  }
+  if (ishape.size() != 4) {
+    MLLM_ERROR_EXIT(ExitCode::kCoreError, "Conv2DOp expects 4D input, got {} D", ishape.size());
+    return;
+  }

106-111: Skip pushing absent bias.

Avoid writing an empty bias tensor when options_.bias is true but bias_ wasn’t loaded.

-  p->push(getName() + ".weight", weight_);
-  if (options_.bias) { p->push(getName() + ".bias", bias_); }
+  if (weight_) p->push(getName() + ".weight", weight_);
+  if (options_.bias && bias_) { p->push(getName() + ".bias", bias_); }
tests/cpu/Conv2DKernelTest.hpp (4)

13-16: Reduce swappable-parameter risk in naive_conv2d.

Bundle shape/stride/pad/dilation into a struct to avoid call‑site mix‑ups and simplify the signature.

-void naive_conv2d(const float* input_data, const float* weight_data, const float* bias_data, float* output_data,
-                  int in_channels, int in_h, int in_w, int out_channels, int kernel_h, int kernel_w, int pad_h, int pad_w,
-                  int stride_h, int stride_w, int dilation_h, int dilation_w) {
+struct Conv2DArgs {
+  int in_channels, in_h, in_w, out_channels;
+  int kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w;
+};
+void naive_conv2d(const float* input_data, const float* weight_data, const float* bias_data, float* output_data,
+                  const Conv2DArgs& a) {

…and replace uses (in_channels → a.in_channels, etc.). Update the sole call site accordingly.


53-57: Avoid partially-initialized module; remove default ctor.

Conv2DModule() isn’t used and leaves conv2d_ unconfigured. Delete it to prevent accidental misuse.

-class Conv2DModule : public nn::Module {
+class Conv2DModule : public nn::Module {
   nn::Conv2D conv2d_;
 
  public:
-  Conv2DModule() = default;
+  Conv2DModule() = delete;

86-94: Don’t push bias when unused.

Keep test inputs aligned with op options.

-    param->push("emb.bias", bias_param);
+    if (bias) { param->push("emb.bias", bias_param); }

95-103: Add deterministic seed for reproducibility.

Reduce flaky diffs across runs.

-    auto input = Tensor::random({1, in_channel, I_H, I_W}, -1, 1, kFloat32, kCPU);
+    test::setRandomSeed(42);
+    auto input = Tensor::random({1, in_channel, I_H, I_W}, -1, 1, kFloat32, kCPU);

If test::setRandomSeed isn’t available, use the project’s equivalent or seed the RNG used by Tensor::random.

mllm/core/aops/Conv2DOp.hpp (1)

60-65: Offer const accessors to preserve invariants.

Expose read-only views by default; keep mutable ones only if needed.

-  inline Tensor& weight() { return weight_; }
-  inline Tensor& bias() { return bias_; }
-  inline Conv2DOpOptions& options() { return options_; }
+  inline const Tensor& weight() const { return weight_; }
+  inline const Tensor& bias()   const { return bias_; }
+  inline const Conv2DOpOptions& options() const { return options_; }
+  inline Tensor& weight() { return weight_; }   // keep if mutation needed
+  inline Tensor& bias()   { return bias_; }     // keep if mutation needed
+  inline Conv2DOpOptions& options() { return options_; }
mllm/models/deepseek_ocr/configuration_deepseek_ocr.hpp (1)

14-99: Harden config parsing with defaults/guards.

Several reads assume presence; add contains() checks or helper getters with defaults to avoid exceptions on missing keys.

I can draft a small get_or<T>(json, key, default) helper to wrap contains/is_null and casting across the repeated fields. Do you want that?
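
A minimal sketch of such a helper, assuming an nlohmann::json-style interface (hypothetical, not existing mllm code):

template <typename T>
T get_or(const nlohmann::json& j, const std::string& key, T fallback) {
  // Fall back when the key is absent or explicitly null.
  const auto it = j.find(key);
  if (it == j.end() || it->is_null()) return fallback;
  return it->get<T>();
}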

mllm/models/deepseek_ocr/deepencoder.hpp (3)

185-189: Avoid hard-coded 1024; use embed_dim_.

Keeps code future-proof if CLIP dims change.

-    auto class_embeds = class_embedding_.weight().view({1, 1, 1024});
+    auto class_embeds = class_embedding_.weight().view({1, 1, embed_dim_});

404-423: Reserved identifier: rename __interpolateLinear1d.

Leading double underscore is reserved; also update call site.

-  Tensor __interpolateLinear1d(const Tensor& input, int output_size) {
+  Tensor interpolateLinear1d(const Tensor& input, int output_size) {
@@
-      rel_pos_resized = __interpolateLinear1d(rel_pos, max_rel_dist);
+      rel_pos_resized = interpolateLinear1d(rel_pos, max_rel_dist);

Also applies to: 431-433


64-79: Prefer member-initializers and constants/config instead of magic numbers.

Move embed/image/patch sizes to compile-time constants or DpskOcrConfig; initialize members in the ctor initializer list.

-  CLIPVisionEmbeddings(const std::string& name, const DpskOcrConfig& config) : nn::Module(name) {
-    embed_dim_ = 1024;
-    image_size_ = 224;
-    patch_size_ = 14;
+  CLIPVisionEmbeddings(const std::string& name, const DpskOcrConfig& config)
+      : nn::Module(name),
+        embed_dim_(/*config.clip_embed_dim*/1024),
+        image_size_(/*config.clip_image_size*/224),
+        patch_size_(/*config.clip_patch*/14) {

If config already carries these, wire them instead of literals. Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6c000e and 74a989b.

📒 Files selected for processing (39)
  • mllm/backends/cpu/CPUBackend.cpp (2 hunks)
  • mllm/backends/cpu/kernels/Kernels.hpp (1 hunks)
  • mllm/backends/cpu/kernels/arm/conv2d.cpp (1 hunks)
  • mllm/backends/cpu/kernels/arm/conv2d.hpp (1 hunks)
  • mllm/backends/cpu/kernels/arm/layernorm2d.cpp (1 hunks)
  • mllm/backends/cpu/kernels/arm/layernorm2d.hpp (1 hunks)
  • mllm/backends/cpu/ops/Conv2DOp.cpp (1 hunks)
  • mllm/backends/cpu/ops/Conv2DOp.hpp (1 hunks)
  • mllm/backends/cpu/ops/LayerNorm2DOp.cpp (1 hunks)
  • mllm/backends/cpu/ops/LayerNorm2DOp.hpp (1 hunks)
  • mllm/compile/ir/GeneratedRTTIKind.hpp (2 hunks)
  • mllm/compile/ir/NodeRTTIClassOfImpl.hpp (2 hunks)
  • mllm/compile/ir/linalg/Op.cpp (1 hunks)
  • mllm/compile/ir/linalg/Op.hpp (2 hunks)
  • mllm/compile/ir/rtti_kind_gen.py (1 hunks)
  • mllm/core/OpTypes.hpp (2 hunks)
  • mllm/core/Tensor.cpp (2 hunks)
  • mllm/core/Tensor.hpp (2 hunks)
  • mllm/core/aops/Conv2DOp.cpp (1 hunks)
  • mllm/core/aops/Conv2DOp.hpp (1 hunks)
  • mllm/core/aops/LayerNorm2DOp.cpp (1 hunks)
  • mllm/core/aops/LayerNorm2DOp.hpp (1 hunks)
  • mllm/models/deepseek_ocr/README.md (1 hunks)
  • mllm/models/deepseek_ocr/configuration_deepseek_ocr.hpp (1 hunks)
  • mllm/models/deepseek_ocr/conversation.hpp (1 hunks)
  • mllm/models/deepseek_ocr/deepencoder.hpp (1 hunks)
  • mllm/models/deepseek_ocr/modeling_deepseek_ocr.hpp (1 hunks)
  • mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp (1 hunks)
  • mllm/nn/Functional.cpp (1 hunks)
  • mllm/nn/Functional.hpp (1 hunks)
  • mllm/nn/Module.hpp (1 hunks)
  • mllm/nn/Nn.hpp (2 hunks)
  • mllm/nn/layers/Conv2D.cpp (1 hunks)
  • mllm/nn/layers/Conv2D.hpp (1 hunks)
  • mllm/nn/layers/LayerNorm2D.cpp (1 hunks)
  • mllm/nn/layers/LayerNorm2D.hpp (1 hunks)
  • tests/cpu/Conv2DKernelTest.hpp (1 hunks)
  • tests/cpu/FlashAttentionKernelTest.hpp (0 hunks)
  • tests/cpu/KernelTest.cpp (1 hunks)
💤 Files with no reviewable changes (1)
  • tests/cpu/FlashAttentionKernelTest.hpp
🧰 Additional context used
🪛 Clang (14.0.6)
mllm/nn/layers/LayerNorm2D.hpp

[error] 6-6: 'mllm/nn/Layer.hpp' file not found

(clang-diagnostic-error)


[error] 17-17: parameter 'num_channels' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions

(readability-avoid-const-params-in-decls,-warnings-as-errors)


[error] 17-17: 1e-6 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/backends/cpu/kernels/arm/layernorm2d.hpp

[error] 6-6: 'mllm/utils/CPUArchHelper.hpp' file not found

(clang-diagnostic-error)

mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp

[error] 11-11: 'vector' file not found

(clang-diagnostic-error)


[error] 23-23: constructor does not initialize these fields: bpe_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

mllm/nn/layers/Conv2D.hpp

[error] 3-3: 'mllm/nn/Layer.hpp' file not found

(clang-diagnostic-error)

mllm/backends/cpu/kernels/arm/conv2d.cpp

[error] 3-3: 'mllm/backends/cpu/kernels/arm/conv2d.hpp' file not found

(clang-diagnostic-error)

mllm/backends/cpu/ops/LayerNorm2DOp.cpp

[error] 4-4: 'cstring' file not found

(clang-diagnostic-error)

mllm/nn/layers/LayerNorm2D.cpp

[error] 4-4: 'mllm/core/aops/LayerNorm2DOp.hpp' file not found

(clang-diagnostic-error)

mllm/core/aops/LayerNorm2DOp.hpp

[error] 6-6: 'mllm/core/BaseOp.hpp' file not found

(clang-diagnostic-error)


[error] 11-11: constructor does not initialize these fields: num_channels

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 13-13: 1e-6 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 39-39: member variable 'weight_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 40-40: member variable 'bias_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 41-41: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/backends/cpu/ops/Conv2DOp.cpp

[error] 4-4: 'cstring' file not found

(clang-diagnostic-error)

mllm/backends/cpu/kernels/arm/layernorm2d.cpp

[error] 3-3: 'mllm/backends/cpu/kernels/arm/conv2d.hpp' file not found

(clang-diagnostic-error)

mllm/compile/ir/linalg/Op.hpp

[error] 217-217: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 217-217: class 'LayerNorm2DOp' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/cpu/ops/LayerNorm2DOp.hpp

[error] 6-6: 'mllm/core/BaseOp.hpp' file not found

(clang-diagnostic-error)

mllm/nn/layers/Conv2D.cpp

[error] 1-1: 'mllm/nn/layers/Conv2D.hpp' file not found

(clang-diagnostic-error)

mllm/core/aops/Conv2DOp.hpp

[error] 6-6: 'mllm/core/BaseOp.hpp' file not found

(clang-diagnostic-error)


[error] 15-15: constructor does not initialize these fields: in_channels, out_channels, kernel_size, stride, padding, dilation

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 29-29: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 39-39: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 67-67: member variable 'weight_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 68-68: member variable 'bias_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 69-69: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/core/Tensor.hpp

[error] 539-539: 0x7fffffff is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/backends/cpu/kernels/arm/conv2d.hpp

[error] 6-6: 'mllm/utils/CPUArchHelper.hpp' file not found

(clang-diagnostic-error)

mllm/backends/cpu/ops/Conv2DOp.hpp

[error] 6-6: 'mllm/core/BaseOp.hpp' file not found

(clang-diagnostic-error)

mllm/models/deepseek_ocr/deepencoder.hpp

[error] 5-5: 'cmath' file not found

(clang-diagnostic-error)


[error] 51-51: constructor does not initialize these fields: embed_dim_, image_size_, patch_size_, class_embedding_, patch_embedding_, num_patches_, num_positions_, position_embedding_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 64-64: constructor does not initialize these fields: class_embedding_, patch_embedding_, position_embedding_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 64-64: 2 adjacent parameters of 'CLIPVisionEmbeddings' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 65-65: 'embed_dim_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 65-65: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 66-66: 'image_size_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 66-66: 224 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 67-67: 'patch_size_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 67-67: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 68-68: 'num_patches_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 69-69: 'num_positions_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 81-81: 2 adjacent parameters of 'getAbsPos' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 103-103: variable 'channels' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 108-108: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 109-109: variable name 'a' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 109-109: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 111-111: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 112-112: 2.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 112-112: 3.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 113-113: do not use 'else' after 'return'

(readability-else-after-return,-warnings-as-errors)


[error] 113-113: 2.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 113-113: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 114-114: 5.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 114-114: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 114-114: 8.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 114-114: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 114-114: 4.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 114-114: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 116-116: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 120-120: variable 'scale_y' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 121-121: variable 'scale_x' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 123-123: loop variable name 'c' is too short, expected at least 2 characters

(readability-identifier-length,-warnings-as-errors)


[error] 124-124: variable 'src_channel_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 125-125: variable 'dst_channel_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 163-163: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 169-169: repeated branch in conditional chain

(bugprone-branch-clone,-warnings-as-errors)


[error] 194-194: constructor does not initialize these fields: fc1_, fc2_, act_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 202-202: constructor does not initialize these fields: fc1_, fc2_, act_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 202-202: 4 adjacent parameters of 'NoTPFeedForward' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 208-208: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 213-213: constructor does not initialize these fields: num_heads_, n_local_heads_, head_dim_, max_seq_len_, qkv_proj_, out_proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 224-224: constructor does not initialize these fields: qkv_proj_, out_proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 224-224: 2 adjacent parameters of 'NoTPAttention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 225-225: 'num_heads_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 225-225: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 226-226: 'n_local_heads_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 226-226: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 227-227: 'head_dim_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 227-227: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 227-227: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 228-228: 'max_seq_len_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 228-228: 256 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 234-234: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 236-236: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 256-256: constructor does not initialize these fields: n_heads_, dim_, head_dim_, layer_norm1_, layer_norm2_, layer_id_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 266-266: member variable 'layer_id_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)


[error] 270-270: constructor does not initialize these fields: layer_norm1_, layer_norm2_, layer_id_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 270-270: 2 adjacent parameters of 'NoTPTransformerBlock' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 271-271: 'n_heads_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 271-271: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 272-272: 'dim_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 272-272: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 273-273: 'head_dim_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 273-273: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 273-273: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 280-280: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 281-281: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 283-283: variable name 'h' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 289-289: constructor does not initialize these fields: num_layers_, layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 296-296: constructor does not initialize these fields: layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 296-296: 2 adjacent parameters of 'NoTPTransformer' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 297-297: 'num_layers_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 297-297: 24 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 302-302: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 310-310: constructor does not initialize these fields: pre_layernorm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 318-318: constructor does not initialize these fields: pre_layernorm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 318-318: 2 adjacent parameters of 'VitModel' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 327-327: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 328-328: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 341-341: constructor does not initialize these fields: proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 347-347: constructor does not initialize these fields: proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 347-347: 2 adjacent parameters of 'PatchEmbed' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 352-352: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 353-353: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 359-359: constructor does not initialize these fields: lin1_, lin2_, act_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 367-367: constructor does not initialize these fields: lin1_, lin2_, act_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 367-367: 4 adjacent parameters of 'MLPBlock' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 373-373: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 378-378: constructor does not initialize these fields: num_heads_, use_rel_pos_, qkv_, proj_, rel_pos_h_, rel_pos_w_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 390-390: constructor does not initialize these fields: qkv_, proj_, rel_pos_h_, rel_pos_w_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 390-390: 3 adjacent parameters of 'Attention' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 391-391: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 393-393: 'num_heads_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 394-394: 'use_rel_pos_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 404-404: declaration uses identifier '__interpolateLinear1d', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)


[error] 404-404: 2 adjacent parameters of '__interpolateLinear1d' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 406-406: variable 'input_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 407-407: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)


[error] 410-410: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)


[error] 411-411: variable name 'x0' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 412-412: variable 'x1' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 412-412: variable name 'x1' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 413-413: variable name 'w1' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 413-413: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)


[error] 414-414: variable name 'w0' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 414-414: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 426-426: 2 adjacent parameters of 'getRelPos' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 428-428: variable 'rel_pos_resized' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 430-430: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)


[error] 439-439: variable 'q_scale' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 440-440: variable 'k_scale' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 446-446: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)


[error] 447-447: variable 'embedding_dim' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 452-452: variable 'relative_coord_float' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 453-453: variable 'relative_coord_long' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 458-458: loop variable name 'd' is too short, expected at least 2 characters

(readability-identifier-length,-warnings-as-errors)


[error] 470-470: 2 adjacent parameters of 'addDecomposedRelPos' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 470-470: 2 adjacent parameters of 'addDecomposedRelPos' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 475-475: variable name 'Rh' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 475-475: invalid case style for variable 'Rh'

(readability-identifier-naming,-warnings-as-errors)


[error] 476-476: variable name 'Rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 476-476: invalid case style for variable 'Rw'

(readability-identifier-naming,-warnings-as-errors)


[error] 478-478: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 478-478: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)


[error] 487-487: variable 'r_q_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 488-488: variable 'Rh_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 488-488: invalid case style for variable 'Rh_ptr'

(readability-identifier-naming,-warnings-as-errors)


[error] 489-489: variable 'rel_h_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 508-508: variable 'r_q_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 509-509: variable 'Rw_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 509-509: invalid case style for variable 'Rw_ptr'

(readability-identifier-naming,-warnings-as-errors)


[error] 510-510: variable 'rel_w_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 535-535: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 536-536: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 537-537: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 537-537: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)


[error] 538-538: variable name 'H' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 538-538: invalid case style for variable 'H'

(readability-identifier-naming,-warnings-as-errors)


[error] 539-539: variable name 'W' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 539-539: invalid case style for variable 'W'

(readability-identifier-naming,-warnings-as-errors)


[error] 580-580: constructor does not initialize these fields: norm1_, norm2_, window_size_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 590-590: constructor does not initialize these fields: norm1_, norm2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 590-590: 4 adjacent parameters of 'Block' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 591-591: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 599-599: 'window_size_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)


[error] 662-662: constructor does not initialize these fields: blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 668-668: constructor does not initialize these fields: blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 668-668: 4 adjacent parameters of 'Blocks' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 671-671: variable 'is_in' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)


[error] 672-672: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 681-681: constructor does not initialize these fields: pos_embed_, neck_, net_2_, net_3_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 692-692: constructor does not initialize these fields: pos_embed_, neck_, net_2_, net_3_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 692-692: 2 adjacent parameters of 'ImageEncoderViT' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 719-719: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 720-720: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/core/aops/LayerNorm2DOp.cpp

[error] 4-4: 'mllm/core/aops/LayerNorm2DOp.hpp' file not found

(clang-diagnostic-error)

mllm/core/aops/Conv2DOp.cpp

[error] 4-4: 'mllm/core/aops/Conv2DOp.hpp' file not found

(clang-diagnostic-error)

tests/cpu/Conv2DKernelTest.hpp

[error] 3-3: 'unordered_map' file not found

(clang-diagnostic-error)


[error] 13-13: function 'naive_conv2d' has cognitive complexity of 33 (threshold 25)

(readability-function-cognitive-complexity,-warnings-as-errors)


[error] 13-13: 2 adjacent parameters of 'naive_conv2d' of similar type ('const float *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 14-14: 2 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 14-14: 3 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 14-14: 2 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 14-14: 2 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 15-15: 2 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 21-21: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)


[error] 25-25: variable name 'iy' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 26-26: variable name 'ix' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 32-32: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)


[error] 32-32: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)


[error] 38-38: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)


[error] 41-41: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)


[error] 47-47: constructor does not initialize these fields: conv2d_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 53-53: constructor does not initialize these fields: conv2d_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 53-53: 3 adjacent parameters of 'Conv2DModule' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 53-53: 2 adjacent parameters of 'Conv2DModule' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 53-53: 2 adjacent parameters of 'Conv2DModule' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 59-59: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)


[error] 65-65: class 'Conv2DKernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)


[error] 70-70: method 'testConv2DOnce' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 73-73: invalid case style for variable 'I_H'

(readability-identifier-naming,-warnings-as-errors)


[error] 74-74: invalid case style for variable 'I_W'

(readability-identifier-naming,-warnings-as-errors)


[error] 75-75: invalid case style for variable 'K_H'

(readability-identifier-naming,-warnings-as-errors)


[error] 76-76: invalid case style for variable 'K_W'

(readability-identifier-naming,-warnings-as-errors)


[error] 77-77: invalid case style for variable 'S_H'

(readability-identifier-naming,-warnings-as-errors)


[error] 78-78: invalid case style for variable 'S_W'

(readability-identifier-naming,-warnings-as-errors)


[error] 79-79: invalid case style for variable 'P_H'

(readability-identifier-naming,-warnings-as-errors)


[error] 80-80: invalid case style for variable 'P_W'

(readability-identifier-naming,-warnings-as-errors)


[error] 98-98: variable name 'GT' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)


[error] 98-98: invalid case style for variable 'GT'

(readability-identifier-naming,-warnings-as-errors)


[error] 107-107: redundant boolean literal in conditional return statement

(readability-simplify-boolean-expr,-warnings-as-errors)


[error] 113-113: method 'testConv2D' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)


[error] 113-113: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp

[error] 186-186: function-like macro 'RTTI_RK_OP_LINALGIROP_LAYERNORM2DOP_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/models/deepseek_ocr/configuration_deepseek_ocr.hpp

[error] 5-5: 'mllm/core/aops/LinearOp.hpp' file not found

(clang-diagnostic-error)


[error] 11-11: constructor does not initialize these fields: clip_linear_impl_type, sam_linear_impl_type

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 14-14: constructor does not initialize these fields: clip_linear_impl_type, sam_linear_impl_type

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)


[error] 139-139: 3.7362 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

tests/cpu/KernelTest.cpp

[error] 873-873: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)


[error] 876-876: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 877-877: 224 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 878-878: 224 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 879-879: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 880-880: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 881-881: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)


[error] 882-882: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/nn/Functional.cpp

[error] 121-121: 4 adjacent parameters of 'scaledDotProductAttention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

🪛 markdownlint-cli2 (0.18.1)
mllm/models/deepseek_ocr/README.md

51-51: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-android
  • GitHub Check: build-macos

@@ -0,0 +1,89 @@
// Copyright (c) MLLM Team.
// Licensed under the MIT License.
#include "mllm/backends/cpu/kernels/arm/conv2d.hpp"

⚠️ Potential issue | 🔴 Critical

Remove unrelated conv2d include; include CPUArchHelper directly.

This header isn’t used here and caused a missing-header error in tools. Include the arch helper instead.

-#include "mllm/backends/cpu/kernels/arm/conv2d.hpp"
+#include "mllm/utils/CPUArchHelper.hpp"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include "mllm/backends/cpu/kernels/arm/conv2d.hpp"
#include "mllm/utils/CPUArchHelper.hpp"
🧰 Tools
🪛 Clang (14.0.6)

[error] 3-3: 'mllm/backends/cpu/kernels/arm/conv2d.hpp' file not found

(clang-diagnostic-error)

🤖 Prompt for AI Agents
In mllm/backends/cpu/kernels/arm/layernorm2d.cpp around line 3, remove the
unrelated #include "mllm/backends/cpu/kernels/arm/conv2d.hpp" and replace it
with the CPU arch helper header; add an include for the CPUArchHelper header
(e.g. #include "mllm/backends/cpu/cpu_arch_helper.hpp") so the file uses the
correct helper instead of pulling in conv2d.hpp.

Comment on lines +41 to +67
// Pack data
switch (options_.impl_type) {
case aops::Conv2DOpImplType::kDefault: {
// We will do im2col algorithm when using default impl. We will packing weight here.
MLLM_INFO("Packing Conv2D weight to im2col format. kh={}, kw={}, pw={}, ph={}, dw={}, dh={}, sw={}, sh={}",
kernel_size[0], kernel_size[1], padding[0], padding[1], dilation[0], dilation[1], stride[0], stride[1]);
auto packed_weight = Tensor::empty(
{
options_.out_channels,
options_.in_channels * kernel_size[0] * kernel_size[1],

},
weight_.dtype(), weight_.device())
.alloc();
#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
arm::conv2d_fp32_im2col_weight(weight_.ptr<mllm_fp32_t>(), packed_weight.ptr<mllm_fp32_t>(), options_.out_channels,
options_.in_channels, kernel_size[0], kernel_size[1]);
#else
MLLM_ERROR_EXIT(ExitCode::kCoreError, "Unsupported architecture for packing conv2d weight into im2col format.");
#endif
weight_ = packed_weight;
break;
}
default: {
NYI("Unsupported impl type")
}
}

🛠️ Refactor suggestion | 🟠 Major

ARM-only weight packing; add baseline fallback or guard earlier.

Currently exits on non-ARM. If x86 builds are supported anywhere, provide a scalar packing path or guard the op factory registration to avoid selecting Conv2D on unsupported hosts.

-#else
-      MLLM_ERROR_EXIT(ExitCode::kCoreError, "Unsupported architecture for packing conv2d weight into im2col format.");
+#else
+      NYI("conv2d weight packing (im2col) not implemented on this host arch");
+      return;
 #endif
🤖 Prompt for AI Agents
In mllm/backends/cpu/ops/Conv2DOp.cpp around lines 41 to 67, the current weight
packing only has an ARM-specific path and calls MLLM_ERROR_EXIT for other
architectures; implement a non-ARM fallback or prevent registration on non-ARM
hosts. Fix by either (A) adding a portable scalar im2col weight-packing
implementation using simple nested loops to transform weight_ into the packed
layout and use that when not on ARM, or (B) move the architecture check earlier
and guard the Conv2D op factory/registration so this implementation is never
selected on non-ARM builds; choose one approach and remove the MLLM_ERROR_EXIT,
ensuring weight_ is set to the packed tensor in all compiled configurations.

Comment on lines 98 to 104
#if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
// step 1. im2col inputs to tmp
auto packed_inputs = Tensor::empty({MATMUL_K, MATMUL_N}, input.dtype(), input.device()).alloc();
arm::conv2d_fp32_im2col_input(input.ptr<mllm_fp32_t>(), options_.in_channels, input.shape()[1], input.shape()[2],
kernel_size[0], kernel_size[1], padding[0], padding[1], stride[0], stride[1],
dilation[0], dilation[1], packed_inputs.ptr<mllm_fp32_t>());
// step 2. Do matmul

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Bug: wrong H/W arguments and batch dimension ignored in im2col.

  • You pass input.shape()[1] (C) where H is expected.
  • Batch dimension B is ignored, producing wrong results for B > 1.
-          auto packed_inputs = Tensor::empty({MATMUL_K, MATMUL_N}, input.dtype(), input.device()).alloc();
-          arm::conv2d_fp32_im2col_input(input.ptr<mllm_fp32_t>(), options_.in_channels, input.shape()[1], input.shape()[2],
-                                        kernel_size[0], kernel_size[1], padding[0], padding[1], stride[0], stride[1],
-                                        dilation[0], dilation[1], packed_inputs.ptr<mllm_fp32_t>());
+          const int B = input.shape()[0];
+          auto packed_inputs =
+              Tensor::empty({MATMUL_K, B * MATMUL_N}, input.dtype(), input.device()).alloc();
+          const int in_h = input.shape()[2], in_w = input.shape()[3];
+          const int out_h = output.shape()[2], out_w = output.shape()[3];
+          const int in_plane = options_.in_channels * in_h * in_w;
+          for (int b = 0; b < B; ++b) {
+            const mllm_fp32_t* in_ptr = input.ptr<mllm_fp32_t>() + b * in_plane;
+            mllm_fp32_t* out_ptr = packed_inputs.ptr<mllm_fp32_t>() + b * (MATMUL_K * out_h * out_w);
+            arm::conv2d_fp32_im2col_input(in_ptr, options_.in_channels, in_h, in_w,
+                                          kernel_size[0], kernel_size[1],
+                                          padding[0], padding[1],
+                                          stride[0], stride[1],
+                                          dilation[0], dilation[1],
+                                          out_ptr);
+          }

🏁 Script executed:

#!/bin/bash
# Confirm conv2d_fp32_im2col_input signature (param order)
rg -nP 'conv2d_fp32_im2col_input\s*\(' -S -C3 mllm | sed -n '1,120p'

Length of output: 2469


I've confirmed the function signature. Next, I checked the context around MATMUL_K, MATMUL_N, and the output tensor, and how packed_inputs is used after the im2col call.

Based on my verification:

Bug confirmed: Critical H/W parameter swap and batch dimension missing.

The review comment is correct. The function signature expects:

  • conv2d_fp32_im2col_input(input_data, channels, height, width, ...)

But the current call passes:

  • input.ptr(), options_.in_channels, input.shape()[1], input.shape()[2], ...

The call therefore passes mismatched dimensions (im2col expects one (C, H, W) image at a time):

  • input.shape()[1] is C (channels), not H
  • input.shape()[2] is H (height), not W
  • Correct should be input.shape()[2] for H and input.shape()[3] for W

Additionally, batch processing requires looping over batch samples per the standard im2col pattern, which the current code omits entirely—only the first batch element is processed.

The proposed fix correctly addresses both issues with batch-aware tensor allocation and per-batch im2col calls.

🤖 Prompt for AI Agents
In mllm/backends/cpu/ops/Conv2DOp.cpp around lines 98–104, the im2col call
passes wrong spatial dims and omits batching: swap the height/width arguments so
you pass input.shape()[2] as height and input.shape()[3] as width (not
shape()[1] and shape()[2]), allocate the packed_inputs tensor to include the
batch dimension (size MATMUL_K per image times batch) and then loop over the
batch: for each batch index compute the per-image input pointer offset, call
conv2d_fp32_im2col_input with that per-image pointer and correct H/W, and write
the output into the corresponding slice of packed_inputs so subsequent matmul
uses all images.
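For a quick correctness cross-check of the argument order, here is a naive per-image im2col reference (a sketch under the assumption of NCHW input and a [C * kh * kw, out_h * out_w] row-major column matrix; the actual kernel's packing may differ):

#include <cstddef>

// Naive im2col for one image; zero-padding outside the input bounds.
void im2col_ref(const float* in, int C, int H, int W, int kh, int kw,
                int ph, int pw, int sh, int sw, int dh, int dw, float* col) {
  const int out_h = (H + 2 * ph - dh * (kh - 1) - 1) / sh + 1;
  const int out_w = (W + 2 * pw - dw * (kw - 1) - 1) / sw + 1;
  std::size_t idx = 0;
  for (int c = 0; c < C; ++c)
    for (int ky = 0; ky < kh; ++ky)
      for (int kx = 0; kx < kw; ++kx)
        for (int oy = 0; oy < out_h; ++oy)
          for (int ox = 0; ox < out_w; ++ox) {
            const int iy = oy * sh - ph + ky * dh;
            const int ix = ox * sw - pw + kx * dw;
            col[idx++] = (iy < 0 || iy >= H || ix < 0 || ix >= W)
                             ? 0.0f
                             : in[(static_cast<std::size_t>(c) * H + iy) * W + ix];
          }
}

Passing C where H belongs silently changes out_h/out_w and every gather offset, which is why the swap corrupts results rather than failing loudly.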

Comment on lines +12 to +35
void CPULayerNorm2DOp::forward(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) {
auto& i = inputs[0];
auto& o = outputs[0];

auto i_shape = i.shape();
auto N = i_shape[0];
auto C = i_shape[1];
auto H = i_shape[2];
auto W = i_shape[3];

switch (i.dtype()) {
case kFloat32: {
#if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
NYI("Not impl for x86 64");
#elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
arm::layernorm2d_fp32(i.ptr<mllm_fp32_t>(), weight_.ptr<mllm_fp32_t>(), bias_.ptr<mllm_fp32_t>(), o.ptr<mllm_fp32_t>(), N,
C, H, W, options_.eps);
#endif
break;
}
default: {
NYI("Not support data type");
}
}

⚠️ Potential issue | 🔴 Critical

Add shape/dtype checks and handle optional gamma/beta.

  • Enforce 4D NCHW inputs and matching output shape.
  • Guard against missing/empty weight_/bias_ by passing nullptrs.
  • Validate C matches options_.num_channels if set.
  • Keep the ARM path; keep NYI on other architectures, but report a clearer error.
 void CPULayerNorm2DOp::forward(const std::vector<Tensor>& inputs, std::vector<Tensor>& outputs) {
-  auto& i = inputs[0];
-  auto& o = outputs[0];
+  MLLM_CHECK(inputs.size() == 1 && outputs.size() == 1, "LayerNorm2D expects 1 input and 1 output");
+  const auto& i = inputs[0];
+  auto& o = outputs[0];
+  MLLM_CHECK(i.ndim() == 4, "LayerNorm2D expects 4D NCHW input");
+  MLLM_CHECK(o.shape() == i.shape(), "LayerNorm2D output must match input shape");
 
   auto i_shape = i.shape();
   auto N = i_shape[0];
   auto C = i_shape[1];
   auto H = i_shape[2];
   auto W = i_shape[3];
 
   switch (i.dtype()) {
     case kFloat32: {
 #if defined(MLLM_HOST_ARCH_X86_64) || defined(MLLM_HOST_ARCH_X86)
-      NYI("Not impl for x86 64");
+      NYI("LayerNorm2D kFloat32 is not implemented on x86 yet");
 #elif defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
-      arm::layernorm2d_fp32(i.ptr<mllm_fp32_t>(), weight_.ptr<mllm_fp32_t>(), bias_.ptr<mllm_fp32_t>(), o.ptr<mllm_fp32_t>(), N,
-                            C, H, W, options_.eps);
+      const mllm_fp32_t* gamma = weight_.empty() ? nullptr : weight_.ptr<mllm_fp32_t>();
+      const mllm_fp32_t* beta  = bias_.empty()   ? nullptr : bias_.ptr<mllm_fp32_t>();
+      MLLM_CHECK(options_.num_channels == 0 || options_.num_channels == C,
+                 "LayerNorm2D: options_.num_channels={} mismatches input C={}", options_.num_channels, C);
+      MLLM_CHECK(options_.eps > 0.f, "LayerNorm2D: eps must be > 0");
+      arm::layernorm2d_fp32(i.ptr<mllm_fp32_t>(), gamma, beta, o.ptr<mllm_fp32_t>(), N, C, H, W, options_.eps);
 #endif
       break;
     }
     default: {
       NYI("Not support data type");
     }
   }
 }
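To make the expected semantics concrete for such checks, a scalar reference sketch (assuming NCHW input with statistics taken over C at each (n, h, w) location, SAM-style LayerNorm2d, and the optional gamma/beta handling proposed above; not the repo's kernel):

#include <cmath>
#include <cstddef>

void layernorm2d_ref(const float* in, const float* gamma, const float* beta,
                     float* out, int N, int C, int H, int W, float eps) {
  const std::size_t plane = static_cast<std::size_t>(H) * W;
  for (int n = 0; n < N; ++n) {
    for (std::size_t p = 0; p < plane; ++p) {
      const float* x = in + static_cast<std::size_t>(n) * C * plane + p;
      float* y = out + static_cast<std::size_t>(n) * C * plane + p;
      float mean = 0.0f;
      for (int c = 0; c < C; ++c) { mean += x[c * plane]; }
      mean /= static_cast<float>(C);
      float var = 0.0f;
      for (int c = 0; c < C; ++c) {
        const float d = x[c * plane] - mean;
        var += d * d;
      }
      var /= static_cast<float>(C);
      const float inv = 1.0f / std::sqrt(var + eps);
      for (int c = 0; c < C; ++c) {
        float v = (x[c * plane] - mean) * inv;
        if (gamma) { v *= gamma[c]; }
        if (beta) { v += beta[c]; }
        y[c * plane] = v;
      }
    }
  }
}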

if (options_.bias) { bias_ = ploader->pull(getName() + ".bias"); }
break;
}
default: NYI("Unsupported model file version")

⚠️ Potential issue | 🔴 Critical

Missing semicolon after NYI macro in default case.

This will fail to compile if NYI doesn’t inject a trailing semicolon. Add it explicitly.

-    default: NYI("Unsupported model file version")
+    default: NYI("Unsupported model file version");
📝 Committable suggestion


Suggested change
default: NYI("Unsupported model file version")
default: NYI("Unsupported model file version");
🤖 Prompt for AI Agents
In mllm/core/aops/Conv2DOp.cpp around line 36, the default case uses
NYI("Unsupported model file version") without a trailing semicolon which can
cause a compile error if the NYI macro doesn't supply one; add an explicit
semicolon after the NYI(...) call in the default case so it reads as a complete
statement.

Comment on lines 11 to 17
#include <vector>
#include <unordered_map>

#include "mllm/models/ARGeneration.hpp"
#include "mllm/preprocessor/tokenizers/BPE.hpp"
#include "mllm/preprocessor/tokenizers/Unicode.hpp"
#include "mllm/preprocessor/tokenizers/AutoTokenizer.hpp"

⚠️ Potential issue | 🟡 Minor

Include missing standard headers used by declarations.

Add <string> and <cstdint> for std::string/int64_t.

 #include <vector>
 #include <unordered_map>
+#include <string>
+#include <cstdint>
📝 Committable suggestion


Suggested change
#include <vector>
#include <unordered_map>
#include "mllm/models/ARGeneration.hpp"
#include "mllm/preprocessor/tokenizers/BPE.hpp"
#include "mllm/preprocessor/tokenizers/Unicode.hpp"
#include "mllm/preprocessor/tokenizers/AutoTokenizer.hpp"
#include <vector>
#include <unordered_map>
#include <string>
#include <cstdint>
#include "mllm/models/ARGeneration.hpp"
#include "mllm/preprocessor/tokenizers/BPE.hpp"
#include "mllm/preprocessor/tokenizers/Unicode.hpp"
#include "mllm/preprocessor/tokenizers/AutoTokenizer.hpp"
🧰 Tools
🪛 Clang (14.0.6)

[error] 11-11: 'vector' file not found

(clang-diagnostic-error)

🤖 Prompt for AI Agents
In mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp around lines 11 to 17,
the header is missing standard includes for types used in declarations; add
#include <string> and #include <cstdint> to the include block (place them
alongside <vector> and <unordered_map>) so std::string and int64_t are available
to the header.

Comment on lines 22 to 50
class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer {
explicit DpskOcrTokenizer(const std::string& file_path) { preprocessor::initLocal(); }

std::vector<std::wstring> _tokenize(const std::string& str) override {
// TODO
return {};
}

std::vector<std::wstring> tokenize(const std::string& str) override {
// TODO
return {};
}

std::wstring _detokenize(int64_t pos_idx) override {
// TODO
return L"";
}

std::wstring detokenize(int64_t pos_idx) override {
// TODO
return _detokenize(pos_idx);
}

Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }

private:
// For text
preprocessor::BPE bpe_;
};

⚠️ Potential issue | 🔴 Critical

Constructor and overrides are private; add public:.

Without a public section, you can't instantiate the tokenizer and overrides become inaccessible.

-class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer {
-  explicit DpskOcrTokenizer(const std::string& file_path) { preprocessor::initLocal(); }
+class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer {
+ public:
+  explicit DpskOcrTokenizer(const std::string& file_path)
+      : mllm::preprocessor::AutoTokenizer(), bpe_() {
+    preprocessor::initLocal();
+    (void)file_path; // TODO: load BPE from file_path
+  }
@@
-  Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }
+  Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }
📝 Committable suggestion


Suggested change
class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer {
explicit DpskOcrTokenizer(const std::string& file_path) { preprocessor::initLocal(); }
std::vector<std::wstring> _tokenize(const std::string& str) override {
// TODO
return {};
}
std::vector<std::wstring> tokenize(const std::string& str) override {
// TODO
return {};
}
std::wstring _detokenize(int64_t pos_idx) override {
// TODO
return L"";
}
std::wstring detokenize(int64_t pos_idx) override {
// TODO
return _detokenize(pos_idx);
}
Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }
private:
// For text
preprocessor::BPE bpe_;
};
class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer {
public:
explicit DpskOcrTokenizer(const std::string& file_path)
: mllm::preprocessor::AutoTokenizer(), bpe_() {
preprocessor::initLocal();
(void)file_path; // TODO: load BPE from file_path
}
std::vector<std::wstring> _tokenize(const std::string& str) override {
// TODO
return {};
}
std::vector<std::wstring> tokenize(const std::string& str) override {
// TODO
return {};
}
std::wstring _detokenize(int64_t pos_idx) override {
// TODO
return L"";
}
std::wstring detokenize(int64_t pos_idx) override {
// TODO
return _detokenize(pos_idx);
}
Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }
private:
// For text
preprocessor::BPE bpe_;
};
🧰 Tools
🪛 Clang (14.0.6)

[error] 23-23: constructor does not initialize these fields: bpe_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp around lines 22 to 50,
the class members (constructor and virtual override methods) are declared under
the default private section, making the tokenizer uninstantiable and overrides
inaccessible; move the constructor and all override methods
(DpskOcrTokenizer(const std::string&), _tokenize, tokenize, _detokenize,
detokenize, convert2Ids) under a public: section so they are publicly accessible
while keeping bpe_ private.

Comment on lines +121 to +128
Tensor scaledDotProductAttention(const Tensor& Q, const Tensor& K, const Tensor& V, const Tensor& mask) {
auto scale = Q.size(-1);
scale = (1.f / sqrtf(scale));
auto attn_weight = matmul(Q, K, false, true) * scale;
if (mask) { attn_weight = attn_weight + mask; }
attn_weight = softmax(attn_weight, -1);
return matmul(attn_weight, V);
}

🛠️ Refactor suggestion | 🟠 Major

Improve type clarity in scale computation.

The variable scale is reused for two different purposes (dimension size, then scaling factor), which hurts readability. Additionally, the implicit conversion from int32_t to float could be made explicit.

Apply this diff for clearer code:

 Tensor scaledDotProductAttention(const Tensor& Q, const Tensor& K, const Tensor& V, const Tensor& mask) {
-  auto scale = Q.size(-1);
-  scale = (1.f / sqrtf(scale));
+  const auto d_k = static_cast<float>(Q.size(-1));
+  const float scale = 1.0f / sqrtf(d_k);
   auto attn_weight = matmul(Q, K, false, true) * scale;
   if (mask) { attn_weight = attn_weight + mask; }
   attn_weight = softmax(attn_weight, -1);
   return matmul(attn_weight, V);
 }
📝 Committable suggestion


Suggested change
Tensor scaledDotProductAttention(const Tensor& Q, const Tensor& K, const Tensor& V, const Tensor& mask) {
auto scale = Q.size(-1);
scale = (1.f / sqrtf(scale));
auto attn_weight = matmul(Q, K, false, true) * scale;
if (mask) { attn_weight = attn_weight + mask; }
attn_weight = softmax(attn_weight, -1);
return matmul(attn_weight, V);
}
Tensor scaledDotProductAttention(const Tensor& Q, const Tensor& K, const Tensor& V, const Tensor& mask) {
const auto d_k = static_cast<float>(Q.size(-1));
const float scale = 1.0f / sqrtf(d_k);
auto attn_weight = matmul(Q, K, false, true) * scale;
if (mask) { attn_weight = attn_weight + mask; }
attn_weight = softmax(attn_weight, -1);
return matmul(attn_weight, V);
}
🧰 Tools
🪛 Clang (14.0.6)

[error] 121-121: 4 adjacent parameters of 'scaledDotProductAttention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

🤖 Prompt for AI Agents
In mllm/nn/Functional.cpp around lines 121 to 128, the variable `scale` is
reused first as a dimension and then as the floating-point scaling factor which
reduces readability and relies on implicit int-to-float conversion; change this
by introducing an integer dimension variable (e.g., `int dim = Q.size(-1);`) and
then compute a separate float scaling factor (e.g., `float scale = 1.f /
sqrtf(static_cast<float>(dim));`), and use that `scale` in the attention
computation; keep the rest of the logic unchanged.
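For context, a hypothetical call site (assuming the function lives in nn::functional alongside the other helpers in this file, that Q/K/V are [batch, heads, seq, head_dim], and that a nil tensor evaluates false in the if (mask) branch):

// Hypothetical usage; names and layout are assumptions, not the repo's API.
auto out = nn::functional::scaledDotProductAttention(Q, K, V, Tensor::nil());
// attn_weight is [..., seq_q, seq_k] because matmul(Q, K, false, true)
// transposes K's trailing dims; out then matches Q's shape.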

Comment on lines +25 to +27
Tensor Conv2D::weight() const { return std::static_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp())->weight(); }

Tensor Conv2D::bias() const { return std::static_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp())->bias(); }

⚠️ Potential issue | 🟠 Major

Use dynamic cast and check the underlying op type.

A static cast is undefined behavior if the instanced op is not actually an aops::Conv2DOp. Prefer a dynamic cast with an explicit check.

-Tensor Conv2D::weight() const { return std::static_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp())->weight(); }
-
-Tensor Conv2D::bias() const { return std::static_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp())->bias(); }
+Tensor Conv2D::weight() const {
+  auto op = std::dynamic_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp());
+  MLLM_CHECK(op, "Conv2D underlying op is not aops::Conv2DOp");
+  return op->weight();
+}
+
+Tensor Conv2D::bias() const {
+  auto op = std::dynamic_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp());
+  MLLM_CHECK(op, "Conv2D underlying op is not aops::Conv2DOp");
+  return op->bias();
+}
📝 Committable suggestion


Suggested change
Tensor Conv2D::weight() const { return std::static_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp())->weight(); }
Tensor Conv2D::bias() const { return std::static_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp())->bias(); }
Tensor Conv2D::weight() const {
auto op = std::dynamic_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp());
MLLM_CHECK(op, "Conv2D underlying op is not aops::Conv2DOp");
return op->weight();
}
Tensor Conv2D::bias() const {
auto op = std::dynamic_pointer_cast<aops::Conv2DOp>(impl()->getInstancedOp());
MLLM_CHECK(op, "Conv2D underlying op is not aops::Conv2DOp");
return op->bias();
}
🤖 Prompt for AI Agents
In mllm/nn/layers/Conv2D.cpp around lines 25 to 27, the code uses
std::static_pointer_cast to convert impl()->getInstancedOp() to aops::Conv2DOp
which can cause undefined behavior if the underlying op is not that type;
replace the static cast with std::dynamic_pointer_cast<aops::Conv2DOp>, check
the resulting pointer for null, and handle the error (e.g., throw a descriptive
std::runtime_error or assert) before calling weight() or bias() so you only
dereference when the cast succeeds.

Comment on lines 44 to 48
template<typename T>
class ModuleLists;
class ModuleList;

template<typename T>
class ModuleListsSuffix;
class ModuleListSuffix;

⚠️ Potential issue | 🔴 Critical

Fix the forward declaration mismatch.

The forward declaration at line 48 declares ModuleListSuffix, but the actual class definition at line 202 is named ModuleListSuffixed (with "ed" at the end). This naming inconsistency will cause compilation errors if the forward-declared type is referenced before its definition.

Apply this diff to align the forward declaration with the class name:

 template<typename T>
-class ModuleListSuffix;
+class ModuleListSuffixed;
📝 Committable suggestion


Suggested change
template<typename T>
class ModuleLists;
class ModuleList;
template<typename T>
class ModuleListsSuffix;
class ModuleListSuffix;
template<typename T>
class ModuleList;
template<typename T>
class ModuleListSuffixed;
🤖 Prompt for AI Agents
In mllm/nn/Module.hpp around lines 44 to 48, the forward declaration declares
ModuleListSuffix but the actual class is named ModuleListSuffixed; update the
forward declaration to match the real class name (declare template<typename T>
class ModuleListSuffixed;) so the forward declaration and definition are
consistent and avoid compilation errors.

…ng utilities

- Add `Conversation` class to manage prompt templates and conversation history
- Support multiple separator styles: DeepSeek, DeepSeekV2, PLAIN, ALIGNMENT
- Implement methods for generating prompts, converting to Gradio/OpenAI formats
- Add global registry for conversation templates with initialization function
- Include image loading functionality from conversation messages
- Refactor NoTPAttention module with proper tensor operations and attention mechanism
- Implement padding logic in Block class for window-based processing
- Add cubic interpolation and absolute position embedding resizing in ImageEncoderViT
… modes and padding strategies

Added comprehensive interpolation and padding support for CPU backend:
- InterpolateOp with nearest, linear, bilinear, bicubic, and trilinear modes
- PadOp with constant, reflect, replicate, and circular padding modes
- Updated CPU backend to register new operations
- Fixed conv2d weight packing and bias addition logic
- Added documentation for new functional APIs

The implementation supports various tensor dimensions and aligns with PyTorch-style behavior for consistent results.
- Implement `dynamic_preprocess` to generate image tiles with aspect ratio handling
- Add `Image::crop` with PIL-style out-of-bounds padding
- Introduce `ImageTransform` system with composable transforms (Resize, CenterCrop, etc.)
- Support common preprocessing pipelines via `BasicImageTransform`
- Enable thumbnail generation and grid-based tiling for OCR models

feat(tokenizer): remove unused header include

- Remove unnecessary `ARGeneration.hpp` include in tokenization header

chore(deepseek_ocr): add missing `<limits>` header

- Include `<limits>` for `std::numeric_limits` usage in aspect ratio calculation
- Replace `keep_aspect_ratio` option with `antialias` in interpolate operations
- Implement Gaussian blur utility for antialiasing in bilinear and bicubic modes (see the kernel sketch after this list)
- Update documentation to reflect the new `antialias` parameter
- Refactor internal interpolate functions to use new API with antialiasing
- Fix incorrect OpType in InterpolateOp constructor
- Simplify position embedding interpolation using new functional API
- Remove custom padding logic in favor of `nn::functional::pad`
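A minimal sketch of the separable-blur building block behind the antialias path (a normalized 1-D Gaussian kernel; the radius/sigma policy in the actual implementation is unknown and assumed here):

#include <cmath>
#include <vector>

// Normalized 1-D Gaussian weights; apply along H then W for a separable blur.
std::vector<float> gaussianKernel1D(float sigma, int radius) {
  std::vector<float> k(2 * radius + 1);
  float sum = 0.0f;
  for (int i = -radius; i <= radius; ++i) {
    const float v = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
    k[i + radius] = v;
    sum += v;
  }
  for (float& v : k) { v /= sum; }  // weights sum to 1
  return k;
}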
- Add `mlp_projector_linear_impl_type` to `DpskOcrConfig` for configuring
  the linear implementation type in MLP projector
- Update conversation preprocessing templates:
  - Replace `<|sft▁begin|>` with `<|sft begin|>`
  - Adjust separator tags and role markers in deepseek and deepseekv2 templates
- Refactor `dynamic_preprocess` to `dynamicPreprocess`:
  - Return both processed images and target aspect ratio
  - Improve tiling logic using candidate aspect ratios within min/max limits (see the grid-selection sketch below)
  - Support optional thumbnail generation based on block count
- Introduce `MlpProjector` module for mapping visual tokens to text token space
- Include necessary headers (`<set>`, `<algorithm>`, `<cmath>`) in preprocessing header
- Implement `formatMessages` function to process conversation JSON with trimming
  and role/content extraction
- Add `DeepseekOCRForCausalLM` class with basic inference logic and image handling
- Update tokenizer implementation to support special tokens and proper tokenization
- Remove unused `<unordered_map>` include in tokenizer header
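A minimal sketch of the candidate-grid selection behind dynamicPreprocess (hypothetical helper and naming; the real function also crops the tiles and optionally appends a thumbnail): it picks the (cols, rows) grid whose aspect ratio is closest to the image's among grids with min_blocks <= cols * rows <= max_blocks.

#include <cmath>
#include <limits>
#include <utility>

std::pair<int, int> bestGrid(int img_w, int img_h, int min_blocks, int max_blocks) {
  const float target = static_cast<float>(img_w) / static_cast<float>(img_h);
  std::pair<int, int> best{1, 1};
  float best_diff = std::numeric_limits<float>::max();
  for (int cols = 1; cols <= max_blocks; ++cols) {
    for (int rows = 1; rows <= max_blocks; ++rows) {
      const int blocks = cols * rows;
      if (blocks < min_blocks || blocks > max_blocks) { continue; }
      const float diff = std::fabs(static_cast<float>(cols) / static_cast<float>(rows) - target);
      if (diff < best_diff) {
        best_diff = diff;
        best = {cols, rows};
      }
    }
  }
  return best;
}

This also explains the <limits> chore above: std::numeric_limits seeds the best-difference search.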
- Add tokenizers-cpp as a submodule under mllm/ext/vendors
- Add opencv-mobile as a submodule under mllm/ext/vendors
- Update .gitmodules to include the new submodules
- Set update policy to 'none' for tokenizers-cpp
- Add README.md to document the purpose of mllm extension
- Add CMakeLists.txt for deepseek_ocr example
- Register deepseek_ocr example in examples/CMakeLists.txt
- Add llvm-project as a submodule with none update strategy
- Update conversation preprocessing with corrected special tokens
- Register conversation templates with optional end tokens
- Initialize templates in model inference and log processed prompt
… model

- Implement `StackOp` for CPU backend with optimized memory copy for contiguous tensors
  and fallback for non-contiguous cases
- Register `StackOp` in IR and core op system with proper RTTI and factory integration
- Update DeepSeek-OCR model preprocessing to use `stack` for image tensor batching
- Add `stack` functional API in `nn::functional`
- Extend tokenizer utilities and string splitting helpers for better text processing
- Introduce image padding utility that mirrors PIL's ImageOps.pad behavior
- Add extension options in CMake for future LLVM and tokenizers-cpp integrations
- Update example inference call with grounding prompt for OCR task
- Integrate utfcpp library for proper UTF-8 handling
- Replace wide string-based tokenization with UTF-8 compatible logic
- Update tokenizer to inherit from AutoTokenizerUTF8
- Add BPEUTF8 class for byte-level BPE tokenization with UTF-8 support
- Modify preprocessing logic to handle Unicode characters correctly
- Update CMakeLists.txt to include utfcpp headers and install rules

This change enables the DeepSeek OCR model to correctly process Unicode text
inputs, including CJK characters and other multibyte UTF-8 sequences, improving
internationalization support.
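The core conversion this migration enables is small; a sketch using utfcpp's checked API (utf8::utf8to32 is the library's real entry point, and throwing on invalid input is its documented behavior):

#include <iterator>
#include <string>
#include <vector>
#include <utf8.h>

// Decode a UTF-8 string into UTF-32 code points for code-point-level BPE.
std::vector<char32_t> toCodePoints(const std::string& s) {
  std::vector<char32_t> cps;
  utf8::utf8to32(s.begin(), s.end(), std::back_inserter(cps));  // throws utf8::invalid_utf8
  return cps;
}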
- Remove tokenizers-cpp submodule from .gitmodules
- Add new tokenizers submodule pointing to meta-pytorch/tokenizers
- Update README.md to reflect the change in tokenizer dependency
- Update subproject commit reference for the new tokenizers module
…support

- Add `DpskOcrTokenizer` with byte-level BPE tokenization
- Support encoding/decoding using UTF-8 code points
- Integrate regex-based pre-tokenizer from GPT-2 standard
- Update `AutoTokenizerUTF8` interface to include encode/decode methods
- Refactor `BPEUTF8` to work with UTF-32 code point vectors
- Add unicode processing utilities from llama.cpp
- Enable meta-torch-tokenizers extension in CMake

fix(tokenizer): replace tokenize with encode in deepseek-ocr model

- Use `encode` instead of `tokenize` for proper id conversion
- Adjust sequence mask generation accordingly

build(cmake): add extension support and tokenizer options

- Introduce `MLLM_EXT_ENABLE` option
- Rename `MLLM_EXT_ENABLE_TOKENIZERS_CPP` to `MLLM_EXT_ENABLE_META_TORCH_TOKENIZERS`
- Add subdirectory for external components when enabled

test(tokenizer): add print statements for tokenizer debugging

- Print tokenization and encoding results for various inputs
- Include space, newline, and unicode characters

chore(fmt): add formatter for vector<string> and vector<int64_t>

- Enable formatted printing of string and int64_t vectors
- Support cleaner debug output in examples
… tokens

- Add new special tokens including `<image>`, `<|grounding|>`, `<tr>`, and `</tr>` to
  `DpskOcrTokenizer`
- Implement `TrieUTF8` to properly handle UTF-8 encoded special tokens in tokenization
- Update `tokenize()` and `decode()` methods to correctly process and detokenize strings
  with special tokens
- Remove outdated debug print statements from example code
- Modify `fmt::format_to` to avoid extra quotes around vector elements
- Refactor tokenization logic to integrate regex and byte-level processing within
  special token handling

This change enhances the DeepSeek OCR tokenizer's ability to work with complex input
sequences involving special tokens and improves overall string handling.