feat(deepseek-ocr): deepseek-ocr support(On working) #486

chenghuaWang · 2025-10-22T14:27:14Z

Implement Conv2DOp and LayerNorm2DOp for CPU backend
Add ARM NEON optimized kernels for conv2d and layernorm2d
Integrate new ops into IR and op registration system
Extend Tensor API with size() and flatten() methods
Update RTTI and OpTypes to support new operations

Summary by CodeRabbit

New Features
- Added Conv2D and LayerNorm2D operations with CPU backend support, including ARM-optimized kernels.
- Integrated DeepSeek OCR model with vision encoder components.
- Added tensor utility methods and scaled dot-product attention function.
Tests
- Added Conv2D kernel tests with reference implementation for validation.

- Implement Conv2DOp and LayerNorm2DOp for CPU backend - Add ARM NEON optimized kernels for conv2d and layernorm2d - Integrate new ops into IR and op registration system - Extend Tensor API with size() and flatten() methods - Update RTTI and OpTypes to support new operations

coderabbitai · 2025-10-22T14:27:43Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds Conv2D and LayerNorm2D operations to the CPU backend with ARM-optimized kernels, introduces DeepSeek OCR model architecture including vision encoders and tokenizers, extends the Tensor API with utility methods, integrates new operations into the IR/RTTI system, and provides comprehensive test coverage for new kernels.

Changes

Cohort / File(s)	Summary
Conv2D CPU Backend `mllm/backends/cpu/CPUBackend.cpp`, `mllm/backends/cpu/ops/Conv2DOp.hpp`, `mllm/backends/cpu/ops/Conv2DOp.cpp`	Registers CPU Conv2D operation factory and implements CPU Conv2D op with load/forward logic, im2col-based weight packing, and architecture-specific BLAS or custom matmul paths
Conv2D ARM Kernels `mllm/backends/cpu/kernels/arm/conv2d.hpp`, `mllm/backends/cpu/kernels/arm/conv2d.cpp`	Implements ARM-optimized NEON-accelerated im2col transformation functions for input and weight data with padding, stride, and dilation support
Conv2D Core Operations `mllm/core/aops/Conv2DOp.hpp`, `mllm/core/aops/Conv2DOp.cpp`	Defines Conv2D abstract operation with parameter loading, IR tracing, shape inference, and parameter export; includes options struct and impl type enum
Conv2D Neural Network Layer `mllm/nn/layers/Conv2D.hpp`, `mllm/nn/layers/Conv2D.cpp`	Wraps Conv2D aop as a reusable neural network layer with parameterized constructor and weight/bias accessors
LayerNorm2D CPU Backend `mllm/backends/cpu/CPUBackend.cpp`, `mllm/backends/cpu/ops/LayerNorm2DOp.hpp`, `mllm/backends/cpu/ops/LayerNorm2DOp.cpp`	Registers CPU LayerNorm2D operation factory and implements CPU LayerNorm2D op with architecture-conditional dispatch (ARM vs x86 NYI)
LayerNorm2D ARM Kernels `mllm/backends/cpu/kernels/arm/layernorm2d.hpp`, `mllm/backends/cpu/kernels/arm/layernorm2d.cpp`	Implements ARM-optimized NEON-accelerated 2D layer normalization with channel-wise mean/variance computation and fused affine transformation
LayerNorm2D Core Operations `mllm/core/aops/LayerNorm2DOp.hpp`, `mllm/core/aops/LayerNorm2DOp.cpp`	Defines LayerNorm2D abstract operation with parameter loading, IR tracing, shape-preserving reshape, and parameter export
LayerNorm2D Neural Network Layer `mllm/nn/layers/LayerNorm2D.hpp`, `mllm/nn/layers/LayerNorm2D.cpp`	Wraps LayerNorm2D aop as a reusable layer with multiple constructor variants
IR/RTTI Updates `mllm/compile/ir/GeneratedRTTIKind.hpp`, `mllm/compile/ir/NodeRTTIClassOfImpl.hpp`, `mllm/compile/ir/linalg/Op.hpp`, `mllm/compile/ir/linalg/Op.cpp`, `mllm/compile/ir/rtti_kind_gen.py`	Adds LayerNorm2DOp to IR node kinds, RTTI macros, and linalg op declarations; updates code generator
Core Type Definitions `mllm/core/OpTypes.hpp`	Adds kLayerNorm2D enum value (60) and string mapping
Kernel Include Centralization `mllm/backends/cpu/kernels/Kernels.hpp`	Exports Conv2D and LayerNorm2D ARM kernel headers
Tensor API Extensions `mllm/core/Tensor.hpp`, `mllm/core/Tensor.cpp`	Adds `size(int32_t id)` accessor for dimension lookup and `flatten(int32_t dim)` method for shape flattening with support for negative indices
Functional API Extensions `mllm/nn/Functional.hpp`, `mllm/nn/Functional.cpp`	Adds `scaledDotProductAttention()` computing Q·K^T scaled softmax·V with optional masking
Module Forward Declarations `mllm/nn/Module.hpp`	Renames ModuleLists/ModuleListsSuffix forward declarations to ModuleList/ModuleListSuffix (singular)
Neural Network Module Exports `mllm/nn/Nn.hpp`	Exports Conv2D.hpp and LayerNorm2D.hpp for public API
DeepSeek OCR Model Configuration `mllm/models/deepseek_ocr/configuration_deepseek_ocr.hpp`	Defines DpskOcrConfig struct with nested LanguageConfig, ProjectorConfig, VisionConfig; parses model hyperparameters, token IDs, layer counts, and init types from configuration data
DeepSeek OCR Vision Encoders `mllm/models/deepseek_ocr/deepencoder.hpp`	Implements CLIP-style vision embeddings, Vision Transformer components (attention, MLP blocks, transformer layers), and SAM-style image encoder with positional embeddings and relative position handling
DeepSeek OCR Tokenizer `mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp`	Defines DpskOcrTokenizer stub with BPE backend; tokenization methods currently TODO/placeholder
DeepSeek OCR Documentation `mllm/models/deepseek_ocr/README.md`, `mllm/models/deepseek_ocr/conversation.hpp`, `mllm/models/deepseek_ocr/modeling_deepseek_ocr.hpp`	Adds model README with branding/links and licensing headers
Conv2D Kernel Tests `tests/cpu/Conv2DKernelTest.hpp`, `tests/cpu/KernelTest.cpp`	Implements naive reference conv2d, Conv2DModule wrapper, and Conv2DKernelTest harness; adds im2col test case (224×224 input, 14×14 kernel, stride 14)
Test Infrastructure `tests/cpu/FlashAttentionKernelTest.hpp`, `tests/cpu/KernelTest.cpp`	Removes PrefixCache.hpp dependency; integrates Conv2D tests into main test launcher with RUN_ALL_TESTS and resource reporting

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CPUConv2DOp
    participant ARM Kernels
    participant BLAS
    participant Output

    User->>CPUConv2DOp: load(weights, bias)
    CPUConv2DOp->>ARM Kernels: conv2d_fp32_im2col_weight(src, packed_weights)
    ARM Kernels-->>CPUConv2DOp: packed_weights

    User->>CPUConv2DOp: forward(input)
    CPUConv2DOp->>ARM Kernels: conv2d_fp32_im2col_input(input → col_data)
    ARM Kernels-->>CPUConv2DOp: col_data
    
    CPUConv2DOp->>BLAS: matmul(packed_weights, col_data)
    BLAS-->>CPUConv2DOp: output_gemm
    
    CPUConv2DOp->>CPUConv2DOp: apply_bias(output_gemm)
    CPUConv2DOp-->>Output: final_output

sequenceDiagram
    participant Input
    participant LayerNorm2DOp
    participant ARM NEON
    participant Output

    Input->>LayerNorm2DOp: forward(x, weight, bias)
    Note over LayerNorm2DOp: Iterate per (batch, spatial_pos)
    
    LayerNorm2DOp->>ARM NEON: compute_mean_channels(x)
    ARM NEON-->>LayerNorm2DOp: mean_per_pos
    
    LayerNorm2DOp->>ARM NEON: compute_variance_channels(x, mean)
    ARM NEON-->>LayerNorm2DOp: variance_per_pos
    
    LayerNorm2DOp->>LayerNorm2DOp: inv_std = 1/sqrt(var + eps)
    
    LayerNorm2DOp->>ARM NEON: normalize_affine((x-mean)*inv_std*weight+bias)
    ARM NEON-->>LayerNorm2DOp: normalized
    
    LayerNorm2DOp-->>Output: y

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: This PR spans heterogeneous components requiring distinct reasoning paths: ARM NEON kernel optimizations with vectorized intrinsics (medium density), multi-layer operation integration across aops/backends/nn (structural consistency), DeepSeek model architecture with transformer blocks and attention mechanisms (high logic density), and IR/RTTI registration patterns (straightforward but multi-file). The change count exceeds 30 files with mixed complexity (simple includes, intricate kernels, architecture definitions). While many changes follow established patterns, the NEON optimizations, Conv2D im2col logic, LayerNorm2D channel-wise statistics, and Vision Transformer blocks require careful verification.

Suggested reviewers

oreomaker
yirongjie

Poem

🐰 Convolutions bloom in ARM's embrace,
LayerNorms dance through channels' grace,
DeepSeek's eyes see pixels whole,
Tensors flatten, sizes scroll—
The warren grows, computation flows!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title Check	⚠️ Warning	The pull request title "feat(deepseek-ocr): deepseek-ocr support(On working)" is partially related to the changeset. While the PR does include DeepSeek OCR model support files (configuration, encoder, tokenizer), the primary technical changes are the implementation of Conv2DOp and LayerNorm2DOp operations for the CPU backend with ARM NEON optimization, along with IR integration, Tensor API extensions (size() and flatten()), and RTTI updates. These core infrastructure changes represent the main focus of the changeset and would benefit other models beyond deepseek-ocr. The title captures the use case but obscures the fundamental operational additions that enable that support.	Consider revising the title to emphasize the primary technical contribution, such as "feat: add Conv2D and LayerNorm2D operations with ARM NEON kernels and deepseek-ocr model support" or "feat(cpu): implement Conv2D and LayerNorm2D ops for deepseek-ocr". This would clarify that the PR introduces new operations to the backend infrastructure, with deepseek-ocr being the initial consumer model.
Docstring Coverage	⚠️ Warning	Docstring coverage is 2.63% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chenghuaWang · 2025-10-22T14:27:57Z

@coderabbitai review

coderabbitai · 2025-10-22T14:28:11Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 30

🧹 Nitpick comments (32)

mllm/models/deepseek_ocr/modeling_deepseek_ocr.hpp (1)

1-2: Consider deferring empty placeholder files.

This file contains only the copyright header with no implementation. Since the PR is marked as "On working", consider deferring the commit of empty placeholder files until they contain actual code to reduce noise in the codebase.

mllm/models/deepseek_ocr/conversation.hpp (1)

1-2: Consider deferring empty placeholder files.

This file contains only the copyright header with no implementation. Consider deferring the commit of empty placeholder files until they contain actual code.

mllm/core/Tensor.hpp (2)

479-485: Consider improving the documentation for size().

The method declaration is fine, but the doc comment @brief is empty. Consider adding a description such as:
/**
 * @brief Gets the size of a specific dimension.
 *
 * @param id Dimension index (supports negative indexing)
 * @return int32_t Size of the specified dimension
 */
533-539: Consider improving the documentation for flatten().

The method declaration is fine and follows the existing pattern of using 0x7fffffff as a sentinel value (consistent with other methods in this file). However, the doc comment @brief is empty. Consider adding a description such as:
/**
 * @brief Flattens the tensor from the specified dimension onward.
 *
 * @param dim Starting dimension for flattening (default: flatten all dimensions)
 * @return Tensor Flattened tensor view
 */

mllm/nn/layers/LayerNorm2D.hpp (1)

17-17: Remove ineffective const on parameter and avoid magic number default.

Const on by-value param is noise, and 1e-6 should be a named default.

Apply this diff:
-  explicit LayerNorm2D(const int32_t num_channels, float eps = 1e-6);
+  static constexpr float kDefaultEps = 1e-6f;
+  explicit LayerNorm2D(int32_t num_channels, float eps = kDefaultEps);

mllm/backends/cpu/ops/LayerNorm2DOp.cpp (1)

4-4: Remove unused include.

isn’t used. Drop it to speed up builds slightly.
-#include <cstring>

mllm/nn/layers/Conv2D.hpp (2)

12-16: Prefer fixed-size containers for 2D parameters.

kernel_size/stride/padding/dilation are always 2D. Using std::array<int32_t, 2> conveys intent, avoids heap allocs, and prevents size bugs.

-  Conv2D(int32_t in_channels, int32_t out_channels, const std::vector<int32_t>& kernel_size,
-         const std::vector<int32_t>& stride_size, const std::vector<int32_t>& padding_size,
-         const std::vector<int32_t>& dilation_size, bool bias = true,
+  Conv2D(int32_t in_channels, int32_t out_channels, const std::array<int32_t,2>& kernel_size,
+         const std::array<int32_t,2>& stride_size, const std::array<int32_t,2>& padding_size,
+         const std::array<int32_t,2>& dilation_size, bool bias = true,
          aops::Conv2DOpImplType impl_type = aops::Conv2DOpImplType::kDefault);

19-21: Mark accessors noexcept.

These are non-throwing in normal operation; add noexcept to signal and enable minor optimizations.

-  [[nodiscard]] Tensor weight() const;
+  [[nodiscard]] Tensor weight() const noexcept;
 
-  [[nodiscard]] Tensor bias() const;
+  [[nodiscard]] Tensor bias() const noexcept;

mllm/backends/cpu/ops/Conv2DOp.cpp (3)

4-4: Remove unused include.

isn’t referenced.

-#include <cstring>

86-97: Out_h/out_w derived from output; compute/validate explicitly.

Use input + params to compute out_h/out_w and verify against output.shape() to prevent silent shape mismatches.

-          int MATMUL_M = options_.out_channels;
-          int MATMUL_K = options_.in_channels * kernel_size[0] * kernel_size[1];
-          int MATMUL_N = output.shape()[2] * output.shape()[3];
+          const int MATMUL_M = options_.out_channels;
+          const int MATMUL_K = options_.in_channels * kernel_size[0] * kernel_size[1];
+          const int in_h = input.shape()[2], in_w = input.shape()[3];
+          const int out_h = (in_h + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) / stride[0] + 1;
+          const int out_w = (in_w + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1) / stride[1] + 1;
+          MLLM_CHECK(output.shape()[2] == out_h && output.shape()[3] == out_w, "Conv2D output shape mismatch");
+          int MATMUL_N = out_h * out_w; // batch handled below

105-126: Bias/arch paths fine; improve error messages and ensure batch-aware matmul.

After packing B batches, set MATMUL_N = B * out_h * out_w and pass that to matmul.
Minor: “Pls” → “Please”.

-            case aops::MatMulOpType::kBLAS: {
+            case aops::MatMulOpType::kBLAS: {
 #if defined(MLLM_USE_BLAS)
-              blas::matmul_fp32(weight_.ptr<mllm_fp32_t>(), packed_inputs.ptr<mllm_fp32_t>(), output.ptr<mllm_fp32_t>(),
-                                options_.bias ? bias_.ptr<mllm_fp32_t>() : nullptr, MATMUL_M, MATMUL_N, MATMUL_K, false, false);
+              blas::matmul_fp32(weight_.ptr<mllm_fp32_t>(), packed_inputs.ptr<mllm_fp32_t>(), output.ptr<mllm_fp32_t>(),
+                                options_.bias ? bias_.ptr<mllm_fp32_t>() : nullptr,
+                                MATMUL_M, /*N=*/B * MATMUL_N, MATMUL_K, false, false);
 #else
-              NYI("BLAS not supported. Pls set MLLM_USE_BLAS=ON to enable BLAS supports in cmake.");
+              NYI("BLAS not supported. Please set MLLM_USE_BLAS=ON to enable BLAS in CMake.");
 #endif
               break;
             }
             case aops::MatMulOpType::kMllmBlas: {
               auto thread_count = options_.getThreads();
 #if defined(MLLM_HOST_ARCH_ARM64) || defined(MLLM_HOST_ARCH_ARM)
-              arm::mllm_blas_matmul_fp32(MATMUL_M, MATMUL_K, MATMUL_N, output.ptr<mllm_fp32_t>(), weight_.ptr<mllm_fp32_t>(),
-                                         packed_inputs.ptr<mllm_fp32_t>(), options_.bias ? bias_.ptr<mllm_fp32_t>() : nullptr,
+              arm::mllm_blas_matmul_fp32(MATMUL_M, MATMUL_K, /*N=*/B * MATMUL_N,
+                                         output.ptr<mllm_fp32_t>(), weight_.ptr<mllm_fp32_t>(),
+                                         packed_inputs.ptr<mllm_fp32_t>(),
+                                         options_.bias ? bias_.ptr<mllm_fp32_t>() : nullptr,
                                          false, false, thread_count);

mllm/backends/cpu/ops/Conv2DOp.hpp (1)

20-25: Minor: mark factory method noexcept/final.

Creation shouldn’t throw; also mark final for clarity.

-class CPUConv2DOpFactory : public TypedOpFactory<OpTypes::kConv2D, aops::Conv2DOpOptions> {
+class CPUConv2DOpFactory final : public TypedOpFactory<OpTypes::kConv2D, aops::Conv2DOpOptions> {
  public:
-  std::shared_ptr<BaseOp> createOpImpl(const aops::Conv2DOpOptions& options) override {
+  std::shared_ptr<BaseOp> createOpImpl(const aops::Conv2DOpOptions& options) noexcept override {
     return std::make_shared<CPUConv2DOp>(options);
   }
 };

mllm/backends/cpu/kernels/arm/conv2d.hpp (1)

21-31: Parameter types and const/noexcept polish.

Use fixed-width ints for dims and add noexcept; keeps ABI explicit and hints compiler.

-void conv2d_fp32_im2col_input(const float* input_data, const int channels, const int height, const int width,
-                              const int kernel_h, const int kernel_w, const int pad_h, const int pad_w, const int stride_h,
-                              const int stride_w, const int dilation_h, const int dilation_w, float* col_data);
+void conv2d_fp32_im2col_input(const float* input_data,
+                              int32_t channels, int32_t height, int32_t width,
+                              int32_t kernel_h, int32_t kernel_w,
+                              int32_t pad_h, int32_t pad_w,
+                              int32_t stride_h, int32_t stride_w,
+                              int32_t dilation_h, int32_t dilation_w,
+                              float* col_data) noexcept;
@@
-void conv2d_fp32_im2col_weight(const float* src_weight, float* packed_weight, int out_channels, int in_channels, int kernel_h,
-                               int kernel_w);
+void conv2d_fp32_im2col_weight(const float* src_weight, float* packed_weight,
+                               int32_t out_channels, int32_t in_channels,
+                               int32_t kernel_h, int32_t kernel_w) noexcept;

mllm/core/aops/LayerNorm2DOp.hpp (2)

11-14: Initialize options and avoid magic number.

Give num_channels a safe default and name eps.

-struct LayerNorm2DOpOptions : public BaseOpOptions<LayerNorm2DOpOptions> {
-  int32_t num_channels;
-  float eps = 1e-6;
-};
+struct LayerNorm2DOpOptions : public BaseOpOptions<LayerNorm2DOpOptions> {
+  static constexpr float kDefaultEps = 1e-6f;
+  int32_t num_channels{-1};
+  float eps{kDefaultEps};
+};

32-36: Add const accessors to avoid unintended mutation.

Expose const refs alongside mutable ones.

-  inline Tensor& weight() { return weight_; }
-  inline Tensor& bias() { return bias_; }
-  inline LayerNorm2DOpOptions& options() { return options_; }
+  inline Tensor& weight() { return weight_; }
+  inline const Tensor& weight() const { return weight_; }
+  inline Tensor& bias() { return bias_; }
+  inline const Tensor& bias() const { return bias_; }
+  inline LayerNorm2DOpOptions& options() { return options_; }
+  inline const LayerNorm2DOpOptions& options() const { return options_; }

mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp (1)

45-50: Initialize BPE explicitly (clarity).

Make construction explicit and consider wiring file_path when implemented.
-  preprocessor::BPE bpe_;
+  preprocessor::BPE bpe_{};

mllm/backends/cpu/kernels/arm/layernorm2d.cpp (2)

60-79: Optional: readability of loop index 'c'.

Declare 'int c = 0;' once before NEON sections to avoid reusing scope across separated blocks.
-#if defined(__ARM_NEON)
-      float32x4_t inv_std_vec = vdupq_n_f32(inv_std);
-      c = 0;
+#if defined(__ARM_NEON)
+      int c = 0;
+      float32x4_t inv_std_vec = vdupq_n_f32(inv_std);
11-85: Optional: aliasing/perf hints.

If your toolchain supports it, annotate pointers with restrict (project macro) and consider a fused mean/var pass to reduce memory traffic. Not required for functionality.

mllm/core/aops/LayerNorm2DOp.cpp (2)

16-21: Validate pulled tensors before view; enforce channel count.

Protect against missing params and mismatched sizes before reshaping.

 void LayerNorm2DOp::load(const ParameterFile::ptr_t& ploader) {
-  weight_ = ploader->pull(getName() + ".weight");
-  weight_ = weight_.view({options_.num_channels});
-  bias_ = ploader->pull(getName() + ".bias");
-  bias_ = bias_.view({options_.num_channels});
+  weight_ = ploader->pull(getName() + ".weight");
+  bias_   = ploader->pull(getName() + ".bias");
+
+  if (!weight_) { MLLM_ERROR_EXIT(ExitCode::kCoreError, "Missing parameter: {}.weight", getName()); }
+  if (!bias_)   { MLLM_ERROR_EXIT(ExitCode::kCoreError, "Missing parameter: {}.bias", getName()); }
+
+  MLLM_RT_ASSERT_EQ(weight_.numel(), options_.num_channels);
+  MLLM_RT_ASSERT_EQ(bias_.numel(),   options_.num_channels);
+  weight_ = weight_.view({options_.num_channels});
+  bias_   = bias_.view({options_.num_channels});
 }

49-54: Avoid pushing empty params.

Skip pushing absent tensors.

-  p->push(getName() + ".weight", weight_);
-  p->push(getName() + ".bias", bias_);
+  if (weight_) p->push(getName() + ".weight", weight_);
+  if (bias_)   p->push(getName() + ".bias", bias_);

mllm/core/aops/Conv2DOp.cpp (3)

52-53: Remove unused local _op.

Avoid unused-variable warnings.

-  auto _op = ir_ctx->create<ir::linalg::Conv2DOp>(shared_from_this(), i_irs, o_irs);
+  ir_ctx->create<ir::linalg::Conv2DOp>(shared_from_this(), i_irs, o_irs);

63-68: Unreachable fallback after fatal error macro.

MLLM_ERROR_EXIT likely terminates; fallback tensor creation is dead code. Prefer one behavior.

-  if (ishape.size() != 4) {
-    MLLM_ERROR_EXIT(ExitCode::kCoreError, "Conv2DOp expects 4D input, got {} D", ishape.size());
-    outputs.emplace_back(Tensor::empty(i.shape(), i.dtype(), i.device()));
-    return;
-  }
+  if (ishape.size() != 4) {
+    MLLM_ERROR_EXIT(ExitCode::kCoreError, "Conv2DOp expects 4D input, got {} D", ishape.size());
+    return;
+  }

106-111: Skip pushing absent bias.

Avoid writing an empty bias tensor when options_.bias is true but bias_ wasn’t loaded.

-  p->push(getName() + ".weight", weight_);
-  if (options_.bias) { p->push(getName() + ".bias", bias_); }
+  if (weight_) p->push(getName() + ".weight", weight_);
+  if (options_.bias && bias_) { p->push(getName() + ".bias", bias_); }

tests/cpu/Conv2DKernelTest.hpp (4)

13-16: Reduce swappable-parameter risk in naive_conv2d.

Bundle shape/stride/pad/dilation into a struct to avoid call‑site mix‑ups and simplify the signature.
-void naive_conv2d(const float* input_data, const float* weight_data, const float* bias_data, float* output_data,
-                  int in_channels, int in_h, int in_w, int out_channels, int kernel_h, int kernel_w, int pad_h, int pad_w,
-                  int stride_h, int stride_w, int dilation_h, int dilation_w) {
+struct Conv2DArgs {
+  int in_channels, in_h, in_w, out_channels;
+  int kernel_h, kernel_w, pad_h, pad_w, stride_h, stride_w, dilation_h, dilation_w;
+};
+void naive_conv2d(const float* input_data, const float* weight_data, const float* bias_data, float* output_data,
+                  const Conv2DArgs& a) {
…and replace uses (in_channels → a.in_channels, etc.). Update the sole call site accordingly.

53-57: Avoid partially-initialized module; remove default ctor.

Conv2DModule() isn’t used and leaves conv2d_ unconfigured. Delete it to prevent accidental misuse.
-class Conv2DModule : public nn::Module {
+class Conv2DModule : public nn::Module {
   nn::Conv2D conv2d_;
 
  public:
-  Conv2DModule() = default;
+  Conv2DModule() = delete;
86-94: Don’t push bias when unused.

Keep test inputs aligned with op options.
-    param->push("emb.bias", bias_param);
+    if (bias) { param->push("emb.bias", bias_param); }
95-103: Add deterministic seed for reproducibility.

Reduce flaky diffs across runs.
-    auto input = Tensor::random({1, in_channel, I_H, I_W}, -1, 1, kFloat32, kCPU);
+    test::setRandomSeed(42);
+    auto input = Tensor::random({1, in_channel, I_H, I_W}, -1, 1, kFloat32, kCPU);
If test::setRandomSeed isn’t available, use the project’s equivalent or seed the RNG used by Tensor::random.

mllm/core/aops/Conv2DOp.hpp (1)

60-65: Offer const accessors to preserve invariants.

Expose read-only views by default; keep mutable ones only if needed.

-  inline Tensor& weight() { return weight_; }
-  inline Tensor& bias() { return bias_; }
-  inline Conv2DOpOptions& options() { return options_; }
+  inline const Tensor& weight() const { return weight_; }
+  inline const Tensor& bias()   const { return bias_; }
+  inline const Conv2DOpOptions& options() const { return options_; }
+  inline Tensor& weight() { return weight_; }   // keep if mutation needed
+  inline Tensor& bias()   { return bias_; }     // keep if mutation needed
+  inline Conv2DOpOptions& options() { return options_; }

mllm/models/deepseek_ocr/configuration_deepseek_ocr.hpp (1)

14-99: Harden config parsing with defaults/guards.

Several reads assume presence; add contains() checks or helper getters with defaults to avoid exceptions on missing keys.

I can draft a small get_or<T>(json, key, default) helper to wrap contains/is_null and casting across the repeated fields. Do you want that?

mllm/models/deepseek_ocr/deepencoder.hpp (3)

185-189: Avoid hard-coded 1024; use embed_dim_.

Keeps code future-proof if CLIP dims change.
-    auto class_embeds = class_embedding_.weight().view({1, 1, 1024});
+    auto class_embeds = class_embedding_.weight().view({1, 1, embed_dim_});
404-423: Reserved identifier: rename __interpolateLinear1d.

Leading double underscore is reserved; also update call site.
-  Tensor __interpolateLinear1d(const Tensor& input, int output_size) {
+  Tensor interpolateLinear1d(const Tensor& input, int output_size) {
@@
-      rel_pos_resized = __interpolateLinear1d(rel_pos, max_rel_dist);
+      rel_pos_resized = interpolateLinear1d(rel_pos, max_rel_dist);
Also applies to: 431-433

64-79: Prefer member-initializers and constants/config instead of magic numbers.

Move embed/image/patch sizes to compile-time constants or DpskOcrConfig; initialize members in the ctor initializer list.
-  CLIPVisionEmbeddings(const std::string& name, const DpskOcrConfig& config) : nn::Module(name) {
-    embed_dim_ = 1024;
-    image_size_ = 224;
-    patch_size_ = 14;
+  CLIPVisionEmbeddings(const std::string& name, const DpskOcrConfig& config)
+      : nn::Module(name),
+        embed_dim_(/*config.clip_embed_dim*/1024),
+        image_size_(/*config.clip_image_size*/224),
+        patch_size_(/*config.clip_patch*/14) {
If config already carries these, wire them instead of literals. Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6c000e and 74a989b.

📒 Files selected for processing (39)

mllm/backends/cpu/CPUBackend.cpp (2 hunks)
mllm/backends/cpu/kernels/Kernels.hpp (1 hunks)
mllm/backends/cpu/kernels/arm/conv2d.cpp (1 hunks)
mllm/backends/cpu/kernels/arm/conv2d.hpp (1 hunks)
mllm/backends/cpu/kernels/arm/layernorm2d.cpp (1 hunks)
mllm/backends/cpu/kernels/arm/layernorm2d.hpp (1 hunks)
mllm/backends/cpu/ops/Conv2DOp.cpp (1 hunks)
mllm/backends/cpu/ops/Conv2DOp.hpp (1 hunks)
mllm/backends/cpu/ops/LayerNorm2DOp.cpp (1 hunks)
mllm/backends/cpu/ops/LayerNorm2DOp.hpp (1 hunks)
mllm/compile/ir/GeneratedRTTIKind.hpp (2 hunks)
mllm/compile/ir/NodeRTTIClassOfImpl.hpp (2 hunks)
mllm/compile/ir/linalg/Op.cpp (1 hunks)
mllm/compile/ir/linalg/Op.hpp (2 hunks)
mllm/compile/ir/rtti_kind_gen.py (1 hunks)
mllm/core/OpTypes.hpp (2 hunks)
mllm/core/Tensor.cpp (2 hunks)
mllm/core/Tensor.hpp (2 hunks)
mllm/core/aops/Conv2DOp.cpp (1 hunks)
mllm/core/aops/Conv2DOp.hpp (1 hunks)
mllm/core/aops/LayerNorm2DOp.cpp (1 hunks)
mllm/core/aops/LayerNorm2DOp.hpp (1 hunks)
mllm/models/deepseek_ocr/README.md (1 hunks)
mllm/models/deepseek_ocr/configuration_deepseek_ocr.hpp (1 hunks)
mllm/models/deepseek_ocr/conversation.hpp (1 hunks)
mllm/models/deepseek_ocr/deepencoder.hpp (1 hunks)
mllm/models/deepseek_ocr/modeling_deepseek_ocr.hpp (1 hunks)
mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp (1 hunks)
mllm/nn/Functional.cpp (1 hunks)
mllm/nn/Functional.hpp (1 hunks)
mllm/nn/Module.hpp (1 hunks)
mllm/nn/Nn.hpp (2 hunks)
mllm/nn/layers/Conv2D.cpp (1 hunks)
mllm/nn/layers/Conv2D.hpp (1 hunks)
mllm/nn/layers/LayerNorm2D.cpp (1 hunks)
mllm/nn/layers/LayerNorm2D.hpp (1 hunks)
tests/cpu/Conv2DKernelTest.hpp (1 hunks)
tests/cpu/FlashAttentionKernelTest.hpp (0 hunks)
tests/cpu/KernelTest.cpp (1 hunks)

💤 Files with no reviewable changes (1)

tests/cpu/FlashAttentionKernelTest.hpp

🧰 Additional context used

🪛 Clang (14.0.6)

mllm/nn/layers/LayerNorm2D.hpp

[error] 6-6: 'mllm/nn/Layer.hpp' file not found

(clang-diagnostic-error)

[error] 17-17: parameter 'num_channels' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions

(readability-avoid-const-params-in-decls,-warnings-as-errors)

[error] 17-17: 1e-6 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/backends/cpu/kernels/arm/layernorm2d.hpp

[error] 6-6: 'mllm/utils/CPUArchHelper.hpp' file not found

(clang-diagnostic-error)

mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp

[error] 11-11: 'vector' file not found

(clang-diagnostic-error)

[error] 23-23: constructor does not initialize these fields: bpe_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

mllm/nn/layers/Conv2D.hpp

[error] 3-3: 'mllm/nn/Layer.hpp' file not found

(clang-diagnostic-error)

mllm/backends/cpu/kernels/arm/conv2d.cpp

[error] 3-3: 'mllm/backends/cpu/kernels/arm/conv2d.hpp' file not found

(clang-diagnostic-error)

mllm/backends/cpu/ops/LayerNorm2DOp.cpp

[error] 4-4: 'cstring' file not found

(clang-diagnostic-error)

mllm/nn/layers/LayerNorm2D.cpp

[error] 4-4: 'mllm/core/aops/LayerNorm2DOp.hpp' file not found

(clang-diagnostic-error)

mllm/core/aops/LayerNorm2DOp.hpp

[error] 6-6: 'mllm/core/BaseOp.hpp' file not found

(clang-diagnostic-error)

[error] 11-11: constructor does not initialize these fields: num_channels

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 13-13: 1e-6 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 39-39: member variable 'weight_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

[error] 40-40: member variable 'bias_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

[error] 41-41: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/backends/cpu/ops/Conv2DOp.cpp

[error] 4-4: 'cstring' file not found

(clang-diagnostic-error)

mllm/backends/cpu/kernels/arm/layernorm2d.cpp

[error] 3-3: 'mllm/backends/cpu/kernels/arm/conv2d.hpp' file not found

(clang-diagnostic-error)

mllm/compile/ir/linalg/Op.hpp

[error] 217-217: parameter name 'p' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 217-217: class 'LayerNorm2DOp' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

mllm/backends/cpu/ops/LayerNorm2DOp.hpp

[error] 6-6: 'mllm/core/BaseOp.hpp' file not found

(clang-diagnostic-error)

mllm/nn/layers/Conv2D.cpp

[error] 1-1: 'mllm/nn/layers/Conv2D.hpp' file not found

(clang-diagnostic-error)

mllm/core/aops/Conv2DOp.hpp

[error] 6-6: 'mllm/core/BaseOp.hpp' file not found

(clang-diagnostic-error)

[error] 15-15: constructor does not initialize these fields: in_channels, out_channels, kernel_size, stride, padding, dilation

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 29-29: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 39-39: variable name 'it' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 67-67: member variable 'weight_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

[error] 68-68: member variable 'bias_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

[error] 69-69: member variable 'options_' has protected visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

mllm/core/Tensor.hpp

[error] 539-539: 0x7fffffff is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/backends/cpu/kernels/arm/conv2d.hpp

[error] 6-6: 'mllm/utils/CPUArchHelper.hpp' file not found

(clang-diagnostic-error)

mllm/backends/cpu/ops/Conv2DOp.hpp

[error] 6-6: 'mllm/core/BaseOp.hpp' file not found

(clang-diagnostic-error)

mllm/models/deepseek_ocr/deepencoder.hpp

[error] 5-5: 'cmath' file not found

(clang-diagnostic-error)

[error] 51-51: constructor does not initialize these fields: embed_dim_, image_size_, patch_size_, class_embedding_, patch_embedding_, num_patches_, num_positions_, position_embedding_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 64-64: constructor does not initialize these fields: class_embedding_, patch_embedding_, position_embedding_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 64-64: 2 adjacent parameters of 'CLIPVisionEmbeddings' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 65-65: 'embed_dim_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 65-65: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 66-66: 'image_size_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 66-66: 224 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 67-67: 'patch_size_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 67-67: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 68-68: 'num_patches_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 69-69: 'num_positions_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 81-81: 2 adjacent parameters of 'getAbsPos' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 103-103: variable 'channels' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 108-108: parameter name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 109-109: variable name 'a' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 109-109: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 111-111: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 112-112: 2.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 112-112: 3.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 112-112: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 113-113: do not use 'else' after 'return'

(readability-else-after-return,-warnings-as-errors)

[error] 113-113: 2.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 113-113: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 114-114: 5.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 114-114: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 114-114: 8.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 114-114: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 114-114: 4.0f is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 114-114: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 116-116: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 120-120: variable 'scale_y' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 121-121: variable 'scale_x' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 123-123: loop variable name 'c' is too short, expected at least 2 characters

(readability-identifier-length,-warnings-as-errors)

[error] 124-124: variable 'src_channel_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 125-125: variable 'dst_channel_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 163-163: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 169-169: repeated branch in conditional chain

(bugprone-branch-clone,-warnings-as-errors)

[error] 194-194: constructor does not initialize these fields: fc1_, fc2_, act_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 202-202: constructor does not initialize these fields: fc1_, fc2_, act_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 202-202: 4 adjacent parameters of 'NoTPFeedForward' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 208-208: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 213-213: constructor does not initialize these fields: num_heads_, n_local_heads_, head_dim_, max_seq_len_, qkv_proj_, out_proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 224-224: constructor does not initialize these fields: qkv_proj_, out_proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 224-224: 2 adjacent parameters of 'NoTPAttention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 225-225: 'num_heads_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 225-225: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 226-226: 'n_local_heads_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 226-226: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 227-227: 'head_dim_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 227-227: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 227-227: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 228-228: 'max_seq_len_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 228-228: 256 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 234-234: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 236-236: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 256-256: constructor does not initialize these fields: n_heads_, dim_, head_dim_, layer_norm1_, layer_norm2_, layer_id_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 266-266: member variable 'layer_id_' has public visibility

(cppcoreguidelines-non-private-member-variables-in-classes,-warnings-as-errors)

[error] 270-270: constructor does not initialize these fields: layer_norm1_, layer_norm2_, layer_id_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 270-270: 2 adjacent parameters of 'NoTPTransformerBlock' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 271-271: 'n_heads_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 271-271: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 272-272: 'dim_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 272-272: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 273-273: 'head_dim_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 273-273: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 273-273: 16 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 280-280: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 281-281: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 283-283: variable name 'h' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 289-289: constructor does not initialize these fields: num_layers_, layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 296-296: constructor does not initialize these fields: layers_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 296-296: 2 adjacent parameters of 'NoTPTransformer' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 297-297: 'num_layers_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 297-297: 24 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 302-302: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 310-310: constructor does not initialize these fields: pre_layernorm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 318-318: constructor does not initialize these fields: pre_layernorm_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 318-318: 2 adjacent parameters of 'VitModel' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 327-327: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 328-328: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 341-341: constructor does not initialize these fields: proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 347-347: constructor does not initialize these fields: proj_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 347-347: 2 adjacent parameters of 'PatchEmbed' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 352-352: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 353-353: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 359-359: constructor does not initialize these fields: lin1_, lin2_, act_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 367-367: constructor does not initialize these fields: lin1_, lin2_, act_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 367-367: 4 adjacent parameters of 'MLPBlock' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 373-373: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 378-378: constructor does not initialize these fields: num_heads_, use_rel_pos_, qkv_, proj_, rel_pos_h_, rel_pos_w_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 390-390: constructor does not initialize these fields: qkv_, proj_, rel_pos_h_, rel_pos_w_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 390-390: 3 adjacent parameters of 'Attention' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 391-391: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)

[error] 393-393: 'num_heads_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 394-394: 'use_rel_pos_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 404-404: declaration uses identifier '__interpolateLinear1d', which is a reserved identifier

(bugprone-reserved-identifier,-warnings-as-errors)

[error] 404-404: 2 adjacent parameters of '__interpolateLinear1d' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 406-406: variable 'input_size' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 407-407: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)

[error] 410-410: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)

[error] 411-411: variable name 'x0' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 412-412: variable 'x1' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 412-412: variable name 'x1' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 413-413: variable name 'w1' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 413-413: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)

[error] 414-414: variable name 'w0' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 414-414: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 426-426: 2 adjacent parameters of 'getRelPos' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 428-428: variable 'rel_pos_resized' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 430-430: if with identical then and else branches

(bugprone-branch-clone,-warnings-as-errors)

[error] 439-439: variable 'q_scale' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 440-440: variable 'k_scale' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 446-446: narrowing conversion from 'int' to 'float'

(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions,-warnings-as-errors)

[error] 447-447: variable 'embedding_dim' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 452-452: variable 'relative_coord_float' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 453-453: variable 'relative_coord_long' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 458-458: loop variable name 'd' is too short, expected at least 2 characters

(readability-identifier-length,-warnings-as-errors)

[error] 470-470: 2 adjacent parameters of 'addDecomposedRelPos' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 470-470: 2 adjacent parameters of 'addDecomposedRelPos' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 475-475: variable name 'Rh' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 475-475: invalid case style for variable 'Rh'

(readability-identifier-naming,-warnings-as-errors)

[error] 476-476: variable name 'Rw' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 476-476: invalid case style for variable 'Rw'

(readability-identifier-naming,-warnings-as-errors)

[error] 478-478: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 478-478: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)

[error] 487-487: variable 'r_q_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 488-488: variable 'Rh_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 488-488: invalid case style for variable 'Rh_ptr'

(readability-identifier-naming,-warnings-as-errors)

[error] 489-489: variable 'rel_h_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 508-508: variable 'r_q_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 509-509: variable 'Rw_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 509-509: invalid case style for variable 'Rw_ptr'

(readability-identifier-naming,-warnings-as-errors)

[error] 510-510: variable 'rel_w_ptr' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 535-535: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 536-536: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 537-537: variable name 'B' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 537-537: invalid case style for variable 'B'

(readability-identifier-naming,-warnings-as-errors)

[error] 538-538: variable name 'H' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 538-538: invalid case style for variable 'H'

(readability-identifier-naming,-warnings-as-errors)

[error] 539-539: variable name 'W' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 539-539: invalid case style for variable 'W'

(readability-identifier-naming,-warnings-as-errors)

[error] 580-580: constructor does not initialize these fields: norm1_, norm2_, window_size_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 590-590: constructor does not initialize these fields: norm1_, norm2_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 590-590: 4 adjacent parameters of 'Block' of convertible types are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 591-591: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)

[error] 599-599: 'window_size_' should be initialized in a member initializer of the constructor

(cppcoreguidelines-prefer-member-initializer,-warnings-as-errors)

[error] 662-662: constructor does not initialize these fields: blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 668-668: constructor does not initialize these fields: blocks_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 668-668: 4 adjacent parameters of 'Blocks' of similar type are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 671-671: variable 'is_in' is not initialized

(cppcoreguidelines-init-variables,-warnings-as-errors)

[error] 672-672: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 681-681: constructor does not initialize these fields: pos_embed_, neck_, net_2_, net_3_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 692-692: constructor does not initialize these fields: pos_embed_, neck_, net_2_, net_3_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 692-692: 2 adjacent parameters of 'ImageEncoderViT' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 719-719: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 720-720: variable name 'x' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

mllm/core/aops/LayerNorm2DOp.cpp

[error] 4-4: 'mllm/core/aops/LayerNorm2DOp.hpp' file not found

(clang-diagnostic-error)

mllm/core/aops/Conv2DOp.cpp

[error] 4-4: 'mllm/core/aops/Conv2DOp.hpp' file not found

(clang-diagnostic-error)

tests/cpu/Conv2DKernelTest.hpp

[error] 3-3: 'unordered_map' file not found

(clang-diagnostic-error)

[error] 13-13: function 'naive_conv2d' has cognitive complexity of 33 (threshold 25)

(readability-function-cognitive-complexity,-warnings-as-errors)

[error] 13-13: 2 adjacent parameters of 'naive_conv2d' of similar type ('const float *') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 14-14: 2 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 14-14: 3 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 14-14: 2 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 14-14: 2 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 15-15: 2 adjacent parameters of 'naive_conv2d' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 21-21: floating point literal has suffix 'f', which is not uppercase

(readability-uppercase-literal-suffix,-warnings-as-errors)

[error] 25-25: variable name 'iy' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 26-26: variable name 'ix' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 32-32: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)

[error] 32-32: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)

[error] 38-38: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)

[error] 41-41: do not use pointer arithmetic

(cppcoreguidelines-pro-bounds-pointer-arithmetic,-warnings-as-errors)

[error] 47-47: constructor does not initialize these fields: conv2d_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 53-53: constructor does not initialize these fields: conv2d_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 53-53: 3 adjacent parameters of 'Conv2DModule' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 53-53: 2 adjacent parameters of 'Conv2DModule' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 53-53: 2 adjacent parameters of 'Conv2DModule' of similar type ('int') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 59-59: 2 adjacent parameters of 'forward' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

[error] 65-65: class 'Conv2DKernelTest' defines a default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator

(cppcoreguidelines-special-member-functions,-warnings-as-errors)

[error] 70-70: method 'testConv2DOnce' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

[error] 73-73: invalid case style for variable 'I_H'

(readability-identifier-naming,-warnings-as-errors)

[error] 74-74: invalid case style for variable 'I_W'

(readability-identifier-naming,-warnings-as-errors)

[error] 75-75: invalid case style for variable 'K_H'

(readability-identifier-naming,-warnings-as-errors)

[error] 76-76: invalid case style for variable 'K_W'

(readability-identifier-naming,-warnings-as-errors)

[error] 77-77: invalid case style for variable 'S_H'

(readability-identifier-naming,-warnings-as-errors)

[error] 78-78: invalid case style for variable 'S_W'

(readability-identifier-naming,-warnings-as-errors)

[error] 79-79: invalid case style for variable 'P_H'

(readability-identifier-naming,-warnings-as-errors)

[error] 80-80: invalid case style for variable 'P_W'

(readability-identifier-naming,-warnings-as-errors)

[error] 98-98: variable name 'GT' is too short, expected at least 3 characters

(readability-identifier-length,-warnings-as-errors)

[error] 98-98: invalid case style for variable 'GT'

(readability-identifier-naming,-warnings-as-errors)

[error] 107-107: redundant boolean literal in conditional return statement

(readability-simplify-boolean-expr,-warnings-as-errors)

[error] 113-113: method 'testConv2D' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)

[error] 113-113: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)

mllm/compile/ir/NodeRTTIClassOfImpl.hpp

[error] 186-186: function-like macro 'RTTI_RK_OP_LINALGIROP_LAYERNORM2DOP_IMPL' used; consider a 'constexpr' template function

(cppcoreguidelines-macro-usage,-warnings-as-errors)

mllm/models/deepseek_ocr/configuration_deepseek_ocr.hpp

[error] 5-5: 'mllm/core/aops/LinearOp.hpp' file not found

(clang-diagnostic-error)

[error] 11-11: constructor does not initialize these fields: clip_linear_impl_type, sam_linear_impl_type

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 14-14: constructor does not initialize these fields: clip_linear_impl_type, sam_linear_impl_type

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

[error] 139-139: 3.7362 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

tests/cpu/KernelTest.cpp

[error] 873-873: all parameters should be named in a function

(readability-named-parameter,-warnings-as-errors)

[error] 876-876: 1024 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 877-877: 224 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 878-878: 224 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 879-879: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 880-880: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 881-881: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

[error] 882-882: 14 is a magic number; consider replacing it with a named constant

(cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers,-warnings-as-errors)

mllm/nn/Functional.cpp

[error] 121-121: 4 adjacent parameters of 'scaledDotProductAttention' of similar type ('const int &') are easily swapped by mistake

(bugprone-easily-swappable-parameters,-warnings-as-errors)

🪛 markdownlint-cli2 (0.18.1)

mllm/models/deepseek_ocr/README.md

51-51: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build-android
GitHub Check: build-macos

mllm/backends/cpu/kernels/arm/layernorm2d.cpp

mllm/backends/cpu/ops/Conv2DOp.cpp

mllm/backends/cpu/ops/LayerNorm2DOp.cpp

mllm/core/aops/Conv2DOp.cpp

mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp

coderabbitai · 2025-10-22T14:37:25Z

mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp

+class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer {
+  explicit DpskOcrTokenizer(const std::string& file_path) { preprocessor::initLocal(); }
+
+  std::vector<std::wstring> _tokenize(const std::string& str) override {
+    // TODO
+    return {};
+  }
+
+  std::vector<std::wstring> tokenize(const std::string& str) override {
+    // TODO
+    return {};
+  }
+
+  std::wstring _detokenize(int64_t pos_idx) override {
+    // TODO
+    return L"";
+  }
+
+  std::wstring detokenize(int64_t pos_idx) override {
+    // TODO
+    return _detokenize(pos_idx);
+  }
+
+  Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }
+
+ private:
+  // For text
+  preprocessor::BPE bpe_;
+};


⚠️ Potential issue | 🔴 Critical

Constructor and overrides are private; add public:.

Without a public section, you can't instantiate the tokenizer and overrides become inaccessible.

-class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer { - explicit DpskOcrTokenizer(const std::string& file_path) { preprocessor::initLocal(); } +class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer { + public: + explicit DpskOcrTokenizer(const std::string& file_path) + : mllm::preprocessor::AutoTokenizer(), bpe_() { + preprocessor::initLocal(); + (void)file_path; // TODO: load BPE from file_path + } @@ - Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); } + Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer {

explicit DpskOcrTokenizer(const std::string& file_path) { preprocessor::initLocal(); }

std::vector<std::wstring> _tokenize(const std::string& str) override {

// TODO

return {};

}

std::vector<std::wstring> tokenize(const std::string& str) override {

// TODO

return {};

}

std::wstring _detokenize(int64_t pos_idx) override {

// TODO

return L"";

}

std::wstring detokenize(int64_t pos_idx) override {

// TODO

return _detokenize(pos_idx);

}

Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }

private:

// For text

preprocessor::BPE bpe_;

};

class DpskOcrTokenizer final : public mllm::preprocessor::AutoTokenizer {

public:

explicit DpskOcrTokenizer(const std::string& file_path)

: mllm::preprocessor::AutoTokenizer(), bpe_() {

preprocessor::initLocal();

(void)file_path; // TODO: load BPE from file_path

}

std::vector<std::wstring> _tokenize(const std::string& str) override {

// TODO

return {};

}

std::vector<std::wstring> tokenize(const std::string& str) override {

// TODO

return {};

}

std::wstring _detokenize(int64_t pos_idx) override {

// TODO

return L"";

}

std::wstring detokenize(int64_t pos_idx) override {

// TODO

return _detokenize(pos_idx);

}

Tensor convert2Ids(const std::vector<std::wstring>& strs) override { return Tensor::nil(); }

private:

// For text

preprocessor::BPE bpe_;

};

🧰 Tools

🪛 Clang (14.0.6)

[error] 23-23: constructor does not initialize these fields: bpe_

(cppcoreguidelines-pro-type-member-init,-warnings-as-errors)

🤖 Prompt for AI Agents

In mllm/models/deepseek_ocr/tokenization_deepseek_ocr.hpp around lines 22 to 50, the class members (constructor and virtual override methods) are declared under the default private section, making the tokenizer uninstantiable and overrides inaccessible; move the constructor and all override methods (DpskOcrTokenizer(const std::string&), _tokenize, tokenize, _detokenize, detokenize, convert2Ids) under a public: section so they are publicly accessible while keeping bpe_ private.

mllm/nn/Functional.cpp

mllm/nn/layers/Conv2D.cpp

coderabbitai · 2025-10-22T14:37:25Z

mllm/nn/Module.hpp

 template<typename T>
-class ModuleLists;
+class ModuleList;

 template<typename T>
-class ModuleListsSuffix;
+class ModuleListSuffix;


⚠️ Potential issue | 🔴 Critical

Fix the forward declaration mismatch.

The forward declaration at line 48 declares ModuleListSuffix, but the actual class definition at line 202 is named ModuleListSuffixed (with "ed" at the end). This naming inconsistency will cause compilation errors if the forward-declared type is referenced before its definition.

Apply this diff to align the forward declaration with the class name:

template<typename T> -class ModuleListSuffix; +class ModuleListSuffixed;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

template<typename T>

class ModuleLists;

class ModuleList;

template<typename T>

class ModuleListsSuffix;

class ModuleListSuffix;

template<typename T>

class ModuleList;

template<typename T>

class ModuleListSuffixed;

🤖 Prompt for AI Agents

In mllm/nn/Module.hpp around lines 44 to 48, the forward declaration declares ModuleListSuffix but the actual class is named ModuleListSuffixed; update the forward declaration to match the real class name (declare template<typename T> class ModuleListSuffixed;) so the forward declaration and definition are consistent and avoid compilation errors.

…ng utilities - Add `Conversation` class to manage prompt templates and conversation history - Support multiple separator styles: DeepSeek, DeepSeekV2, PLAIN, ALIGNMENT - Implement methods for generating prompts, converting to Gradio/OpenAI formats - Add global registry for conversation templates with initialization function - Include image loading functionality from conversation messages - Refactor NoTPAttention module with proper tensor operations and attention mechanism - Implement padding logic in Block class for window-based processing - Add cubic interpolation and absolute position embedding resizing in ImageEncoderViT

… modes and padding strategies Added comprehensive interpolation and padding support for CPU backend: - InterpolateOp with nearest, linear, bilinear, bicubic, and trilinear modes - PadOp with constant, reflect, replicate, and circular padding modes - Updated CPU backend to register new operations - Fixed conv2d weight packing and bias addition logic - Added documentation for new functional APIs The implementation supports various tensor dimensions and aligns with PyTorch-style behavior for consistent results.

- Implement `dynamic_preprocess` to generate image tiles with aspect ratio handling - Add `Image::crop` with PIL-style out-of-bounds padding - Introduce `ImageTransform` system with composable transforms (Resize, CenterCrop, etc.) - Support common preprocessing pipelines via `BasicImageTransform` - Enable thumbnail generation and grid-based tiling for OCR models feat(tokenizer): remove unused header include - Remove unnecessary `ARGeneration.hpp` include in tokenization header chore(deepseek_ocr): add missing `<limits>` header - Include `<limits>` for `std::numeric_limits` usage in aspect ratio calculation

- Replace `keep_aspect_ratio` option with `antialias` in interpolate operations - Implement Gaussian blur utility for antialiasing in bilinear and bicubic modes - Update documentation to reflect the new `antialias` parameter - Refactor internal interpolate functions to use new API with antialiasing - Fix incorrect OpType in InterpolateOp constructor - Simplify position embedding interpolation using new functional API - Remove custom padding logic in favor of `nn::functional::pad`

- Add `mlp_projector_linear_impl_type` to `DpskOcrConfig` for configuring the linear implementation type in MLP projector - Update conversation preprocessing templates: - Replace `<｜sft▁begin｜>` with `<｜sft begin｜>` - Adjust separator tags and role markers in deepseek and deepseekv2 templates - Refactor `dynamic_preprocess` to `dynamicPreprocess`: - Return both processed images and target aspect ratio - Improve tiling logic using candidate aspect ratios within min/max limits - Support optional thumbnail generation based on block count - Introduce `MlpProjector` module for mapping visual tokens to text token space - Include necessary headers (`<set>`, `<algorithm>`, `<cmath>`) in preprocessing header

- Implement `formatMessages` function to process conversation JSON with trimming and role/content extraction - Add `DeepseekOCRForCausalLM` class with basic inference logic and image handling - Update tokenizer implementation to support special tokens and proper tokenization - Remove unused `<unordered_map>` include in tokenizer header

- Add tokenizers-cpp as a submodule under mllm/ext/vendors - Add opencv-mobile as a submodule under mllm/ext/vendors - Update .gitmodules to include the new submodules - Set update policy to 'none' for tokenizers-cpp - Add README.md to document the purpose of mllm extension - Add CMakeLists.txt for deepseek_ocr example - Register deepseek_ocr example in examples/CMakeLists.txt

- Add llvm-project as a submodule with none update strategy - Update conversation preprocessing with corrected special tokens - Register conversation templates with optional end tokens - Initialize templates in model inference and log processed prompt

… model - Implement `StackOp` for CPU backend with optimized memory copy for contiguous tensors and fallback for non-contiguous cases - Register `StackOp` in IR and core op system with proper RTTI and factory integration - Update DeepSeek-OCR model preprocessing to use `stack` for image tensor batching - Add `stack` functional API in `nn::functional` - Extend tokenizer utilities and string splitting helpers for better text processing - Introduce image padding utility that mirrors PIL's ImageOps.pad behavior - Add extension options in CMake for future LLVM and tokenizers-cpp integrations - Update example inference call with grounding prompt for OCR task

- Integrate utfcpp library for proper UTF-8 handling - Replace wide string-based tokenization with UTF-8 compatible logic - Update tokenizer to inherit from AutoTokenizerUTF8 - Add BPEUTF8 class for byte-level BPE tokenization with UTF-8 support - Modify preprocessing logic to handle Unicode characters correctly - Update CMakeLists.txt to include utfcpp headers and install rules This change enables the DeepSeek OCR model to correctly process Unicode text inputs, including CJK characters and other multibyte UTF-8 sequences, improving internationalization support.

- Remove tokenizers-cpp submodule from .gitmodules - Add new tokenizers submodule pointing to meta-pytorch/tokenizers - Update README.md to reflect the change in tokenizer dependency - Update subproject commit reference for the new tokenizers module

…support - Add `DpskOcrTokenizer` with byte-level BPE tokenization - Support encoding/decoding using UTF-8 code points - Integrate regex-based pre-tokenizer from GPT-2 standard - Update `AutoTokenizerUTF8` interface to include encode/decode methods - Refactor `BPEUTF8` to work with UTF-32 code point vectors - Add unicode processing utilities from llama.cpp - Enable meta-torch-tokenizers extension in CMake fix(tokenizer): replace tokenize with encode in deepseek-ocr model - Use `encode` instead of `tokenize` for proper id conversion - Adjust sequence mask generation accordingly build(cmake): add extension support and tokenizer options - Introduce `MLLM_EXT_ENABLE` option - Rename `MLLM_EXT_ENABLE_TOKENIZERS_CPP` to `MLLM_EXT_ENABLE_META_TORCH_TOKENIZERS` - Add subdirectory for external components when enabled test(tokenizer): add print statements for tokenizer debugging - Print tokenization and encoding results for various inputs - Include space, newline, and unicode characters chore(fmt): add formatter for vector<string> and vector<int64_t> - Enable formatted printing of string and int64_t vectors - Support cleaner debug output in examples

… tokens - Add new special tokens including `<image>`, `<|grounding|>`, `<tr>`, and `</tr>` to `DpskOcrTokenizer` - Implement `TrieUTF8` to properly handle UTF-8 encoded special tokens in tokenization - Update `tokenize()` and `decode()` methods to correctly process and detokenize strings with special tokens - Remove outdated debug print statements from example code - Modify `fmt::format_to` to avoid extra quotes around vector elements - Refactor tokenization logic to integrate regex and byte-level processing within special token handling This change enhances the DeepSeek OCR tokenizer's ability to work with complex input sequences involving special tokens and improves overall string handling.

…nfig - Introduce `DpskOcrConfig` for model configuration management - Update `DeepseekOCRForCausalLM` constructor to accept config object - Load model weights using `mllm::load` with version specification - Simplify example main function by removing debug prints and exits Also includes: - Add `Tensor::item()` method for scalar access - Improve formatting for vector and tuple types in logging - Set default values for linear implementation types in config - Register module parameters with full names for better tracing - Fix neck Conv2D output channels from 12 to 256 - Add cast2fp32 quantization pipeline support - Handle None quant_cfg in QuantizeSolver methods - Enable streaming quantization when no config path is provided

- Add CPUMaskedScatterOp and its factory to CPUBackend - Implement forward logic with broadcasting support for masked scatter - Register the new op in the CPU op factory list - Update RTTI and IR definitions to include MaskedScatterOp - Add necessary headers and update model usage in DeepseekOCR - Expose maskedScatter function in nn::functional API

…ules - Introduce `DeepseekV2MLP` for feed-forward operations with configurable linear implementations - Add `MoEGate` for computing gating scores in mixture-of-experts layers - Implement `DeepseekV2MoE` to support routing inputs to multiple expert networks - Update configuration to include `llm_mlp_linear_impl_type` for MLP layer customization - Include `<optional>` header for optional parameters in module constructors fix(cpu): replace incorrect conv2d header with CPUArchHelper - Corrected misplaced header include in layernorm2d.cpp to properly support ARM architecture checks

fix(Conv2DOp): add missing semicolon and improve bias registration logic feat(Conv2DOp): add runtime assertions for kernel, stride, padding, and dilation sizes fix(LayerNorm2DOp): improve parameter registration with shared init region fix(deepseek_ocr): move batch_size declaration after pixel_values assignment fix(deepseek_ocr): correct height_crop_num indexing in spatial crop usage feat(Tensor): implement clone method for tensor copying

Rename the contribution guidelines file from markdown to reStructuredText format. Also move and rename ArgSortOp files to contribute documentation. docs(cpu_backend): add fa2_radix_paged to index Include fa2_radix_paged documentation in the CPU backend index. Move related files to appropriate locations. feat(examples): remove debug print statements Remove unnecessary model printing and early return in deepseek_ocr example. Enable actual inference execution in the example.

feat(pad): simplify shape copy in PadOp feat(deepseek_ocr): add layer index to Attention and Block modules feat(deepseek_ocr): reshape qkv tensors explicitly in Attention feat(deepseek_ocr): enable rel_pos tensor repeat for cpu backend compatibility feat(deepseek_ocr): initialize layer_idx_ in Blocks module refactor(deepseek_ocr): remove debug print statements in Block module fix(deepseek_ocr): annotate batch size assumption as TODO in CLIPVisionEmbeddings

- Add fast path for softmax when axis is -1 (last dimension) - Support both fp32 and fp16 data types with architecture-specific implementations - Enable parallel execution based on thread options - Handle negative axis indexing correctly fix(tensor): correct unsqueeze dimension calculation - Adjust negative dimension indexing logic in unsqueeze operation - Add 1 to dim adjustment to match expected behavior feat(deepseek-ocr): improve vision embeddings and attention mechanisms - Update position_ids registration with proper view shape - Fix tensor slicing in CLIPVisionEmbeddings by using squeezed tensor - Remove batch size assertion and support dynamic batch sizes - Replace class embedding expansion with repeat operation - Add contiguous operations in attention modules for performance - Use view instead of reshape in attention output processing - Optimize MoE final output computation by adjusting sum operation order - Add temporary contiguous calls to work around performance issues in concat operations - Clean up unnecessary comments and fix tensor indexing in feature concatenation

- Added `MLLM_TRACY_ENABLE` option in CMakeLists.txt for enabling Tracy profiler - Introduced new quantization configuration file for DeepSeek OCR model using KAI quantization methods - Updated Conv2DOp to correctly handle output tensor indexing during BLAS operations - Added support for configurable linear implementation types in DeepSeek OCR model - Improved image resizing with bicubic interpolation and better bounds checking - Fixed scaling calculation in scaled dot-product attention using standard sqrtf - Minor code cleanup and debugging prints added in DeepSeek OCR encoder The changes span across build configuration, model quantization, CPU backend ops, model definitions, and image preprocessing utilities. The addition of Tracy profiler support provides enhanced performance analysis capabilities. The quantization config enables more efficient model inference through k-ai quantization techniques. CPU convolution fixes ensure proper tensor handling. Model updates allow flexible linear layer implementations. Image processing improvements offer better quality and safety.

- Switch model loading path from w32a32 to w4a8-i8mm-kai variant - Remove debug print statements and exit calls in encoder and model files - Adjust maskedScatter call to unsqueeze images_seq_mask for proper tensor shape - Replace fmt::print with std::cout for token decoding output - Clean up unused kai quantization rules in quantization config file

… paths - Replace hardcoded paths with command-line arguments for model, tokenizer, and config - Update include paths to use angle brackets instead of quotes - Adjust image and output paths for better portability - Add help option to display usage information

… initialization - Reformat `makeRotaryPosEmbedding` function signature to improve readability - Initialize `eos_token_id_` from config in DeepseekOCRForCausalLM constructor

chenghuaWang requested review from oreomaker and yirongjie as code owners October 22, 2025 14:27

coderabbitai bot reviewed Oct 22, 2025

View reviewed changes

chenghuaWang added 2 commits October 23, 2025 10:53

chenghuaWang requested review from liang1232018 and xumengwei as code owners October 23, 2025 06:52

chenghuaWang added 19 commits October 23, 2025 16:23

chenghuaWang and others added 4 commits October 28, 2025 22:43

refactor(deepseek_ocr): reformat function signature and add eos token…

960b9eb

… initialization - Reformat `makeRotaryPosEmbedding` function signature to improve readability - Initialize `eos_token_id_` from config in DeepseekOCRForCausalLM constructor

Merge branch 'v2' into v2

6b4f17d

chenghuaWang merged commit f8eec9a into UbiquitousLearning:v2 Oct 29, 2025
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat(deepseek-ocr): deepseek-ocr support(On working) #486

feat(deepseek-ocr): deepseek-ocr support(On working) #486

Uh oh!

chenghuaWang commented Oct 22, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 22, 2025 •

edited

Loading

Review skipped

Uh oh!

chenghuaWang commented Oct 22, 2025

Uh oh!

coderabbitai bot commented Oct 22, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Oct 22, 2025

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Oct 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

feat(deepseek-ocr): deepseek-ocr support(On working) #486

feat(deepseek-ocr): deepseek-ocr support(On working) #486

Uh oh!

Conversation

chenghuaWang commented Oct 22, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

chenghuaWang commented Oct 22, 2025

Uh oh!

coderabbitai bot commented Oct 22, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

chenghuaWang commented Oct 22, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 22, 2025 •

edited

Loading