Replies: 5 comments 2 replies
-
Hello, thank you for your interest in Brain4J! Brain4J currently supports the ONNX format for model interoperability (see onnx.proto in brain4j-core). We also have a dedicated brain4j-llm module with architecture adapters (like GPT2Adapter) that handle model loading and inference.

Regarding safetensors + config.json support: this is not natively supported yet, but we recognize its value for Hugging Face ecosystem compatibility. If you have a specific model architecture in mind (GPT-2, LLaMA, etc.), please let us know and we can prioritize that adapter. PRs are also welcome if you'd like to contribute!
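For illustration, here is a rough sketch of what dispatching from a Hugging Face config.json to an architecture adapter could look like. The `ArchitectureAdapter` interface, the regex-based field extraction and the file names are assumptions made for the sketch, not Brain4J's actual API; a real loader would use a proper JSON parser.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdapterDispatchSketch {

    /** Placeholder for an architecture adapter; Brain4J's real adapter API may differ. */
    interface ArchitectureAdapter {
        void load(Path safetensorsFile, String configJson);
    }

    public static void main(String[] args) throws Exception {
        Path modelDir = Path.of(args[0]);
        String config = Files.readString(modelDir.resolve("config.json"));

        // Pull the "model_type" field out of config.json.
        // A real implementation should use a proper JSON parser instead of a regex.
        Matcher m = Pattern.compile("\"model_type\"\\s*:\\s*\"([^\"]+)\"").matcher(config);
        if (!m.find()) {
            throw new IllegalArgumentException("config.json has no model_type field");
        }
        String modelType = m.group(1);

        // Dispatch to the matching adapter (only a couple of illustrative cases here).
        ArchitectureAdapter adapter = switch (modelType) {
            case "gpt2" -> (file, cfg) -> System.out.println("would build a GPT-2 graph here");
            case "llama" -> (file, cfg) -> System.out.println("would build a LLaMA graph here");
            default -> throw new UnsupportedOperationException("No adapter for " + modelType);
        };

        adapter.load(modelDir.resolve("model.safetensors"), config);
    }
}
```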
-
Thank you very much for your reply. I think Brain4J is a very vibrant deep learning project, especially now that I have discovered it is already experimenting with model adapters for safetensors, which is a very forward-thinking attempt. I also look forward to Java having a place in the future of large models.

By comparison, the TorchScript and ONNX files exported from Python seem to contain only the weights and lack the model structure. I hope we will eventually be able to fine-tune large models on the Java side, so I am looking forward to the combination of safetensors and config.json enabling a retrainable model structure in Java. Transformers does support ONNX export, but we have run into problems: some architectures in Transformers are complex and contain dynamic layers, which makes the ONNX export fail. For example, https://huggingface.co/Aratako/T5Gemma-TTS-2b-2b is very difficult to deal with; I tried several times but could not export it. If you are willing to try, you can give it a shot.

I think the most crucial thing is replicating, on the Java side, the logic in the Transformers library that restores the model structure from config.json. Brain4J has already implemented some common layers, but PyTorch has well over a hundred different layer types, so I think Brain4J should support more layers in the future. It would also be very helpful if the brain4j math module could read and write common Python data formats such as NumPy, pickle, HDF5, and LMDB.
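To make the "restore the structure from config.json" idea concrete, here is a minimal sketch for GPT-2, whose config.json does expose `n_layer`, `n_head`, `n_embd`, `n_positions` and `vocab_size`. The `Layer` record and the layer names are placeholders for the sketch, not existing Brain4J layers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: turn GPT-2 hyperparameters (as found in its config.json) into a layer stack. */
public class Gpt2StructureSketch {

    // Placeholder layer description; Brain4J's real layer classes would go here.
    record Layer(String kind, int... dims) {}

    static List<Layer> buildGpt2(int vocabSize, int nPositions, int nEmbd, int nHead, int nLayer) {
        List<Layer> layers = new ArrayList<>();
        layers.add(new Layer("token_embedding", vocabSize, nEmbd));
        layers.add(new Layer("position_embedding", nPositions, nEmbd));
        for (int i = 0; i < nLayer; i++) {
            layers.add(new Layer("layer_norm", nEmbd));
            layers.add(new Layer("multi_head_attention", nEmbd, nHead));
            layers.add(new Layer("layer_norm", nEmbd));
            layers.add(new Layer("mlp", nEmbd, 4 * nEmbd)); // GPT-2 uses a 4x feed-forward expansion
        }
        layers.add(new Layer("final_layer_norm", nEmbd));
        layers.add(new Layer("lm_head", nEmbd, vocabSize));
        return layers;
    }

    public static void main(String[] args) {
        // Values from the gpt2 (small) config.json:
        // vocab_size=50257, n_positions=1024, n_embd=768, n_head=12, n_layer=12
        List<Layer> layers = buildGpt2(50257, 1024, 768, 12, 12);
        layers.forEach(l -> System.out.println(l.kind() + " " + Arrays.toString(l.dims())));
    }
}
```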
-
At the moment, modern LLMs are not supported, mainly due to the lack of a RoPE implementation, which is a hard requirement for most recent transformer architectures. In parallel, we are focusing on fixing a few critical issues in the GPU backend; GPU stability and correctness are currently a higher priority than adding new model architectures. Once these foundations are solid, brain4j-llm may expand to support multiple LLM architectures and potentially fine-tuning, but this is not on the immediate roadmap.

Regarding Python model/data formats: they are tightly coupled to the Python ecosystem and often rely on implementation details that are not portable across languages. Because of this, direct support is not planned as of now.
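For reference, the RoPE requirement mentioned above is small in terms of math. Below is a minimal, framework-independent sketch in plain Java using the interleaved-pair convention from the original RoFormer paper; note that some checkpoints (e.g. LLaMA / GPT-NeoX style) pair dimension i with i + d/2 instead, so an adapter has to match the convention of the model it loads.

```java
import java.util.Arrays;

/** Minimal sketch of rotary position embeddings (RoPE), independent of any Brain4J API. */
public class RopeSketch {

    /** Rotates a single query/key vector of even length in place, for a given token position. */
    static void applyRope(float[] x, int position, double base) {
        int d = x.length; // head dimension, must be even
        for (int i = 0; i < d / 2; i++) {
            double theta = position * Math.pow(base, -2.0 * i / d);
            double cos = Math.cos(theta);
            double sin = Math.sin(theta);
            float even = x[2 * i];
            float odd = x[2 * i + 1];
            x[2 * i]     = (float) (even * cos - odd * sin);
            x[2 * i + 1] = (float) (even * sin + odd * cos);
        }
    }

    public static void main(String[] args) {
        float[] q = {1f, 0f, 1f, 0f};
        applyRope(q, 3, 10000.0); // rotate as if this vector sits at token position 3
        System.out.println(Arrays.toString(q));
    }
}
```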
-
Brain4j is very impressive and has already implemented many features. It would be a great experience if we could write CUDA kernel functions in Java in the future. There are actually already Java, Scala 3, and Kotlin libraries for reading and writing NumPy, pickle, and HDF5 files that we could find and reference. I am quite optimistic about Brain4j, but for future enterprise-level deployment many more modules will be needed: support for NCCL or Gloo for distributed training, mixed precision, AOT compilation, more tensor operators, and so on. The engineering effort for these is enormous, and becoming a modern deep learning framework in a field with such high barriers to entry will take a great deal of work.
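As a data point on the format question, the .npy container at least is simple enough to read without any dependency. The following is a sketch limited to .npy version 1.0, based on the format described in NumPy's own documentation; it only prints the header and does not materialize the array.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of reading a NumPy .npy (version 1.0) header in plain Java:
 * a magic string, a version, a little-endian header length and a
 * Python-dict-literal header describing dtype, memory order and shape,
 * followed by the raw array bytes.
 */
public class NpyHeaderSketch {

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            byte[] magic = new byte[6];
            in.readFully(magic);
            if (magic[0] != (byte) 0x93 || !"NUMPY".equals(new String(magic, 1, 5, StandardCharsets.US_ASCII))) {
                throw new IOException("Not a .npy file");
            }
            int major = in.readUnsignedByte();
            int minor = in.readUnsignedByte();
            if (major != 1) {
                // Version 2.0+ uses a 4-byte header length; omitted to keep the sketch short.
                throw new IOException("Only .npy version 1.0 handled here, got " + major + "." + minor);
            }
            // The header length is an unsigned 16-bit little-endian integer.
            int headerLen = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
            byte[] header = new byte[headerLen];
            in.readFully(header);
            // The header is a Python dict literal, e.g. {'descr': '<f4', 'fortran_order': False, 'shape': (3, 4), }
            System.out.println(new String(header, StandardCharsets.ISO_8859_1).trim());
            // The raw array bytes follow immediately after the header.
        }
    }
}
```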
-
Could you be more specific about how you would want to write CUDA kernels directly from Java? In my spare time I'm experimenting with a possible future backend that targets CUDA on NVIDIA, Metal on macOS, and OpenCL elsewhere. The idea is to rely on a common kernel language, such as Slang, to avoid rewriting the same kernels three times.

You are also right about interoperability: there are a few JVM solutions for NumPy and pickle (and possibly HDF5 as well). The main reason we haven't adopted them so far is the deliberate choice to keep the core JAR small, portable and fully controllable. That said, this doesn't exclude optional or modular integrations in the future.

Regarding enterprise-level features, I agree that these are essential for a modern framework, but the engineering effort is enormous. Brain4J is currently developed mostly by me and a friend of mine, so progress is much slower than in common ML frameworks, which are often developed and maintained by full-time developer teams.
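To give the CUDA-from-Java question something concrete to react to, here is one possible (entirely hypothetical) shape for a backend-agnostic kernel API in Java. None of these interfaces exist in Brain4J today, and the Slang-based compilation step is only assumed; this is a shape proposal, not an implementation.

```java
import java.util.Map;

/**
 * Hypothetical sketch of a backend-agnostic kernel interface: kernels would be written once
 * in a shared source language (e.g. Slang) and compiled to CUDA, Metal or OpenCL by the
 * selected backend.
 */
public class KernelBackendSketch {

    /** A device buffer handle; the backing memory lives on whatever device the backend chose. */
    interface DeviceBuffer extends AutoCloseable {
        long sizeInBytes();
    }

    /** A compiled kernel that can be launched with a global work size and named arguments. */
    interface CompiledKernel {
        void launch(long[] globalWorkSize, Map<String, DeviceBuffer> args);
    }

    /** One implementation per target: CudaBackend, MetalBackend, OpenClBackend... */
    interface ComputeBackend {
        DeviceBuffer allocate(long bytes);
        CompiledKernel compile(String kernelSource, String entryPoint);
    }

    // Usage would look roughly like this (pseudo-usage, no real backend wired in):
    //
    //   ComputeBackend backend = selectBestAvailableBackend();
    //   CompiledKernel add = backend.compile(slangSourceFor("elementwise_add"), "main");
    //   add.launch(new long[]{n}, Map.of("a", bufA, "b", bufB, "out", bufOut));
}
```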
-
I think a crucial next step is to download the model in safetensors format together with its config.json, then implement the interpretation of config.json to restore the model's architecture and reassign the weights to their corresponding layers. These are the critical steps.
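For the safetensors half of that plan, the container format is documented and easy to read from Java: an 8-byte little-endian header length, a UTF-8 JSON header with dtypes, shapes and byte offsets, then the raw tensor data. A minimal sketch (header only, with no weight reassignment) might look like this:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of reading a safetensors file header in plain Java. Turning the header into
 * actual tensors and matching tensor names to layers is the part an adapter would
 * still have to implement.
 */
public class SafetensorsHeaderSketch {

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            byte[] lenBytes = new byte[8];
            file.readFully(lenBytes);
            long headerLen = ByteBuffer.wrap(lenBytes).order(ByteOrder.LITTLE_ENDIAN).getLong();

            byte[] headerBytes = new byte[(int) headerLen];
            file.readFully(headerBytes);
            String headerJson = new String(headerBytes, StandardCharsets.UTF_8);

            // The JSON maps each tensor name to {"dtype": "F32", "shape": [...], "data_offsets": [begin, end]},
            // with offsets relative to the first byte after the header. A real loader would parse this
            // with a JSON library and mmap the data region instead of printing it.
            System.out.println(headerJson);
            System.out.println("tensor data starts at byte offset " + (8 + headerLen));
        }
    }
}
```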