🚀 Added
💪 inference 1.0.0 just landed 🔥
We are excited to announce the official 1.0.0 release of Inference, previewed two weeks ago with the 1.0.0rc1 release.
Over the past few years, Inference has evolved from a lightweight prediction server into a widely adopted runtime powering local deployments, Docker workloads, edge devices, and production systems. After hundreds of releases, the project has matured, and so has the need for something faster, more modular, and more future-proof.
inference 1.0.0 closes one chapter and opens another. This release introduces a new prediction engine that will serve as the foundation for future development.
⚡ New prediction engine: inference-models
We are introducing inference-models, a redesigned model-execution engine focused on:
- faster model loading and inference
- improved resource utilization
- better modularity and extensibility
- cleaner separation between serving and model runtime
- support for multiple backends, including TensorRT
Important
Alongside inference 1.0.0 we also released the first stable build of inference-models, version 0.19.0. You can use the new engine in inference by setting the environment variable `USE_INFERENCE_MODELS=True`.
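As a minimal sketch of opting in from Python, you can set the flag before importing inference (we assume here that the flag is read at import time; the model id below is a placeholder):

```python
import os

# Opt in to the new inference-models engine before importing inference.
os.environ["USE_INFERENCE_MODELS"] = "True"

from inference import get_model

# Placeholder model id; substitute one of your own models.
model = get_model(model_id="yolov8n-640")
predictions = model.infer("path/to/image.jpg")
```

When running the containerized server instead, pass the variable to the container environment (e.g. with Docker's `-e USE_INFERENCE_MODELS=True`).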
Caution
The new inference-models engine is wrapped with adapters so it can serve as a drop-in replacement for the old engine. We are making it the default engine on the Roboflow platform, but clients running inference locally have `USE_INFERENCE_MODELS` set to `False` by default. We would like all clients to test the new engine; when the flag is not set, inference works as usual.
In approximately 2 weeks, with the inference 1.1.0 release, we will make inference-models the default engine for everyone.
Caution
inference-models is a completely new backend, and we have fixed a lot of long-standing problems and bugs along the way. As a result, predictions from your model may differ, but according to our tests they are better quality-wise. That said, we may still have introduced some minor bugs; please report any problems and we will do our best to fix them 🙏
🛣️ Roadmap
Today's release is just the start of broader changes in inference; the plan for the future is as follows:
- Shortly after release, we will complete our work around the Roboflow platform, including migrating the small fraction of models not yet onboarded into the new registry used by `inference-models` and adjusting automations on the platform. Until this is finished, clients who very recently uploaded or renamed models may be impacted by HTTP 404 errors; contact us to receive support in such cases.
- Consecutive hot-fixes (if needed) will be released as `1.0.x` versions.
- Clients running `inference` locally should test the `inference-models` backend now, as in approximately 2 weeks `inference-models` will become the default engine.
- We still have some work to do in `1.x.x`, mainly providing patches, but we are starting a march towards 2.0, which should bring new quality to other components of `inference`. Stay tuned for updates.
- You should expect new contributions to `inference` to be based on the `inference-models` engine; they may not work if you do not migrate.
Caution
One problem we have not addressed in 1.0.0 is model cache purging: the new inference-models engine uses a different local cache structure than the old engine. As a result, an inference server running with USE_INFERENCE_MODELS=True does not perform clean-up on the volume holding models pulled from the platform. If you run locally, this should generally not be an issue, since we expect clients to use only a limited number of different models in their deployments.
If you use a large number of models, or your disk space is tight, you should perform periodic clean-ups of /tmp/cache when running the new inference. This issue will be addressed before the 1.1.0 release.
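As an illustration, here is a minimal clean-up sketch in Python; the `/tmp/cache` path comes from the note above, while the one-week age threshold is an arbitrary assumption you should tune to your deployment:

```python
import time
from pathlib import Path

CACHE_DIR = Path("/tmp/cache")   # cache location mentioned above
MAX_AGE_SECONDS = 7 * 24 * 3600  # illustrative threshold: one week

now = time.time()
for path in CACHE_DIR.rglob("*"):
    # Delete cached files that have not been touched within the threshold.
    if path.is_file() and now - path.stat().st_mtime > MAX_AGE_SECONDS:
        path.unlink(missing_ok=True)
```

Run it from cron (or any scheduler), keeping in mind that the engine may re-download models it still needs.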
🎨 Semantic Segmentation in inference
Thanks to @leeclemnet, the DeepLabV3Plus segmentation model was onboarded to inference and can now be used by clients.
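A sketch of querying such a model through a local server follows; the client usage is the standard inference_sdk pattern, but the model id is a placeholder and the exact id for a DeepLabV3Plus deployment depends on your project:

```python
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # local inference server
    api_key="<YOUR_API_KEY>",
)

# Placeholder model id; point this at your own segmentation model.
result = client.infer("path/to/image.jpg", model_id="your-project/1")
```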
📐 Area Measurement block 🤝 Workflows
Thanks to @jeku46, we can now measure area size with Workflows.
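For orientation only, a hypothetical sketch of how a Workflows step using this block might look; the block type identifier and input names below are guesses inferred from the PR name, not confirmed API, so consult the block's documentation for the real schema:

```python
# Hypothetical Workflows step definition; all identifiers are illustrative.
area_step = {
    "type": "roboflow_core/area_measurement@v1",       # guessed block type id
    "name": "area_measurement",
    "predictions": "$steps.segmentation.predictions",  # upstream step output
}
```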
🚧 Maintenance
- add missing ffmpeg package for dev by @rafel-roboflow in #2009
- fix expose sam3 with proper envs by @rafel-roboflow in #2011
- Detections Class Replacement support for strings by @Erol444 in #2000
- fix: Send termination_reason via data channel on WebRTC stream timeout by @balthazur in #2008
- Remove content length validation to allow for chunked responses by @dkosowski87 in #2015
- Added `processing_timeout` support to WebRTC's `StreamConfig` dataclass by @Erol444 in #2017
- fix: Return 400 instead of 500 for raw bytes sent as base64 image by @bigbitbus in #2016
- Added claude sonnet 4.6 by @Erol444 in #2014
- Fix mkdocs-macros Jinja2 syntax errors in generated block docs by @yeldarby in #2012
- Add remote GPU processing time collection and forwarding by @hansent in #2007
- Add semantic-segmentation endpoints + deep_lab_v3_plus by @leeclemnet in #2018
- Update CODEOWNERS: Add dkosowski87 and reorganize team assignments by @hansent in #2021
- Add support for gemini 3.1 pro in gemini block by @Erol444 in #2024
- Add area_measurement workflow block by @jeku46 in #2013
- Auto-detect Jetson JetPack version in CLI server start by @alexnorell in #1958
- Get rid of unstable assertions on predictions in e2e tests by @PawelPeczek-Roboflow in #2026
- ENT-884: Add `workflow_version_id` support to inference pipeline by @NVergunst-ROBO in #2022
- Add JetPack 7.1 support for NVIDIA Thor by @alexnorell in #1935
🏅 New Contributors
- @dkosowski87 made their first contribution in #2015
- @leeclemnet made their first contribution in #2018
Full Changelog: v0.64.8...v1.0.0