Releases: google-ai-edge/mediapipe
MediaPipe v0.10.32
Build changes
- Enables ml drift metal delegate as inference calculator backend.
- [mediapipe] support armv7 (32 bit) in mediapipe tasks
- Do not assume canvas is BGRA in RenderToWebGpuCanvas.
- Fix sampling logic in ImageToTensorConverterWebGpu.
- Migrate GlShaderCalculator to API3.
- Migrate gl_shader_calculator_test to use API3 builder.
Bazel changes
- [mediapipe] version bump to 0.10.27
- Dawn has completed these changes, so the old paths are no longer used.
- Integrate tiny Juno inpainting graph into GenAiProcessor
- Readme for API3
- Add Resources::ResolveId to enable placeholder resource ids usage.
- Web LLM: a few more small edits for Gemma3n
- Include headers from global namespace
- Add comment to Eigen version in WORKSPACE to remind about synchronization with TensorFlow's Eigen dependency
- Migrating VisibilityCopyCalculator to API3.
- Update from Bazel v6.5.0 to v7.4.1, Protobuf v3.19.1 to v5.28.3. Other packages also update the version within WORKSPACE.
- Fix for weight cache on Windows.
- Create Selfie Segmentation Demo App for LiteRT NPU.
- Add Any support for API3
- Adding AudioBuffer support to web LLM Inference API to handle more audio input types for MM models
- Provide API3 interface for PassThroughCalculator using newly added Any type.
- Migrate MergeCalculator to API3 and newly introduced Any type.
- pybind11 version and py_proto_library macro update.
- Add test for PacketResamplerCalculator with a very short video.
- Initial version of sync function runner for API3
- Fix function runner error reporting.
- [mediapipe] version bump
- Migrate CombinedPredictionCalculator to API3
- Clean up CombinedPredictionCalculator
- Currently, wrapping a TextureFrame in a MediaPipe Packet assumes the texture is 8-bit RGBA. This patch allows specifying other texture formats to support common color formats like RGBA16F for HDR content.
- Support timestamp bound updates in function runner.
- Migrate TensorsToSegmentationCalculator to MediaPipe API3.
- Add OneOf support for API3.
- Provide inference calculator API3 interface.
- Migrate LandmarksToMatrixCalculator to API3
- Update MediaPipe OSS to C++20.
- Add a flag to use fp16 activations in tests.
- Migrate HandednessToMatrixCalculator to API3.
- Update xnnpack version.
- Use the new xnn_reduce_mean_squared reduction for the RMSNorm.
- Migrate ImageToTensorCalculator to API3.
- Consistently use MutexLock instead of manual locking/unlocking
- Add ImageProcessingOptions to FaceDetector C API
- Enable node names as compile time strings in OSS.
- Migrate API3 nodes to use compile time string names.
- Fall back to producer context in gpu_buffer.GetReadView
- Document api3 GetOrDie / VisitOrDie
- Update log for missing InferenceCalculatorXnnpack registration.
- Add NodeName for non-generic calculator context.
- Add ImageProcessingOptions support to FaceLandmarker C API.
- Migrate FaceLandmarker C API to use MediaPipe Image
- Update CombinedPredictionCalculator test to new Runner
- Fix comment about when things die.
- Migrate WebGpuShaderCalculator to MediaPipe API3.
- Proto changes for Tiny Gemma on ml_drift
- Enable API3 FunctionRunner for WEB
- Bump MediaPipe version to 0.10.29.
- Add CompareAndSaveImageOutputDynamic to compare to a dynamic golden instead of a file.
- Bump MediaPipe version to 0.10.30.
- Improve error message of graph validation, to include node calculator name
- Refactor Hand Landmarker C API to use new MP Image
- Add ExternalGlTextureSyncMode to require efficient synchronization.
- Add an option to get a Packet for API3 OneOf input.
- Migrate GpuBufferToImageFrameCalculator to API3.
- Add support to pass a single visitor in VisitOrDie for OneOf inputs.
- Add VisitAsPacketOrDie for OneOf inputs.
- Add MediaPipe Tasks C API for AudioClassifier.
- Ensure correct type of #api3 Packet.
- Support fractional frame rates in MediaPipe video processing.
- Add ImageProcessingOptions support to Object Detector C API.
- Add ImageProcessingOptions support to MediaPipe PoseLandmarker C API.
- Update object detector to apply ImageFrame C API
- Update pose landmarker to apply ImageFrame C API
- Migrate HandAssociationCalculator to MediaPipe API3.
- Adding ImageProcessingOptions to image_classifier C API.
- Adding ImageProcessingOptions to gesture_recognizer C API.
- Migrate GestureRecognizer (C API) to MpImagePtr
- Migrate ImageFrameToGpuBufferCalculator to API3
- Set default thread_num to LiteRT::CPU delegate
- Qualify Packet and MakePacket while in mediapipe::api3 namespace to avoid future collisions with api3::Packet / api3::MakePacket
- Remove redundant empty parentheses from lambdas in MediaPipe API3.
- Migrate ImageClassifier (C API) to MpImagePtr.
- Update HandAssociationCalculatorTest to new Runner
- Add ImageProcessingOptions support to Image Segmenter C API
- Migrate TensorsToSegmentationCalculator to API3
- Extend lifetime of Image data when MpImage is constructed from an existing MediaPipe Image
- Allow GetData() calls for contiguous images
- Migrate ImageSegmenter C API to MpImagePtr
- Add GetLabels() to the ImageSegmenter C API
- Generalize UnpackMediaSequenceCalculator's support for encoded media streams.
- Clean up unused variables
- Get rid of _with_options in favor of optional param (C & Python API)
- Migrate Python ImageSegmenter to C API
- Refactor Image Embedder C API to use MpImage
- Simplify FunctionRunner template types.
- Simplify setting options in HandAssociationCalculatorTest
- Update MediaPipe C API vision task result callbacks to use MpStatus.
- Offer attachments functionality from WebGPU service.
- Enable creation of WebGPU service from explicitly provided wgpu::Device.
- Remove unnecessary checks.
- Enables RGBA input with RGB output.
- Modify the Has{} generic function to check across the different value kinds of
- Refactor TextClassifier C API to use MpStatus.
- Update C API for TextEmbedder to use MpStatus
- Update C API for LanguageDetector to return MpStatus
- Add ImageProcessingOptions support to the MediaPipe ImageEmbedder C API.
- Add ImageProcessingOptions to InteractiveSegmenter C API and migrate to MpImage
- Fix counting pixels with different colors for image comparison tests
- Remove redundant has_confidence_masks field from ImageSegmenterResult.
- Migrate ImageClassifier C API to use MpStatus.
- Refactor Face Detector C API to use MpStatus.
- Update ImageSegmenter C API to return MP Status
- Add support for XNNPACK's SLOW_CONSISTENT_ARITHMETIC flag
- Update ImageEmbedder C API to return MP Status
- Update InteractiveSegmenter C API to return MP Status
- Get rid of _with_options in favor of optional param (C)
- Refactor Gesture Recognizer C API to use MpStatus.
- Update ObjectDetector C API to return MpStatus
- Update HandLandmarker C API to return MP Status
- Update PoseLandmarker C API to return MpStatus
- Fix libmediapipe.so compilation on Windows
- Remove no longer used Image types
- Add side packets support for FunctionRunner.
- Fix documentation.
- Update bot assignees in bot_config.yml.
- Added new R8 mode for GlShaderCalculator
- Add visibility declarations for Windows
- Centralize WebGPU header includes
- Web Solutions: patch for importScripts error with modules in workers
- Small cleanups in MP Task C++ segmentation graphs and ModelTaskGraph
- Update AudioClassifier to retain error messages
- Retain error messages in the Metadata API
- Retain error messages in Language Detector
- Update GestureRecognizer to retain error messages
- Update FaceLandmarker C API to use new naming and return type convention
- Retain error messages in TextEmbedder
- Retain error messages in HandLandmarker
- Retain error messages in ImageEmbedder
- Retain Error Messages in Object Detector
- Small cleanup of scheduler_queue
- Refactor ImageClassifier C API to return error messages.
- Allow empty Tensors in InferenceCalculator
- Update ImageSegmenter to retain error messages
- Retain error messages for PoseLandmarker
- Retain error messages in MpImage
- Allow packing of input streams into empty SequenceExample.
- Retain error messages in Interactive Segmenter
- Replace custom test macros with EXPECT_EQ/ASSERT_EQ
- Add MpErrorFree to avoid missing function on Windows
- Bump MP version to 0.10.31
- Add Kotlin support to MediaPipe OSS repo
- Prepare for functiongemma with MP web LLM API
- Add experimental mapSync support in GetTexture2dData.
- Fail with error at empty decoded image in OpenCVEncodedImageToImageFrameCalculator. Empty decoded image can happen if decoding fails.
- Support option dependencies in mediapipe proto rules
- Fix logging large one-dimensional vertical data
- Adds GPU output support for category masks (copy only, result listener zero-copy case is not addressed yet)
- Add wgpu::ExternalTexture support
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.
Android
- Allow users to configure NPU delegate
- Remove all references to subgraph reshaping which is enabled by default
- Don't swallow Task exceptions for synchronous use cases
- Update score thresholds for Java classifier/embedder tests
- Restore default .so location
- Add RegionOfInterest Proto to Java Protobuf list
- Don't assume images are RGB
- Allow users to configure the NPU delegate
iOS
- Expose preferredBackends
- Add stream cancellation support in swift API.
- Add audio modality support to iOS GenAI inference.
- Use wrapper type for RenderData
- Remove MP AudioEmbedder
Javascript
- Web LLM: basic .wav audio support for Gemma 3N
- Web LLM: Small fix to multimodal error message strings
- Add a test case to test empty packet inputs.
- Migrate tasks tests to use common image test util.
- Enab...
MediaPipe v0.10.26
16kb Page Size Support
- All the latest Android packages from Google Maven now support the Android 16 KB page size. 0.10.26.1 also includes support for ARM v7 CPUs (32-bit).
Bazel changes
- mediapipe task version bump
- Introduce new variant of TFLiteModelLoader::LoadFromPath that allows specifying the mmap mode
- Add DefaultSidePacketCalculator unit test under calculators/core
- Add a test parameter to ignore pixels above diff limit
- Add MaskOverlayCalculator unit test under calculators/image
- Web LLM: make GetSizeInTokens work for first vision-capable models
- Log invalid format in proto lite mode.
- Needed in order to update Dawn to match the standard webgpu.h, here:
- Migrate InverseMatrixCalculator to API3
- Migrate WarpAffineCalculator to #mediapipe-api3 + introduce GetGenericContext
- Migrate landmark_projection_calculator to API3
- Inference calculator refactoring.
- Adding int32-vector output to constant side-packet calculator
- Switch usages of ShaderModuleWGSLDescriptor to ShaderSourceWGSL
- [mediapipe] update to Android SDK and NDK 26 -> 28
- Migrate world_landmark_projection_calculator to API3
- Migrate landmarks_refinement_calculator to API3
- [mediapipe] upgrade docker image to use JDK 21
- Cleanup projection calculator from reliance on C++ variable scoping and add a new test case.
- Update std::once_flag/call_once to absl::once_flag/call_once in OSS version
MediaPipe Tasks update
- Add audio modality support to LLM Inference API.
- [mediapipe] update opencv dependency
MediaPipe v0.10.25
Bazel changes
- Make ThreadPoolExecutorOptions callable from Java/Kotlin
- Add contract validator for API3
- API3 Extract reusable part of API2 graph builder.
- Adding license headers.
- Post release version bump
- API3 calculators should default to timestamp offset 0 (the same default as in API2) for consistency and it should be possible to unset the default.
- API3 graph, stream & side_packet
- Add SRQ config option into TransformerParams proto.
- Graph API2 API3 interop.
- Adding basic vision+text testing capabilities to web LLM Inference API
MediaPipe Tasks update
- Drops duplicated Android manifest file.
- Makes LlmTaskRunner internal.
Javascript
- Web LLM: Minor refactoring to allow more usage of newer LLM code paths
MediaPipe v0.10.24
Build changes
- Add FdFinishedFunc util to mediapipe
- rename a config setting to BUILD_FOR_OSS
- #mediapipe #ios remove custom cpp version (rely on the common cpp version set at build time)
- Rely on the common cpp version set at build time.
Framework and core calculator improvements
- Update C++ Graph Builder to support source layers.
- Bump MP version for release 0.10.23.
- Add Back-Edge support in Graph builder.
- Add a destructor to WebGpuAsyncFuture that correctly frees any pending future.
- Add tools for logging Tensors, ImageFrames and cv::Mats
- Add a utility for creating a view of a Tensor into an OpenCV Mat
- Add WebGpuCreateRenderPipelineAsync utility.
- [mediapipe] update documentation mentioning python versions
- Bump MP version for release 0.10.24.
- Add support for GemmaV2-2B via XNNPACK.
- Remove obsolete checks that integer division rounds to zero.
- Inline SafeIntStrongIntValidator::SanityCheck function
- Debug logging: Fix and properly support logging RGBA images
- Fix modules/face_detection documentation.
- Add LogHalideBuffer variant for logging Halide buffers
- Add support for GemmaV3-1B models using XNNPACK.
- Correct documentation to reflect actual behavior.
- Fix KleidiAI repository URL.
- Removed usage of deprecated InitFromGraphWithTransforms.
- Dynamically quantize inputs only once before projecting to queries, keys, and values.
- add an enum option to spectrogram calculator to output frames with all channels instead of vector of matrices
- Fix GlBufferView (bug: incomplete move constructor)
- Don't recreate write views on the same internal-only-use tensor (which triggers error messages) and fix read/write view usages.
- Support loading PackWeightsCache from a file descriptor
- Update flag description to use correct name for input_token_limit
- Reduce logging frequency for some warnings.
- Allow header output for all resampling strategies.
- Fix failing build: blaze --blazerc=/dev/null build //third_party/mediapipe/examples/ios/facedetectioncpu:FaceDetectionCpuApp.apple_binary --config=ios_arm64 --ios_minimum_os=12.0
- Add std::vector output support to ConstantSidePacketCalculator
- Avoid creating unused StatusRep objects on each CalculatorNode::ProcessNode call
- Avoid creating multiple status reps on each mediapipe::tool::StatusStop() call
- Add option to process timestamp bound for ImmediateMuxCalculator.
MediaPipe Tasks update
Android
- Move the callback registration into the InferenceSession.
- Add updateSessionConfig and getSetencePieceProcessor API to Java interface.
- Add getSessionOptions method to LlmInferenceSession.
- This enables cloning for OpenCL-backed inference sessions
- Adding support for prompt templates
- Adds support to cancel async generation.
- Expose the max number of image to process to unlock vision for multi-modal processing
- Remove unnecessary chunk for add image API
- Declare the dependency of the OpenCL libraries, so that clients don't have to.
iOS
- Add vision modality support in swift API.
- Moving skia conversion to LLM c lib.
Javascript
- Remove artificial limits on maxBufferSize and maxStorageBufferBindingSize for LLM Inference on web.
- Use different parameters (topk, temperature) for gemma3
- Web LLM Inference: better error messaging for re-entry occurring from callback
- Add toggle for allowing the forcing of float32 precision for LLM Inference on web
Python
- Create a Packet containing a vector of ImageFrames. Get a list of ImageFrames from a Packet.
- Remove unused parameter from a docstring.
- Avoid unnecessary copy of ImageFrames.
- Add extra settings (disallowing service default initialization) for the base solution and allow setting it from pose solution.
- Create a script that runs the AI Edge Converter for all models in models.json
- Support bundling additional .tflite models in .task
- Enabling LoRA for Gemma3 conversions
- Update llm bundler to put vision in .task
MediaPipe Dependencies
- Update WASM files for 0.10.22 release
MediaPipe v0.10.22
Build changes
- [mediapipe] standardize import of androidx_annotation_annotation
- [mediapipe] standardize import of androidx_appcompact
- [mediapipe] standardize import of androidx_constraint_layout
- [mediapipe] standardize import of androidx_core
- [mediapipe] standardize import of androidx_legacy_legacy_support_v4
- [mediapipe] delete unused 3p android_library androidx_material
- [mediapipe] standardize import of androidx_recyclereview
- [mediapipe] standardize import of camerax
- Fix llm_engine_main build for DRISHTI_DISABLE_GPU=1
Framework and core calculator improvements
- Updating Troubleshooting with VLOG info.
- Update tensors_to_image_calculator.cc
- Delegate memory-mapping the model file to the resource system
- Add static helpers to timestamp classes
- Remove use of designated initializers in tflite_model_loader.cc
- Add support for INT64 in VectorIntToTensorCalculator.
- Use renamed wgpu::ImageCopy* structures.
- [mediapipe] improve mediapipe_java_proto_src_extractor
- Bump MP version for release 0.10.22.
- [mediapipe] improve maven artifact template
- Add two_tap_fir_filter_calculator and update com_google_audio_tools revision.
- Adds check to reject services with an empty shared_ptr
- Adds check to ensure input tensors match model tensor size & type
- Replace MapName with StaticMap in places where it's not important to use MapName
- Make DelayedReleaser an "attachment" of the GlContext instance.
- Utility functions that create RGB images for testing.
- Avoids the sharing of GL contexts between nested mediapipe graphs.
- Adds output stream stats to GraphRuntimeInfo
- Move ImageFrames while splitting a vector of ImageFrames.
- Add input stream to control zoom factor used in content_zooming_calculator.
- Introduces GPU synchronization when accessing GetOpenGlBufferReadViews from a different OpenGL context than was used for the GetOpenGlBufferWriteView.
- Adds documentation about graph runtime monitoring.
- Use wgpu::ShaderSourceWGSL instead of wgpu::ShaderModuleWGSLDescriptor.
- [mediapipe] restore mediapipe_aar.bzl
- Add CreateWgslShader utility.
- Update resource loading in WebGpuShaderCalculator to latest API.
- Added prompt templates for session in C API
MediaPipe Tasks update
Android
- [mediapipe] clean up an unused target ":llm" in core
- [mediapipe] correct the protobuf_lite dependency
- [mediapipe] move llm jni from "core" to "genai"
- [mediapipe] move llm proto from "core" to "genai"
- [mediapipe] build genai tasks with exact dependencies
- [mediapipe] create genai's specific ProgressListener and ErrorHandler
- [mediapipe] build vision and image_generator tasks with exact dependencies
- Don't use MediaPipeException in JNI layer
- Make generateResponseAsync() return a ListenableFuture and add ProgressCallback to its arguments
- Update JNI to enable litert CPU backend for LLM inference.
- Delete engine when task is closed.
iOS
- Add sequenceBatchSize option when setting up the inference engine.
Javascript
- Fix DrawingUtils constructor failing in Web Workers
- Change starting LoraModel ids from 0 to 1.
- Add a function to determine what type of model (handwritten, converted) a file is
- Fix tee not cancelling the parent stream when both children are cancelled
- Distinguish between '.bin' and '.task' in createFrom*
- Move streamToUint8Array from task runner lib to model loading utility lib, so the graph runner extensions would be able to utilize it.
MediaPipe Dependencies
- Update WASM files for 0.10.21-rc.20250303 release
- Update WASM files for 0.10.22 release
MediaPipe v0.10.21
Framework and core calculator improvements
- Update tensors_to_image_calculator.cc
- Fix incorrect name in ValidateRequiredSidePacketTypes status message.
- Delegate memory-mapping the model file to the resource system
- Add documentation for GpuOrigin::DEFAULT
- Add multiclass nms options for object detector.
- Add static helpers to timestamp classes
- Add Dockerfiles to allow users to build their own wheels
- Remove std::aligned_storage.
- Nit: add details to "no implementation available" error message
- Remove use of designated initializers in tflite_model_loader.cc
- Add resample_time_series_calculator.
MediaPipe Tasks update
Android
- Make LLM classes non-final to support mocking.
- Adds TopP parameter in the LLM Inference API.
- Add CPU / GPU options in Java LLM Inference Task.
- Do not require Proto types in public API.
Javascript
- [Web LLM] Fix for duplicate timestamp issue that could occur when loading two LoRA models in immediate succession
- Return error code and file error message in C API for both PredictSync and PredictAsync
- Added isIdle function to check whether web LlmInference instance is ready for work.
- Make the parameters for generateResponse optional.
Model Maker changes
- Enable the option of exporting a model with a fixed batch size.
- Use Optional[int] instead of int | None for pre 3.10 python
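A note on the Optional[int] item in the Model Maker list above: on Python versions before 3.10, an `int | None` annotation is evaluated when the function is defined and raises a TypeError (unless annotation evaluation is postponed), whereas `typing.Optional[int]` works on older versions too. A minimal sketch, assuming a hypothetical helper name that is not part of Model Maker:

```python
from typing import Optional


def describe_batch_size(batch_size: Optional[int] = None) -> str:
    """Hypothetical helper: report whether a fixed batch size was requested."""
    # Optional[int] is equivalent to "int or None" and is valid on Python < 3.10.
    if batch_size is None:
        return "dynamic batch size"
    return f"fixed batch size of {batch_size}"


print(describe_batch_size())
print(describe_batch_size(8))
```

The function name and the returned strings are illustrative only; the point is the annotation form, which stays importable on older interpreters.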
MediaPipe v0.10.20
Build changes
- Add comments to explain how to configure OpenCV in the opencv_macos.BUILD file.
- Add libc++_shared.so to MediaPipe Android examples.
- Add linkstatic to OpenCV prebuilts
Framework and core calculator improvements
- Fix ParseFromString() compilation issue in OSS
- All the dead links fixed
- Add troubleshooting tip for unsupported XNNPACK flags during build
- Add UniqueId::Dup.
- Updating the XNNPACK latest commit hash
- Format Workspace file
- Add EglSync wrapper.
- Update Bazel version to 6.5.0
- Update sync_wait to support UniqueFd.
- Fix GlContext includes
- Add EglSyncPoint/CreateEglSyncPoint.
- Bump MediaPipe version to 0.10.19.
- More perfetto tracking for EglSync.
- Patch for supporting WebGPU .deviceInfo during API migration.
- Log the Tensor multi-write error message only once.
- Enable GpuBufferStorageAhwb ASYNC usage for use case: AhwbView write -> GlTextureView read
- Add IsSignaled function (the previous SyncWait for checking status triggers unnecessary StrFormat)
- Add type information to error message when accessing an empty packet.
- Add SharedFD type
- Update SyncWait/IsSignaled to work with SharedFd.
- Enable SharedFd usage in EglSync
- Adding VLOG overrides - MediaPipe utilizes VLOG heavily, but it's not straightforward to enable it when running an Android app. VLOG overrides make it relatively quick to enable VLOGs for various modules within MediaPipe.
- Updating Troubleshooting with VLOG info.
- Slice only the tokens which are needed for the next stage of the LLM pipeline.
- Adds DebugInputStreamHandler.
- Delete YUVImage copy and move operations
- Adds GetGraphRuntimeInfo methods which generate runtime debugging information about the state of InputStreams.
- Add a sample script to run LLM inference on Android via the MediaPipe LLM inference engine.
- Update bot_config.yml
- Add option to set max sequence size in PackMediaSequenceCalculator instead of having it hard coded.
- Update run_llm_inference.sh with recommended models.
- Allow reading the input frame rate from the header in the input side stream and limiting the frame rate.
- Add stream operator<< for TypeId
- Extract native-to-UTF8 path string conversion; add FormatLastError()
- Update comment in yuv_image.h
- Introduce shadow_copy parameter to PathToResourceAsFile
- Fix header includes after refactoring
- Migrate away from status builders
- Avoids the creation of two "default" GpuExecutor instances
- Adds and integrates GraphRuntimeInfoLogger into CalculatorGraph.
- nit: don't overwrite InitializeDefaultExecutor argument "use_application_thread"
- Add Name() access to the source names in api2.
- Add memory mapping and locking to file helpers
- Fix Windows build
- Fix Windows build, part 2
- Support memory mapping in resources.
- Bump MediaPipe version to 0.10.20.
- Enable log message output for messages larger than 4096 bytes.
- Add vision modality to the C API
MediaPipe Tasks update
Android
- Adds the canonical toBuilder method to the LlmInferenceOptions object.
- Add Vision Modality to the MediaPipe LLM JNI Layer
- Add vision modality to the Java LLM API
- Remove unused Proto dependency
iOS
- Fixed empty pose world landmarks in iOS holistic landmarker
Javascript
- Improve logging to allow users to understand 1) which InferenceCalculator backend is used (without extra VLOG flags) and 2) when a model is loaded (including its size).
- nits: Remove linter warnings, fix unused includes.
Python
- Update the expected accuracy for text embedder test.
- Remove the check for start and stop tokens in the LLM bundler.
Model Maker changes
- Move tensorflow lite python calls to ai-edge-litert.
MediaPipe Dependencies
- Update WASM files
MediaPipe v0.10.18
Build changes
- Following open-sourcing webgpu with open-sourcing one of its dependencies third_party/emscripten
- Add pillow, pyyaml, and requests to model_maker BUILD
Framework and core calculator improvements
- Loading resources through calculator and subgraph contexts and configuring through kResourcesService.
- Use std::make_unique
- Moves OnDiskCacheHelper class into a separate file / compilation target
- Pools: report buffer specs on failure, fix status propagation, fix includes
- Open-Source MediaPipe's WebGPU helpers.
- BatchMatMul uses transpose parameter.
- Introduce Resource to represent a generic resource (file content, embedded/in-memory resource) for reading.
- Bump up the version number to 0.10.16
- Migrate from AdapterProperties to AdapterInfo
- Migrate from Resource::ReadContents to Resources::Get (using ForEachLine where required)
- Update Resources docs to mention ForEachLine (so devs don't fallback to ReadContents in such a case)
- Adjust WebGPU device registration
- Fix includes/copies/checks for BuildLabelMapFromFiles
- Migrate to BuildLabelMapFromFiles.
- Update Python version requirements in setup.py
- Introduce Resources with mapping, so graphs can use placeholders instead of actual resource paths.
- Remove Resources::ReadContents & add Resource::TryReleaseAsString.
- Fix ports for multi side outputs.
- Update solution android apps with explicit exported attribute.
- Ensure kResourcesService is set before CalculatorGraph is initialized (otherwise subgraphs/nodes may get the wrong default resources).
- Switch inference tests to ResourceProviderCalculator & update builder to refer MODEL_RESOURCE.
- Migrate modules to use ResourceProviderCalculator.
- Support single tensor input in TensorsToImageCalculator
- Migrate TfLiteModelLoader to use MP Resources.
- Remove deprecated TfLiteModelLoader::LoadFromPath.
- Fix for isIOS() platform util on worker and non-worker contexts
- Support single tensor input in TensorsToSegmentationCalculator
- Makes CalculatorContext::GetGraphServiceManager() private
- BatchMatMul can handle cases where ndims != 4 and quantization
- RmsNorm has an optional scale parameter.
- Allowed variable audio packet size by setting num_samples to null.
- Fix technically correct but confusing example in top level comments.
- Removing ReturnType helper, since it's part of the standard now.
- Update XNNPack to 9/24
- Enable LoRA conversion support for Gemma2-2B
- Improve warning when InferenceCalculator backends are not linked
- Bump MediaPipe version to 0.10.17.
- Update OpenCV to a version that compiles with C++ 17
- Force xnnpack when CPU inference is enforced
- Install PyBind before TensorFlow to get the MediaPipe version
- Change MP version to 0.10.18
- Add validation to LLM bundler
- Add alternative takePicture method to support custom thread executor
- Add CopySign op
- Add const Spec() method to OutputStreamManager
- Add support for converting SRGBA ImageFrame to YUVImage
- Add model configuration parameters for Gemma2-2B
- Add menu for the default demo app and option to Close processor/graph and Exit gracefully
- Add ngrammer, per layer embeddings and Relu1p5 fields to llm_params and update from Proto
- Add a special InMemory Resources (current use case is in tests, but may be needed for some simple things as well)
- Add ResourceProviderCalculator (replacement for LocalFileContentsCalculator)
- Add Resource support into TfliteModelCalculator
- Add a flag to set the default number of XNNPACK threads.
MediaPipe Tasks update
Android
- Initialize new members in LlmModelSettings
- Create an implicit session for all requests to generateResponse()
- Change session management so that all JNI calls come from the same thread.
- Add Session API support to LLM Java API
iOS
- Updated name of iOS audio classifier delegate
- Fixed incorrect stream mode in iOS audio classifier options
- Added method to ios audio task runner
- Updated iOS audio classifier BUILD file
- Fixed buffer length calculation in iOS MPPAudioData
- Updated iOS audio data tests to fix issue in buffer length calculation
- Revert "Added method for getting interleaved float32 pcm buffer from audio file"
- Updated comments in iOS LlmInference
- Dropped Refactored suffix for modified files in iOS genai
- Updated documentation of LlmTaskRunner
- Removed allocation of LlmInference Options
- Updated the response generation queue to be serial in iOS LlmInference
- Updated documentation of iOS LlmInference, documentation of LlmInference+Session
- Fixed marking of response generation completed control flow in LlmInference+Session.
- LlmInference.Options: remove unnecessary numOfSupportedLoraRanks parameter.
- Add activation data type to LlmInference.Options.
- Added more methods to iOS AVAudioPCMBuffer+TestUtils, few basic iOS audio classifier tests, options tests to iOS audio classifier, utils for AVAudioFile, test for score threshold to MPPAudioClassifierTests, constants in MPPAudioClassifierTests, close method to iOS audio classifier, iOS MPPAudioData test utils, stream mode tests for iOS audio classifier, iOS audio classifier to cocoapods build, audio record creation tests to MPPAudioClassifierTests, close method to MPPAudioEmbedder, iOS audio embedder tests, more utility methods to MPPAudioEmbedderTests, streams mode tests for iOS audio embedder, iOS audio embedder to cocoapods build, comments to MPPAudioClassifierTests, iOS audio embedder header and implementation, iOS audio classifier implementation file, method for getting interleaved float32 pcm buffer from audio file, refactored iOS LlmTaskRunner, iOS LlmSessionRunner, more errors to GenAiInferenceError, refactored LlmInference, iOS session runner to build files, extra safeguards for response context in LlmSessionRunner, LlmInference+Session.swift and documentation regarding session and inference lifetimes to iOS LLM Inference.
- Fixed issue with iOS audio embedder result parsing, iOS audio embedder options processing, index error in AVAudioFile+TestUtils, audio classifier result processing in stream mode, error handling in MPPAudioData, microphone recording issues in iOS MPPAudioRecord, documentation of iOS Audio Record, iOS audio record and audio data tests by avoiding audio engine running state checks and iOS audio embedder result helpers and bug due to simultaneous response generation calls across sessions.
- Updated method signatures in iOS audio classifier tests
- Fixed flow limiting in iOS audio classifier
- Removed duplicate test from MPPAudioClassifierTests
- Updated comments in AVAudioFile+TestUtils
- Changed the name of iOS audio classifier async test helper
- Update comment for LlmInference.Session.clone() method.
- Marked inits unavailable in MPPFloatBuffer
- Updated documentation of iOS audio record
- Adds LlmInference.Metrics for providing some key performance metrics (initialization time, response generation time) of the LLM inference.
- Removed unwanted imports from iOS audio data tests
- Cleaned iOS audio test utils BUILD file
- Remove the activation data type from the Swift API. We don't expect users to set it directly.
- Use seconds instead of milliseconds for latency metrics.
Javascript
- Add comments to generateResponses method.
- Migrate to ForEachLine to have a single source of truth for getting file contents lines.
- Workaround for multi-output web LLM issue where last response can get corrupted when numResponses is odd.
- Quick fix for wrong number of multi-outputs sometimes when streaming
Python
- Add a flag in the converter config for generating fake weights. When it is set to true, all weights will be filled with zeros.
- Update text embedder test to match the output after XNNPack upgrade.
- Update remaining data in text embedder test to match the output after XNNPack upgrade.
- Update the expected value of the text embedder test.
- Add python pip deps to WORKSPACE
- Fix pip_deps targets.
Model Maker changes
- Undo dynamic sequence length for export_model api because it doesn't work with MediaPipe.
- Replace mock with unittest.mock in model_maker tests.
- Move tensorflow lite python calls to ai-edge-litert.
MediaPipe Dependencies
- Update WASM files
MediaPipe v0.10.15
Build changes
- Fix unwanted dependency on GPU libraries.
- Adds TwoTapFirFilterCalculator.
- Add public visibility to graph_service headers.
- Disable ASAN, TSAN and MSAN tests which take more than 10 minutes.
Framework and core calculator improvements
- Update PointToForeign with an optional cleanup object.
- Enable BeginLoopCalculator for move-only types (e.g. Tensor) without Packet::Consume usage and copyable types without copying unless it's a fundamental type.
- Ensure proper release of resources in case of multiple AHWB reads.
- Enables the configuration of GpuBufferPool options via GpuResources::Create().
- Bugfix to correctly handle landmark projection in the non-square case.
- add utility to wait for a sync (represented by FD)
- Change a RET_CHECK to RET_CHECK_EQ
- KinematicPathSolver: Avoid overshooting target
- Introduce GetDefaultGpuExecutor(GpuResources) to allow executing all calculators on MP GPU thread.
- No destruction for static ahwb_usage_track_.
- Unbind framebuffer in Affine Transformation Runner GL
- Move/isolate ahwb_usage_track_ into tensor_ahwb
- Guard ahwb_tensor_track_ with mutex.
- Add SidePacketConnectionTest
- Update C++ Graph Builder to support executors and support input/output stream handlers.
- Node::Input/OutputStreamHandler -> Node::SetInput/OutputStreamHandler
- Add Packet::Share() method in replacement of SharedPtrWithPacket() function.
- Default to high-performance power preference hint for WebGL contexts. For some computers with dual GPUs (like MBP2019), this will more frequently give us the higher performance GPU, which is generally preferable for most of our use cases (realtime rendering and ML), since speed is more critical than power consumption. If necessary, the user can override this setting by requesting their canvas' WebGL context manually before initializing the graph.
- Introduce input_scale parameter to SpectrogramCalculator.
- Improve documentation of graph options
- Add an option to PackMediaSequenceCalculator to add empty clip labels instead of ignoring them. This is useful when we want to distinguish processing errors from no-detections.
- Updates language detection headers
- Fix dangling error reporter pointer in memory mapped models
- Fix for possible infinite stall using setOptions immediately before a loadLoraModel call.
- Add relu1p5 op, abs op, Log op, mdspan and Lhs Broadcast Sub with test
- Fix missing member move in Tensor class
- Add support for single Tensor output streams for ImageToTensorCalculator.
- Fix some compilation errors in WebGPU code. These changes are all minor.
- Add single tensor output support to tensor_converter_calculator.
- Replace QCHECK with ABSL_QCHECK and CHECK with ABSL_CHECK.
- Fix a bug in TensorAHWB that triggers a crash with multiple delayed AHWB readers followed by a CPU reader.
- Fixes an unnecessary allocation of GraphServiceManager in case it is adopted from the calculator context.
- Fix triggering of DFATAL message.
- Remove xnn_enable_avx512fp16=false from .bazelrc
- Replace uses of TfLiteOperatorCreate with TfLiteOperatorCreateWithData
- Compile with '--keep_going' in setup.py
- Update ndk version so that our open source users get the best possible performance out of mediapipe.
- Correct address of android ndk
- Replace absl::make_unique with std::make_unique in tensor.cc and tensor_ahwb.cc.
- LLM decode benchmarks fill the cache with a predefined number of tokens before starting decoding.
- Add logic to drop the offending non-monotonically increasing timestamp in the MicrophoneHelper.
- Make packet payload const.
- Pass flag to indicate that consuming op may support prepacked GEMM.
- Get timestamp from OpenCV VideoCapture after first frame is read.
- Update XNNPack and cpuinfo
- Update TensorFlow to 2024-07-18.
- Remove deprecated TfLiteOperatorCreateWithData function
- Add option to use shifted window in SpectrogramCalculator.
- Move AhwbUsage struct and helper methods into a separate library.
- Make fields in PacketGetter.Pair public.
- The GraphProfiler may be destroyed before the task is executed in the executor.
- Introduce flag in MicrophoneHelper to drop non-increasing timestamps.
- llm_test - add batch size of 8 for BM_Llm_QCINT8/512/128
- Add method to create MP Tensor from TfLite tensor specs
- Refactors AHardwareBufferView class to be instantiated with a TensorAhwbUsage pointer.
- Refactor LlmBuilder to have one graph
- Add expected_seq_len param to ComputeLogits()
- Fix mediapipe::file::Exists() for >2GB files on Windows.
- Bump XNNPACK and KleidiAI versions.
- Update MP demo app to acquire wake lock
- Replace mediapipe::StatusOr with absl::StatusOr
- Sync on ssbo_written_ before mapping an AHWB to a CpuReadView.
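Two MicrophoneHelper entries in the list above concern dropping packets whose timestamps do not increase monotonically. As a rough illustration of that idea only, and not MediaPipe's actual implementation, the filtering can be sketched in Python. The function name drop_non_increasing and the (timestamp, payload) tuple layout are hypothetical, chosen for this sketch.

```python
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def drop_non_increasing(
    packets: Iterable[Tuple[int, T]],
) -> Iterator[Tuple[int, T]]:
    """Yield (timestamp, payload) pairs with strictly increasing timestamps.

    Hypothetical helper for illustration: any packet whose timestamp is not
    strictly greater than the last emitted timestamp is skipped.
    """
    last: Optional[int] = None
    for timestamp, payload in packets:
        if last is not None and timestamp <= last:
            # Offending packet: repeats or goes backwards in time, so drop it.
            continue
        last = timestamp
        yield timestamp, payload


if __name__ == "__main__":
    stream = [(0, "a"), (10, "b"), (10, "c"), (5, "d"), (20, "e")]
    for timestamp, payload in drop_non_increasing(stream):
        print(timestamp, payload)
```

In this sketch the comparison is made against the last emitted timestamp rather than the last seen one, so a single late packet does not cause later valid packets to be discarded.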
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.
Android
- Bump targetSdkVersion to 34 throughout MediaPipe.
iOS
- Updated documentation in iOS audio classifier
- Added iOS holistic landmarker to vision framework build
- Changed method name in MPPAudioClassifierResult
- Added audio classifier options helpers
- Added audio classifier result helpers
- Added method to create audio record MPPAudioTaskRunner
- Removed unused imports in MPPAudioTaskRunner
- Added iOS audio embedder result, classifier result, classifier options, embedder options, embedder options helpers, classifier header and embedder result helpers
- Add missing argument for num_draft_tokens.
Javascript
- Set quantization bits for LoRA weight conversion to match those specified
- Warn on adding packets to a closed input stream instead of silently dropping packets.
- Enable experimental support for Chromium WGSL subgroups in LLM API, when available.
- Support multi-response generation.
Python
- Add prompt template to llm bundler.
Bug fixes
- class_weights flag causes a crash for multiclass case
Model Maker changes
- Rename old BinaryAUC metric to BinarySparseAUC (used by text_classifier) and create a new BinaryAUC metric which does not expect sparse inputs.
- Allow configuration of num_parallel_calls and cycle_length in hparams
- Improve python code format.
- Use tf.io.gfile.GFile for writing metadata file in image classifier.
- Change SparsePrecision metric to BinarySparsePrecision metric, and same for SparseRecall->BinarySparseRecall in the core library. We only care about these metrics in the binary case, so this change makes the metric class names more accurate for their intended usage.
- Support multilabel model training in text classifier
- Create and add metrics for multi-class case
- Support a customized best model monitor for multiclass cases
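The metric renames above distinguish "sparse" from non-sparse inputs. The sketch below assumes that "sparse" means integer class-index labels paired with two-column per-class scores, and that the non-sparse form uses one label and one positive-class score per example. The release notes do not spell this out, so treat that reading, and the hypothetical helper name sparse_to_dense_binary, as assumptions rather than the Model Maker API.

```python
from typing import List, Sequence, Tuple


def sparse_to_dense_binary(
    y_true: Sequence[int],
    y_pred: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Convert sparse-style binary inputs to one label and one score per example.

    Assumed layout (not confirmed by the release notes):
      y_true: class indices, each 0 or 1.
      y_pred: per-example [score_for_class_0, score_for_class_1].
    """
    labels: List[float] = []
    scores: List[float] = []
    for label, class_scores in zip(y_true, y_pred):
        if label not in (0, 1):
            raise ValueError(f"expected a binary class index, got {label!r}")
        if len(class_scores) != 2:
            raise ValueError("expected exactly two class scores per example")
        labels.append(float(label))
        # Keep only the positive-class score, as a dense binary metric expects.
        scores.append(float(class_scores[1]))
    return labels, scores


if __name__ == "__main__":
    labels, scores = sparse_to_dense_binary(
        [0, 1, 1], [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
    )
    print(labels, scores)
```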
MediaPipe Dependencies
- Update WASM files
MediaPipe v0.10.14
Framework and core calculator improvements
- Expose Lora ranks.
- Update C API documentation to make it clear that the callback is invoked multiple times
- Do not free response in PredictAsync callback
- Enable usage of DRISHTI_PROFILING from non mediapipe namespaces.
- Add model type to ImageGeneratorOptions.
- Allow casting Stream->Stream
MediaPipe Tasks update
This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.
iOS
- Added iOS audio data tests
- Removed unused methods in AVAudioPCMBufferTestUtils
- Added read at offset tests to MPPAudioRecordTests
- Renamed property in MPPAudioData
- Added iOS Audio Packet Creator
- Added iOS audio running mode
- Added iOS Packet Creator
- Added iOS audio task runner
- Updated documentation of MPPAudioPacketCreator
Javascript
- Allow models to be uploaded via ReadableStreamDefaultReader
- Allow all tasks to use a ReadableStreamDefaultReader
- Expose Web LoRA API.
- Raise WebGPU errors to JavaScript.
- Update GenAI Experimental README
- Update GenAI README
Python
- Fixed result_callback() argument
MediaPipe Dependencies
- Flatbuffers upgrade to 24.3.7
- Update TF and FlatBuffer dependency to latest.