tasks/segment/ #8734

2024-03-07T02:23:39Z

giscus[bot]
bot Mar 7, 2024

tasks/segment/

Learn how to use instance segmentation models with Ultralytics YOLO. Instructions on training, validation, image prediction, and model export.

https://docs.ultralytics.com/tasks/segment/

jrryzh · 2024-03-07T02:23:40Z

jrryzh
Mar 7, 2024 — with giscus

I have a question about segmentation datasets and would really appreciate an answer. My dataset contains a lot of information, including bboxes and segmentations, but its category_id values are wrong: all categories have been marked as the same category. Does this have a bad influence on fine-tuning? If so, how can I fix it?

1 reply

pderrenger Mar 7, 2024
Maintainer

@jrryzh hey there! 👋

Absolutely, having incorrect category_id in your dataset where all categories are marked as the same can influence the fine-tuning process, especially for tasks like segmentation where precise category distinctions are crucial.

To fix this, you'll need to correct the category_id for each instance in your dataset. This might involve manually checking and updating the category IDs or writing a script to automate the process if you have a mapping of incorrect to correct IDs.

For instance, if you're working with a JSON format similar to COCO, you could use Python to update the category IDs:

import json

# Load your dataset
with open('your_dataset.json') as f:
    data = json.load(f)

# Assuming all categories are incorrectly marked as category_id 1 and you want to update them to correct IDs
correct_mapping = {1: 'new_correct_id'}  # Update this with your correct category IDs

# Update category IDs
for ann in data['annotations']:
    if ann['category_id'] in correct_mapping:
        ann['category_id'] = correct_mapping[ann['category_id']]

# Save the corrected dataset
with open('your_corrected_dataset.json', 'w') as f:
    json.dump(data, f)

Remember to replace 'new_correct_id' and the mapping logic according to your specific needs. After correcting your dataset, you can proceed with fine-tuning your model, which should now learn with the correct category distinctions.
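Before and after remapping, it can help to count how many annotations each category_id has; if every annotation lands on a single ID, that confirms the problem described above. A minimal sketch, assuming a COCO-style dict with 'annotations' and 'categories' keys (the helper name and the sample data are made up for illustration):

```python
from collections import Counter


def count_annotations_per_category(data):
    """Return {category_id: number_of_annotations} for a COCO-style dict."""
    return dict(Counter(ann["category_id"] for ann in data["annotations"]))


# Tiny made-up dataset in which every annotation was labelled as category 1
sample = {
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 1},
        {"id": 12, "image_id": 2, "category_id": 1},
    ],
}

counts = count_annotations_per_category(sample)
# Categories that are declared but never used by any annotation
unused = [c["name"] for c in sample["categories"] if c["id"] not in counts]
print(counts)  # {1: 3}
print(unused)  # ['dog']
```

Running the same count on the corrected file should show the annotations spread across the categories you expect.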

If you're working with segmentation, our Instance Segmentation guide might be helpful. It covers how to use our segmentation models, like yolov8n-seg.pt, which are pre-trained on COCO and can be fine-tuned on your corrected dataset.

Hope this helps! Let us know if you have any more questions. 😊

Boaruzhanchik · 2024-03-15T21:54:34Z

Boaruzhanchik
Mar 15, 2024 — with giscus

How can I run my code on the GPU? On the CPU, recognition freezes.

while True:
    ret, im0 = cap.read()
    if not ret:
        break

    results = model(im0,verbose=False,conf=0.4,retina_masks=True)
    cv2.imshow('im0',im0)
    
    boxes = results[0].boxes.xyxy.tolist()
    classes = results[0].boxes.cls.tolist()
    confidences = results[0].boxes.conf.tolist()
    masks = results[0].masks
    if results[0].masks is not None:
        clss = results[0].boxes.cls.cpu().tolist()
        masks = results[0].masks.xy

        coordinates_mask = masks[0]
        contour = coordinates_mask.reshape((-1, 1, 2)).astype(np.int32)
        epsilon = 0.04 * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, epsilon, True)
        auto_transform = four_point_transform(im0, approx.reshape(-2, 2))
        cv2.imshow('automated', auto_transform)
        russian_ocr(auto_transform)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

4 replies

glenn-jocher Mar 16, 2024
Maintainer

@Boaruzhanchik hey there! 😊 To run your code on a GPU, you typically just need to ensure your model is set to use the GPU. With Ultralytics YOLO, your model should automatically utilize a CUDA-compatible GPU if available. Here's a quick check to make sure:

from ultralytics import YOLO

# Load the model, then move it to the GPU (requires a CUDA-enabled PyTorch build)
model = YOLO('yolov8n.pt')
model.to('cuda')

Moving the model to 'cuda' (or passing device='cuda' to the predict call) directs inference to the GPU. Make sure you have a CUDA-compatible NVIDIA GPU and CUDA-enabled builds of PyTorch and torchvision installed. If recognition is freezing, it's usually due to the heavy computation not being efficiently processed by the CPU. Running it on the GPU should significantly speed things up! 🚀

If CUDA is correctly set up and you're still encountering issues, make sure to check your GPU drivers and CUDA installation.

Keep me posted on how it goes or if you run into any further questions!

Boaruzhanchik Mar 16, 2024 — with giscus

Hi, I installed PyTorch and get an error when the program detects an object:

while True:
    ret, im0 = cap.read()
    if not ret:
        break
    im0 = cv2.resize(im0, (new_width, new_height))
    results = model(im0,verbose=False,conf=0.4,retina_masks=True, device = 'cuda')
    cv2.imshow('im0',im0)
    
    boxes = results[0].boxes.xyxy.tolist()
    classes = results[0].boxes.cls.tolist()
    confidences = results[0].boxes.conf.tolist()
    masks = results[0].masks
    if results[0].masks is not None:
        clss = results[0].boxes.cls.cpu().tolist()
        masks = results[0].masks.xy

        coordinates_mask = masks[0]
        contour = coordinates_mask.reshape((-1, 1, 2)).astype(np.int32)
        epsilon = 0.04 * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, epsilon, True)
        auto_transform = four_point_transform(im0, approx.reshape(-2, 2))
        cv2.imshow('automated', auto_transform)
        russian_ocr(auto_transform)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Traceback (most recent call last):
  File "d:\RABOTA\yolov8_MISSION_COMPLITE.py", line 117, in <module>
    results = model(im0,verbose=False,conf=0.4,retina_masks=True, device = 'cuda')
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\engine\model.py", line 169, in __call__
    return self.predict(source, stream, **kwargs)
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\engine\model.py", line 439, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\engine\predictor.py", line 167, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs)) # merge list of Result into one
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\engine\predictor.py", line 254, in stream_inference
    self.results = self.postprocess(preds, im, im0s)
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\models\yolo\segment\predict.py", line 30, in postprocess
    p = ops.non_max_suppression(
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\utils\ops.py", line 282, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres) # NMS
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\torchvision\ops\boxes.py", line 41, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
  File "C:\Users\Iosif\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_ops.py", line 755, in __call__
    return self._op(*args, **(kwargs or {}))
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

CPU: registered at C:\actions-runner_work\vision\vision\pytorch\vision\torchvision\csrc\ops\cpu\nms_kernel.cpp:112 [kernel]
Meta: registered at /dev/null:440 [kernel]
QuantizedCPU: registered at C:\actions-runner_work\vision\vision\pytorch\vision\torchvision\csrc\ops\quantized\cpu\qnms_kernel.cpp:124 [kernel]
BackendSelect: fallthrough registered at ..\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:154 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ..\aten\src\ATen\functorch\DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at ..\aten\src\ATen\FunctionalizeFallbackKernel.cpp:324 [backend fallback]
Named: registered at ..\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ..\aten\src\ATen\ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at ..\aten\src\ATen\native\NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at ..\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:53 [backend fallback]
AutogradCPU: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:57 [backend fallback]
AutogradCUDA: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:65 [backend fallback]
AutogradXLA: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:69 [backend fallback]
AutogradMPS: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:77 [backend fallback]
AutogradXPU: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:61 [backend fallback]
AutogradHPU: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:90 [backend fallback]
AutogradLazy: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:73 [backend fallback]
AutogradMeta: registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:81 [backend fallback]
Tracer: registered at ..\torch\csrc\autograd\TraceTypeManual.cpp:297 [backend fallback]
AutocastCPU: registered at C:\actions-runner_work\vision\vision\pytorch\vision\torchvision\csrc\ops\autocast\nms_kernel.cpp:34 [kernel]
AutocastCUDA: registered at C:\actions-runner_work\vision\vision\pytorch\vision\torchvision\csrc\ops\autocast\nms_kernel.cpp:27 [kernel]
FuncTorchBatched: registered at ..\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:720 [backend fallback]
BatchedNestedTensor: registered at ..\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:746 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ..\aten\src\ATen\functorch\VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at ..\aten\src\ATen\LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at ..\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ..\aten\src\ATen\functorch\TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:162 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ..\aten\src\ATen\functorch\DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:166 [backend fallback]
PythonDispatcher: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:158 [backend fallback]

Boaruzhanchik Mar 16, 2024 — with giscus

Sorry for the inconvenience, everything was solved, thank you !

glenn-jocher Mar 16, 2024
Maintainer

Hey there! 😊 Glad to hear everything got sorted out. If you ever run into another hiccup or have any questions, please don't hesitate to reach out. Wishing you continued success with your projects! If instance segmentation or anything else piques your interest, our Ultralytics Docs are there to help guide you along. Cheers! 🎉
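For readers who hit the same `torchvision::nms` error: the message lists only CPU kernels for the operator, which commonly means a CPU-only torchvision build is installed next to a CUDA build of PyTorch. The sketch below compares the build tags of the two packages. It assumes the pip wheel convention in which the version string ends in a tag such as `+cu121` or `+cpu`; builds from other sources may carry no tag at all, and the helper names are made up for illustration.

```python
def local_build_tag(version):
    """Return the part after '+' in a version string, e.g. 'cu121' or 'cpu' ('' if absent)."""
    return version.split("+", 1)[1] if "+" in version else ""


def builds_match(torch_version, torchvision_version):
    """True when both packages carry the same build tag (both CUDA or both CPU)."""
    return local_build_tag(torch_version) == local_build_tag(torchvision_version)


# Made-up version strings: a CUDA torch next to a CPU-only torchvision
print(builds_match("2.2.1+cu121", "0.17.1+cpu"))    # False
print(builds_match("2.2.1+cu121", "0.17.1+cu121"))  # True

# Check the actual installation if both packages can be imported
try:
    import torch
    import torchvision
except Exception as exc:  # missing package or a broken/mismatched install
    print("could not import torch/torchvision:", exc)
else:
    print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("build tags match:", builds_match(torch.__version__, torchvision.__version__))
```

If the tags differ, reinstalling both packages from the same CUDA wheel index is the usual remedy.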

Mo777hamed · 2024-03-19T07:54:22Z

Mo777hamed
Mar 19, 2024 — with giscus

I am building a Streamlit app on YOLOv8 models.
My question is: how do I display the resulting annotated image or video?

1 reply

glenn-jocher Mar 19, 2024
Maintainer

Hey there! 😊 For displaying the result or the annotated image/video in your Streamlit app using YOLOv8 models, you can follow this approach:

After running the prediction with a YOLOv8 model, you get a Results object that contains the annotated images. You can convert these images to a format suitable for display in Streamlit using the PIL library. Here’s a quick example:

from ultralytics import YOLO
import streamlit as st
from PIL import Image
import numpy as np

# Load a YOLO model
model = YOLO('yolov8n.pt')  # Or your custom model

# Run prediction
results = model('path/to/image.jpg')  # Replace with your source

# Convert results image to PIL Image for Streamlit
annotated_img = Image.fromarray(results[0].plot()[..., ::-1])  # Convert BGR to RGB

# Display the image in Streamlit
st.image(annotated_img, caption='Annotated Image')

Replace 'path/to/image.jpg' with your image path or a video source. This code snippet assumes you're running prediction on an image. For videos, consider processing frame-by-frame and updating the Streamlit display within a loop.
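For the video case, a frame-by-frame loop could look like the sketch below. It is only an outline under stated assumptions: the function names, the `stride` parameter, and the frame-skipping idea are illustrative and not part of the Ultralytics or Streamlit APIs. It relies on `st.empty()` as a placeholder that is overwritten on each frame and on `st.image(..., channels="BGR")`, since `plot()` returns a BGR array.

```python
def every_nth(items, n):
    """Yield every n-th item of an iterable (n=1 yields everything)."""
    for index, item in enumerate(items):
        if index % n == 0:
            yield item


def read_frames(capture):
    """Yield frames from an opened cv2.VideoCapture until the stream ends."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame


def show_video(video_path, weights="yolov8n.pt", stride=2):
    """Run the model on every `stride`-th frame and redraw one Streamlit image slot."""
    # Third-party imports live here so the helpers above stay dependency-free
    import cv2
    import streamlit as st
    from ultralytics import YOLO

    model = YOLO(weights)
    slot = st.empty()  # placeholder that is overwritten on each frame
    capture = cv2.VideoCapture(video_path)
    try:
        for frame in every_nth(read_frames(capture), stride):
            results = model(frame, verbose=False)
            slot.image(results[0].plot(), channels="BGR")  # plot() returns BGR
    finally:
        capture.release()


print(list(every_nth(range(7), 3)))  # [0, 3, 6]
```

Calling `show_video('path/to/video.mp4')` from the Streamlit script would then update the same image slot as frames are processed; skipping frames with `stride` is one simple way to keep the display responsive.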

Hope this helps you integrate YOLOv8 with your Streamlit app smoothly! 🚀 Let me know if you have any more questions.

bdv29 · 2024-03-19T15:27:53Z

bdv29
Mar 19, 2024 — with giscus

has the ability to use image masks instead of contour points as labels ever been implemented ?
some datasets can't be fully represented with just contour points, for instance if there are gaps within the object that shouldn't be masked but a contour produced mask would cover anyway.

1 reply

glenn-jocher Mar 19, 2024
Maintainer

Hello! 👋 Ultralytics YOLO segmentation models predict a pixel mask for each detected object, so predictions are not limited to contour points. For training, however, labels are supplied in the YOLO segmentation format, in which each object is described by the polygon points of its outline. A binary mask therefore has to be converted to polygons first, and a gap inside an object is not something a single outline polygon expresses directly.

For instance segmentation, you can use models with the -seg suffix like yolov8n-seg.pt. These models are trained to predict both bounding boxes and segmentation masks for each detected object, offering more precise representation than contours alone.

Here's a quick example on how to predict with a pretrained segmentation model:

from ultralytics import YOLO

# Load a segmentation model
model = YOLO('yolov8n-seg.pt')

# Predict with the model
results = model('https://ultralytics.com/images/bus.jpg')

This will give you both the bounding boxes and segmentation masks for detected objects, handling cases where simple contours are not sufficient. 🎭

Let us know if you need further assistance or examples!

saadtariq001s · 2024-03-22T06:39:30Z

saadtariq001s
Mar 22, 2024 — with giscus

Hi there!

I am using the pre-trained segmentation model by Yolov8.

I have an image that contains several objects, and I only want to receive the mask of any person present in the frame, while ignoring all the other masks.

My requirements:

The original image dimensions must not be altered.
I do not need a bounding box surrounding the mask, as I later plan to draw contours on the mask
The background or any other objects in the frame must be ignored.
Save the image containing the mask of the person with the same spatial pixel dimensions.

3 replies

pderrenger Mar 22, 2024
Maintainer

Hey there! 👋

For your requirement to only get the person mask from an image using the YOLOv8 pre-trained segmentation model, and ignoring other detections, you can do something like this:

from ultralytics import YOLO

# Load the YOLOv8 segmentation model
model = YOLO('yolov8n-seg.pt')

# Perform prediction
results = model('path/to/your/image.jpg')

# Filter for 'person' class masks only (class_id for 'person' is 0 in the COCO-pretrained models)
result = results[0]  # the model returns a list with one Results object per image
person_masks = result.masks[result.boxes.cls == 0]  # note: result.masks is None if nothing was detected

# Loop through each person mask to process/visualize as needed
for mask in person_masks:
    # Here, 'mask' represents an individual person mask
    # You can proceed to draw contours or other operations as required
    polygon = mask.xy[0]  # outline of this mask in pixel coordinates of the original image

# Save or visualize person_masks as needed, maintaining original image dimensions

This snippet is a starting point. Indexing result.masks with result.boxes.cls == 0 keeps only the person masks from your detections. You might need to adjust the class_id based on the specific model and dataset used for training.

By utilizing this approach, you'll be able to:

Preserve the original image dimensions.
Ignore non-person objects.
Skip bounding box visualization.
Save the image with only person masks, ready for further processing.

Hope this helps! Let me know if you need further assistance. 😊

saadtariq001s Mar 23, 2024 — with giscus

results.masks gives an attribute error. It was fixed with results[0].masks.

I have the tensor containing the mask data of the person through

person_masks.data.

Just help me save the masked image now, that only contains mask of the detected person. (There is always going to be one person in my usecase)

pderrenger Mar 23, 2024
Maintainer

Hey! 👋

Great to hear you've successfully isolated the person mask using YOLOv8 segmentation! To save the mask image with the same spatial dimensions as the original, here is a quick way:

import cv2
import numpy as np

# Assuming `person_masks.data` is your mask tensor with shape (1, H, W)
# H and W match the original image only if the prediction was run with retina_masks=True
# Convert the mask to a binary image (0 or 255)
mask_image = person_masks.data.squeeze().cpu().numpy() * 255
mask_image = mask_image.astype(np.uint8)

# Save the mask image (PNG is lossless, so the mask stays strictly binary)
cv2.imwrite('path/to/save/person_mask.png', mask_image)

This code snippet takes your person_masks.data tensor, scales it to the 0 to 255 range (grayscale), and saves it. The saved image has the same height and width as the mask tensor, so to match the original image, run the prediction with retina_masks=True; otherwise the mask is at the model's inference resolution. Make sure the path where you're saving is accessible.

Let me know if this helps or if you have any further questions! 😊

rex111536236236236 · 2024-03-22T09:58:01Z

rex111536236236236
Mar 22, 2024 — with giscus

I have a question: how do you visualize the segmentation mask with the detected label? It doesn't seem to be discussed in the documentation, which shows how to predict but not how to visualize the result on an image. I checked the list of results, and it seems to return only segmentation masks, not the predicted classes. Can you visualize the segmentation mask with only a label? In the tutorial the output also has a bounding box, which I assume is how the label gets placed, since the segmentation masks don't seem to be directly related to the class ID. My goal is to detect objects with segmentation and label them without a bounding box. Looking forward to your answer, thank you.

1 reply

glenn-jocher Mar 22, 2024
Maintainer

Hello there! 👋

To visualize the segmentation masks without bounding boxes, you can use the plot() method of the Results object in our Python API with boxes=False. Here's a quick example to guide you through:

from ultralytics import YOLO
import matplotlib.pyplot as plt

# Load a pretrained segmentation model
model = YOLO('yolov8n-seg.pt')  

# Predict on an image; the model returns a list of Results objects
results = model('https://ultralytics.com/images/bus.jpg')  

# Visualize segmentation masks without bounding boxes
fig = plt.figure(figsize=(10, 10))
plt.imshow(results[0].plot(boxes=False)[..., ::-1])  # plot() returns a BGR array, so reverse the channels for matplotlib
plt.axis('off')
plt.show()

This generates an output where each segmentation mask is color-coded by class. The class of each mask is available separately: results[0].boxes.cls holds the class IDs in the same order as results[0].masks, and results[0].names maps each ID to its name.

By default, plot() draws both segmentation masks and bounding boxes, and as far as I know the text labels are drawn as part of each box, so boxes=False hides the labels as well. If you want masks with labels but no boxes, you would draw the class names yourself, for example with cv2.putText at a point inside each polygon from results[0].masks.xy.
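If you want a class name on each mask without a box, one option is to draw the text yourself at a point inside each mask polygon (for example each array in results[0].masks.xy). The sketch below computes such an anchor point as the polygon's area centroid using the shoelace formula; the function name and the sample rectangle are made up for illustration.

```python
def polygon_centroid(points):
    """Area centroid of a simple, non-empty polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:  # degenerate polygon: fall back to the mean of the vertices
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return (cx / (3 * area2), cy / (3 * area2))


rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(polygon_centroid(rectangle))  # (2.0, 1.0)
```

The returned point, rounded to integers, could then be passed to cv2.putText together with results[0].names[int(cls)]. For strongly concave masks the centroid can fall outside the shape, so a different anchor may be preferable there.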

Hope this answers your question! 😊 If you have any further queries or need more insights, feel free to reach out.

utokyo-sm96 · 2024-03-22T18:12:25Z

utokyo-sm96
Mar 22, 2024 — with giscus

I have a question. Can anyone suggest how I can save the predicted bounding box and segmentation mask in the following format:
<class_id> <bbox_center_x> <bbox_center_y> <bbox_width> <bbox_height> <polygon_x1> <polygon_y1> <polygon_x2> <polygon_y2> ... <polygon_xN> <polygon_yN>
which has labels of both bounding box and segmentation mask?

5 replies

pderrenger Mar 23, 2024
Maintainer

@utokyo-sm96 hey there! 😊 To save the predicted bounding box and segmentation mask in your specified format, you can iterate over the detected objects and their corresponding masks to build the desired string for each detection. Below is a Python code snippet using the Ultralytics YOLO model to demonstrate how you could achieve this:

from ultralytics import YOLO

# Load your YOLOv8 segment model
model = YOLO('yolov8n-seg.pt')

# Run a prediction
results = model('path/to/your/image.jpg')

# Iterate through results to format and save your detections and masks
result = results[0]
if result.masks is not None:  # masks is None when nothing is detected
    for cls, box, segment in zip(result.boxes.cls.tolist(), result.boxes.xywhn.tolist(), result.masks.xyn):
        # box: normalized (center_x, center_y, width, height)
        # segment: array of shape (N, 2) with normalized polygon points for this detection
        bbox_center_x, bbox_center_y, bbox_width, bbox_height = box

        # Build the desired format string
        output_str = f"{int(cls)} {bbox_center_x} {bbox_center_y} {bbox_width} {bbox_height}"
        for x, y in segment:  # Iterate over polygon points
            output_str += f" {x} {y}"

        print(output_str)  # Or write to a file

This code will print out the details for each detection in your specified format. Both the box (xywhn) and the polygon (xyn) are normalized to the image size here; use xywh and xy instead if you want pixel coordinates.
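To read such lines back, a small parser helps to check that the file contains what you expect. Note that this combined box-plus-polygon layout is the custom format from the question; as far as I know, the standard Ultralytics segmentation label is the class followed by polygon points only. The function name and the example line below are made up for illustration.

```python
def parse_label_line(line):
    """Split '<class_id> <cx> <cy> <w> <h> <x1> <y1> ... <xN> <yN>' into its parts."""
    fields = line.split()
    # 1 class + 4 box values + at least 3 (x, y) pairs, with an even number of coordinates
    if len(fields) < 5 + 6 or (len(fields) - 5) % 2 != 0:
        raise ValueError("expected class, 4 box values and at least 3 (x, y) pairs")
    class_id = int(fields[0])
    box = tuple(float(v) for v in fields[1:5])
    coords = [float(v) for v in fields[5:]]
    polygon = list(zip(coords[0::2], coords[1::2]))
    return class_id, box, polygon


example = "0 0.5 0.5 0.4 0.2 0.3 0.4 0.7 0.4 0.7 0.6 0.3 0.6"
class_id, box, polygon = parse_label_line(example)
print(class_id, box)  # 0 (0.5, 0.5, 0.4, 0.2)
print(polygon)        # [(0.3, 0.4), (0.7, 0.4), (0.7, 0.6), (0.3, 0.6)]
```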

Hope this helps! Let me know if you need further assistance.

utokyo-sm96 Mar 25, 2024

When I try to predict some images using the best weights of this trained model (!python3 segment/predict.py --save-crop --save-txt)
I get:
Saved txt for predicted image example:
0 0.5 0.5 0.335938 0.078125

I expect to get predicted segment coordinates in the saved txt files as:
Dataset annotation format example (cls x1 y1 x2 y2 ...):
0 0.3347670250896058 0.4770609318996415 0.3369175627240144 0.5422939068100359 0.6681003584229391 0.5218637992831541 0.6645161290322581 0.4559139784946237
any ideas about what is getting wrong?

pderrenger Mar 25, 2024
Maintainer

Hey there! 👋 It looks like you're expecting segmentation polygon coordinates in the .txt output file but are only getting bounding box coordinates. The script you're running writes the --save-txt results in the YOLO detection format (class, center_x, center_y, width, height) without segmentation coordinates. With the Ultralytics package, predicting with a -seg model and save_txt=True should instead write the class followed by the normalized polygon points.

For including both the box and the segmentation coordinates in your output, you'd need to programmatically access the boxes and masks from the model's output and convert them into your desired format. Here's a quick example on how you might approach it:

from ultralytics import YOLO

model = YOLO('yolov8n-seg.pt')  # Load your segment model
results = model('path/to/image.jpg')  # Perform inference
result = results[0]  # one Results object per image

lines = []
if result.masks is not None:  # masks is None when nothing is detected
    for box, polygon in zip(result.boxes, result.masks.xyn):
        class_id = int(box.cls)
        bbox = box.xywhn[0].tolist()  # normalized (center_x, center_y, width, height)
        flat_polygon = polygon.reshape(-1).tolist()  # normalized x1, y1, x2, y2, ...
        lines.append(' '.join(map(str, [class_id, *bbox, *flat_polygon])))

# Now write the lines to your .txt file
with open('prediction.txt', 'w') as f:
    f.write('\n'.join(lines))

This is just a starter; you might need to adjust the polygon handling depending on your exact needs. result.masks.xyn holds one array of normalized (x, y) points per detection, in the same order as result.boxes, so each polygon lines up with its box.

Don't hesitate to reach out again if you need more help! 🚀

utokyo-sm96 Mar 28, 2024

def find_polygons(mask, epsilon=0.02, min_area=0.02):

    if isinstance(mask, torch.Tensor):
        mask = mask.to(torch.device('cpu')).numpy()

    mask = np.uint8((mask > 0) * 255)

    # Validate shape and convert mask to grayscale if it's not already
    if mask.ndim > 2:
        mask = mask.squeeze()

    if mask.ndim != 2:
        raise ValueError(f"The mask dimension is {mask.ndim}, expected a 2D array.")

    # Find contours
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    print(f"Number of contours found: {len(contours)}")  # Log number of contours found
    polygons = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > min_area * mask.shape[0] * mask.shape[1]:
            # Approximate contour to polygon
            polygon = cv2.approxPolyDP(cnt, epsilon * cv2.arcLength(cnt, True), True)
            polygons.append(polygon)
    return polygons

        if len(det):
            masks = process_mask(proto[i], det[:, 6:], det[:, :4], im.shape[2:], upsample=True)  # HWC
            print(f"Masks Shape: {masks.shape}")  # Log shape of the masks
            print(f"Masks Type: {type(masks)}")  # Log type of the masks variable
            print(f"Sample Mask Value: {masks[0,0,0]}")  # Log a sample value from the masks

            # Rescale boxes from img_size to im0 size
            det[:, :4] = scale_coords(im.shape[2:], det[:, :4], im0.shape).round()

            # Print results
            for c in det[:, 5].unique():
                n = (det[:, 5] == c).sum()  # detections per class
                s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

            # Mask plotting ----------------------------------------------------------------------------------------
            mcolors = [colors(int(cls), True) for cls in det[:, 5]]
            im_masks = plot_masks(im[i], masks, mcolors)  # image with masks shape(imh,imw,3)
            annotator.im = scale_masks(im.shape[2:], im_masks, im0.shape)  # scale to original h, w
            # Mask plotting ----------------------------------------------------------------------------------------

            # Write results
            for j, (*xyxy, conf, cls) in enumerate(reversed(det[:, :6])):
                class_mask = masks[:, :, int(cls)]
                print(f"Class Mask Shape: {class_mask.shape}")  # Log shape of the class-specific mask
                print(f"Class Mask Type: {type(class_mask)}")  # Log type
                print(f"Sample Class Mask Value: {class_mask[0,0]}")  # Log a sample value

                print(f"Unique values in class mask: {torch.unique(class_mask)}")
                print(f"Number of non-zero values in class mask: {torch.count_nonzero(class_mask)}")
                polygons = find_polygons(class_mask)
                print(f"Found {len(polygons)} polygons")

                if save_txt:  # Write to file
                    with open(f'{txt_path}_{int(cls)}.txt', 'a') as f:
                        for poly in polygons:

                            poly = poly.astype(np.float32)
                            poly[:, :, 0] /= im0.shape[1]
                            poly[:, :, 1] /= im0.shape[0]

                            poly_flattened = poly.flatten().tolist()

                            line = [int(cls)] + poly_flattened

                            f.write(' '.join(map(lambda x: f'{x:.6f}', line)) + '\n')

I am using these scripts in an attempt to save the predictions (as txt) made by YOLOv7 with trained weights, but I am getting the following results:

Masks Shape: torch.Size([3, 640, 640])
Masks Type: <class 'torch.Tensor'>
Sample Mask Value: 0.0
Class Mask Shape: torch.Size([3, 640])
Class Mask Type: <class 'torch.Tensor'>
Sample Class Mask Value: 0.0
Unique values in class mask: tensor([0.], device='cuda:0')
Number of non-zero values in class mask: 0
Number of contours found: 0
Found 0 polygons
Class Mask Shape: torch.Size([3, 640])
Class Mask Type: <class 'torch.Tensor'>
Sample Class Mask Value: 0.0
Unique values in class mask: tensor([0.], device='cuda:0')
Number of non-zero values in class mask: 0
Number of contours found: 0
Found 0 polygons
Class Mask Shape: torch.Size([3, 640])
Class Mask Type: <class 'torch.Tensor'>
Sample Class Mask Value: 0.0
Unique values in class mask: tensor([0.], device='cuda:0')
Number of non-zero values in class mask: 0
Number of contours found: 0
Found 0 polygons
Masks Shape: torch.Size([2, 640, 640])
...
Unique values in class mask: tensor([0.], device='cuda:0')
Number of non-zero values in class mask: 0
Number of contours found: 0
Found 0 polygons

Can you help me find out where I am going wrong?
Thank you very much in advance

pderrenger Mar 28, 2024
Maintainer

Hey there! 👋 It looks like your script is running into an issue where it's not properly identifying the segmentation masks, as indicated by "Number of non-zero values in class mask: 0" and "Found 0 polygons." This typically happens if the predicted masks are not properly generated or if there's an issue in the conversion process from the mask tensor to the segmentation polygons.

A couple of things you might want to check:

Ensure that your model is correctly loaded and is the segmentation model (like yolov8n-seg.pt for YOLOv8).
Make sure that the segmentation masks are being generated correctly by your model during inference. The masks should have non-zero values which represent the segmented areas for each class.

If the masks are indeed being generated but not captured correctly, you might need to adjust the process of converting masks to polygons. Remember, masks are typically provided as binary arrays (or tensors), where you need to find contours and then approximate those contours to polygons.

Here's a simplified approach to debugging and trying to fix your issue:

Debug the Mask Generation: Confirm that your mask generation process works as expected by visualizing the masks directly. This would help you identify if the issue lies with the mask generation or the subsequent processing.
Modify the find_polygons Function: The find_polygons function you're using should ideally work, but if the masks are very sparse or the contours are not well defined, you might end up with no polygons. Adjusting the epsilon parameter for approximating the contour to polygon and the min_area can help capture more polygons.
Logging and Visualization: Increase the logging at critical steps to understand what's happening in your pipeline. For instance, after the cv2.findContours step, log the contours to see if they are detected correctly.

Also, for your specific case of wanting to write the segmentation polygons alongside bounding box details, ensure your find_polygons function is appropriately handling the mask data and try to visualize these steps where possible.

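Lastly, remember that each segmentation mask corresponds to one detected object, so ensure your processing loop correctly matches each mask with its corresponding detection. Your log points in this direction: masks has shape [3, 640, 640], which looks like one 640×640 mask per detection, yet masks[:, :, int(cls)] slices the last (width) axis by class ID and returns a [3, 640] tensor, that is, a single pixel column from every mask rather than a mask. Indexing the first axis by detection instead (for example masks[i], keeping in mind that enumerate(reversed(...)) counts from the last detection) should give you a full 2D mask to pass to find_polygons.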
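The shapes in the log (masks of [3, 640, 640], class mask of [3, 640]) can be reproduced on a small scale. The sketch below uses plain nested lists as a stand-in for the tensor, so the sizes, values, and helper name are made up for illustration; it shows what slicing the last axis returns and how to map the reversed loop counter back to a mask index.

```python
def shape(nested):
    """Shape of a rectangular nested list, e.g. (3, 4, 5)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Stand-in for a mask tensor of shape (N, H, W): 3 detections, each a 4x5 mask
N, H, W = 3, 4, 5
masks = [[[n + 1] * W for _ in range(H)] for n in range(N)]

cls = 0
column_slice = [[row[cls] for row in mask] for mask in masks]  # like masks[:, :, cls]
print(shape(column_slice))  # (3, 4): one column from every mask, not a mask

# enumerate(reversed(...)) counts from the last detection, so map j back to the mask index
detections = ["det0", "det1", "det2"]
for j, det in enumerate(reversed(detections)):
    i = len(detections) - 1 - j
    print(det, shape(masks[i]), masks[i][0][0])
```

Each printed mask has the full (4, 5) shape, and its fill value (3, 2, 1) shows that the index follows the reversed detection order.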

If after checking these aspects you still face issues, consider sharing a minimal but complete snippet of how you're processing the image through the model and attempting to save the results. It might offer more clues on where things might be going astray.

Hope this helps you to debug the situation! Keep experimenting, and don't hesitate to reach out for more guidance. 😊

Safiislamian · 2024-03-22T18:53:59Z

Safiislamian
Mar 22, 2024 — with giscus

I have an important question. I am going to train a YOLOv8 model for instance segmentation, but I need to save all the segmented masks to disk after training the model. How can I get them?
Actually, I want the masks arranged so that each mask is placed in a folder according to its class label, and I will then train a classification model on them.

1 reply

pderrenger Mar 23, 2024
Maintainer

Hey there! Saving segmented masks to disk with Ultralytics YOLO is doable, and arranging them by class labels adds a nice level of organization for further training or analysis. 🧐

Here's a quick guide to help you out:

After Training: Once you've trained your YOLOv8 model for instance segmentation (e.g., using yolov8n-seg.pt), you'll use the predict mode to run inference and generate masks.
Save Masks: In the prediction results, you have masks available for every detected object. You can save these directly to disk. Here's a simplified example:

from ultralytics import YOLO

# Load your trained segmentation model
model = YOLO('path/to/yolov8n-seg.pt')  

# Predict on an image, replace with your image path or URL
# save=True writes the annotated image (boxes and masks drawn) under the runs/ directory
results = model('path/to/image.jpg', save=True)

Organize by Class: To organize masks by their class labels, you might need to write a custom script that loops through the detection results, checks the class of each detected object, and saves the corresponding mask to a folder named after the class label.

Here's a very basic starter:

import os

import numpy as np
from PIL import Image

# Assuming 'results' from the prediction step (a list with one Results object per image)
for img_idx, result in enumerate(results):
    if result.masks is None:  # Nothing was detected in this image
        continue
    masks = result.masks.data.cpu().numpy()  # (N, H, W) array, one binary mask per object
    class_ids = result.boxes.cls.cpu().numpy().astype(int)  # Class ID for each object

    for obj_idx, (mask, class_id) in enumerate(zip(masks, class_ids)):
        class_name = result.names[int(class_id)]  # Class name

        # Create directory for the class if it doesn't exist
        os.makedirs(class_name, exist_ok=True)

        # Filename for the mask: image index plus object index keeps names unique
        mask_path = os.path.join(class_name, f"{img_idx}_{obj_idx}.png")

        # Save the binary mask as an 8-bit grayscale PNG
        Image.fromarray((mask * 255).astype(np.uint8)).save(mask_path)

Note: You'll need to adjust the looping and saving part based on your actual needs and mask data structure.

Remember, getting into the specifics might require tweaking according to your dataset, and the exact implementation might vary. If you're dealing with a custom dataset or more complex requirements, you might need to dive deeper into handling the output data.

Happy coding! 😊

utokyo-sm96 · 2024-03-25T02:44:10Z

utokyo-sm96
Mar 25, 2024 — with giscus

Query:
When I save the predicted labels for instance segmentation (using --save-txt), I am getting the bounding box labels. How can I save the predictions in the instance segmentation label format?

3 replies

pderrenger Mar 25, 2024
Maintainer

@utokyo-sm96 hello! When the loaded model is a segmentation model (e.g., yolov8n-seg.pt), --save-txt should write one line per object with the class index followed by the normalized polygon points, which is the instance segmentation label format. If you only see bounding box labels, first check that a -seg model is being loaded rather than a detection model. You can also access the segmentation mask data from the Results object in Python and save it manually according to your needs.

Here's a quick example on how you might go about this for saving segmentation masks as images, but you can adapt the logic to save in your preferred format:

from ultralytics import YOLO
import numpy as np
from PIL import Image

# Load a model
model = YOLO('yolov8n-seg.pt')  # Assuming you're using a segmentation model

# Predict with the model
results = model('path/to/image.jpg')

# Save masks
for i, mask in enumerate(results[0].masks):
    img = Image.fromarray((mask.numpy() * 255).astype(np.uint8))
    img.save(f'mask_{i}.png')

In this example, each object's mask is saved as a separate .png file. You might need to adjust this based on how you'd like to store or utilize the mask data.
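
If the label files need to be produced by hand (for example, to filter objects first), the YOLO segmentation label format is one line per object: the class index followed by normalized x y polygon coordinates. The sketch below shows only the formatting step. format_seg_labels is a hypothetical helper, and it assumes the class indices come from results[0].boxes.cls and the normalized polygons from results[0].masks.xyn.

```python
def format_seg_labels(class_ids, polygons):
    """Build YOLO segmentation label lines: 'class x1 y1 x2 y2 ...' with coordinates in [0, 1]."""
    lines = []
    for class_id, polygon in zip(class_ids, polygons):
        if len(polygon) < 3:
            continue  # fewer than three points cannot describe a region
        coords = " ".join(f"{float(x):.6f} {float(y):.6f}" for x, y in polygon)
        lines.append(f"{int(class_id)} {coords}")
    return lines


# Stand-in values; with a real prediction these would come from the Results object
example_classes = [0, 2]
example_polygons = [
    [(0.10, 0.20), (0.40, 0.20), (0.40, 0.60)],
    [(0.50, 0.50), (0.90, 0.50), (0.90, 0.90), (0.50, 0.90)],
]
label_lines = format_seg_labels(example_classes, example_polygons)
print("\n".join(label_lines))
```

Writing these lines to a .txt file named after the image gives one label file per image.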

Let us know if you need further assistance! 😊

mariamg03 Apr 2, 2024 — with giscus

Hi, I am trying to get each mask in a separate image and I used this code. Why am I getting this error?
img = Image.fromarray((mask.numpy() * 255).astype(np.uint8))
~~~~~~~~~~~~~^~~~~
TypeError: unsupported operand type(s) for *: 'Masks' and 'int'

pderrenger Apr 3, 2024
Maintainer

Hey there! It looks like you're encountering an error because you're trying to directly operate on the Masks object with a multiplication operation, which isn't supported directly in that manner. The Masks object needs to be converted to a NumPy array or a similar format where element-wise operations can be executed.

You're on the right track with wanting to convert each mask to an image format! However, you need to take the underlying tensor from the Masks object (its data attribute), convert it to a NumPy array, and then iterate over the individual masks. Here's how you can adjust your approach:

from ultralytics import YOLO
from PIL import Image
import numpy as np

# Assuming you're using a YOLOv8 segmentation model
model = YOLO('yolov8n-seg.pt')
results = model('path/to/your/image.jpg')

# masks.data is a tensor of shape (N, H, W); convert it to a NumPy array
masks = results[0].masks.data.cpu().numpy()

# Iterate over each detected object's mask
for i, mask in enumerate(masks):
    # Convert mask to an image
    mask_img = Image.fromarray((mask * 255).astype(np.uint8))
    # Save each mask as an image file
    mask_img.save(f'mask_{i}.png')

In this adjusted approach, results[0].masks.data.cpu().numpy() gives you access to the raw mask data as a NumPy array of shape (N, H, W), which you can then manipulate and save as images. Note that results[0].masks is None when nothing is detected.

Hope this helps! Let me know if you have further questions. 😄

Saspalk · 2024-03-27T12:50:58Z

Saspalk
Mar 27, 2024 — with giscus

Hello,
Can anyone help me on how to get the mIoU on each epoch end during train/val? This question has been asked before but the answer was only on F1-score. Thanks

3 replies

glenn-jocher Mar 27, 2024
Maintainer

Hello 😊! Our segmentation models (e.g., yolov8n-seg.pt) do not report a metric named mIoU (mean Intersection over Union). The closest built-in value is the mask mAP50-95, available as metrics.seg.map after a validation phase. It averages precision over IoU thresholds from 0.5 to 0.95 across all classes, rather than averaging the IoU itself. Here's a concise code snippet to illustrate how you might access it:

from ultralytics import YOLO

# Load your segmentation model
model = YOLO('yolov8n-seg.pt')  # using an official segmentation model for example

# Perform validation (after training or directly if the model is pre-trained)
metrics = model.val()  # Assumption: model and dataset are correctly configured

# Access the mask mAP50-95 (an IoU-based metric, not a literal mean IoU)
mask_map = metrics.seg.map
print(f'Mask mAP50-95: {mask_map}')

This value is computed at the end of each validation phase, which is typically run after each training epoch. Hope this helps! If you have further questions or need more assistance, feel free to ask. 🌟
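
For a mean IoU computed directly from mask pixels, a small helper outside the validator is one option. The following is a minimal NumPy sketch, not part of the Ultralytics API: the function names mask_iou and mean_iou are hypothetical, and it assumes that predicted and ground-truth masks have already been paired up and share the same shape.

```python
import numpy as np


def mask_iou(pred, target):
    """IoU of two binary masks with the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as a perfect match here (a convention)
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(pred_masks, target_masks):
    """Average IoU over already-matched (prediction, ground truth) mask pairs."""
    ious = [mask_iou(p, t) for p, t in zip(pred_masks, target_masks)]
    return float(np.mean(ious)) if ious else 0.0


# Two 4x4 masks that overlap in a 2x2 corner
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :] = True
target = np.zeros((4, 4), dtype=bool)
target[:, :2] = True
print(mask_iou(pred, target), mean_iou([pred, pred], [target, pred]))
```

How predictions are matched to ground-truth objects (for example, by best overlap within the same class) is a separate design decision that this sketch leaves open.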

handsome2456 Jun 29, 2024 — with giscus

Thank you very much for sharing, but I don't know how to add MIOU evaluation metrics in v8, if you have time, can you give a detailed guide? Thank you very much!

glenn-jocher Jun 29, 2024
Maintainer

@handsome2456 hello! 😊

YOLOv8 does not expose a metric called mIoU (mean Intersection over Union). The closest built-in value is the mask mAP50-95, which is IoU-based and available after each validation phase. Here's a detailed guide to help you get started:

Step-by-Step Guide

Load Your Model:
First, load your segmentation model. You can use a pre-trained model or a custom-trained model.
```
from ultralytics import YOLO

# Load a pre-trained segmentation model
model = YOLO("yolov8n-seg.pt")
```
Train Your Model:
Validation runs at the end of each epoch by default, so the validation metrics are refreshed every epoch.
```
results = model.train(data="coco8-seg.yaml", epochs=100, imgsz=640)
```
Validate Your Model:
After training, or if you are validating an existing model, you can read the mask mAP directly.
```
metrics = model.val()
mask_map = metrics.seg.map  # mask mAP50-95
print(f'Mask mAP50-95: {mask_map}')
```

Access the Metric During Training:
To print it at the end of each epoch, register a callback before calling model.train(). The trainer keeps the latest validation metrics in a dictionary.

def log_mask_map(trainer):
    # trainer.metrics holds the most recent validation metrics
    mask_map = trainer.metrics.get('metrics/mAP50-95(M)')
    print(f'Epoch {trainer.epoch + 1}/{trainer.epochs} - mask mAP50-95: {mask_map}')

model.add_callback('on_fit_epoch_end', log_mask_map)

Example Code

Here is a complete example that includes training and validation with the mask mAP metric:

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n-seg.pt")

# Train the model; validation runs at the end of each epoch
results = model.train(data="coco8-seg.yaml", epochs=100, imgsz=640)

# Validate the model
metrics = model.val()
mask_map = metrics.seg.map  # mask mAP50-95
print(f'Mask mAP50-95: {mask_map}')

For more detailed information on instance segmentation and other tasks, you can visit the Ultralytics documentation on segmentation.

Feel free to reach out if you have any more questions or need further assistance. Happy coding! 🚀

Koldim2001 · 2024-03-27T22:13:27Z

Koldim2001
Mar 27, 2024

Hi there!

I'd like to share with you a project I've recently worked on. Together with a colleague, we've created a repository that serves as a tool with a SAHI-like inference but specifically tailored for instance segmentation tasks.

Our repository allows for segmenting small objects in images by combining mask predictions from various overlapping patches. We support both YOLOv8-seg and FastSAM. Additionally, we have a variant for object detection tasks, and the key distinction from SAHI is the support for all the current models from the Ultralytics team: YOLOv9, YOLOv8, RT-DETR, and others.

I'm a huge fan of Ultralytics, so I'd be thrilled to assist you if you're interested in our project. I'm confident that for many people, the task of finding a large number of segments would be beneficial, especially when using standard Ultralytics models.

Here's the link to the project: YOLO-Patch-Based-Inference. Honestly, I wasn't sure where to share information about the project, so I decided to start with the YOLO-seg support section, as it seemed like the most obvious choice.

I would greatly appreciate your feedback on the potential usefulness of our project for you.

Thank you!

5 replies

pderrenger Mar 28, 2024
Maintainer

Hello there!

Wow, your project sounds absolutely fascinating! 🚀 The idea of tailoring SAHI-like inference for instance segmentation, and supporting an impressive range of Ultralytics models, including YOLOv8-seg and FastSAM, is a commendable effort. It's particularly exciting that you're extending support to the latest models like YOLOv9, RT-DETR, and so forth. Your repository is a brilliant testament to the passion and expertise in the community.

I'm thrilled to see your innovative use of our models, and I'm confident that your project could significantly benefit those needing refined segmentation, especially for small objects. Your approach to combining mask predictions from overlapping patches is a smart solution to a common challenge in instance segmentation.

I've taken a look at your YOLO-Patch-Based-Inference repo, and it’s clearly well-thought-out. The instance segmentation community will surely find tremendous value in the tools you've developed.

Sharing it in the YOLO-seg support section was a great choice! If there's anything more we can do to support your project or if you have further questions or insights, feel free to reach out. Feedback like yours is invaluable—it helps us continue to innovate and support projects that enhance the capabilities of Ultralytics models.

Once again, thank you for your enthusiasm for Ultralytics and for sharing your project with us. Keep up the fantastic work! 🌟

Koldim2001 Mar 28, 2024

Thank you very much for your feedback.
It would be very nice if you could help us promote and spread information about our solution. I know that your team is actively working on developing the documentation section. Recently, I saw that there are instructions for SAHI - https://docs.ultralytics.com/ru/guides/sahi-tiled-inference/. If you have a desire, I would be happy to help you do something similar for our project and highlight it in the ultralytics documentation.
Thank you very much for your help.

pderrenger Mar 28, 2024
Maintainer

Hello!

Thank you so much for reaching out and offering to collaborate on documenting your project for Ultralytics documentation. Your initiative aligns perfectly with our goal of enriching the community with valuable resources and tools. 😊

Given the potential impact and the innovative approach of your SAHI-like tool for instance segmentation, featuring support for Ultralytics models including YOLOv8-seg, FastSAM, and more, it sounds like something that would greatly interest and benefit our users. Documenting your solution in a manner similar to our SAHI guide could indeed foster wider adoption and assist others in their instance segmentation endeavors.

Let's connect to discuss how we can work together on creating this documentation. Please feel free to reach out to us via our repository's "Discussions" section or directly via email with a brief outline or proposal for the documentation content. This will help us kickstart the process and determine how best to feature your work within our documentation ecosystem.

Again, thank you for your enthusiasm and looking forward to collaborating with you! 🌟

Koldim2001 Mar 28, 2024

Thank you very much. I wrote a proposal in the "Discussions" section - https://github.com/orgs/ultralytics/discussions/9381
I look forward to collaborating 😊

pderrenger Mar 28, 2024
Maintainer

@Koldim2001 that's fantastic! 😊 Your proposal in the "Discussions" section here is a great step forward, and I appreciate your willingness to contribute and help enhance the Ultralytics documentation. Your project's unique approach to instance segmentation, especially for small objects and full support for our model range, sounds like a valuable resource for our users.

Collaborating on similar documentation to what we've done with SAHI would not only benefit our community but also raise awareness of your innovative solution. Let's definitely explore this further. Please expect a follow-up in the "Discussions" section as we discuss next steps and how we can integrate your project into our ecosystem effectively.

Thank you for reaching out and looking forward to working together! 🚀

ARAF23 · 2024-04-01T15:29:44Z

ARAF23
Apr 1, 2024 — with giscus

YOLOv8 has pre-trained models that were trained on the COCO dataset. However, I am working on an image segmentation project where the classes are not available in the COCO dataset. In this case, do I need to train a model from scratch?

About the dataset:
It will basically predict the damage levels on properties from satellite images, aerial images/videos. such as safe, major, minor, etc.
I have annotated the custom dataset, and I have it in YOLO V8 format.

Looking for an answer. Thank you!

3 replies

pderrenger Apr 1, 2024
Maintainer

Hey there! 😊 It's great that you've already annotated your custom dataset in YOLOv8 format, especially for such an impactful project involving property damage assessment from satellite and aerial imagery.

Given your custom classes are outside the COCO dataset, you're on the right track thinking about training a model specifically for your use case. The good news is, you don't need to start completely from scratch! You can leverage one of our pre-trained segmentation models like yolov8n-seg.pt as a starting point and fine-tune it on your custom annotated dataset. This approach usually leads to better performance as the model has already learned useful features from the extensive COCO dataset.

Here's a quick example of how you might kick off your training process:

from ultralytics import YOLO

# Initialize your model for training
model = YOLO('yolov8n-seg.pt')  # Load a pre-trained segmentation model

# Fine-tune the model on your custom dataset
results = model.train(data='your_custom_dataset.yaml', epochs=100, imgsz=640)

Ensure your dataset YAML file is correctly set up with paths to your training and validation data, classes, etc. This method will help your model learn the specific classes and damage levels you're interested in!
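
As a rough illustration of that YAML file, the sketch below writes a minimal dataset config from Python. The directory layout and file name are placeholders, and the class names simply reuse the damage levels mentioned in the question. The keys path, train, val, and names follow the Ultralytics dataset YAML convention.

```python
from pathlib import Path

# Placeholder values: adjust the root directory and class names to your dataset
dataset_root = "datasets/property-damage"
class_names = ["safe", "minor", "major"]

lines = [
    f"path: {dataset_root}",  # dataset root directory
    "train: images/train",  # training images, relative to 'path'
    "val: images/val",  # validation images, relative to 'path'
    "names:",
]
# One 'index: name' entry per class, in label-index order
lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
yaml_text = "\n".join(lines) + "\n"

Path("your_custom_dataset.yaml").write_text(yaml_text)
print(yaml_text)
```

The class indices written here must match the class indices used in the label files.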

Best of luck with your project, and don't hesitate to reach out if you have more questions. Happy detecting! 🚀

ARAF23 Apr 3, 2024

Thanks a lot again.

pderrenger Apr 3, 2024
Maintainer

Hey there! 👋 Since your custom dataset annotations are ready and in the proper YOLOv8 format, you're all set to start fine-tuning one of our pre-trained segmentation models, such as yolov8n-seg.pt, on your dataset. This will allow the model to learn from the COCO dataset's extensive knowledge and adapt it to your specific use case of property damage levels in aerial images. Here's a brief example to get you started with the training process:

from ultralytics import YOLO

# Load the pre-trained model
model = YOLO('yolov8n-seg.pt')

# Fine-tune the model on your custom dataset
results = model.train(data='your_custom_dataset.yaml', epochs=100, imgsz=640)

Make sure your your_custom_dataset.yaml is well defined, pointing to your training and validation data, along with the new class names you're interested in. This approach should save you a lot of time and compute resources instead of training from scratch. Wishing you the best with your project! If you have more queries down the road, feel free to ask. 🚀

rrrajat04 · 2024-04-07T07:14:09Z

rrrajat04
Apr 7, 2024 — with giscus

Is Panoptic Segmentation supported by Yolov8 or in any version of Yolo?

1 reply

pderrenger Apr 7, 2024
Maintainer

Hello!

At the moment, YOLOv8 specifically supports Instance Segmentation with models like yolov8n-seg, yolov8s-seg, etc. We have not officially introduced Panoptic Segmentation in the YOLO versions released by Ultralytics. Our instance segmentation models can identify and segment individual objects with masks, which is super useful for applications needing precise object outlines along with detections.

For panoptic segmentation, which combines both semantic segmentation (segmenting all pixels in an image) and instance segmentation (distinguishing between individual object instances), we're always exploring new features and capabilities to incorporate into our models. Your interest is definitely noted! 🚀

If you're specifically looking to work with panoptic segmentation, I'd encourage staying tuned to our updates. Also, feel free to contribute or suggest improvements; we welcome collaboration from the community! 🤝

Here's a quick example of how you can work with our segmentation models:

from ultralytics import YOLO

# Load a pretrained segment model
model = YOLO('yolov8n-seg.pt')

# Segment objects in an image
results = model('https://ultralytics.com/images/bus.jpg')

# Visualize results (the call returns a list with one Results object per image)
results[0].show()

Let us know if there's anything else we can help you with!

mayurkatre18 · 2024-04-10T08:55:20Z

mayurkatre18
Apr 10, 2024 — with giscus

I have trained a custom YOLOv8 model for instance segmentation on video. Now I need to run video inference with it; please guide me. I also have to deploy it in a mobile application developed in Flutter, so please give me guidance on using a YOLOv8 .tflite model for instance segmentation.

8 replies

glenn-jocher Jun 14, 2024
Maintainer

Hi @mayurkatre18,

Thank you for reaching out! It's fantastic to hear that you've successfully trained a custom YOLOv8 model for instance segmentation on video. Let's guide you through the process of making video inferences and deploying your model in a Flutter mobile application.

Video Inference

To perform video inference with your trained YOLOv8 model, you can use the following Python code:

from ultralytics import YOLO

# Load your trained model
model = YOLO('path/to/your/custom-model.pt')

# Predict on a video; stream=True returns a generator so frames are processed one at a time
results = model('path/to/your/video/file.mp4', stream=True)

# Iterate over the per-frame results
for result in results:
    # Process your results here
    result.show()  # Display the frame with predictions

This code will load your custom model and run predictions on each frame of the video. If you need to process real-time video streams, you can read frames from the stream and pass them to the model() in a loop.

Exporting to TensorFlow Lite (.tflite)

To deploy your model in a Flutter mobile application, you need to export it to TensorFlow Lite format. Here’s how you can do it:

from ultralytics import YOLO

# Load your trained model
model = YOLO('path/to/your/custom-model.pt')

# Export the model to TensorFlow Lite format
model.export(format='tflite')

This command will create a .tflite file that you can use in your Flutter application.

Integrating with Flutter

Once you have the .tflite model, you can integrate it into your Flutter application using the tflite plugin. Here’s a basic example to get you started:

Add the tflite plugin to your pubspec.yaml file:

dependencies:
  tflite: ^1.0.3

Load the model and run inference in your Flutter app:

import 'package:flutter/material.dart';
import 'package:tflite/tflite.dart';

void main() => runApp(MyApp());

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    // Use MyHomePage as the home screen so that its initState() loads the model
    return MaterialApp(
      home: MyHomePage(),
    );
  }
}

class MyHomePage extends StatefulWidget {
  @override
  _MyHomePageState createState() => _MyHomePageState();
}

class _MyHomePageState extends State<MyHomePage> {
  @override
  void initState() {
    super.initState();
    loadModel();
  }

  Future<void> loadModel() async {
    String res = await Tflite.loadModel(
      model: "assets/yolov8n-seg.tflite",
      labels: "assets/labels.txt",
    );
    print(res);
  }

  Future<void> runInference() async {
    var recognitions = await Tflite.detectObjectOnImage(
      path: "path/to/your/image.jpg",
      model: "YOLO",
      threshold: 0.5,
    );
    print(recognitions);
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: Text('YOLOv8 Instance Segmentation'),
      ),
      body: Center(
        child: Text('Running YOLOv8 Model'),
      ),
    );
  }
}

This example demonstrates how to load the TensorFlow Lite model and run inference on an image. You can extend this to process video frames as needed.

For more detailed instructions on exporting models and integrating them into mobile applications, please refer to the Ultralytics documentation.

If you encounter any issues or need further assistance, feel free to ask. Happy coding! 🚀

mayurkatre18 Jul 2, 2024

Hi,
For the .tflite model, I have to perform video inference in Python. I have tried but am not getting any inference results for the video. Please provide the best solution for this.

glenn-jocher Jul 2, 2024
Maintainer

Hi @mayurkatre18,

Thank you for reaching out! 😊

To perform video inference using a .tflite model in Python, you'll need to use TensorFlow Lite's Python interpreter. Below is a step-by-step guide to help you achieve this:

Step 1: Install TensorFlow Lite

First, ensure you have TensorFlow Lite installed:

pip install tflite-runtime

Step 2: Load and Run Inference on Video Frames

Here's a Python script to load your .tflite model and run inference on each frame of a video:

import cv2
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the TFLite model and allocate tensors
interpreter = tflite.Interpreter(model_path="path/to/your/model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Open the video file
cap = cv2.VideoCapture("path/to/your/video.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

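    # Preprocess the frame (assumes a float32 input of shape [1, height, width, 3])
    # OpenCV reads frames as BGR, while the exported model expects RGB
    height, width = input_details[0]['shape'][1:3]
    input_data = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_data = cv2.resize(input_data, (int(width), int(height)))
    input_data = np.expand_dims(input_data, axis=0)
    input_data = input_data.astype(np.float32) / 255.0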

    # Set the tensor to point to the input data to be inferred
    interpreter.set_tensor(input_details[0]['index'], input_data)

    # Run the inference
    interpreter.invoke()

    # Get the results
    output_data = interpreter.get_tensor(output_details[0]['index'])

    # Process the results (e.g., draw bounding boxes and masks on the frame)
    # This part will depend on your specific model's output format

    # Display the frame with predictions
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Step 3: Deploying in Flutter

To deploy your .tflite model in a Flutter application, follow these steps:

Add TensorFlow Lite Plugin: Add the tflite plugin to your pubspec.yaml file:
```
dependencies:
  tflite: ^1.1.2
```

Load the Model: Load the .tflite model in your Flutter application:

import 'package:tflite/tflite.dart';

void loadModel() async {
  String res = await Tflite.loadModel(
    model: "assets/your_model.tflite",
    labels: "assets/your_labels.txt",
  );
  print(res);
}

Run Inference: Use the loaded model to run inference on video frames:

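import 'dart:typed_data';

import 'package:tflite/tflite.dart';

// 'planes' holds the byte planes of one camera frame,
// e.g. img.planes.map((plane) => plane.bytes).toList() for a CameraImage
Future<void> runInferenceOnFrame(List<Uint8List> planes) async {
  var recognitions = await Tflite.runModelOnFrame(
    bytesList: planes, // required
    imageHeight: 640,
    imageWidth: 640,
    numResults: 5,
    threshold: 0.5,
  );
  print(recognitions);
}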

For more detailed instructions, you can refer to the TensorFlow Lite Flutter Plugin documentation.

If you encounter any issues or need further assistance, feel free to ask. Happy coding! 🚀

mayurkatre18 Oct 8, 2024

I have trained a YOLOv8 segmentation model in .tflite format. Now I have to integrate it into a mobile application written in Flutter. For this, I have to do all the configuration on a laptop first so that video can be processed and results returned, and after that we have to build an API to integrate this video processing into the mobile app. Please provide the necessary steps.

glenn-jocher Oct 8, 2024
Maintainer

To integrate your YOLOv8 .tflite model into a Flutter app, first ensure your video processing logic is set up in Python using TensorFlow Lite. Then, create an API to handle video processing requests. In Flutter, use the http package to communicate with the API and the tflite plugin for model inference. For detailed steps, refer to the TensorFlow Lite and Flutter documentation.

pietervhij · 2024-04-11T12:01:44Z

pietervhij
Apr 11, 2024 — with giscus

Hi, I have a question. I have trained a custom model based on a pre-trained one to segment the shirts of cyclists. The problem, however, is that it recognizes 2 slightly different shirts in 1 image. My idea was to only use the one with the highest confidence score, but this appears to be None (results.probs and results.keypoints are both None). Is it possible to get the confidence score, or do I have to apply a different method?

9 replies

pderrenger Apr 12, 2024
Maintainer

@pietervhij hey there! 👋 With only 100 training images, there's indeed a risk of overfitting when training over a large number of epochs like 100, especially for complex tasks like instance segmentation. Here are a couple of tips to mitigate that:

Data Augmentation: Significantly diversify your dataset without collecting more data. Ultralytics YOLO supports various augmentations directly in the training pipeline.
Early Stopping: Monitor your validation loss and stop training once it starts increasing, indicating overfitting.
Transfer Learning: You're already on the right path by using a pretrained model. Freezing the earlier layers, as you mentioned, could help, as these layers tend to capture universal features.
Regularization Techniques: Dropout or weight decay (the weight_decay parameter in your training config) can also help prevent overfitting.

Training on a smaller imgsz could also speed up experiments as you find the right balance. Always keep an eye on both training and validation metrics to best gauge when to stop training.
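
On the original question of keeping only the most confident detection: for detection and segmentation models the scores are stored in result.boxes.conf (result.probs is filled only by classification models and result.keypoints only by pose models). Below is a minimal sketch of the selection step, using a plain NumPy array as a stand-in for result.boxes.conf.cpu().numpy(). The helper name best_detection_index is hypothetical.

```python
import numpy as np


def best_detection_index(confidences):
    """Return the index of the highest-confidence detection, or None if there are none."""
    confidences = np.asarray(confidences, dtype=float)
    if confidences.size == 0:
        return None
    return int(np.argmax(confidences))


# Stand-in scores for two overlapping shirt detections
scores = np.array([0.62, 0.91])
best = best_detection_index(scores)
print(best)
```

The same index can then be used to pick the matching mask, for example result.masks.data[best].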

Keep innovating! 🚀

pietervhij Apr 12, 2024 — with giscus

Thank you for the response. I have a small follow-up question. You say Ultralytics supports augmentations directly in the training pipeline. What do you mean by this? Is there a parameter that enables this, or should you adjust your dataset beforehand?
I didn't find anything in the documentation regarding this, though it is always possible I missed it XD

pderrenger Apr 12, 2024
Maintainer

Hey there! 🤗 Great follow-up question. Ultralytics YOLO integrates data augmentations directly into the training pipeline, meaning you don't have to modify your dataset beforehand. These augmentations are applied on-the-fly during training to enhance the diversity of your dataset and improve model generalization.

When you train your model using the train mode, various augmentations like scaling, cropping, flipping, and color adjustments are automatically applied. This increases your effective dataset size and helps prevent overfitting, especially beneficial when working with a limited dataset.

No need to set a specific parameter for basic augmentations; they're handled for you by default! However, you can customize them by passing augmentation hyperparameters such as hsv_h, degrees, fliplr, or mosaic as arguments to model.train(), or by supplying them in a custom training configuration file (*.yaml).

If you've missed it, don't worry. It's a fairly common question, and the details are indeed more implicit within the documentation. Always feel free to ask for clarifications! 😄

Happy training!

pietervhij Apr 12, 2024 — with giscus

So if I understand correctly, when i train my model with following code, the ultralytics YOLO is already implementing augmentations?

from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt") # load a pretrained model (recommended for training)

freeze_layers = 20
for name, param in model.named_parameters():
    layer_index = name.split('.')[2]
    print(name)
    print(layer_index)
    if int(layer_index) < freeze_layers:
        param.requires_grad = False

results = model.train(data="shirt.yaml", epochs=50) # train the model
metrics = model.val() # evaluate model performance on the validation set

The yaml file looks as following:

train: "train/images"
val: "val/images"
test: "test/images"
nc: 2
names: ['back-cyclist', 'torso-cyclist']

roboflow:
  workspace: cyclists-number-recognition
  project: segmentation-teamshirt
  version: 1
  license: CC BY 4.0
  url: https://universe.roboflow.com/cyclists-number-recognition/segmentation-teamshirt/dataset/1

pderrenger Apr 13, 2024
Maintainer

Yes, you've got it right! 😊 When you train your model with the Ultralytics YOLO framework, data augmentations are indeed automatically applied during training. This helps enhance the diversity of your dataset, which is especially valuable for small datasets or to improve model generalization.

Here's a quick look at your training setup:

from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")  # Pretrained model loaded
# ... your freezing logic ...

results = model.train(data="shirt.yaml", epochs=50)  # Augmentations are applied here!

In this setup, augmentations such as random scaling, cropping, and flipping are employed. This is seamlessly handled by the Ultralytics framework, so there's no need for manual augmentation of your dataset before training. Your training configuration through shirt.yaml and the use of a pretrained model like yolov8n-seg.pt set you up nicely for leveraging transfer learning and on-the-fly augmentations. 🎉

Keep up the great work on your project! If you have more questions or need further assistance, feel free to ask.
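One caveat on the manual freezing loop in the question: as far as I know, the Ultralytics trainer manages requires_grad itself when training starts, so flags set by hand beforehand may be reset. The freeze training argument mentioned elsewhere in this thread (for example freeze=20) is the safer route. The sketch below only illustrates which parameter names a given freeze value would cover. The helper name and the "model.<index>." prefix convention are assumptions for illustration, not the library's own code.

```python
def frozen_parameter_names(param_names, freeze):
    """Return the parameter names that fall in the frozen layers.

    freeze is either an int N (first N layers) or a list of layer indices.
    The "model.<index>." prefix is an assumed naming convention.
    """
    layers = range(freeze) if isinstance(freeze, int) else freeze
    prefixes = [f"model.{i}." for i in layers]
    return [name for name in param_names if any(p in name for p in prefixes)]


# Hypothetical parameter names, shaped like those printed by the loop above
names = [
    "model.0.conv.weight",
    "model.9.cv1.conv.weight",
    "model.10.m.0.attn.weight",
    "model.23.cv2.0.0.conv.weight",
]
backbone_only = frozen_parameter_names(names, 10)
```

Printing the selected names before training is a cheap way to confirm that the layers you intend to freeze are the ones being matched.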

eladoni1 · 2025-02-23T14:30:18Z

eladoni1
Feb 23, 2025

Hey, Would like to ask a few questions I encountered in my tinkering with the segmentation task -

Would like to know what are, would you say, the most relevant parameters for segmentation task? (Example, overlap_mask, and mask_ratio.. parameters that must have reference, or has utmost importance)
Please explain the following -
a] The representation of data - each object is represented as CONTOUR - list of points, and each point has X and Y value, and their value is percentage (%), value between 0.0 - 1.0, and this value is relative to the image dimension.
b] explaining the param 'overlap_mask' - If 2 masks overlap (by that you mean 2 CONTOURs that overlap), then, what would we do?
is it - A) we make both CONTOURs the same class, and take the class with the bigger contour, B) we make both CONTOURs the same class, and take the class with the bigger confidence, C) take one of them, ignore the other CONTOUR completely (by confidence / size)
c] explaining the param 'mask_ratio' - Do you mean 'downscale by a factor' - you mean, if this value is 4, then we are changing the CONTOUR size to be CONTOUR.size / 4 ? or if not, how does the downscale works?

d] How should I freeze the pre-trained weights you learned, if I would like to keep the feature extraction (backbone) of your model?
is it relevant to use 'freeze' in segmentation? (I will be honest, I do not know if you even use feature extraction in segmentation..)
e] How did you train your segmentation model? with what parameters or what sequence of training ? If I would like to repeat your process of training the model from scratch to become 'yolo11n-seg.pt', what are the process that I will have to take?

Thanks a lot!

4 replies

glenn-jocher Feb 23, 2025
Maintainer

@eladoni1 here's a concise response to your segmentation questions:

1. Key Parameters: overlap_mask (handles overlapping masks) and mask_ratio (mask downsampling factor) are critical. For full parameter details see the Configuration page.

2a. Data Representation: Correct - masks are stored as normalized XY contours (0.0-1.0 relative to image size). See Dataset Guide.

2b. Overlap Handling: When overlap_mask=True, overlapping masks remain separate but are visually blended. Details in Predict Mode docs.

2c. Mask Ratio: mask_ratio=4 downsamples mask outputs to 1/4 original resolution (not contour coordinates). This reduces memory usage while maintaining positional accuracy.

2d. Freezing Backbone: Use model.train(freeze=[...]) with layer indices. Segmentation models do use backbone feature extraction. Example in Train Mode docs.

2e. Training Process: Our pretrained models use the commands shown in Train YOLO11n-seg on COCO8 with default hyperparameters. Full training configuration details are in Configuration docs.

For commercial use, remember all Ultralytics integrations require an Enterprise License unless fully open-sourced under AGPL-3.0.

eladoni1 Feb 24, 2025

Hi @glenn-jocher,
I now read a bit about YOLACT, Real-time Instance Segmentation, and the masks we are talking about, so I am a bit more informed than before :D

Let's just recap the things I understood from this -

Okay, so the only SPECIFIC parameters for segmentation are overlap_mask and mask_ratio. the rest (of the parameters) are shared with the other tasks.
2a. Alright.
2b. Still.. Did not understand what does 'remain separate but are visually blended' means.. Moreover, the Predict Mode doesn't have any reference. You might've meant the Train Mode docs
And in there, it's still REALLY ambiguous about what does this feature do -
"(If true) Determines whether object masks should be merged into a single mask for training, or kept separate for each object. In case of overlap, the smaller mask is overlaid on top of the larger mask during merge."

What do you mean by 'smaller mask overlaid on the top'? you mean that you combine the masks in the training?
Moreover, what if the two overlapping contours are different classes? what about if it's the same class? will it combine both into a single CONTOUR or will it just train on one of them?

Really want to understand what 'overlap_mask' does.

2c. After reading YOLACT, I understood that if I trained the model in 640x640, and put a factor of 4 in mask_ratio, then 640 / 4 = 160, and this is the size the mask should be - 160x160.

2d. Yea, but I would like to know the amount of the first layers related to the backbone of the model - is it 10 like the YAML says? or is there more hidden layers?

2e It doesn't give an idea about what were your parameters in your training of the model.

ANOTHER questions to top it off with -
3a. In the prototype masks we get in output1 of the segmentation model - what does 32 symbolize? why is there 32 masks?
tell me the reason behind it, even if it's just 'one of the output layer of the feature map has a Z axis of 32'
3b. What is the value range of the prototype masks in output1 (160x160 values - what is their range in floating point? is it 0 to 1 or beyond?), and what are the range of the coefficients in output0 (same question - 0 to 1 or beyond that?)

Thanks

eladoni1 Feb 24, 2025

And, would like to ask another question - Why do I get prototype masks that are not between 0.0 to 1.0, and why do I get coefficients that are not either between 0.0 to 1.0?
They are all just random values without specific range.

glenn-jocher Feb 24, 2025
Maintainer

Here's a concise response to your segmentation questions:

2b: During training with overlap_mask=True, overlapping masks are merged with smaller masks overlaid. This applies regardless of class, preserving all instances but prioritizing smaller ones visually. For inference, overlapping masks remain separate but are blended in visualizations. See Train Mode docs for implementation details.
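To make "merged with smaller masks overlaid" concrete, here is a minimal NumPy sketch of one way to merge instance masks into a single index map. It illustrates the behaviour described above and is not the library's implementation. The function name and the label numbering (0 for background, 1..N by decreasing area) are assumptions.

```python
import numpy as np


def merge_instance_masks(masks):
    """Merge boolean instance masks (N, H, W) into one index map (H, W).

    Larger instances are painted first, so where two instances overlap
    the smaller one ends up on top. Class labels play no role here.
    """
    areas = masks.sum(axis=(1, 2))
    order = np.argsort(-areas, kind="stable")  # largest area first
    merged = np.zeros(masks.shape[1:], dtype=np.int32)
    for label, idx in enumerate(order, start=1):
        merged[masks[idx]] = label  # later (smaller) masks overwrite earlier ones
    return merged, order


big = np.ones((4, 4), dtype=bool)
small = np.zeros((4, 4), dtype=bool)
small[1:3, 1:3] = True
merged, order = merge_instance_masks(np.stack([small, big]))
```

Both instances stay in the training target. The only information lost is the part of the larger instance hidden under the smaller one.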

2d: The backbone typically comprises the first 10 layers in YOLO11 architectures. You can verify this in the model YAML configuration file and freeze those layers with freeze=10 (or, equivalently, an explicit index list such as freeze=list(range(10))).

2e: Our pretrained models use the default training configuration shown in the Train YOLO11n-seg example, with COCO pretraining followed by custom dataset fine-tuning.

3a: The 32 prototype masks represent learned feature basis vectors that combine through the mask coefficients to produce final instance masks, as described in the YOLO11 architecture.

3b: Prototype masks and coefficients use unbounded values since they're intermediate network outputs. Final masks are normalized via sigmoid activation (0-1 range) during post-processing.
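The relationship between prototypes, coefficients and final masks in 3a and 3b can be sketched in NumPy, following the YOLACT-style linear combination discussed in this thread. The shapes (32 prototypes at 160x160, i.e. 640 / 4) come from the posts above. The function name, the 0.5 threshold and the random inputs are assumptions, and the usual crop-to-box and upsampling steps are left out.

```python
import numpy as np


def assemble_masks(coeffs, protos, threshold=0.5):
    """Combine prototypes (P, H, W) with per-instance coefficients (N, P).

    Both inputs are unbounded; only the sigmoid output lies in (0, 1).
    """
    n_protos, h, w = protos.shape
    logits = coeffs @ protos.reshape(n_protos, -1)  # (N, H*W), unbounded
    probs = 1.0 / (1.0 + np.exp(-logits))           # squashed into (0, 1)
    return (probs > threshold).reshape(-1, h, w)


rng = np.random.default_rng(0)
protos = rng.normal(size=(32, 160, 160))  # stand-in for the prototype output
coeffs = rng.normal(size=(3, 32))         # stand-in for 3 detections' coefficients
masks = assemble_masks(coeffs, protos)
```

This is why the raw values look "random": each prototype is just one basis image, and only the weighted sum per instance is meaningful as a mask.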

For commercial implementations, remember all integrations require an Enterprise License unless fully open-sourced under AGPL-3.0.

cbasquitt · 2025-03-06T21:15:47Z

cbasquitt
Mar 6, 2025 — with giscus

Hi, I have a question. I would like to calculate the Dice score using our predicted masks and ground truth masks while working with the YOLO11n-seg model. Could you help me with this? Thank you very much in advance!

1 reply

glenn-jocher Mar 7, 2025
Maintainer

@cbasquitt to calculate the Dice score with YOLO11n-seg masks, you'll need to manually compare your predicted binary masks against ground truth masks. After prediction, access the mask tensor with results.masks.data (shape: [num_objects, H, W]), convert both masks to boolean format, then compute:

def dice_score(pred, gt):
    intersection = (pred & gt).sum()
    return (2.0 * intersection) / (pred.sum() + gt.sum())

For full implementation details, see our Segmentation Predict documentation on accessing mask outputs.
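As a worked example of the steps above, the sketch below thresholds per-instance masks, merges them into a single foreground mask and computes Dice against a ground-truth mask. It uses toy NumPy arrays in place of results.masks.data. The helper names, the 0.5 threshold and the choice to return 1.0 when both masks are empty are assumptions. In practice the predicted masks may also need resizing to the ground-truth resolution first.

```python
import numpy as np


def foreground(instance_masks, threshold=0.5):
    """Collapse per-instance masks (N, H, W) into one boolean mask (H, W)."""
    return (instance_masks > threshold).any(axis=0)


def dice_score(pred, gt):
    """Dice coefficient of two same-shaped masks; 1.0 if both are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    total = pred.sum() + gt.sum()
    if total == 0:
        return 1.0  # convention chosen here to avoid dividing by zero
    return 2.0 * (pred & gt).sum() / total


# Toy data: two predicted instances of 2 pixels each, ground truth of 3 pixels
pred_instances = np.zeros((2, 4, 4), dtype=np.float32)
pred_instances[0, 0, :2] = 1.0
pred_instances[1, 1, :2] = 1.0
gt = np.zeros((4, 4), dtype=bool)
gt[0, :3] = True

score = dice_score(foreground(pred_instances), gt)  # 2 * 2 / (4 + 3)
```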

LwpZoe · 2025-03-07T08:13:44Z

LwpZoe
Mar 7, 2025 — with giscus

Question 1: The resolution of my dataset is 1280x720 and the target is in the middle of the image, if I set imgsz=640 during training, is it taking 640x640 from the middle of the image? Or is there another way to do it?
Question 2: How do I set the imgsz during training or is it better to change the resolution of my dataset to 640x640 in advance?

1 reply

glenn-jocher Mar 7, 2025
Maintainer

@LwpZoe for Q1: The model automatically resizes images to imgsz while maintaining aspect ratio, adding padding as needed. It doesn't center-crop by default. For Q2: We recommend using native imgsz parameter during training rather than preprocessing, as shown in the Segmentation Training section. The resizing/padding is handled automatically during preprocessing. For custom requirements, you could pre-crop images, but it's not typically necessary.
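For the 1280x720 case in the question, the aspect-preserving resize plus padding works out as in the sketch below. This is only the arithmetic of a letterbox to a square imgsz. The function name is hypothetical, and the exact rounding and padding split used by the library, or by training-time augmentations, may differ.

```python
def letterbox_geometry(width, height, imgsz=640):
    """Return the resized (w, h) and the per-side (pad_x, pad_y) padding."""
    scale = min(imgsz / width, imgsz / height)  # fit the longer side to imgsz
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = (imgsz - new_w) / 2, (imgsz - new_h) / 2
    return (new_w, new_h), (pad_x, pad_y)


size, pad = letterbox_geometry(1280, 720, imgsz=640)
# size is (640, 360): the whole frame is kept, not a 640x640 centre crop
# pad is (0.0, 140.0): 140 px of padding above and below
```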

dembski21 · 2025-04-03T19:45:38Z

dembski21
Apr 3, 2025 — with giscus

i have a problem with my meshes. i have images which are about 2000*6000 pixels in size. i have trained a mesh with 1000 images with a resolution of 1920p. however, i don't get everything segmented. every now and then it just cuts it off. and as soon as i lower the resolution during inference it recognises more, but not as sharp as before. what could be the reason for this?

5 replies

glenn-jocher Apr 4, 2025
Maintainer

@dembski21 thanks for your question! This behavior often relates to input resolution mismatches between training and inference. Since YOLO11 models process fixed-size inputs (like 640x640), high-resolution images are downsampled, which can cause missed detections if objects become too small. Lower resolutions increase receptive field but reduce mask precision.

Two solutions to try:

Match resolutions: Use imgsz=1920 during both training and inference to maintain consistency
Tile inference: Split large images into overlapping tiles using the Tiled Inference approach to preserve detail while processing full images

For optimal results, consider training on higher-resolution data with mosaic augmentation enabled. Let us know if you need more specific guidance! 🚀

dembski21 Apr 4, 2025 — with giscus

how can i perform the tile inference?
and which training setting do you recommend? I only have one class and it involves welds in microscope view.

glenn-jocher Apr 4, 2025
Maintainer

For tile inference with your high-resolution microscope images, use the tiles argument in predict mode. Example for CLI:
yolo segment predict model=best.pt source='path/to/images' imgsz=1920 tiles=2
or in Python:
results = model.predict('path/to/images', imgsz=1920, tiles=2)

For training welds (single class), we recommend:
yolo segment train data=your_data.yaml model=yolo11n-seg.pt imgsz=1024 epochs=300 lr0=0.01 augment=True
Focus on high-resolution training (imgsz=1024+) with mosaic augmentation enabled. See Tiled Inference and Training Configuration for details.
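Independently of any built-in option, tiling can also be done by hand: cut the image into overlapping windows, run prediction on each crop, and shift the results back by the window offset. The sketch below covers only the window computation. The function name and the default tile and overlap sizes are assumptions, and merging duplicate detections across tile borders is left out.

```python
def tile_windows(height, width, tile=1024, overlap=128):
    """Return (y0, x0, y1, x1) crop windows that cover the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final tile sits flush with the far edge
        return positions

    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in starts(height)
        for x in starts(width)
    ]


# Windows for an image of roughly the size mentioned above (2000 x 6000 pixels)
windows = tile_windows(2000, 6000)
```

Each crop image[y0:y1, x0:x1] can then be passed to the model on its own, with any returned boxes or mask coordinates offset by (x0, y0).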

dembski21 Apr 7, 2025

thanks!
but there is no argument "tiles", i try "tiles=2" but still not works

dembski21 Apr 25, 2025 — with giscus

?

mw-boop · 2025-04-08T11:32:15Z

mw-boop
Apr 8, 2025 — with giscus

Hi there!
I'm using the yolo seg 11 model. It's an object detection + segmentation task, where there is just one type of object to detect. To be precise; each image has 1 object of always the same class. How can I make sure that YOLO only predicts one class and one object per frame?

1 reply

glenn-jocher Apr 8, 2025
Maintainer

@mw-boop Hello!

To ensure your YOLO11 segmentation model predicts only one object per frame, you can use the max_det=1 argument during prediction. This limits the maximum number of detections (and corresponding segmentations) to one per image.

Here's how you can apply it:

CLI:

yolo segment predict model=your_model.pt source=your_image.jpg max_det=1

Python:

from ultralytics import YOLO

# Load your trained segmentation model
model = YOLO('your_model.pt')

# Predict with max_det=1
results = model('your_image.jpg', max_det=1)

# Process results
# ...

This approach leverages the model's confidence scores, typically selecting the object with the highest confidence if multiple potential detections exist, but ultimately capping the output at one. You can find more details on prediction arguments in the Predict Mode documentation.

shaform · 2025-04-14T15:54:28Z

shaform
Apr 14, 2025

For the Python API, is there a way to disable the segmentation output to save time but just use the bounding boxes?

2 replies

cyberpunk-edgerunner May 19, 2025 — with giscus

set masks=False while predicting

glenn-jocher May 20, 2025
Maintainer

@shaform yes, you can disable the segmentation masks while keeping the bounding box predictions by setting masks=False when running inference. This will save computation time by skipping the mask processing steps:

from ultralytics import YOLO

# Load a segmentation model
model = YOLO('yolov8n-seg.pt')

# Run prediction with masks disabled
results = model('path/to/image.jpg', masks=False)

# Access only the bounding boxes
boxes = results[0].boxes

This approach is particularly useful when you only need object locations but not their exact shapes, resulting in faster inference speeds. For more details on available prediction arguments, check out the Predict Mode documentation.

cyberpunk-edgerunner · 2025-05-21T06:16:04Z

cyberpunk-edgerunner
May 21, 2025 — with giscus

Just to confirm, while auto annotating, instead of using SAM one can also use one's custom build model. right ??

1 reply

rugvedjalit Dec 2, 2025 — with giscus

Yes, you can annotate using custom build model but please check the annotation class id after annotation.

esssyjr · 2025-07-23T10:49:30Z

esssyjr
Jul 23, 2025 — with giscus

Good day all,

I am working on a project with instance segmentation, I want to get the area of the segmented mask in cm2 and the depth. What’s the best approach for this?

Accuracy really matters here.

Thank you

0 replies

ratom · 2025-07-27T07:31:53Z

ratom
Jul 27, 2025 — with giscus

Can I use YOLOv9 and YOLOv12 segmentation using:
model = YOLO("yolo12s-seg.pt")
model = YOLO("yolo9s-seg.pt")

I have tried using this for one of my semantic segmentation task. I am also not sure about the evaluation metrics on training the segmentation model. I have seen mAP for segmentation as well. I want to use Precision,
Recall
Dice
Iou
Auc
Mcc

0 replies

mobin3399 · 2025-10-27T08:28:53Z

mobin3399
Oct 27, 2025 — with giscus

Hello Ultralytics team,

I would appreciate your clarification on the loss functions used in the following segmentation models:

YOLOv8-seg, YOLOv9-seg and YOLO11-seg

Specifically, could you please specify which types of losses are applied to the bounding box, classification, and segmentation (mask) branches in each version?

Thank you very much for your time and support.

1 reply

glenn-jocher Dec 4, 2025
Maintainer

@cyberpunk-edgerunner yes, auto_annotate can use any detection model you pass as det_model, including your own best.pt; just be sure its class order/IDs match the dataset you’re creating, for example:

from ultralytics.data.annotator import auto_annotate

auto_annotate(data="path/to/images", det_model="path/to/best.pt", sam_model="sam_b.pt")

@esssyjr once you have a binary mask, the area in cm² is area_px * (px_size_cm**2), where px_size_cm comes from calibration (e.g., known ruler length in the image); depth however cannot be recovered accurately from YOLO masks alone and typically requires a depth sensor, stereo/multi‑view setup, or a dedicated monocular‑depth model plus camera intrinsics.
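The area formula above can be written out as a small NumPy helper. The function and argument names are hypothetical. It assumes the object lies in the same plane as the reference ruler and that the camera looks roughly perpendicular to that plane, otherwise the pixel size is not constant across the mask.

```python
import numpy as np


def mask_area_cm2(mask, ruler_px, ruler_cm):
    """Area of a binary mask in cm^2, calibrated from a ruler of known length."""
    px_size_cm = ruler_cm / ruler_px        # centimetres per pixel
    area_px = int(np.count_nonzero(mask))   # number of mask pixels
    return area_px * px_size_cm ** 2


mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 5:15] = True  # 100 mask pixels
area = mask_area_cm2(mask, ruler_px=50, ruler_cm=5.0)  # 0.1 cm per pixel
```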

@ratom you can load YOLOv9/YOLO12 segmentation models exactly as you wrote, as long as you have the corresponding yolo9s-seg.pt / yolo12s-seg.pt weights available; built‑in val currently reports precision, recall and mAP for boxes and masks, so for Dice, IoU, AUC, MCC you’ll need to run model.predict() over your validation set, save masks/labels, and compute those metrics yourself (e.g., via NumPy or a metrics library).
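For the do-it-yourself metrics, a minimal NumPy sketch from pixel-level confusion counts is shown below. The function name and the choice to return 0.0 when a denominator is zero are assumptions. AUC is left out because it needs per-pixel scores rather than binary masks.

```python
import numpy as np


def binary_mask_metrics(pred, gt):
    """Pixel-wise precision, recall, IoU, Dice and MCC for two binary masks."""
    pred = np.asarray(pred, dtype=bool).ravel()
    gt = np.asarray(gt, dtype=bool).ravel()
    tp = int((pred & gt).sum())
    fp = int((pred & ~gt).sum())
    fn = int((~pred & gt).sum())
    tn = int((~pred & ~gt).sum())

    def ratio(num, den):
        return num / den if den else 0.0  # 0.0 when the metric is undefined

    mcc_den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


metrics = binary_mask_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```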

@mobin3399 across YOLOv8‑seg, YOLOv9‑seg and YOLO11‑seg the structure is similar: the box branch uses an IoU‑style regression loss combined with a distribution‑based loss on box offsets, the classification branch uses binary cross‑entropy with logits, and the mask branch uses a binary cross‑entropy loss between predicted prototype‑based masks and target masks; YOLOv9 mainly adds the PGI training scheme, while YOLO11 adjusts architecture and loss weighting, so for exact formulas in your version the most reliable reference is the loss module in your installed ultralytics package.

arojaspa76 · 2026-01-15T15:52:19Z

arojaspa76
Jan 15, 2026 — with giscus

Dear All, I'm using Yolo8n to classify only persons. I'm using an NVIDIA Jetson Orin Nano for inference and a Raspberry Pi for alarm activation. My question about segmentation is: Can I use segmentation to identify a person when they are more than 20 m from the camera where the event happened? If yes, could you guide me on how to do that? At this distance, I can't yet detect the person. The event must be detected at night, not in daylight, using night-vision cameras.

0 replies

RishiKumarSoni · 2026-01-16T16:38:05Z

RishiKumarSoni
Jan 16, 2026

zoom in


1 reply

glenn-jocher Jan 17, 2026
Maintainer

If the person isn’t detectable at ~20 m, switching to segmentation won’t fix it (seg models still need enough pixels/features to localize the person first), so the real lever is exactly what you noted: zoom/narrower FOV (longer focal length) and/or higher input resolution plus fine-tuning on your night-vision data; once detection is solid you can move to masks via the instance segmentation docs (or add tracking via the instance segmentation + tracking guide).

yolo detect predict model=yolo26s.pt source=your_video.mp4 imgsz=1280 classes=0
yolo segment predict model=yolo26n-seg.pt source=your_video.mp4 imgsz=1280 classes=0

KRA96 · 2026-01-30T10:02:18Z

KRA96
Jan 30, 2026 — with giscus

Hello, I am trying to train on a pre-trained yolo segmentation model. I'm using yolo26n-seg.pt and a dataset from Roboflow. I keep getting a no labels found warning. I have verified that the folder structure is as it needs to be - images > train, val and labels > train, val. I've verified that file stem names are the same in image and label folders, and I've visually verified that the labels are polygons outlining book spines. The output I'm getting includes:

duplicate label removed for 3 train images
2 removed for val images
Plotting labels to ../run/segment/train8/labels.jpg... followed by: zero-size array to reduction operation maximum which has no identity
and I get this warning: "WARNING ⚠️ no labels found in segment set, cannot compute metrics without labels" twice, once for training set once for validation set.

Could you please help me debug this?

2 replies

glenn-jocher Jan 30, 2026
Maintainer

That warning typically means the loader is discarding all your mask labels as invalid/empty (common with Roboflow exports if polygon coords are not normalized to [0,1], the row has an odd token count, or class ids fall outside 0..nc-1), which then also triggers the zero-size array... error during label plotting—please paste (as text) one full line from a labels/train/*.txt file and your data.yaml, and run this and share the output:

yolo checks

After you fix the label format, delete any cached label files and rerun:

find path/to/dataset/labels -name "*.cache" -delete

For reference, each segmentation row must be <class-index> <x1> <y1> <x2> <y2> ... with coordinates normalized and at least 3 (x,y) points—see the Segmentation dataset format and the end-to-end training example in the Instance Segmentation docs.
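A quick way to check label files against the rules just listed is a small validator like the sketch below. It mirrors only the conditions named in this reply (class range, even token count, at least 3 points, normalized coordinates). It is not the loader's actual validation, and the function name is hypothetical.

```python
def check_segment_row(line, nc):
    """Return a list of problems found in one segmentation label row."""
    tokens = line.split()
    if not tokens:
        return ["empty row"]
    try:
        cls = int(tokens[0])
        coords = [float(t) for t in tokens[1:]]
    except ValueError:
        return ["non-numeric value in row"]

    problems = []
    if not 0 <= cls < nc:
        problems.append("class index outside 0..nc-1")
    if len(coords) % 2:
        problems.append("odd number of coordinate values")
    if len(coords) < 6:
        problems.append("fewer than 3 (x, y) points")
    if any(c < 0.0 or c > 1.0 for c in coords):
        problems.append("coordinates not normalized to [0, 1]")
    return problems


ok = check_segment_row("0 0.1 0.1 0.5 0.1 0.5 0.5", nc=1)
bad = check_segment_row("3 10 20 300 40 50", nc=1)
```

Running it over every line of every file in labels/train and labels/val shows whether any rows break these rules, which narrows down whether the labels or the dataset configuration are at fault.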

KRA96 Feb 3, 2026 — with giscus

Hello Glenn, thank you for the response. Here's a full line from one txt file:
0 0.072265625 0.40604375 0.0703125 0.4032140625 0.0703125 0.25041718749999997 0.072265625 0.2475875 0.0693359375 0.24334375000000003 0.0126953125 0.24334375000000003 0.00390625 0.25041718749999997 0.0078125 0.8785828125 0.009765625 0.8814125 0.009765625 0.9040484375 0.068359375 0.9040484375 0.076171875 0.8983890625000001 0.07421875 0.6239203125 0.072265625 0.621090625 0.072265625 0.40604375

Here's my data.yaml file:
path: /path/to/book_spines_dataset_yolo11/
train: images/train/
val: images/val/
test: images/test/

# Directories
train_label_dir: labels/train/
val_label_dir: labels/val/

names:
  0: 'book'
And here's the output from the yolo checks run:
Ultralytics 8.4.4 🚀 Python-3.12.9 torch-2.9.1 CPU (Apple M2)
Setup complete ✅ (8 CPUs, 8.0 GB RAM, 212.8/228.3 GB disk)

OS macOS-15.7.3-arm64-arm-64bit
Environment Darwin
Python 3.12.9
Install pip
Path /path/to/.pyenv/versions/3.12.9/envs/the_book_thrift/lib/python3.12/site-packages/ultralytics
RAM 8.00 GB
Disk 212.8/228.3 GB
CPU Apple M2
CPU count 8
GPU None
GPU count None
CUDA None

numpy ✅ 2.2.6>=1.23.0
matplotlib ✅ 3.10.7>=3.3.0
opencv-python ✅ 4.12.0.88>=4.6.0
pillow ✅ 12.0.0>=7.1.2
pyyaml ✅ 6.0.3>=5.3.1
requests ✅ 2.32.5>=2.23.0
scipy ✅ 1.16.3>=1.4.1
torch ✅ 2.9.1>=1.8.0
torch ✅ 2.9.1!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision ✅ 0.24.1>=0.9.0
psutil ✅ 7.1.3>=5.8.0
polars ✅ 1.35.2>=0.20.0
ultralytics-thop ✅ 2.0.18>=2.0.18

Thank you

RishiKumarSoni · 2026-01-31T11:13:30Z

RishiKumarSoni
Jan 31, 2026

To all: may I ask a general question? What is the effect of reducing the image size on classification accuracy? Suppose my input images are 16:9 (or 4:3) aspect-ratio images taken with a mobile camera (shape is either 2000x4000 or 3000x4000). I want to build a classification model and understand how input image resolution affects model accuracy. What will happen if I set imgsz to:
1. imgsz = 640
2. imgsz = 1280
3. imgsz = 320 (for smaller-VRAM GPUs)
Which one should I prefer? If any of you have already tested this, please share the logic behind how you resize images. I don't just want the appropriate value for imgsz; I want the whole rationale behind it. Thank you in advance for reading this.


0 replies

miracle101000 · 2026-03-14T02:37:51Z

miracle101000
Mar 14, 2026 — with giscus

hello if I wanted to use your segmentation model for inference on a live camera preview, to only detect a single item at a time, how would I go about it? thank you.

3 replies

glenn-jocher Mar 14, 2026
Maintainer

Yes — use a -seg model on your webcam feed, set classes=[...] if you only want one class, and max_det=1 if you want at most one instance per frame; if you share your use case here, the Ultralytics team can help refine it, and the Segment task docs cover the mask outputs you’ll use.

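from ultralytics import YOLO

model = YOLO("yolo26n-seg.pt")
results = model.predict(source=0, stream=True, classes=[0], max_det=1)
for result in results:  # stream=True returns a generator, so frames are processed as you iterate
    masks = result.masks  # None when nothing is detected in the frame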

miracle101000 Mar 14, 2026 — with giscus

Ok thx. I am actually trying to implement edge detection on android devices for rectangular objects. I have tried OpenCV but it's not enough. So I was thinking to use a segmentation model for my rectangular processor (overlay would only show on rectangular objects) but any time I try to implement this on android, I get serious lagging issues on the camera preview. Plus honestly new to this AI stuff.

glenn-jocher Mar 14, 2026
Maintainer

If you only need a rectangle-style overlay, a segmentation model is usually heavier than necessary on Android; I’d start with a small OBB or detect model instead, export it with Export mode, and keep imgsz low or run inference every few frames to reduce preview lag. If you share your device, export format, and current FPS here, the Ultralytics team can suggest the lightest setup.


Replies: 101 comments · 233 replies
