feat: support LongCat-Image-Edit on cuda device. #957

Open
Dragonliu2018 wants to merge 1 commit into jd-opensource:main from Dragonliu2018:lzl/feat/support_longcat_image_edit_on_cuda

@Dragonliu2018
Contributor

This PR adds support for LongCat-Image-Edit on CUDA devices.

The test program and the generated image are as follows.

import requests
import json
import base64
from PIL import Image
from io import BytesIO

# Image generation endpoint of the locally running server
url = "http://localhost:9977/v1/image/generation"

img = Image.open("cat.png").convert("RGB")
buf = BytesIO()
img.save(buf, format="PNG")        # PNG is compatible with the server-side OpenCV decoding
img_bytes = buf.getvalue()
image_base64 = base64.b64encode(img_bytes).decode("utf-8")

prompt = "将猫变成狗"  # "Turn the cat into a dog"
request_data = {
    "model": "LongCat-Image-Edit",
    "input": {
        "prompt": prompt,
        "negative_prompt": "",
        "image": image_base64
    },
    "parameters": {
        "guidance_scale": 1,
        "num_inference_steps": 8,
        "num_images_per_prompt": 1,
        "seed": 43
    }
}

print("Testing LongCat-Image-Edit model...")
print(f"Request URL: {url}")
print(f"Request data: {json.dumps(request_data, indent=2, ensure_ascii=False)}")

response = requests.post(url, json=request_data)
if response.status_code != 200:
    print(f"Error: {response.status_code}")
    print(f"Response: {response.text}")
else:
    try:
        result = json.loads(response.text)
        print("Success! Response:")
        print(json.dumps(result, indent=2, ensure_ascii=False))
        
        # Handle image response
        if "output" in result and "results" in result["output"]:
            for i, image_data in enumerate(result["output"]["results"]):
                if "image" in image_data:
                    # Decode base64 image
                    image_bytes = base64.b64decode(image_data["image"])
                    image = Image.open(BytesIO(image_bytes))
                    
                    # Save image
                    filename = f"edited_image_{i+1}.png"
                    image.save(filename)
                    print(f"\nGenerated image saved as: {filename}")
                    print(f"Image size: {image_data.get('width', 'unknown')}x{image_data.get('height', 'unknown')}")
                    print(f"Seed: {image_data.get('seed', 'unknown')}")
    except json.JSONDecodeError as e:
        print(f"Failed to parse JSON response: {e}")
        print(f"Raw response: {response.text}")

Input image:
cat

Output image:

edited_image_1
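
The response handling in the test script can be factored into a small pure function, which makes it checkable without a running server. The following is a minimal sketch, assuming the response layout that the script reads (`output.results[*].image` holding base64 data). `extract_images` is a hypothetical helper for illustration and is not part of this PR.

```python
import base64
from typing import Any, Dict, List


def extract_images(result: Dict[str, Any]) -> List[bytes]:
    """Return the decoded bytes of every image in a generation response.

    Assumed layout (taken from the test script, not from server docs):
    {"output": {"results": [{"image": "<base64>", ...}, ...]}}
    Entries without an "image" field are skipped.
    """
    output = result.get("output") or {}
    images = []
    for entry in output.get("results") or []:
        encoded = entry.get("image")
        if encoded is None:
            continue
        # validate=True rejects non-base64 characters instead of dropping them.
        images.append(base64.b64decode(encoded, validate=True))
    return images
```

Each returned item can be passed to `Image.open(BytesIO(item))` exactly as the script does with `image_bytes`.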

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for the LongCat-Image-Edit model on CUDA devices. The changes include a new pipeline implementation, modifications to the model loader to handle preprocessor configurations, and several improvements and bug fixes in the attention and rotary embedding kernels. The implementation of the new pipeline is comprehensive. I've identified one issue in the model loader's error handling for the preprocessor configuration that could lead to silent failures.
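
The loader code under review is not reproduced on this page, so the concern can only be illustrated in general terms. The sketch below is written in Python for consistency with the test script and shows the pattern of failing loudly on a missing or malformed configuration file rather than continuing with defaults. The names `ConfigError`, `load_preprocessor_config`, and the file name `preprocessor_config.json` are assumptions for illustration, not identifiers taken from this PR.

```python
import json
from pathlib import Path
from typing import Any, Dict


class ConfigError(RuntimeError):
    """Raised when a required configuration file is missing or malformed."""


def load_preprocessor_config(
    model_dir: str, filename: str = "preprocessor_config.json"
) -> Dict[str, Any]:
    """Load a JSON config and raise instead of silently returning defaults."""
    path = Path(model_dir) / filename  # hypothetical location and file name
    if not path.is_file():
        raise ConfigError(f"preprocessor config not found: {path}")
    try:
        with path.open("r", encoding="utf-8") as f:
            config = json.load(f)
    except json.JSONDecodeError as e:
        # Surface the parse error with the file path instead of swallowing it.
        raise ConfigError(f"invalid JSON in {path}: {e}") from e
    if not isinstance(config, dict):
        raise ConfigError(f"expected a JSON object in {path}")
    return config
```

Raising at load time means a bad configuration is reported where it originates, rather than showing up later as an unexplained difference in the generated images.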

@Dragonliu2018 force-pushed the lzl/feat/support_longcat_image_edit_on_cuda branch from 153bb27 to a72b41a on February 28, 2026 04:37
@Dragonliu2018 force-pushed the lzl/feat/support_longcat_image_edit_on_cuda branch from a72b41a to 941c362 on February 28, 2026 08:25

2 participants