Commit 7187ac5

Merge pull request #6793 from PatrickFarley/imagen
Sora updates
2 parents 527037e + a8f50cb

File tree

6 files changed: +102 -46 lines changed

articles/ai-foundry/openai/concepts/models.md

Lines changed: 1 addition & 1 deletion

@@ -283,7 +283,7 @@ Sora is an AI model from OpenAI that can create realistic and imaginative video

 | Model | Region |
 |---|---|
-|`sora` | East US 2|
+|`sora` | East US 2 (Global Standard)<br>Sweden Central (Global Standard)|

 ## Audio models

articles/ai-foundry/openai/concepts/video-generation.md

Lines changed: 7 additions & 1 deletion

@@ -15,7 +15,13 @@ Sora is an AI model from OpenAI that can create realistic and imaginative video

 ## Supported features

-Sora can generate complex scenes with multiple characters, diverse motions, and detailed backgrounds. The model interprets prompts with contextual and physical world understanding, enabling accurate scene composition and character persistence across multiple shots. Sora demonstrates strong language comprehension for prompt interpretation and emotional character generation.
+Sora can generate complex scenes with multiple characters, diverse motions, and detailed backgrounds.
+
+**Text to video**: The model interprets prompts with contextual and physical world understanding, enabling accurate scene composition and character persistence across multiple shots. Sora demonstrates strong language comprehension for prompt interpretation and emotional character generation.
+
+**Image to video**: Sora can generate video content from a still image. You can specify where in the generated video the image appears (it doesn't need to be the first frame) and which region of the image to use.
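The image-to-video options added above (frame placement and crop region) can be sketched as a request payload. This is illustrative only: the helper name `build_inpaint_item` is ours, and the field names mirror the `inpaint_items` structure used elsewhere in this commit; the file name is a placeholder.

```python
import json

def build_inpaint_item(file_name, frame_index=0, crop=None):
    """Build one illustrative image item for a video generation job."""
    item = {
        "frame_index": frame_index,  # frame where the image appears; 0 = start of video
        "type": "image",
        "file_name": file_name,      # placeholder; must match the uploaded file's name
    }
    if crop is not None:
        # Crop bounds from each edge, as fractions of the image dimensions.
        left, top, right, bottom = crop
        item["crop_bounds"] = {
            "left_fraction": left,
            "top_fraction": top,
            "right_fraction": right,
            "bottom_fraction": bottom,
        }
    return item

item = build_inpaint_item("input.jpg", frame_index=30, crop=(0.1, 0.1, 0.9, 0.9))
payload = json.dumps([item])  # the API shown in this commit takes inpaint_items as a JSON string
print(payload)
```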

 ## How it works

articles/ai-foundry/openai/includes/azure-openai-models-list.md

Lines changed: 1 addition & 0 deletions

@@ -23,4 +23,5 @@ Azure OpenAI is powered by a diverse set of models with different capabilities a

 | [GPT-3.5](../concepts/models.md#gpt-35) | A set of models that improve on GPT-3 and can understand and generate natural language and code. |
 | [Embeddings](../concepts/models.md#embeddings) | A set of models that can convert text into numerical vector form to facilitate text similarity. |
 | [Image generation](../concepts/models.md#image-generation-models) | A series of models that can generate original images from natural language. |
+| [`Video generation`](../concepts/models.md#video-generation-models) | A model that can generate original video scenes from text instructions. |
 | [Audio](../concepts/models.md#audio-models) | A series of models for speech to text, translation, and text to speech. GPT-4o audio models support either low latency *speech in, speech out* conversational interactions or audio generation. |

articles/ai-foundry/openai/includes/video-generation-intro.md

Lines changed: 1 addition & 1 deletion

@@ -7,6 +7,6 @@ ms.topic: include
 ms.date: 5/29/2025
 ---

-In this quickstart, you generate video clips using the Azure OpenAI service. The example uses the Sora model, which is a video generation model that creates realistic and imaginative video scenes from text instructions. This guide shows you how to create a video generation job, poll for its status, and retrieve the generated video.
+In this quickstart, you generate video clips using the Azure OpenAI service. The example uses the Sora model, which is a video generation model that creates realistic and imaginative video scenes from text instructions, image inputs, or both. This guide shows you how to create a video generation job, poll for its status, and retrieve the generated video.

 For more information on video generation, see [Video generation concepts](../concepts/video-generation.md).
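The create-poll-retrieve flow named in this intro can be sketched as a generic polling helper. This is a sketch, not part of the commit: `get_job_status` is a placeholder for the GET request on the job endpoint shown in the REST include, and the timeout handling is our addition.

```python
import time

def wait_for_job(get_job_status, interval=5, timeout=600):
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_job_status()  # placeholder for the job-status GET request
        if status in ("succeeded", "failed", "cancelled"):
            return status
        time.sleep(interval)
    raise TimeoutError("Video generation job didn't finish in time")

# Example with a stubbed status sequence:
statuses = iter(["queued", "running", "succeeded"])
print(wait_for_job(lambda: next(statuses), interval=0))
# → succeeded
```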

articles/ai-foundry/openai/includes/video-generation-rest.md

Lines changed: 84 additions & 43 deletions

@@ -71,13 +71,13 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:

 [!INCLUDE [resource authentication](resource-authentication.md)]

-
 ## Generate video with Sora
+
 You can generate a video with the Sora model by creating a video generation job, polling for its status, and retrieving the generated video. The following code shows how to do this via the REST API using Python.

-## [Microsoft Entra ID](#tab/keyless)
+1. Create the `sora-quickstart.py` file and add the following code to authenticate your resource:

-1. Create the `sora-quickstart.py` file with the following code:
+## [Microsoft Entra ID](#tab/keyless)

 ```python
 import requests

@@ -91,10 +91,33 @@ You can generate a video with the Sora model by creating a video generation job,
 # Keyless authentication
 credential = DefaultAzureCredential()
 token = credential.get_token("https://cognitiveservices.azure.com/.default")
-
+
 api_version = 'preview'
 headers= { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json" }
-
+```
+
+## [API key](#tab/api-key)
+
+```python
+import requests
+import base64
+import os
+from azure.identity import DefaultAzureCredential
+
+# Set environment variables or edit the corresponding values here.
+endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
+api_key = os.environ['AZURE_OPENAI_API_KEY']
+
+api_version = 'preview'
+headers= { "api-key": api_key, "Content-Type": "application/json" }
+```
+---
+
+1. Create the video generation job. You can create it from a text prompt only, or from an input image and text prompt.
+
+## [Text prompt](#tab/text-prompt)
+
+```python
 # 1. Create a video generation job
 create_url = f"{endpoint}/openai/v1/video/generations/jobs?api-version={api_version}"
 body = {

@@ -138,86 +161,104 @@ You can generate a video with the Sora model by creating a video generation job,
     raise Exception(f"Job didn't succeed. Status: {status}")
 ```

-1. Run the Python file.
+## [Image prompt](#tab/image-prompt)

-```shell
-python sora-quickstart.py
-```
+Replace the `"file_name"` field in `"inpaint_items"` with the name of your input image file. Also replace the construction of the `files` array, which associates the path to the actual file with the filename that the API uses.

-## [API key](#tab/api-key)
+Use the `"crop_bounds"` data (image crop distances, from each direction, as a fraction of the total image dimensions) to specify which part of the image should be used in video generation.
+
+You can optionally set the `"frame_index"` to the frame in the generated video where your image should appear (the default is 0, the start of the video).

-1. Create the `sora-quickstart.py` file with the following code:

 ```python
-import requests
-import base64
-import os
+# 1. Create a video generation job with image inpainting (multipart upload)
+create_url = f"{endpoint}/openai/v1/video/generations/jobs?api-version=preview"

-# Set environment variables or edit the corresponding values here.
-endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
-api_key = os.environ['AZURE_OPENAI_API_KEY']
-
-api_version = 'preview'
-headers= { "api-key": api_key, "Content-Type": "application/json" }
-
-# 1. Create a video generation job
-create_url = f"{endpoint}/openai/v1/video/generations/jobs?api-version={api_version}"
-body = {
-    "prompt": "A cat playing piano in a jazz bar.",
-    "width": 480,
-    "height": 480,
-    "n_seconds": 5,
-    "model": "sora"
+# Flatten the body for multipart/form-data
+data = {
+    "prompt": "A serene forest scene transitioning into autumn",
+    "height": str(1080),
+    "width": str(1920),
+    "n_seconds": str(10),
+    "n_variants": str(1),
+    "model": "sora",
+    # inpaint_items must be JSON string
+    "inpaint_items": json.dumps([
+        {
+            "frame_index": 0,
+            "type": "image",
+            "file_name": "dog_swimming.jpg",
+            "crop_bounds": {
+                "left_fraction": 0.1,
+                "top_fraction": 0.1,
+                "right_fraction": 0.9,
+                "bottom_fraction": 0.9
+            }
+        }
+    ])
 }
-response = requests.post(create_url, headers=headers, json=body)
-response.raise_for_status()
+
+# Replace with your own image file path
+with open("dog_swimming.jpg", "rb") as image_file:
+    files = [
+        ("files", ("dog_swimming.jpg", image_file, "image/jpeg"))
+    ]
+    multipart_headers = {k: v for k, v in headers.items() if k.lower() != "content-type"}
+    response = requests.post(
+        create_url,
+        headers=multipart_headers,
+        data=data,
+        files=files
+    )
+
+if not response.ok:
+    print("Error response:", response.status_code, response.text)
+response.raise_for_status()
 print("Full response JSON:", response.json())
 job_id = response.json()["id"]
 print(f"Job created: {job_id}")

 # 2. Poll for job status
-status_url = f"{endpoint}/openai/v1/video/generations/jobs/{job_id}?api-version={api_version}"
-status=None
+status_url = f"{endpoint}/openai/v1/video/generations/jobs/{job_id}?api-version=preview"
+status = None
 while status not in ("succeeded", "failed", "cancelled"):
-    time.sleep(5) # Wait before polling again
+    time.sleep(5)
     status_response = requests.get(status_url, headers=headers).json()
     status = status_response.get("status")
     print(f"Job status: {status}")

-# 3. Retrieve generated video
+# 3. Retrieve generated video
 if status == "succeeded":
     generations = status_response.get("generations", [])
     if generations:
-        print(f"✅ Video generation succeeded.")
         generation_id = generations[0].get("id")
-        video_url = f"{endpoint}/openai/v1/video/generations/{generation_id}/content/video?api-version={api_version}"
+        video_url = f"{endpoint}/openai/v1/video/generations/{generation_id}/content/video?api-version=preview"
         video_response = requests.get(video_url, headers=headers)
         if video_response.ok:
             output_filename = "output.mp4"
             with open(output_filename, "wb") as file:
                 file.write(video_response.content)
-                print(f'Generated video saved as "{output_filename}"')
+            print(f'Generated video saved as "{output_filename}"')
     else:
         raise Exception("No generations found in job result.")
 else:
     raise Exception(f"Job didn't succeed. Status: {status}")
 ```
+---

 1. Run the Python file.

 ```shell
 python sora-quickstart.py
 ```

----
-
-Wait a few moments to get the response.
+Wait a few moments to get the response.

 ### Output

 The output will show the full response JSON from the video generation job creation request, including the job ID and status.

-```json
 ```json
 {
     "object": "video.generation.job",

articles/ai-foundry/openai/whats-new.md

Lines changed: 8 additions & 0 deletions

@@ -20,6 +20,14 @@ This article provides a summary of the latest releases and major documentation u

 ## August 2025

+### Sora image-to-video support
+
+The Sora model from OpenAI now supports image-to-video generation. You can provide an image as input to the model to generate a video that incorporates the content of the image. You can also specify the frame of the video in which the image should appear: it doesn't need to be the beginning.
+
+Sora is now available in the Sweden Central region as well as East US 2.
+
 ### Realtime API audio model GA

 OpenAI's GPT RealTime and Audio models are now generally available on Azure AI Foundry Direct Models.
