Commit 08d2602 (1 parent: 8866929)

Add Pylint configuration and implement Azure Function for WSI to DZI processing

- Create GitHub Actions workflow for linting Python code with Pylint.
- Add Pylint configuration file with max line length set to 150.
- Update README to reflect project name and clarify input formats.
- Implement functions for downloading blobs, converting images to DZI, and cleaning up temporary files.
- Integrate AzCopy for uploading DZI outputs to Azure Blob Storage.
- Enhance logging for better traceability during processing.

File tree

10 files changed (+313 additions, −103 deletions)

.github/workflows/pylint.yml

Lines changed: 25 additions & 0 deletions
```yaml
name: Lint Python with Pylint

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install pylint
      - name: Run Pylint
        run: |
          pylint $(git ls-files '*.py')
```

.pylintrc

Lines changed: 2 additions & 0 deletions
```ini
[FORMAT]
max-line-length=150
```

README.md

Lines changed: 50 additions & 5 deletions
````diff
@@ -1,4 +1,4 @@
-# wsi-slides-dzi-processor Azure Function
+# WSI Slide Image to DZI Processor
 
 ![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)
 ![Azure](https://img.shields.io/badge/azure-%230072C6.svg?style=for-the-badge&logo=microsoftazure&logoColor=white)
@@ -10,9 +10,9 @@
 This Azure Function automatically processes Whole Slide Images (WSI) uploaded to Azure Blob Storage, converting them into Deep Zoom Image (DZI) format using `pyvips`. It is designed to handle very large files (up to several GB) efficiently and is suitable for digital pathology, microscopy, and similar domains.
 
 - **Trigger:** Event Grid (on blob upload)
-- **Input:** Image blob (e.g., `.jpg`, `.tif`) in a monitored container
+- **Input:** Image blob (e.g., `.svs`, `.tiff`) in a monitored container
 - **Output:** DZI tiles and metadata uploaded to a designated output container
-- **Tech Stack:** Python, Azure Functions, Azure Blob Storage, pyvips
+- **Tech Stack:** Python, Azure Functions, Azure Blob Storage, AzCopy, pyvips
 
 > **Note:**
 > This function is built and deployed using Docker because the `libvips` library (required by `pyvips`) is not supported on standard Azure Functions hosting options such as Flex Consumption, Consumption, or App Service plans. Docker allows you to include all necessary native dependencies for reliable execution on the Premium plan or in a custom container environment.
@@ -28,6 +28,12 @@ This Azure Function automatically processes Whole Slide Images (WSI) uploaded to
 ```
 ├── Dockerfile
 ├── function_app.py
+├── parse_event_blob_url.py
+├── parse_container_and_blob.py
+├── download_blob_to_temp.py
+├── convert_to_dzi.py
+├── upload_with_azcopy.py
+├── cleanup_temp_files.py
 ├── host.json
 ├── local.settings.json
 ├── requirements.txt
@@ -155,7 +161,13 @@ Create a file named `event.json` with the following content:
   "subject": "/blobServices/default/containers/your-container/blobs/your-image.jpg",
   "eventTime": "2025-06-27T00:00:00.000Z",
   "data": {
-    "url": "https://<your-storage-account>.blob.core.windows.net/your-container/your-image.jpg"
+    "url": "https://<your-storage-account>.blob.core.windows.net/your-container/your-image.jpg",
+    "api": "PutBlob",
+    "contentType": "image/jpeg",
+    "contentLength": 12345678,
+    "blobType": "BlockBlob",
+    "blobTier": "Hot",
+    "metadata": {}
   },
   "dataVersion": "",
   "metadataVersion": "1"
@@ -181,4 +193,37 @@ curl -X POST "http://localhost:7071/runtime/webhooks/EventGrid?functionName=blob
 - Make sure all dependencies are installed and the correct Python version is used.
 
 ---
-*For more, see the Azure Functions [local development docs](https://learn.microsoft.com/azure/azure-functions/functions-develop-local).*
+
+## AzCopy Authentication: Managed Identity and SAS Token
+
+This function uploads DZI output directories to Azure Blob Storage using AzCopy. Two authentication methods are supported:
+
+### 1. Managed Identity (Recommended for Production)
+- **How it works:**
+  - The Azure Function runs with a User-Assigned or System-Assigned Managed Identity.
+  - AzCopy uses the identity to obtain an OAuth token and authenticate to Azure Blob Storage.
+- **Requirements:**
+  - The Function App's Managed Identity must have at least the `Storage Blob Data Contributor` role on the target storage account or container.
+  - No secrets or connection strings are required in code or environment variables.
+- **How to use:**
+  - Ensure the Function App is assigned a Managed Identity in Azure.
+  - Grant the identity access to the storage account/container.
+  - **Important:** Set the environment variable `AZCOPY_AUTO_LOGIN_TYPE` to `MSI`.
+  - AzCopy will automatically use the identity for authentication when running inside Azure.
+
+### 2. SAS Token (For Local Development or Special Cases)
+- **How it works:**
+  - AzCopy authenticates using a Shared Access Signature (SAS) token appended to the destination Blob Storage URL.
+- **Requirements:**
+  - A valid SAS token with write permissions for the target container or directory.
+- **How to use:**
+  - Generate a SAS token for the storage account or container.
+  - Append the SAS token to the destination URL in the AzCopy command (e.g., `https://<account>.blob.core.windows.net/<container>?<sas-token>`).
+  - This method is useful for local testing or scenarios where Managed Identity is not available.
+
+> **Best Practice:**
+> Use Managed Identity for all production deployments to avoid secret management and improve security. SAS tokens should only be used for local development or temporary access.
+
+Please feel free to reach out to me if you have any questions or need further assistance with the WSI Slide Image to DZI Processor project. Your feedback and contributions are always welcome!
+
+---
````
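For the SAS route described above, the token simply rides in the destination URL's query string. A small illustrative helper (hypothetical, not part of this commit) shows the expected URL shape:

```python
def with_sas(dest_url: str, sas_token: str) -> str:
    # A SAS token is passed as the query string of the destination URL;
    # reuse "&" if the URL already carries query parameters.
    sep = "&" if "?" in dest_url else "?"
    return f"{dest_url}{sep}{sas_token.lstrip('?')}"

url = with_sas("https://acct.blob.core.windows.net/dzi-out", "sv=2024-01-01&sig=abc")
print(url)  # https://acct.blob.core.windows.net/dzi-out?sv=2024-01-01&sig=abc
```

Keep SAS tokens out of logs when building URLs this way; the AzCopy command line above would otherwise leak them.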

cleanup_temp_files.py

Lines changed: 24 additions & 0 deletions
```python
"""
cleanup_temp_files.py: Removes temporary files and directories.
"""
import os
import shutil
import logging

def cleanup_temp_files(*paths):
    """
    Removes temporary files and directories.
    Args:
        *paths: Paths to files or directories to remove.
    """
    logger = logging.getLogger("helpers.cleanup_temp_files")
    for path in paths:
        try:
            if os.path.isdir(path):
                shutil.rmtree(path)
                logger.info("Cleaned up temp directory: %s", path)
            elif os.path.isfile(path):
                os.remove(path)
                logger.info("Cleaned up temp file: %s", path)
        except (OSError, shutil.Error) as exc:
            logger.warning("Failed to clean up %s: %s", path, exc)
```
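A quick local check of this helper's behavior (the function is re-declared inline here so the snippet is self-contained) could look like:

```python
import logging
import os
import shutil
import tempfile

def cleanup_temp_files(*paths):
    # Same logic as the helper above: directories via rmtree, files via remove,
    # and failures only logged, never raised.
    logger = logging.getLogger("helpers.cleanup_temp_files")
    for path in paths:
        try:
            if os.path.isdir(path):
                shutil.rmtree(path)
            elif os.path.isfile(path):
                os.remove(path)
        except (OSError, shutil.Error) as exc:
            logger.warning("Failed to clean up %s: %s", path, exc)

tf = tempfile.NamedTemporaryFile(delete=False)
tf.close()
tmp_file = tf.name
tmp_dir = tempfile.mkdtemp()
cleanup_temp_files(tmp_file, tmp_dir, "/no/such/path")  # missing paths are a no-op
print(os.path.exists(tmp_file), os.path.exists(tmp_dir))  # False False
```

The no-raise contract matters in the function app: cleanup runs on both success and failure paths, and a cleanup hiccup should never mask the real outcome of a conversion.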

convert_to_dzi.py

Lines changed: 30 additions & 0 deletions
```python
"""
convert_to_dzi.py: Converts an image file to Deep Zoom Image (DZI) format using pyvips.
"""
import tempfile
import os
import logging
from typing import Optional
import pyvips

def convert_to_dzi(input_path: str, blob_name: str) -> Optional[str]:
    """
    Converts an image file to Deep Zoom Image (DZI) format using pyvips.
    Args:
        input_path: Path to the input image file.
        blob_name: The original blob name (used for output naming).
    Returns:
        The path to the DZI output directory, or None if conversion fails.
    """
    logger = logging.getLogger("helpers.convert_to_dzi")
    try:
        dzi_output_dir = tempfile.mkdtemp()
        dzi_basename = os.path.splitext(os.path.basename(blob_name))[0]
        dzi_output_path = os.path.join(dzi_output_dir, dzi_basename)
        logger.info("Converting to DZI: %s", dzi_output_path)
        image = pyvips.Image.new_from_file(input_path, access='sequential')
        image.dzsave(dzi_output_path, tile_size=512)
        return dzi_output_dir
    except (pyvips.Error, OSError) as exc:
        logger.error("DZI conversion failed: %s", exc)
        return None
```
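pyvips' `dzsave` writes a `<basename>.dzi` descriptor plus a `<basename>_files/` tile pyramid next to it, which is why the helper derives a basename from the blob name before converting. A minimal sketch of that naming (a hypothetical helper, no pyvips needed):

```python
import os

def dzi_output_names(blob_name: str):
    # dzsave on <dir>/<base> produces <base>.dzi and a <base>_files/ tile directory
    base = os.path.splitext(os.path.basename(blob_name))[0]
    return f"{base}.dzi", f"{base}_files"

print(dzi_output_names("slides/case-01.svs"))  # ('case-01.dzi', 'case-01_files')
```

Because the whole output directory is uploaded with AzCopy, both the descriptor and the tile directory land in the destination container together.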

download_blob_to_temp.py

Lines changed: 48 additions & 0 deletions
```python
"""
download_blob_to_temp.py: Downloads a blob to a temporary file using streaming.
"""
import tempfile
import os
import logging
from typing import Optional
from azure.storage.blob import BlobServiceClient
from azure.core.exceptions import AzureError

def download_blob_to_temp(blob_service_client: BlobServiceClient, container_name: str, blob_name: str) -> Optional[str]:
    """
    Downloads a blob to a temporary file using streaming (suitable for large files).
    Args:
        blob_service_client: The BlobServiceClient instance.
        container_name: The name of the container.
        blob_name: The name of the blob.
    Returns:
        The path to the temporary file, or None if download fails.
    """
    logger = logging.getLogger("helpers.download_blob_to_temp")
    try:
        container_client = blob_service_client.get_container_client(container_name)
        exists = container_client.exists()
        if not exists:
            logger.error(
                "Container '%s' does not exist in the storage account.", container_name
            )
            return None
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
        with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(blob_name)[1]) as temp_blob:
            logger.info("Streaming blob to temp file: %s", temp_blob.name)
            download_stream = blob_client.download_blob()
            chunk_size = 8 * 1024 * 1024  # 8 MB
            offset = 0
            while True:
                chunk = download_stream.read(chunk_size)
                if not chunk:
                    break
                temp_blob.write(chunk)
                offset += len(chunk)
            logger.info(
                "Completed streaming blob to: %s, total bytes: %d", temp_blob.name, offset
            )
            return temp_blob.name
    except (OSError, AzureError) as exc:
        logger.error("Failed to download blob (streaming): %s", exc)
        return None
```
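The chunked copy loop above is what keeps memory flat regardless of blob size. The same pattern, stripped of the Azure SDK so it runs against any file-like object (names here are illustrative, not part of the commit):

```python
import io
import os
import tempfile

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, matching the helper above

def stream_to_temp(stream, suffix: str = ""):
    # Copy fixed-size chunks until the stream is exhausted, tracking bytes written.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        total = 0
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            tmp.write(chunk)
            total += len(chunk)
        return tmp.name, total

path, n = stream_to_temp(io.BytesIO(b"x" * 1000), suffix=".bin")
print(n, path.endswith(".bin"))  # 1000 True
os.remove(path)  # tidy up the demo file
```

With `delete=False` the temp file survives the `with` block, so the caller owns its lifetime; that is why `cleanup_temp_files` must be called on every exit path.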

function_app.py

Lines changed: 50 additions & 98 deletions
```diff
@@ -1,120 +1,72 @@
-import azure.functions as func
+"""
+function_app.py: Azure Function entrypoint for WSI to DZI processing.
+"""
+
+import asyncio
 import logging
 import os
-import asyncio
+
+import azure.functions as func
 from azure.storage.blob import BlobServiceClient
-import pyvips
-import subprocess
+
+from cleanup_temp_files import cleanup_temp_files
+from convert_to_dzi import convert_to_dzi
+from download_blob_to_temp import download_blob_to_temp
+from parse_container_and_blob import parse_container_and_blob
+from parse_event_blob_url import parse_event_blob_url
+from upload_with_azcopy import upload_with_azcopy
 
 app = func.FunctionApp()
 
+
 @app.event_grid_trigger(arg_name="event")
 async def blob_to_dzi_eventgrid_trigger(event: func.EventGridEvent):
+    """
+    Azure Function entrypoint for processing blob events:
+    - Parses the event for the blob URL
+    - Downloads the blob to a temp file
+    - Converts the blob to DZI format
+    - Uploads the DZI output to Azure Blob Storage using AzCopy
+    - Cleans up temp files and directories
+    """
     logger = logging.getLogger("blob_to_dzi_eventgrid_trigger")
-    logger.info(f"Received Event: id={getattr(event, 'id', None)}, subject={getattr(event, 'subject', None)}")
-    try:
-        event_data = event.get_json()
-        blob_url = event_data.get('url')
-        logger.info(f"Parsed blob URL: {blob_url}")
-    except Exception as e:
-        logger.error(f"Error accessing event.get_json(): {e}")
-        blob_url = None
+    logger.info(
+        "Received Event: id=%s, subject=%s",
+        getattr(event, "id", None),
+        getattr(event, "subject", None),
+    )
+
+    blob_url = parse_event_blob_url(event)
     if not blob_url:
-        logger.error('No blob URL found in event data.')
+        logger.error("No blob URL found in event data.")
         return
 
-    # Parse storage account, container, and blob name from URL
-    from urllib.parse import urlparse
-    parsed = urlparse(blob_url)
-    path_parts = parsed.path.lstrip('/').split('/', 1)
-    if len(path_parts) != 2:
-        logger.error(f'Could not parse container and blob name from URL: {blob_url}')
+    parsed = parse_container_and_blob(blob_url)
+    if not parsed:
         return
-    container_name, blob_name = path_parts
-    logger.info(f"Container: {container_name}, Blob: {blob_name}")
+    container_name, blob_name = parsed
+    logger.info("Container: %s, Blob: %s", container_name, blob_name)
 
-    # Get connection string from environment
-    conn_str = os.environ.get('AzureWebJobsStorage')
+    conn_str = os.environ.get("AzureWebJobsStorage")
     if not conn_str:
-        logger.error('AzureWebJobsStorage not set in environment.')
+        logger.error("AzureWebJobsStorage not set in environment.")
         return
 
     blob_service_client = BlobServiceClient.from_connection_string(conn_str)
-    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
-
-    # Download blob to temp file (streaming, suitable for very large files)
-    import tempfile
-    logger.info(f"Attempting to download from container: '{container_name}', blob: '{blob_name}' (streaming mode)")
-    try:
-        # Check if container exists
-        container_client = blob_service_client.get_container_client(container_name)
-        exists = await asyncio.to_thread(container_client.exists)
-        if not exists:
-            logger.error(f"Container '{container_name}' does not exist in the storage account.")
-            return
-        with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(blob_name)[1]) as temp_blob:
-            logger.info(f"Streaming blob to temp file: {temp_blob.name}")
-            download_stream = await asyncio.to_thread(blob_client.download_blob)
-            # Stream the blob in chunks to avoid high memory usage
-            chunk_size = 8 * 1024 * 1024  # 8 MB
-            offset = 0
-            while True:
-                chunk = await asyncio.to_thread(download_stream.read, chunk_size)
-                if not chunk:
-                    break
-                temp_blob.write(chunk)
-                offset += len(chunk)
-            temp_blob_path = temp_blob.name
-        logger.info(f"Completed streaming blob to: {temp_blob_path}, total bytes: {offset}")
-    except Exception as e:
-        logger.error(f"Failed to download blob (streaming): {e}")
+    temp_blob_path = await asyncio.to_thread(
+        download_blob_to_temp, blob_service_client, container_name, blob_name
+    )
+    if not temp_blob_path:
         return
 
-    # Convert to DZI using pyvips
-    try:
-        dzi_output_dir = tempfile.mkdtemp()
-        dzi_basename = os.path.splitext(os.path.basename(blob_name))[0]
-        dzi_output_path = os.path.join(dzi_output_dir, dzi_basename)
-        logger.info(f"Converting to DZI: {dzi_output_path}")
-        image = await asyncio.to_thread(pyvips.Image.new_from_file, temp_blob_path, access='sequential')
-        await asyncio.to_thread(
-            image.dzsave,
-            dzi_output_path,
-            tile_size=512  # Recommended for OpenSeadragon, fewer files, good performance
-        )
-    except Exception as e:
-        logger.error(f"DZI conversion failed: {e}")
+    dzi_output_dir = await asyncio.to_thread(convert_to_dzi, temp_blob_path, blob_name)
+    if not dzi_output_dir:
         return
 
-    # Upload DZI files and all subdirectory files to the destination container using AzCopy
-    def upload_with_azcopy(local_dir):
-        dest_url = os.environ.get('DZI_UPLOAD_DEST_URL')
-        cmd = [
-            "azcopy", "copy",
-            f"{local_dir}/*",
-            dest_url,
-            "--recursive=true"
-        ]
-        logger.info(f"AzCopy command: azcopy copy {cmd[2]} {dest_url} --recursive=true (using SAS token)")
-        result = subprocess.run(cmd, capture_output=True, text=True)
-        logger.info(f"AzCopy stdout: {result.stdout}")
-        logger.info(f"AzCopy stderr: {result.stderr}")
-        logger.info(f"AzCopy returncode: {result.returncode}")
-        if result.returncode == 0:
-            logger.info(f"AzCopy upload successful for {cmd[2]}")
-        else:
-            logger.error(f"AzCopy failed with exit code {result.returncode} for {cmd[2]}")
-
-    upload_with_azcopy(dzi_output_dir)
-
-    # Clean up temp files and directories
-    import shutil
-    try:
-        if os.path.exists(temp_blob_path):
-            os.remove(temp_blob_path)
-            logger.info(f"Cleaned up temp file: {temp_blob_path}")
-        if os.path.exists(dzi_output_dir):
-            shutil.rmtree(dzi_output_dir)
-            logger.info(f"Cleaned up temp directory: {dzi_output_dir}")
-    except Exception as e:
-        logger.warning(f"Failed to clean up temp files or directories: {e}")
+    dest_url = os.environ.get("DZI_UPLOAD_DEST_URL")
+    if not dest_url:
+        logger.error("DZI_UPLOAD_DEST_URL environment variable not set!")
+        cleanup_temp_files(temp_blob_path, dzi_output_dir)
+        return
+    upload_with_azcopy(dzi_output_dir, dest_url, logger)
+    cleanup_temp_files(temp_blob_path, dzi_output_dir)
```
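Three of the imported helpers (`parse_event_blob_url`, `parse_container_and_blob`, `upload_with_azcopy`) are not shown in this diff. Judging from the inline code this commit removes, they plausibly look like the following sketch; signatures are inferred from the call sites, so treat names and details as assumptions:

```python
"""Hypothetical reconstructions of helpers not shown in this diff,
mirrored from the inline code removed from function_app.py."""
import logging
import subprocess
from typing import Optional, Tuple
from urllib.parse import urlparse

def parse_event_blob_url(event) -> Optional[str]:
    # The removed code read event.get_json()["url"], tolerating bad payloads.
    try:
        return event.get_json().get("url")
    except Exception:  # pylint: disable=broad-except
        return None

def parse_container_and_blob(blob_url: str) -> Optional[Tuple[str, str]]:
    # Blob URL path is /<container>/<blob path>; split once on the first slash.
    parts = urlparse(blob_url).path.lstrip("/").split("/", 1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]

def upload_with_azcopy(local_dir: str, dest_url: str, logger: logging.Logger) -> bool:
    # Same command the removed inline uploader ran; requires azcopy on PATH.
    cmd = ["azcopy", "copy", f"{local_dir}/*", dest_url, "--recursive=true"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    logger.info("AzCopy returncode: %s", result.returncode)
    return result.returncode == 0

print(parse_container_and_blob(
    "https://acct.blob.core.windows.net/slides/2025/case-01.svs"))
# ('slides', '2025/case-01.svs')
```

Note that `split("/", 1)` keeps nested blob paths intact: everything after the container name becomes the blob name.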
