Deploying Docling using SageMaker #2003

matanitah-healthee · 2025-07-28T17:07:56Z

Hello! I am trying to deploy a production instance of Docling on SageMaker using Terraform.
I use the Python code below to deploy my instance:

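import os
import tarfile
import tempfile
from datetime import datetime

import boto3
from huggingface_hub import snapshot_download
from sagemaker.huggingface import HuggingFaceModel

# MODEL_ID, S3_BUCKET, S3_MODEL_PREFIX and role are defined elsewhere (not shown)

def download_model_to_s3():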
    """
    Download model from Hugging Face Hub and upload to S3
    """
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    
    # Create temporary directory
    with tempfile.TemporaryDirectory() as temp_dir:
        print("📥 Downloading model from Hugging Face Hub...")
        print(f"Model ID: {MODEL_ID}")
        
        # Download model to local directory
        model_dir = snapshot_download(
            repo_id=MODEL_ID,
            cache_dir=temp_dir,
            local_dir=os.path.join(temp_dir, "model"),
            local_dir_use_symlinks=False
        )
        
        print(f"✅ Model downloaded to: {model_dir}")
        
        # List downloaded files
        print("\n📋 Downloaded files:")
        for root, dirs, files in os.walk(model_dir):
            for file in files:
                file_path = os.path.join(root, file)
                rel_path = os.path.relpath(file_path, model_dir)
                file_size = os.path.getsize(file_path) / (1024*1024)  # MB
                print(f"  {rel_path} ({file_size:.1f} MB)")
        
        print("\n📦 Creating model.tar.gz...")
        # Create tar.gz file
        tar_path = os.path.join(temp_dir, "model.tar.gz")
        with tarfile.open(tar_path, "w:gz") as tar:
            tar.add(model_dir, arcname=".")
        
        tar_size = os.path.getsize(tar_path) / (1024*1024)  # MB
        print(f"✅ Tar file created: {tar_size:.1f} MB")
        
        print("\n☁️ Uploading to S3...")
        # Upload to S3
        s3_key = f"{S3_MODEL_PREFIX}/model-{timestamp}.tar.gz"
        s3_uri = f"s3://{S3_BUCKET}/{s3_key}"
        
        s3_client = boto3.client('s3')
        
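        # Upload with progress. boto3 calls the callback with the number of
        # bytes sent since the previous call, so keep a running total.
        total_bytes = os.path.getsize(tar_path)
        uploaded_bytes = 0

        def upload_progress(bytes_transferred):
            nonlocal uploaded_bytes
            uploaded_bytes += bytes_transferred
            percentage = (uploaded_bytes / total_bytes) * 100
            print(f"\rUpload progress: {percentage:.1f}%", end="")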
        
        s3_client.upload_file(
            tar_path, 
            S3_BUCKET, 
            s3_key,
            Callback=upload_progress
        )
        
        print(f"\n✅ Model uploaded to: {s3_uri}")
        return s3_uri

# Download and upload model
model_s3_uri = download_model_to_s3()

def deploy_from_s3(model_s3_uri):
    """Deploy model from S3"""
        
    # Create HuggingFace Model pointing to S3
    huggingface_model = HuggingFaceModel(
        model_data=model_s3_uri,
        transformers_version='4.49.0',
        pytorch_version='2.6.0',
        py_version='py312',
        role=role,
        env={
            'HF_TASK': 'image-text-to-text',
        }
    )
    
    print("Deploying model from S3...")
    print("This may take 5-10 minutes...")
    
    # Deploy model
    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type='ml.m5.xlarge'
    )
    
    return predictor

predictor = deploy_from_s3(model_s3_uri)

test_input = {
    "inputs": "Convert this document"
}

predictor.predict(test_input)

I am able to upload the model tar.gz to S3 and deploy the endpoint, but when I try to run inference I get the following error:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "\u0027str\u0027 object has no attribute \u0027pad_token_id\u0027"
}
".

Has anyone else had the same problem and come up with a solution? Does anyone have any suggestions on how to proceed?
Thanks!
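
For reference, the Hugging Face inference containers on SageMaker accept an optional `code/` directory inside `model.tar.gz` (holding `inference.py` and, if needed, `requirements.txt`) next to the model files at the archive root. Below is a minimal sketch of building and checking such an archive. `build_model_archive` and `list_archive` are hypothetical helpers written for illustration, not part of the script above, and this is not a confirmed fix for the error.

```python
import os
import tarfile
import tempfile


def build_model_archive(model_dir, code_dir, tar_path):
    """Pack model files at the archive root and handler code under code/.

    Hypothetical helper: model_dir holds the downloaded model files,
    code_dir holds inference.py (and optionally requirements.txt).
    """
    with tarfile.open(tar_path, "w:gz") as tar:
        # Add each model file directly at the root of the archive.
        for name in sorted(os.listdir(model_dir)):
            tar.add(os.path.join(model_dir, name), arcname=name)
        # Add the custom handler code under the code/ prefix.
        for name in sorted(os.listdir(code_dir)):
            tar.add(os.path.join(code_dir, name), arcname=f"code/{name}")
    return tar_path


def list_archive(tar_path):
    """Return the sorted member names, to check the layout before upload."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Listing the archive before uploading is a cheap way to confirm that files such as `config.json` sit at the root rather than inside a nested folder.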

dolfim-ibm · 2025-07-29T06:17:18Z

Maintainer

Which model are you trying to deploy?

Docling actually runs multiple models in sequence, not just a single one. The exception is the SmolDocling model.
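
Because the full pipeline is several models run in sequence, it may not map onto a single `HF_TASK`. One possible route is a custom `inference.py` that wraps Docling's `DocumentConverter`. The sketch below assumes the handler names used by the SageMaker Hugging Face inference toolkit (`model_fn`, `input_fn`, `predict_fn`, `output_fn`), that `docling` is installed in the container, and that the request is JSON carrying a base64-encoded PDF under `pdf_base64`, which is a field name made up for this example. It has not been run on a real endpoint.

```python
import base64
import json
import os
import tempfile


def model_fn(model_dir):
    """Build the Docling converter once, when the container starts."""
    # Imported lazily so this module can be loaded without docling installed.
    from docling.document_converter import DocumentConverter
    return DocumentConverter()


def input_fn(input_data, content_type):
    """Decode the request body into raw PDF bytes."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    payload = json.loads(input_data)
    # "pdf_base64" is a hypothetical field name chosen for this sketch.
    return base64.b64decode(payload["pdf_base64"])


def predict_fn(pdf_bytes, converter):
    """Run the Docling pipeline on the PDF and return Markdown."""
    with tempfile.TemporaryDirectory() as tmp:
        pdf_path = os.path.join(tmp, "input.pdf")
        with open(pdf_path, "wb") as f:
            f.write(pdf_bytes)
        result = converter.convert(pdf_path)
        return result.document.export_to_markdown()


def output_fn(prediction, accept):
    """Wrap the Markdown in a JSON response."""
    return json.dumps({"markdown": prediction})
```

Writing the bytes to a temporary file keeps the sketch to the plain `convert(path)` call; the instance type and memory needed for the full pipeline would have to be checked separately.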

1 reply

matanitah-healthee Jul 30, 2025
Author

SmolDocling is the one I'm currently trying to deploy, using the pytorch-huggingface-inference container. Ideally, though, I'd have the entire Docling pipeline running on a SageMaker inference endpoint, so that when I send a PDF to the endpoint it responds with the extracted Markdown.
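
The request side of that flow could look roughly like the sketch below, which calls the endpoint through the boto3 `sagemaker-runtime` client. It assumes the endpoint runs a custom handler that accepts JSON with a base64-encoded PDF under `pdf_base64` and answers with `{"markdown": ...}`; both field names are hypothetical, and the endpoint name is whatever the deployment produced.

```python
import base64
import json


def build_request_body(pdf_bytes):
    """Encode raw PDF bytes as the JSON body assumed by this sketch."""
    return json.dumps({"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")})


def parse_response_body(raw_body):
    """Extract the Markdown string from the assumed JSON response."""
    return json.loads(raw_body)["markdown"]


def convert_pdf(endpoint_name, pdf_path):
    """Send a PDF to the endpoint and return the extracted Markdown."""
    import boto3  # imported here so the helpers above work without AWS access

    with open(pdf_path, "rb") as f:
        body = build_request_body(f.read())
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=body,
    )
    return parse_response_body(response["Body"].read())
```

Base64 inflates the payload by about a third, so large PDFs may run into the request size limit of real-time endpoints; passing an S3 location instead of the file contents would be one way around that.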
