Commit d1c7f7f

Update how SageMaker docs are built
- Update the GitHub Actions
- Move the `docs/sagemaker` docs into the `source` directory
- Add `docs/sagemaker/Makefile` and `docs/sagemaker/scripts` for automatically generating the examples and updating the `_toctree.yml`
- Most of those things are ported from https://github.com/huggingface/Google-Cloud-Containers
1 parent 7773b41 commit d1c7f7f

23 files changed, +233 -38 lines changed
Lines changed: 8 additions & 4 deletions
@@ -1,11 +1,13 @@
-name: Build sagemaker documentation
+name: Build SageMaker Documentation
 
 on:
   push:
-    paths:
-      - "docs/sagemaker/**"
     branches:
       - main
+      - doc-builder*
+    paths:
+      - docs/sagemaker/**
+      - .github/workflows/sagemaker_build_documentation.yaml
 
 jobs:
   build:
@@ -14,7 +16,9 @@ jobs:
       commit_sha: ${{ github.sha }}
       package: hub-docs
       package_name: sagemaker
-      path_to_docs: hub-docs/docs/sagemaker/
+      path_to_docs: hub-docs/docs/sagemaker/source
       additional_args: --not_python_module
+      pre_command: cd hub-docs/docs/sagemaker && make docs
     secrets:
+      token: ${{ secrets.HUGGINGFACE_PUSH }}
       hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

.github/workflows/sagemaker_build_pr_documentation.yml

Lines changed: 5 additions & 3 deletions
@@ -1,9 +1,10 @@
-name: Build sagemaker PR Documentation
+name: Build SageMaker PR Documentation
 
 on:
   pull_request:
     paths:
-      - "docs/sagemaker/**"
+      - docs/sagemaker/**
+      - .github/workflows/sagemaker_build_pr_documentation.yml
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -17,5 +18,6 @@ jobs:
       pr_number: ${{ github.event.number }}
       package: hub-docs
       package_name: sagemaker
-      path_to_docs: hub-docs/docs/sagemaker/
+      path_to_docs: hub-docs/docs/sagemaker/source
       additional_args: --not_python_module
+      pre_command: cd hub-docs/docs/sagemaker && make docs

.github/workflows/sagemaker_delete_doc_comment.yml

Lines changed: 1 addition & 2 deletions
@@ -1,10 +1,9 @@
-name: Delete sagemaker doc comment trigger
+name: Delete SageMaker PR Documentation Comment
 
 on:
   pull_request:
     types: [ closed ]
 
-
 jobs:
   delete:
     uses: huggingface/doc-builder/.github/workflows/delete_doc_comment_trigger.yml@main
Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
-name: Upload sagemaker PR Documentation
+name: Upload SageMaker PR Documentation
 
 on:
   workflow_run:
-    workflows: ["Build sagemaker PR Documentation"]
+    workflows: ["Build SageMaker PR Documentation"]
     types:
       - completed
 
@@ -13,4 +13,4 @@ jobs:
       package_name: sagemaker
     secrets:
       hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
-      comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
+      comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}

docs/sagemaker/Makefile

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+.PHONY: docs clean serve help
+
+docs: clean
+	@echo "Processing Jupyter Notebooks from notebooks/sagemaker-sdk..."
+	@mkdir -p source/examples
+	@echo "Converting Jupyter Notebooks to MDX..."
+	@doc-builder notebook-to-mdx notebooks/sagemaker-sdk/
+	@echo "Auto-generating example files for documentation..."
+	@python scripts/auto-generate-examples.py
+	@echo "Cleaning up generated Markdown Notebook files..."
+	@find notebooks/sagemaker-sdk -name "sagemaker-notebook.md" -type f -delete
+	@echo "Generating YAML tree structure and appending to _toctree.yml..."
+	@python scripts/auto-update-toctree.py
+	@echo "YAML tree structure appended to source/_toctree.yml"
+	@echo "Documentation setup complete."
+
+clean:
+	@echo "Cleaning up generated documentation..."
+	@rm -rf source/examples
+	@awk '/^# GENERATED CONTENT DO NOT EDIT!/,/^# END GENERATED CONTENT/{next} {print}' source/_toctree.yml > source/_toctree.yml.tmp && mv source/_toctree.yml.tmp source/_toctree.yml
+	@echo "Cleaning up generated Markdown Notebook files (if any)..."
+	@find notebooks/sagemaker-sdk -name "sagemaker-notebook.md" -type f -delete
+	@echo "Cleanup complete."
+
+serve:
+	@echo "Serving documentation via doc-builder"
+	doc-builder preview sagemaker source/ --not_python_module
+
+help:
+	@echo "Usage:"
+	@echo " make docs - Auto-generate the examples for the docs"
+	@echo " make clean - Remove the auto-generated docs"
+	@echo " make help - Display this help message"
+
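
The `clean` target uses an `awk` range pattern to drop everything between the two marker comments in `source/_toctree.yml`. The following is a minimal sketch of the same filtering in Python. It is not part of this commit: `strip_generated_block` is a hypothetical helper name, and only the two marker strings are taken from the Makefile.

```python
# Illustrative sketch only: mirrors the awk range filter used by `make clean`.
# `strip_generated_block` is a hypothetical name, not a function from the commit.
START_MARKER = "# GENERATED CONTENT DO NOT EDIT!"
END_MARKER = "# END GENERATED CONTENT"


def strip_generated_block(text: str) -> str:
    """Drop every line from a start-marker line through the next end-marker line."""
    kept = []
    skipping = False
    for line in text.splitlines(keepends=True):
        if not skipping and line.startswith(START_MARKER):
            skipping = True  # the start marker itself is dropped
            continue
        if skipping:
            if line.startswith(END_MARKER):
                skipping = False  # the end marker is dropped too
            continue
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    sample = (
        "- title: Reference\n"
        "  isExpanded: false\n"
        "# GENERATED CONTENT DO NOT EDIT!\n"
        "- title: Examples\n"
        "# END GENERATED CONTENT\n"
    )
    print(strip_generated_block(sample), end="")
```

As with the `awk` range, an unterminated start marker causes everything up to the end of the file to be dropped, so the two markers should always be written as a pair.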

docs/sagemaker/examples/index.md

Lines changed: 0 additions & 19 deletions
This file was deleted.

docs/sagemaker/scripts/auto-generate-examples.py

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+import os
+import re
+
+BRANCH_NAME = os.getenv("SAGEMAKER_BRANCH", "main")
+
+
+def process_readme_files():
+    print(
+        "Processing Markdown files generated from the Jupyter Notebooks in docs/sagemaker/notebooks/sagemaker-sdk..."
+    )
+    os.makedirs("source/examples", exist_ok=True)
+
+    # NOTE: at the moment only `sagemaker-sdk` but left here to easily include new files from other sources
+    for dirname in {"sagemaker-sdk"}:
+        for root, _, files in os.walk(f"notebooks/{dirname}"):
+            for file in files:
+                if file == "sagemaker-notebook.md":
+                    process_file(root, file, dirname)
+
+
+def process_file(root, file, dirname):
+    parsed_dirname = (
+        dirname if "/" not in dirname else dirname.replace("/", "-")
+    )
+
+    file_path = os.path.join(root, file)
+    subdir = root.replace(f"notebooks/{dirname}/", "")
+    base = os.path.basename(subdir)
+
+    # NOTE: temporarily disabled
+    if file_path == f"examples/{dirname}/README.md":
+        target = f"source/examples/{parsed_dirname}-index.mdx"
+    else:
+        target = f"source/examples/{parsed_dirname}-{base}.mdx"
+
+    print(f"Processing {file_path} to {target}")
+    with open(file_path, "r") as f:
+        content = f.read()
+
+    # For Jupyter Notebooks, remove the comment markers i.e. `<!--` and `-->` but keep the metadata
+    content = re.sub(r"<!-- (.*?) -->", r"\1", content, flags=re.DOTALL)
+
+    # Replace image and link paths
+    content = re.sub(
+        r"\(\./(imgs|assets)/([^)]*\.png)\)",
+        rf"(https://raw.githubusercontent.com/huggingface/hub-docs/refs/heads/{BRANCH_NAME}/docs/sagemaker/"
+        + root
+        + r"/\1/\2)",
+        content,
+    )
+    content = re.sub(
+        r"\(\.\./([^)]+)\)",
+        rf"(https://github.com/huggingface/hub-docs/tree/{BRANCH_NAME}/docs/sagemaker/notebooks/"
+        + dirname
+        + r"/\1)",
+        content,
+    )
+    content = re.sub(
+        r"\(\.\/([^)]+)\)",
+        rf"(https://github.com/huggingface/hub-docs/tree/{BRANCH_NAME}/docs/sagemaker/"
+        + root
+        + r"/\1)",
+        content,
+    )
+
+    def replacement(match):
+        block_type = match.group(1)
+        content = match.group(2)
+
+        # Remove '> ' from the beginning of each line
+        lines = [line[2:] for line in content.split("\n") if line.strip()]
+
+        # Determine the Tip type
+        tip_type = " warning" if block_type == "WARNING" else ""
+
+        # Construct the new block
+        new_block = f"<Tip{tip_type}>\n\n"
+        new_block += "\n".join(lines)
+        new_block += "\n\n</Tip>\n"
+
+        return new_block
+
+    # Regular expression to match the specified blocks
+    pattern = r"> \[!(NOTE|WARNING)\]\n((?:>.*(?:\n|$))+)"
+
+    # Perform the transformation
+    content = re.sub(pattern, replacement, content, flags=re.MULTILINE)
+
+    # Remove any remaining '>' or '> ' at the beginning of lines
+    content = re.sub(r"^>[ ]?", "", content, flags=re.MULTILINE)
+
+    # Check for remaining relative paths
+    if re.search(r"\(\.\./|\(\./", content):
+        print("WARNING: Relative paths still exist in the processed file.")
+        print(
+            "The following lines contain relative paths, consider replacing those with GitHub URLs instead:"
+        )
+        for i, line in enumerate(content.split("\n"), 1):
+            if re.search(r"\(\.\./|\(\./", line):
+                print(f"{i}: {line}")
+    else:
+        print("No relative paths found in the processed file.")
+
+    # Calculate the example URL (`root` is relative to `docs/sagemaker`, where the script runs)
+    example_url = f"https://github.com/huggingface/hub-docs/tree/{BRANCH_NAME}/docs/sagemaker/{root}"
+    if "sagemaker-notebook" in file:
+        example_url += "/sagemaker-notebook.ipynb"
+
+    # Add the final note
+    content += f"\n\n---\n<Tip>\n\n📍 Find the complete example on GitHub [here]({example_url})!\n\n</Tip>"
+
+    with open(target, "w") as f:
+        f.write(content)
+
+
+if __name__ == "__main__":
+    process_readme_files()
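
Two of the transformations in this script are easier to follow on a small input: rewriting `./` relative links into absolute GitHub URLs, and turning GitHub-style `> [!NOTE]` / `> [!WARNING]` blockquotes into `<Tip>` blocks. The sketch below isolates them under a few assumptions. The function names `absolutize_links` and `convert_alerts` are hypothetical, the `demo` directory and sample text are made up, and the link rewrite uses a callable replacement rather than the concatenated replacement strings of the script. The regular expressions are the ones shown in the diff.

```python
# Illustrative sketch only: isolates two rewrites performed by the script above.
# `absolutize_links`, `convert_alerts`, and the sample data are hypothetical.
import re

ALERT_PATTERN = r"> \[!(NOTE|WARNING)\]\n((?:>.*(?:\n|$))+)"


def absolutize_links(markdown: str, root: str, branch: str = "main") -> str:
    """Rewrite `(./path)` link targets into absolute GitHub URLs under `root`."""
    base = f"https://github.com/huggingface/hub-docs/tree/{branch}/docs/sagemaker/{root}"
    # A callable replacement avoids backslash-escape handling in `root`.
    return re.sub(r"\(\./([^)]+)\)", lambda m: f"({base}/{m.group(1)})", markdown)


def _to_tip(match: re.Match) -> str:
    block_type, body = match.group(1), match.group(2)
    # Strip the leading "> " from every non-empty quoted line.
    lines = [line[2:] for line in body.split("\n") if line.strip()]
    tip_type = " warning" if block_type == "WARNING" else ""
    return f"<Tip{tip_type}>\n\n" + "\n".join(lines) + "\n\n</Tip>\n"


def convert_alerts(markdown: str) -> str:
    """Convert `> [!NOTE]` and `> [!WARNING]` blockquotes into `<Tip>` blocks."""
    return re.sub(ALERT_PATTERN, _to_tip, markdown, flags=re.MULTILINE)


if __name__ == "__main__":
    text = "> [!WARNING]\n> Check your quota first.\n\nSee [the script](./scripts/train.py).\n"
    text = absolutize_links(text, "notebooks/sagemaker-sdk/demo")
    print(convert_alerts(text))
```

Note that the alert pattern expects a space after each `>`, so a bare `>` line inside an alert would lose its first two characters' worth of content when sliced with `line[2:]`.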

docs/sagemaker/scripts/auto-update-toctree.py

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+import glob
+import os
+from pathlib import Path
+
+
+def update_toctree_yaml():
+    output_file = "source/_toctree.yml"
+    dirnames = ["sagemaker-sdk"]
+
+    with open(output_file, "a") as f:
+        f.write("# GENERATED CONTENT DO NOT EDIT!\n")
+        f.write("- title: Examples\n")
+        f.write("  sections:\n")
+
+        for dirname in dirnames:
+            # Get sorted files excluding index
+            files = sorted(glob.glob(f"source/examples/{dirname}-*.mdx"))
+            files = [f for f in files if not f.endswith(f"{dirname}-index.mdx")]
+
+            file_entries = []
+            for file_path in files:
+                with open(file_path, "r") as mdx_file:
+                    first_line = mdx_file.readline().strip()
+                    if first_line.startswith("# "):
+                        title = first_line[2:].strip()
+                        base_name = Path(file_path).stem
+                        file_entries.append((base_name, title))
+                    else:
+                        print(f"⚠️ Skipping {Path(file_path).name} - missing H1 title")
+                        continue
+
+            # Write directory section
+            f.write("    - title: SageMaker SDK\n")
+            # f.write(f"      local: examples/{dirname}-index\n")
+            f.write("      isExpanded: true\n")
+
+            for idx, (base, title) in enumerate(file_entries):
+                if idx == 0:
+                    f.write("      sections:\n")
+                f.write(f"        - local: examples/{base}\n")
+                f.write(f'          title: "{title}"\n')
+
+        f.write("# END GENERATED CONTENT\n")
+
+
+if __name__ == "__main__":
+    update_toctree_yaml()
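
The script writes each title between literal double quotes, which would yield invalid YAML if a notebook title itself contained a `"` or a backslash. The sketch below shows one possible way to harden that step; it is an assumption layered on top of the commit, not something the commit does. `extract_h1_title` and `toctree_entry` are hypothetical helper names, the indentation widths are illustrative, and `json.dumps` is used because a JSON string is also a valid YAML double-quoted scalar.

```python
# Illustrative sketch only: title extraction plus a quoting-safe toctree entry.
# `extract_h1_title` and `toctree_entry` are hypothetical helpers, not from the commit.
import json
from typing import Optional


def extract_h1_title(text: str) -> Optional[str]:
    """Return the H1 title if the first line starts with '# ', else None."""
    lines = text.splitlines()
    first_line = lines[0].strip() if lines else ""
    if first_line.startswith("# "):
        return first_line[2:].strip()
    return None


def toctree_entry(base_name: str, title: str) -> str:
    """Render one `local`/`title` pair, escaping quotes and backslashes in the title."""
    quoted = json.dumps(title, ensure_ascii=False)
    return (
        f"        - local: examples/{base_name}\n"
        f"          title: {quoted}\n"
    )


if __name__ == "__main__":
    mdx = '# Deploy "my" model on Amazon SageMaker\n\nSome body text.\n'
    title = extract_h1_title(mdx)
    if title is not None:
        print(toctree_entry("sagemaker-sdk-demo", title), end="")
```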

docs/sagemaker/source/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+examples/

docs/sagemaker/_toctree.yml renamed to docs/sagemaker/source/_toctree.yml

Lines changed: 16 additions & 6 deletions
@@ -43,13 +43,23 @@
     isExpanded: false
   title: Tutorials
   isExpanded: false
-- sections:
-  - local: examples/index
-    title: Introduction
-  title: Examples
-  isExpanded: false
 - sections:
   - local: reference/inference-toolkit
     title: Inference Toolkit API
   title: Reference
-  isExpanded: false
+  isExpanded: false
+# GENERATED CONTENT DO NOT EDIT!
+- title: Examples
+  sections:
+    - title: SageMaker SDK
+      isExpanded: true
+      sections:
+        - local: examples/sagemaker-sdk-deploy-embedding-models
+          title: "How to deploy Embedding Models to Amazon SageMaker using new Hugging Face Embedding DLC"
+        - local: examples/sagemaker-sdk-deploy-llama-3-3-70b-inferentia2
+          title: "Deploy Llama 3.3 70B on AWS Inferentia2"
+        - local: examples/sagemaker-sdk-evaluate-llm-lighteval
+          title: "Evaluate LLMs with Hugging Face Lighteval on Amazon SageMaker"
+        - local: examples/sagemaker-sdk-fine-tune-embedding-models
+          title: "Fine-tune and deploy embedding models with Amazon SageMaker"
+# END GENERATED CONTENT
