This PR adds file status information to the tutorial statistics tracking:
Added (A), Copied (C), Deleted (D), Modified (M), or Renamed (R). The
change enhances the `get_tutorials_stats.py` script to capture Git
status information using the `--name-status` flag and store it in the
`filenames.csv` output.
Changes:
* Added a `status` field to the `FileInfo` class
* Modified the file parsing logic to extract status information from Git
output
* The status is now included in the `filenames.csv` output for each file
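For reviewers unfamiliar with the format, here is a minimal sketch of how tab-separated `--name-status` lines can be parsed. It is not the code from this PR: `FileStatus` and `parse_name_status` are hypothetical names, and the real `FileInfo` class may be shaped differently. The sketch assumes Git's default behavior of appending a similarity score to renames and copies (for example `R100`), which it reduces to the leading letter.

```python
from dataclasses import dataclass


@dataclass
class FileStatus:
    """Hypothetical record; the PR's FileInfo class may look different."""

    filename: str
    status: str  # single letter: A, C, D, M, or R


def parse_name_status(output: str) -> list[FileStatus]:
    """Parse the tab-separated lines printed by `git ... --name-status`."""
    entries = []
    for line in output.splitlines():
        parts = line.split("\t")
        # Skip blank lines and anything that is not "<status>\t<path>...".
        if len(parts) < 2 or not parts[0]:
            continue
        # Renames and copies carry a similarity score ("R100", "C75");
        # keep only the leading status letter.
        status = parts[0][0]
        # For renames and copies the last field is the destination path.
        entries.append(FileStatus(filename=parts[-1], status=status))
    return entries


sample = "M\ttutorials/index.rst\nR100\told_name.py\tnew_name.py\nD\tremoved.rst\n"
for entry in parse_name_status(sample):
    print(entry.status, entry.filename)
```

Keeping only the first character means a renamed file is recorded as `R` under its new path; whether the old path should also be stored is a design choice this sketch leaves open.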
A local test run generates this `filenames.csv`:
| commit_id | date | filename | lines_added | lines_deleted | status |
|-----------|------|----------|-------------|---------------|--------|
| 7c45ceb313 | 2025-07-11 | tutorials/index.rst | 17 | 171 | M |
| f4dda5dca1 | 2025-07-10 | tutorials/conf.py | 4 | 2 | M |
| 3569db3e78 | 2025-07-10 | tutorials/conf.py | 0 | 1 | M |
...
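As a quick sanity check on the new column, the rows can be tallied per status letter. The sketch below is only an illustration: `count_by_status` is a hypothetical helper, and the inline sample reuses the three rows shown in the table rather than a real file.

```python
import csv
import io
from collections import Counter


def count_by_status(csv_file) -> Counter:
    """Count rows per status letter in a filenames.csv-style file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row["status"] for row in reader)


# Inline sample with the columns from the table; for a real run, pass
# open("filenames.csv", newline="") instead.
sample = io.StringIO(
    "commit_id,date,filename,lines_added,lines_deleted,status\n"
    "7c45ceb313,2025-07-11,tutorials/index.rst,17,171,M\n"
    "f4dda5dca1,2025-07-10,tutorials/conf.py,4,2,M\n"
    "3569db3e78,2025-07-10,tutorials/conf.py,0,1,M\n"
)
print(count_by_status(sample))
```

Because `csv.DictReader` looks columns up by header name, the check does not depend on the column order in the file.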
How to test:
1. Comment out the part that uploads to S3
2. Add a function that saves to a local file:
```
# Needed only if the script does not already import them.
import csv
from typing import Any


def save_to_local_file(filename: str, docs: list[dict[str, Any]]) -> None:
    print(f"Writing {len(docs)} documents to {filename}")
    with open(filename, "w", newline="") as f:
        # Column order follows the sorted keys of the first document.
        writer = csv.DictWriter(f, fieldnames=sorted(docs[0].keys()))
        writer.writeheader()
        writer.writerows(docs)
    print(f"Done writing to {filename}")
```
3. Add this to main:
```
save_to_local_file("metadata.csv", history_log)
save_to_local_file("filenames.csv", filenames)
```
4. Make sure `tutorials_dir` and `pytorch_doc_dir` point to the correct
locations of the tutorials and pytorch repos.