docs: Added starter dev notes on push to hugging face hub #355
Greptile Summary

This PR adds a new developer notes post covering the The post is well-structured and covers the full feature surface — auth resolution, what gets uploaded, processor first-class treatment, auto-generated dataset cards, and round-trip reproducibility via

| Filename | Overview |
|---|---|
| docs/devnotes/posts/push-datasets-to-hugging-face-hub.md | New blog post documenting push_to_hub feature; single marker placed correctly after intro, template path verified as full path, content is accurate and well-structured |
| docs/devnotes/.authors.yml | Added two new blog authors (nmulepati, davanstrien) with correct names, descriptions, and GitHub avatar URLs |
| mkdocs.yml | Added nav entry for the new push-to-hub post in the correct position (most recent first) |
| plans/479/skip-when-conditional-generation.md | Empty file unrelated to this PR, pulled in from main via a merge commit — should be excluded from this changeset |
Sequence Diagram
sequenceDiagram
participant User
participant DataDesigner
participant Results
participant HFHubClient
participant HuggingFaceHub
User->>DataDesigner: create(config_builder, num_records)
DataDesigner-->>Results: results object (parquet + processor files)
alt Happy path
User->>Results: push_to_hub(repo_id, description, tags)
Results->>HFHubClient: push_to_hub_from_folder(dataset_path, repo_id, ...)
else Saved artifacts path
User->>HFHubClient: push_to_hub_from_folder(dataset_path, repo_id, ...)
end
HFHubClient->>HFHubClient: Resolve token (explicit → HF_TOKEN → cached creds)
HFHubClient->>HuggingFaceHub: Upload README.md (dataset card)
HFHubClient->>HuggingFaceHub: Upload data/*.parquet
HFHubClient->>HuggingFaceHub: Upload images/* (if present)
HFHubClient->>HuggingFaceHub: Upload {processor}/* per processor
HFHubClient->>HuggingFaceHub: Upload builder_config.json
HFHubClient->>HuggingFaceHub: Upload metadata.json (paths rewritten)
HuggingFaceHub-->>User: dataset URL
Note over User,HuggingFaceHub: Round-trip: load builder_config.json URL → from_config() → recreate pipeline
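
The token-resolution step in the diagram (explicit → HF_TOKEN → cached creds) can be illustrated with a small standard-library sketch. This is not Data Designer's or `huggingface_hub`'s actual implementation: the function name `resolve_token`, its parameters, and the idea of passing the cached-token file path in explicitly are all assumptions made for illustration. Only the precedence order comes from the diagram.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_token(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cached_token_file: Optional[Path] = None,
) -> Optional[str]:
    """Hypothetical helper mirroring the order shown in the diagram.

    Precedence: explicit argument, then the HF_TOKEN environment variable,
    then a token previously cached on disk. Returns None if nothing is found.
    """
    # Allow the environment to be injected so the function is easy to test.
    env = os.environ if env is None else env

    # 1. An explicitly passed token always wins.
    if explicit:
        return explicit

    # 2. Fall back to the HF_TOKEN environment variable.
    env_token = env.get("HF_TOKEN")
    if env_token:
        return env_token

    # 3. Finally, try a cached credentials file (its location is an assumption;
    #    the caller supplies it here rather than the helper guessing a path).
    if cached_token_file is not None and cached_token_file.is_file():
        cached = cached_token_file.read_text(encoding="utf-8").strip()
        return cached or None

    return None
```

Injecting `env` and `cached_token_file` keeps the sketch free of global state, so each branch of the fallback chain can be exercised in isolation.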
Path: plans/479/skip-when-conditional-generation.md
Line: 1
Comment:
**Unrelated empty file included via merge**
This file is completely empty and unrelated to the push-to-hub docs PR. It appears to have been pulled in through one of the `Merge branch 'main' into ...` commits while syncing the branch. It belongs to a separate planning effort (`plans/479/`) and should not be part of this changeset.
Consider removing it before merging to keep the PR diff focused.
Reviews (9): Last reviewed commit: "Merge branch 'main' into nmulepati/docs/..."
dhruvnathawani left a comment
Did you use AI for the images?
LGTM
Move the single `<!-- more -->` to after the intro paragraph for a shorter blog teaser and remove the 6 redundant markers throughout the post.
@dhruvnathawani, yes!
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* docs: add HF ecosystem context to push-to-hub dev notes

  Add section on what datasets get on the Hub (Dataset Viewer, streaming, Viewer API), link to Hub search for DataDesigner datasets, and note that private datasets can be flipped to public.

* Update docs/devnotes/posts/push-datasets-to-hugging-face-hub.md

  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: remove doubled library: prefix in Hub search URL

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Adds a dev note post covering the `push_to_hub` feature of Data Designer.