Dataset viewer workflow refactor and bug fix #270
Dataset viewer workflow refactor and bug fix
See merge request product/starhub/starhub-server!873
Linter Issue Report
During the code review, a list of issues was found. These issues could affect code quality, maintainability, and consistency. Below is the detailed linter issue report:
File: dataviewer/workflows/activity.go
Lint Issue: undefined: appendFile
Please make the suggested changes to improve the code quality.
Review Comments And Suggestions:
MR Evaluation: This feature is still under test; evaluations are given by AI and might be inaccurate. After evaluation, the code changes in the Merge Request received a score of 100.
Tips
CodeReview Commands (invoked as MR or PR comments)
CodeReview Discussion Chat
There are 2 ways to chat with Starship CodeReview:
Note: Be mindful of the bot's finite context window.
CodeReview Documentation and Community
About Us: Visit the OpenCSG StarShip website for the Dashboard and detailed information on CodeReview, CodeGen, and other StarShip modules.
What this PR includes:
Enabling Tracing for Temporal Workflow

Refactor `ScanRepoFiles` in Dataset Viewer Workflows
The `ScanRepoFiles` function is triggered every time there is a git push to a dataset repository. When the repository contains a large number of files, the function has a significant impact on performance and generates massive logs in Gitaly. Therefore, it has been modified to use the new Gitaly Tree API to retrieve files. The optimization effect is similar to that of the tree API optimization PR: when there are many files, the number of Gitaly requests drops from 100+ to 1, and the time taken drops from seconds/minutes to microseconds.

Fix README Dataset Config Parsing
Hugging Face dataset cards allow the definition of dataset configurations in the README using YAML:
In this case, the `path` uses a wildcard format, and through manual testing, the `**` syntax is also supported, for example, `data/**/*.csv`.
Issue: The current code uses regex for matching instead of the wildcard format, which is completely incompatible with the Hugging Face syntax.
Fix: The code has been updated to use doublestar for matching (since Go's built-in `filepath` does not support the `**` syntax). Relevant tests have also been added.
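
To make the mismatch described above concrete, here is a minimal sketch that uses only Go's standard library. The helper names `matchAsRegex` and `matchAsGlob` and the sample file paths are illustrative assumptions, not code from this PR. It shows what happens when a glob pattern is compiled as a regular expression, and how `filepath.Match` treats `**` as an ordinary single-level `*`. The doublestar library is deliberately not imported so that the example stays self-contained, and the results assume `/` as the path separator (Linux or macOS).

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// matchAsRegex (hypothetical helper) compiles a glob-style pattern as a
// regular expression, which is the kind of mismatch the issue describes.
func matchAsRegex(pattern, name string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(name), nil
}

// matchAsGlob (hypothetical helper) uses the standard library glob matcher,
// where "*" never crosses a path separator and "**" behaves like "*".
func matchAsGlob(pattern, name string) (bool, error) {
	return filepath.Match(pattern, name)
}

func main() {
	// As a regex, "/*" means "zero or more slashes" and "." means "any
	// character", so the intended file is not matched.
	ok, err := matchAsRegex("data/*.csv", "data/train.csv")
	fmt.Println(ok, err) // false <nil>

	// The same regex accepts a name that the glob was never meant to match.
	ok, err = matchAsRegex("data/*.csv", "dataXcsv")
	fmt.Println(ok, err) // true <nil>

	// "**" is not even valid regex syntax in Go (nested repetition).
	_, err = matchAsRegex("data/**/*.csv", "data/a/b/train.csv")
	fmt.Println(err != nil) // true

	// filepath.Match handles the single-level wildcard correctly...
	ok, err = matchAsGlob("data/*.csv", "data/train.csv")
	fmt.Println(ok, err) // true <nil>

	// ...but "**" only spans one directory level, so nested files are missed.
	ok, err = matchAsGlob("data/**/*.csv", "data/a/b/train.csv")
	fmt.Println(ok, err) // false <nil>
}
```

Both standard-library options fall short of the recursive `**` semantics, which is consistent with the PR's choice of a dedicated glob matcher.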