Closes #221: Feature request - Provide CSV files #228
|
I wonder if we should use DatasetJSON instead of CSV. Then we wouldn't lose the labels. And if we want to do the same in pharmaverseadam, it would be easier to handle date, datetime, and time variables. What do you think? |
|
This is actually a very nice idea, you are totally right! |
|
@hski-github, what do you think about the proposal to use Dataset-JSON instead? Would this still suit your needs? |
|
There is pandas.read_json(URL), but the JSON needs to follow a certain structure, for example 'split', 'records' or 'index'. See https://pandas.pydata.org/docs/reference/api/pandas.read_json.html However, I don't see how metadata could be preserved or used directly from this JSON. pandas.read_json has a dtype parameter for defining data types, but I guess this would require that information in a separate file. |
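To make the metadata concern concrete, here is a minimal standard-library sketch. The payload layout (a `columns` list carrying `name`, `label` and `dataType`, plus a `rows` array) is a simplified assumption loosely modelled on Dataset-JSON, not the exact schema; the values are made up and the helper name `load_with_labels` is hypothetical. It suggests that labels and types can travel in the same file, but a generic reader has to be told where to find them.

```python
import json
from datetime import date

# Simplified, hypothetical payload with made-up values.
# This is NOT the exact Dataset-JSON schema.
PAYLOAD = """
{
  "name": "DM",
  "columns": [
    {"name": "USUBJID", "label": "Unique Subject Identifier", "dataType": "string"},
    {"name": "AGE", "label": "Age", "dataType": "integer"},
    {"name": "RFSTDTC", "label": "Subject Reference Start Date", "dataType": "date"}
  ],
  "rows": [
    ["SUBJ-001", 63, "2014-01-02"],
    ["SUBJ-002", 64, "2012-08-05"]
  ]
}
"""


def load_with_labels(text):
    """Return (records, labels), converting columns declared as dates."""
    doc = json.loads(text)
    names = [col["name"] for col in doc["columns"]]
    labels = {col["name"]: col["label"] for col in doc["columns"]}
    date_cols = {col["name"] for col in doc["columns"] if col["dataType"] == "date"}
    records = []
    for row in doc["rows"]:
        record = dict(zip(names, row))
        # Use the embedded metadata to type the date columns.
        for name in date_cols:
            record[name] = date.fromisoformat(record[name])
        records.append(record)
    return records, labels


records, labels = load_with_labels(PAYLOAD)
```

Here `labels` keeps the variable labels next to the data, which is the part a plain `orient`-style JSON or a CSV would drop.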
It seems that there is not much support for python at the moment. The CDISC pilot focused more on SAS and R (see https://www.cdisc.org/sites/default/files/2023-10/2023-cdisc-dataset-json-plenary-v5_0.pdf). I don't think that you can use |
|
Okay, agreed. Then please go back to the original proposal to create CSV, because that would work nicely with Python's pandas.read_csv, and then the pharmaverse datasets can be used in Python. |
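A minimal sketch of the consumer side, using only the standard library so that it runs without pandas (with pandas, `pandas.read_csv` would play the same role). The CSV text is inlined, made-up data and the column selection is illustrative; `read_dataset` is a hypothetical helper. It shows the trade-off discussed above: CSV is easy to load, but every value arrives as a string, labels are gone, and dates need explicit parsing.

```python
import csv
import io
from datetime import date

# Made-up sample standing in for one exported dataset; values are illustrative.
CSV_TEXT = """USUBJID,AGE,RFSTDTC
SUBJ-001,63,2014-01-02
SUBJ-002,64,
"""


def read_dataset(text):
    """Parse CSV text into dicts, typing AGE and RFSTDTC explicitly."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "USUBJID": raw["USUBJID"],
            # CSV carries no type information, so convert by hand.
            "AGE": int(raw["AGE"]),
            # Empty cells become None rather than an invalid date.
            "RFSTDTC": date.fromisoformat(raw["RFSTDTC"]) if raw["RFSTDTC"] else None,
        })
    return rows


rows = read_dataset(CSV_TEXT)
```

The explicit conversions are the information that has to live outside the CSV file, whether in code like this or in arguments such as `dtype` and `parse_dates` when using pandas.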
|
Yep, right, we can provide CSV then. But I think it would still be worth creating another issue to provide Dataset-JSON files as well. What do you think, @manciniedoardo @bundfussr? In the meantime, I will create a package to allow the Python community to deal with
Yes, I think it makes sense to create CSVs as a temporary solution. Once the python package is available we can replace the CSV files with dataset-JSON files and then also provide dataset-JSON files in pharmaverseadam. |
bundfussr
left a comment
Could you add an item to the changelog?
I also think we should mention and link the CSV files somewhere on the webpage. @Lina2689, @manciniedoardo, @Fanny-Gautier, any ideas?
Yes, somewhere near the top of the README? Maybe the data sources section could be renamed to "data", and then you could have subsections for data sources and data formats.
I think the "How to update" section should also mention that CSV versions of the datasets are saved - what do you think, @Lina2689? |
Yeah, linking the CSV files on the webpage would be super helpful! We could add the link near the top, as suggested by @manciniedoardo. And mentioning the CSV versions in the 'How to update' section is definitely useful for users who prefer that format. |
manciniedoardo
left a comment
Looking good, just left some comments and will leave @Lina2689 to do the final review/approval - thanks
@@ -1,5 +1,7 @@
# pharmaversesdtm <img src="man/figures/logo.png" align="right" width="200" style="margin-left:50px;" alt="pharmaverse sdtm hex"/>

> <sup>Interactive data exploration: <a href="https://pharmaverse.github.io/pharmaversesdtm/articles/preview-sdtm.html">Preview SDTM vignette</a></sup>
Good catch, that is my bad. I'll remove it later.
Co-authored-by: Edoardo Mancini <53403957+manciniedoardo@users.noreply.github.com>
Fanny-Gautier
left a comment
Minor typo to correct in README. Thank you for the implementation!
Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>
Co-authored-by: Stefan Bundfuss <80953585+bundfussr@users.noreply.github.com>
NEWS.md
Outdated
## Documentation

- Included CSV versions of all SDTM data under `extdata/sdtm-csv/` for ease of use of non R programmers. (#221)
Please update the folder path: you are saving the CSV files under inst/extdata, but here it is mentioned as extdata/sdtm-csv/.
Co-authored-by: Lina Patil <157117024+Lina2689@users.noreply.github.com>
remote-tracking branch 'origin/main' into 221-feature-request-provide-csv-files

Implementation description
This pull request introduces the automated export of SDTM datasets to CSV format and adds two example CSV files to the repository. The changes improve reproducibility and make example datasets more accessible for external use or testing.
- The data generation script (`data-raw/create_sdtms_data.R`) now saves a CSV version of each dataset to the `inst/extdata/` directory, making the datasets directly available for non-R programmers.
- The `.Rbuildignore` file is updated to exclude all CSV files in `inst/extdata/` from the R package build, ensuring these files are not included in the built package by default for CRAN submissions.

- Run `styler::style_file()` to style R and Rmd files
- Run `devtools::document()` so all `.Rd` files in the `man` folder and the `NAMESPACE` file in the project root are updated appropriately
- Update `NEWS.md` if the changes pertain to a user-facing function (i.e. it has an `@export` tag) or documentation aimed at users (rather than developers)
- Run `pkgdown::build_site()` and check that all affected examples are displayed correctly and that all new functions occur on the "Reference" page.
- Run `lintr::lint_package()`
- Run `R CMD check` locally and address all errors and warnings - `devtools::check()`