Closes #221: Feature request - Provide CSV files by Gero1999 · Pull Request #228 · pharmaverse/pharmaversesdtm

Gero1999 · 2026-01-31T06:55:16Z

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the admiral codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the GitHub Actions workflows and the Pull Request will not be merged into the devel branch until you have checked off each task.

Implementation description

This pull request introduces the automated export of SDTM datasets to CSV format and adds two example CSV files to the repository. The changes improve reproducibility and make example datasets more accessible for external use or testing.

- The data generation script (data-raw/create_sdtms_data.R) now saves a CSV version of each dataset to the inst/extdata/ directory, making the datasets directly available to non-R programmers.
- The .Rbuildignore file is updated to exclude all CSV files in inst/extdata/ from the R package build, so that these files are not included in the built package by default for CRAN submissions.
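
As a rough illustration of the .Rbuildignore change: entries in that file are Perl-like regular expressions that R matches case-insensitively against paths relative to the package root. The Python sketch below mimics that matching with a hypothetical pattern, `^inst/extdata/.*\.csv$`. The actual entry added in this PR is not shown in this thread, and the file names are made up for the example.

```python
import re

# Hypothetical .Rbuildignore entry; the real pattern in the PR may differ.
IGNORE_PATTERN = r"^inst/extdata/.*\.csv$"


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if any pattern matches the package-relative path.

    R matches .Rbuildignore entries case-insensitively, so IGNORECASE is used.
    """
    return any(re.search(p, path, flags=re.IGNORECASE) for p in patterns)


# Illustrative paths only; they are not taken from the repository.
paths = ["inst/extdata/dm.csv", "inst/extdata/README.md", "data/dm.rda"]
kept = [p for p in paths if not is_ignored(p, [IGNORE_PATTERN])]
print(kept)
```

With a pattern like this, only the CSV files under inst/extdata/ would be left out of the build; other files in the same folder would still be packaged.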

bundfussr · 2026-02-03T14:19:47Z

I wonder if we should use DatasetJSON instead of CSV. Then we wouldn't lose the labels. And if we want to do the same in pharmaverseadam, it would be easier to handle date, datetime, and time variables.

What do you think?

Gero1999 · 2026-02-03T19:01:04Z

This is actually a very nice idea, you are totally right!

manciniedoardo · 2026-02-04T13:20:05Z

@hski-github What do you think about the proposal to use Dataset-JSON instead? Would this still suit your needs?

hski-bayer · 2026-02-04T14:29:34Z

There is pandas.read_json(URL), but the JSON needs to follow one of certain structures, such as 'split', 'records', or 'index'.

See https://pandas.pydata.org/docs/reference/api/pandas.read_json.html

But I don't see how metadata could be preserved or used directly from this JSON. pandas.read_json has a dtype parameter for defining data types, but I guess this would require that information in a separate file.

bundfussr · 2026-02-04T17:11:12Z

It seems that there is not much support for Python at the moment. The CDISC pilot focused more on SAS and R (see https://www.cdisc.org/sites/default/files/2023-10/2023-cdisc-dataset-json-plenary-v5_0.pdf).

I don't think that you can use pandas.read_json(URL) directly with a Dataset-JSON file. The structure is similar to orient = 'split' but doesn't match exactly. You would need a Python module that provides functionality similar to the datasetjson R package. As a workaround, you could use dataset-json to convert the files into XPT and then read them in Python.
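
To make the mismatch with orient = 'split' concrete, here is a minimal sketch in plain Python. It assumes a Dataset-JSON v1.1-style layout in which `columns` is a list of objects (each with at least `name` and `label`) and `rows` is a list of value lists. The sample content is invented, and real files carry more attributes. The 'split' orient in pandas instead expects `columns` to be a flat list of names and the values under `data`, so a small reshaping step is needed.

```python
import json

# Invented sample in an assumed Dataset-JSON v1.1-style layout (not a real file).
SAMPLE = """
{
  "name": "DM",
  "label": "Demographics",
  "columns": [
    {"name": "USUBJID", "label": "Unique Subject Identifier", "dataType": "string"},
    {"name": "AGE", "label": "Age", "dataType": "integer"}
  ],
  "rows": [["01-701-1015", 63], ["01-701-1023", 64]]
}
"""


def to_split_orient(dataset: dict) -> tuple[dict, dict]:
    """Reshape into the pandas 'split' layout and return the labels separately."""
    names = [col["name"] for col in dataset["columns"]]
    labels = {col["name"]: col.get("label") for col in dataset["columns"]}
    split = {"columns": names, "data": dataset["rows"]}
    return split, labels


split, labels = to_split_orient(json.loads(SAMPLE))
print(split["columns"])
print(labels["AGE"])
```

The `split` dictionary could then be handed to pandas, for example as `pandas.DataFrame(split["data"], columns=split["columns"])`. The labels would still need to be kept alongside the data frame, since they are not part of the 'split' structure.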

hski-bayer · 2026-02-04T19:49:54Z

Okay, agreed. Then please go back to the original proposal to create CSV files, because that would work nicely with pandas.read_csv, and then the pharmaverse datasets can be used in Python.
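
A small sketch of what this looks like from the Python side, and of the trade-off raised earlier in the thread. With pandas the call would simply be pandas.read_csv on the file path or URL. To stay dependency-free, the example below uses the standard-library csv module on an invented two-row extract. The column names are typical SDTM DM variables, but the values are made up. Every value arrives as a string and no variable labels are available, so types and dates have to be converted explicitly.

```python
import csv
import io
from datetime import date

# Invented CSV content standing in for an exported dataset file.
CSV_TEXT = """USUBJID,AGE,RFSTDTC
01-701-1015,63,2014-01-02
01-701-1023,64,2012-08-05
"""


def read_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of dicts; all values are plain strings."""
    return list(csv.DictReader(io.StringIO(text)))


def convert(row: dict) -> dict:
    """Apply the type information that the CSV itself does not carry."""
    return {
        "USUBJID": row["USUBJID"],
        "AGE": int(row["AGE"]),
        "RFSTDTC": date.fromisoformat(row["RFSTDTC"]),
    }


raw = read_rows(CSV_TEXT)
typed = [convert(r) for r in raw]
print(type(raw[0]["AGE"]).__name__, type(typed[0]["AGE"]).__name__)
```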

Gero1999 · 2026-02-04T20:28:07Z

Yep, right, we can provide CSV then. But I think it would still be worthwhile to create another issue to provide Dataset-JSON files as well. What do you think @manciniedoardo @bundfussr?

In the meantime, I will create a package to allow the Python community to deal with Dataset-JSON as well as any other bio/pharma JSON standard. It might take me a while to make something solid and publish it, but I think it will be worthwhile.

bundfussr · 2026-02-05T08:28:49Z

Yes, I think it makes sense to create CSVs as a temporary solution. Once the Python package is available we can replace the CSV files with Dataset-JSON files and then also provide Dataset-JSON files in pharmaverseadam.

bundfussr

Could you add an item to the changelog?

I also think we should mention and link the CSV files somewhere on the webpage. @Lina2689 , @manciniedoardo , @Fanny-Gautier , any ideas?

manciniedoardo · 2026-02-06T12:51:28Z

Yes, somewhere near the top of the README? Maybe the data sources section could be renamed to "data", and then you could have subsections for data sources and data formats.

I think the "How to update" section should also mention that CSV versions of the datasets are saved. What do you think, @Lina2689?

Lina2689 · 2026-02-17T12:31:44Z

Yeah, linking the CSV files on the webpage would be super helpful! We could add the link near the top, as suggested by @manciniedoardo. And, mentioning the point for CSV versions in the 'How to update' section is definitely useful for users who prefer that format.

manciniedoardo

Looking good, just left some comments and will leave @Lina2689 to do the final review/approval - thanks

NEWS.md

manciniedoardo · 2026-02-18T14:18:42Z

README.md

@@ -1,5 +1,7 @@
 # pharmaversesdtm <img src="man/figures/logo.png" align="right" width="200" style="margin-left:50px;" alt="pharmaverse sdtm hex"/>

+> <sup>Interactive data exploration: <a href="https://pharmaverse.github.io/pharmaversesdtm/articles/preview-sdtm.html">Preview SDTM vignette</a></sup>


Should this be part of this PR? @Lina2689

Good catch, that is my bad. I'll remove it later.

Co-authored-by: Edoardo Mancini <53403957+manciniedoardo@users.noreply.github.com>

Fanny-Gautier

Minor typo to correct in README. Thank you for the implementation!

README.md

Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>

README.md

Co-authored-by: Stefan Bundfuss <80953585+bundfussr@users.noreply.github.com>

README.md

Lina2689 · 2026-02-24T12:26:38Z

NEWS.md

+
+## Documentation
+
+- Included CSV versions of all SDTM data under `extdata/sdtm-csv/` for ease of use of non R programmers. (#221)


Please update the folder path: you are saving the CSV files under inst/extdata, but here it is given as extdata/sdtm-csv/.

Co-authored-by: Lina Patil <157117024+Lina2689@users.noreply.github.com>

remote-tracking branch 'origin/main' into 221-feature-request-provide-csv-files

Gero1999 added 2 commits January 31, 2026 07:42

add CSV generation in create_sdtms_data.R (inst/extdata folder)

9c12f0c

add in .Rbuildignore CSV files

92ee61f

Gero1999 linked an issue Jan 31, 2026 that may be closed by this pull request

Feature Request: Provide CSV files #221

Open

Gero1999 marked this pull request as ready for review January 31, 2026 07:44

Gero1999 marked this pull request as draft February 3, 2026 19:01

Gero1999 marked this pull request as ready for review February 4, 2026 20:28

Gero1999 requested review from bundfussr and manciniedoardo February 4, 2026 20:28

bundfussr reviewed Feb 5, 2026

bump pkg dev version & add news in documentation

52375b3

manciniedoardo requested a review from Lina2689 February 6, 2026 12:50

readme: add CSV info for data & how-to-update sections

dec79f1

Gero1999 requested review from Lina2689, bundfussr and manciniedoardo and removed request for Lina2689 and manciniedoardo February 18, 2026 13:30

manciniedoardo reviewed Feb 18, 2026

Update NEWS.md

d3d2aa6

Co-authored-by: Edoardo Mancini <53403957+manciniedoardo@users.noreply.github.com>

Fanny-Gautier approved these changes Feb 18, 2026

Update README.md

d9cd254

Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>

bundfussr reviewed Feb 18, 2026

Gero1999 and others added 3 commits February 18, 2026 21:10

readme: change links to https paths

4e1dbc0

Co-authored-by: Stefan Bundfuss <80953585+bundfussr@users.noreply.github.com>

rm interactive data exploration (change from different branch)

f1bc35e

spelling: update wordlist

ea90f97

Gero1999 requested review from bundfussr and manciniedoardo February 18, 2026 20:27

Lina2689 requested changes Feb 24, 2026

Gero1999 and others added 4 commits February 28, 2026 00:27

Apply suggestions from code review Lina2689

88f53a7

Co-authored-by: Lina Patil <157117024+Lina2689@users.noreply.github.com>

change NEWS.md: extdata/sdtm-csv/ > inst/extdata

f2b7cc6

Merge accept incoming or both changes

a4b1c42

remote-tracking branch 'origin/main' into 221-feature-request-provide-csv-files

bump version to 1.4.0.9002

e1cef5c

Gero1999 requested a review from Lina2689 February 27, 2026 23:38

6 participants
