Copilot AI (Contributor) commented Dec 17, 2025

The FDA requested that programs use datasetjson as the source format, without converting to RDS, and that RDS usage be eliminated entirely from the workflow.

Changes

ADaM programs (adsl.r, adae.r, adadas.r, adlbc.r, adtte.r)

  • Read SDTM data directly from JSON using datasetjson::read_dataset_json()
  • Write output directly to JSON files (RDS files are no longer created)
  • Read dependent ADaM datasets from their JSON outputs
  • Use the new helper function write_dataset_json_with_metadata() for JSON output
  • Added library(datasetjson) calls
  • Updated program headers to reflect .json output

TLF programs (tlf-demographic.r, tlf-efficacy.r, tlf-kmplot.r, tlf-primary.r)

  • Read ADaM data from JSON output files

Documentation (adrg/_adrg.qmd)

  • Removed JSON-to-RDS conversion step from workflow instructions
  • Updated process descriptions to reflect that ADaM programs write JSON directly
  • Updated output descriptions to reference datasetjson files

Helpers (pilot5-helper-fcns.r, run-all-adams-tlfs.r)

  • Created write_dataset_json_with_metadata(): extracted the JSON-writing logic from convert_rds_to_json.r into a reusable function that takes the dataset data frame, the metacore spec, the dataset name, and the output path. This eliminates code duplication across the ADaM programs.
  • Deprecated the convert_json_to_rds() function, which now issues a deprecation warning
  • Updated run-all script header to reflect JSON outputs
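The deprecation can follow base R's .Deprecated() pattern. A minimal sketch under that assumption is below; the actual body of convert_json_to_rds() is not shown in this PR, so the stub here is hypothetical and only emits the warning.

```r
# Hypothetical stub: the real convert_json_to_rds() in pilot5-helper-fcns.r
# is not reproduced in this PR; this only illustrates the warning pattern.
convert_json_to_rds <- function(...) {
  # .Deprecated() signals a warning of class "deprecatedWarning" with this message
  .Deprecated(
    msg = paste(
      "convert_json_to_rds() is deprecated;",
      "read datasetjson files directly with datasetjson::read_dataset_json()."
    )
  )
  invisible(NULL)
}

# Usage: capture the warning message instead of printing it
msg <- tryCatch(convert_json_to_rds(), warning = conditionMessage)
```

Keeping the function as a warning stub, rather than deleting it, means any leftover caller gets a clear message instead of a "could not find function" error.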

Cleanup

  • Deleted 43 SDTM RDS files that were converted from JSON

Example

Before (reading):

datasets <- map(
  dat_to_load,
  ~ convert_blanks_to_na(readRDS(file.path(path$sdtm, paste0(.x, ".rds"))))
)

After (reading):

library(datasetjson)  # read_dataset_json()
library(purrr)        # map()
library(admiral)      # convert_blanks_to_na()

datasets <- map(
  dat_to_load,
  ~ convert_blanks_to_na(read_dataset_json(
    file.path(path$sdtm, paste0(.x, ".json")),
    decimals_as_floats = TRUE
  ))
)

Before (writing):

saveRDS(adsl, file.path(path$adam, "adsl.rds"))

# Plus 45+ lines of metadata preparation code...

After (writing):

write_dataset_json_with_metadata(adsl, adsl_spec, "adsl", path$adam_json)

Workflow

Complete datasetjson workflow with no RDS files:

  • SDTM datasetjson → ADaM programs → ADaM datasetjson → TLF programs → Outputs
  • convert_rds_to_json.r is now obsolete because the ADaM programs write JSON directly
  • Removed 235 lines of duplicated code by moving it into the helper function
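Run order matters in this workflow: the TLF programs read the ADaM datasetjson files, and some ADaM programs read other ADaM outputs. A minimal base-R sketch of an ordered runner is below. run_in_order() is a hypothetical helper, not the contents of run-all-adams-tlfs.r; the program names come from this PR, and running adsl.r first is an assumption that the other ADaMs depend on ADSL.

```r
# Hypothetical runner; the real run-all-adams-tlfs.r may differ.
run_in_order <- function(programs, dir) {
  not_found <- programs[!file.exists(file.path(dir, programs))]
  if (length(not_found) > 0) {
    stop("Programs not found: ", paste(not_found, collapse = ", "))
  }
  for (prog in programs) {
    message("Running ", prog)
    # A fresh environment per program: objects created by one program
    # are not visible to the next, so inputs must come from the JSON files
    source(file.path(dir, prog), local = new.env())
  }
  invisible(programs)
}

# Program lists from this PR; ADSL first is an assumption
adam_programs <- c("adsl.r", "adae.r", "adadas.r", "adlbc.r", "adtte.r")
tlf_programs  <- c("tlf-demographic.r", "tlf-efficacy.r",
                   "tlf-kmplot.r", "tlf-primary.r")
```

For example, run_in_order(adam_programs, "adam") followed by run_in_order(tlf_programs, "tlf"), where both directory names are placeholders. Checking that every file exists before sourcing any of them avoids a half-finished run.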
Original prompt

This section details the original issue you should resolve

<issue_title>Purge use of rds based on feedback from FDA</issue_title>
<issue_description>FDA just wants us to use datasetjson as our source and not convert the datasetjson to rds in our programs.</issue_description>

Comments on the Issue (you are @copilot in this section)


Compare Combined Reports

Here's a comparison report detailing the differences between pilot5-cmb-report-manual (3).pdf and pilot5-cmb-report-manual (4).pdf, broken down by each table and figure.

Overall Report Header Difference: The primary difference between the two files is the generation date of the overall report.

pilot5-cmb-report-manual (3).pdf is dated 2025-09-29. [cite: pilot5-cmb-report-manual (3).pdf]

pilot5-cmb-report-manual (4).pdf is dated 2026-01-07. [cite: pilot5-cmb-report-manual (4).pdf]

Table 14-2.01 Summary of Demographic and Baseline Characteristics

Differences Found:

The "Program:" generation timestamp differs between the two files.

In pilot5-cmb-report-manual (3).pdf, it states 2025-09-29 20:13:22.834974. [cite: pilot5-cmb-report-manual (3).pdf]

In pilot5-cmb-report-manual (4).pdf, it states 2026-01-07 19:37:03.11848. [cite: pilot5-cmb-report-manual (4).pdf]

pilot5-cmb-report-manual (4).pdf includes two "Warning in lapply(d[dbl_cols], as.double): NAs introduced by coercion" messages above the table, which are not present in pilot5-cmb-report-manual (3).pdf. [cite: pilot5-cmb-report-manual (4).pdf]
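The warning text shows as.double() being applied to the columns named in dbl_cols and meeting values it could not parse as numbers. One way to find those values is a base-R check like the hypothetical find_coercion_failures() below. It only reports which values make as.double() warn; where the lapply() call lives in the pipeline is not visible in this report, and which columns to check is left to the caller.

```r
# Hypothetical helper: list the values in `cols` that make as.double() warn.
find_coercion_failures <- function(d, cols) {
  # TRUE when coercing a single value raises a warning
  # (e.g. "NAs introduced by coercion")
  warns <- function(v) tryCatch({ as.double(v); FALSE }, warning = function(w) TRUE)

  rows <- lapply(cols, function(col) {
    x <- d[[col]]
    idx <- which(!is.na(x))  # existing NAs cannot trigger the warning
    bad <- idx[vapply(x[idx], warns, logical(1))]
    if (length(bad) == 0) return(NULL)
    data.frame(column = col, row = bad, value = as.character(x[bad]),
               stringsAsFactors = FALSE)
  })

  out <- do.call(rbind, rows)
  if (is.null(out)) {
    out <- data.frame(column = character(0), row = integer(0),
                      value = character(0), stringsAsFactors = FALSE)
  }
  out
}
```

For example, find_coercion_failures(adsl, c("AGE", "HEIGHTBL")), with the column names as placeholders.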

No Differences Found:

All numerical demographic and baseline characteristics data presented in the table (e.g., Age, Baseline Height, Baseline Weight, Baseline BMI, MMSE Total, Pooled Age Group counts, Race counts, N values for treatment groups) are identical in both files. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]

Table 14-3.01 Primary Endpoint Analysis: ADAS Cog (11) - Change from Baseline to Week 24 - LOCF

Differences Found:

The "Source:" generation timestamp differs between the two files.

In pilot5-cmb-report-manual (3).pdf, it states 20:03 Monday, September 29, 2025. [cite: pilot5-cmb-report-manual (3).pdf]

In pilot5-cmb-report-manual (4).pdf, it states 19:19 Wednesday, January 07, 2026. [cite: pilot5-cmb-report-manual (4).pdf]

No Differences Found:

All numerical data for Baseline, Week 24, and Change from Baseline (n, Mean (SD), Median (Range)), as well as p-values and Differences of LS Means, are identical in both files. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]

Table 14-3.02 Primary Endpoint Analysis: Glucose (mmol/L) - Summary at Week 20 - LOCF

Differences Found:

The "Table generated on:" timestamp differs between the two files.

In pilot5-cmb-report-manual (3).pdf, it states 2025-09-29 20:03:24.216929. [cite: pilot5-cmb-report-manual (3).pdf]

In pilot5-cmb-report-manual (4).pdf, it states 2026-01-07 19:19:04.70865. [cite: pilot5-cmb-report-manual (4).pdf]

No Differences Found:

All numerical data for Baseline, Week 20, Change from Baseline (N, Mean (SD)), LS Mean, and Pairwise Comparison results are identical in both files. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]

Figure 14-1 Time to Dermatologic Event by Treatment Group KM plot

Differences Found:

The "Program:" generation timestamp differs between the two files.

In pilot5-cmb-report-manual (3).pdf, it states [2025-09-11 12:59:06.233846]. [cite: pilot5-cmb-report-manual (3).pdf]

In pilot5-cmb-report-manual (4).pdf, it states [2026-01-07 19:19:07.024642]. [cite: pilot5-cmb-report-manual (4).pdf]

No Differences Found:

The "At Risk" numbers for each treatment group and time point are identical across both files. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]

The visual representation of the Kaplan-Meier plot itself appears identical. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]
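Apart from the coercion warnings noted for Table 14-2.01, every difference listed above is a generation timestamp. One way to check that two runs agree apart from timestamps is to mask them before comparing text output. The sketch below uses base R; mask_timestamps() and outputs_match() are hypothetical helpers, and the pattern only covers ISO-style stamps such as 2025-09-29 20:13:22.834974, not formats like "19:19 Wednesday, January 07, 2026".

```r
# Hypothetical helpers: replace ISO-style date-times with a placeholder,
# then compare two text outputs line by line.
mask_timestamps <- function(lines) {
  # Matches e.g. "2025-09-29 20:13:22.834974" and "2024-04-18 20:17:25"
  pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}[ T][0-9]{2}:[0-9]{2}(:[0-9]{2}(\\.[0-9]+)?)?"
  gsub(pattern, "<TIMESTAMP>", lines)
}

outputs_match <- function(old_lines, new_lines) {
  identical(mask_timestamps(old_lines), mask_timestamps(new_lines))
}
```

For example, outputs_match(readLines(old_file), readLines(new_file)), where the file paths are placeholders. Any remaining mismatch then points at a real content difference rather than a run date.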

bms63 (Contributor) commented Dec 17, 2025

@copilot - This Pull Request should remove the conversions of the datasetjson files to rds files. The programs should all ingest the datasetjson files, e.g. the ADaM programs should use the SDTM datasetjson files. The Table and Figure programs should use the ADaM datasetjson files.

Copilot AI changed the title "[WIP] Remove conversion of datasetjson to rds" to "Remove RDS conversion for datasetjson inputs per FDA feedback" Dec 17, 2025
Copilot AI requested a review from bms63 December 17, 2025 21:56

Copilot AI (Contributor Author) commented Dec 17, 2025

Done. All programs now ingest and output datasetjson files exclusively:

  • ADaM programs read SDTM datasetjson files and write ADaM datasetjson files directly (commits 1bb83e6, 1812f9d)
  • TLF programs read ADaM datasetjson files
  • Removed all RDS file creation and usage
  • convert_rds_to_json.r is now obsolete as ADaMs write JSON directly

Complete workflow: SDTM JSON → ADaM programs → ADaM JSON → TLF programs

Copilot AI changed the title "Remove RDS conversion for datasetjson inputs per FDA feedback" to "Completely remove RDS format usage per FDA feedback" Dec 17, 2025

bms63 (Contributor) commented Dec 17, 2025

@copilot there is a program called convert_rds_to_json.r that used to take the rds files created in the ad* programs. Can you make this program into a function that can be called in each of the ADaM programs to take the data frame and convert it to datasetjson? Please take into consideration that the metacore object also needs to be accessed, as seen in the original program convert_rds_to_json.r.

Copilot AI (Contributor Author) commented Dec 17, 2025

Done. Created write_dataset_json_with_metadata() function in pilot5-helper-fcns.r that encapsulates the logic from convert_rds_to_json.r (commit b3e2e43).

The function:

  • Takes dataset dataframe, metacore spec object, dataset name, and output path
  • Handles all column metadata preparation using the metacore spec
  • Writes properly formatted datasetjson files

All 5 ADaM programs now use this function, eliminating 235 lines of duplicated code.
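The column-metadata step can be pictured with a small base-R sketch. It is not the code in pilot5-helper-fcns.r: it uses a plain data frame (variable, label, type, length) as a stand-in for the metacore spec, and both the type mapping and the IT.<DATASET>.<VARIABLE> itemOID pattern are assumptions. The output columns itemOID, name, label, dataType and length are Dataset-JSON column attributes.

```r
# Hypothetical sketch: derive Dataset-JSON-style column metadata from a
# simple spec data frame (an assumed stand-in for the metacore object).
build_column_metadata <- function(data, spec, dataset_name) {
  not_in_spec <- setdiff(names(data), spec$variable)
  if (length(not_in_spec) > 0) {
    stop("Variables missing from spec: ", paste(not_in_spec, collapse = ", "))
  }
  # Reorder spec rows to follow the column order of the data
  spec <- spec[match(names(data), spec$variable), , drop = FALSE]

  # Assumed mapping from spec types to Dataset-JSON dataType values
  type_map <- c(text = "string", integer = "integer", float = "float")
  data_type <- unname(type_map[spec$type])
  if (anyNA(data_type)) {
    stop("Unmapped spec type(s): ",
         paste(unique(spec$type[is.na(data_type)]), collapse = ", "))
  }

  data.frame(
    itemOID  = paste0("IT.", toupper(dataset_name), ".", spec$variable),
    name     = spec$variable,
    label    = spec$label,
    dataType = data_type,
    length   = spec$length,
    stringsAsFactors = FALSE
  )
}
```

Failing on variables missing from the spec, or on unmapped types, stops a dataset from being written with incomplete metadata.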

github-actions bot commented Dec 18, 2025

Dataset QC Check Complete!

Date: 2026-01-08
Run by: bms63
Git Ref: refs/pull/136/merge
BASE: Pilot 5 Datasets
COMPARE: Pilot 3 Datasets

✅ Dataset: adadas

No issues were found!
✅ Dataset: adae

No issues were found!
✅ Dataset: adlbc

No issues were found!
✅ Dataset: adsl

No issues were found!
✅ Dataset: adtte

No issues were found!

bms63 marked this pull request as ready for review December 18, 2025 20:45

bms63 (Contributor) commented Dec 18, 2025

@RConsortium/pilot5 finally getting round to this - copilot is rescuing me here!!

@elimillera any ideas on the TLF Check fix?

@parmsam-pfizer please start reviewing - I'm testing things out, but looks promising.

github-actions bot commented Jan 7, 2026

TLF QC Check Complete!

Date: 2026-01-08
Run by: bms63
Git Ref: refs/pull/136/merge
BASE: Pilot 5 Datasets
COMPARE: Pilot 3 Datasets

✅ Dataset: tlf-demographic-pilot5.out

old[36:40] vs new[36:40]
  "—————————————————————————————————————————————————————————————————————————————————————————————————"
  ""
  "Program: tlf-demographic.r "
- "2026-01-08 02:22:28.583663"
+ "2024-04-18 20:17:25"
  ""
✅ TLF: tlf-kmplot-pilot5.pdf - Score: 75

[1] "1. Title differences: First image has 'KM plot for Time to First Dermatologic Event: Safety population' while second image has 'KM plot for Time to First Dermatologic Event: Safety population' (same content but potentially different font/rendering)\n\n2. Y-axis label: First image shows 'Survival Percent' while second image shows 'Probability of event'\n\n3. X-axis label: First image shows 'Time to First Dermatologic Event (Days)' while second image shows 'Time to First Dermatologic Event (Days)' with slightly different positioning\n\n4. Grid lines: First image has minimal gridlines, second image has prominent gray gridlines across the entire plot area\n\n5. Legend position: First image has legend on the right side, second image has legend labeled 'TRT01A' on the right side\n\n6. Legend styling: Second image has a box around the legend, first image does not\n\n7. Confidence intervals shading: Both have shaded regions but with different transparency/color intensity\n\n8. Reference line: First image has a dotted horizontal line, second image has a dashed horizontal line at the 0.50 mark\n\n9. Data table below: Different formatting and layout - first image has horizontal orientation, second image appears more compact\n\n10. At risk numbers: Different formatting in the tables below the graphs\n\n11. Footer text: First image shows 'Program ID: trt-01-07-19-017-126N02', second image shows 'Program: tfl--km plot.r [2024--04--18 20:17:28]'\n\n12. Overall plot dimensions and aspect ratio appear slightly different\n\n13. Font sizes and styling differ between the two images\n\n14. The curves themselves follow the same general pattern but may have slight rendering differences"

✅ TLF: tlf-efficacy-pilot5.rtf - Score: 98

[1] "After careful examination of both images, I found the following differences:\n\n1. Footer timestamp: The first image shows 'Report generated on: 15Jan2019 08:13' while the second image shows 'Report generated on: 15Oct2019 20:17' - indicating different generation dates and times (January vs October, different times)\n\n2. All other elements appear identical including:\n - Table title 'ANCOVA of Change from Baseline at Week 20'\n - All column headers and structure\n - All numerical data values in the table\n - Treatment groups and sample sizes (N values)\n - Mean, standard deviation, and confidence interval values\n - Footnotes and table structure\n - Overall layout and formatting"

✅ TLF: tlf-primary-pilot5.rtf - Score: 99

[1] "After careful comparison of both images, I found only one difference:\n\n1. In the footnote text at the bottom of the second table (Table 14.3.4.2), there is a subtle text difference. The first image states 'SE=Standard error; CI=Confidence interval; Based on ANCOVA model; DV=Primary end baseline value as a covariate' while the second image states 'SE=Standard error; CI=Confidence interval; Based on ANCOVA model; DV=Primary end baseline value as a covariate'. Upon very close inspection, the word 'DV=Primary' appears to have slightly different spacing or rendering.\n\nAll other elements are identical including:\n- Both tables have the same title 'Protocol: CS3150/98/Page 1 of 2 Population: FASOT'\n- Table 14.3.4.1 content is identical in both images\n- Table 14.3.4.2 content is identical in both images\n- All numerical values match exactly\n- Column headers are identical\n- Sample sizes (N) are the same\n- All statistical values (means, standard errors, confidence intervals) are identical\n- Layout and formatting are the same\n- Both show 'Primary Endpoint Analysis: ASES total change from Baseline at Week 24 - LDCF'"

parmsam-pfizer (Collaborator)

Looks good to me. I was initially concerned about how write_dataset_json_with_metadata() was implemented, but it seems fine.

parmsam-pfizer (Collaborator) commented Jan 7, 2026

Maybe one of us can try going through the new ADRG instructions tomorrow after this PR is merged and the ECTD repo is updated: https://github.com/RConsortium/submissions-pilot5-datasetjson-to-fda

bms63 (Contributor) commented Jan 7, 2026

Yes please!!

robertdevine (Collaborator) commented Jan 8, 2026

@bms63, @parmsam-pfizer, @RConsortium/pilot5 - picking up a few warnings in the combined report run on Posit Cloud.
I'll test a Windows 10/11 run on J&J laptops before this week's meetings.

Linux:
pilot5-cmb-report-manual.pdf

parmsam-pfizer (Collaborator) commented Jan 9, 2026

I can try running the ADRG instructions once the PR and Github Action are done.

parmsam-pfizer (Collaborator)

Is the PR ready to merge?

bms63 (Contributor) commented Jan 9, 2026

@rpodcast just FYI, there are no RDS files in this bundle... I forgot we take the JSON files and convert them.

So this is good to go, @parmsam-pfizer!!

bms63 merged commit d9669eb into main Jan 9, 2026
8 checks passed
bms63 deleted the copilot/purge-rds-usage branch January 9, 2026 21:53
github-project-automation bot moved this from Todo to Done in Pilot 5: datasetjson Jan 9, 2026

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Purge use of rds based on feedback from FDA

4 participants