Completely remove RDS format usage per FDA feedback #136
@copilot - This Pull Request should remove the conversions of the datasetjson files to rds files. The programs should all ingest the datasetjson files, e.g. the ADaM programs should read the SDTM datasetjson files. The Table and Figure programs should use the ADaM datasetjson files. |
Co-authored-by: bms63 <[email protected]>
Done. All programs now ingest and output datasetjson files exclusively:
Complete workflow: SDTM JSON → ADaM programs → ADaM JSON → TLF programs |
|
@copilot there is a program called convert_rds_to_json.r that used to take the rds files created in the ad* programs. Can you turn this program into a function that can be called in each of the ADaM programs to take the dataframe and convert it to datasetjson? Please take into consideration that the metacore object also needs to be accessed, as seen in the original program convert_rds_to_json.r. |
Co-authored-by: bms63 <[email protected]>
Done. Created write_dataset_json_with_metadata(). All 5 ADaM programs now use this function, eliminating 235 lines of duplicated code. |
Dataset QC Check Complete!
Date: 2026-01-08
✅ Dataset: adadas
✅ Dataset: adae
✅ Dataset: adlbc
✅ Dataset: adsl
✅ Dataset: adtte |
|
@RConsortium/pilot5 finally getting round to this - copilot is rescuing me here!! @elimillera any ideas on the TLF Check fix? @parmsam-pfizer please start reviewing - I'm testing things out, but it looks promising. |
TLF QC Check Complete!
Date: 2026-01-08

✅ Dataset: tlf-demographic-pilot5.out

✅ TLF: tlf-kmplot-pilot5.pdf - Score: 75
1. Title differences: First image has 'KM plot for Time to First Dermatologic Event: Safety population' while second image has 'KM plot for Time to First Dermatologic Event: Safety population' (same content but potentially different font/rendering)
2. Y-axis label: First image shows 'Survival Percent' while second image shows 'Probability of event'
3. X-axis label: First image shows 'Time to First Dermatologic Event (Days)' while second image shows 'Time to First Dermatologic Event (Days)' with slightly different positioning
4. Grid lines: First image has minimal gridlines, second image has prominent gray gridlines across the entire plot area
5. Legend position: First image has legend on the right side, second image has legend labeled 'TRT01A' on the right side
6. Legend styling: Second image has a box around the legend, first image does not
7. Confidence intervals shading: Both have shaded regions but with different transparency/color intensity
8. Reference line: First image has a dotted horizontal line, second image has a dashed horizontal line at the 0.50 mark
9. Data table below: Different formatting and layout - first image has horizontal orientation, second image appears more compact
10. At risk numbers: Different formatting in the tables below the graphs
11. Footer text: First image shows 'Program ID: trt-01-07-19-017-126N02', second image shows 'Program: tfl--km plot.r [2024--04--18 20:17:28]'
12. Overall plot dimensions and aspect ratio appear slightly different
13. Font sizes and styling differ between the two images
14. The curves themselves follow the same general pattern but may have slight rendering differences

✅ TLF: tlf-efficacy-pilot5.rtf - Score: 98
After careful examination of both images, I found the following differences:
1. Footer timestamp: The first image shows 'Report generated on: 15Jan2019 08:13' while the second image shows 'Report generated on: 15Oct2019 20:17' - indicating different generation dates and times (January vs October, different times)
2. All other elements appear identical including:
   - Table title 'ANCOVA of Change from Baseline at Week 20'
   - All column headers and structure
   - All numerical data values in the table
   - Treatment groups and sample sizes (N values)
   - Mean, standard deviation, and confidence interval values
   - Footnotes and table structure
   - Overall layout and formatting

✅ TLF: tlf-primary-pilot5.rtf - Score: 99
After careful comparison of both images, I found only one difference:
1. In the footnote text at the bottom of the second table (Table 14.3.4.2), there is a subtle text difference. The first image states 'SE=Standard error; CI=Confidence interval; Based on ANCOVA model; DV=Primary end baseline value as a covariate' while the second image states 'SE=Standard error; CI=Confidence interval; Based on ANCOVA model; DV=Primary end baseline value as a covariate'. Upon very close inspection, the word 'DV=Primary' appears to have slightly different spacing or rendering.
All other elements are identical including:
- Both tables have the same title 'Protocol: CS3150/98/Page 1 of 2 Population: FASOT'
- Table 14.3.4.1 content is identical in both images
- Table 14.3.4.2 content is identical in both images
- All numerical values match exactly
- Column headers are identical
- Sample sizes (N) are the same
- All statistical values (means, standard errors, confidence intervals) are identical
- Layout and formatting are the same
- Both show 'Primary Endpoint Analysis: ASES total change from Baseline at Week 24 - LDCF' |
|
Looks good to me. I was initially concerned about how |
|
Maybe one of us can try going through the new ADRG instructions tomorrow after this PR is merged and the ECTD repo is updated: https://github.com/RConsortium/submissions-pilot5-datasetjson-to-fda |
Yes please!! |
|
@bms63, @parmsam-pfizer, @RConsortium/pilot5 - picking up a few warnings in the combined report run on posit cloud.
Linux: |
|
I can try running the ADRG instructions once the PR and Github Action are done. |
|
Is the PR ready to merge? |
|
@rpodcast just FYI there are no rds files in this bundle... I forgot we take the json files and convert them, so this is good to go!! @parmsam-pfizer !! |


FDA requested that programs use datasetjson as the source format without converting to RDS, and eliminate RDS usage entirely from the workflow.
Changes
ADaM programs (adsl.r, adae.r, adadas.r, adlbc.r, adtte.r)
datasetjson::read_dataset_json()
write_dataset_json_with_metadata() for JSON output
library(datasetjson) imports
.json output
TLF programs (tlf-demographic.r, tlf-efficacy.r, tlf-kmplot.r, tlf-primary.r)
Documentation (adrg/_adrg.qmd)
Helpers (pilot5-helper-fcns.r, run-all-adams-tlfs.r)
write_dataset_json_with_metadata() function - Extracted JSON writing logic from convert_rds_to_json.r into a reusable function that takes dataset dataframe, metacore spec, dataset name, and output path. Eliminates code duplication across ADaM programs.
convert_json_to_rds() function with warning
Cleanup
Example
Before (reading):
After (reading):
Before (writing):
After (writing):
Workflow
Complete datasetjson workflow with no RDS files:
convert_rds_to_json.r is now obsolete as ADaM programs write JSON directly
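The write-then-read pattern in this workflow can be sketched roughly as follows. This is a minimal illustration, not the PR's code: write_json_sketch() and the var_meta data frame are hypothetical stand-ins for write_dataset_json_with_metadata() and the metacore spec, the toy ADSL rows are made up, and the dataset_json() arguments assume the datasetjson 0.3.x interface (Dataset-JSON v1.1), which may differ in other package versions.

```r
library(datasetjson)

# Hypothetical helper (NOT the PR's write_dataset_json_with_metadata()).
# var_meta is a plain data frame standing in for the metacore spec, with
# columns: variable, label, type (a Dataset-JSON dataType such as "string").
write_json_sketch <- function(df, var_meta, dataset_name, dataset_label, out_path) {
  # Column metadata in the shape datasetjson expects (assumed 0.3.x layout)
  columns <- data.frame(
    itemOID  = paste0("IT.", dataset_name, ".", var_meta$variable),
    name     = var_meta$variable,
    label    = var_meta$label,
    dataType = var_meta$type,
    stringsAsFactors = FALSE
  )

  # Attach dataset-level and column-level metadata to the data frame
  ds <- dataset_json(
    df[var_meta$variable],
    item_oid      = paste0("IG.", dataset_name),
    name          = dataset_name,
    dataset_label = dataset_label,
    columns       = columns
  )

  # Write the JSON file directly; no intermediate .rds file is created
  write_dataset_json(ds, file = out_path)
  invisible(out_path)
}

# Made-up two-row dataset, for illustration only
adsl <- data.frame(
  USUBJID = c("SUBJ-001", "SUBJ-002"),
  AGE     = c(63L, 71L),
  stringsAsFactors = FALSE
)
var_meta <- data.frame(
  variable = c("USUBJID", "AGE"),
  label    = c("Unique Subject Identifier", "Age"),
  type     = c("string", "integer"),
  stringsAsFactors = FALSE
)

# ADaM side: write JSON. Downstream (TLF) side: read the same JSON back.
out     <- write_json_sketch(adsl, var_meta, "ADSL",
                             "Subject-Level Analysis Dataset",
                             tempfile(fileext = ".json"))
adsl_in <- read_dataset_json(out)
```

In this shape, a TLF program starts from read_dataset_json() on the ADaM JSON file where it previously would have called readRDS() on an .rds file.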
Compare Combined Reports
Here's a comparison report detailing the differences between pilot5-cmb-report-manual (3).pdf and pilot5-cmb-report-manual (4).pdf, broken down by each table and figure.
Overall Report Header Difference: The primary difference between the two files is the generation date of the overall report.
pilot5-cmb-report-manual (3).pdf is dated 2025-09-29. [cite: pilot5-cmb-report-manual (3).pdf]
pilot5-cmb-report-manual (4).pdf is dated 2026-01-07. [cite: pilot5-cmb-report-manual (4).pdf]
Table 14-2.01 Summary of Demographic and Baseline Characteristics
Differences Found:
The "Program:" generation timestamp differs between the two files.
In pilot5-cmb-report-manual (3).pdf, it states 2025-09-29 20:13:22.834974. [cite: pilot5-cmb-report-manual (3).pdf]
In pilot5-cmb-report-manual (4).pdf, it states 2026-01-07 19:37:03.11848. [cite: pilot5-cmb-report-manual (4).pdf]
pilot5-cmb-report-manual (4).pdf includes two "Warning in lapply(d[dbl_cols], as.double): NAs introduced by coercion" messages above the table, which are not present in pilot5-cmb-report-manual (3).pdf. [cite: pilot5-cmb-report-manual (4).pdf]
No Differences Found:
All numerical demographic and baseline characteristics data presented in the table (e.g., Age, Baseline Height, Baseline Weight, Baseline BMI, MMSE Total, Pooled Age Group counts, Race counts, N values for treatment groups) are identical in both files. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]
Table 14-3.01 Primary Endpoint Analysis: ADAS Cog (11) - Change from Baseline to Week 24 - LOCF
Differences Found:
The "Source:" generation timestamp differs between the two files.
In pilot5-cmb-report-manual (3).pdf, it states 20:03 Monday, September 29, 2025. [cite: pilot5-cmb-report-manual (3).pdf]
In pilot5-cmb-report-manual (4).pdf, it states 19:19 Wednesday, January 07, 2026. [cite: pilot5-cmb-report-manual (4).pdf]
No Differences Found:
All numerical data for Baseline, Week 24, and Change from Baseline (n, Mean (SD), Median (Range)), as well as p-values and Differences of LS Means, are identical in both files. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]
Table 14-3.02 Primary Endpoint Analysis: Glucose (mmol/L) - Summary at Week 20 - LOCF
Differences Found:
The "Table generated on:" timestamp differs between the two files.
In pilot5-cmb-report-manual (3).pdf, it states 2025-09-29 20:03:24.216929. [cite: pilot5-cmb-report-manual (3).pdf]
In pilot5-cmb-report-manual (4).pdf, it states 2026-01-07 19:19:04.70865. [cite: pilot5-cmb-report-manual (4).pdf]
No Differences Found:
All numerical data for Baseline, Week 20, Change from Baseline (N, Mean (SD)), LS Mean, and Pairwise Comparison results are identical in both files. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]
Figure 14-1 Time to Dermatologic Event by Treatment Group KM plot
Differences Found:
The "Program:" generation timestamp differs between the two files.
In pilot5-cmb-report-manual (3).pdf, it states [2025-09-11 12:59:06.233846]. [cite: pilot5-cmb-report-manual (3).pdf]
In pilot5-cmb-report-manual (4).pdf, it states [2026-01-07 19:19:07.024642]. [cite: pilot5-cmb-report-manual (4).pdf]
No Differences Found:
The "At Risk" numbers for each treatment group and time point are identical across both files. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]
The visual representation of the Kaplan-Meier plot itself appears identical. [cite: pilot5-cmb-report-manual (3).pdf, pilot5-cmb-report-manual (4).pdf]
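The "NAs introduced by coercion" warning reported for Table 14-2.01 is the message base R emits when as.double() meets text it cannot parse as a number. Below is a minimal sketch of how two such warnings can arise, assuming (the report does not confirm this) that character columns are being converted to double. The data frame and its values are made up, and d and dbl_cols only mirror the names that appear in the warning text.

```r
# Made-up character columns standing in for values read from a JSON file
d <- data.frame(
  HEIGHT = c("170.2", "unknown", "165.1"),  # "unknown" is not numeric
  WEIGHT = c("70.5", "82", "N/A"),          # "N/A" is not numeric
  stringsAsFactors = FALSE
)
dbl_cols <- c("HEIGHT", "WEIGHT")

# Collect warning messages instead of printing them, so they can be inspected
warnings_seen <- character()
d[dbl_cols] <- withCallingHandlers(
  lapply(d[dbl_cols], as.double),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# One warning is raised per column that contains an unparseable value;
# the offending entries become NA, the rest convert normally.
print(warnings_seen)
print(d)
```

Because the table values are reported as identical in both files, the NAs behind the warning may correspond to values that were already missing; checking which columns trigger it would confirm that.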