hvtiRutilities provides utility functions for working with clinical research data at the Cleveland Clinic Heart, Vascular and Thoracic Institute (HVTI) Clinical Outcomes Registries and Research (CORR) department. The package simplifies common data preparation tasks when working with SAS datasets in R.
- r_data_types(): Automatically infer and convert data types in a dataset
  - Converts character columns to factors
  - Detects binary numeric variables (0/1) and converts to logical
  - Converts numeric variables with few unique values to factors
  - Handles various NA representations ("NA", "na", etc.)
  - Preserves variable labels from SAS/labelled data
- label_map(): Extract variable labels from labeled datasets
  - Creates a lookup table mapping variable names to their labels
  - Useful for working with SAS datasets that have variable labels
  - Returns a data frame with key (variable name) and label columns
- sample_data(): Generate sample datasets for testing
  - Creates datasets with various column types for testing package functions
  - Useful for examples and unit tests

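To make the conversion rules listed above concrete, here is a minimal base-R sketch of how such type inference might work. It is not the package's implementation: the helper name infer_types_sketch, its default factor_size of 10, and the exact order of the checks are assumptions made for illustration only.

```r
# Hypothetical helper: NOT the source of r_data_types(), only a sketch of the
# rules described above. The name and the defaults are assumptions.
infer_types_sketch <- function(df, factor_size = 10,
                               na_strings = c("NA", "na", "Na", "nA")) {
  for (nm in names(df)) {
    x <- df[[nm]]
    lbl <- attr(x, "label", exact = TRUE)   # remember any variable label

    if (is.character(x)) {
      x[x %in% na_strings] <- NA            # NA-like strings become real NA
      x <- factor(x)                        # character -> factor
    } else if (is.numeric(x)) {
      n_unique <- length(unique(x[!is.na(x)]))
      if (n_unique == 2 && all(x %in% c(0, 1, NA))) {
        x <- as.logical(x)                  # binary 0/1 -> logical
      } else if (n_unique < factor_size) {
        x <- factor(x)                      # few unique values -> factor
      }
    }

    attr(x, "label") <- lbl                 # restore the label (no-op if NULL)
    df[[nm]] <- x
  }
  df
}

# Small made-up dataset to exercise each rule
demo <- data.frame(
  sex   = c("male", "female", "NA", "male"),
  died  = c(0, 1, 0, 1),
  grade = c(1, 2, 3, 2),
  age   = c(61.5, 70.2, 55.0, 48.9),
  stringsAsFactors = FALSE
)
attr(demo$age, "label") <- "Patient Age (years)"

out <- infer_types_sketch(demo, factor_size = 4)
str(out)
```

The label is saved and restored explicitly because base functions such as factor() and as.logical() drop custom attributes; the real function may handle this differently.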
You can install the development version of hvtiRutilities from GitHub with:
# install.packages("pak")
pak::pak("ehrlinger/hvtiRutilities")

library(hvtiRutilities)
# Create sample data
dta <- sample_data(n = 100)
# Examine original types
str(dta)
# boolean: int (values: 1, 2)
# logical: chr (values: "F", "T")
# char: chr (values: "male", "female")
# Apply automatic type conversion
dta_converted <- r_data_types(dta)
# Examine converted types
str(dta_converted)
# boolean: logi (binary 1/2 → TRUE/FALSE)
# logical: Factor (character → factor)
# char: Factor (character → factor)

# Skip conversion for specific variables
dta_partial <- r_data_types(dta, skip_vars = c("boolean", "char"))
# boolean and char remain unchanged, others are converted

# Convert only variables with fewer than 5 unique values to factors
dta_strict <- r_data_types(dta, factor_size = 5)
# Keep binary variables as factors instead of logical
dta_factors <- r_data_types(dta, binary_factor = TRUE)

# Create labeled data (common with SAS imports)
library(labelled)
dta <- data.frame(
age = c(25, 30, 35),
sex = c("M", "F", "M"),
bp = c(120, 130, 125)
)
var_label(dta$age) <- "Patient Age (years)"
var_label(dta$sex) <- "Patient Sex"
var_label(dta$bp) <- "Systolic Blood Pressure (mmHg)"
# Extract labels as a lookup table
labels <- label_map(dta)
print(labels)
# key label
# 1 age Patient Age (years)
# 2 sex Patient Sex
# 3 bp Systolic Blood Pressure (mmHg)
# Use for matching/joining
summary_table <- data.frame(variable = c("age", "bp"))
summary_table$label <- labels$label[match(summary_table$variable, labels$key)]

- Preserves variable labels: All functions maintain SAS/labelled variable attributes
- Handles NA variants: Automatically converts "NA", "na", "Na", "nA" strings to actual NA values
- Type-safe: Returns the same data structure class as input (data.frame, tibble, data.table, etc.)
- Flexible control: Multiple parameters to customize type conversion behavior
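
The variable labels mentioned above are stored as a "label" attribute on each column, which is what labelled::var_label() reads and writes. As a rough illustration of how a key/label lookup could be built from that attribute using base R alone, consider the sketch below. The helper name label_map_sketch is hypothetical, and returning NA for unlabeled columns is an assumption, not documented behavior of label_map().

```r
# Hypothetical helper: a sketch only, not the source of label_map().
label_map_sketch <- function(df) {
  labels <- vapply(
    df,
    function(col) {
      lbl <- attr(col, "label", exact = TRUE)
      # Assumption: columns without a label get NA in the lookup table
      if (is.null(lbl)) NA_character_ else as.character(lbl)
    },
    character(1)
  )
  data.frame(key = names(df), label = unname(labels),
             stringsAsFactors = FALSE)
}

# Made-up example data with one labeled and one unlabeled column
dta <- data.frame(age = c(25, 30, 35), bp = c(120, 130, 125))
attr(dta$age, "label") <- "Patient Age (years)"

lookup <- label_map_sketch(dta)
lookup$label[match("age", lookup$key)]
# "Patient Age (years)"
```
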
- For bug reports and feature requests: GitHub Issues
- For package news and changes: Run hvtiRutilities.news() in R
GPL (>= 3)