fastreg

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Overview

fastreg converts large SAS register files (.sas7bdat) into Apache Parquet format. This is particularly useful for researchers working with Danish registers at Statistics Denmark, where large SAS files are common. Parquet files are smaller on disk, faster to read, and work well with modern tools like DuckDB and Arrow.

In this context, a register is a collection of related data files, typically yearly snapshots such as kontakter_2020.sas7bdat and kontakter_2021.sas7bdat (from Landspatientregisteret, LPR3, the Danish National Patient Register).
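To make the naming convention concrete, here is a minimal base R sketch that groups yearly snapshot files by register name. It only illustrates the `<register>_<year>.sas7bdat` pattern described above and is not necessarily how fastreg itself identifies registers; the file names (including `diagnoser_2020.sas7bdat`) and all variable names are hypothetical.

```r
# Hypothetical file names following the <register>_<year>.sas7bdat pattern.
files <- c(
  "kontakter_2020.sas7bdat",
  "kontakter_2021.sas7bdat",
  "diagnoser_2020.sas7bdat"
)

# Strip the extension, then split each stem into register name and year.
stems <- sub("\\.sas7bdat$", "", basename(files))
snapshots <- data.frame(
  file = files,
  register = sub("_[0-9]{4}$", "", stems),
  year = as.integer(sub("^.*_([0-9]{4})$", "\\1", stems))
)

# One list entry per register, each holding its yearly snapshot files.
by_register <- split(snapshots$file, snapshots$register)
print(by_register)
```

Each element of `by_register` then corresponds to one register in the sense used here: a set of files that share a name and differ only by year.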

fastreg provides functions to:

  • Convert SAS files to Parquet.
  • Read Parquet registers.
  • Create a targets pipeline from a template for parallel batch conversion.
  • List SAS and Parquet files in directories.

Purpose

fastreg has two main goals: to simplify converting large Danish registers from SAS to Parquet, and to simplify reading the resulting Parquet files. Because Parquet is a more compact and efficient storage format than SAS, conversion reduces storage costs and aims to improve performance in data analysis workflows.

Installation

# install.packages("fastreg")

# Development version on GitHub
pak::pak("dp-next/fastreg")

Usage

library(fastreg)

# Convert SAS files to Parquet
convert_to_parquet(
  path = list_sas_files("path/to/sas_register/"),
  output_dir = "path/to/parquet_register/"
)

# Read Parquet register (as DuckDB table)
read_register("path/to/parquet_register/")

# Use targets template
use_targets_template()

# List files
list_sas_files("path/to/directory/with/sas_files")
list_parquet_files("path/to/directory/with/parquet_files")

See vignette("fastreg") for a complete guide.
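Because the converted register is a directory of ordinary Parquet files, it can also be queried directly with general-purpose tools. The sketch below uses the DBI and duckdb packages rather than fastreg: it writes two tiny stand-in yearly files to a temporary directory and then queries them together with a glob pattern. The directory, file names, and columns (`snapshot_year`, `id`) are assumptions made for illustration, and the snippet says nothing about how `read_register()` is implemented.

```r
library(DBI)

# Hypothetical stand-in for a converted register: two small yearly Parquet files.
register_dir <- file.path(tempdir(), "parquet_register_demo")
dir.create(register_dir, showWarnings = FALSE)

# In-memory DuckDB connection (requires the duckdb package).
con <- dbConnect(duckdb::duckdb())

for (year in 2020:2021) {
  snapshot <- data.frame(snapshot_year = year, id = 1:3)
  dbWriteTable(con, "snapshot", snapshot, overwrite = TRUE)
  out_file <- file.path(register_dir, sprintf("kontakter_%d.parquet", year))
  # Let DuckDB write the table out as a Parquet file.
  dbExecute(con, sprintf("COPY snapshot TO '%s' (FORMAT PARQUET)", out_file))
}

# Query all yearly files at once using a glob pattern.
counts <- dbGetQuery(con, sprintf(
  "SELECT snapshot_year, COUNT(*) AS n
   FROM read_parquet('%s')
   GROUP BY snapshot_year
   ORDER BY snapshot_year",
  file.path(register_dir, "*.parquet")
))
print(counts)

dbDisconnect(con, shutdown = TRUE)
```

Querying the files in place like this means that only the aggregated result is returned to R, which is one reason Parquet pairs well with DuckDB for large registers.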

Getting help

If you find a bug or have a question, please open an issue on GitHub and include a minimal reproducible example.

Code of conduct

This project is released with a Code of conduct. By contributing to this project you agree to follow its terms.
