Description
Submitting Author Name: Sidney da Silva Pereira Bissoli
Submitting Author Github Handle: @SidneyBissoli
Other Package Authors Github handles:
Repository: https://github.com/SidneyBissoli/healthbR
Submission type: Pre-submission
Language: en
Paste the full DESCRIPTION file inside a code block below:
Package: healthbR
Title: Access Brazilian Public Health Data
Version: 0.2.0
Authors@R:
person("Sidney", "Bissoli", , "sbissoli76@gmail.com", role = c("aut", "cre"),
comment = c(ORCID = "0009-0001-0442-3700"))
Description: Provides easy access to Brazilian public health data from multiple
sources including VIGITEL (Surveillance of Risk Factors for Chronic Diseases
by Telephone Survey), PNS (National Health Survey), 'PNAD' Continua (Continuous
National Household Sample Survey), 'POF' (Household Budget Survey with food
security and consumption data), 'Censo Demografico' (population denominators
via 'SIDRA' API), SIM (Mortality Information System), SINASC (Live Birth
Information System), 'SIH' (Hospital Information System),
'SIA' (Outpatient Information System), 'SINAN' (Notifiable Diseases Surveillance),
'CNES' (National Health Facility Registry),
'SI-PNI' (National Immunization Program - aggregated 1994-2019 via FTP,
individual-level 'microdata' 2020+ via 'OpenDataSUS' API),
'SISAB' (Primary Care Health Information System - coverage indicators via
REST API), ANS ('Agencia Nacional de Saude Suplementar' - supplementary
health beneficiaries, consumer complaints, and financial statements),
'ANVISA' ('Agencia Nacional de Vigilancia Sanitaria' - product registrations,
'pharmacovigilance', 'hemovigilance', 'technovigilance', and controlled
substance sales via 'SNGPC'),
and other health information systems. Data is downloaded
from the Brazilian Ministry of Health and 'IBGE' repositories. Data is returned
in tidy format following tidyverse conventions.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3
Depends:
R (>= 4.2.0)
Imports:
tibble,
dplyr,
curl,
cli,
rlang,
stringr,
purrr,
readr,
jsonlite,
foreign
Suggests:
testthat (>= 3.1.5),
knitr,
rmarkdown,
readxl,
haven,
furrr,
future,
arrow,
dbplyr,
duckdb,
piggyback,
survey,
srvyr
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://github.com/SidneyBissoli/healthbR, https://sidneybissoli.github.io/healthbR/
BugReports: https://github.com/SidneyBissoli/healthbR/issues
Scope
I believe my package falls under the following categories:
- data retrieval
- data extraction
healthbR provides unified programmatic access to over 15 Brazilian public health data sources directly from R, including:
- DATASUS microdata: SIM (mortality), SINASC (live births), SIH (hospital admissions), SIA (outpatient procedures), SINAN (notifiable diseases), and CNES (health facilities). All of these use internal `.dbc` decompression via vendored C code, with no dependency on external packages.
- Population surveys: VIGITEL (telephone surveillance), PNS (National Health Survey), PNAD Contínua (Continuous National Household Sample Survey), POF (Household Budget Survey), and the Demographic Census, with dual access via microdata and IBGE's SIDRA API.
- Other sources: SI-PNI (vaccination), SISAB/e-SUS (primary care), ANS (private health insurance regulator), and ANVISA (health products regulator).
The package addresses a well-documented problem: Brazilian public health data is fragmented across dozens of systems with different formats, unstable government portals, and the proprietary `.dbc` compressed file format. Researchers and epidemiologists spend significant time just accessing and preparing these data before any analysis can begin. healthbR automates this entire pipeline through a consistent API (`*_data()`, `*_variables()`, `*_dictionary()`, `*_info()`) and returns tidy tibbles ready for analysis.
Target audience and scientific applications
The primary target audience is epidemiologists and public health researchers who use R to analyze data from Brazil's Unified Health System (SUS). The package also serves data scientists working with Brazilian public data and graduate students in public health and epidemiology.
Scientific applications include: mortality and morbidity analysis, epidemiological surveillance, vaccination coverage assessment, health service utilization studies, population survey analysis, health economics research, and public policy planning.
Similar packages
Several R packages access subsets of these data individually:
- `microdatasus`: accesses DATASUS microdata, but depends on the external `read.dbc` package (currently archived on CRAN) and covers fewer sources.
- `datasus`: accesses TabNet/DATASUS data, with limited scope.
- `censobr`: focused exclusively on Census data.
- `PNADcIBGE` and `POFIBGE`: focused on specific IBGE surveys.
- `read.dbc`: reads `.dbc` files only; currently archived on CRAN.
healthbR differentiates itself by:
- Broad coverage: unifies 15+ data sources in a single package with a consistent API
- Self-contained: vendors the `.dbc` decompression C code internally (no dependency on the archived `read.dbc` package)
- Smart caching: automatic Parquet conversion via `arrow` for efficient filtered reads
- Dual access: microdata plus aggregated data via the SIDRA API for survey modules
- Integrated dictionaries: `*_dictionary()` and `*_variables()` functions for every module
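To make the caching point concrete, here is a minimal sketch of the general technique (cache once as Parquet, then query lazily with `arrow` and `dplyr`). It is not healthbR's internal implementation: `cache_as_parquet()` is a hypothetical helper, and the built-in `mtcars` data frame stands in for a downloaded dataset. It assumes `arrow` and `dplyr` are installed.

```r
library(arrow)
library(dplyr)

# Hypothetical helper (not part of healthbR): write a data frame to a Parquet
# file in a cache directory on the first call; later calls reuse the file.
cache_as_parquet <- function(df, cache_dir, name) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(name, ".parquet"))
  if (!file.exists(path)) {
    arrow::write_parquet(df, path)
  }
  path
}

# mtcars stands in for a dataset that would otherwise be downloaded.
path <- cache_as_parquet(mtcars, tempdir(), "mtcars")

# open_dataset() is lazy: filter() and select() are evaluated by Arrow during
# the scan, and only the result is materialized in R by collect().
result <- arrow::open_dataset(path) |>
  filter(cyl == 6) |>
  select(mpg, cyl, hp) |>
  collect()

print(result)
```

The design benefit is that repeated queries hit a local columnar file instead of re-downloading and re-decompressing the source, and only the requested columns are loaded into memory.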
Other information
- The package is already published on CRAN (current version at https://CRAN.R-project.org/package=healthbR)
- pkgdown site: https://sidneybissoli.github.io/healthbR/
- The package includes individual vignettes for each module
- I intend to actively maintain the package for at least 2 years
- I am available to respond to reviews in a timely manner