healthbR: Unified Access to Brazilian Public Health Data #751

Submitting Author Name: Sidney da Silva Pereira Bissoli
Submitting Author Github Handle: @SidneyBissoli
Other Package Authors Github handles:
Repository: https://github.com/SidneyBissoli/healthbR
Submission type: Pre-submission
Language: en


Paste the full DESCRIPTION file inside a code block below:

Package: healthbR
Title: Access Brazilian Public Health Data
Version: 0.2.0
Authors@R:
    person("Sidney", "Bissoli", , "sbissoli76@gmail.com", role = c("aut", "cre"),
           comment = c(ORCID = "0009-0001-0442-3700"))
Description: Provides easy access to Brazilian public health data from multiple
    sources including VIGITEL (Surveillance of Risk Factors for Chronic Diseases
    by Telephone Survey), PNS (National Health Survey), 'PNAD' Continua (Continuous
    National Household Sample Survey), 'POF' (Household Budget Survey with food
    security and consumption data), 'Censo Demografico' (population denominators
    via 'SIDRA' API), SIM (Mortality Information System), SINASC (Live Birth
    Information System), 'SIH' (Hospital Information System),
    'SIA' (Outpatient Information System), 'SINAN' (Notifiable Diseases Surveillance),
    'CNES' (National Health Facility Registry),
    'SI-PNI' (National Immunization Program - aggregated 1994-2019 via FTP,
    individual-level 'microdata' 2020+ via 'OpenDataSUS' API),
    'SISAB' (Primary Care Health Information System - coverage indicators via
    REST API), ANS ('Agencia Nacional de Saude Suplementar' - supplementary
    health beneficiaries, consumer complaints, and financial statements),
    'ANVISA' ('Agencia Nacional de Vigilancia Sanitaria' - product registrations,
    'pharmacovigilance', 'hemovigilance', 'technovigilance', and controlled
    substance sales via 'SNGPC'),
    and other health information systems. Data is downloaded
    from the Brazilian Ministry of Health and 'IBGE' repositories. Data is returned
    in tidy format following tidyverse conventions.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3
Depends:
    R (>= 4.2.0)
Imports:
    tibble,
    dplyr,
    curl,
    cli,
    rlang,
    stringr,
    purrr,
    readr,
    jsonlite,
    foreign
Suggests:
    testthat (>= 3.1.5),
    knitr,
    rmarkdown,
    readxl,
    haven,
    furrr,
    future,
    arrow,
    dbplyr,
    duckdb,
    piggyback,
    survey,
    srvyr
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://github.com/SidneyBissoli/healthbR, https://sidneybissoli.github.io/healthbR/
BugReports: https://github.com/SidneyBissoli/healthbR/issues

Scope

I believe my package falls under the following categories:

  • data retrieval
  • data extraction

healthbR provides unified programmatic access to over 15 Brazilian public health data sources directly from R, including:

DATASUS microdata: SIM (mortality), SINASC (live births), SIH (hospital admissions), SIA (outpatient procedures), SINAN (notifiable diseases), and CNES (health facilities). All of these modules decompress .dbc files internally using vendored C code, with no external dependencies.

Population surveys: VIGITEL (telephone surveillance), PNS (National Health Survey), PNAD Contínua (Continuous National Household Sample Survey), POF (Household Budget Survey), and the Demographic Census, with dual access via microdata and IBGE's SIDRA API.

Other sources: SI-PNI (vaccination), SISAB/e-SUS (primary care), ANS (private health insurance regulator), ANVISA (health products regulator).

The package addresses a well-documented problem: Brazilian public health data is fragmented across dozens of systems that use different formats, including the proprietary .dbc format, and are served from unstable government portals. Researchers and epidemiologists spend significant time accessing and preparing these data before any analysis can begin. healthbR automates the download and preparation steps through a consistent API (*_data(), *_variables(), *_dictionary(), *_info()) and returns tidy tibbles ready for analysis.
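As a rough illustration of the naming pattern above, the sketch below enumerates the function names the convention would produce for three modules. The module prefixes (lowercase source acronyms) are an assumption made for this example; the package reference is the authority on the actual exported names. The code only builds strings and does not call healthbR.

```r
# Hypothetical module prefixes (assumed here: lowercase source acronyms).
modules <- c("sim", "sinasc", "sih")

# The four suffixes named in the API pattern above.
suffixes <- c("data", "variables", "dictionary", "info")

# Build every prefix/suffix combination, e.g. "sim_data".
api <- outer(modules, suffixes, paste, sep = "_")
dimnames(api) <- list(modules, suffixes)

print(api)
```

Each row of the resulting matrix is one module and each column is one of the four roles, which is the sense in which the API is consistent across sources.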

Target audience and scientific applications

The primary target audience is epidemiologists and public health researchers who use R to analyze data from Brazil's Unified Health System (SUS). The package also serves data scientists working with Brazilian public data and graduate students in public health and epidemiology.

Scientific applications include: mortality and morbidity analysis, epidemiological surveillance, vaccination coverage assessment, health service utilization studies, population survey analysis, health economics research, and public policy planning.

Similar packages

Several R packages access subsets of these data individually:

  • microdatasus: accesses DATASUS microdata, but depends on the external read.dbc package (currently archived on CRAN) and covers fewer sources.
  • datasus: accesses TabNet/DATASUS data with limited scope.
  • censobr: focused exclusively on Census data.
  • PNADcIBGE and POFIBGE: focused on specific IBGE surveys.
  • read.dbc: reads .dbc files only; currently archived on CRAN.

healthbR differentiates itself by:

  1. Broad coverage: unifies 15+ data sources in a single package with a consistent API
  2. Self-contained: vendors the .dbc decompression C code internally (no dependency on the archived read.dbc package)
  3. Smart caching: automatic Parquet conversion via arrow for efficient filtered reads
  4. Dual access: microdata + aggregated data via SIDRA API for survey modules
  5. Integrated dictionaries: *_dictionary() and *_variables() functions for every module
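
To make the caching point (item 3) concrete, here is a minimal sketch of the general technique: write a table to Parquet once, then run a filtered read against the cached file. It uses only the arrow and dplyr packages and is not healthbR's internal implementation; the toy records, column names, and cache file name are made up for illustration.

```r
library(arrow)
library(dplyr)

# Toy records standing in for downloaded microdata (made-up values).
records <- data.frame(
  year = c(2020L, 2020L, 2021L, 2021L),
  uf   = c("SP", "RJ", "SP", "AC"),
  n    = c(10L, 7L, 12L, 3L)
)

# Cache the table as a Parquet file (hypothetical file name, temp directory).
cache_file <- file.path(tempdir(), "cache_example.parquet")
write_parquet(records, cache_file)

# Filtered read: the filter is evaluated by arrow while scanning the file,
# and only the matching rows are materialized by collect().
sp_2021 <- open_dataset(cache_file) |>
  filter(year == 2021L, uf == "SP") |>
  select(year, uf, n) |>
  collect()

print(sp_2021)
```

On real microdata the benefit comes from the same pattern at scale: repeated queries read the cached Parquet file instead of re-downloading and re-parsing the source files.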

Other information
