claudiacerqn/microdadosBrasil

microdadosBrasil

work in progress

  • vignettes and Portuguese documentation soon
  • RAIS, CAGED, PNAD_Continua soon

Description

This package contains functions to read the most commonly used Brazilian microdata easily and quickly. Importing Brazilian microdata can be tedious: most data is provided in fixed-width files (fwf) with import instructions only for SAS and SPSS, data usually comes subdivided by state (UF) or macro-region (regiões), and filenames can vary over time for the same dataset. microdadosBrasil handles all these idiosyncrasies for you. In the background, the package uses readr for fwf data and data.table for .csv data, so reading is reasonably fast.
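To make the fixed-width problem concrete, here is a minimal sketch of what an import dictionary does. It uses base R only, not readr, and the variable names, positions and records are made up for illustration; they are not taken from any real IBGE or INEP dictionary.

```r
# Hypothetical import dictionary: one row per variable, giving the column
# where the field starts and how many characters it spans.
dic <- data.frame(
  var_name = c("UF", "V0102", "V8005"),
  start    = c(1, 3, 11),
  width    = c(2, 8, 3),
  stringsAsFactors = FALSE
)

# Two made-up fixed-width records, 13 characters each.
raw_lines <- c("3300000123045", "3500000456007")

# Cut every record into fields at the positions given by the dictionary.
parse_fwf <- function(lines, dic) {
  fields <- lapply(seq_len(nrow(dic)), function(i) {
    substr(lines, dic$start[i], dic$start[i] + dic$width[i] - 1)
  })
  names(fields) <- dic$var_name
  as.data.frame(fields, stringsAsFactors = FALSE)
}

d <- parse_fwf(raw_lines, dic)
print(d)
```

All fields come back as character strings, so leading zeros are preserved; converting columns to numeric would be a separate step.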

Currently the package includes import functions for:

| Source | Dataset | Import_function | Period | Subdataset |
|--------|---------|-----------------|--------|------------|
| IBGE | PNAD | read_PNAD | 2001 to 2014 | domicilios, pessoas |
| IBGE | Censo Demográfico | read_CENSO | 2000 | domicilios, pessoas |
| IBGE | POF | read_POF | 2008 | several, see details |
| INEP | Censo Escolar | read_CensoEscolar | 1995 to 2014 | escolas, ..., see details |
| INEP | Censo da Educação Superior | read_CensoEducSuperior | 1995 to 2014 | see details |

To be added soon:

  • Censo 2010 and 1991
  • RAIS, de-identified version
  • download functions
  • variable name harmonization
  • support for data not fitting into memory

Installation

install.packages("devtools")
install.packages("stringi") 
devtools::install_github("lucasmation/microdadosBrasil")
library('microdadosBrasil')

Usage

# Censo Demográfico 2000
# after downloading the data and unzipping it in the root directory, run:
d  <- read_CENSO('domicilios', 2000)
d2 <- read_CENSO('pessoas', 2000)

# PNAD 2002
download_sourceData("PNAD", 2002, unzip = TRUE)
d  <- read_PNAD("domicilios", 2002)
d2 <- read_PNAD("pessoas", 2002)

# Censo Escolar
download_sourceData('CensoEscolar', 2005, unzip = TRUE)
d <- read_CensoEscolar('escola', 2005)
# same file, with variable names harmonized
d <- read_CensoEscolar('escola', 2005, harmonize_varnames = TRUE)

Related efforts

This package is highly influenced by similar efforts, which are great time savers, widely used and often unrecognized:

  • Anthony Damico's scripts to read most IBGE surveys. Great if your data does not fit into memory and you want speed when working with complex survey design data.
  • Data Zoom by Gustavo Gonzaga, Claudio Ferraz and Juliano Assunção. Similar ease of use and harmonization of Brazilian microdata for Stata.
  • dicionariosIBGE, by Alexandre Rademaker. A set of data.frames containing the information from SAS import dictionaries for IBGE datasets.
  • IPUMS. Harmonization of Census data from several countries, including Brazil. Import functions for R, Stata, SAS and SPSS.

microdadosBrasil differs from those packages in that it:

  • updates import functions to more recent years
  • includes non-IBGE data, such as INEP Education Census and MTE RAIS (de-identified)
  • separates import code from dataset-specific metadata, as explained below.

Design principles

The main design principle is to separate the details of each dataset in each year (such as the folder structure, data files and import dictionaries of the original data) into metadata tables, saved as csv files in the extdata folder. The elements in these tables, along with a list of import dictionaries extracted from the SAS import instructions supplied by the data provider, serve as parameters to import a dataset for a specific year. Separating dataset-specific details from the actual code keeps the code short and makes it easier to extend to new datasets.
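The idea can be sketched roughly as follows. This is an illustration in base R, not the package's actual code: the metadata columns (`year`, `data_file`, `dic_name`), the dictionaries, the file names and the helper `read_year` are all hypothetical, and the real package reads its metadata from csv files and parses the data with readr.

```r
# Hypothetical metadata table: one row per year, naming the data file and
# the import dictionary that applies to it.
metadata <- data.frame(
  year      = c(2001, 2002),
  data_file = c("PES2001.TXT", "PES2002.txt"),
  dic_name  = c("dic_2001", "dic_2002"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionaries: the same variables sit at different positions
# in different years.
dics <- list(
  dic_2001 = data.frame(var_name = c("UF", "IDADE"), start = c(1, 3),
                        width = c(2, 3), stringsAsFactors = FALSE),
  dic_2002 = data.frame(var_name = c("UF", "IDADE"), start = c(1, 5),
                        width = c(2, 3), stringsAsFactors = FALSE)
)

# Generic reader: everything year-specific comes from the metadata row and
# its dictionary, so the function body never mentions a particular year.
read_year <- function(year, metadata, dics, root) {
  md <- metadata[metadata$year == year, ]
  if (nrow(md) != 1) stop("no metadata for year ", year)
  dic <- dics[[md$dic_name]]
  lines <- readLines(file.path(root, md$data_file))
  fields <- lapply(seq_len(nrow(dic)), function(i) {
    substr(lines, dic$start[i], dic$start[i] + dic$width[i] - 1)
  })
  names(fields) <- dic$var_name
  as.data.frame(fields, stringsAsFactors = FALSE)
}

# Demo with made-up files written to a temporary folder.
root <- tempfile("microdata_")
dir.create(root)
writeLines(c("33045", "35007"), file.path(root, "PES2001.TXT"))
writeLines(c("33xx045", "35xx007"), file.path(root, "PES2002.txt"))

d2001 <- read_year(2001, metadata, dics, root)
d2002 <- read_year(2002, metadata, dics, root)
```

Under this layout, supporting another year means adding one metadata row and one dictionary; `read_year` itself stays unchanged.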

ergonomics over speed (develop)
