- Vignettes and Portuguese documentation coming soon
- RAIS, CAGED and PNAD_Continua coming soon
This package contains functions to read the most commonly used Brazilian microdata easily and quickly. Importing Brazilian microdata can be tedious. Most data is provided in fixed-width files (fwf), with import instructions only for SAS and SPSS. Data usually comes subdivided by state (UF) or macro region (regiões), and filenames can vary over time for the same dataset. microdadosBrasil handles all these idiosyncrasies for you. Under the hood, the package uses readr for fwf data and data.table for .csv data, so reading is reasonably fast.
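Reading an fwf file comes down to applying an import dictionary of column positions to each line. The sketch below illustrates that idea in base R (`utils::read.fwf`) so it runs without extra packages; it is not the package's own code, which uses readr. The dictionary, the variable names, the helper `positions_to_widths()` and the sample records are all hypothetical.

```r
# Hypothetical import dictionary: 1-based start/end positions per variable,
# the kind of information found in a SAS INPUT statement. Names are
# illustrative, not taken from a real IBGE dictionary.
dic <- data.frame(
  var   = c("UF", "V0101", "V0102"),
  start = c(1, 3, 8),
  end   = c(2, 6, 9),
  stringsAsFactors = FALSE
)
dic <- dic[order(dic$start), ]  # widths must follow the file's column order

# Hypothetical helper: turn start/end positions into the widths vector that
# read.fwf() expects. A negative width skips that many characters, which
# covers gaps between consecutive variables.
positions_to_widths <- function(start, end) {
  widths <- numeric(0)
  cursor <- 1
  for (i in seq_along(start)) {
    gap <- start[i] - cursor
    if (gap > 0) widths <- c(widths, -gap)
    widths <- c(widths, end[i] - start[i] + 1)
    cursor <- end[i] + 1
  }
  widths
}

# Two fake records; position 7 holds a character no variable uses.
path <- tempfile(fileext = ".txt")
writeLines(c("352002X01", "332003Y02"), path)

w <- positions_to_widths(dic$start, dic$end)
d <- utils::read.fwf(path, widths = w, col.names = dic$var,
                     colClasses = "character")  # keep leading zeros
```

Reading every column as character keeps codes such as `01` intact; type conversion can then be done variable by variable.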
Currently the package includes import functions for:
| Source | Dataset | Import function | Period | Subdatasets |
|---|---|---|---|---|
| IBGE | PNAD | read_PNAD | 2001 to 2014 | domicilios, pessoas |
| IBGE | Censo Demográfico | read_CENSO | 2000 | domicilios, pessoas |
| IBGE | POF | read_POF | 2008 | several, see details |
| INEP | Censo Escolar | read_CensoEscolar | 1995 to 2014 | escolas, ..., see details |
| INEP | Censo da Educação Superior | read_CensoEducSuperior | 1995 to 2014 | see details |
To be added soon:
- Censo 2010 and 1991
- RAIS (de-identified version)
- Download functions
- Variable name harmonization
- Support for data that does not fit into memory
```r
install.packages("devtools")
install.packages("stringi")
devtools::install_github("lucasmation/microdadosBrasil")
```
```r
library(microdadosBrasil)

# Censo Demográfico 2000
# after downloading the data and unzipping it into the working directory, run:
d  <- read_CENSO('domicilios', 2000)
d2 <- read_CENSO('pessoas', 2000)

# PNAD 2002
download_sourceData("PNAD", 2002, unzip = TRUE)
d  <- read_PNAD("domicilios", 2002)
d2 <- read_PNAD("pessoas", 2002)

# Censo Escolar 2005
download_sourceData('CensoEscolar', 2005, unzip = TRUE)
d <- read_CensoEscolar('escola', 2005)
d <- read_CensoEscolar('escola', 2005, harmonize_varnames = TRUE)
```

This package is heavily influenced by similar efforts, which are great time savers, widely used and often unrecognized:
- Anthony Damico's scripts to read most IBGE surveys. Great if your data does not fit into memory and you want speed when working with complex survey design data.
- Data Zoom, by Gustavo Gonzaga, Claudio Ferraz and Juliano Assunção. Similar ease of use and harmonization of Brazilian microdata, for Stata.
- dicionariosIBGE, by Alexandre Rademaker. A set of data.frames containing the information from SAS import dictionaries for IBGE datasets.
- IPUMS. Harmonization of Census data from several countries, including Brazil. Import functions for R, Stata, SAS and SPSS.
microdadosBrasil differs from those packages in that it:
- updates import functions to more recent years
- includes non-IBGE data, such as INEP Education Census and MTE RAIS (de-identified)
- separates import code from dataset-specific metadata, as explained below.
The main design principle was to separate the details of each dataset in each year (such as folder structure, data files and import dictionaries of the original data) into metadata tables, saved as csv files in the extdata folder. The elements in these tables, along with a list of import dictionaries extracted from the data provider's SAS import instructions, serve as parameters to import a dataset for a specific year. Separating dataset-specific details from the actual code keeps the code short and makes it easier to extend to new datasets.
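A minimal sketch of this separation, under stated assumptions: the metadata table, its columns (`period`, `format`, `pessoas_file`, `pessoas_dic`), the `{UF}` filename placeholder, the dictionaries and the functions `get_import_params()` and `read_dataset()` are all hypothetical and do not reproduce the package's actual extdata layout or internals. It only shows how year-specific details can live in data while one generic function does the reading, including stacking files split by state. Base R is used so the example runs on its own.

```r
# Hypothetical metadata table, standing in for the csv files in extdata:
# one row per year; for each subdataset, a filename pattern and the name
# of the import dictionary to use. Note the filename change across years.
metadata <- data.frame(
  period       = c(2001, 2002),
  format       = c("fwf", "fwf"),
  pessoas_file = c("PES2001_{UF}.txt", "pes_2002_{UF}.txt"),
  pessoas_dic  = c("dic_pessoas_2001", "dic_pessoas_2002"),
  stringsAsFactors = FALSE
)

# Hypothetical import dictionaries (variable names and widths), as would be
# extracted from the provider's SAS input instructions.
dics <- list(
  dic_pessoas_2001 = data.frame(var = c("UF", "V8005"), width = c(2, 3),
                                stringsAsFactors = FALSE),
  dic_pessoas_2002 = data.frame(var = c("UF", "V0302", "V8005"),
                                width = c(2, 1, 3), stringsAsFactors = FALSE)
)

# Look up everything needed to import one subdataset in one year.
get_import_params <- function(metadata, subdataset, period) {
  row <- metadata[metadata$period == period, , drop = FALSE]
  if (nrow(row) != 1) stop("no metadata for period ", period)
  list(format  = row$format,
       pattern = row[[paste0(subdataset, "_file")]],
       dic     = row[[paste0(subdataset, "_dic")]])
}

# Generic reader: no year-specific logic; the metadata decides which files
# to read and which dictionary to apply.
read_dataset <- function(subdataset, period, ufs, metadata, dics, root = ".") {
  p <- get_import_params(metadata, subdataset, period)
  if (p$format != "fwf") stop("only fwf files are handled in this sketch")
  dic <- dics[[p$dic]]
  # One file per state: fill the {UF} placeholder for each requested state
  files <- vapply(ufs, function(uf) sub("{UF}", uf, p$pattern, fixed = TRUE),
                  character(1), USE.NAMES = FALSE)
  parts <- lapply(file.path(root, files), utils::read.fwf,
                  widths = dic$width, col.names = dic$var,
                  colClasses = "character")
  do.call(rbind, parts)  # stack the per-state pieces into one data frame
}

# Usage with fake 2002 files for two states
root <- tempfile("microdados_")
dir.create(root)
writeLines(c("352025", "354031"), file.path(root, "pes_2002_35.txt"))
writeLines("332040", file.path(root, "pes_2002_33.txt"))
d <- read_dataset("pessoas", 2002, ufs = c("35", "33"),
                  metadata = metadata, dics = dics, root = root)
```

In this scheme, supporting a new year means adding a metadata row and a dictionary, with no new import code.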
Ergonomics over speed (develop).