Toy Survey Data

An R package designed to simplify the process of creating fake survey data.

Why create this package?

This R package began as a project to create a simple fake dataset based on annual Community Health Assessment surveys. I was preparing to give a presentation on cleaning and validating survey data, but I did not have access to real survey data, for which privacy and confidentiality are major concerns. I reviewed some existing survey creation packages, but they either did not allow the level of customization I needed or required real data, which I did not have, as a starting point for their models.

What does this package do?

This package was initially built to generate responses for individual categorical questions based on a priori knowledge of response proportions. Handling for some numeric and date variables was added later, mostly based on normal distributions.
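The idea can be illustrated with a minimal base R sketch. It does not use this package's functions; the question names, proportions, and distribution parameters below are all assumptions chosen for illustration.

```r
# Base R sketch only; these are not the package's own functions.
set.seed(42)  # make the fake data reproducible
n <- 500      # number of simulated respondents

# Categorical question: sample labels using known response proportions.
health_levels <- c("Excellent", "Good", "Fair", "Poor")
health_props  <- c(0.25, 0.45, 0.20, 0.10)  # assumed proportions, sum to 1
general_health <- sample(health_levels, size = n, replace = TRUE,
                         prob = health_props)

# Numeric question: draw from a normal distribution, then round and
# clamp the values to a plausible range.
age <- round(rnorm(n, mean = 45, sd = 15))
age <- pmin(pmax(age, 18), 99)

# Date question: normally distributed offsets (in days) around an
# assumed central survey date.
survey_date <- as.Date("2023-06-15") + round(rnorm(n, mean = 0, sd = 10))

toy <- data.frame(general_health, age, survey_date)
head(toy)
```

With a large enough sample, `prop.table(table(toy$general_health))` should land close to the proportions that were supplied.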

While functions can be used individually, the package is designed to allow you to build a settings table, then run most functions on that table to generate the full dataset.
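As a rough sketch of the table-driven approach, the base R example below builds a small settings table and generates one column per variable from it. The column names (`variable`, `label`, `prop`) and the helper `build_toy_data()` are hypothetical and are not the package's actual schema or API; see the Settings Table Design vignette for the real layout.

```r
# Hypothetical settings table: one row per response option.
# Column names are illustrative, not the package's actual schema.
settings <- data.frame(
  variable = c("smoker", "smoker", "insured", "insured", "insured"),
  label    = c("Yes", "No", "Yes", "No", "Unsure"),
  prop     = c(0.15, 0.85, 0.80, 0.15, 0.05)
)

# Hypothetical helper: generate one column per variable in the table.
build_toy_data <- function(settings, n) {
  by_var <- split(settings, settings$variable)
  cols <- lapply(by_var, function(s) {
    sample(s$label, size = n, replace = TRUE, prob = s$prop)
  })
  as.data.frame(cols)
}

set.seed(1)
toy <- build_toy_data(settings, n = 200)
str(toy)
```

Keeping every setting in one table means a question can be added or changed by editing a row rather than rewriting code.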

Getting started

This package is still in active development, but you can install it directly from GitHub using the code devtools::install_github("ajstamm/toysurveydata").

To create your dataset, you will need to read in a table of pre-defined variable settings. Learn how to set up your table in the Settings Table Design vignette. The package is designed so that if your table is set up correctly, you can run nearly all functions on that one table.

Limitations of this package

This package is designed to be very simple and easy to run, so it does not consider relationships between variables. It does include optional missingness and a function that introduces different kinds of random error into your data.
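Random missingness of this kind can be sketched in base R as follows. The helper `add_missing()` and its `pct_missing` argument are hypothetical names used for illustration, not functions from this package.

```r
# Base R sketch; add_missing() is a hypothetical helper, not a package function.
# It blanks out a fixed share of values chosen completely at random.
add_missing <- function(x, pct_missing) {
  n_missing <- round(length(x) * pct_missing)
  x[sample(length(x), n_missing)] <- NA
  x
}

set.seed(7)
responses <- sample(c("Yes", "No"), size = 100, replace = TRUE)
with_gaps <- add_missing(responses, pct_missing = 0.10)
sum(is.na(with_gaps))  # number of values set to NA
```

Because the positions are chosen without regard to any other variable, the missingness here is unrelated to the rest of the data, which matches the limitation described above.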

Future plans

Add an option in the select-many function to require an exact number of selections
Add a function to handle ranked choice questions
Add error-creation functions for text values, such as random upper/lower case and misspellings
Add non-random missingness
Rethink or improve instructions for percent missing and number of options in the settings table
Maybe integrate with or suggest packages that handle things like random addresses
Maybe make the IP function at least nominally geographically sensitive
