Merge pull request #412 from UBC-DSCI/feat-foreward

trevorcampbell · web-flow · commit 3846665fea76 · 2022-01-07T20:04:38.000-08:00
foreword added
diff --git a/foreword-text.Rmd b/foreword-text.Rmd
@@ -0,0 +1,63 @@
+# Foreword {-}
+
+*Roger D. Peng*
+
+*Johns Hopkins Bloomberg School of Public Health*
+
+*2022-01-04*
+
+The field of data science has expanded and grown significantly in recent years, 
+attracting excitement and interest from many different directions. The demand for introductory
+educational materials has grown concurrently with the growth of the field itself, leading to
+a proliferation of textbooks, courses, blog posts, and tutorials. This book is an important
+contribution to this fast-growing literature, but given the wide availability of materials, a
+reader may be inclined to ask, "What is the unique contribution of *this* book?" To
+answer that question, it is useful to step back for a moment and consider the development
+of the field of data science over the past few years.
+
+When thinking about data science, it is important to consider two questions: "What is
+data science?" and "How should one do data science?" The former question is under active
+discussion amongst a broad community of researchers and practitioners, and there does
+not appear to be much consensus to date. However, there seems to be a general understanding
+that data science focuses on the more "active" elements&mdash;data wrangling, cleaning, and
+analysis&mdash;of answering questions with data. These elements are often highly
+problem-specific and may seem difficult to generalize across applications. Nevertheless, over time we
+have seen some core elements emerge that appear to repeat themselves as useful concepts
+across different problems. Given the lack of clear agreement over the definition of data
+science, there is a strong need for a book like this one to propose a vision for what the field
+is and what the implications are for the activities in which members of the field engage.
+
+The first important concept addressed by this book is tidy data, which is a format for
+tabular data formally introduced to the statistical community in a 2014 paper by Hadley
+Wickham. The tidy data organization strategy has proven a powerful abstract concept for
+conducting data analysis, in large part because of the vast toolchain implemented in the
+Tidyverse collection of R packages. The second key concept is the development of workflows
+for reproducible and auditable data analyses. Modern data analyses have only grown in
+complexity due to the availability of data and the ease with which we can implement complex
+data analysis procedures. Furthermore, these data analyses are often part of 
+decision-making processes that may have significant impacts on people and communities. Therefore,
+there is a critical need to build reproducible analyses that can be studied and repeated by
+others in a reliable manner. Statistical methods clearly represent an important element
+of data science for building prediction and classification models and for making inferences
+about unobserved populations. Finally, because a field can succeed only if it fosters an
+active and collaborative community, it has become clear that being fluent in the tools of
+collaboration is a core element of data science.
+
+This book takes these core concepts and focuses on how one can apply them to *do* data
+science in a rigorous manner. Students who learn from this book will be well-versed in
+the techniques and principles behind producing reliable evidence from data. This book is
+centered on the use of the R programming language within the tidy data framework,
+and as such employs the most recent advances in data analysis coding. The use of Jupyter
+notebooks for exercises immediately places the student in an environment that encourages
+auditability and reproducibility of analyses. The integration of Git and GitHub into the
+course provides a key tool for teaching collaboration and community, both of which are
+critical to data science.
+
+The demand for training in data science continues to increase. The availability of large
+quantities of data to answer a variety of questions, the computational power available to
+many more people than ever before, and the public awareness of the importance of data for
+decision-making have all contributed to the need for high-quality data science work. This
+book provides a sophisticated first introduction to the field of data science and offers
+a balanced mix of practical skills and generalizable principles. As we continue to
+introduce students to data science and train them to confront an expanding array of data
+science problems, they will be well-served by the ideas presented here.
diff --git a/index.Rmd b/index.Rmd
@@ -34,6 +34,9 @@ output:
 
 ---
 
+```{r foreword, child="foreword-text.Rmd"}
+```
+
 ```{r preface, child="preface-text.Rmd"}
 ```
 
diff --git a/pdf/index.Rmd b/pdf/index.Rmd
@@ -28,6 +28,9 @@ knitr::opts_chunk$set(fig.pos = "H",
 
 ```
 
+```{r foreword, child="foreword-text.Rmd"}
+```
+
 ```{r preface, child="preface-text.Rmd"}
 ```