@@ -13,13 +13,13 @@ kernelspec:
13
13
name : python3
14
14
---
15
15
16
- # Foreword -- TBD
16
+ # Foreword
17
17
18
18
* Roger D. Peng*
19
19
20
20
* Johns Hopkins Bloomberg School of Public Health*
21
21
22
- * 2022-01-04 *
22
+ * 2023-11-30 *
23
23
24
24
The field of data science has expanded and grown significantly in recent years,
25
25
attracting excitement and interest from many different directions. The demand for introductory
@@ -44,9 +44,10 @@ is and what the implications are for the activities in which members of the fiel
44
44
45
45
The first important concept addressed by this book is tidy data, which is a format for
46
46
tabular data formally introduced to the statistical community in a 2014 paper by Hadley
47
- Wickham. The tidy data organization strategy has proven a powerful abstract concept for
48
- conducting data analysis, in large part because of the vast toolchain implemented in the
49
- Tidyverse collection of R packages. The second key concept is the development of workflows
47
+ Wickham. Although originally popularized within the R programming language community
48
+ via the Tidyverse package collection, the tidy data format is a language-independent concept
49
+ that facilitates the application of powerful generalized data cleaning and wrangling tools.
50
+ The second key concept is the development of workflows
50
51
for reproducible and auditable data analyses. Modern data analyses have only grown in
51
52
complexity due to the availability of data and the ease with which we can implement complex
52
53
data analysis procedures. Furthermore, these data analyses are often part of
@@ -61,7 +62,7 @@ collaboration is a core element of data science.
61
62
This book takes these core concepts and focuses on how one can apply them to * do* data
62
63
science in a rigorous manner. Students who learn from this book will be well-versed in
63
64
the techniques and principles behind producing reliable evidence from data. This book is
64
- centered around the use of the R programming language within the tidy data framework ,
65
+ centered around the implementation of the tidy data framework within the Python programming language ,
65
66
and as such employs the most recent advances in data analysis coding. The use of Jupyter
66
67
notebooks for exercises immediately places the student in an environment that encourages
67
68
auditability and reproducibility of analyses. The integration of git and GitHub into the
0 commit comments