Commit 39589fc

Added first slide deck using quarto/revealjs.
1 parent 5f81f89 commit 39589fc

9 files changed: +224 -1 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -12,3 +12,7 @@ data/msft_pr_coding.csv
 
 # Mac
 .DS_Store
+
+# Quarto
+*_files
+*.html

readme.markdown

Lines changed: 3 additions & 1 deletion
@@ -61,6 +61,8 @@ There is both a notebook and a slide deck for most segments.
 The slides are available in zip archives containing Keynote and Powerpoint versions in the releases section (on Github).
 Please note that the Keynote slides are (usually) the ones actually presented.
 
+**Note:** For June 2024, I am testing a [new slide technology](https://quarto.org/docs/presentations/revealjs/) for the first slide deck only, which is included in the slides folder. A full set of slides is available as described above.
+
 
 ## Preparing for the course
 
@@ -117,7 +119,7 @@ Once your local container has been created, you can return to it using the follo
 1. On the Welcome tab, under the heading "Recent," click the "carma_python in a unique volume" link.
 
 
-## API configuration
+## API credential configuration
 
 **Note:** I will refer to the configuration instructions below during the course, but you do not need to follow these when preparing for the course.
 

slides/0a.qmd

Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
+---
+title: "Introduction to Python for Research"
+subtitle: "0a: Introduction"
+author: "Jason T. Kiley"
+format:
+  revealjs:
+    theme: dark
+    css: _style.css
+    slide-number: true
+
+---
+
+## {.center}
+
+::: r-fit-text
+[github.com/jtkiley](https://github.com/jtkiley/)
+:::
+
+## Related
+
+- CARMA 2020 (overlaps with this course): Introduction to Python and Content Analysis of Text. ([Github](https://github.com/jtkiley/2020_carma_python))
+- Seminar materials (overlaps with this course): Text Analysis: Planning to Publication. ([Github](https://github.com/jtkiley/text_seminar))
+- Text analysis and machine learning workshop at WU (Oct. 2018) and RSM (Oct. 2019).
+- AOM Big Data workshop with Tim Hannigan, Hovig Tchalian, and Laura Nelson. ([Github](https://github.com/jtkiley/curation_workshop))
+
+## Course Agenda
+
+- Tools: Python, packages and environments
+- Basics: Python syntax and conventions, Jupyter Notebooks
+- Data handling and project planning
+- Data gathering and assembly
+
+# Overview
+
+## Overview
+
+- What you really need to know about Python.
+- Resources for learning.
+- A brief R comparison.
+
+# What do I really need to know about Python?
+
+## Why Python?
+
+- Approachability: well-designed modern programming language that handles a lot for you.
+- Features: many things have been built already, and you simply "glue" them together.
+- Learning resources: wide popularity in academia and practice means that there are extensive resources.
+- Scalability: from your computer, to the cloud, to a computing cluster, you can use largely the same tools.
+
+## Python Fluency
+
+- [Basics]{style="color:lightblue;"}
+- [Data Preparation]{style="color:lightgreen;"}
+- [Good-enough Programming]{style="color:lightyellow;"}
+- [Software Engineering]{style="color:pink;"}
+
+## [Basics]{style="color:lightblue;"}
+
+- Skills
+    - Software: Python interpreter, Jupyter Notebooks, VS Code
+    - Variable types: strings, ints, floats
+    - Objects and methods: lists, dictionaries
+    - Packages: importing and installing
+    - Documentation: official and community
+- Time: 2-4 hours
+- Necessity: Largely unavoidable
+
+
+## [Data Preparation]{style="color:lightgreen;"}
+
+- Skills
+    - Software: pandas
+    - Reading data formats (built-in)
+    - Slicing, views, `df.loc[]`
+    - Operations on columns and rows
+    - Reshaping
+    - Merging and querying
+- Time: 1-2 days and ongoing
+- Necessity: Needed and high ROI
+
+
+## [Good-enough Programming]{style="color:lightyellow;"}
+
+- Skills
+    - Loops
+    - Writing functions
+    - Reading and writing files (the hard way)
+    - Throwing and handling exceptions
+    - Using additional packages
+- End point: working, reusable script
+- Time: 1 week and ongoing; divisible
+- Necessity: Helpful and good ROI
+
+
+## [Software Engineering]{style="color:pink;"}
+
+- Skills
+    - Classes and inheritance\*
+    - Package development\*
+    - Version control\*
+    - Unit testing and continuous integration
+    - Cross-version support
+    - Open source contributions
+- Time: A lot
+- Necessity: Not at all; good for the field
+
+
+# What resources are available for learning?
+
+## Pandas documentation
+
+::: columns
+::: {.column width="50%" #vcenter}
+Comparison with Stata
+:::
+
+::: {.column width="50%" #vcenter}
+![](_img/0a_stata.png)
+:::
+:::
+
+::: footer
+See more: [pandas documentation](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_stata.html)
+:::
+
+## Stack Overflow
+
+::: columns
+::: {.column width="50%" #vcenter}
+Search for what you are trying to do, merging on multiple columns with different names, in this case.
+:::
+
+::: {.column width="50%" #vcenter}
+![](_img/0a_so.png)
+:::
+:::
+
+::: footer
+See more: [Stack Overflow](https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns)
+:::
+
+## Python for Data Analysis
+
+::: columns
+::: {.column width="50%" #vcenter}
+Wes McKinney is the creator of pandas and other open source projects.
+:::
+
+::: {.column width="50%" #vcenter}
+![](_img/0a_book.png)
+:::
+:::
+
+::: footer
+For more: [Wes McKinney](https://wesmckinney.com/book/)
+:::
+
+
+## Other Resources
+
+- edX. Provides many courses that use or teach Python that are relevant for data work (free).
+- Self-study tracks from my seminar. Includes resources for data handling, data retrieval, machine learning.
+- YouTube. Has many content creators, covering Python, data science, and software development.
+
+# Python and R
+
+## What About R?
+
+- R is great overall, especially compared to a lot of commercial stats software.
+- Compared to Python, it is less general purpose, so some useful packages may not have analogues.
+- The syntax (from S in the 1970s) is sometimes quite arcane.
+- Best of both worlds:
+    - Gather and prep data in Python.
+    - If needed, use R for analyses.
+
+## Stack Overflow - Most Popular
+
+![](_img/0a_r.png)
+
+::: footer
+See more: [Stack Overflow](https://survey.stackoverflow.co/2023/#programming-scripting-and-markup-languages)
+:::
+
+## Stack Overflow - Most Desired
+
+![](_img/0a_r2.png)
+
+::: footer
+See more: [Stack Overflow](https://survey.stackoverflow.co/2023/#section-admired-and-desired-programming-scripting-and-markup-languages)
+:::
+
+# Getting Started
+
+## Getting Started
+
+- Two approaches (choose one):
+    - Github Codespaces (cloud)
+    - VS Code, Docker, local container
+- You'll see me use both.
+
+# Hands on
+
+## Summary
+
+- Using Python for data analysis is not exactly programming, and you already have much of the knowledge you need.
+- Capturing all of our work in code that runs is a best practice that promotes reproducibility, and that helps us most of all.
+- We will start using our container in the next segment, so make sure it is set up and ready (or ask for help).
+
+# Break
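
Not part of the commit: the "Good-enough Programming" slide in slides/0a.qmd lists loops, writing functions, reading and writing files (the hard way), and throwing and handling exceptions. A minimal sketch of how those skills might combine in one small script follows. The function names, the file name, and the sample data are hypothetical illustrations and do not come from the course materials.

```python
import os
import tempfile


def write_scores(path, scores):
    """Write name,score pairs to a text file, one pair per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for name, score in scores.items():
            f.write(f"{name},{score}\n")


def read_scores(path):
    """Read name,score lines back into a dict, raising ValueError on malformed lines."""
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split(",")
            if len(parts) != 2:
                raise ValueError(
                    f"line {line_number}: expected 'name,score', got {line!r}"
                )
            name, raw_score = parts
            scores[name] = float(raw_score)  # float() raises ValueError on bad input
    return scores


# Made-up sample data, written to a temporary directory that is cleaned up afterwards.
sample = {"alpha": 1.5, "beta": 2.0}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.csv")
    write_scores(path, sample)
    loaded = read_scores(path)

    # Handle an expected failure instead of letting the script crash.
    try:
        read_scores(os.path.join(tmp, "missing.csv"))
        missing_handled = False
    except FileNotFoundError:
        missing_handled = True

print(loaded)
```

In day-to-day data work, pandas readers would usually replace the manual parsing here; the sketch only shows the "hard way" the slide refers to.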

slides/_img/0a_book.png

4.11 MB

slides/_img/0a_r.png

489 KB

slides/_img/0a_r2.png

552 KB

slides/_img/0a_so.png

739 KB

slides/_img/0a_stata.png

1.03 MB

slides/_style.css

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+/*-- scss:defaults --*/
+
+
+/*-- scss:rules --*/
+
+#vcenter {
+  vertical-align: middle;
+}
