-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathSlides.qmd
More file actions
512 lines (363 loc) · 16 KB
/
Slides.qmd
File metadata and controls
512 lines (363 loc) · 16 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
---
title: "Introduction to Git / GitHub in Economics I"
author: "Wooyong Park"
format:
revealjs:
# theme: league
transition: slide
math:
engine: katex
title-slide-attributes:
data-background-image: images/main.jpg
fontsize: 16pt
footer: "Applied Economic Methods Study Group"
embed-resources: true
---
<style>
pre code {
font-size: 2em;
}
code {
font-size: 1em;
}
</style>
# Why bother?
## The best way to learn coding is to{#sec-first_slide}
- <p style="font-size: 26px;">Choose the right first language. Python didn't work for me in the beginning.</p>
- <p style="font-size: 26px;">Try a lot of [EDA](/#/sec-eda)(explanatory data analysis).</p>
- <p style="font-size: 26px;">No need to hurry. You probably don't know what you don't know because you don't have to.</p>
These slides cannot cover everything you can do with Git. We will only cover *version controls*, *collaboration*, and *GitHub pages*. But I did bring a gift in the last slide:)
## What is Git?
Git is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively.
- Version Control
- Backup
- Collaboration
<br>
<img src="images/version_control_eg2.png" width="150px" />
## What is Git?
### Examples in Economic Research
- Development of an algorithm software[(example)](https://github.com/bcallaway11/did)
- Large size labs with numbers of collaborators
- Posting codes for visibility[(example)](https://github.com/wyeconomics/econometrics2final)
- Creating websites[(example)](https://wyeconomics.github.io) [(source code)](https://github.com/wyeconomics/wyeconomics.github.io)
<br>
### What is GitHub?
Git creates a detailed history of your directory(folder). GitHub is where you post it, or where you download resources from.
You can share your work or copy others' work.
## Setting up
### for Windows
1. Go to [https://git-scm.com](https://git-scm.com).
2. Click \[Download 2.xx.x for Windows\].
3. Set all options to default.
4. Open `Git Bash` in Windows toolbar.
<br>
### for Mac
1. Go to [https://git-scm.com](https://git-scm.com).
2. Click \[Download 2.xx.x for Mac\].
3. Open the package and install Git. After installation, you can trash the installator package.
4. Open terminal with `Cmd` + `spacebar`.
If Git is properly installed, writing `git --version` in the terminal(shell) would display the following:
```{bash}
#| echo: true
#| style: "font-size: 0.6em"
git --version
```
## Setting Up
1. Set your `user.name` and `user.email` to the ones you have in your GitHub account.
```{bash}
#| echo: true
#| style: "font-size: 0.6em"
git config --global user.name "wyeconomics"
git config --global user.email "tommypark822@naver.com"
```
Single(`-`) and double dash(`--`) stand for options.
<div style="margin-top: 0.5em;"></div>
::: {.fragment}
2. Check your current working directory with `pwd`, set it to `Desktop` if it isn't, and create a `practice` directory.
```{bash}
#| echo: true
#| eval: false
#| style: "font-size: 0.6em"
pwd
cd Desktop
mkdir practice
```
Usually a project directory would have multiple subdirectories including `codes`, `doc`, `data`, `figures`, and `temp`.
**Tip)** `cd ..` leads to the outer folder containing current working directory.
:::
<div style="margin-top: 0.5em;"></div>
::: {.fragment}
3. Once the main structure of a directory is made, initialize git with `git init`, and check it with `Cmd+Shift+.`
```{bash}
#| echo: true
#| eval: false
cd practice
git init
```
:::: {.callout-tip title="Tipbox<br>Directory vs Repository"}
These two are near identical in normal use. Locally, it is the folder of interest in a computer. However, repositories are directories with Git activated. Thus, folders in GitHub is mostly called a repo than a directory as it always entails Git.
::::
:::
## Adding Files
### `REAEME` file
Most projects include `README` files to explain the content of the project and what's in the repo.
```{bash}
#| echo: true
#| eval: false
#|
vim README.txt
```
Write "This is a practice repo of Git", and close it with `Ctrl(Cmd) + X` and `enter`.
:::: {.callout-tip title="Tipbox<br>Terminal vs Shell"}
You might have come across these two words. Put simply, shell is a program that covers the core of the computer. In other words, it is a program, a translator between you and the computer, that touches the core functions of a computer. Shell could be incarnated in a Command Line Interface(CLI) or Graphical User Interface(GUI). But usually, when we say shell, it refers to CLIs. Terminal is the interface program where you can interact with the shell. You can consider it a imitated CLI on a GUI - or more accurately, a TUI. - An eptiome og GUI is the Windows File Explorer.
::::
## Adding Files
### `html` output with Jupyter Notebook
Let's create a `html` output with either Python(.ipynb) or R(.rmd). Save it as `output.html` in the `doc` subdirectory.
::: {.panel-tabset}
#### Python
```{python .code-overflow-scroll}
#| echo: true
#| output: true
#| style: "font-size: 0.6em"
# Practice with the `penguins` dataset
## Cell one : loading data
import seaborn as sns
import pandas as pd
df = sns.load_dataset('penguins')
df.head(3)
## Cell two : processing data
df = df.dropna(subset = ['sex', 'body_mass_g'], how = 'any', axis = 0)
### Cell three : plotting figure
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (9,5))
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
df.plot(kind='scatter', x='flipper_length_mm', y='body_mass_g', c='coral', ax=ax1)
sns.regplot(x='flipper_length_mm', y='body_mass_g', data=df, scatter_kws={"alpha": 0.5}, ax=ax2)
plt.show()
```
#### R
```{r}
#| echo: true
#| output: true
#| style: "font-size: 0.6em"
# Cell one: Load required libraries
library(tidyverse)
library(palmerpenguins)
library(patchwork)
# Cell two: Load dataset
df <- penguins
head(df, 3)
# Cell three: Process data
df <- df %>% filter(!is.na(sex), !is.na(body_mass_g))
# Cell four: Plotting figure
p1 <- ggplot(df, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point(color = "coral") +
labs(title = "Scatter Plot")
p2 <- ggplot(df, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm") +
labs(title = "Regression Plot")
p1 + p2
```
:::
## Git Workflow

## Staging and Commiting
```{bash}
#| eval: false
#| echo: true
#| output: false
git add . #to add a particular file, git add filename
git commit -m "first commit"
```
Then, all the files are now recorded in part of your commitment history. Check `git log`.\
In `git log`, `hash` is the ID specific to a particular commit.
```{bash}
#| eval: false
#| echo: true
#| output: false
git log
```
Make any slight modification to the `README.txt` file, and let's check if git spots the difference.
```{bash}
#| eval: false
#| echo: true
#| output: false
git diff README.txt
```
<br>
**Tip)** Try to leave the staging area clean.
## Checking History
`git log` can include multiple options.
```{bash}
#| eval: false
#| echo: true
#| output: false
git log -3 # restricts the number of commits to three (most recent)
git log -3 README.txt # displays log of only README.txt
git log --since='Month Day Year' # displays only the commits made after the specified date
git log --since='M D Y' --until='M D Y' # displays only the commits made after and until the specified date
```
<br>
## Undoing Changes
- unstaging a file : `git reset HEAD` or `git reset HEAD file_name`
:::: {.callout-tip title="Why call HEAD?"}
You might wonder why we have to call HEAD when HEAD refers to the last commit. This is because unstaging a file actually means matching the staging area to the last commit. Technically, unstaging is not emptying the staging area, but matching the staging area to the last commit.
::::
- Undoing changes to an unstaged file : `git checkout .` or `git checkout -- file_name`
- Reverting Commits : `git revert HEAD` or `git revert hash`
:::: {.callout-tip title="What is Reverting?"}
Sometimes, we commit files that contains an error, and then spot the issue. Naturally, we need a command of restoring a repo to the state before the previous commit and make a new commit with the previous version. This is what `git revert` does. Also, the `git revert` command opens a text editor in the shell to add a commit message. However, it is also a command that often generates conflict. **I do not recommend `git revert` unless reverting is inevitable. If the command sounds less intuitive, ignore it. Totally fine.**
::::
## Pushing a Repo{#sec-push}
Let's push the repo and the html file we made to our GitHub account.
1. Click `New` in the top-left banner.
2. Fill the `Repository Name*` with the local repo name.
3. Set options to `Public` and don't add a `README` file.
4. Click `Create Repository`.
<br>
:::{.fragment}
5. Go to `< > Code` in the left-top banner.
6. Also click `< > Code` button in the right-top.
7. Copy the `HTTPS` URL, which would say [https://github.com/username/practice.git].
:::
<br>
:::{.fragment}
8. Back to the terminal, write `git remote add origin https://github.com/username/practice.git`.
9. Finally, write `git push -u origin main`. This means that we're pushing our local repo to the origin repo's main branch. **Git might require you to insert ID and password. For the password, please insert [PAT](/#/sec-pat).**
10. Check the GitHub repo to see your local repo uploaded.
:::
## Pushing Cautions
Pushing your local repo to a public repo means you share the items and history of your local repo. This requires a few cautions.
1. Without inevitable reasons, don't include data to the `git push`, especially raw data. This is not only because it takes a lot of space, but most raw data are either publicly available or classified. You can lead the readers to the data website if it's public in the `README.md`. If the data is classified, you have a responsibility not to disclose it. To avoid such issues, you can include the raw data files into the `.gitignore` file and only commit and push selected files of a repo.
2. Check if your codes include personal information. This could often happen if your code includes personal API keys. If you upload them, you'll get an email from GitHub that your information is exposed in most cases. However, avoiding is much easier than solving problems.
## Publishing via GitHub Pages
1. Go to the `Settings` page of the repo in GitHub.
2. In the left toc, you can find `Pages`. Click it.
3. In the `Build and Deployment` setting, set the source to `Deploy from a branch` and Branch to `main` and `/(root)`.
4. After you build it, go to `https://username.github.io/practice/doc/output.html`
## Cloning a repo
Let's practice cloning with with package repo of [Callaway and Sant'anna(2021)](https://www.sciencedirect.com/science/article/pii/S0304407620303948).
```{bash}
#| echo: true
#| eval: false
#| output: true
git clone https://github.com/bcallaway11/did.git
```
Also add the cloned repo a remote name, so that pull/pushing to the original repo(in GitHub) can be modified.
(It is impossible to push to the original repo without the author's permission, so just take it as an example.)
```{bash}
#| echo: true
#| eval: false
git remote add origin https://github.com/bcallaway11/did.git # adds the did package repo and alias it as origin
git remote -v
```
## Pulling
Having a cloned version of the `did` package, you can look into the repo in your local computer. However, the author might make changes to package. You can pull the changes via `git pull`.
### Fetch + Merge
- `git fetch remote_name`: fetches a Git remote with the specified name into the current local repo
- `git fetch remote_name branch_name`: fetches a Git remote’s branch with the specified remote and branch name
- `git merge remote_name branch_name`: merges the local repo to the remote’s branch with the specified remote and branch name
### Pulling
- `git pull remote_name branch_name`: fetches and merges the remote to the local
```{bash}
#| echo: true
#| eval: false
#| output: false
cd did
git pull origin master
```
::: {.fragment}
:::: {.callout-tip title="Why not clone again?"}
Cloning is copying the entire repo. Pulling is fetching and merging a 'branch' out of a remote repo. May be now is the right time to discuss branches.
::::
:::
## Brief Intro to Branches
:::: {.columns}
:::{.column width="60%"}
<img src="images/git-workflow.png" width="400px" />
[Image Source](https://www.edrawmax.com/article/gitflow-diagram.html)
### Related Commands
```{bash}
#| eval: false
#| echo: true
#| output: false
#| style: "font-size: 0.6em"
git branch #displays which branches exist in the repo(* denotes the current branch)
git branch branch_name #creates a branch
git branch -m old_name new_name #renames a branch
git switch branch_name #moves to the branch
git diff branch1 branch2 #displays diffs between branches
git merge source_b destination_b #merges one branch to the other
```
:::
:::{.column width="40%"}
Branches enable you to isolate changes for specific features, bug fixes, or experiments without affecting the main codebase(usually the `main` or `master` branch). By creating branches, teams can work on multiple tasks concurrently, merging them back into the main branch once they are complete and tested.
One can think of the main branch as the **ground truth**. Each branch exists for a specific task, and once the task is complete and the process is confirmed to be ground true, it is then merged to the main branch.
:::
::::
## General Tips
### General
- Make codes and repo reproducible. Learn how to cleanly code.
- Remember that you tomorrow is a new reader of your codes.
- Never forget to commit. Keep a consistent log of your work.
- In a collaborative environment, notify your collaborators on what you commit to the `main` branch.
### Cleanliness
- Remember to delete idle branches.
- Don't `git add .` indiscrimminately and try to add files selectively.
- Don't add unnecessary large files. Keep them in the `.gitignore`.
## Useful Resources
### Websites
- <p style="font-size: 26px;">[Cheatsheet](/git_cheatsheet.html)</p>
- <p style="font-size: 26px;">[Atlassian - Git Tutorial](https://www.atlassian.com/git)</p>
- <p style="font-size: 26px;">[w3schools - Git Tutorial](https://www.w3schools.com/git/)</p>
- <p style="font-size: 26px;">[DataCamp Course](https://app.datacamp.com/learn/skill-tracks/github-foundations)</p>
### Books
- <p style="font-size: 26px;">**Pro Git** by Ben Straub and Scott Chacon</p>
## Example EDA{#sec-eda}
```{r}
#| echo: false
library(tidyverse)
ggplot(iris, aes(Petal.Length, Sepal.Length)) +
geom_point(aes(color = Species)) +
geom_smooth(method = 'lm') +
labs(title = 'Correlation Analysis of Iris Size')
```
[Return](/#/sec-first_slide)
## Personal Access Tokens{#sec-pat}
To check your Personal Access Token (PAT) in GitHub, follow these steps:
If you cannot find or remember your PAT:
1. You cannot view its value again.
2. You will need to generate a new PAT:
3. Follow the steps in “View Existing Personal Access Tokens” to navigate to the token list.
4. Revoke the old token if it’s no longer needed or compromised.
5. Click Generate new token to create a new one.
## Personal Access Tokens
Verify PAT is Working.
To check if a PAT is valid:
1. Use the token in a GitHub API call:
```{bash}
#| echo: true
#| eval: false
#| style: "font-size: 0.6em"
curl -H "Authorization: token YOUR_PERSONAL_ACCESS_TOKEN" https://api.github.com/user
```
If valid, this returns details about your GitHub user.
2. Test with Git:
- Clone a repository:
```{bash}
#| echo: true
#| eval: false
#| style: "font-size: 0.6em"
git clone https://<username>@github.com/<username>/<repo>.git
```
- Use the token as the password when prompted.
<br>
Tips:
- Store PAT Securely: Use credential managers, environment variables, or secret management tools to keep it safe.
- Limit Scopes: Create tokens with the minimal necessary permissions to reduce risks.
- Use Fine-Grained Tokens: These provide more granular permissions compared to classic tokens.
<div style="margin-top: 0.5em;"></div>
[Return](/#/sec-push)