Commit a636bb4

Add Linux snippets documentation for data science tools
1 parent 1aa812e commit a636bb4

3 files changed: +144 -1 lines changed

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
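# Linux snippets

A collection of useful Linux (and macOS) command-line snippets for data science workflows. The snippets run in both zsh and bash, but a few rely on flags that differ between the GNU tools on Linux and the BSD tools on macOS (for example `sed -i`, `ps --sort`, and `du --max-depth`).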

## List files by size (descending)
```sh linenums="1"
ls -lhS
```

## Find files larger than 100MB
```sh linenums="1"
find . -type f -size +100M
```

## Count lines in all CSV files in a directory
```sh linenums="1"
wc -l *.csv
```

## Show top 10 memory-consuming processes
```sh linenums="1"
# GNU ps (Linux) only: BSD/macOS ps has no --sort. head -n 11 keeps the header row plus 10 processes.
ps aux --sort=-%mem | head -n 11
```
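
Where `--sort` is unavailable, sorting the `ps aux` output on its fourth column (`%MEM`) gives a similar result. The following is a minimal sketch; the function name `top_mem_processes` is a hypothetical helper, not part of any tool, and the header row is not kept at the top of the output.

```sh
# Hypothetical helper: list the N most memory-hungry processes (default 10)
# without relying on the GNU-only --sort flag.
top_mem_processes() {
  # %MEM is the 4th column of `ps aux`; sort on it numerically, descending.
  # The header row sorts as 0, so it normally falls outside the first N lines.
  ps aux | sort -k4,4nr | head -n "${1:-10}"
}

# Usage: top_mem_processes 5
```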

## Search for a pattern in all Python files
```sh linenums="1"
# Quote the glob so the shell passes it to grep instead of expanding it
grep -rnw . -e 'pattern' --include='*.py'
```

## Replace text in multiple files (in-place)
```sh linenums="1"
# macOS/BSD sed syntax; on Linux (GNU sed) drop the empty '' after -i
sed -i '' 's/oldtext/newtext/g' *.txt
```
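
One way to avoid the GNU/BSD difference is to attach a backup suffix directly to `-i`, a form both variants accept, and delete the backup afterwards. This is a sketch under assumptions: `replace_in_files` is a hypothetical helper name, and the old and new strings are assumed to contain no `/` or other characters special to sed.

```sh
# Hypothetical helper: in-place replacement that works with GNU and BSD sed.
# Assumes $old and $new contain no "/" or regex metacharacters.
replace_in_files() {
  old=$1
  new=$2
  shift 2
  for f in "$@"; do
    # -i.bak (suffix attached) is accepted by both sed variants;
    # remove the backup only if sed succeeded.
    sed -i.bak "s/$old/$new/g" "$f" && rm -f "$f.bak"
  done
}

# Usage: replace_in_files oldtext newtext *.txt
```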

## Download a file from the internet
```sh linenums="1"
curl -O https://example.com/file.csv
```

## Extract a tar.gz archive
```sh linenums="1"
tar -xzvf archive.tar.gz
```

## Monitor disk usage in current directory
```sh linenums="1"
du -sh *
```

## Show GPU usage (NVIDIA)
```sh linenums="1"
nvidia-smi
```

Sample output:

```sh linenums="1"
Wed Jun 11 12:18:15 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 PCIe               Off | 00000000:00:08.0 Off |                    0 |
| N/A   37C    P0             117W / 350W |  27085MiB / 81559MiB |     51%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA H100 PCIe               Off | 00000000:00:09.0 Off |                    0 |
| N/A   38C    P0              53W / 350W |      3MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    213705      C   /usr/bin/python3                          27076MiB |
+---------------------------------------------------------------------------------------+
```

## Check Python package versions in environment
```sh linenums="1"
pip freeze
```

## Create a virtual environment (Python 3)
```sh linenums="1"
python3 -m venv venv
source venv/bin/activate
```

## Kill a process by name
```sh linenums="1"
pkill -f process_name
```

## Count unique values in a CSV column
```sh linenums="1"
cut -d, -f2 file.csv | sort | uniq -c | sort -nr
```
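
The pipeline above also counts the header row as a value. A minimal sketch that skips the first line is shown here; `count_column_values` is a hypothetical helper name, and like the original it assumes a simple CSV with no quoted fields containing commas.

```sh
# Hypothetical helper: count distinct values in column $2 of CSV file $1,
# skipping the header row. Assumes no quoted fields that contain commas.
count_column_values() {
  file=$1
  col=$2
  # tail -n +2 starts output at the second line, dropping the header.
  tail -n +2 "$file" | cut -d, -f"$col" | sort | uniq -c | sort -nr
}

# Usage: count_column_values file.csv 2
```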

## Preview a CSV file (first 5 rows)
```sh linenums="1"
head -n 5 file.csv
```

## Check open ports
```sh linenums="1"
lsof -i -P -n | grep LISTEN
```

## Download all images from a webpage
```sh linenums="1"
wget -nd -r -P ./images -A jpg,jpeg,png,gif http://example.com
```

## Show the 10 largest files in a directory tree
```sh linenums="1"
find . -type f -exec du -h {} + | sort -rh | head -n 10
```

## Remove all files except .csv in a directory (recursive)
```sh linenums="1"
# Recurses into subdirectories and deletes without confirmation
find . ! -name '*.csv' -type f -delete
```
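
Because `-delete` is irreversible, it can help to list the matches first. The sketch below is one way to do that, assuming you only want the top level of a directory (hence `-maxdepth 1`, which the original command does not use); `list_non_csv` is a hypothetical helper name.

```sh
# Hypothetical helper: preview the non-CSV files directly inside directory $1
# (default: current directory) without deleting anything.
list_non_csv() {
  # -maxdepth 1 keeps find from descending into subdirectories.
  find "${1:-.}" -maxdepth 1 -type f ! -name '*.csv' -print
}

# Usage: list_non_csv ./data
# Once the list looks right, run the same find with -delete instead of -print.
```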

## Split a large CSV into smaller files (1000 lines each)
```sh linenums="1"
split -l 1000 bigfile.csv smallfile_
```
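
With plain `split`, only the first chunk contains the CSV header. If every chunk should be a valid CSV on its own, a sketch along these lines repeats the header in each part. `split_csv_with_header` is a hypothetical helper name, and it assumes no other files already share the output prefix.

```sh
# Hypothetical helper: split CSV $1 into chunks of $2 data rows, writing
# files named <prefix>aa.csv, <prefix>ab.csv, ... each starting with the header.
# Assumes no pre-existing files match the prefix $3.
split_csv_with_header() {
  file=$1
  lines=$2
  prefix=$3
  header=$(head -n 1 "$file")
  # Split only the data rows; "-" tells split to read from stdin.
  tail -n +2 "$file" | split -l "$lines" - "$prefix"
  for part in "$prefix"*; do
    { printf '%s\n' "$header"; cat "$part"; } > "$part.csv"
    rm "$part"
  done
}

# Usage: split_csv_with_header bigfile.csv 1000 smallfile_
```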

## Find the 10 largest directories in the current directory
```sh linenums="1"
# --max-depth is the GNU spelling; on macOS/BSD use: du -h -d 1
du -h --max-depth=1 | sort -hr | head -n 10
```

---

Feel free to copy, modify, and combine these snippets for your data science projects!

docs/machine_learning/ML_snippets.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Machine Learning / Deep Learning Snippets
 ========================
 
-Sharing some of the most widely used and arguably not *so famous* Machine Learning snippets :wink:
+Sharing some of the most widely used and arguably not *so famous* Machine Learning snippets 😉
 
 ## Feature importance

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -128,6 +128,7 @@ nav:
 - 'Data Science Tools':
 - 'data_science_tools/introduction.md'
 - 'data_science_tools/python_snippets.md'
+- 'data_science_tools/linux_snippets.md'
 - 'data_science_tools/version_control.md'
 - 'data_science_tools/compute_and_ai_services.md'
 - 'data_science_tools/scraping_websites.md'
