Commit 145e0d2

Merge pull request #64 from CarterPerez-dev/phase/3.4
add and create all learn/ folders for beginner projects
2 parents 1c5027d + 218305e commit 145e0d2

51 files changed

+14852
-867
lines changed

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -217,7 +217,7 @@ mypy your_module/
217217

218218
**Code Formatting:**
219219

220-
Format your code using the repository's custom YAPF configuration. Copy the [.style.yapf](https://github.com/CarterPerez-dev/Cybersecurity-Projects/blob/main/.style.yapf) file and place it in the root of your project directory.
220+
Format your code using the repository's custom YAPF configuration. Copy the [.style.yapf](https://github.com/CarterPerez-dev/Cybersecurity-Projects/blob/main/TEMPLATES/.style.yapf) file and place it in the root of your project directory.
221221
```bash
222222
yapf -i -r -vv your_project/
223223
```
Lines changed: 104 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,104 @@
1+
# Caesar Cipher CLI Tool
2+
3+
## What This Is
4+
5+
A command line tool that implements the Caesar cipher, one of the oldest known encryption techniques. It shifts each letter in your text by a fixed number of positions in the alphabet. The tool can encrypt messages, decrypt them if you know the key, or crack encrypted text by trying all possible shifts and ranking them using frequency analysis.
6+
7+
## Why This Matters
8+
9+
The Caesar cipher is weak by modern standards, but understanding how to break it teaches fundamental cryptanalysis skills that apply to stronger systems. Every security professional should know why simple substitution ciphers fail.
10+
11+
**Real world scenarios where this applies:**
12+
- ROT13 is still used for spoiler text on forums and in email (it's Caesar with key=13)
13+
- Understanding frequency analysis helps you break other substitution ciphers found in CTF challenges
14+
- The concept of brute forcing a small key space applies to weak passwords, short PINs, and poorly designed crypto
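
To make the ROT13 point concrete, here is a minimal sketch using Python's built-in `rot_13` text codec. The sample message is only an illustration. Because 13 is half of 26, applying the same shift twice brings you back to the original text.

```python
import codecs

message = "The butler did it"

# ROT13 is a Caesar shift of 13; Python ships it as a text-to-text codec.
hidden = codecs.encode(message, "rot_13")
print(hidden)

# 13 + 13 = 26, a full trip around the alphabet, so encoding again decodes.
restored = codecs.encode(hidden, "rot_13")
print(restored)
```

This is also why ROT13 needs no separate "decrypt" command, and why it offers no secrecy at all.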
15+
16+
## What You'll Learn
17+
18+
This project teaches you how classical cryptography breaks down under statistical analysis. By building it yourself, you'll understand:
19+
20+
**Security Concepts:**
21+
- Substitution ciphers: how they work and why character frequency gives them away
22+
- Brute force attacks: when the key space is small enough (26 possibilities here), trying everything is trivial
23+
- Frequency analysis: real English text has predictable letter patterns that survive encryption
24+
25+
**Technical Skills:**
26+
- Chi-squared statistical testing to score how "English-like" text appears
27+
- Building CLI tools with proper argument parsing (using Typer)
28+
- Implementing both encryption and cryptanalysis in the same codebase
29+
30+
**Tools and Techniques:**
31+
- Typer for command line interfaces with automatic help text
32+
- Rich library for colored terminal output and formatted tables
33+
- Python's Counter for frequency counting and statistical analysis
34+
35+
## Prerequisites
36+
37+
Before starting, you should understand:
38+
39+
**Required knowledge:**
40+
- Python basics: functions, classes, list comprehensions
41+
- String manipulation: iterating characters, checking if they're letters
42+
- Modular arithmetic: why `(25 + 3) % 26 = 2` wraps around the alphabet
43+
44+
**Tools you'll need:**
45+
- Python 3.12 or higher
46+
- pip for installing dependencies
47+
- A terminal where you can run commands
48+
49+
**Helpful but not required:**
50+
- Basic statistics (what chi-squared means)
51+
- Some exposure to encryption concepts
52+
53+
## Quick Start
54+
55+
Get the project running locally:
56+
```bash
# Clone and navigate
git clone https://github.com/CarterPerez-dev/Cybersecurity-Projects.git
cd Cybersecurity-Projects/PROJECTS/beginner/caesar-cipher

# Install with dependencies
pip install -e .

# Encrypt some text
caesar-cipher encrypt "HELLO WORLD" --key 3

# Decrypt it back
caesar-cipher decrypt "KHOOR ZRUOG" --key 3

# Crack it without knowing the key
caesar-cipher crack "KHOOR ZRUOG"
```
72+
73+
Expected output: You should see `KHOOR ZRUOG` when encrypting, `HELLO WORLD` when decrypting, and a ranked table of all 26 possible decryptions when cracking (with the correct one at the top).
74+
75+
## Project Structure
76+
```
caesar-cipher/
├── src/caesar_cipher/
│   ├── cipher.py      # Core encryption/decryption logic
│   ├── analyzer.py    # Frequency analysis for cracking
│   ├── constants.py   # English letter frequencies, alphabet
│   ├── main.py        # CLI commands (encrypt/decrypt/crack)
│   └── utils.py       # File I/O and input validation
├── tests/             # Pytest test suite
└── pyproject.toml     # Project dependencies and config
```
87+
88+
## Next Steps
89+
90+
1. **Understand the concepts** - Read [01-CONCEPTS.md](./01-CONCEPTS.md) to learn why Caesar ciphers are broken
91+
2. **Study the architecture** - Read [02-ARCHITECTURE.md](./02-ARCHITECTURE.md) to see how the pieces fit together
92+
3. **Walk through the code** - Read [03-IMPLEMENTATION.md](./03-IMPLEMENTATION.md) for line by line explanation
93+
4. **Extend the project** - Read [04-CHALLENGES.md](./04-CHALLENGES.md) to build variants like Vigenère cipher
94+
95+
## Common Issues
96+
97+
**"Key must be between -25 and 26" error**
98+
```
99+
ValueError: Key must be between -25 and 26
100+
```
101+
Solution: Caesar cipher only makes sense with shifts from -25 to 26. Use a key in that range. The code validates this in `cipher.py:20-22`.
102+
103+
**No output when piping from stdin**
104+
Solution: Make sure you're actually sending text. Try `echo "TEST" | caesar-cipher encrypt --key 5` instead of just `caesar-cipher encrypt --key 5`.
Lines changed: 221 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,221 @@
1+
# Core Security Concepts
2+
3+
This document explains the security concepts you'll encounter while building this project. These are not just definitions; we'll dig into why they matter and how they actually work.
4+
5+
## Substitution Ciphers
6+
7+
### What It Is
8+
9+
A substitution cipher replaces each letter in your message with a different letter according to a fixed rule. Caesar cipher is the simplest version: shift every letter by the same amount. If your key is 3, A becomes D, B becomes E, and so on until Z wraps back to C.
10+
11+
### Why It Matters
12+
13+
Substitution ciphers were state of the art cryptography for centuries. Julius Caesar actually used this method to protect military messages around 58 BC. Understanding why they fail teaches you that security through obscurity doesn't work when the underlying pattern is too simple.
14+
15+
### How It Works
16+
17+
The transformation is just modular arithmetic on alphabet positions:
18+
```
19+
Encryption: ciphertext_position = (plaintext_position + key) % 26
20+
Decryption: plaintext_position = (ciphertext_position - key) % 26
21+
```
22+
23+
For example, encrypting "HELLO" with key=3:
24+
```
25+
H (position 7) → (7+3) % 26 = 10 → K
26+
E (position 4) → (4+3) % 26 = 7 → H
27+
L (position 11) → (11+3) % 26 = 14 → O
28+
L (position 11) → (11+3) % 26 = 14 → O
29+
O (position 14) → (14+3) % 26 = 17 → R
30+
31+
Result: KHOOR
32+
```
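
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the project's `cipher.py`; the function names `shift_text`, `encrypt`, and `decrypt` are hypothetical and chosen only for this example.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...Z", positions 0 through 25


def shift_text(text: str, key: int) -> str:
    """Shift every ASCII letter by `key` positions; leave other characters alone."""
    shifted = []
    for char in text:
        if char.isascii() and char.isalpha():
            position = ALPHABET.index(char.upper())
            new_char = ALPHABET[(position + key) % 26]
            # Preserve the original letter case.
            shifted.append(new_char if char.isupper() else new_char.lower())
        else:
            shifted.append(char)
    return "".join(shifted)


def encrypt(plaintext: str, key: int) -> str:
    return shift_text(plaintext, key)


def decrypt(ciphertext: str, key: int) -> str:
    # Decryption is the same shift in the opposite direction.
    return shift_text(ciphertext, -key)


print(encrypt("HELLO", 3))
print(decrypt("KHOOR", 3))
```

Python's `%` operator always returns a non-negative result for a positive modulus, so the negative shift used for decryption wraps around the alphabet without extra handling.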
33+
34+
### Common Attacks
35+
36+
1. **Brute Force** - Try all 26 possible keys. With modern computers this takes milliseconds. The key space is too small.
37+
2. **Frequency Analysis** - English text has known letter frequencies. E appears 12.7% of the time, T about 9%, Z only 0.07%. These patterns survive Caesar encryption.
38+
3. **Known Plaintext** - If you know even part of the message, you can calculate the key immediately. One plaintext/ciphertext pair reveals everything.
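
The known plaintext attack is simple enough to show directly. The sketch below assumes you know a single plaintext letter and its matching ciphertext letter; `recover_key` is a hypothetical helper name, not part of the project.

```python
def recover_key(plain_letter: str, cipher_letter: str) -> int:
    """Return the Caesar key that maps plain_letter to cipher_letter."""
    plain_pos = ord(plain_letter.upper()) - ord("A")
    cipher_pos = ord(cipher_letter.upper()) - ord("A")
    # Subtract positions and wrap around the alphabet.
    return (cipher_pos - plain_pos) % 26


# Suppose we know the message starts with "H" and the ciphertext starts with "K".
print(recover_key("H", "K"))
```

One subtraction recovers the key, which is why a single known letter pair is enough to read the whole message.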
39+
40+
### Defense Strategies
41+
42+
You can't defend Caesar cipher. It's fundamentally broken. But the lessons apply to stronger ciphers:
43+
- Larger key spaces make brute force impractical (this is why AES uses 128+ bit keys)
44+
- Randomization breaks frequency patterns (modern cipher modes mix in a random IV or nonce, so identical plaintext blocks do not produce identical ciphertext)
45+
- Modern ciphers are designed so that known plaintext/ciphertext pairs reveal nothing practical about the key, which makes known plaintext attacks useless
46+
47+
## Frequency Analysis
48+
49+
### What It Is
50+
51+
A statistical attack that exploits the non-uniform distribution of letters in natural language. In English, E is the most common letter. In Caesar-encrypted English, some other letter will be most common, but it's still E underneath. By comparing the frequency distribution of the ciphertext to known English frequencies, you can score how likely a given shift is correct.
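
A quick way to see that the pattern survives encryption is to count letters before and after a shift. The sketch below uses a made-up sample sentence and an inline shift expression; both are assumptions for illustration only.

```python
from collections import Counter

plaintext = "MEET ME NEAR THE SEVEN TREES"
key = 3

# Shift each uppercase letter by `key`; spaces pass through unchanged.
ciphertext = "".join(
    chr((ord(c) - ord("A") + key) % 26 + ord("A")) if c.isalpha() else c
    for c in plaintext
)

plain_counts = Counter(c for c in plaintext if c.isalpha())
cipher_counts = Counter(c for c in ciphertext if c.isalpha())

print(plain_counts.most_common(1))
print(cipher_counts.most_common(1))

# The shape of the distribution is identical; only the labels moved.
print(sorted(plain_counts.values()) == sorted(cipher_counts.values()))
```

The most common plaintext letter (E) and the most common ciphertext letter (H) have the same count, and they sit exactly `key` positions apart.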
52+
53+
### Why It Matters
54+
55+
Al-Kindi described this technique in the 9th century, over a thousand years ago. It broke all simple substitution ciphers and stayed relevant until polyalphabetic ciphers like Vigenère were developed. Modern cryptanalysis still uses statistical attacks, just against more complex patterns.
56+
57+
### How It Works
58+
59+
The chi-squared test measures how far an observed distribution differs from an expected one:
60+
```
61+
χ² = Σ ((observed - expected)² / expected)
62+
```
63+
64+
Lower scores mean better matches. In `analyzer.py:27-42`, the code calculates this:
65+
```python
def calculate_chi_squared(self, text: str) -> float:
    text_upper = text.upper()
    letter_counts = Counter(char for char in text_upper if char.isalpha())

    total_letters = sum(letter_counts.values())
    chi_squared = 0.0

    for letter, expected_freq in self.reference_frequencies.items():
        observed_count = letter_counts.get(letter, 0)
        expected_count = (expected_freq / 100) * total_letters

        if expected_count > 0:
            chi_squared += ((observed_count - expected_count)**2) / expected_count

    return chi_squared
```
82+
83+
### Common Pitfalls
84+
85+
**Mistake 1: Not handling case properly**
86+
```python
from collections import Counter

# Bad - misses lowercase letters
def count_letters(text):
    return Counter(c for c in text if c.isupper())

# Good - normalize to uppercase first
def count_letters(text):
    return Counter(c for c in text.upper() if c.isalpha())
```
95+
96+
Frequency analysis needs all letters. The code in `analyzer.py:29` converts to uppercase before counting.
97+
98+
**Mistake 2: Including non-letters in frequency counts**
99+
100+
Spaces, punctuation, and numbers will skew your statistics. Only count actual letters. The code uses `if char.isalpha()` to filter properly.
101+
102+
**Mistake 3: Short text gives unreliable results**
103+
104+
You need at least 50-100 letters for frequency analysis to work. With "HI" encrypted, there's not enough data. The chi-squared test returns `float("inf")` for empty strings in `analyzer.py:33`.
105+
106+
## Brute Force Attacks
107+
108+
### What It Is
109+
110+
Simply trying every possible key until you find one that works. For Caesar cipher, that's only 26 attempts. Your computer can do millions of attempts per second, so this is instant.
111+
112+
### Why It Matters
113+
114+
Brute force sets the absolute maximum security of any cipher. Even with perfect implementation, if the key space is too small, the cipher is broken. This is why password length and complexity matter: each additional character multiplies the search space by the size of the character set, so the space grows exponentially with length.
115+
116+
### How It Works
117+
118+
The `crack()` method in `cipher.py:53-60` implements this:
119+
```python
@staticmethod
def crack(ciphertext: str) -> list[tuple[int, str]]:
    results = []
    for shift in range(ALPHABET_SIZE):
        cipher = CaesarCipher(key=shift)
        decrypted = cipher.decrypt(ciphertext)
        results.append((shift, decrypted))
    return results
```
129+
130+
Try shift 0, shift 1, shift 2, all the way to shift 25. Return all results and let frequency analysis pick the best one.
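
To see brute force and frequency scoring working together outside the project, here is a self-contained sketch. It is not the project's code: the names `shift`, `chi_squared`, and `rank_candidates` are hypothetical, and the frequency table holds commonly cited approximate values that may differ slightly from those in `constants.py`.

```python
from collections import Counter

# Approximate English letter frequencies in percent (an assumption for this
# sketch; the project's constants.py may use slightly different values).
ENGLISH_FREQ = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}


def shift(text: str, key: int) -> str:
    """Shift uppercase A-Z letters by `key`; leave everything else alone."""
    return "".join(
        chr((ord(c) - 65 + key) % 26 + 65) if "A" <= c <= "Z" else c
        for c in text
    )


def chi_squared(text: str) -> float:
    """Lower scores mean the letter distribution looks more like English."""
    counts = Counter(c for c in text if "A" <= c <= "Z")
    total = sum(counts.values())
    if total == 0:
        return float("inf")  # nothing to score
    return sum(
        (counts.get(letter, 0) - freq / 100 * total) ** 2 / (freq / 100 * total)
        for letter, freq in ENGLISH_FREQ.items()
    )


def rank_candidates(ciphertext: str) -> list[tuple[float, int, str]]:
    """Try all 26 shifts and sort them from most to least English-like."""
    candidates = []
    for key in range(26):
        guess = shift(ciphertext.upper(), -key)
        candidates.append((chi_squared(guess), key, guess))
    return sorted(candidates)


secret = shift(
    "THE ATTACK WILL BEGIN AT DAWN SO MOVE THE TROOPS TO THE EASTERN RIDGE "
    "AND WAIT THERE FOR THE SIGNAL BEFORE ADVANCING", 7)
best_score, best_key, best_guess = rank_candidates(secret)[0]
print(best_key, best_guess)
```

With a sentence of this length, the correct key should land at the top of the ranking. On very short inputs the ranking becomes unreliable, which is why returning all 26 candidates is safer than returning only the best one.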
131+
132+
### Key Space Analysis
133+
```
Caesar cipher:    26 possible keys (~4.7 bits)
4-digit PIN:      10,000 possibilities (~13.3 bits)
8-char password:  ~218 trillion possibilities (~47.6 bits if using a-z, A-Z, 0-9)
AES-128:          2^128 ≈ 3.4 × 10^38 possibilities (128 bits)
```
139+
140+
A key space under about 2^40 can be brute forced on ordinary hardware today, and dedicated hardware broke DES's 2^56 key space by brute force back in 1998. Caesar is laughably weak.
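
The bit counts in the table above are just base-2 logarithms of the number of keys. A short sketch, assuming a 62-character alphabet for the password case:

```python
import math

# Key-space sizes from the table above.
key_spaces = {
    "Caesar cipher": 26,
    "4-digit PIN": 10**4,
    "8-char password (a-z, A-Z, 0-9)": 62**8,
    "AES-128": 2**128,
}

for name, size in key_spaces.items():
    # log2(size) is the number of bits needed to index every key.
    bits = math.log2(size)
    print(f"{name}: {size:.3g} keys, about {bits:.1f} bits")
```

Each extra bit doubles the work for an attacker, so the gap between 4.7 bits and 128 bits is far larger than the raw numbers suggest.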
141+
142+
## How These Concepts Relate
143+
```
Substitution Cipher (weak pattern)
    ↓
preserves letter frequencies
    ↓
Frequency Analysis (detects pattern)
    ↓
scores all possible keys
    ↓
Brute Force (tries all keys)
    ↓
Cipher is broken
```
156+
157+
The vulnerability chain: simple substitution creates a detectable pattern, frequency analysis exploits that pattern, brute force makes trying all keys practical.
158+
159+
## Industry Standards and Frameworks
160+
161+
### OWASP Top 10
162+
163+
This project addresses:
164+
- **A02:2021 - Cryptographic Failures** - Demonstrates why weak cryptographic algorithms fail. Shows proper key validation (though the algorithm itself is pedagogical, not production-ready).
165+
166+
### MITRE ATT&CK
167+
168+
Relevant techniques:
169+
- **T1552.001** - Unsecured Credentials: Credentials In Files - Weak encryption of stored credentials can be broken like this
170+
- **T1140** - Deobfuscate/Decode Files or Information - Attackers use frequency analysis on ROT13 and similar "obfuscation"
171+
172+
### CWE
173+
174+
Common weakness enumerations covered:
175+
- **CWE-327** - Use of a Broken or Risky Cryptographic Algorithm - Caesar cipher is the textbook example
176+
- **CWE-326** - Inadequate Encryption Strength - 4.7 bits of key strength is inadequate for anything
177+
178+
## Real World Examples
179+
180+
### Case Study 1: Zodiac Killer Cipher (1969)
181+
182+
The Zodiac Killer sent encrypted messages to newspapers. His Z408 cipher was a homophonic substitution (multiple symbols per letter) but still fell to frequency analysis. Solved in 1969 by a schoolteacher and his wife using pencil and paper.
183+
184+
What happened: The killer used 54 different symbols but they still mapped to 26 letters. Frequency analysis revealed the patterns. The Z340 cipher took 51 years to crack (finally solved in 2020) because it layered a transposition step on top of the homophonic substitution.
185+
186+
How this could have been prevented: Modern encryption with proper key length and randomization. Using ROT13 or Caesar on sensitive data is security theater.
187+
188+
### Case Study 2: ROT13 for passwords
189+
190+
In 2015, researchers found that some router firmware was storing admin passwords using ROT13 encoding. This is Caesar cipher with shift=13. Anyone with file access could instantly decrypt all passwords.
191+
192+
What happened: Developers confused encoding with encryption. ROT13 is meant for spoiler text, not security.
193+
194+
What defenses failed: No security review caught the use of a broken cipher for credential storage.
195+
196+
Lesson: Never use Caesar cipher (or ROT13) for anything security-sensitive. Use bcrypt, scrypt, or argon2 for password hashing.
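
As a contrast with ROT13, here is a minimal sketch of salted password hashing using `hashlib.scrypt` from Python's standard library. The function names and the cost parameters (`n=2**14, r=8, p=1`) are illustrative assumptions, not a tuned recommendation, and `hashlib.scrypt` requires a Python build linked against a recent OpenSSL.

```python
import hashlib
import hmac
import os


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage. Parameters are illustrative."""
    salt = os.urandom(16)  # unique random salt per password
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Unlike ROT13, the stored value cannot be reversed; an attacker with file access has to guess passwords one at a time, and each guess is deliberately expensive.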
197+
198+
## Testing Your Understanding
199+
200+
Before moving to the architecture, make sure you can answer:
201+
202+
1. If you encrypt "ATTACK AT DAWN" with key=7, will the letter A always encrypt to the same letter? Why does this matter for security?
203+
204+
2. You intercept "WKLV LV D WHVW" and know it's encrypted with Caesar. Describe two different ways to decrypt it without brute forcing all 26 keys.
205+
206+
3. Why does frequency analysis work on Caesar cipher but not on modern ciphers like AES?
207+
208+
If these questions feel unclear, re-read the relevant sections. The implementation will make more sense once these fundamentals click.
209+
210+
## Further Reading
211+
212+
**Essential:**
213+
- "The Code Book" by Simon Singh - Chapter on historical ciphers explains Caesar and frequency analysis with great examples
214+
- Wikipedia: Chi-squared test - Mathematical foundation for the statistical scoring
215+
216+
**Deep dives:**
217+
- "Applied Cryptography" by Bruce Schneier - Chapter 1 covers why classical ciphers fail
218+
- Al-Kindi's original manuscript on frequency analysis (translated versions available)
219+
220+
**Historical context:**
221+
- David Kahn's "The Codebreakers" - History of cryptanalysis from ancient times to WWII
