| 1 | +# Core Security Concepts |
| 2 | + |
| 3 | +This document explains the security concepts you'll encounter while building this project. These are not just definitions; we'll dig into why they matter and how they actually work. |
| 4 | + |
| 5 | +## Substitution Ciphers |
| 6 | + |
| 7 | +### What It Is |
| 8 | + |
| 9 | +A substitution cipher replaces each letter in your message with a different letter according to a fixed rule. The Caesar cipher is the simplest version: shift every letter by the same amount. If your key is 3, A becomes D, B becomes E, and so on until X, Y, and Z wrap around to A, B, and C. |
| 10 | + |
| 11 | +### Why It Matters |
| 12 | + |
| 13 | +Substitution ciphers were state-of-the-art cryptography for centuries. Julius Caesar reportedly used this method to protect military messages around 58 BC. Understanding why they fail teaches you that security through obscurity doesn't work when the underlying pattern is too simple. |
| 14 | + |
| 15 | +### How It Works |
| 16 | + |
| 17 | +The transformation is just modular arithmetic on alphabet positions: |
| 18 | +``` |
| 19 | +Encryption: ciphertext_position = (plaintext_position + key) % 26 |
| 20 | +Decryption: plaintext_position = (ciphertext_position - key) % 26 |
| 21 | +``` |
| 22 | + |
| 23 | +For example, encrypting "HELLO" with key=3 (positions are counted from A = 0): |
| 24 | +``` |
| 25 | +H (position 7) → (7+3) % 26 = 10 → K |
| 26 | +E (position 4) → (4+3) % 26 = 7 → H |
| 27 | +L (position 11) → (11+3) % 26 = 14 → O |
| 28 | +L (position 11) → (11+3) % 26 = 14 → O |
| 29 | +O (position 14) → (14+3) % 26 = 17 → R |
| 30 | + |
| 31 | +Result: KHOOR |
| 32 | +``` |
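The modular arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the project's `CaesarCipher` class: the helper names `shift_text`, `encrypt`, and `decrypt` are hypothetical, and the sketch assumes only ASCII letters are shifted while everything else passes through unchanged.

```python
ALPHABET_SIZE = 26


def shift_text(text: str, key: int) -> str:
    """Shift every ASCII letter by `key` positions, preserving case."""
    shifted = []
    for char in text:
        if char.isascii() and char.isalpha():
            base = ord("A") if char.isupper() else ord("a")
            position = ord(char) - base  # A/a = 0, B/b = 1, ...
            shifted.append(chr(base + (position + key) % ALPHABET_SIZE))
        else:
            shifted.append(char)  # spaces, digits, punctuation stay as-is
    return "".join(shifted)


def encrypt(plaintext: str, key: int) -> str:
    return shift_text(plaintext, key)


def decrypt(ciphertext: str, key: int) -> str:
    # Python's % returns a non-negative result for a positive modulus,
    # so shifting by -key wraps around correctly.
    return shift_text(ciphertext, -key)


print(encrypt("HELLO", 3))  # KHOOR
print(decrypt("KHOOR", 3))  # HELLO
```

Decryption is the same operation with the sign of the key flipped, which is why a single helper can serve both directions.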
| 33 | + |
| 34 | +### Common Attacks |
| 35 | + |
| 36 | +1. **Brute Force** - Try all 26 possible keys. With modern computers this takes milliseconds. The key space is too small. |
| 37 | +2. **Frequency Analysis** - English text has known letter frequencies. E appears 12.7% of the time, T about 9%, Z only 0.07%. These patterns survive Caesar encryption. |
| 38 | +3. **Known Plaintext** - If you know even part of the message, you can calculate the key immediately. One plaintext/ciphertext pair reveals everything. |
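The known-plaintext attack in item 3 comes down to a single subtraction. The sketch below assumes you know one plaintext letter and the ciphertext letter it produced; `recover_key` is a hypothetical helper name, not part of the project.

```python
def recover_key(plain_letter: str, cipher_letter: str) -> int:
    """Return the Caesar shift that maps plain_letter to cipher_letter."""
    plain_position = ord(plain_letter.upper()) - ord("A")
    cipher_position = ord(cipher_letter.upper()) - ord("A")
    return (cipher_position - plain_position) % 26


# Knowing that "H" encrypted to "K" is enough to recover the whole key.
print(recover_key("H", "K"))  # 3
```

Because every letter uses the same shift, one matched pair is enough to decrypt the rest of the message.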
| 39 | + |
| 40 | +### Defense Strategies |
| 41 | + |
| 42 | +You can't defend the Caesar cipher. It's fundamentally broken. But the lessons apply to stronger ciphers: |
| 43 | +- Larger key spaces make brute force impractical (this is why AES uses 128+ bit keys) |
| 44 | +- Randomization breaks frequency patterns (modern ciphers combine a random IV or nonce with a mode of operation so that identical plaintext blocks do not produce identical ciphertext) |
| 45 | +- Resistance to known plaintext is a design requirement (with a modern cipher, knowing plaintext/ciphertext pairs does not reveal the key; authenticated encryption additionally detects tampering) |
| 46 | + |
| 47 | +## Frequency Analysis |
| 48 | + |
| 49 | +### What It Is |
| 50 | + |
| 51 | +A statistical attack that exploits the non-uniform distribution of letters in natural language. In English, E is the most common letter. In Caesar-encrypted English, some other letter will be most common, but it's still E underneath. By comparing the frequency distribution of the ciphertext to known English frequencies, you can score how likely it is that a given shift is correct. |
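A quick way to convince yourself that the pattern survives encryption is to count letters before and after shifting. The sketch below is illustrative only: `caesar_shift` and `letter_counts` are hypothetical helpers, the sample sentence is made up, and the "most common letter is E" guess is only dependable on longer texts.

```python
from collections import Counter


def caesar_shift(text: str, key: int) -> str:
    """Shift uppercase A-Z by `key`; other characters pass through unchanged."""
    return "".join(
        chr((ord(c) - ord("A") + key) % 26 + ord("A")) if "A" <= c <= "Z" else c
        for c in text
    )


def letter_counts(text: str) -> Counter:
    return Counter(c for c in text if "A" <= c <= "Z")


plaintext = "MEET ME AT THE SECRET TREE"
ciphertext = caesar_shift(plaintext, 3)

plain_top, _ = letter_counts(plaintext).most_common(1)[0]
cipher_top, _ = letter_counts(ciphertext).most_common(1)[0]
print(plain_top, cipher_top)  # E H

# Assuming the top ciphertext letter stands for E, the shift falls out directly.
guessed_key = (ord(cipher_top) - ord("E")) % 26
print(guessed_key)  # 3
```

The histogram is not destroyed by the cipher, only relabeled: the same counts appear, attached to letters three positions later.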
| 52 | + |
| 53 | +### Why It Matters |
| 54 | + |
| 55 | +Al-Kindi described this technique in the 9th century, over a thousand years ago. It broke all simple substitution ciphers and stayed relevant until polyalphabetic ciphers like Vigenère were developed. Modern cryptanalysis still uses statistical attacks, just against more complex patterns. |
| 56 | + |
| 57 | +### How It Works |
| 58 | + |
| 59 | +The chi-squared test measures how far an observed distribution differs from an expected one: |
| 60 | +``` |
| 61 | +χ² = Σ ((observed - expected)² / expected) |
| 62 | +``` |
| 63 | + |
| 64 | +Lower scores mean better matches. In `analyzer.py:27-42`, the code calculates this: |
| 65 | +```python |
| 66 | +def calculate_chi_squared(self, text: str) -> float: |
| 67 | +    text_upper = text.upper() |
| 68 | +    letter_counts = Counter(char for char in text_upper if char.isalpha()) |
| 69 | + |
| 70 | +    total_letters = sum(letter_counts.values()) |
| 71 | +    chi_squared = 0.0 |
| 72 | + |
| 73 | +    for letter, expected_freq in self.reference_frequencies.items(): |
| 74 | +        observed_count = letter_counts.get(letter, 0) |
| 75 | +        expected_count = (expected_freq / 100) * total_letters |
| 76 | + |
| 77 | +        if expected_count > 0: |
| 78 | +            chi_squared += ((observed_count - expected_count)**2) / expected_count |
| 79 | + |
| 80 | +    return chi_squared |
| 81 | +``` |
| 82 | + |
| 83 | +### Common Pitfalls |
| 84 | + |
| 85 | +**Mistake 1: Not handling case properly** |
| 86 | +```python |
| 87 | +# Bad - misses lowercase letters |
| 88 | +def count_letters(text): |
| 89 | +    return Counter(c for c in text if c.isupper()) |
| 90 | + |
| 91 | +# Good - normalize to uppercase first |
| 92 | +def count_letters(text): |
| 93 | +    return Counter(c for c in text.upper() if c.isalpha()) |
| 94 | +``` |
| 95 | + |
| 96 | +Frequency analysis needs all letters. The code in `analyzer.py:29` converts to uppercase before counting. |
| 97 | + |
| 98 | +**Mistake 2: Including non-letters in frequency counts** |
| 99 | + |
| 100 | +Spaces, punctuation, and numbers will skew your statistics. Only count actual letters. The code uses `if char.isalpha()` to filter properly. |
| 101 | + |
| 102 | +**Mistake 3: Short text gives unreliable results** |
| 103 | + |
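| 104 | +As a rule of thumb, you need roughly 50-100 letters for frequency analysis to be reliable. With "HI" encrypted, there's not enough data, and several shifts can score almost equally well. For empty input (no letters at all), the chi-squared test returns `float("inf")` in `analyzer.py:33`, a guard that the excerpt above omits. |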
| 105 | + |
| 106 | +## Brute Force Attacks |
| 107 | + |
| 108 | +### What It Is |
| 109 | + |
| 110 | +Simply trying every possible key until you find one that works. For Caesar cipher, that's only 26 attempts. Your computer can do millions of attempts per second, so this is instant. |
| 111 | + |
| 112 | +### Why It Matters |
| 113 | + |
| 114 | +Brute force sets an upper bound on the security of any cipher. Even with a perfect implementation, if the key space is too small, the cipher is broken. This is why password length matters: each additional character multiplies the search space by the size of the character set, so the space grows exponentially with length. |
| 115 | + |
| 116 | +### How It Works |
| 117 | + |
| 118 | +The `crack()` method in `cipher.py:53-60` implements this: |
| 119 | +```python |
| 120 | +@staticmethod |
| 121 | +def crack(ciphertext: str) -> list[tuple[int, str]]: |
| 122 | +    results = [] |
| 123 | +    for shift in range(ALPHABET_SIZE): |
| 124 | +        cipher = CaesarCipher(key=shift) |
| 125 | +        decrypted = cipher.decrypt(ciphertext) |
| 126 | +        results.append((shift, decrypted)) |
| 127 | +    return results |
| 128 | +``` |
| 129 | + |
| 130 | +Try shift 0, shift 1, shift 2, all the way to shift 25. Return all results and let frequency analysis pick the best one. |
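To show how the two pieces fit together, here is a self-contained sketch that generates all 26 candidates and keeps the one with the lowest chi-squared score. It does not use the project's `CaesarCipher` or analyzer classes: `decrypt`, `chi_squared`, and `best_guess` are hypothetical stand-ins, the frequency table holds commonly cited approximate values for English (the project's `reference_frequencies` may differ), and only A-Z is handled.

```python
from collections import Counter

# Approximate English letter frequencies in percent (assumed reference table).
ENGLISH_FREQUENCIES = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}


def decrypt(ciphertext: str, shift: int) -> str:
    """Undo a Caesar shift on A-Z (input is uppercased first)."""
    return "".join(
        chr((ord(c) - ord("A") - shift) % 26 + ord("A")) if "A" <= c <= "Z" else c
        for c in ciphertext.upper()
    )


def chi_squared(text: str) -> float:
    """Lower means closer to English; infinity when there are no letters."""
    counts = Counter(c for c in text.upper() if "A" <= c <= "Z")
    total = sum(counts.values())
    if total == 0:
        return float("inf")
    score = 0.0
    for letter, percent in ENGLISH_FREQUENCIES.items():
        expected = percent / 100 * total
        score += (counts.get(letter, 0) - expected) ** 2 / expected
    return score


def best_guess(ciphertext: str) -> tuple[int, str]:
    """Brute force all 26 shifts and return the most English-looking one."""
    candidates = [(shift, decrypt(ciphertext, shift)) for shift in range(26)]
    return min(candidates, key=lambda pair: chi_squared(pair[1]))


message = (
    "DEFEND THE EAST WALL OF THE CASTLE AT DAWN AND SEND "
    "THE REST OF THE TROOPS TO THE NORTHERN GATE"
)
ciphertext = decrypt(message, -7)  # a negative shift encrypts with key 7
shift, recovered = best_guess(ciphertext)
print(shift)  # 7
print(recovered == message)  # True
```

Brute force supplies the candidates and the statistical score removes the need for a human to read all 26 of them.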
| 131 | + |
| 132 | +### Key Space Analysis |
| 133 | +``` |
| 134 | +Caesar cipher: 26 possible keys (≈ 4.7 bits) |
| 135 | +4-digit PIN: 10,000 possibilities (≈ 13.3 bits) |
| 136 | +8-char password: ~218 trillion (≈ 47.6 bits if using a-z, A-Z, 0-9) |
| 137 | +AES-128: 2^128 ≈ 3.4 × 10^38 possibilities (128 bits) |
| 138 | +``` |
| 139 | + |
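| 140 | +Key spaces below about 2^40 can be searched on commodity hardware, and 56-bit DES keys were brute forced publicly as early as 1998, so even the 8-character password above can be within reach when each guess is cheap. Caesar is laughably weak. |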
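The bit counts in the table above are just the base-2 logarithm of the number of possibilities. A minimal sketch, with `key_space_bits` as a hypothetical helper name:

```python
import math


def key_space_bits(possibilities: int) -> float:
    """Return the size of a key space expressed in bits."""
    return math.log2(possibilities)


examples = {
    "Caesar cipher": 26,
    "4-digit PIN": 10 ** 4,
    "8-char password (a-z, A-Z, 0-9)": 62 ** 8,
    "AES-128": 2 ** 128,
}

for name, size in examples.items():
    print(f"{name}: {key_space_bits(size):.1f} bits")
```

Each extra bit doubles the work for an attacker, so the gap between 4.7 bits and 128 bits is not a factor of about 27 but a factor of roughly 2^123.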
| 141 | + |
| 142 | +## How These Concepts Relate |
| 143 | +``` |
| 144 | +Substitution Cipher (weak pattern) |
| 145 | + ↓ |
| 146 | +preserves letter frequencies |
| 147 | + ↓ |
| 148 | +Frequency Analysis (detects pattern) |
| 149 | + ↓ |
| 150 | +scores all possible keys |
| 151 | + ↓ |
| 152 | +Brute Force (tries all keys) |
| 153 | + ↓ |
| 154 | +Cipher is broken |
| 155 | +``` |
| 156 | + |
| 157 | +The vulnerability chain: simple substitution preserves letter frequencies, the tiny key space makes trying every key practical, and frequency analysis picks the correct plaintext out of the 26 candidates. |
| 158 | + |
| 159 | +## Industry Standards and Frameworks |
| 160 | + |
| 161 | +### OWASP Top 10 |
| 162 | + |
| 163 | +This project addresses: |
| 164 | +- **A02:2021 - Cryptographic Failures** - Demonstrates why weak cryptographic algorithms fail. Shows proper key validation (though the algorithm itself is pedagogical, not production-ready). |
| 165 | + |
| 166 | +### MITRE ATT&CK |
| 167 | + |
| 168 | +Relevant techniques: |
| 169 | +- **T1552.001** - Unsecured Credentials: Credentials In Files - Weakly encrypted credentials stored in files can be recovered like this |
| 170 | +- **T1140** - Deobfuscate/Decode Files or Information - Adversaries hide payloads behind ROT13 and similar "obfuscation" and decode them at run time; the techniques in this project undo that encoding just as easily |
| 171 | + |
| 172 | +### CWE |
| 173 | + |
| 174 | +Common weakness enumerations covered: |
| 175 | +- **CWE-327** - Use of a Broken or Risky Cryptographic Algorithm - Caesar cipher is the textbook example |
| 176 | +- **CWE-326** - Inadequate Encryption Strength - 4.7 bits of key strength is inadequate for anything |
| 177 | + |
| 178 | +## Real World Examples |
| 179 | + |
| 180 | +### Case Study 1: Zodiac Killer Cipher (1969) |
| 181 | + |
| 182 | +The Zodiac Killer sent encrypted messages to newspapers. His Z408 cipher was a homophonic substitution (multiple symbols per letter) but still fell to frequency analysis. It was solved in 1969 by a schoolteacher and his wife using pencil and paper. |
| 183 | + |
| 184 | +What happened: The killer used 54 different symbols but they still mapped to 26 letters. Frequency analysis revealed the patterns. The Z340 cipher took 51 years to crack (finally solved in 2020) because it layered a transposition step on top of the homophonic substitution. |
| 185 | + |
| 186 | +How this could have been prevented: Modern encryption with proper key length and randomization. Using ROT13 or Caesar on sensitive data is security theater. |
| 187 | + |
| 188 | +### Case Study 2: ROT13 for Stored Passwords |
| 189 | + |
| 190 | +In 2015, researchers reported router firmware that stored admin passwords using ROT13 encoding, which is the Caesar cipher with shift=13. Anyone with file access could instantly decrypt all passwords. |
| 191 | + |
| 192 | +What happened: Developers confused encoding with encryption. ROT13 is meant for spoiler text, not security. |
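To see why ROT13 offers no protection, note that it has no key at all: applying it twice returns the original text. The sketch below uses the `rot_13` text codec from Python's standard library; the `rot13` wrapper name and the sample password are made up for illustration.

```python
import codecs


def rot13(text: str) -> str:
    """Apply ROT13 using the standard library's rot_13 codec."""
    return codecs.encode(text, "rot_13")


stored = rot13("admin123")
print(stored)         # nqzva123
print(rot13(stored))  # admin123
```

Digits and punctuation are not touched, and anyone who can read the stored value can reverse it with the same one-line call.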
| 193 | + |
| 194 | +What defenses failed: No security review caught the use of a broken cipher for credential storage. |
| 195 | + |
| 196 | +Lesson: Never use the Caesar cipher (or ROT13) for anything security-sensitive. For password storage, use a dedicated password-hashing function such as bcrypt, scrypt, or Argon2. |
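As a contrast to ROT13, here is a minimal sketch of salted password hashing with scrypt from Python's standard library. It assumes a Python build where `hashlib.scrypt` is available (it depends on OpenSSL support); `hash_password` and `verify_password` are hypothetical helper names, and the cost parameters are illustrative rather than a tuned recommendation.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2 ** 14,  # CPU/memory cost (illustrative value)
        r=8,        # block size
        p=1,        # parallelization
        dklen=32,   # length of the derived digest in bytes
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("admin123", salt, digest))                      # False
```

Unlike ROT13, the stored digest cannot be reversed; an attacker has to guess passwords one at a time, and each guess is deliberately expensive.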
| 197 | + |
| 198 | +## Testing Your Understanding |
| 199 | + |
| 200 | +Before moving to the architecture, make sure you can answer: |
| 201 | + |
| 202 | +1. If you encrypt "ATTACK AT DAWN" with key=7, will the letter A always encrypt to the same letter? Why does this matter for security? |
| 203 | + |
| 204 | +2. You intercept "WKLV LV D WHVW" and know it's encrypted with Caesar. Describe two different ways to decrypt it without brute forcing all 26 keys. |
| 205 | + |
| 206 | +3. Why does frequency analysis work on Caesar cipher but not on modern ciphers like AES? |
| 207 | + |
| 208 | +If these questions feel unclear, re-read the relevant sections. The implementation will make more sense once these fundamentals click. |
| 209 | + |
| 210 | +## Further Reading |
| 211 | + |
| 212 | +**Essential:** |
| 213 | +- "The Code Book" by Simon Singh - Chapter on historical ciphers explains Caesar and frequency analysis with great examples |
| 214 | +- Wikipedia: Chi-squared test - Mathematical foundation for the statistical scoring |
| 215 | + |
| 216 | +**Deep dives:** |
| 217 | +- "Applied Cryptography" by Bruce Schneier - Chapter 1 covers why classical ciphers fail |
| 218 | +- Al-Kindi's original manuscript on frequency analysis (translated versions available) |
| 219 | + |
| 220 | +**Historical context:** |
| 221 | +- David Kahn's "The Codebreakers" - History of cryptanalysis from ancient times to WWII |