
✅ RNN Project: Next Character Prediction (from scratch)

🎯 Goal

Given a sequence of characters, predict the next character.

Example:

Input:  "hell"
Output: "o"

1️⃣ Understand the data (CRITICAL)

Dataset:

"hello world"

Characters:

h e l l o _ w o r l d

(_ = space)


2️⃣ Character → Integer Encoding (Vocabulary)

RNNs do NOT understand characters — only numbers.

text = "hello world"

chars = sorted(list(set(text)))
vocab_size = len(chars)

char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}

Example mapping (sorted order, so the space comes first):

' ' → 0
d → 1
e → 2
h → 3
l → 4
o → 5
r → 6
w → 7
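Running the vocabulary code above on "hello world" gives the actual sorted mapping, which you can check directly (a minimal sketch using only the code already shown):

```python
# Build the vocabulary exactly as above and inspect the mapping.
text = "hello world"
chars = sorted(list(set(text)))   # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
vocab_size = len(chars)           # 8 unique characters
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}

print(vocab_size)        # 8
print(char_to_idx['h'])  # 3  (space sorts before all letters)
print(char_to_idx[' '])  # 0
```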

3️⃣ Create sequences (Time Steps)

Choose sequence length

seq_length = 4

We slide over text:

Input Target
hell o
ello _
llo_ w
lo_w o

Code:

import numpy as np

X = []
y = []

for i in range(len(text) - seq_length):
    X.append([char_to_idx[c] for c in text[i:i+seq_length]])
    y.append(char_to_idx[text[i+seq_length]])

X = np.array(X)
y = np.array(y)

Now:

X shape = (num_samples, time_steps) → (7, 4) for "hello world"

4️⃣ One-Hot Encode (Important)

The RNN expects vectors, not integers.

import tensorflow as tf

X = tf.keras.utils.to_categorical(X, num_classes=vocab_size)
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)

Final shape:

X = (samples, time_steps, vocab_size)

✅ THIS is what “time steps” REALLY means
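A quick way to see the final 3-D shape without TensorFlow installed is to one-hot with np.eye, which behaves like to_categorical for integer inputs (a self-contained sketch, not the training pipeline itself):

```python
import numpy as np

text = "hello world"
chars = sorted(list(set(text)))
vocab_size = len(chars)                      # 8
char_to_idx = {c: i for i, c in enumerate(chars)}

seq_length = 4
X = np.array([[char_to_idx[c] for c in text[i:i + seq_length]]
              for i in range(len(text) - seq_length)])
y = np.array([char_to_idx[text[i + seq_length]]
              for i in range(len(text) - seq_length)])

# One-hot: np.eye(V)[X] maps each integer to a length-V indicator vector.
X_oh = np.eye(vocab_size)[X]
y_oh = np.eye(vocab_size)[y]

print(X_oh.shape)  # (7, 4, 8) = (samples, time_steps, vocab_size)
print(y_oh.shape)  # (7, 8)
```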


5️⃣ Build the RNN Model (SimpleRNN)

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(
        64,
        return_sequences=True,
        input_shape=(seq_length, vocab_size)
    ),
    tf.keras.layers.SimpleRNN(64),
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])

model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.summary()

🔥 What happens inside the RNN?

At each time step:

h_t = tanh(Wx·x_t + Wh·h_(t-1) + b)

✅ Memory = h_(t-1)
✅ Time = processing each character sequentially
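The recurrence above can be sketched as a single NumPy step. The names Wx, Wh, and b mirror the formula; the values here are random placeholders, not trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden = 8, 64

# Random placeholder weights (Keras learns these during training).
Wx = rng.normal(scale=0.1, size=(hidden, vocab_size))
Wh = rng.normal(scale=0.1, size=(hidden, hidden))
b = np.zeros(hidden)

def rnn_step(x_t, h_prev):
    """One SimpleRNN step: h_t = tanh(Wx·x_t + Wh·h_(t-1) + b)."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

# Process "hell" one character (one time step) at a time.
h = np.zeros(hidden)
for idx in [3, 2, 4, 4]:            # h, e, l, l under the sorted mapping
    x_t = np.eye(vocab_size)[idx]   # one-hot character vector
    h = rnn_step(x_t, h)            # hidden state carries memory forward

print(h.shape)  # (64,)
```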


6️⃣ Train the model

model.fit(X, y, epochs=200, verbose=2)

Because the dataset is tiny, many epochs are required.


7️⃣ Predict Next Character ✅

def predict_next(text_input):
    # text_input must be exactly seq_length characters long
    x = [char_to_idx[c] for c in text_input]
    x = tf.keras.utils.to_categorical([x], num_classes=vocab_size)

    preds = model.predict(x, verbose=0)
    idx = np.argmax(preds[0])
    return idx_to_char[idx]

print(predict_next("hell"))
print(predict_next("worl"))

Expected:

o
d

8️⃣ Generate Text (Cool Part 😎)

def generate_text(start, length=20):
    result = start

    for _ in range(length):
        next_char = predict_next(result[-seq_length:])
        result += next_char

    return result

print(generate_text("hell", 10))

🧠 What You LEARN (Deep Understanding)

✅ Sequential Input

  • Data has order
  • h → e → l → l → o is NOT the same as a shuffled order

✅ Time Steps

(time_steps = sequence_length)

Each character = one step in time.
The RNN processes one step at a time.


✅ Memory Flow

Hidden state carries:

meaning so far

Example:

"h" → "he" → "hel" → "hell"

🚨 Vanishing Gradient Problem (IMPORTANT)

Why it happens

  • SimpleRNN uses tanh
  • Gradients get multiplied repeatedly
  • Small gradients → become nearly zero

Result:

  • RNN forgets early characters
  • Long sequences break learning

Example:

Predict the last character of a 100-character sentence
→ a SimpleRNN typically fails
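The shrinking effect can be shown numerically: backpropagating through T time steps multiplies T per-step factors, and tanh's derivative is at most 1, so the product decays fast (toy numbers, not a real network):

```python
import numpy as np

# tanh'(z) = 1 - tanh(z)^2  is at most 1 and often much smaller.
z = 1.0
per_step = 1 - np.tanh(z) ** 2   # ≈ 0.42 for z = 1

grad = 1.0
for _ in range(100):             # 100 time steps back through the sequence
    grad *= per_step             # gradient shrinks at every step

print(grad)                      # vanishingly small → early steps are "forgotten"
```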

✅ Why LSTM & GRU exist

They add:

  • Gates (forget, input, output)
  • Controlled memory flow
  • Stronger gradient flow
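A minimal NumPy sketch of one LSTM step shows the three gates at work. All weights are random placeholders, and the layout is simplified compared to the real Keras implementation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
# One weight matrix per gate, acting on the concatenated [h_prev, x_t].
W = {g: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for g in "fioc"}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z)       # forget gate: what to erase from memory
    i = sigmoid(W["i"] @ z)       # input gate: what new info to write
    o = sigmoid(W["o"] @ z)       # output gate: what to expose as h_t
    c_hat = np.tanh(W["c"] @ z)   # candidate memory content
    c = f * c_prev + i * c_hat    # additive update → stronger gradient flow
    h = o * np.tanh(c)
    return h, c

h, c = lstm_step(np.eye(n_in)[3], np.zeros(n_hid), np.zeros(n_hid))
print(h.shape, c.shape)  # (16,) (16,)
```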

📌 Replace SimpleRNN with:

tf.keras.layers.LSTM(64)

(When stacking two recurrent layers, keep return_sequences=True on the first one.)

Same code — vastly better memory.


✅ Summary