Given a sequence of characters, predict the next character.
Example:
Input: "hell"
Output: "o"
Dataset:
"hello world"
Characters:
h e l l o _ w o r l d
(_ = space)
RNNs do NOT understand characters, only numbers, so every character must first be mapped to an integer index.
text = "hello world"
chars = sorted(list(set(text)))
vocab_size = len(chars)
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}Example mapping:
' ' → 0
d → 1
e → 2
h → 3
l → 4
o → 5
r → 6
w → 7
(sorted order puts the space first, and vocab_size = 8)
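You can verify the mapping directly (the expected output is shown as comments):

```python
print(char_to_idx)
# {' ': 0, 'd': 1, 'e': 2, 'h': 3, 'l': 4, 'o': 5, 'r': 6, 'w': 7}
print(vocab_size)  # 8
```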
Choose a sequence length:

```python
seq_length = 4
```

We slide a window of seq_length characters over the text:
| Input | Target |
|---|---|
| hell | o |
| ello | _ |
| llo_ | w |
| lo_w | o |
Code:

```python
import numpy as np

X = []
y = []
for i in range(len(text) - seq_length):
    # Window of seq_length characters as the input...
    X.append([char_to_idx[c] for c in text[i:i+seq_length]])
    # ...and the single character that follows as the target.
    y.append(char_to_idx[text[i+seq_length]])

X = np.array(X)
y = np.array(y)
```

Now:
X shape = (num_samples, time_steps), here (7, 4).
An RNN expects a vector at each time step, not a bare integer, so we one-hot encode:

```python
import tensorflow as tf

X = tf.keras.utils.to_categorical(X, num_classes=vocab_size)
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)
```

Final shapes:
X = (samples, time_steps, vocab_size)
y = (samples, vocab_size)
✅ THIS is what "time steps" REALLY means
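A quick shape check (with text = "hello world" and seq_length = 4 there are 7 windows and 8 distinct characters):

```python
print(X.shape)  # (7, 4, 8) -> (samples, time_steps, vocab_size)
print(y.shape)  # (7, 8)    -> one one-hot target per sample
```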
```python
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(
        64,                      # size of the hidden state
        return_sequences=True,   # emit the hidden state at every time step
        input_shape=(seq_length, vocab_size)
    ),
    tf.keras.layers.SimpleRNN(64),  # keeps only the final hidden state
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])

model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
model.summary()
```

At each time step:
h_t = tanh(W_x · x_t + W_h · h_(t-1) + b)
✅ Memory = h_(t-1)
✅ Time = processing each character sequentially
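To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass of one SimpleRNN cell; the weight names and random values are stand-ins for the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, vocab_size, seq_length = 64, 8, 4

# Randomly initialized weights, standing in for the trained ones.
Wx = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # input -> hidden
Wh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b = np.zeros(hidden_size)

x = np.eye(vocab_size)[[3, 2, 4, 4]]  # one-hot "hell" (h=3, e=2, l=4)
h = np.zeros(hidden_size)             # initial hidden state

for t in range(seq_length):
    # The network's entire "memory" is h from the previous step.
    h = np.tanh(x[t] @ Wx + h @ Wh + b)

print(h.shape)  # (64,) -- the final hidden state fed to the Dense layer
```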
```python
model.fit(X, y, epochs=200, verbose=2)
```

Because the dataset is tiny (only 7 samples), many epochs are required.
```python
def predict_next(text_input):
    # Encode the input exactly like the training data.
    x = [char_to_idx[c] for c in text_input]
    x = tf.keras.utils.to_categorical([x], num_classes=vocab_size)
    preds = model.predict(x, verbose=0)
    # Take the most likely next character.
    idx = np.argmax(preds)
    return idx_to_char[idx]
```

(text_input should be seq_length characters long, like the training windows.)
```python
print(predict_next("hell"))
print(predict_next("worl"))
```

Expected output:
o
d
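predict_next always takes the argmax, which is fine for this tiny dataset but tends to produce repetitive loops on larger corpora. A common alternative (my addition, not part of the original code) is to sample from the softmax output with a temperature:

```python
def sample_next(text_input, temperature=1.0):
    # Hypothetical variant of predict_next: sample instead of argmax.
    x = [char_to_idx[c] for c in text_input]
    x = tf.keras.utils.to_categorical([x], num_classes=vocab_size)
    preds = model.predict(x, verbose=0)[0]
    # Lower temperature -> greedier; higher -> more random.
    logits = np.log(preds + 1e-8) / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))
    idx = np.random.choice(vocab_size, p=probs)
    return idx_to_char[idx]
```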
```python
def generate_text(start, length=20):
    result = start
    for _ in range(length):
        # Condition on the last seq_length characters generated so far.
        next_char = predict_next(result[-seq_length:])
        result += next_char
    return result

print(generate_text("hell", 10))
```

Key ideas:
- Data has order:
  h → e → l → l → o is NOT the same as those characters shuffled.
- time_steps = sequence_length: each character is one step in time, and the RNN processes one step at a time.
- The hidden state carries the meaning of everything seen so far.
  Example: "h" → "he" → "hel" → "hell"
Why SimpleRNN struggles:
- SimpleRNN uses tanh
- Gradients get multiplied repeatedly during backpropagation through time
- Small gradients shrink toward nearly zero
- So the RNN forgets early characters
- Long sequences break learning

Ask it to predict the last character of a 100-character sentence
→ SimpleRNN FAILS
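The numbers make the failure obvious; in this sketch the constant 0.5 stands in for a typical per-step gradient factor (tanh derivative times recurrent weight):

```python
grad = 1.0
for _ in range(100):   # backprop through 100 time steps
    grad *= 0.5        # one shrinking factor per step
print(grad)            # ~7.9e-31: the early characters' signal is gone
```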
LSTMs solve this. They add:
- Gates (forget, input, output)
- Controlled memory flow
- Stronger gradient flow

📌 Replace SimpleRNN with:

```python
tf.keras.layers.LSTM(64)
```

Same code, vastly better memory.
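For completeness, here is the swap applied to the model above (the data pipeline, compile, and fit calls stay exactly the same):

```python
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True,
                         input_shape=(seq_length, vocab_size)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])
```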