
🐦 Design Twitter #355

📌 Problem Statement

You need to design a simplified version of Twitter with these operations:

  1. postTweet(userId, tweetId)

    • User posts a new tweet.
  2. getNewsFeed(userId)

    • Return the 10 most recent tweet IDs in the user's news feed.

    • The feed includes:

      • The user's own tweets.
      • Tweets from users they follow.
    • Order should be from most recent to least recent.

  3. follow(followerId, followeeId)

    • followerId starts following followeeId.
  4. unfollow(followerId, followeeId)

    • followerId stops following followeeId.

Constraints (from the typical LeetCode version):

  • At most around 3 * 10^4 operations.
  • News feed should be efficient. We can't sort the world every time like a badly optimized social media app 😅

🧠 Intuition

Think of Twitter like this:

  • Each tweet happens at a global "time step".

  • Newer tweets have a larger timestamp.

  • For a user, their news feed is simply:

    • All of their tweets +
    • All tweets from people they follow → then pick the latest 10.

Core challenges:

  • Efficiently keeping only the 10 most recent tweets among:

    • The user's tweets.
    • All followees’ tweets.

That screams:

  • Use a heap (priority queue) to keep just the top 10 recent tweets.

We don't want to:

  • Gather hundreds or thousands of tweets,
  • Sort them every time, and
  • Then just take the top 10.

That's like watching a 3-hour movie to only screenshot one scene.

🧪 Brute Force Approach

Idea

  1. Keep a global list of all tweets with their (time, userId, tweetId).

  2. When getNewsFeed(userId) is called:

    • Scan the entire tweet list from the end (most recent).

    • For each tweet, check:

      • Is it posted by userId?
      • Or by someone userId follows?
    • Collect until you get 10 tweets.

Pseudo-code

// assumes globals: int time = 0; unordered_map<int, unordered_set<int>> following;
vector<tuple<int,int,int>> allTweets; // (time, userId, tweetId)

// postTweet
allTweets.push_back({time++, userId, tweetId});

// getNewsFeed
vector<int> res;
for (int i = (int)allTweets.size() - 1; i >= 0 && res.size() < 10; i--) {
    auto [t, uid, tid] = allTweets[i]; // structured bindings need C++17
    if (uid == userId || following[userId].count(uid)) {
        res.push_back(tid);
    }
}
return res;

Complexity

  • postTweet → O(1)
  • follow / unfollow → O(1) average (hash set)
  • getNewsFeed → O(T), where T = total number of tweets ever posted

This gets slow as the number of tweets grows. Imagine scanning millions of tweets for every refresh. Even Batman’s computer would cry 🦇

💡 Better Approach – Using a Min-Heap

High-Level Idea

Instead of scanning the whole world:

  1. Store tweets per user.

    • tweets[userId] → a vector of (time, tweetId).
  2. On getNewsFeed(userId):

    • Look only at:

      • tweets[userId]
      • tweets[eachFollowee]
    • Use a min-heap of size at most 10 to keep track of the 10 latest tweets:

      • Push tweets into heap by (time, tweetId).
      • If heap size > 10 → pop the smallest (oldest) tweet.
    • In the end, the heap contains the 10 most recent tweets among all those considered.

    • Extract them, reverse the result to get most recent first.

This is exactly what your code is doing ✔️

🧱 Data Structures Used

unordered_map<int, vector<pair<int, int>>> tweets;
unordered_map<int, unordered_set<int>> following;
int time;
  • tweets[userId]

    • Vector of {time, tweetId} pairs.
    • Time is strictly increasing, so later tweets have greater time.
  • following[followerId]

    • Set of user IDs that followerId follows.
  • time

    • A global integer counter.
    • Every postTweet uses time++ so each tweet gets a unique "timestamp".
  • Priority queue in getNewsFeed:

priority_queue<pair<int, int>, vector<pair<int, int>>, greater<>> pq;
  • This is a min-heap ordered by time (because of greater<>).
  • We maintain its size at most 10.

✅ Full CPP Code

#include <bits/stdc++.h>
using namespace std;

class Twitter {
private:
    // userId -> list of {time, tweetId}
    unordered_map<int, vector<pair<int, int>>> tweets;
    
    // followerId -> set of followeeIds
    unordered_map<int, unordered_set<int>> following;
    
    int timeStamp;

public:
    Twitter() {
        timeStamp = 0;
    }
    
    // O(1)
    void postTweet(int userId, int tweetId) {
        tweets[userId].push_back({timeStamp++, tweetId});
    }
    
    // O(K log 10) where K = total tweets of user + followees
    vector<int> getNewsFeed(int userId) {
        // min-heap storing {time, tweetId}
        priority_queue<pair<int, int>, 
                       vector<pair<int, int>>, 
                       greater<pair<int, int>>> pq;

        // 1. Add user's own tweets
        for (auto &t : tweets[userId]) {
            pq.push(t);
            if (pq.size() > 10) {
                pq.pop(); // remove oldest
            }
        }

        // 2. Add followees' tweets
        for (int followee : following[userId]) {
            for (auto &t : tweets[followee]) {
                pq.push(t);
                if (pq.size() > 10) {
                    pq.pop(); // keep only top 10 most recent
                }
            }
        }

        // 3. Extract from heap to vector (currently oldest -> newest)
        vector<int> res;
        while (!pq.empty()) {
            res.push_back(pq.top().second); // tweetId
            pq.pop();
        }

        // Reverse to get newest -> oldest
        reverse(res.begin(), res.end());
        return res;
    }
    
    // O(1) average
    void follow(int followerId, int followeeId) {
        if (followerId == followeeId) return; // ignoring self-follow (optional)
        following[followerId].insert(followeeId);
    }
    
    // O(1) average
    void unfollow(int followerId, int followeeId) {
        if (followerId == followeeId) return; // can't unfollow self (optional)
        following[followerId].erase(followeeId);
    }
};

Your original logic was already correct ✅ The only changes are minor safety checks and explicit includes.

🔍 Step-By-Step: getNewsFeed

Let’s walk through like we’re debugging in VS Code:

1. Create Min-Heap

priority_queue<pair<int, int>, vector<pair<int, int>>, greater<>> pq;
  • Stores (time, tweetId).
  • greater<> makes this a min-heap, so pq.top() is the smallest time.

2. Push User's Own Tweets

for (auto& t : tweets[userId]) {
    pq.push(t);
    if (pq.size() > 10)
        pq.pop();
}
  • For each (time, tweetId):

    • Push into heap.

    • If size exceeds 10:

      • Pop the smallest time (oldest tweet).
  • After this loop:

    • Heap contains up to 10 most recent tweets for this user alone.

3. Push Followees' Tweets

for (int followee : following[userId]) {
    for (auto& t : tweets[followee]) {
        pq.push(t);
        if (pq.size() > 10)
            pq.pop();
    }
}
  • For each followee:

    • For each of their tweets:

      • Push into heap.
      • If heap > 10, remove oldest.
  • After this:

    • Heap contains the 10 most recent tweets overall among:

      • userId
      • All following[userId]

Think of it like keeping a top-10 chart. Every time a new tweet comes in, if it’s "hotter" (newer) than the coldest one in the chart, it kicks the cold one out 👢

4. Extract and Reverse

vector<int> res;
while (!pq.empty()) {
    res.push_back(pq.top().second);
    pq.pop();
}
reverse(res.begin(), res.end());
  • Initially, res will have tweets from oldest to newest, because:

    • Min-heap pops the smallest time first.
  • Reversing gives newest to oldest, which is what we want.

⏱️ Time & Space Complexity

Let:

  • F = number of followees for userId
  • Tu = number of tweets by userId
  • Tf = total number of tweets by all followees
  • Total tweets checked: K = Tu + Tf

postTweet

  • Append {time, tweetId} to a vector.
  • Time: O(1)
  • Space: O(1) extra; overall tweets storage grows as O(totalTweets)

follow / unfollow

  • Inserting/erasing from unordered_set.
  • Average time: O(1)

getNewsFeed

  • For each of the K tweets:

    • One push to heap: O(log 10) (heap size ≤ 10)
    • Sometimes a pop: also O(log 10)
  • log 10 is constant ≈ 3.3, so:

    • Time ≈ O(K) in practice.
  • Space:

    • Heap of at most 10 elements → O(1)
    • Output vector of at most 10 elements → O(1)

Compared to brute force, this is much more scalable, especially when many tweets exist but only a few users are relevant.

🧪 Example Test Cases

You can mentally map these to what LeetCode usually tests.

Test Case 1 – Basic Use

Twitter twitter;
twitter.postTweet(1, 5);
vector<int> feed = twitter.getNewsFeed(1); // [5]

Expected: [5]. Only one tweet, from user 1.

Test Case 2 – Follow + News Feed

Twitter twitter;
twitter.postTweet(1, 5);
twitter.follow(1, 2);
twitter.postTweet(2, 6);
vector<int> feed = twitter.getNewsFeed(1); 
// should be [6, 5] (6 is newer)

Test Case 3 – Unfollow

Twitter twitter;
twitter.postTweet(1, 5);
twitter.follow(1, 2);
twitter.postTweet(2, 6);
twitter.unfollow(1, 2);
vector<int> feed = twitter.getNewsFeed(1); 
// should be [5]

User 1 unfollowed user 2, so user 2's tweets vanish from the feed like a deleted WhatsApp message 💬

Test Case 4 – More Than 10 Tweets

Twitter twitter;
for (int i = 1; i <= 12; i++) {
    twitter.postTweet(1, i); 
}
vector<int> feed = twitter.getNewsFeed(1);
// should contain exactly 10 tweetIds: [12, 11, 10, 9, 8, 7, 6, 5, 4, 3]
  • Only the 10 latest tweets are kept.

Test Case 5 – No Tweets Anywhere

Twitter twitter;
vector<int> feed = twitter.getNewsFeed(1); 
// should be []

Feed is empty. No crashes, no drama.

💡 Tips, Tricks & Variations

1. Why Min-Heap and Not Max-Heap?

  • If you used a max-heap, you’d get the largest time first, but then:

    • You’d either have to push all tweets into the heap and pop only 10,
    • Or complicate the logic to cap its size.

Using a min-heap with fixed size 10 is perfect:

  • It naturally keeps the 10 largest elements by repeatedly kicking out the smallest.

Think of it like a Spotify Top 10 chart: whenever a new song outranks one already on it, the weakest gets kicked out.

2. Could This Be Even More Optimal?

Yes, another common approach:

  • For each user, store tweets in a vector.

  • In getNewsFeed, instead of iterating all tweets of all followees:

    • Use a max-heap with:

      • Each followee's latest tweet index.
    • Then do a k-way merge (like merging k sorted lists).

  • This is better when followees have many tweets, because you don’t scan them all.

Your current solution is still good and usually accepted in interviews, especially if constraints allow it.

3. Self-Follow

  • Some implementations make each user implicitly follow themselves.
  • But your code handles the user's own tweets by always including tweets[userId], so there’s no need to insert userId into following[userId].

4. Overflow of timeStamp?

  • Given typical constraints, timeStamp fits well inside int.

  • If you were designing this for a real production Twitter:

    • Use long long or actual timestamps.

🙋‍♂️ FAQs

Q1: Why do we store (time, tweetId) instead of just tweetId?

Because the order matters.

  • We need to compare recency across tweets from different users.
  • time gives us a consistent global ordering.

Q2: Why not store an actual timestamp like 2025-12-08T18:30?

In competitive programming / interviews:

  • Incrementing integer timeStamp is:

    • Faster
    • Easier to compare
    • Less code

In real systems, you’d use a real timestamp or some monotonic ID.

Q3: What happens if a user calls unfollow on someone they weren’t following?

following[followerId].erase(followeeId);
  • Calling erase on an unordered_set with an element that isn't present is fine.
  • It does nothing.
  • No crash, no exception.

Q4: Why do we reverse the result at the end?

Because:

  • The min-heap gives tweets from oldest → newest when we pop.
  • But we want newest → oldest.
  • So we reverse the vector.

You could also push into the vector and then iterate from the back, but reverse is clean and readable.

Q5: Is this code good enough for interviews?

Yes ✅

  • Clear use of:

    • Hash maps
    • Hash sets
    • Priority queue (min-heap)
  • Correct handling of:

    • User’s own tweets + followees
    • Limiting to 10 results
    • Time ordering

If you explain this with the intuition you now have, most interviewers will be happy. Just be sure to mention complexity and possible optimizations.