Commit adfbd28

adding new days
1 parent a0d7086 commit adfbd28

8 files changed, +834 -14 lines changed


book-src/src/week-1/day-01.md

Lines changed: 79 additions & 2 deletions
@@ -1,3 +1,80 @@
-# Day 01: The Data Warehouse & Opportunity Discovery

# Day 01: The Spark of an Idea

-_Summary and tasks as per curriculum. Add your notes and findings here._

Welcome to Day 1 of the Product Analytics Masterclass! Today, we're not starting with complex dashboards or A/B tests. We're starting with the most fundamental skill of a great product analyst: **curiosity**. We're going on an expedition into the raw, unstructured world of user feedback to find the spark of our next big feature.
Our scenario for this week is the **"Journals Sprint."** The product team has a hunch that users want a way to journal or log their activities within our app. Is this a real need, or just a guess? Our job is to find the data to support or challenge this idea.

### Objective

- To validate the need for a new feature by performing proactive discovery on raw, qualitative user data.

### Why This Matters

Great analysis isn't just about answering questions that are handed to you; it's about asking the *right* questions in the first place. Too often, analytics is purely **reactive**—measuring the performance of features that already exist. Today, we're flipping the script.

We will practice **proactive discovery**: the art of sifting through raw data to uncover hidden opportunities and unmet user needs. By analyzing qualitative data like user reviews *before* a single line of code is written for a new feature, you can:

- **De-risk product decisions:** Provide evidence that a real user problem exists.
- **Influence the roadmap:** Champion features that are backed by user-driven data.
- **Build empathy:** Gain a deep, unfiltered understanding of what your users are actually saying and feeling.

This skill—finding signals in the noise—is what separates a good analyst from a great one.

### Key Concepts

Before we dive into the code, let's familiarize ourselves with the tools and concepts we'll be using today.

1. **DuckDB:** Think of DuckDB as "SQLite for analytics." It's an in-process analytical database management system.
   - **In-process:** It runs inside our Python notebook. No complex server setup, no database administrators, no network connections. It's just a library you import.
   - **Analytical:** It's blazing fast for the types of queries we'll be doing (aggregations, filtering, etc.) because of its columnar-vectorized query engine.
   - **Perfect for us:** We can query our data files directly, making it incredibly easy to get started.

2. **Parquet Files:** The data we're using today (`app_reviews.parquet`) is stored in the Parquet format.
   - **Columnar:** Unlike row-based formats like CSV, Parquet stores data by column. When you query `AVG(rating)`, it only reads the `rating` column, ignoring all others. This makes analytical queries significantly faster.
   - **Compressed:** Parquet files are highly compressed, saving disk space and speeding up data reads. It's the go-to format for analytical datasets in the modern data stack.

3. **Exploratory SQL:** Before you can perform complex analysis, you must first understand your dataset's basic shape and content. We call this exploratory analysis, and we'll use a few fundamental SQL commands:
   - `DESCRIBE`: Shows the schema of the table—the column names and their data types (`VARCHAR`, `INTEGER`, `TIMESTAMP`, etc.).
   - `SELECT * LIMIT 10`: Fetches the first 10 rows of the table. This is a quick and safe way to peek at the actual data without trying to load the entire (potentially huge) dataset.
   - `COUNT(*)`: Counts the total number of rows in a table.
   - `AVG()`: Calculates the average value of a numeric column.

4. **Keyword Searching with `LIKE`:** This is our primary tool for discovery today. The `LIKE` operator in SQL is used for simple pattern matching in text data.
   - It's often paired with the `%` wildcard, which matches any sequence of characters (including zero characters).
   - For example, `WHERE content LIKE '%journal%'` will find any review that contains the word "journal" anywhere within its text. We'll use the case-insensitive version, `ILIKE`, to make our search more robust, as the short sketch below shows.
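To see the difference in practice, here is a tiny, self-contained sketch (not from the notebook, just an illustration you can paste into any Python session):

```python
import duckdb

# '%' matches any run of characters (including none), so '%journal%' finds the
# word anywhere in a review. LIKE is case-sensitive; ILIKE is not.
duckdb.sql("""
    SELECT
        'I wish I could Journal here' LIKE  '%journal%' AS like_hit,   -- false (capital J)
        'I wish I could Journal here' ILIKE '%journal%' AS ilike_hit   -- true
""").show()
```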
### Today's Challenge: A Step-by-Step Guide

It's time to get our hands dirty. Open the `Day_01_Challenge.ipynb` notebook and follow along with this guide. Our mission is to find evidence for or against the "Journals" feature idea within our app reviews.

**Step 1: Set Up the Environment**

The first few cells in the notebook will install and import the necessary libraries (`duckdb`) and then establish a connection to our Parquet file. This simple command tells DuckDB to treat our file as a SQL table.
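A rough sketch of what those setup cells might look like; the exact install cell and file path in `Day_01_Challenge.ipynb` may differ:

```python
# In the notebook this would typically be preceded by: %pip install duckdb
import duckdb

# DuckDB runs in-process: connect() gives us an in-memory database, and the view
# simply gives the Parquet file a friendly table name for the queries that follow.
con = duckdb.connect()
con.sql("CREATE OR REPLACE VIEW reviews AS SELECT * FROM 'app_reviews.parquet'")
```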
**Step 2: Initial Data Exploration**

Now that we're connected, we need to get acquainted with the data.

1. Run the `DESCRIBE` query to understand the columns we have to work with. What are their names? What are their data types?
2. Use `SELECT * LIMIT 10` to see some sample reviews. Get a feel for the language users are using. What do the `content` and `rating` columns look like?
3. Calculate the total number of reviews with `COUNT(*)` and the overall average rating with `AVG(rating)`. This gives us a baseline to compare our findings against later.
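Those three checks might look roughly like this (a sketch assuming the `reviews` view from Step 1; your notebook cells are the source of truth):

```python
import duckdb

con = duckdb.connect()
con.sql("CREATE OR REPLACE VIEW reviews AS SELECT * FROM 'app_reviews.parquet'")  # as in Step 1

# 1. Schema: column names and data types
con.sql("DESCRIBE reviews").show()

# 2. A quick peek at ten sample rows
con.sql("SELECT * FROM reviews LIMIT 10").show()

# 3. Baseline numbers to compare against later
con.sql("SELECT COUNT(*) AS total_reviews, AVG(rating) AS avg_rating FROM reviews").show()
```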
53+
54+
**Step 3: Formulate the Keyword Query**
55+
Our hunch is about journaling. Let's translate that into keywords. We're looking for reviews that mention terms like `journal`, `diary`, `log`, or `track`.
56+
57+
We will build a query to find all reviews matching these keywords. To keep our logic clean and readable, we'll use a Common Table Expression (CTE) with the `WITH` clause.
58+
59+
The logic will be:
60+
1. **CTE (`Journal_Reviews`):** Create a temporary table that selects all columns from reviews `WHERE` the `content` column (using `ILIKE` for case-insensitivity) contains our keywords. We'll link them with `OR`.
61+
2. **Final Query:** Select the `COUNT(*)` and `AVG(rating)` from our `Journal_Reviews` CTE.
62+
63+
This query will tell us exactly how many people are talking about journaling and what their average sentiment (as measured by rating) is.
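One possible shape for that query, using the keywords and columns named above (treat this as a sketch to compare against your own notebook version, not the official answer):

```python
import duckdb

con = duckdb.connect()
con.sql("CREATE OR REPLACE VIEW reviews AS SELECT * FROM 'app_reviews.parquet'")  # as in Step 1

result = con.sql("""
    WITH Journal_Reviews AS (
        SELECT *
        FROM reviews
        WHERE content ILIKE '%journal%'
           OR content ILIKE '%diary%'
           OR content ILIKE '%log%'
           OR content ILIKE '%track%'
    )
    SELECT
        COUNT(*)    AS matching_reviews,
        AVG(rating) AS avg_rating
    FROM Journal_Reviews
""").df()
print(result)
```

One caveat worth noting in your summary: a broad pattern like `'%log%'` also matches words such as "login" or "blog", so the raw match count will include some noise.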
**Step 4: Analyze and Summarize Your Findings**

The final, and most important, step is to interpret your results. The query gives you numbers; your job is to turn them into an insight.

In the final markdown cell of the notebook, write a short summary answering these questions:

- How many reviews mentioned your keywords?
- What was the average rating for these specific reviews?
- How do this count and average rating compare to the overall dataset?
- **Conclusion:** Based on this initial analysis, is there evidence to support prioritizing a journaling feature? Why or why not?

This summary is your deliverable. It's where you practice the crucial skill of communicating your findings to stakeholders.

### Deliverable Checklist

- [ ] DuckDB environment is set up and data is loaded.
- [ ] SQL queries for schema exploration are complete.
- [ ] SQL query for keyword search is written and executed.
- [ ] A summary of findings is written in the notebook's markdown cell.

book-src/src/week-1/day-02.md

Lines changed: 34 additions & 2 deletions
@@ -1,3 +1,35 @@
-# Day 02: Opportunity Sizing & The Business Case

**Crafting Day 2 Content**

I'm currently working on the content for `book-src/src/week-1/day-02.md`, focusing on "Opportunity Sizing & The Business Case," as a continuation of the "Jour" scenario. My main objective is to provide a comprehensive, educational resource suitable for data science students.

**Analyzing the Business Value**

I'm structuring the Day 2 content to translate signals into a tangible business case. My focus is on the "Jour" scenario, emphasizing the need for data scientists to speak the language of business stakeholders. I've drafted the `Objective` and `Why This Matters` sections, highlighting the value of quantifying opportunities and aligning with business metrics. The key concepts are mapped out, and I'm detailing today's step-by-step guide to solidify the material.

**Expanding Key Concepts**

My focus has shifted to the "Key Concepts" section, refining the explanations for User Segmentation and Core Metrics. I've expanded on the user-provided notes, making them more accessible for students. Now, I'm working to connect the concepts to the "Jour" scenario, demonstrating how to use segmentation to identify the target audience for the journal feature and how to select relevant business metrics like WAU and Retention to quantify its potential impact.

**Developing Core Metric Definitions**

I'm now focusing on defining the crucial "Key Concepts" for the Day 2 material. I've broken down "User Segmentation," "WAU," and "Retention," ensuring clear, student-friendly explanations linked to the "Jour" scenario. My goal is to equip students with a robust grasp of these concepts, solidifying their understanding of business case development. Now, I'm working on explaining the difference between absolute and relative lift, ensuring these definitions are accessible for the target audience.

**Outlining the Core Challenge**

I've just finished the outline for the Day 02 content. My focus now is on creating clear, straightforward definitions for key concepts like user segmentation and core metrics, along with an explanation of relative vs. absolute lift. The step-by-step guide is now complete. Next, I'll flesh out the core challenge, showing how to translate the user signals from Day 1 into the quantitative steps needed to build a business case. My emphasis will be on practical application, showing data science students how to identify user segments, join data using SQL, and build a model to estimate the impact of new features.

**Outlining Day 2's Content**

I'm currently building the framework for "Day 02: Opportunity Sizing & The Business Case," within the context of the "Jour" scenario, focusing on translating signals into business value. I've finished the main outline and am now detailing the sections on Key Concepts: User Segmentation, Core Metrics (WAU, Retention), and the all-important definitions of Relative vs. Absolute Lift. My approach is to make each concept clear and relevant, particularly for those with little experience creating business cases. I will provide practical examples and link them to the actual notebook with SQL to solidify the explanations.

**Refining the Step-by-Step Guide**

I'm now revising the "Today's Challenge" section, clarifying each step of the guide. My goal is to transform the broad outline into actionable instructions for students, connecting the theoretical concepts to the practical application in the notebook. Specifically, I'm focusing on making the SQL joins and Pandas calculations crystal clear. I'll include comments in the notebook itself as a reference, enhancing the learning process. My objective is to give students the ability to confidently navigate the data analysis workflow and understand how to develop a business case from qualitative signals.

-_Summary and tasks as per curriculum. Add your notes and findings here._

book-src/src/week-1/day-03.md

Lines changed: 53 additions & 2 deletions
@@ -1,3 +1,54 @@
-# Day 03: The Instrumentation Plan & Success Metrics

# Day 03: If You Can't Measure It, You Can't Improve It

-_Summary and tasks as per curriculum. Add your notes and findings here._

So far, we've found a signal in the noise and sized the potential business impact. We've answered the "what" and the "why." Today, we tackle the "how"—specifically, how will we measure success? We're going to create an **Instrumentation Plan**.
This is one of the most critical, high-leverage activities an analyst can perform. We are defining, *before a single line of code is written*, exactly what data we need to collect to determine if our "Journals" feature is a success or a failure.

### Objective

- To define exactly what user actions need to be tracked (events) and what metrics will define success, *before* any code is written.

### Why This Matters

Think of an instrumentation plan as the foundational contract between Product, Engineering, and Analytics.

- **For Product,** it forces clarity on what "success" actually means.
- **For Engineering,** it provides a clear, unambiguous list of tracking requirements.
- **For Analytics,** it ensures that the data you need to do your job will actually exist post-launch.

Without this plan, you're flying blind. You launch a feature and then ask, "Did it work?" only to realize you don't have the data to answer the question. This leads to opinions, not data-driven decisions. The old adage holds true: **"Bad data in, bad decisions out."** A thoughtful instrumentation plan is your quality control for future decisions.

### Key Concepts

Let's define the core components of our plan.

1. **Events and Properties:** This is the vocabulary we use to describe user behavior.
   - An **Event** is a user action. It's the *verb*. Examples: `click_button`, `view_screen`, `save_entry`.
   - **Properties** are the context for that action. They are the *adverbs* or *nouns* that describe the verb. They answer questions like who, what, where, when, and how.
   - **Analogy:** If the event is `play_song` (the verb), the properties would be a dictionary of context like `{genre: 'rock', artist_name: 'Queen', duration_ms: 210000, source: 'playlist'}`.

2. **Primary vs. Secondary Metrics:** Not all metrics are created equal. You need a hierarchy to avoid confusion.
   - The **Primary Metric** (also called the North Star Metric for the project) is the single, undisputed measure of success. If this metric goes up, the project is a success. If it doesn't, it's a failure. It must be directly related to the business case we built yesterday.
   - **Secondary Metrics** add color, context, and diagnostic power. They help explain *why* the primary metric moved. If retention is our primary metric, a secondary metric might be "average number of journal entries per week," which could be a leading indicator of retention.

3. **Guardrail Metrics:** This is a concept that separates good analysts from great ones. Guardrail metrics are the metrics you hope *don't* change. They are your early warning system for unintended negative consequences.
   - **Purpose:** Their job is to protect the overall health of the product. When you launch a new feature, you might accidentally hurt another part of the app.
   - **Example:** For our Journals feature, we want to increase engagement. But what if it's so engaging that users stop using our app's main social feed? A good guardrail metric would be `time_spent_on_main_feed`. If that metric plummets for users of the journal, we know we've created a cannibalization problem. Other examples include app performance (crash rate) or uninstalls.
### Today's Challenge: A Step-by-Step Guide

Your task is to create a simple instrumentation plan for the "Journals" feature. Think logically through the user journey and define the data you would need to collect. We'll outline the key components in `Day_03_Challenge.md`.

**Step 1: Map the User Journey & Define Events**

First, imagine you are a user. What are the key actions you would take within this new feature? List them out as events.

- What happens when the user sees the feature for the first time? (`view_journal_promo`)
- What is the first click? (`click_journal_tab`)
- What actions can they take on the page? (`start_journal_entry`, `save_journal_entry`, `set_reminder`)
**Step 2: Add Context with Properties**

Now, for each event, what additional information would be useful for deeper analysis later? For example:

- For `save_journal_entry`, you'd want to know the `character_count` to see if longer entries correlate with retention. You might also want a `template_used` property if you offer different journaling formats (e.g., 'gratitude', 'freeform').
- For `start_journal_entry`, an `entry_point` property (`'from_prompt'`, `'from_blank_canvas'`) would be incredibly valuable.
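To make the event-plus-properties idea concrete, here is one way a single tracked action could be written down in the plan. The property names come from the bullets above; the overall payload shape (fields like `user_id` and `timestamp`) is an illustrative assumption, not any specific analytics vendor's schema:

```python
from datetime import datetime, timezone

# Hypothetical payload for one tracked action: the event name is the verb,
# the properties are the context that makes deeper analysis possible later.
save_journal_entry = {
    "event": "save_journal_entry",
    "user_id": "u_12345",                                  # who (illustrative ID)
    "timestamp": datetime.now(timezone.utc).isoformat(),   # when
    "properties": {
        "character_count": 412,         # do longer entries correlate with retention?
        "template_used": "gratitude",   # 'gratitude', 'freeform', ...
        "entry_point": "from_prompt",   # how the user got here
    },
}
```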
**Step 3: Define Your Metrics**

This is where you formalize your success criteria.

- **Choose a Primary Metric:** The goal of a journal is to build a long-term habit. Therefore, a short-term metric like "number of entries on Day 1" is misleading. A better primary metric is **28-day retention for users who create their first entry**. This directly measures if the feature creates lasting value (a rough sketch of how it could be computed appears after this list).
- **List Secondary Metrics:** What would support this primary metric? Consider `Adoption Rate` (% of users who try the feature), `Engagement Rate` (average entries per user per week), and `Funnel Conversion` (the % of users who start an entry and then save it).
- **Set Your Guardrails:** What could go wrong? The biggest risk is cannibalization. A key guardrail would be `sessions_per_week_on_core_feature_X`. You could also add `app_uninstall_rate` for the week after a user's first journal entry.
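Looking ahead, once these events are flowing into the warehouse, the primary metric could be computed roughly along these lines. Everything below is hypothetical: the `events` table, its columns, and the tiny dummy rows exist only so the sketch runs; the real definition would follow your actual tracking plan.

```python
import duckdb

con = duckdb.connect()

# Hypothetical post-launch events table (dummy rows so the sketch runs end to end).
con.sql("""
    CREATE TABLE events AS
    SELECT * FROM (VALUES
        ('u1', 'save_journal_entry', TIMESTAMP '2024-01-01 10:00:00'),
        ('u1', 'save_journal_entry', TIMESTAMP '2024-02-01 09:00:00'),
        ('u2', 'save_journal_entry', TIMESTAMP '2024-01-05 12:00:00')
    ) AS t(user_id, event, event_time)
""")

# One way to read "28-day retention": of users with a first journal entry,
# what share is active again roughly four weeks later (days 28-35)?
con.sql("""
    WITH first_entry AS (
        SELECT user_id, MIN(event_time) AS first_entry_at
        FROM events
        WHERE event = 'save_journal_entry'
        GROUP BY user_id
    ),
    retained AS (
        SELECT DISTINCT f.user_id
        FROM first_entry f
        JOIN events e
          ON e.user_id = f.user_id
         AND e.event_time >= f.first_entry_at + INTERVAL '28 days'
         AND e.event_time <  f.first_entry_at + INTERVAL '36 days'
    )
    SELECT COUNT(r.user_id) * 1.0 / COUNT(*) AS retention_28d
    FROM first_entry f
    LEFT JOIN retained r ON r.user_id = f.user_id
""").show()
```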
By completing this plan, you're not just preparing to analyze a feature; you are actively shaping its success.
