Commit 2298145

Copilot and leestott committed
Add beginner-friendly examples with comprehensive documentation
Co-authored-by: leestott <[email protected]>
1 parent 1ecb020 commit 2298145

8 files changed: +1007 -0 lines changed


.gitignore

Lines changed: 5 additions & 0 deletions
@@ -353,3 +353,8 @@ MigrationBackup/
 .ionide/
 .vscode/settings.json
 
+# Example output files (generated by running example scripts)
+examples/*.png
+examples/*.jpg
+examples/*.jpeg
+

README.md

Lines changed: 16 additions & 0 deletions
@@ -51,6 +51,8 @@ Get started with the following resources:
 
 # Getting Started
 
+> **Complete Beginners**: New to data science? Start with our [beginner-friendly examples](examples/README.md)! These simple, well-commented examples will help you understand the basics before diving into the full curriculum.
+
 > **Teachers**: we have [included some suggestions](for-teachers.md) on how to use this curriculum. We'd love your feedback [in our discussion forum](https://github.com/microsoft/Data-Science-For-Beginners/discussions)!
 
 > **[Students](https://aka.ms/student-page)**: to use this curriculum on your own, fork the entire repo and complete the exercises on your own, starting with a pre-lecture quiz. Then read the lecture and complete the rest of the activities. Try to create the projects by comprehending the lessons rather than copying the solution code; however, that code is available in the /solutions folders in each project-oriented lesson. Another idea would be to form a study group with friends and go through the content together. For further study, we recommend [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/qprpajyoy3x0g7?WT.mc_id=academic-77958-bethanycheum).
@@ -86,6 +88,20 @@ In addition, a low-stakes quiz before a class sets the intention of the student
 
 > **A note about quizzes**: All quizzes are contained in the Quiz-App folder, for 40 total quizzes of three questions each. They are linked from within the lessons, but the quiz app can be run locally or deployed to Azure; follow the instructions in the `quiz-app` folder. They are gradually being localized.
 
+## 🎓 Beginner-Friendly Examples
+
+**New to Data Science?** We've created a special [examples directory](examples/README.md) with simple, well-commented code to help you get started:
+
+- 🌟 **Hello World** - Your first data science program
+- 📂 **Loading Data** - Learn to read and explore datasets
+- 📊 **Simple Analysis** - Calculate statistics and find patterns
+- 📈 **Basic Visualization** - Create charts and graphs
+- 🔬 **Real-World Project** - Complete workflow from start to finish
+
+Each example includes detailed comments explaining every step, making it perfect for absolute beginners!
+
+👉 **[Start with the examples](examples/README.md)** 👈
+
 ## Lessons
 
 

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
"""
Hello World - Data Science Style!

This is your very first data science program. It introduces you to the basic
concepts of working with data in Python.

What you'll learn:
- How to create a simple dataset
- How to display data
- How to work with Python lists and dictionaries
- Basic data manipulation

Prerequisites: Just Python installed on your computer!
"""

# Let's start with the classic "Hello, World!" but with a data science twist
print("=" * 50)
print("Hello, World of Data Science!")
print("=" * 50)
print()

# In data science, we work with data. Let's create our first simple dataset.
# We'll use a list to store information about students and their test scores.

# A list is a collection of items in Python, written with square brackets []
students = ["Alice", "Bob", "Charlie", "Diana", "Eve"]
scores = [85, 92, 78, 95, 88]

print("Our Dataset:")
print("-" * 50)
print("Students:", students)
print("Scores:", scores)
print()

# Now let's do something useful with this data!
# We can find basic statistics about the scores

# Find the highest score
highest_score = max(scores)
print(f"📊 Highest score: {highest_score}")

# Find the lowest score
lowest_score = min(scores)
print(f"📊 Lowest score: {lowest_score}")

# Calculate the average score
# sum() adds all numbers together, len() tells us how many items we have
average_score = sum(scores) / len(scores)
print(f"📊 Average score: {average_score:.2f}")  # .2f means show 2 decimal places
print()

# Let's find who got the highest score
# We use index() to find where the highest_score is in our list
top_student_index = scores.index(highest_score)
top_student = students[top_student_index]
print(f"🏆 Top student: {top_student} with a score of {highest_score}")
print()

# Now let's organize this data in a more structured way
# We'll use a dictionary - it pairs keys (student names) with values (scores)
print("Student Scores (organized as key-value pairs):")
print("-" * 50)

# Create a dictionary by pairing students with their scores
student_scores = {}
for i in range(len(students)):
    student_scores[students[i]] = scores[i]

# Display each student and their score
for student, score in student_scores.items():
    # Add a special marker for the top student
    marker = "⭐" if student == top_student else " "
    print(f"{marker} {student}: {score} points")

print()
print("=" * 50)
print("Congratulations! You've completed your first data science program!")
print("=" * 50)

# What did we just do?
# 1. Created a simple dataset (student names and scores)
# 2. Performed basic analysis (max, min, average)
# 3. Found insights (who is the top student)
# 4. Organized the data in a useful structure (dictionary)
#
# These are the fundamental building blocks of data science!
# Next, you'll learn to work with real datasets using powerful libraries.
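
The index-based pairing loop and the `scores.index(highest_score)` lookup in the script above could also be expressed with `zip()` and a list comprehension. The following is a minimal sketch and is not part of the commit; `summarize_scores` is a hypothetical helper name, and `statistics.mean` stands in for the manual `sum() / len()` calculation. Unlike `index()`, which returns only the first match, this version reports every student who shares the highest score.

```python
from statistics import mean


def summarize_scores(students, scores):
    """Hypothetical helper (not in the commit): pair names with scores and summarize."""
    if len(students) != len(scores):
        raise ValueError("students and scores must have the same length")

    # dict(zip(...)) builds the same name -> score mapping as the index-based loop
    student_scores = dict(zip(students, scores))

    highest = max(scores)
    # Collect every student who reached the highest score, so ties are not hidden
    top_students = [name for name, score in student_scores.items() if score == highest]

    return {
        "scores": student_scores,
        "highest": highest,
        "lowest": min(scores),
        "average": mean(scores),
        "top_students": top_students,
    }


summary = summarize_scores(
    ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    [85, 92, 78, 95, 88],
)
print(f"Average score: {summary['average']:.2f}")
print("Top student(s):", ", ".join(summary["top_students"]))
```

Note that a dictionary keeps one entry per key, so this sketch assumes student names are unique, just as the original loop does.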

examples/02_loading_data.py

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
"""
Loading and Exploring Data

In real data science projects, you'll work with data stored in files.
This example shows you how to load data from a CSV file and explore it.

What you'll learn:
- How to load data from a CSV file
- How to view basic information about your dataset
- How to display the first/last rows
- How to get summary statistics

Prerequisites: pandas library (install with: pip install pandas)
"""

# Import the pandas library - it's the most popular tool for working with data in Python
# We give it the short name 'pd' so we can type less
import pandas as pd

print("=" * 70)
print("Welcome to Data Loading and Exploration!")
print("=" * 70)
print()

# Step 1: Load data from a CSV file
# CSV stands for "Comma-Separated Values" - a common format for storing data
# We'll use the birds dataset that comes with this repository
print("📂 Loading data from birds.csv...")
print()

# Load the data into a DataFrame (think of it as a smart spreadsheet)
# A DataFrame is pandas' main data structure - it organizes data in rows and columns
# Note: this relative path is resolved against the folder you run the script from,
# so run the script from inside the examples/ folder
data = pd.read_csv('../data/birds.csv')

print("✅ Data loaded successfully!")
print()

# Step 2: Get basic information about the dataset
print("-" * 70)
print("BASIC DATASET INFORMATION")
print("-" * 70)

# How many rows and columns do we have?
num_rows, num_columns = data.shape
print(f"📊 Dataset size: {num_rows} rows × {num_columns} columns")
print()

# What are the column names?
print("📋 Column names:")
for i, column in enumerate(data.columns, 1):
    print(f" {i}. {column}")
print()

# Step 3: Look at the first few rows of data
# This gives us a quick preview of what the data looks like
print("-" * 70)
print("FIRST 5 ROWS OF DATA (Preview)")
print("-" * 70)
print(data.head())  # head() shows the first 5 rows by default
print()

# Step 4: Look at the last few rows
print("-" * 70)
print("LAST 3 ROWS OF DATA")
print("-" * 70)
print(data.tail(3))  # tail(3) shows the last 3 rows
print()

# Step 5: Get information about data types
print("-" * 70)
print("DATA TYPES AND NON-NULL COUNTS")
print("-" * 70)
# info() prints its report itself and returns None, so we call it without print()
data.info()  # Shows column names, data types, and count of non-null values
print()

# Step 6: Get statistical summary
print("-" * 70)
print("STATISTICAL SUMMARY (for numerical columns)")
print("-" * 70)
# describe() gives us statistics like mean, std, min, max, etc.
print(data.describe())
print()

# Step 7: Check for missing values
print("-" * 70)
print("MISSING VALUES CHECK")
print("-" * 70)
missing_values = data.isnull().sum()
print("Number of missing values per column:")
print(missing_values)
print()

if missing_values.sum() == 0:
    print("✅ Great! No missing values found.")
else:
    print("⚠️ Some columns have missing values. You may need to handle them.")
print()

# Step 8: Get unique values in a column
print("-" * 70)
print("SAMPLE: UNIQUE VALUES")
print("-" * 70)
# Let's see how many unique values exist in the first column
first_column = data.columns[0]
unique_count = data[first_column].nunique()
print(f"The column '{first_column}' has {unique_count} unique value(s)")
print()

# Summary
print("=" * 70)
print("SUMMARY")
print("=" * 70)
print("You've learned how to:")
print(" ✓ Load data from a CSV file using pandas")
print(" ✓ Check the size and shape of your dataset")
print(" ✓ View the first and last rows")
print(" ✓ Understand data types")
print(" ✓ Get statistical summaries")
print(" ✓ Check for missing values")
print()
print("Next step: Try loading other CSV files from the data/ folder!")
print("=" * 70)

# Pro Tips:
# - Always explore your data before analyzing it
# - Check for missing values and understand why they might be missing
# - Look at the data types to ensure they make sense
# - Use head() and tail() to spot any obvious issues with your data
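
The script above loads `'../data/birds.csv'`, a relative path that Python resolves against the current working directory rather than the script's location. One possible way to make the lookup independent of where the script is launched is to anchor the path to the script file with `pathlib`. This is a minimal sketch under assumptions: `data_path_for` is a hypothetical helper that is not part of the commit, and the `/repo/examples/...` location is made up for illustration.

```python
from pathlib import Path


def data_path_for(script_file, relative_path="../data/birds.csv"):
    """Hypothetical helper: resolve a data path relative to the script's own folder."""
    script_dir = Path(script_file).resolve().parent
    return (script_dir / relative_path).resolve()


# In a real script you would pass __file__; a made-up location is used here.
csv_path = data_path_for("/repo/examples/02_loading_data.py")
print(csv_path)

if csv_path.exists():
    print("Found the dataset; it can be passed to pd.read_csv(csv_path)")
else:
    print(f"Could not find {csv_path}; check where the data folder lives")
```

Checking `exists()` before loading gives a beginner a readable message instead of a traceback when the file is missing.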
