CS4050 Lab 1: Graph Representations at Scale

Place all of your submission documents in the ./lab1_submission_folder. You will submit a zip file of that directory in Canvas.

Overview

This lab explores the practical performance differences between adjacency matrix and adjacency list graph representations. Rather than just learning the theoretical complexities, you will:

Generate road networks at various scales (100 to 100,000+ nodes)
Implement algorithms using both representations
Measure actual memory usage and runtime performance
Experience where each representation succeeds and fails
Understand why representation choice matters in real systems

If you are curious about the use of OO, classes, polymorphism, and overloading of methods in the pythonic example, and its corrallaries in C, you can read up on that in This File.

Learning Objectives

By completing this lab, you will:

Understand O(V²) vs O(V+E) memory tradeoffs through direct experience
See why sparse graphs (like road networks) favor adjacency lists
Observe O(1) vs O(degree) edge lookup tradeoffs
Understand why O(V) vs O(degree) neighbor iteration affects algorithm choice
Develop intuition for choosing representations in real applications

Quick Start

# 1. Generate test networks
cd scripts
python3 generate_network.py --size tiny    # 100 nodes - baseline
python3 generate_network.py --size small   # 1,000 nodes
python3 generate_network.py --size medium  # 10,000 nodes - matrix starts struggling
python3 generate_network.py --size large   # 50,000 nodes - matrix ~10GB RAM!

# 2. Run experiments
cd ../python
python3 run_experiments.py

# 3. Or run individual benchmarks
python graph_representations.py data/medium_nodes.csv data/medium_edges.csv --benchmark

Network Sizes and Expected Behavior

Size	Nodes	Matrix Memory	Expected Behavior
tiny	100	~80 KB	No visible difference between representations
small	1,000	~8 MB	Minimal difference, good for debugging
medium	10,000	~800 MB	Matrix noticeably slower to load
large	50,000	~20 GB	Matrix takes minutes to allocate, algorithms slow
huge	100,000	~80 GB	Matrix will crash your 32GB machine!

The "huge" size is intentionally designed to demonstrate the failure mode of O(V²) memory scaling.

Repository Structure

cs3050-Lab-6/
├── data/                           # Generated network files
│   ├── tiny_nodes.csv
│   ├── tiny_edges.csv
│   ├── small_nodes.csv
│   └── ...
├── lab1_submission/                           # Generated network files
│   ├── .keep  # Ensures an empty directory is cloned locally
│   ├── <<Your Submission in Markdown Format (you can copy that section of the README.md to start)>>
│   └── ...
├── scripts/
│   └── generate_network.py         # Network data generator
├── python/
│   ├── graph_representations.py    # Core implementations
│   └── run_experiments.py          # Guided experiment runner
├── c/
│   └── (C implementations)
└── README.md

Understanding the Code

Graph Representations

Adjacency List (AdjacencyListGraph):

# Memory: O(V + E)
# Stores: dict of node_id -> list of (neighbor_id, weight)

self.adj_list = {
    0: [(1, 2.5), (3, 1.8)],  # Node 0 connects to 1 and 3
    1: [(0, 2.5), (2, 3.0)],  # Node 1 connects to 0 and 2
    ...
}

Adjacency Matrix (AdjacencyMatrixGraph):

# Memory: O(V²) regardless of edge count!
# Stores: 2D array where matrix[i][j] = weight if edge exists, 0 otherwise

self.matrix = [
    [0.0, 2.5, 0.0, 1.8],  # Node 0's edges
    [2.5, 0.0, 3.0, 0.0],  # Node 1's edges
    ...
]

Key Operations Comparison

Operation	Adjacency List	Adjacency Matrix
Memory	O(V + E)	O(V²)
Add edge	O(1)	O(1)
Check if edge exists	O(degree)	O(1) ← Matrix wins
Get all neighbors	O(degree) ← List wins	O(V)
Iterate all edges	O(E)	O(V²)

Why This Matters for Algorithms

Dijkstra's algorithm repeatedly calls get_neighbors():

With adjacency list: O(degree) per call → O((V+E) log V) total
With adjacency matrix: O(V) per call → O(V² log V) total

For a road network with V=50,000 nodes and average degree 6:

List-based Dijkstra: ~300,000 neighbor lookups
Matrix-based Dijkstra: ~2.5 billion cell scans!

Lab Exercises

Exercise 1: Memory Scaling Analysis

Run experiments on progressively larger networks and record:

Network	Nodes	List Memory	Matrix Memory	Ratio
tiny
small
medium
large

Questions:

What is the relationship between node count and matrix memory?
At what size does the matrix become impractical?
Predict the memory for 200,000 nodes. Would it fit in 32GB?

Exercise 2: Edge Query Benchmark

The matrix has O(1) edge lookup. Measure the speedup:

# Test code is in benchmark_edge_queries() # In the C program
# Run 10,000 random edge existence checks

Questions:

How much faster is matrix edge lookup?
Why is list lookup slower? Trace through the code.
In what applications would fast edge lookup matter?

Exercise 3: Algorithm Performance

Run Dijkstra's algorithm on both representations:

Network	List Dijkstra	Matrix Dijkstra	Slowdown
tiny
small
medium

Dijkstra's algorithm is in the c folder, and instructions for running it are in the README.md file in that directory.

Questions:

Why is matrix-based Dijkstra slower despite O(1) edge lookup?
Where in the algorithm does the slowdown occur?
For what graph density would matrix be faster?

Exercise 4: The Breaking Point (Demonstration)

WARNING: Make sure all other work on your computer is saved before initiating this!

Try loading the "huge" network with matrix representation:

python3 scripts/generate_network.py --size huge
python3 -c "from python/graph_representations import load_graph; load_graph('data/huge_nodes.csv', 'data/huge_edges.csv', use_matrix=True)"

Watch your system monitor. Document:

Memory usage before loading
Memory usage during matrix allocation
What happens when memory is exhausted

Exercise 5: Real-World Decision Making

Given these scenarios, choose the appropriate representation and justify:

Google Maps routing: ~50 million road intersections, avg degree 4
Social network analysis: 1 billion users, avg 200 friends
Circuit analysis: 10,000 components, each connected to 3 others
Dense communication matrix: 500 servers, all-to-all connections

Implementing Your Own Graph Class

As an additional challenge, implement a hybrid representation that:

Uses adjacency list for sparse regions
Uses matrix blocks for dense subgraphs
Automatically chooses based on local density

class HybridGraph(Graph):
    def __init__(self, density_threshold=0.1):
        # Your implementation here
        pass

Submission Requirements

Place all of your submission documents in the ./lab1_submission_folder. You will submit a zip file of that directory in Canvas.

Experimental data from all exercises (CSV or screenshots)
Written answers to all questions (~1-2 paragraphs each)
Analysis of when each representation is appropriate
Code for any modifications you made
Reflection on what surprised you in the experiments

Additional Resources

Troubleshooting

"Matrix would require X GB": Working as intended! This demonstrates why O(V²) doesn't scale.

Out of memory: Kill the process with Ctrl+C. Use --size medium or smaller.

Slow matrix allocation: For 50,000 nodes, allocation takes 30-60 seconds. Be patient.

Different results each run: Shortest path benchmarks use random endpoints. Set random.seed(42) for reproducibility.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
c		c
data		data
lab1_submission		lab1_submission
python		python
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
polymorphism-overloading.md		polymorphism-overloading.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CS4050 Lab 1: Graph Representations at Scale

Overview

Learning Objectives

Quick Start

Network Sizes and Expected Behavior

Repository Structure

Understanding the Code

Graph Representations

Key Operations Comparison

Why This Matters for Algorithms

Lab Exercises

Exercise 1: Memory Scaling Analysis

Exercise 2: Edge Query Benchmark

Exercise 3: Algorithm Performance

Exercise 4: The Breaking Point (Demonstration)

Exercise 5: Real-World Decision Making

Implementing Your Own Graph Class

Submission Requirements

Additional Resources

Troubleshooting

About

Uh oh!

Releases

Packages

Languages

License

angrynarwhal/4050-Lab-1

Folders and files

Latest commit

History

Repository files navigation

CS4050 Lab 1: Graph Representations at Scale

Overview

Learning Objectives

Quick Start

Network Sizes and Expected Behavior

Repository Structure

Understanding the Code

Graph Representations

Key Operations Comparison

Why This Matters for Algorithms

Lab Exercises

Exercise 1: Memory Scaling Analysis

Exercise 2: Edge Query Benchmark

Exercise 3: Algorithm Performance

Exercise 4: The Breaking Point (Demonstration)

Exercise 5: Real-World Decision Making

Implementing Your Own Graph Class

Submission Requirements

Additional Resources

Troubleshooting

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages