diff --git a/lectures/_toc.yml b/lectures/_toc.yml
index 302a0a0b..b1121ed3 100644
--- a/lectures/_toc.yml
+++ b/lectures/_toc.yml
@@ -21,6 +21,7 @@ parts:
- file: matplotlib
- file: scipy
- file: pandas
+ - file: polars
- file: pandas_panel
- file: sympy
- caption: High Performance Computing
diff --git a/lectures/pandas.md b/lectures/pandas.md
index 3d2c809d..79c07d11 100644
--- a/lectures/pandas.md
+++ b/lectures/pandas.md
@@ -78,6 +78,7 @@ You can think of a `Series` as a "column" of data, such as a collection of obser
A `DataFrame` is a two-dimensional object for storing related columns of data.
+(pandas:series)=
## Series
```{index} single: Pandas; Series
diff --git a/lectures/polars.md b/lectures/polars.md
new file mode 100644
index 00000000..842f8d41
--- /dev/null
+++ b/lectures/polars.md
@@ -0,0 +1,985 @@
+---
+jupytext:
+ text_representation:
+ extension: .md
+ format_name: myst
+ format_version: 0.13
+ jupytext_version: 1.16.7
+kernelspec:
+ display_name: Python 3 (ipykernel)
+ language: python
+ name: python3
+---
+
+(pl)=
+
+# {index}`Polars`
+
+```{index} single: Python; Polars
+```
+
+In addition to what's in Anaconda, this lecture will need the following libraries:
+
+```{code-cell} ipython3
+:tags: [hide-output]
+
+!pip install --upgrade polars wbgapi yfinance pyarrow
+```
+
+## Overview
+
+[Polars](https://pola.rs/) is a fast data manipulation library for Python written in Rust.
+
+Polars has gained significant popularity in recent years due to its superior performance compared to traditional data analysis tools.
+
+This makes it an excellent choice for modern data science and machine learning workflows.
+
+Polars is designed with performance and memory efficiency in mind, leveraging:
+
+* Arrow's columnar memory format for fast data access
+* Lazy evaluation to optimize query execution
+* Parallel processing for enhanced performance
+* Expressive API similar to pandas but with better performance characteristics
+
+Just as [NumPy](https://numpy.org/) provides the basic array data type plus core array operations, Polars
+
+1. defines fundamental structures for working with data and
+1. endows them with methods that facilitate operations such as
+ * reading in data
+ * adjusting indices
+ * working with dates and time series
+ * sorting, grouping, re-ordering and general data munging [^mung]
+ * dealing with missing values, etc.
+
+More sophisticated statistical functionality is left to other packages, such as [statsmodels](https://www.statsmodels.org/) and [scikit-learn](https://scikit-learn.org/), which can work with Polars DataFrames through their interoperability with pandas.
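+
+For example, a Polars `DataFrame` can be handed to statsmodels by converting it to pandas first; here is a minimal sketch, where `df` is a Polars `DataFrame` with hypothetical columns `y` and `x`:
+
+```{code-block} python3
+:class: no-execute
+
+import statsmodels.formula.api as smf
+
+# Convert the Polars DataFrame to pandas before fitting the model
+results = smf.ols('y ~ x', data=df.to_pandas()).fit()
+print(results.summary())
+```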
+
+This lecture will provide a basic introduction to Polars.
+
+```{tip}
+*Why use Polars over pandas?* One reason is *performance*: as a general rule, pandas requires 5 to 10 times as much RAM as the size of the dataset to carry out operations, compared to 2 to 4 times for Polars; in addition, Polars is typically 10 to 100 times faster than pandas on common operations. A detailed comparison of the two libraries can be found in [this JetBrains blog post](https://blog.jetbrains.com/pycharm/2024/07/polars-vs-pandas/).
+```
+
+Throughout the lecture, we will assume that the following imports have taken place
+
+```{code-cell} ipython3
+import polars as pl
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+```
+
+Two important data types defined by Polars are `Series` and `DataFrame`.
+
+You can think of a `Series` as a "column" of data, such as a collection of observations on a single variable.
+
+A `DataFrame` is a two-dimensional object for storing related columns of data.
+
+## Series
+
+```{index} single: Polars; Series
+```
+
+Let's start with Series.
+
+We begin by creating a series of four random observations
+
+```{code-cell} ipython3
+s = pl.Series(name='daily returns', values=np.random.randn(4))
+s
+```
+
+```{note}
+You may notice that the above series has no index, unlike a [pd.Series](pandas:series); this is because Polars is column-centric, and accessing data is predominantly managed through filtering and boolean masks. Here is [an interesting blog post discussing this in more detail](https://medium.com/data-science/understand-polars-lack-of-indexes-526ea75e413).
+```
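+
+For example, rather than looking up observations by an index label, we select them with a boolean mask
+
+```{code-cell} ipython3
+s.filter(s > 0)  # keep only the positive daily returns
+```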
+
+Polars `Series` are built on top of Apache Arrow arrays and support many of the same operations as pandas `Series`.
+
+(Interested readers can find an extended introduction to Apache Arrow in [this DataCamp tutorial](https://www.datacamp.com/tutorial/apache-arrow).)
+
+```{code-cell} ipython3
+s * 100
+```
+
+```{code-cell} ipython3
+s.abs()
+```
+
+But `Series` provide more than basic arrays.
+
+For example, they have some additional (statistically oriented) methods
+
+```{code-cell} ipython3
+s.describe()
+```
+
+However, the `pl.Series` object cannot pair data with index labels in the way a `pd.Series` can.
+
+For example, using a `pd.Series` you can do the following:
+
+```{code-cell} ipython3
+s = pd.Series(np.random.randn(4), name='daily returns')
+s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
+s
+```
+
+However, in Polars you will need to use the `DataFrame` object to do the same task.
+
+This means that you will use the `DataFrame` object more often in Polars whenever you are interested in relationships between variables.
+
+Let's create a `pl.DataFrame` containing the same data as the `pd.Series` above.
+
+```{code-cell} ipython3
+df = pl.DataFrame({
+ 'company': ['AMZN', 'AAPL', 'MSFT', 'GOOG'],
+ 'daily returns': s.to_list()
+})
+df
+```
+
+To access specific values by company name, we can filter the DataFrame for the `AMZN` ticker code and select the `daily returns`.
+
+```{code-cell} ipython3
+df.filter(pl.col('company') == 'AMZN').select('daily returns').item()
+```
+
+If we want to update the `AMZN` return to 0, we can use the following chain of methods.
+
+Here `with_columns` is similar to `select` but adds (or replaces) columns in the same `DataFrame`
+
+```{code-cell} ipython3
+df = df.with_columns(
+ pl.when(pl.col('company') == 'AMZN') # filter for AMZN in company column
+ .then(0) # set values to 0
+ .otherwise(pl.col('daily returns')) # otherwise keep original value
+ .alias('daily returns') # assign back to the column
+)
+df
+```
+
+You can check whether a ticker code appears in the `company` column
+
+```{code-cell} ipython3
+'AAPL' in df['company']
+```
+
+## DataFrames
+
+```{index} single: Polars; DataFrames
+```
+
+While a `Series` is a single column of data, a `DataFrame` is several columns, one for each variable.
+
+In essence, a `DataFrame` in Polars is analogous to a (highly optimized) Excel spreadsheet.
+
+Thus, it is a powerful tool for representing and analyzing data that are naturally organized into rows and columns.
+
+Let's look at an example that reads data from the CSV file `pandas/data/test_pwt.csv`, which is taken from the [Penn World Tables](https://www.rug.nl/ggdc/productivity/pwt/pwt-releases/pwt-7.0).
+
+The dataset contains the following indicators:
+
+| Variable Name | Description |
+| :-: | :-: |
+| POP | Population (in thousands) |
+| XRAT | Exchange Rate to US Dollar |
+| tcgdp | Total PPP Converted GDP (in million international dollar) |
+| cc | Consumption Share of PPP Converted GDP Per Capita (%) |
+| cg | Government Consumption Share of PPP Converted GDP Per Capita (%) |
+
+
+We'll read this in from a URL using the Polars function `read_csv`.
+
+```{code-cell} ipython3
+URL = ('https://raw.githubusercontent.com/QuantEcon/'
+ 'lecture-python-programming/master/source/_static/'
+ 'lecture_specific/pandas/data/test_pwt.csv')
+df = pl.read_csv(URL)
+type(df)
+```
+
+Here is the content of `test_pwt.csv`
+
+```{code-cell} ipython3
+df
+```
+
+### Select data by position
+
+In practice, one thing that we do all the time is to find, select and work with a
+subset of the data that interests us.
+
+We can select particular rows using array slicing notation
+
+```{code-cell} ipython3
+df[2:5]
+```
+
+To select columns, we can pass a list containing the names of the desired columns
+
+```{code-cell} ipython3
+df.select(['country', 'tcgdp'])
+```
+
+To select both rows and columns using integers, we can combine slicing with column selection
+
+```{code-cell} ipython3
+df[2:5].select(df.columns[0:4])
+```
+
+To select rows and columns using a mixture of integers and labels, we can use more complex selection
+
+```{code-cell} ipython3
+df[2:5].select(['country', 'tcgdp'])
+```
+
+### Select data by conditions
+
+Instead of indexing rows and columns using integers and names, we can also obtain a sub-DataFrame that satisfies certain (potentially complicated) conditions.
+
+This section demonstrates various ways to do that.
+
+The most straightforward way is with the `filter` method.
+
+```{code-cell} ipython3
+df.filter(pl.col('POP') >= 20000)
+```
+
+In this case, `df.filter()` takes a boolean expression and only returns the rows where that expression evaluates to `True`.
+
+We can view this boolean mask as a table with the alias `meets_criteria`
+
+```{code-cell} ipython3
+df.select(
+ pl.col('country'),
+ (pl.col('POP') >= 20000).alias('meets_criteria')
+)
+```
+
+Here is another example:
+
+```{code-cell} ipython3
+df.filter(
+ (pl.col('country').is_in(['Argentina', 'India', 'South Africa'])) &
+ (pl.col('POP') > 40000)
+)
+```
+
+We can also use arithmetic operations between different columns inside the condition.
+
+```{code-cell} ipython3
+df.filter(
+ (pl.col('cc') + pl.col('cg') >= 80) & (pl.col('POP') <= 20000)
+)
+```
+
+For example, we can use a condition to select the country with the largest
+household consumption–GDP share `cc`.
+
+```{code-cell} ipython3
+df.filter(pl.col('cc') == pl.col('cc').max())
+```
+
+When we only want to look at certain columns of a selected sub-DataFrame, we can combine `filter` with `select`.
+
+```{code-cell} ipython3
+df.filter(
+ (pl.col('cc') + pl.col('cg') >= 80) & (pl.col('POP') <= 20000)
+).select(['country', 'year', 'POP'])
+```
+
+**Application: Subsetting DataFrame**
+
+Real-world datasets can be very large.
+
+It is sometimes desirable to work with a subset of data to enhance computational efficiency and reduce redundancy.
+
+Let's imagine that we're only interested in the population (`POP`) and total GDP (`tcgdp`).
+
+One way to strip the data frame `df` down to only these variables is to overwrite the `DataFrame` using the selection method described above
+
+```{code-cell} ipython3
+df_subset = df.select(['country', 'POP', 'tcgdp'])
+df_subset
+```
+
+We can then save the smaller dataset for further analysis.
+
+```{code-block} python3
+:class: no-execute
+
+df_subset.write_csv('pwt_subset.csv')
+```
+
+### Apply and map operations
+
+Polars provides powerful methods for applying functions to data.
+
+Instead of pandas' `apply` method, Polars uses expressions within `select`, `with_columns`, or `filter` methods.
+
+Here is an example using built-in functions to find the `max` value for each column
+
+```{code-cell} ipython3
+df.select([
+ pl.col(['year', 'POP', 'XRAT', 'tcgdp', 'cc', 'cg'])
+ .max()
+ .name.suffix('_max')
+])
+```
+
+For more complex operations, we can use `map_elements` (similar to pandas' apply):
+
+```{code-cell} ipython3
+df.select([
+ pl.col('country'),
+ pl.col('POP').map_elements(lambda x: x * 2).alias('POP_doubled')
+])
+```
+
+However, as you can see from the warning issued by Polars, there is often a better way to achieve this using native Polars expressions.
+
+```{code-cell} ipython3
+df.select([
+ pl.col('country'),
+ (pl.col('POP') * 2).alias('POP_doubled')
+])
+```
+
+We can use complex filtering conditions with boolean logic:
+
+```{code-cell} ipython3
+complex_condition = (
+ pl.when(pl.col('country').is_in(['Argentina', 'India', 'South Africa']))
+ .then(pl.col('POP') > 40000)
+ .otherwise(pl.col('POP') < 20000)
+)
+
+df.filter(complex_condition).select([
+ 'country', 'year', 'POP', 'XRAT', 'tcgdp'
+])
+```
+
+### Make changes in DataFrames
+
+The ability to make changes in DataFrames is important to generate a clean dataset for future analysis.
+
+**1.** We can use conditional logic to "keep" certain values and replace others
+
+```{code-cell} ipython3
+df.with_columns(
+ pl.when(pl.col('POP') >= 20000) # when population >= 20000
+ .then(pl.col('POP')) # keep the population value
+ .otherwise(None) # otherwise set to null
+ .alias('POP_filtered') # save results in POP_filtered
+).select(['country', 'POP', 'POP_filtered']) # select the columns
+```
+
+**2.** We can modify specific values based on conditions
+
+```{code-cell} ipython3
+df_modified = df.with_columns(
+ pl.when(pl.col('cg') == pl.col('cg').max()) # pick the largest cg value
+ .then(None) # set to null
+ .otherwise(pl.col('cg')) # otherwise keep the value
+ .alias('cg') # update the column
+)
+df_modified
+```
+
+**3.** We can use expressions to modify columns as a whole
+
+```{code-cell} ipython3
+df.with_columns([
+    pl.when(pl.col('POP') <= 10000)  # when population is <= 10,000
+ .then(None) # set the value to null
+ .otherwise(pl.col('POP')) # otherwise keep existing value
+ .alias('POP'), # update the POP column
+ (pl.col('XRAT') / 10).alias('XRAT') # update XRAT in-place
+])
+```
+
+**4.** We can use built-in functions to modify all entries of columns selected by data type.
+
+```{code-cell} ipython3
+df.with_columns([
+ pl.col(pl.Float64).round(2) # round all Float64 columns
+])
+```
+
+**Application: Missing Value Imputation**
+
+Replacing missing values is an important step in data munging.
+
+Let's insert some null values at a few specific positions
+
+```{code-cell} ipython3
+# Create a copy with some null values
+df_with_nulls = df.clone()
+
+# Set some specific positions to null
+indices_to_null = [(0, 'XRAT'), (3, 'cc'), (5, 'tcgdp'), (6, 'POP')]
+
+for row_idx, col_name in indices_to_null:
+ df_with_nulls = df_with_nulls.with_columns(
+ pl.when(pl.int_range(pl.len()) == row_idx)
+ .then(None)
+ .otherwise(pl.col(col_name))
+ .alias(col_name)
+ )
+
+df_with_nulls
+```
+
+We can replace all missing values with 0
+
+```{code-cell} ipython3
+df_with_nulls.fill_null(0)
+```
+
+Polars also provides us with convenient methods to replace missing values.
+
+For example, we can use forward fill, backward fill, or interpolation (see the sketch after the next cell).
+
+Here we fill `null` values with the column means
+
+```{code-cell} ipython3
+cols = ["cc", "tcgdp", "POP", "XRAT"]
+df_with_nulls.with_columns([
+ pl.col(cols).fill_null(pl.col(cols).mean())
+])
+```
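+
+As a quick sketch of the alternatives mentioned above, we can fill each `null` with the last preceding value, or interpolate between neighboring values
+
+```{code-cell} ipython3
+df_with_nulls.fill_null(strategy="forward")  # forward fill down each column
+```
+
+```{code-cell} ipython3
+df_with_nulls.with_columns(pl.col(cols).interpolate())  # linear interpolation
+```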
+
+Missing value imputation is a big area in data science involving various machine learning techniques.
+
+There are also more [advanced tools](https://scikit-learn.org/stable/modules/impute.html) in Python to impute missing values.
+
+### Standardization and visualization
+
+As before, let's imagine that we're only interested in the population (`POP`) and total GDP (`tcgdp`).
+
+This time we strip the data frame `df` down to these variables by overwriting it, using the selection method described above
+
+```{code-cell} ipython3
+df = df.select(['country', 'POP', 'tcgdp'])
+df
+```
+
+While Polars doesn't have a traditional index like pandas, we can work with country names directly through filtering.
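+
+For example, to pull out the row for a single country (here `'United States'`, one of the countries in this dataset)
+
+```{code-cell} ipython3
+df.filter(pl.col('country') == 'United States')
+```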
+
+Let's give the columns slightly better names
+
+```{code-cell} ipython3
+df = df.rename({'POP': 'population', 'tcgdp': 'total GDP'})
+df
+```
+
+The `population` variable is in thousands; let's revert to single units
+
+```{code-cell} ipython3
+df = df.with_columns((pl.col('population') * 1e3).alias('population'))
+df
+```
+
+Next, we're going to add a column showing real GDP per capita, multiplying by 1,000,000 as we go because total GDP is in millions.
+
+```{note}
+Neither Polars nor pandas has a way of recording units of measurement, such as GDP expressed in millions of dollars, so it is up to the user to keep track of units during analysis.
+```
+
+```{code-cell} ipython3
+df = df.with_columns(
+ (pl.col('total GDP') * 1e6 / pl.col('population')).alias('GDP percap')
+)
+df
+```
+
+One of the nice things about Polars `DataFrame` and `Series` objects is that they can be easily converted to pandas for visualization through Matplotlib.
+
+For example, we can easily generate a bar plot of GDP per capita
+
+```{code-cell} ipython3
+# Convert to pandas for plotting
+df_pandas = df.to_pandas().set_index('country')
+ax = df_pandas['GDP percap'].plot(kind='bar')
+ax.set_xlabel('country', fontsize=12)
+ax.set_ylabel('GDP per capita', fontsize=12)
+plt.show()
+```
+
+At the moment the data frame is ordered alphabetically by country---let's order it by GDP per capita instead
+
+```{code-cell} ipython3
+df = df.sort('GDP percap', descending=True)
+df
+```
+
+Plotting as before now yields
+
+```{code-cell} ipython3
+# Convert to pandas for plotting
+df_pandas = df.to_pandas().set_index('country')
+ax = df_pandas['GDP percap'].plot(kind='bar')
+ax.set_xlabel('country', fontsize=12)
+ax.set_ylabel('GDP per capita', fontsize=12)
+plt.show()
+```
+
+## Lazy evaluation
+
+```{index} single: Polars; Lazy Evaluation
+```
+
+One of Polars' most powerful features is **lazy evaluation**.
+
+This allows Polars to optimize your entire query before executing it, leading to significant performance improvements.
+
+### Eager vs lazy APIs
+
+Polars provides two APIs:
+
+1. **Eager API** - Operations are executed immediately (like pandas)
+2. **Lazy API** - Operations are collected and optimized before execution
+
+Let's see the difference using our dataset:
+
+```{code-cell} ipython3
+# First, let's reload our original dataset for this example
+URL = ('https://raw.githubusercontent.com/QuantEcon/'
+ 'lecture-python-programming/master/source/_static/'
+ 'lecture_specific/pandas/data/test_pwt.csv')
+df_full = pl.read_csv(URL)
+
+# Eager API (executed immediately)
+result_eager = (df_full
+ .filter(pl.col('tcgdp') > 1000)
+ .select(['country', 'year', 'tcgdp'])
+ .sort('tcgdp', descending=True)
+)
+print("Eager result shape:", result_eager.shape)
+result_eager.head()
+```
+
+```{code-cell} ipython3
+# Lazy API (builds a query plan)
+lazy_query = (df_full.lazy() # Convert to lazy frame
+ .filter(pl.col('tcgdp') > 1000)
+ .select(['country', 'year', 'tcgdp'])
+ .sort('tcgdp', descending=True)
+)
+
+print("Lazy query:")
+print(lazy_query)
+```
+
+We can now execute the lazy query using `collect`:
+
+```{code-cell} ipython3
+result_lazy = lazy_query.collect()
+print("Lazy result shape:", result_lazy.shape)
+result_lazy.head()
+```
+
+### Query optimization
+
+The lazy API allows Polars to perform several optimizations:
+
+1. **Predicate Pushdown** - Filters are applied as early as possible
+2. **Projection Pushdown** - Only required columns are read
+3. **Common Subexpression Elimination** - Duplicate calculations are removed
+4. **Dead Code Elimination** - Unused operations are removed
+
+```{code-cell} ipython3
+# Example of optimization - only columns needed are processed
+optimized_query = (df_full.lazy()
+ .select(['country', 'year', 'tcgdp', 'POP']) # Select early
+ .filter(pl.col('tcgdp') > 500) # Filter pushdown
+ .with_columns((pl.col('tcgdp') / pl.col('POP')).alias('gdp_per_capita'))
+ .filter(pl.col('gdp_per_capita') > 10) # Additional filter
+ .select(['country', 'year', 'gdp_per_capita']) # Final projection
+)
+
+print("Optimized query plan:")
+print(optimized_query.explain())
+```
+
+```{code-cell} ipython3
+# Execute the optimized query
+result_optimized = optimized_query.collect()
+result_optimized.head()
+```
+
+### When to use lazy vs eager
+
+**Use Lazy API when:**
+- Working with large datasets
+- Performing complex transformations
+- Building data pipelines
+- Performance is critical
+
+**Use Eager API when:**
+- Exploring data interactively
+- Working with small datasets
+- Need immediate results for debugging
+
+The lazy API is particularly powerful for data processing pipelines where multiple transformations can be optimized together as a single operation.
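+
+For large files, a lazy pipeline typically starts from `pl.scan_csv` (or `scan_parquet`), which avoids reading the whole file into memory; here is a minimal sketch, assuming a local file `large_dataset.csv` with hypothetical columns `category` and `value`:
+
+```{code-block} python3
+:class: no-execute
+
+pipeline = (
+    pl.scan_csv('large_dataset.csv')  # lazily scan the file
+    .filter(pl.col('value') > 0)      # predicate pushdown
+    .group_by('category')
+    .agg(pl.col('value').mean())      # aggregate per category
+)
+result = pipeline.collect()           # execute the optimized plan
+```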
+
+## Online data sources
+
+```{index} single: Data Sources
+```
+
+Python makes it straightforward to query online databases programmatically.
+
+An important database for economists is [FRED](https://fred.stlouisfed.org/) --- a vast collection of time series data maintained by the St. Louis Fed.
+
+For example, suppose that we are interested in the [unemployment rate](https://fred.stlouisfed.org/series/UNRATE).
+
+(To download the data as a CSV file, click on `Download` at the top right and select the `CSV (data)` option.)
+
+Alternatively, we can access the CSV file from within a Python program.
+
+
+In {doc}`pandas`, we studied how to use `requests` and `pandas` to access API data.
+
+Here Polars' `read_csv` function provides the same functionality.
+
+We use `try_parse_dates=True` so that Polars recognizes the date column
+
+```{code-cell} ipython3
+url = ('https://fred.stlouisfed.org/graph/fredgraph.csv?'
+ 'bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&'
+ 'graph_bgcolor=%23ffffff&height=450&mode=fred&'
+ 'recession_bars=on&txtcolor=%23444444&ts=12&tts=12&'
+ 'width=1318&nt=0&thu=0&trc=0&show_legend=yes&'
+ 'show_axis_titles=yes&show_tooltip=yes&id=UNRATE&scale=left&'
+ 'cosd=1948-01-01&coed=2024-06-01&line_color=%234572a7&'
+ 'link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&'
+ 'ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&'
+ 'fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&'
+ 'vintage_date=2024-07-29&revision_date=2024-07-29&'
+ 'nd=1948-01-01')
+data = pl.read_csv(url, try_parse_dates=True)
+```
+
+The data has been read into a Polars DataFrame called `data` that we can now manipulate in the usual way
+
+```{code-cell} ipython3
+type(data)
+```
+
+```{code-cell} ipython3
+data.head() # A useful method to get a quick look at a DataFrame
+```
+
+```{code-cell} ipython3
+data.describe() # Your output might differ slightly
+```
+
+We can also plot the unemployment rate from 2006 to 2012 as follows:
+
+```{code-cell} ipython3
+# Filter data for the specified date range and convert to pandas for plotting
+filtered_data = data.filter(
+ (pl.col('observation_date') >= pl.date(2006, 1, 1)) &
+ (pl.col('observation_date') <= pl.date(2012, 12, 31))
+).to_pandas().set_index('observation_date')
+
+ax = filtered_data.plot(title='US Unemployment Rate', legend=False)
+ax.set_xlabel('year', fontsize=12)
+ax.set_ylabel('%', fontsize=12)
+plt.show()
+```
+
+Note that Polars can read and write many other file types.
+
+Polars offers [a wide variety of I/O methods](https://docs.pola.rs/user-guide/io/) for reading Excel, JSON and Parquet files, or for connecting directly to a database server.
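+
+For example, round-tripping a DataFrame through the Parquet format looks like this (a sketch, using a hypothetical local file name):
+
+```{code-block} python3
+:class: no-execute
+
+df.write_parquet('pwt_subset.parquet')      # write a compressed columnar file
+df = pl.read_parquet('pwt_subset.parquet')  # read it back
+```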
+
+## Exercises
+
+```{exercise-start}
+:label: pl_ex1
+```
+
+With these imports:
+
+```{code-cell} ipython3
+import datetime as dt
+import yfinance as yf
+```
+
+Write a program to calculate the percentage price change over 2021 for the following shares using Polars:
+
+```{code-cell} ipython3
+ticker_list = {'INTC': 'Intel',
+ 'MSFT': 'Microsoft',
+ 'IBM': 'IBM',
+ 'BHP': 'BHP',
+ 'TM': 'Toyota',
+ 'AAPL': 'Apple',
+ 'AMZN': 'Amazon',
+ 'C': 'Citigroup',
+ 'QCOM': 'Qualcomm',
+ 'KO': 'Coca-Cola',
+ 'GOOG': 'Google'}
+```
+
+Here's the first part of the program that reads data into a Polars DataFrame:
+
+```{code-cell} ipython3
+def read_data_polars(ticker_list,
+ start=dt.datetime(2021, 1, 1),
+ end=dt.datetime(2021, 12, 31)):
+ """
+ This function reads in closing price data from Yahoo
+ for each tick in the ticker_list and returns a Polars DataFrame.
+ Different indices may have different trading days, so we use joins
+ to handle this.
+ """
+ dataframes = []
+
+ for tick in ticker_list:
+ stock = yf.Ticker(tick)
+ prices = stock.history(start=start, end=end)
+
+ # Create a Polars DataFrame from the closing prices
+ df = pl.DataFrame({
+ 'Date': pd.to_datetime(prices.index.date),
+ tick: prices['Close'].values
+ })
+ dataframes.append(df)
+
+ # Start with the first DataFrame
+ result = dataframes[0]
+
+ # Join additional DataFrames, handling mismatched dates with full outer join
+ for df in dataframes[1:]:
+ result = result.join(df, on='Date', how='full', coalesce=True)
+
+ return result
+
+ticker = read_data_polars(ticker_list)
+```
+
+Complete the program to plot the result as a bar graph using Polars operations and matplotlib visualization.
+
+```{exercise-end}
+```
+
+```{solution-start} pl_ex1
+:class: dropdown
+```
+
+Here's a solution using Polars operations to calculate percentage changes:
+
+
+```{code-cell} ipython3
+price_change_df = ticker.select([
+ (pl.col(tick).last() / pl.col(tick).first() * 100 - 100).alias(tick)
+ for tick in ticker_list.keys()
+]).transpose(
+ include_header=True,
+ header_name='ticker',
+ column_names=['pct_change']
+)
+
+# Add company names and sort
+price_change_df = price_change_df.with_columns([
+ pl.col('ticker')
+ .replace_strict(ticker_list, default=pl.col('ticker'))
+ .alias('company')
+]).sort('pct_change')
+
+print(price_change_df)
+```
+
+Now plot the results:
+
+```{code-cell} ipython3
+# Convert to pandas for plotting (as demonstrated in the lecture)
+df_pandas = price_change_df.to_pandas().set_index('company')
+
+fig, ax = plt.subplots(figsize=(10,8))
+ax.set_xlabel('stock', fontsize=12)
+ax.set_ylabel('percentage change in price', fontsize=12)
+
+# Create colors: red for negative returns, blue for positive returns
+colors = ['red' if x < 0 else 'blue' for x in df_pandas['pct_change']]
+df_pandas['pct_change'].plot(kind='bar', ax=ax, color=colors)
+
+plt.xticks(rotation=45)
+plt.tight_layout()
+plt.show()
+```
+
+```{solution-end}
+```
+
+
+```{exercise-start}
+:label: pl_ex2
+```
+
+Using the method `read_data_polars` introduced in {ref}`pl_ex1`, write a program to obtain year-on-year percentage change for the following indices using Polars operations:
+
+```{code-cell} ipython3
+indices_list = {'^GSPC': 'S&P 500',
+ '^IXIC': 'NASDAQ',
+ '^DJI': 'Dow Jones',
+ '^N225': 'Nikkei'}
+```
+
+Complete the program to show summary statistics and plot the result as a time series graph demonstrating Polars' data manipulation capabilities.
+
+```{exercise-end}
+```
+
+```{solution-start} pl_ex2
+:class: dropdown
+```
+
+Following the work you did in {ref}`pl_ex1`, you can query the data using `read_data_polars` by updating the start and end dates accordingly.
+
+```{code-cell} ipython3
+indices_data = read_data_polars(
+ indices_list,
+ start=dt.datetime(2000, 1, 1),
+ end=dt.datetime(2021, 12, 31)
+)
+
+# Add year column for grouping
+indices_data = indices_data.with_columns(
+ pl.col('Date').dt.year().alias('year')
+)
+
+print("Data shape:", indices_data.shape)
+print("\nFirst few rows:")
+print(indices_data.head())
+print("\nData availability check:")
+for index in indices_list.keys():
+ non_null_count = (indices_data
+ .select(pl.col(index).is_not_null().sum())
+ .item())
+ print(f"{indices_list[index]}: {non_null_count} non-null values")
+```
+
+Calculate yearly returns using Polars groupby operations:
+
+```{code-cell} ipython3
+# Calculate first and last valid price for each year and each index
+yearly_returns = indices_data.group_by('year').agg([
+ *[pl.col(index)
+ .filter(pl.col(index).is_not_null())
+ .first()
+ .alias(f"{index}_first") for index in indices_list.keys()],
+ *[pl.col(index)
+ .filter(pl.col(index).is_not_null())
+ .last()
+ .alias(f"{index}_last") for index in indices_list.keys()]
+])
+
+# Calculate percentage returns for each index, handling null values properly
+return_columns = []
+for index in indices_list.keys():
+ company_name = indices_list[index]
+ return_col = (
+ (pl.col(f"{index}_last") - pl.col(f"{index}_first")) /
+ pl.col(f"{index}_first") * 100
+ ).alias(company_name)
+ return_columns.append(return_col)
+
+yearly_returns = yearly_returns.with_columns(return_columns)
+
+# Select only the year and return columns, filter out years with insufficient data
+yearly_returns = yearly_returns.select([
+ 'year',
+ *list(indices_list.values())
+]).filter(
+ pl.col('year') >= 2001 # Ensure we have complete years of data
+).sort('year')
+
+print("Yearly returns shape:", yearly_returns.shape)
+print("\nYearly returns:")
+print(yearly_returns.head(10))
+```
+
+Generate summary statistics using Polars:
+
+```{code-cell} ipython3
+# Summary statistics for all indices
+summary_stats = yearly_returns.select(list(indices_list.values())).describe()
+print("Summary Statistics:")
+print(summary_stats)
+
+# Check for any null values or data issues
+print(f"\nData shape: {yearly_returns.shape}")
+print(f"Null counts:")
+print(yearly_returns.null_count())
+print(f"\nData range (first few years):")
+print(yearly_returns.head())
+```
+
+Plot the time series:
+
+```{code-cell} ipython3
+# Convert to pandas for plotting
+df_pandas = yearly_returns.to_pandas().set_index('year')
+
+fig, axes = plt.subplots(2, 2, figsize=(12, 10))
+
+# Flatten 2-D array to 1-D array
+for iter_, ax in enumerate(axes.flatten()):
+ if iter_ < len(indices_list):
+
+ # Get index name per iteration
+ index_name = list(indices_list.values())[iter_]
+
+ # Plot with markers and lines for better visibility
+ ax.plot(df_pandas.index, df_pandas[index_name], 'o-',
+ linewidth=2, markersize=4)
+ ax.set_ylabel("yearly return", fontsize=12)
+ ax.set_xlabel("year", fontsize=12)
+ ax.set_title(index_name, fontsize=12)
+ ax.grid(True, alpha=0.3)
+
+ # Add horizontal line at zero for reference
+ ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
+
+plt.tight_layout()
+plt.show()
+```
+
+Alternative: Create a single plot with all indices:
+
+```{code-cell} ipython3
+# Single plot with all indices
+fig, ax = plt.subplots(figsize=(12, 8))
+
+for index_name in indices_list.values():
+ # Only plot if the column has valid data
+ if (index_name in df_pandas.columns and
+ not df_pandas[index_name].isna().all()):
+ ax.plot(df_pandas.index, df_pandas[index_name],
+ label=index_name, linewidth=2, marker='o', markersize=3)
+
+ax.set_xlabel("year", fontsize=12)
+ax.set_ylabel("yearly return (%)", fontsize=12)
+ax.set_title("Yearly Returns of Major Stock Indices (2001-2021)", fontsize=14)
+ax.grid(True, alpha=0.3)
+ax.axhline(y=0, color='k', linestyle='--', alpha=0.5, label='zero line')
+ax.legend()
+plt.tight_layout()
+plt.show()
+```
+
+```{solution-end}
+```
+
+[^mung]: Wikipedia defines munging as cleaning data from one raw form into a structured, purged one.