diff --git a/lectures/_static/lecture_specific/about_py/pytorch_vs_matlab.png b/lectures/_static/lecture_specific/about_py/pytorch_vs_matlab.png index 7230ad09..733e6d60 100644 Binary files a/lectures/_static/lecture_specific/about_py/pytorch_vs_matlab.png and b/lectures/_static/lecture_specific/about_py/pytorch_vs_matlab.png differ diff --git a/lectures/about_py.md b/lectures/about_py.md index a4ae3370..61c450b0 100644 --- a/lectures/about_py.md +++ b/lectures/about_py.md @@ -34,52 +34,58 @@ into R." -- Chris Wiggins This lecture series will teach you to use Python for scientific computing, with a focus on economics and finance. -The series is aimed at Python novices, although experienced users will also find useful content in later lectures. +The series is aimed at Python novices, although experienced users will also find +useful content in later lectures. In this lecture we will * introduce Python, * showcase some of its abilities, -* discuss the connection between Python and AI, * explain why Python is our favorite language for scientific computing, and * point you to the next steps. You do **not** need to understand everything you see in this lecture -- we will work through the details slowly later in the lecture series. -### Can't I Just Use ChatGPT? +### Can't I Just Use LLMs? No! -It's tempting to think that in the age of AI we don't need to learn how to code. +Of course it's tempting to think that in the age of AI we don't need to learn how to code. -And it's true that AIs like [ChatGPT](https://chatgpt.com/) and other LLMs are wonderful productivity tools for coders. +And yes, we like to be lazy too sometimes. -In fact an AI can be a great companion for these lectures -- try copy-pasting some code from this series and ask the AI to explain it to you. +In addition, we agree that AIs are outstanding productivity tools for coders. -AIs will certainly help you write pieces of code that you can combine. 
+But AIs cannot reliably solve new problems that they haven't seen before. -But AIs cannot completely and reliably solve a new problem that they haven't seen before! +You will need to be the architect and the supervisor -- and for these tasks you need to +be able to read, write, and understand computer code. -You will need to be the supervisor -- and for that you need to be able to read, write, and understand computer code. +Having said that, a good LLM is a useful companion for these lectures -- try copy-pasting some +code from this series and asking for an explanation. ### Isn't MATLAB Better? No, no, and one hundred times no. -For almost all modern problems, Python's scientific libraries are now far in advance of MATLAB's capabilities. +Nirvana was great (and Soundgarden [was better](https://www.youtube.com/watch?v=3mbBbFH9fAg&list=RD3mbBbFH9fAg)) but +it's time to move on from the '90s. -We will explain the benefits of Python's libraries throughout this lecture -series, as well as in our later series on [JAX](https://jax.quantecon.org/intro.html). +For most modern problems, Python's scientific libraries are now far in advance of MATLAB's capabilities. + +This is particularly the case in fast-growing fields such as deep learning and reinforcement learning. -We will also explain how Python's elegant design helps you write clean, efficient code. +Moreover, all major LLMs are more proficient at writing Python code than MATLAB +code. -On top of these features, Python is more widely used, with a huge and helpful community, and free! +We will discuss the relative merits of Python's libraries throughout this lecture +series, as well as in our later series on [JAX](https://jax.quantecon.org/intro.html). -## What's Python? +## Introducing Python [Python](https://www.python.org) is a general-purpose programming language conceived in 1989 by [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum).
@@ -92,13 +98,13 @@ This is important because it * encourages reproducibility and [open science](https://en.wikipedia.org/wiki/Open_science). - ### Common Uses -{index}`Python ` is a general-purpose language used in almost all application domains, including +{index}`Python ` is a general-purpose language used +in almost all application domains, including -* AI -* scientific computing +* AI and computer science +* other scientific computing * communication * web development * CGI and graphical user interfaces @@ -107,59 +113,47 @@ This is important because it * multimedia * etc. -It is used and supported extensively by tech firms including +It is used and supported extensively by large tech firms including * [Google](https://www.google.com/) * [OpenAI](https://openai.com/) * [Netflix](https://www.netflix.com/) * [Meta](https://opensource.fb.com/) -* [Dropbox](https://www.dropbox.com/) * [Amazon](https://www.amazon.com/) * [Reddit](https://www.reddit.com/) * etc. - - ### Relative Popularity -Python is, without doubt, one of the [most popular programming languages](https://www.tiobe.com/tiobe-index/). +Python is one of the most -- if not the most -- [popular programming languages](https://www.tiobe.com/tiobe-index/). Python libraries like [pandas](https://pandas.pydata.org/) and [Polars](https://pola.rs/) are replacing familiar tools like Excel and VBA as an essential skill in the fields of finance and banking. -Moreover, Python is extremely popular within the scientific community -- especially AI - -The following chart, produced using Stack Overflow Trends, provides some evidence. +Moreover, Python is extremely popular within the scientific community -- especially in fields connected to AI. -It shows the popularity of a Python AI library called [PyTorch](https://pytorch.org/) relative to MATLAB.
+For example, the following chart from Stack Overflow Trends shows how the +popularity of a single Python deep learning library +([PyTorch](https://pytorch.org/)) has grown over the last few years. ```{figure} /_static/lecture_specific/about_py/pytorch_vs_matlab.png ``` +PyTorch is just one of several Python libraries for deep learning and AI. -The chart shows that MATLAB's popularity has faded, while PyTorch is growing rapidly. - -Moreover, PyTorch is just one of the thousands of Python libraries available for scientic computing. ### Features -Python is a [high-level language](https://en.wikipedia.org/wiki/High-level_programming_language), which means it is relatively easy to read, write and debug. +Python is a [high-level +language](https://en.wikipedia.org/wiki/High-level_programming_language), which +means it is relatively easy to read, write and debug. It has a relatively small core language that is easy to learn. -This core is supported by many libraries, which you can learn to use as required. - -Python is very beginner-friendly - -* suitable for students learning programming -* used in many undergraduate and graduate programs - -Other features of Python: - -* multiple programming styles are supported (procedural, object-oriented, functional, etc.) -* [interpreted](https://en.wikipedia.org/wiki/Interpreter_(computing)) rather than [compiled](https://en.wikipedia.org/wiki/Compiler) ahead of time. +This core is supported by many libraries, which can be studied as required. +Python is flexible and pragmatic, supporting multiple programming styles (procedural, object-oriented, functional, etc.). ### Syntax and Design @@ -167,7 +161,7 @@ Other features of Python: ```{index} single: Python; syntax and design ``` -One reason for Python's popularity is its simple and elegant design --- we'll see many examples later on. +One reason for Python's popularity is its simple and elegant design. To get a feeling for this, let's look at an example.
@@ -231,12 +225,9 @@ public class CSVReader { This Java code opens an imaginary file called `data.csv` and computes the mean of the values in the second column. -Even without knowing Java, you can see that the program is long and complex. - Here's Python code that does the same thing. -Even if you don't yet know Python, you can see that the code is simpler and -easier to read. +Even if you don't yet know Python, you can see that the code is far simpler and easier to read. ```{code-cell} python3 :tags: [skip-execution] @@ -256,20 +247,14 @@ print(f"Average: {total / count if count else 'No valid data'}") ``` -The simplicity of Python and its neat design are a big factor in its popularity. ### The AI Connection -Unless you have been living under a rock and avoiding all contact with the -modern world, you will know that AI is rapidly advancing. - -AI is already remarkably good at helping you write code, as discussed above. - -No doubt AI will take over many tasks currently performed by humans, -just like other forms of machinery have done over the past few centuries. +AI is in the process of taking over many tasks currently performed by humans, +just as other forms of machinery have done over the past few centuries. -Python is playing a huge role in the advance of AI and machine learning. +Moreover, Python is playing a huge role in the advance of AI and machine learning. This means that tech firms are pouring money into development of extremely powerful Python libraries. @@ -288,9 +273,7 @@ These lectures will explain how. We have already discussed the importance of Python for AI, machine learning and data science -Let's take a look at the role of Python in other areas of scientific computing. 
- -Python is either the dominant player or a major player in +Python is also one of the dominant players in * astronomy * chemistry @@ -305,7 +288,6 @@ operations research -- which were previously dominated by MATLAB / Excel / STATA This section briefly showcases some examples of Python for general scientific programming. - ### NumPy ```{index} single: scientific programming; numeric @@ -379,6 +361,8 @@ However, you should still learn NumPy first because * libraries like JAX directly extend NumPy functionality and hence are easier to learn when you already know NumPy. +This lecture series will provide you with extensive background in NumPy. + ### SciPy The [SciPy](http://www.scipy.org) library is built on top of NumPy and provides additional functionality. @@ -453,7 +437,7 @@ You can visit the [Python Graph Gallery](https://www.python-graph-gallery.com/) ### Networks and Graphs -The study of networks and graphs becoming an important part of scientific work +The study of [networks](https://networks.quantecon.org/) is becoming an important part of scientific work in economics, finance and other fields. For example, we are interesting in studying @@ -463,8 +447,6 @@ For example, we are interesting in studying * friendship and social networks * etc. -(We have a [book on economic networks](https://networks.quantecon.org/) if you would like to learn more.) - Python has many libraries for studying networks and graphs. ```{index} single: NetworkX @@ -530,7 +512,7 @@ mentioned above. 
* [Dask](https://docs.dask.org/en/stable/) for parallelization * [Numba](http://numba.pydata.org/) for making Python run at the same speed as native machine code * [CVXPY](https://www.cvxpy.org/) for convex optimization -* [scikit-image](https://scikit-image.org/) and [OpenCV](https://opencv.org/) for processing and analysing image data +* [scikit-image](https://scikit-image.org/) and [OpenCV](https://opencv.org/) for processing and analyzing image data * [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) for extracting data from HTML and XML files diff --git a/lectures/intro.md b/lectures/intro.md index a89b608c..440a2a1b 100644 --- a/lectures/intro.md +++ b/lectures/intro.md @@ -11,11 +11,10 @@ kernelspec: # Python Programming for Economics and Finance -This website presents a set of lectures on Python programming for economics and finance. +These lectures are the first in [the set of lecture series](https://quantecon.org/lectures/) provided by QuantEcon. -This is the first text in the series, which focuses on programming in Python. - -For an overview of the series, see [this page](https://quantecon.org/lectures/) +They focus on learning to program in Python, with a view to applications in +economics and finance. ```{tableofcontents} ``` diff --git a/lectures/need_for_speed.md b/lectures/need_for_speed.md index b61a56cd..dc9d7274 100644 --- a/lectures/need_for_speed.md +++ b/lectures/need_for_speed.md @@ -27,15 +27,15 @@ premature optimization is the root of all evil." 
-- Donald Knuth ## Overview -Python is extremely popular for scientific computing, due to such factors as +Python is popular for scientific computing due to factors such as * the accessible and expressive nature of the language itself, -* its vast range of high quality scientific libraries, +* the huge range of high quality scientific libraries, * the fact that the language and libraries are open source, * the popular [Anaconda Python distribution](https://www.anaconda.com/download), which simplifies installation and management of scientific libraries, and * the key role that Python plays in data science, machine learning and artificial intelligence. -In previous lectures, we looked at some scientific Python libaries such as NumPy and Matplotlib. +In previous lectures, we looked at some scientific Python libraries such as NumPy and Matplotlib. However, our main focus was the core Python language, rather than the libraries. @@ -70,7 +70,9 @@ One reason we use scientific libraries is because they implement routines we wan For example, it's almost always better to use an existing routine for root finding than to write a new one from scratch. -(For standard algorithms, efficiency is maximized if the community can coordinate on a common set of implementations, written by experts and tuned by users to be as fast and robust as possible.) +(For standard algorithms, efficiency is maximized if the community can +coordinate on a common set of implementations, written by experts and tuned by +users to be as fast and robust as possible.) But this is not the only reason that we use Python's scientific libraries. 
@@ -93,9 +95,9 @@ At QuantEcon, the scientific libraries we use most often are * [NumPy](https://numpy.org/) * [SciPy](https://scipy.org/) * [Matplotlib](https://matplotlib.org/) +* [JAX](https://github.com/jax-ml/jax) * [Pandas](https://pandas.pydata.org/) * [Numba](https://numba.pydata.org/) and -* [JAX](https://github.com/jax-ml/jax) Here's how they fit together: @@ -104,14 +106,11 @@ Here's how they fit together: multiplication). * SciPy builds on NumPy by adding numerical methods routinely used in science (interpolation, optimization, root finding, etc.). * Matplotlib is used to generate figures, with a focus on plotting data stored in NumPy arrays. -* Pandas provides types and functions for manipulating data. -* Numba provides a just-in-time compiler that integrates well with NumPy and - helps accelerate Python code. * JAX includes array processing operations similar to NumPy, automatic differentiation, a parallelization-centric just-in-time compiler, and automated integration with hardware accelerators such as GPUs. - - +* Pandas provides types and functions for manipulating data. +* Numba provides a just-in-time compiler that plays well with NumPy and helps accelerate Python code. ## The Need for Speed @@ -133,11 +132,9 @@ Indeed, the standard implementation of Python (called CPython) cannot match the Does that mean that we should just switch to C or Fortran for everything? -The answer is: No, no, and one hundred times no! - -(This is what you should say to your professor when they insist that your model needs to be rewritten in Fortran or C++.) +The answer is: No! -There are two reasons why: +There are three reasons why: First, for any given program, relatively few lines are ever going to be time-critical. @@ -145,13 +142,17 @@ Hence it is far more efficient to write most of our code in a high productivity Second, even for those lines of code that *are* time-critical, we can now achieve the same speed as C or Fortran using Python's scientific libraries. 
-In fact we can often do better, because some scientific libraries are so -effective at accelerating and parallelizing our code. +Third, in the last few years, accelerating code has become essentially +synonymous with parallelizing execution, and this task is best left to +specialized compilers. + +Certain Python libraries have outstanding capabilities for parallelizing +scientific code -- we'll discuss this more as we go along. ### Where are the Bottlenecks? -Before we learn how to do this, let's try to understand why plain vanilla Python is slower than C or Fortran. +Before we do so, let's try to understand why plain vanilla Python is slower than C or Fortran. This will, in turn, help us figure out how to speed things up. @@ -275,17 +276,22 @@ Let's look at some ways around these problems. ```{index} single: Python; Vectorization ``` -There is a clever method called **vectorization** that can be -used to speed up high level languages in numerical applications. +One method for avoiding memory traffic and type checking is [array programming](https://en.wikipedia.org/wiki/Array_programming). + +Economists usually refer to array programming as "vectorization." + +(In computer science, this term has [a slightly different meaning](https://en.wikipedia.org/wiki/Automatic_vectorization).) The key idea is to send array processing operations in batch to pre-compiled and efficient native machine code. The machine code itself is typically compiled from carefully optimized C or Fortran. -For example, when working in a high level language, the operation of inverting a large matrix can be subcontracted to efficient machine code that is pre-compiled for this purpose and supplied to users as part of a package. +For example, when working in a high level language, the operation of inverting a +large matrix can be subcontracted to efficient machine code that is pre-compiled +for this purpose and supplied to users as part of a package.
-This clever idea dates back to MATLAB, which uses vectorization extensively. +This idea dates back to MATLAB, which uses vectorization extensively. ```{figure} /_static/lecture_specific/need_for_speed/matlab.png @@ -297,30 +303,20 @@ in later lectures. (numba-p_c_vectorization)= ## Beyond Vectorization -At its best, vectorization yields fast, simple code. +At best, vectorization yields fast, simple code. However, it's not without disadvantages. One issue is that it can be highly memory-intensive. -For example, the vectorized maximization routine above is far more memory -intensive than the non-vectorized version that preceded it. - This is because vectorization tends to create many intermediate arrays before producing the final calculation. Another issue is that not all algorithms can be vectorized. -In these kinds of settings, we need to go back to loops. - -Fortunately, there are alternative ways to speed up Python loops that work in -almost any setting. - -For example, [Numba](http://numba.pydata.org/) solves the main problems with -vectorization listed above. - -It does so through something called **just in time (JIT) compilation**, -which can generate extremely fast and efficient code. +Because of these issues, most high performance computing is moving away from +traditional vectorization and towards the use of [just-in-time compilers](https://en.wikipedia.org/wiki/Just-in-time_compilation). -{doc}`Later ` we'll learn how to use Numba to accelerate Python code. +In later lectures in this series, we will learn about how modern Python libraries exploit +just-in-time compilers to generate fast, efficient, parallelized machine code.