Locally, the memory is composed of a *stack* and a *heap*. The stack requires a
static allocation: it is ordered. Because it's ordered, it is very clear where
things are in the stack, and therefore accesses are very quick (think
- instantanious). However, because this is static, it requires that the size
+ instantaneous). However, because this is static, it requires that the size
of the variables is known at compile time (to determine all of the variable
locations). Since that is not possible with all variables, there exists the
heap. The heap is essentially a stack of pointers to objects in memory. When
@btime inner_noalloc!(C,A,B)
```

- Why does the array here get heap-allocated? It isn't able to prove/guarentee
+ Why does the array here get heap-allocated? It isn't able to prove/guarantee
at compile-time that the array's size will always be a given value, and thus
- it allocates it to the heap. `@btime` tells us this allocation occured and
+ it allocates it to the heap. `@btime` tells us this allocation occurred and
shows us the total heap memory that was taken. Meanwhile, the size of a Float64
number is known at compile-time (64-bits), and so this is stored onto the stack
and given a specific location that the compiler will be able to directly
@@ -277,7 +277,7 @@ without a care about performance.
temporary variables since the individual C kernels are written for specific
numbers of inputs and thus don't naturally fuse. Julia's broadcast mechanism
is just generating and JIT compiling Julia functions on the fly, and thus it
- can accomodate the combinatorial explosion in the amount of choices just by
+ can accommodate the combinatorial explosion in the amount of choices just by
only compiling the combinations that are necessary for a specific code)

### Heap Allocations from Slicing
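The copy-vs-view distinction that this section covers can be sketched quickly (the array and its size here are illustrative):

```julia
using BenchmarkTools

A = rand(100, 100)

@btime sum($A[:, 1])        # the slice A[:, 1] copies: a fresh vector is heap-allocated
@btime sum(@view $A[:, 1])  # a view aliases the column: no copy, no heap allocation
```

`@view` (and the function form `view(A, :, 1)`) returns a `SubArray` that indexes into the parent array's memory instead of copying it, which is why the allocation disappears.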
@@ -310,7 +310,7 @@ but with a relatively small constant).
### Asymptotic Cost of Heap Allocations

Heap allocations have to locate and prepare a space in RAM that is proportional
- to the amount of memory that is calcuated, which means that the cost of a heap
+ to the amount of memory that is calculated, which means that the cost of a heap
allocation for an array is O(n), with a large constant. As RAM begins to fill
up, this cost dramatically increases. If you run out of RAM, your computer
may begin to use *swap*, which is essentially RAM simulated on your hard drive.
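A quick way to see the O(n) scaling (the sizes here are arbitrary) is to time allocations of different lengths against reuse of a preallocated buffer:

```julia
using BenchmarkTools

@btime zeros(10)         # small allocation: cheap, but still a heap round-trip
@btime zeros(100_000)    # the cost grows roughly linearly with the requested size

buf = zeros(100_000)     # pay the O(n) allocation once...
@btime fill!($buf, 0.0)  # ...then reuse the buffer with no further allocation
```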
@@ -443,7 +443,7 @@ since it needs to decode and have a version for all primitive types!
Not only are there runtime overhead checks in function calls due to not being
explicit about types, there is also a memory overhead since it is impossible
to know how much memory a value will take since that's a property of its type.
- Thus the Python interpreter cannot statically guerentee exact unchanging values
+ Thus the Python interpreter cannot statically guarantee exact unchanging values
for the size that a value would take in the stack, meaning that the variables
are not stack-allocated. This means that every number ends up heap-allocated,
which hopefully begins to explain why this is not as fast as C.
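This boxing overhead can be imitated inside Julia itself by erasing the element type, which forces every number into its own heap cell (a hedged sketch; the variable names are illustrative):

```julia
using BenchmarkTools

unboxed = collect(1:1000)        # Vector{Int64}: integers stored inline, contiguously
boxed   = Vector{Any}(unboxed)   # Vector{Any}: each element is a pointer to a heap box

@btime sum($unboxed)  # tight loop over machine integers
@btime sum($boxed)    # pointer-chasing plus dynamic dispatch on every +
```

The two sums compute the same value; only the memory representation differs, which is the same gap that separates C-style numbers from Python-style objects.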
@@ -458,12 +458,12 @@ a + b
```

However, before JIT compilation, Julia runs a type inference algorithm which
finds out that `A` is an `Int`, and `B` is an `Int`. You can then understand
- that if it can prove that `A+B` is an `Int`, then it can propogate all of the
+ that if it can prove that `A+B` is an `Int`, then it can propagate all of the
types through.

### Type Specialization in Functions

- Julia is able to propogate type inference through functions because, even if
+ Julia is able to propagate type inference through functions because, even if
a function is "untyped", Julia will interpret this as a *generic function*
over possible *methods*, where every method has a concrete type. This means that
in Julia, the function:
@@ -564,7 +564,7 @@ and thus the output is unknown:

This means that its output type is `Union{Int,Float64}` (Julia uses union types
to keep the types still somewhat constrained). Once there are multiple choices,
- those need to get propogated through the compiler, and all subsequent calculations
+ those need to get propagated through the compiler, and all subsequent calculations
are the result of either being an `Int` or a `Float64`.

(Note that Julia has small union optimizations, so if this union is of size
@@ -615,7 +615,7 @@ ff(x::Number,y::Number) = x + y
ff(2.0,5)
```

- Notice that the fallback method still specailizes on the inputs:
+ Notice that the fallback method still specializes on the inputs:

```julia
@code_llvm ff(2.0,5)
@@ -643,7 +643,7 @@ Note that `f(x,y) = x+y` is equivalent to `f(x::Any,y::Any) = x+y`, where `Any`
is the maximal supertype of every Julia type. Thus `f(x,y) = x+y` is essentially
a fallback for all possible input values, telling it what to do in the case that
no other dispatches exist. However, note that this dispatch itself is not slow,
- since it will be specailized on the input types.
+ since it will be specialized on the input types.

### Ambiguities

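A minimal illustration of what an ambiguity looks like (the function `g` here is made up for this sketch):

```julia
g(x::Int, y::Number) = 1
g(x::Number, y::Int) = 2

g(2, 3.0)  # only the first method applies: returns 1
g(2.0, 3)  # only the second method applies: returns 2
# g(2, 3) would throw a MethodError: both methods match and neither is more specific

# The standard fix is to add a method that is more specific than both:
g(x::Int, y::Int) = 3
g(2, 3)    # now dispatches unambiguously: returns 3
```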
@@ -1199,10 +1199,10 @@ cheat() = qinline(1.0,2.0)
```

It realized that `1.0` and `2.0` are constants, so it did what's known as
- *constant propogation*, and then used those constants inside of the function.
+ *constant propagation*, and then used those constants inside of the function.
It realized that the solution is always `9`, so it compiled the function that...
spits out `9`! So it's fast because it's not computing anything. So be very
- careful about propogation of constants and literals. In general this is a very
+ careful about propagation of constants and literals. In general this is a very
helpful feature, but when benchmarking this can cause some weird behavior. If
a micro benchmark is taking less than a nanosecond, check and see if the compiler
"fixed" your code!
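One way to keep `@btime` honest (the helper `q` below is a stand-in, not the function above) is to pass the inputs through `Ref`s so the compiler cannot treat them as compile-time constants:

```julia
using BenchmarkTools

q(x, y) = x + y                      # stand-in for a small kernel being benchmarked

@btime q(1.0, 2.0)                   # literals: constant propagation can fold the whole call away
x, y = 1.0, 2.0
@btime q($(Ref(x))[], $(Ref(y))[])   # Ref trick: the values are loaded at run time, not compile time
```

The `$(Ref(x))[]` pattern is the idiom recommended in the BenchmarkTools documentation for benchmarks whose inputs would otherwise be constant-folded.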