
Commit 70a055b

More cleanup.

1 parent 5680356 commit 70a055b

File tree

4 files changed: +34 -34 lines changed

_posts/2019-04-23-hypothesis.md

Lines changed: 20 additions & 20 deletions
@@ -55,7 +55,7 @@ except AssertionError:
 
 What a test *is* is a measurement of program execution under specific
 conditions. Since a test is a measurement, it follows that it is a
-[statistic](https://en.wikipedia.org/wiki/Statistic). So reallly, every
+[statistic](https://en.wikipedia.org/wiki/Statistic). So really, every
 set of tests is a sampling from the population of possible program
 executions.
 
@@ -80,7 +80,7 @@ when it's arguments aren't positive numbers, but there are
 
 One thing we know from statistics is that larger sample sizes are better
 than smaller sample sizes. Can we write our tests to include more
-samples, to help make our sample more represenative of all the possible
+samples, to help make our sample more representative of all the possible
 ways the program could be executed?
 
 We can. It is possible to write a hundred variants of the add testing
@@ -139,19 +139,19 @@ def add(a, b):
 skip = 3
 a_few = 99
 
-# An example of a bunch of tests, using paramaterized tests
+# An example of a bunch of tests, using parameterized tests
 
-def test_paramaterized_sum_positive(a, b):
+def test_parameterized_sum_positive(a, b):
     assert add(a, b) > 0
 
-def paramaterized_test_runner():
-    test_paramaterized_sum_positive(1, 2)
-    test_paramaterized_sum_positive(skip, a_few)
+def parameterized_test_runner():
+    test_parameterized_sum_positive(1, 2)
+    test_parameterized_sum_positive(skip, a_few)
     # ... snip
-    test_paramaterized_sum_positive(99, 100)
+    test_parameterized_sum_positive(99, 100)
 
 try:
-    paramaterized_test_runner()
+    parameterized_test_runner()
     print("The tests passed!")
 except AssertionError:
     print("A test failed.")
@@ -172,8 +172,8 @@ potential sources of error is considerable. By this metric,
 parameterized tests are exceedingly better than unparameterized tests.
 
 -----------|------------|-------|--------------------|---------------------
-Function   |Argument    |\#     | Lines for          | Lines for
-Length     |Length      |Tests  | Paramterized Tests | Unparamterized Tests
+Function   |Argument    |\#     | Lines for           | Lines for
+Length     |Length      |Tests  | Parameterized Tests | Unparameterized Tests
 -----------|------------|-------|--------------------|--------------------
 2 | 1 | 100 | 102 | 200
 10 | 1 | 100 | 110 | 1000
@@ -316,7 +316,7 @@ list of argument with our previous approach of writing out each argument
 individually in terms of how many lines it takes to write the tests.
 
 ---------------------------|---------------------------|---------------
-Test Paramterizations | Lines for hand specification | Lines for programmatic creation
+Test Parameterizations | Lines for hand specification | Lines for programmatic creation
 ---------------------------| --------------------------|--------------
 1 | 1 | 2
 2 | 2 | 2
@@ -334,7 +334,7 @@ where n is the number of test cases.
 
 This is a great improvement, especially if the constant size of each
 argument is high, but even with the add function **generating a million
-test paramterizations programmatically is cheaper in terms of hand
+test parameterizations programmatically is cheaper in terms of hand
 movement than writing three manually**.
 
 With that power in mind, one weakness of creating a list of test cases
@@ -389,7 +389,7 @@ def test_sum_positive(a, b):
 
 
 ------------------------------|-------------------------------|-----------
-Test Paramterizations | Memory used for list creation | Memory used in lazy evaluation
+Test Parameterizations | Memory used for list creation | Memory used in lazy evaluation
 ------------------------------|-------------------------------|-----------
 1 | 1 MB | 1 MB
 2 | 2 MB | 1 MB
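
The table this hunk touches contrasts eager list creation with lazy evaluation. A minimal sketch of that distinction, reusing the post's `test_sum_positive`: a list comprehension materializes every parameterization up front, while a generator expression yields one at a time, which is why the lazy column stays flat.

```python
# Eager: all one million parameterizations live in memory at once.
cases_as_list = [(n, n + 1) for n in range(1_000_000)]

# Lazy: only one parameterization exists at a time.
cases_as_gen = ((n, n + 1) for n in range(1_000_000))

def test_sum_positive(a, b):
    assert a + b > 0

for a, b in cases_as_gen:
    test_sum_positive(a, b)
```
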
@@ -725,14 +725,14 @@ def test_sum_positive(a, b):
 If you've been playing with these examples, you may have noticed that
 hypothesis has found breaking examples quite easily. Not just ones where
 the function returned a negative result, but actual errors. The add
-function isn't overloaded so as to support adding arbitray numeric
+function isn't overloaded so as to support adding arbitrary numeric
 types. If it wasn't an error you expected to see, than you may begin to
 understand just how useful this search for falsifying examples is. It
 doesn't just give you confidence that your code works. It can end up
 teaching you something you hadn't known.
 
 At the same time, you might not have seen these errors. When hypothesis
-runs tests, it generates them stoachastically. It's possible for two
+runs tests, it generates them stochastically. It's possible for two
 different runs of hypothesis to generate different examples.
 
 This is an important thing to keep in mind when using `hypothesis`. It
738738
This is an important thing to keep in mind when using `hypothesis`. It
@@ -831,7 +831,7 @@ tests. It provides tools for limiting both the runtime of tests
831831
according to the clock and for limiting the runtime of tests in terms of
832832
the number of test cases it checks.
833833

834-
On the topic of a contiual test loop, it's also nice to keep a record
834+
On the topic of a continual test loop, it's also nice to keep a record
835835
of what tests fail and than re-run those tests on the next run of the
836836
tester. `hypothesis` does this too.
837837

@@ -851,7 +851,7 @@ def test_this_a_little(x):
 
 
 @given(strategies.integers())
-def test_this_with_default_setttings(x):
+def test_this_with_default_settings(x):
     assert True
 
 @given(strategies.integers())
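
For context on the hunk above: `hypothesis` exposes the test-budget controls the post describes through its `settings` API. A self-contained sketch, using the real `max_examples` parameter to cap how many inputs `@given` generates:

```python
from hypothesis import given, settings, strategies

@settings(max_examples=10)  # run only ten generated cases
@given(strategies.integers())
def test_this_a_little(x):
    assert True

test_this_a_little()  # hypothesis drives the generated cases
```
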
@@ -923,7 +923,7 @@ flow can be fast so as to maximize engineer productivity.
 
 ## [Library integration](#library-integration)
 
-Hypothesis has support for many popular python librarys, including
+Hypothesis has support for many popular Python libraries, including
 django. It can infer strategy creation from a Django model or form. In a
 hypothetical example, let's say you have an email address model which
 is related to a contact model, which is related to an organization
@@ -976,7 +976,7 @@ Consider PowerPoint.
 One humorous example of PowerPoint struggling to capture importance is
 <a href="https://norvig.com/Gettysburg/">Peter Norvig's Gettysburg address</a>.
 But there are much more horrific <a href="https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001yB">examples</a>.
-These structual implications are part of the underyling reason behind why companies like Amazon are
+These structural implications are part of the underlying reason behind why companies like Amazon are
 <a href="https://www.inc.com/carmine-gallo/jeff-bezos-bans-powerpoint-in-meetings-his-replacement-is-brilliant.html">forsaking PowerPoint in favor of the written
 word</a>.
 </p>

_posts/2021-05-17-virtuous-cycles.md

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ Are there things you should be rewarding but aren't? Are there places where feed
 If you are already in a cycle is there a way to go through it more quickly so that you can get more benefit
 from that cycle?
 
-Understanding implies the existence of underwalking which implies the existence of undersprinting. So get
+Understanding implies the existence of under-walking which implies the existence of under-sprinting. So get
 going. Knowing without applying means little.
 
 
_posts/2021-2-21-numbers-programmers-should-know.md

Lines changed: 9 additions & 9 deletions
@@ -10,23 +10,23 @@ Many well known great software engineers believe that it takes more than a
 solid understanding of the [run time complexity](https://en.wikipedia.org/wiki/Big_O_notation)
 of the algorithms being used to write performant software. In particular they claim that
 the latency numbers associated with computer hardware are essential knowledge. Despite the claimed
-importance there is little factful information about this topic readily available online.
+importance there is little factual information about this topic readily available online.
 
 The general assumption has been that the first viral spread of published numbers, originally
 posted in 2010, have changed, but not by a magnitude which would change our intuition.
 The current updates to the number which are popularly
 shared online are not the actual numbers, but numbers produced by a model. The degree to which
-they match reality has not been tested and primae facie investigation of the models claims
+they match reality has not been tested and prima facie investigation of the model's claims
 makes me skeptical that the model is accurate especially as of 2021.
 
-This post is intended to [increase factfullness][destiny instinct] in software engineering by refreshing our knowledge of these now outdated numbers.
+This post is intended to [increase factfulness][destiny instinct] in software engineering by refreshing our knowledge of these now outdated numbers.
 
 It is also not just merely an update to these numbers, but an update with context. In prior
 times when these numbers have been shared they have been shared without much context being
 given to what the numbers mean in practice. What I hope to show is not just what the numbers
 are, but the changes in decision making that these numbers are supposed to motivate.
 
-Althought the numers have changed, there importance has only increased with time. In the
+Although the numbers have changed, their importance has only increased with time. In the
 past these numbers were important and we were also seeing our software become faster and
 faster on account of advances in hardware. In recent years the free lunch of hardware speed
 increases has been slowing.
@@ -98,7 +98,7 @@ Send packet CA->Netherlands->CA | 150,000,000 ns | 150,000 us | 150 ms
 
 ## Why The Numbers Matter
 
-First lets start with the L1 cache and put it in context
+First let's start with the L1 cache and put it in context
 with main memory reference. If data is already in the cache it can be fetched in 0.5 ns, but if it
 isn't in either the L1 cache or the L2 cache then the CPU is going to be busy fetching data for
 100-200 times longer than it would have if the data was in the cache.
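
As a back-of-the-envelope check on the claim in this hunk, using the post's own figures (a 0.5 ns L1 hit versus a roughly 100 ns main-memory reference):

```python
l1_hit_ns = 0.5
main_memory_ns = 100.0

# A fetch that misses all the way to main memory costs ~200x an L1 hit.
print(f"{main_memory_ns / l1_hit_ns:.0f}x")  # -> 200x
```
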
@@ -112,7 +112,7 @@ hit and we can start to see some of why it is that these numbers are so importan
 write your low level code so that the computer has a good idea of what data it needs and what paths
 it might be taking it can [speed up execution by a considerable constant factor][so: speedy].
 
-Next lets break down compression and relate it to reading data. It takes 3000 ns to compress
+Next let's break down compression and relate it to reading data. It takes 3000 ns to compress
 1K bytes, but it takes 10,000 ns to send that many bytes over the network. We can assume that
 compressing 4k bytes is going to take 12000 ns, but reading 4k bytes from an SSD will take
 150,000 ns. That is an ideal for reading from storage devices and via the network. Things
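
Worked out with the figures quoted in this hunk (3,000 ns to compress 1K bytes, 10,000 ns to send 1K bytes over the network) and an assumed 2x compression ratio, the compress-then-send path comes out ahead:

```python
COMPRESS_1K_NS = 3_000
SEND_1K_NS = 10_000

def send_raw(kbytes):
    return kbytes * SEND_1K_NS

def compress_then_send(kbytes, ratio=2.0):  # the 2x ratio is an assumption
    return kbytes * COMPRESS_1K_NS + (kbytes / ratio) * SEND_1K_NS

print(send_raw(4))            # 40,000 ns uncompressed
print(compress_then_send(4))  # 12,000 + 20,000 = 32,000 ns
```
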
@@ -128,15 +128,15 @@ your bag before you hop on a flight is a bit faster than taking a plane trip per
 
 The exact amount of when it becomes worthwhile to do this is a bit hard to say without knowing
 your data. Some data formats are already compressed so you don't get as much utility from
-compresing them. For Snappy in particular as a compression mechanism the Google repo for the project
+compressing them. For Snappy in particular as a compression mechanism the Google repo for the project
 says it performs as follows:
 
 <div class="p">
 <blockquote>
 Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and other already-compressed data. Similar numbers for zlib in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are capable of achieving yet higher compression rates, although usually at the expense of speed. Of course, compression ratio will vary significantly with the input.
 </blockquote>
 <footer>
-<a href="https://github.com/google/snappy/blob/ea368c2f07de5f31146a10214f27d15091b09771/README.md#performance">Google Snappy Github README</a>
+<a href="https://github.com/google/snappy/blob/ea368c2f07de5f31146a10214f27d15091b09771/README.md#performance">Google Snappy GitHub README</a>
 </footer>
 </div>
 

@@ -155,7 +155,7 @@ as shown in the stats, is one of the most costly operations.
 </blockquote>
 <footer>
 <a target="_blank" href="https://www.amazon.com/gp/product/1449373321/ref=as_li_tl?ie=UTF8&amp;tag=joshuacoles-20&amp;camp=1789&amp;creative=9325&amp;linkCode=as2&amp;creativeASIN=1449373321&amp;linkId=94ba2266d30810326c298c93c92b9296">
-Designing Data Intensive Applications
+Designing Data-Intensive Applications
 </a>
 </footer>
 </div>

_posts/2022-07-19-notebooks.md

Lines changed: 4 additions & 4 deletions
@@ -6,10 +6,10 @@ author: joshuacole
 ---
 
 This essay is about some philosophical motivations behind a tool I'm building to help intelligences which think via language.
-The idea it to help think lazily, efficiently, and in accordance with principles which help to produce thoughtful output.
+The idea is to help think lazily, efficiently, and in accordance with principles which help to produce thoughtful output.
 So to start I'm going to explain the basic idea behind the tool.
 
-The basic idea is templatize known to be good structures: things like Polya's guide to reasoning about how to solve a problem, the Eisenhower matrix for thinking about prioritization, or the laws of probabilities. Then using those templates we transform away from linear documents to a graph representation whose continuations are in accordance with the templates. In some cases, like with the laws of probability, the templatized continuations are an automatic regurgitation of known to be structurally correct reasoning. In other cases as when solving a novel problem a great deal of creativity will need to be exercised by the document writer. Regardless the writer is expected to fill in the template if they care about reasoning correctly - or leave a path empty if they are too lazy to complete the work of thinking. After writing they can then walk a path through their graph to materialize a thoughtful and well structured document which explores the topic which is of interest to them in a manner optimized according to their intended use of the document.
+The basic idea is to templatize known-good structures: things like Polya's guide to reasoning about how to solve a problem, the Eisenhower matrix for thinking about prioritization, or the laws of probability. Then, using those templates, we transform away from linear documents to a graph representation whose continuations are in accordance with the templates. In some cases, like with the laws of probability, the templatized continuations are an automatic regurgitation of known-to-be-structurally-correct reasoning. In other cases, as when solving a novel problem, a great deal of creativity will need to be exercised by the document writer. Regardless, the writer is expected to fill in the template if they care about reasoning correctly - or leave a path empty if they are too lazy to complete the work of thinking. After writing they can then walk a path through their graph to materialize a thoughtful and well-structured document which explores the topic which is of interest to them in a manner optimized according to their intended use of the document.
 
 Breaking down the above there are three important things to call attention to:

@@ -27,9 +27,9 @@ Letters compose to form morphemes which compose to form words which compose to f
 
 Tragically - apparently not.
 
-We run into two problems. The first is cultural: attribution. People who craft a nice paragraph like to be recognized for it and feel cheated if you reuse it. Putting aside their theft of our shared language for their own selfish gain we still have another problem. Copying paragraph after paragraph in writing isn't especially tractable. Words are juts a few characters. Paragraphs are many. Writing a document which copies many paragraph would mean doing a lot of writing - a lot of work.
+We run into two problems. The first is cultural: attribution. People who craft a nice paragraph like to be recognized for it and feel cheated if you reuse it. Putting aside their theft of our shared language for their own selfish gain we still have another problem. Copying paragraph after paragraph in writing isn't especially tractable. Words are just a few characters. Paragraphs are many. Writing a document which copies many paragraphs would mean doing a lot of writing - a lot of work.
 
-We can solve both problems by using a graph structure. The first problem is solved because we can attribute the author of the paragraph as metadata on the graph path that describes a document. Plagarism is not citation. Documents are path descriptions through a graph - not claims to the writing along the content of the path description. The second problem is solved for much the same reason. Since we are giving a path description through the graph we don't have to worry about the character cost of typing out each paragraph. We just have to write out the edge.
+We can solve both problems by using a graph structure. The first problem is solved because we can attribute the author of the paragraph as metadata on the graph path that describes a document. Plagiarism is not citation. Documents are path descriptions through a graph - not claims to the writing along the content of the path description. The second problem is solved for much the same reason. Since we are giving a path description through the graph we don't have to worry about the character cost of typing out each paragraph. We just have to write out the edge.
 
 ## Templatized Graph Structure
 
