
Commit 70a055b

More cleanup.

1 parent 5680356 commit 70a055b

File tree

4 files changed: +34 -34 lines changed

_posts/2019-04-23-hypothesis.md

Lines changed: 20 additions & 20 deletions
@@ -55,7 +55,7 @@ except AssertionError:
 
 What a test *is* is a measurement of program execution under specific
 conditions. Since a test is a measurement, it follows that it is a
-[statistic](https://en.wikipedia.org/wiki/Statistic). So reallly, every
+[statistic](https://en.wikipedia.org/wiki/Statistic). So really, every
 set of tests is a sampling from the population of possible program
 executions.
 
@@ -80,7 +80,7 @@ when it's arguments aren't positive numbers, but there are
 
 One thing we know from statistics is that larger sample sizes are better
 than smaller sample sizes. Can we write our tests to include more
-samples, to help make our sample more represenative of all the possible
+samples, to help make our sample more representative of all the possible
 ways the program could be executed?
 
 We can. It is possible to write a hundred variants of the add testing
@@ -139,19 +139,19 @@ def add(a, b):
 skip = 3
 a_few = 99
 
-# An example of a bunch of tests, using paramaterized tests
+# An example of a bunch of tests, using parameterized tests
 
-def test_paramaterized_sum_positive(a, b):
+def test_parameterized_sum_positive(a, b):
     assert add(a, b) > 0
 
-def paramaterized_test_runner():
-    test_paramaterized_sum_positive(1, 2)
-    test_paramaterized_sum_positive(skip, a_few)
+def parameterized_test_runner():
+    test_parameterized_sum_positive(1, 2)
+    test_parameterized_sum_positive(skip, a_few)
     # ... snip
-    test_paramaterized_sum_positive(99, 100)
+    test_parameterized_sum_positive(99, 100)
 
 try:
-    paramaterized_test_runner()
+    parameterized_test_runner()
     print("The tests passed!")
 except AssertionError:
     print("A test failed.")
@@ -172,8 +172,8 @@ potential sources of error is considerable. By this metric,
 parameterized tests are exceedingly better than unparameterized tests.
 
 -----------|------------|-------|--------------------|---------------------
-Function   |Argument    |\#     | Lines for          | Lines for
-Length     |Length      |Tests  | Paramterized Tests | Unparamterized Tests
+Function   |Argument    |\#     | Lines for           | Lines for
+Length     |Length      |Tests  | Parameterized Tests | Unparameterized Tests
 -----------|------------|-------|--------------------|--------------------
 2 | 1 | 100 | 102 | 200
 10 | 1 | 100 | 110 | 1000
@@ -316,7 +316,7 @@ list of argument with our previous approach of writing out each argument
 individually in terms of how many lines it takes to write the tests.
 
 ---------------------------|---------------------------|---------------
-Test Paramterizations | Lines for hand specification | Lines for programmatic creation
+Test Parameterizations | Lines for hand specification | Lines for programmatic creation
 ---------------------------| --------------------------|--------------
 1 | 1 | 2
 2 | 2 | 2
@@ -334,7 +334,7 @@ where n is the number of test cases.
 
 This is a great improvement, especially if the constant size of each
 argument is high, but even with the add function **generating a million
-test paramterizations programmatically is cheaper in terms of hand
+test parameterizations programmatically is cheaper in terms of hand
 movement than writing three manually**.
 
 With that power in mind, one weakness of creating a list of test cases
@@ -389,7 +389,7 @@ def test_sum_positive(a, b):
 
 
 ------------------------------|-------------------------------|-----------
-Test Paramterizations | Memory used for list creation | Memory used in lazy evaluation
+Test Parameterizations | Memory used for list creation | Memory used in lazy evaluation
 ------------------------------|-------------------------------|-----------
 1 | 1 MB | 1 MB
 2 | 2 MB | 1 MB
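
The table this hunk touches contrasts eager list creation with lazy evaluation. A minimal sketch of that distinction, reusing the post's `test_sum_positive`: a list comprehension materializes every parameterization up front, while a generator expression yields one at a time, which is why the lazy column stays flat.

```python
# Eager: all one million parameterizations live in memory at once.
cases_as_list = [(n, n + 1) for n in range(1_000_000)]

# Lazy: only one parameterization exists at a time.
cases_as_gen = ((n, n + 1) for n in range(1_000_000))

def test_sum_positive(a, b):
    assert a + b > 0

for a, b in cases_as_gen:
    test_sum_positive(a, b)
```
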
@@ -725,14 +725,14 @@ def test_sum_positive(a, b):
 If you've been playing with these examples, you may have noticed that
 hypothesis has found breaking examples quite easily. Not just ones where
 the function returned a negative result, but actual errors. The add
-function isn't overloaded so as to support adding arbitray numeric
+function isn't overloaded so as to support adding arbitrary numeric
 types. If it wasn't an error you expected to see, than you may begin to
 understand just how useful this search for falsifying examples is. It
 doesn't just give you confidence that your code works. It can end up
 teaching you something you hadn't known.
 
 At the same time, you might not have seen these errors. When hypothesis
-runs tests, it generates them stoachastically. It's possible for two
+runs tests, it generates them stochastically. It's possible for two
 different runs of hypothesis to generate different examples.
 
 This is an important thing to keep in mind when using `hypothesis`. It
738738
This is an important thing to keep in mind when using `hypothesis`. It
@@ -831,7 +831,7 @@ tests. It provides tools for limiting both the runtime of tests
831831
according to the clock and for limiting the runtime of tests in terms of
832832
the number of test cases it checks.
833833

834-
On the topic of a contiual test loop, it's also nice to keep a record
834+
On the topic of a continual test loop, it's also nice to keep a record
835835
of what tests fail and than re-run those tests on the next run of the
836836
tester. `hypothesis` does this too.
837837

@@ -851,7 +851,7 @@ def test_this_a_little(x):
 
 
 @given(strategies.integers())
-def test_this_with_default_setttings(x):
+def test_this_with_default_settings(x):
     assert True
 
 @given(strategies.integers())
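
For context on the hunk above: `hypothesis` exposes the test-budget controls the post describes through its `settings` API. A self-contained sketch, using the real `max_examples` parameter to cap how many inputs `@given` generates:

```python
from hypothesis import given, settings, strategies

@settings(max_examples=10)  # run only ten generated cases
@given(strategies.integers())
def test_this_a_little(x):
    assert True

test_this_a_little()  # hypothesis drives the generated cases
```
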
@@ -923,7 +923,7 @@ flow can be fast so as to maximize engineer productivity.
 
 ## [Library integration](#library-integration)
 
-Hypothesis has support for many popular python librarys, including
+Hypothesis has support for many popular Python libraries, including
 django. It can infer strategy creation from a Django model or form. In a
 hypothetical example, let's say you have an email address model which
 is related to a contact model, which is related to an organization
@@ -976,7 +976,7 @@ Consider PowerPoint.
 One humorous example of PowerPoint struggling to capture importance is
 <a href="https://norvig.com/Gettysburg/">Peter Norvig's Gettysburg address</a>.
 But there are much more horrific <a href="https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001yB">examples</a>.
-These structual implications are part of the underyling reason behind why companies like Amazon are
+These structural implications are part of the underlying reason behind why companies like Amazon are
 <a href="https://www.inc.com/carmine-gallo/jeff-bezos-bans-powerpoint-in-meetings-his-replacement-is-brilliant.html">forsaking PowerPoint in favor of the written
 word</a>.
 </p>

_posts/2021-05-17-virtuous-cycles.md

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ Are there things you should be rewarding but aren't? Are there places where feed
 If you are already in a cycle is there a way to go through it more quickly so that you can get more benefit
 from that cycle?
 
-Understanding implies the existence of underwalking which implies the existence of undersprinting. So get
+Understanding implies the existence of under-walking which implies the existence of under-sprinting. So get
 going. Knowing without applying means little.
 
 
_posts/2021-2-21-numbers-programmers-should-know.md

Lines changed: 9 additions & 9 deletions
@@ -10,23 +10,23 @@ Many well known great software engineers believe that it takes more than a
 solid understanding of the [run time complexity](https://en.wikipedia.org/wiki/Big_O_notation)
 of the algorithms being used to write performant software. In particular they claim that
 the latency numbers associated with computer hardware are essential knowledge. Despite the claimed
-importance there is little factful information about this topic readily available online.
+importance there is little factual information about this topic readily available online.
 
 The general assumption has been that the first viral spread of published numbers, originally
 posted in 2010, have changed, but not by a magnitude which would change our intuition.
 The current updates to the number which are popularly
 shared online are not the actual numbers, but numbers produced by a model. The degree to which
-they match reality has not been tested and primae facie investigation of the models claims
+they match reality has not been tested and prima facie investigation of the model's claims
 makes me skeptical that the model is accurate especially as of 2021.
 
-This post is intended to [increase factfullness][destiny instinct] in software engineering by refreshing our knowledge of these now outdated numbers.
+This post is intended to [increase factfulness][destiny instinct] in software engineering by refreshing our knowledge of these now outdated numbers.
 
 It is also not just merely an update to these numbers, but an update with context. In prior
 times when these numbers have been shared they have been shared without much context being
 given to what the numbers mean in practice. What I hope to show is not just what the numbers
 are, but the changes in decision making that these numbers are supposed to motivate.
 
-Althought the numers have changed, there importance has only increased with time. In the
+Although the numbers have changed, their importance has only increased with time. In the
 past these numbers were important and we were also seeing our software become faster and
 faster on account of advances in hardware. In recent years the free lunch of hardware speed
 increases has been slowing.
@@ -98,7 +98,7 @@ Send packet CA->Netherlands->CA | 150,000,000 ns | 150,000 us | 150 ms
 
 ## Why The Numbers Matter
 
-First lets start with the L1 cache and put it in context
+First let's start with the L1 cache and put it in context
 with main memory reference. If data is already in the cache it can be fetched in 0.5 ns, but if it
 isn't in either the L1 cache or the L2 cache then the CPU is going to be busy fetching data for
 100-200 times longer than it would have if the data was in the cache.
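
As a back-of-the-envelope check on the claim in this hunk, using the post's own figures (a 0.5 ns L1 hit versus a roughly 100 ns main-memory reference):

```python
l1_hit_ns = 0.5
main_memory_ns = 100.0

# A fetch that misses all the way to main memory costs ~200x an L1 hit.
print(f"{main_memory_ns / l1_hit_ns:.0f}x")  # -> 200x
```
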
@@ -112,7 +112,7 @@ hit and we can start to see some of why it is that these numbers are so importan
 write your low level code so that the computer has a good idea of what data it needs and what paths
 it might be taking it can [speed up execution by a considerable constant factor][so: speedy].
 
-Next lets break down compression and relate it to reading data. It takes 3000 ns to compress
+Next let's break down compression and relate it to reading data. It takes 3000 ns to compress
 1K bytes, but it takes 10,000 ns to send that many bytes over the network. We can assume that
 compressing 4k bytes is going to take 12000 ns, but reading 4k bytes from an SSD will take
 150,000 ns. That is an ideal for reading from storage devices and via the network. Things
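
Worked out with the figures quoted in this hunk (3,000 ns to compress 1K bytes, 10,000 ns to send 1K bytes over the network) and an assumed 2x compression ratio, the compress-then-send path comes out ahead:

```python
COMPRESS_1K_NS = 3_000
SEND_1K_NS = 10_000

def send_raw(kbytes):
    return kbytes * SEND_1K_NS

def compress_then_send(kbytes, ratio=2.0):  # the 2x ratio is an assumption
    return kbytes * COMPRESS_1K_NS + (kbytes / ratio) * SEND_1K_NS

print(send_raw(4))            # 40,000 ns uncompressed
print(compress_then_send(4))  # 12,000 + 20,000 = 32,000 ns
```
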
@@ -128,15 +128,15 @@ your bag before you hop on a flight is a bit faster than taking a plane trip per
 
 The exact amount of when it becomes worthwhile to do this is a bit hard to say without knowing
 your data. Some data formats are already compressed so you don't get as much utility from
-compresing them. For Snappy in particular as a compression mechanism the Google repo for the project
+compressing them. For Snappy in particular as a compression mechanism the Google repo for the project
 says it performs as follows:
 
 <div class="p">
 <blockquote>
 Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and other already-compressed data. Similar numbers for zlib in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are capable of achieving yet higher compression rates, although usually at the expense of speed. Of course, compression ratio will vary significantly with the input.
 </blockquote>
 <footer>
-<a href="https://github.com/google/snappy/blob/ea368c2f07de5f31146a10214f27d15091b09771/README.md#performance">Google Snappy Github README</a>
+<a href="https://github.com/google/snappy/blob/ea368c2f07de5f31146a10214f27d15091b09771/README.md#performance">Google Snappy GitHub README</a>
 </footer>
 </div>
 

@@ -155,7 +155,7 @@ as shown in the stats, is one of the most costly operations.
 </blockquote>
 <footer>
 <a target="_blank" href="https://www.amazon.com/gp/product/1449373321/ref=as_li_tl?ie=UTF8&amp;tag=joshuacoles-20&amp;camp=1789&amp;creative=9325&amp;linkCode=as2&amp;creativeASIN=1449373321&amp;linkId=94ba2266d30810326c298c93c92b9296">
-Designing Data Intensive Applications
+Designing Data-Intensive Applications
 </a>
 </footer>
 </div>

_posts/2022-07-19-notebooks.md

Lines changed: 4 additions & 4 deletions
@@ -6,10 +6,10 @@ author: joshuacole
 ---
 
 This essay is about some philosophical motivations behind a tool I'm building to help intelligences which think via language.
-The idea it to help think lazily, efficiently, and in accordance with principles which help to produce thoughtful output.
+The idea is to help think lazily, efficiently, and in accordance with principles which help to produce thoughtful output.
 So to start I'm going to explain the basic idea behind the tool.
 
-The basic idea is templatize known to be good structures: things like Polya's guide to reasoning about how to solve a problem, the Eisenhower matrix for thinking about prioritization, or the laws of probabilities. Then using those templates we transform away from linear documents to a graph representation whose continuations are in accordance with the templates. In some cases, like with the laws of probability, the templatized continuations are an automatic regurgitation of known to be structurally correct reasoning. In other cases as when solving a novel problem a great deal of creativity will need to be exercised by the document writer. Regardless the writer is expected to fill in the template if they care about reasoning correctly - or leave a path empty if they are too lazy to complete the work of thinking. After writing they can then walk a path through their graph to materialize a thoughtful and well structured document which explores the topic which is of interest to them in a manner optimized according to their intended use of the document.
+The basic idea is to templatize known-good structures: things like Polya's guide to reasoning about how to solve a problem, the Eisenhower matrix for thinking about prioritization, or the laws of probability. Then, using those templates, we transform away from linear documents to a graph representation whose continuations are in accordance with the templates. In some cases, like with the laws of probability, the templatized continuations are an automatic regurgitation of known-to-be-structurally-correct reasoning. In other cases, as when solving a novel problem, a great deal of creativity will need to be exercised by the document writer. Regardless, the writer is expected to fill in the template if they care about reasoning correctly - or leave a path empty if they are too lazy to complete the work of thinking. After writing they can then walk a path through their graph to materialize a thoughtful and well-structured document which explores the topic which is of interest to them in a manner optimized according to their intended use of the document.
 
 Breaking down the above there are three important things to call attention to:

@@ -27,9 +27,9 @@ Letters compose to form morphemes which compose to form words which compose to f
 
 Tragically - apparently not.
 
-We run into two problems. The first is cultural: attribution. People who craft a nice paragraph like to be recognized for it and feel cheated if you reuse it. Putting aside their theft of our shared language for their own selfish gain we still have another problem. Copying paragraph after paragraph in writing isn't especially tractable. Words are juts a few characters. Paragraphs are many. Writing a document which copies many paragraph would mean doing a lot of writing - a lot of work.
+We run into two problems. The first is cultural: attribution. People who craft a nice paragraph like to be recognized for it and feel cheated if you reuse it. Putting aside their theft of our shared language for their own selfish gain we still have another problem. Copying paragraph after paragraph in writing isn't especially tractable. Words are just a few characters. Paragraphs are many. Writing a document which copies many paragraphs would mean doing a lot of writing - a lot of work.
 
-We can solve both problems by using a graph structure. The first problem is solved because we can attribute the author of the paragraph as metadata on the graph path that describes a document. Plagarism is not citation. Documents are path descriptions through a graph - not claims to the writing along the content of the path description. The second problem is solved for much the same reason. Since we are giving a path description through the graph we don't have to worry about the character cost of typing out each paragraph. We just have to write out the edge.
+We can solve both problems by using a graph structure. The first problem is solved because we can attribute the author of the paragraph as metadata on the graph path that describes a document. Plagiarism is not citation. Documents are path descriptions through a graph - not claims to the writing along the content of the path description. The second problem is solved for much the same reason. Since we are giving a path description through the graph we don't have to worry about the character cost of typing out each paragraph. We just have to write out the edge.
 
 ## Templatized Graph Structure
 
