Dr Keith Reid
Cailleach Computing Ltd
September 2021

Documentation for Haskell Poisson modeller
"Pwaskell"

POISSON DISTRIBUTION

Poisson distributions model how many independent random incidents happen in
each of a series of equal periods of time (or regions of space), or "epochs",
when incidents occur at a constant average rate. For example, in any 8 hour
shift a doctor may receive 20 phone calls on average. It may be 19, or 21, and
counts close to 20 are the most likely (with a whole-number average of 20, 19
and 20 calls are exactly equally likely). 17 is less likely, and 0, and 40,
are much less likely.

Haskell as an example of a functional language

Haskell is a functional language. Whereas Python and other object-oriented
languages take "things" and "add bits on", and so are like Lego, Haskell is
like a marble run or a domino run. The thought goes into a continuous process,
and the relatively simple elements are not changed much. In object-oriented
furniture building we do something like

	table = top_bit
	table = table + legs
	table = turn_upside_down(table)
	table = table + placemats + food

But functional programming is more like

	table = put_on_top(invert(add_legs(top_bit)), (place_mats, food))
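In real Haskell the same idea becomes a chain of functions joined with the composition operator (.). This is a minimal sketch. Table, topBit, addLegs, turnUpsideDown and putOnTop are hypothetical names made up to mirror the pseudocode above, and they are not part of Pwaskell.

```haskell
-- Hypothetical furniture type, invented to mirror the pseudocode.
data Table = Table
  { legs    :: Int       -- how many legs are attached
  , rightUp :: Bool      -- False while the top lies face down on the floor
  , onTop   :: [String]  -- things placed on the table
  } deriving (Show, Eq)

-- The bare top, face down so the legs can be fixed on.
topBit :: Table
topBit = Table { legs = 0, rightUp = False, onTop = [] }

addLegs :: Table -> Table
addLegs t = t { legs = 4 }

turnUpsideDown :: Table -> Table
turnUpsideDown t = t { rightUp = not (rightUp t) }

putOnTop :: [String] -> Table -> Table
putOnTop things t = t { onTop = onTop t ++ things }

-- Each function returns a new Table, so topBit itself is never modified.
-- (.) composes right to left: addLegs runs first and putOnTop runs last.
table :: Table
table = (putOnTop ["placemats", "food"] . turnUpsideDown . addLegs) topBit
```

After this runs, topBit is still the bare top. In the "add bits on" style, table would have been changed in place instead.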


MATHS:

The formula for the Poisson distribution is hard to write in vim, so:
^ 	means exponentiation, "to the power of"
! 	means factorial, e.g. 3! = 3 * 2 * 1 = 6 (and 0! = 1)
e 	means "Euler's number", approximately 2.71828,
	which is a fundamental constant like 1, 0, or Pi
x 	is the number of events whose probability you are estimating
L 	stands in for lambda, the expected or "average" number of events in a
	measured period


	p = probability of there being x events in a measured period

	p = \frac{L^{x} e^{-L}}{x!}
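The formula translates almost directly into Haskell. This is a minimal sketch that assumes a Double for L and an Int for x. The names factorial and poissonPmf are made up for this example and are not taken from rand_ex.hs.

```haskell
-- x! as a Double. product [] is 1, so factorial 0 == 1.
-- Fine for modest x; past about 170 the Double overflows to Infinity.
factorial :: Int -> Double
factorial x = product (map fromIntegral [1 .. x])

-- Probability of exactly x events when the expected number is lambda
-- (written L in the text). exp is in the Prelude, so e^(-L) is
-- exp (negate lambda) and e never has to be typed in as 2.71828.
poissonPmf :: Double -> Int -> Double
poissonPmf lambda x = lambda ^ x * exp (negate lambda) / factorial x
```

Loaded into ghci, poissonPmf 2.5 2 should come out at about 0.2565, which matches the football example. poissonPmf 20 19 and poissonPmf 20 20 should agree up to rounding, because with an average of 20 calls those two counts are equally likely.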

EXAMPLE

Dugarte et al., cited in Wikipedia, say there are 2.5 goals on average in a
World Cup footy match, and that the number of goals follows a Poisson
distribution.

The probability of exactly 2 goals, with x = 2 and L = 2.5 (so L^x = 6.25 and
2! = 2 * 1 = 2), is:

	p = \frac{6.25 \times e^{-2.5}}{2} \approx \frac{6.25 \times 0.0821}{2} \approx 0.257

The value for a single count comes from the "probability mass function". You
can also have a "cumulative distribution function", which asks about a range
of values. For example, "will there be at most 1 goal" adds the probabilities
of 0 and 1 goals, and "at least 2 goals" is then 1 minus that sum.

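The cumulative version is just a running sum of the mass function. This is a sketch under the same assumptions as the formula (Double for L, Int for x). The names poissonPmf, poissonCdf and atLeast are hypothetical.

```haskell
-- The mass function from the MATHS section, with x! written inline.
poissonPmf :: Double -> Int -> Double
poissonPmf lambda x = lambda ^ x * exp (negate lambda) / product (map fromIntegral [1 .. x])

-- Cumulative: P(X <= x), adding up the probabilities of 0, 1, ..., x.
poissonCdf :: Double -> Int -> Double
poissonCdf lambda x = sum [poissonPmf lambda k | k <- [0 .. x]]

-- "At least x": everything except 0 .. x-1, i.e. 1 - P(X <= x - 1).
atLeast :: Double -> Int -> Double
atLeast lambda x = 1 - poissonCdf lambda (x - 1)
```

For the World Cup figures, poissonCdf 2.5 1 should give about 0.287 and atLeast 2.5 2 about 0.713. Under this model, roughly 71% of matches would have two or more goals.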
CODE IMPLEMENTATION

	Part A: Simulation

configure these:

L 	a float (meaning it can have a decimal part), the expected number of
	incidents in a measured period
a 	an integer factor used to give a high number of possible
	incidents - 20 will do, see below - this is not part of Poisson per se
m 	an integer, the number of trials for your model
e	Euler's number. Haskell's Prelude has exp, so e^y can be written exp y
	rather than defining e as 2.71828...
x	the number of incidents whose probability you are estimating

Set up an array which will hold the results of the trials; it will have m entries
Set up a repeated loop of m trials.
For each of the m trials
	Create an array of random "element" numbers in the range [0, 1)
	Elements are a programming thing, not a Poisson thing
	If they are under 0.05 (that is, 1/a with a = 20) they represent an incident.
	If L is 8, have 20 * 8 = 160 elements in a block representing
	  the potentiality of a measured block. About 160 * 0.05 = 8 of them
	  will be incidents per trial - call that the count
	Count the incidents
	Stick that into your results array
	Go round again
End up with an array like (6,7,8,8,8,7,6,8,9,9,7,8,6,8,7,8,8,8,7,9,4,8...
Count how many of each count there are
That's your distribution
(Strictly, each count is binomial: a * L chances, each with probability 1/a.
Its mean is L, and it gets closer to a Poisson distribution as a grows.)
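The Part A steps above can be sketched in Haskell as follows. This is a hedged illustration, not the Pwaskell code. So that it needs nothing beyond the Prelude, it uses a hypothetical uniforms function: a simple linear congruential generator with glibc-style constants that assumes a 64-bit Int. It stands in for the random package, where randoms (mkStdGen seed) is the usual way to get such a list. The names trialCount, simulate and tally are also made up.

```haskell
-- Stand-in for System.Random: a linear congruential generator
-- (glibc-style constants, modulus 2^31, assumes a 64-bit Int).
-- Gives a lazily built, infinite list of Doubles in [0, 1) from a seed.
uniforms :: Int -> [Double]
uniforms seed = map toUnit (drop 1 (iterate step seed))
  where
    step s = (1103515245 * s + 12345) `mod` 2147483648
    toUnit s = fromIntegral s / 2147483648

-- One trial: count the elements in a block that fall under 1/a.
trialCount :: Int -> [Double] -> Int
trialCount a block = length (filter (< 1 / fromIntegral a) block)

-- m trials, each using a block of a * L elements (rounded to a whole number).
simulate :: Int -> Double -> Int -> Int -> [Int]
simulate a lambda m seed = go m (uniforms seed)
  where
    n = round (fromIntegral a * lambda)
    go k us
      | k <= 0 = []
      | otherwise =
          let (block, rest) = splitAt n us
          in trialCount a block : go (k - 1) rest

-- How many trials gave each count, from 0 up to the largest count seen.
tally :: [Int] -> [(Int, Int)]
tally [] = []
tally counts = [(c, length (filter (== c) counts)) | c <- [0 .. maximum counts]]
```

In ghci, tally (simulate 20 8 1000 2) should give (count, number of trials) pairs clustered around 8. Because the seed is fixed, the same seed always gives the same results.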
I can't do graphs in Haskell yet. But I can make a data set that can be drawn
by Python. Or maybe a cheesy ASCII graph.
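Both output options are short in Haskell. This is a sketch that assumes the distribution is held as a list of (count, number of trials) pairs. The names toCsv and asciiGraph, and the CSV layout, are hypothetical choices.

```haskell
-- A data set for plotting elsewhere (e.g. with Python): a header line,
-- then one "count,frequency" line per bar.
toCsv :: [(Int, Int)] -> String
toCsv rows = unlines ("count,frequency" : [show c ++ "," ++ show f | (c, f) <- rows])

-- A cheesy ASCII bar chart: one row per count, one '#' per `scale` trials.
asciiGraph :: Int -> [(Int, Int)] -> String
asciiGraph scale rows =
  unlines [pad (show c) ++ " | " ++ replicate (f `div` step) '#' | (c, f) <- rows]
  where
    step = max 1 scale  -- guard against dividing by zero
    width = maximum (0 : map (length . show . fst) rows)
    pad s = replicate (width - length s) ' ' ++ s
```

Given a list of pairs called, say, pairs, putStr (asciiGraph 10 pairs) prints the chart in ghci. writeFile "counts.csv" (toCsv pairs) saves a file that Python can read; the file name is only an example.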


	Part B: Idealised graph preparation

Use the formula to plot an idealised graph for comparison: the expected number
of the m trials (m * p) that should show each count.
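Turning the formula into something directly comparable with the simulation takes only one line. This is a sketch, assuming m trials and counts from 0 up to a chosen cut-off maxK. The names expectedFreqs and maxK are hypothetical.

```haskell
-- The mass function from the MATHS section, with x! written inline.
poissonPmf :: Double -> Int -> Double
poissonPmf lambda x = lambda ^ x * exp (negate lambda) / product (map fromIntegral [1 .. x])

-- Expected number of trials (out of m) showing each count 0 .. maxK.
expectedFreqs :: Double -> Int -> Int -> [(Int, Double)]
expectedFreqs lambda m maxK = [(k, fromIntegral m * poissonPmf lambda k) | k <- [0 .. maxK]]
```

With L = 8 and m = 1000, the bar for a count of 8 should be about 139.6 trials. Counts above 20 are expected in fewer than one trial in a thousand.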


	Part C: Checking

Quantify how well the two agree, probably using a chi-squared goodness-of-fit
test.
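Pearson's chi-squared statistic is a sum of (O - E)^2 / E over the bars, where O is the observed number of trials from Part A and E is the expected number from Part B. This is a minimal sketch; the name chiSquared is hypothetical. A common rule of thumb is to first merge bars whose expected count is below about 5. The result is then compared against a chi-squared table with (number of bars - 1) degrees of freedom, or one fewer if L was estimated from the simulated data.

```haskell
-- Pearson's chi-squared statistic: sum over bars of (O - E)^2 / E.
-- observed and expected must list the same counts in the same order.
-- Bars with E = 0 are skipped, since they would divide by zero.
chiSquared :: [Int] -> [Double] -> Double
chiSquared observed expected =
  sum [ (fromIntegral o - e) ^ (2 :: Int) / e | (o, e) <- zip observed expected, e > 0 ]
```

A small statistic means the simulation and the formula agree well. A large one, relative to the table value, suggests they differ.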


	Part D: Stress Testing

Mess with the assumptions of Poisson (a constant average rate, independent
incidents) and see how that affects the check in Part C.
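One quick extra check alongside Part C comes from a property of the Poisson distribution: its variance equals its mean. Their ratio, the index of dispersion, should therefore sit near 1. A ratio well above 1 often points to a rate that varies between periods, and a ratio below 1 to incidents that are more regular than random. The Part A simulation is strictly binomial, so there the ratio should be near 1 - 1/a, which is 0.95 for a = 20. This is a sketch; the name dispersionIndex is hypothetical.

```haskell
-- Index of dispersion: sample variance divided by sample mean.
-- Needs at least two counts and a mean above zero.
dispersionIndex :: [Int] -> Double
dispersionIndex counts = variance / mean
  where
    xs = map fromIntegral counts :: [Double]
    n = fromIntegral (length xs)
    mean = sum xs / n
    variance = sum [(x - mean) ^ (2 :: Int) | x <- xs] / (n - 1)
```

For example, counts that swing between 4 and 12 around a mean of 8 give a ratio of about 2.7, far more spread out than a Poisson distribution would allow.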


Documentation for Users:

Current state of code:

In ghci, type

	:l rand_ex.hs

then

	take 3 (randomlist 2)

That gives the first 3 values from a lazily constructed infinite list of
floats generated from seed 2. Fixing the seed makes the results repeatable
between tests.

Currently ghci won't parse definitions like a = 2, so I have to do that in the
interpreter. (Written as let a = 2, a definition works at the ghci prompt in
every version of ghci.)