Celebrities die 2.7183 at a time

The claim that celebrities die in threes is usually dismissed as the
result of the human propensity to see patterns where there are
none. But celebrities don't die at regularly spaced intervals
either. It would be very weird if celebrities predictably died on the
1st of every month. And once you deviate from a regularly spaced
pattern, some amount of clustering is inevitable. Can we make this
more precise?

Rather than trying to define exactly what constitutes a celebrity, I
will simply assume that they die at a fixed rate and that they do so
independently of each other (the day music died notwithstanding). In
other words, it is a Poisson process with intensity \(\lambda\), where
\(\lambda\) is the expected number of deaths per unit of time. In a
Poisson process the time between events is exponentially
distributed with parameter \(\lambda\). The average waiting time is
\(1/\lambda\).

As an example, suppose we define celebrityhood in such a way that
twelve celebrities die each year. Then \(\lambda\) is 12/year, and the
average time between two deaths will be 1/(12/year) = 1/12th year, or
1 month.
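
The example can be checked numerically. The following is a minimal
sketch, assuming Python's standard `random.expovariate` as the source
of exponential waiting times; the helper name `simulate_waits`, the
sample size and the seed are illustrative choices, not part of the
argument.

```python
import random

def simulate_waits(rate, n, seed=0):
    """Draw n exponential waiting times for a process with the given rate."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [rng.expovariate(rate) for _ in range(n)]

# 12 deaths per year, so the theoretical mean wait is 1/12 year (1 month).
waits = simulate_waits(rate=12.0, n=100_000)
mean_wait = sum(waits) / len(waits)
print(f"mean wait: {mean_wait:.4f} years (theory: {1 / 12:.4f})")
```

The simulated mean should land close to 1/12 year, with the gap
shrinking as the sample size grows.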

What does it mean for celebrities to die \(n\) at a time? We will
simply say that two celebrities die together if the period between
them is shorter than expected. If the celebrity death rate is 12/year,
then two celebrities died together if their deaths were less than a
month apart. Similarly, we will say that three celebrities died
together if both the period between death 1 and death 2 and the period
between death 2 and death 3 were shorter than a month. In general, \(k\)
celebrities died together if the \(k - 1\) periods between them were
*all* shorter than expected.
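
This definition translates directly into a small grouping routine. The
sketch below is one possible reading of it: the function name
`cluster_sizes` and the sample gaps are made up for illustration, and a
gap exactly equal to the threshold is assumed to start a new cluster.

```python
def cluster_sizes(waits, threshold):
    """Group consecutive deaths into clusters and return the cluster sizes.

    waits[i] is the gap between death i and death i + 1, so the list
    describes len(waits) + 1 deaths. A gap strictly shorter than
    `threshold` keeps the next death in the current cluster; any other
    gap starts a new cluster.
    """
    sizes = [1]  # the first death opens the first cluster
    for wait in waits:
        if wait < threshold:
            sizes[-1] += 1
        else:
            sizes.append(1)
    return sizes

# Gaps in months, with the expected gap of one month as the threshold.
print(cluster_sizes([0.5, 0.2, 1.5, 0.3, 2.0], threshold=1.0))  # [3, 2, 1]
```

Six deaths separated by these five gaps fall into clusters of three,
two and one.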

Here is a diagram of 10 years' worth of randomly generated deaths with
12 deaths per year and clusters highlighted:

   [ picture of clusters ]

Average cluster size

Suppose a celebrity has just died after a longer than average
wait. This death will start a new cluster, and we want to figure out
its size. We can model the cluster size as a random variable \(C\)
and work out its distribution.

The cluster size will be 1 when the waiting time for the next death is
larger than or equal to the average. Plugging this into the cumulative
distribution function for the exponential distribution, we get:

   P(C = 1) = P(W > 1/lambda) = 1 - (1 - e^(-lambda * (1/lambda))) = e^-1 = 0.3679

The probability that the death will be part of a cluster of size 2 is the
probability that the next waiting time is shorter than average and the
next one after that is longer:

   P(C = 2) = P(W <= 1/lambda) * P(W > 1/lambda) = (1 - e^-1) * e^-1 = 0.2325

For size three, it's the probability that the next two waiting times
are shorter and the third one longer:

  P(C = 3) = P(W <= 1/lambda)^2 * P(W > 1/lambda) = (1 - e^-1)^2 * e^-1 = 0.1470

In general, the probability that a celebrity death will be part of a
cluster of size \(k\) is:

  P(C = k) = P(W <= 1/lambda)^(k - 1) * P(W > 1/lambda) = (1 - e^-1)^(k-1)*e^-1
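
The general formula is easy to tabulate. The following sketch
evaluates it for the first few cluster sizes; the function name
`cluster_size_pmf` is an illustrative choice.

```python
import math

def cluster_size_pmf(k):
    """P(C = k): k - 1 shorter-than-average waits, then one longer wait."""
    p_long = math.exp(-1)  # P(W > 1/lambda), independent of lambda
    return (1 - p_long) ** (k - 1) * p_long

for k in range(1, 6):
    print(k, round(cluster_size_pmf(k), 4))
```

The first three rows reproduce the values 0.3679, 0.2325 and 0.1470
computed above, and the probabilities over all \(k\) add up to 1.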

So what's the average size of a Celebrity Death Cluster? The expected
value of \(C\) is given by:

   E[C] = \sum_{k=1}^\infty k * P(C = k) = e^-1 * \sum_{k=1}^\infty k * (1 - e^-1)^(k - 1)

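It's not terribly hard to show that this infinite series has sum
\(e\): \(C\) is geometrically distributed with success probability
\(1/e\), and such a variable has mean \(1/(1/e) = e\). So on average,
celebrities die 2.7183 at a time.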
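
As a sanity check, the average cluster size can also be estimated by
simulation. This is a rough sketch under the same assumptions as the
model above (independent exponential waiting times); the function name
`simulate_cluster_size`, the rate of 12 per year, the seed and the
number of clusters are arbitrary illustrative choices.

```python
import math
import random

def simulate_cluster_size(rate, rng):
    """Size of one cluster: keep adding deaths while the wait is shorter than average."""
    threshold = 1.0 / rate  # the average waiting time
    size = 1                # the death that starts the cluster
    while rng.expovariate(rate) < threshold:
        size += 1
    return size

rng = random.Random(1)  # seeded so the run is repeatable
n_clusters = 100_000
total = sum(simulate_cluster_size(12.0, rng) for _ in range(n_clusters))
mean_size = total / n_clusters
print(f"simulated mean cluster size: {mean_size:.3f} (e = {math.e:.3f})")
```

The estimate should sit close to \(e\), and changing the rate should
make no difference, since only the ratio of each wait to the average
wait matters.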