# An example: all possible 3-song playlists where the first sort is a salsa song, the second a bachata song, and the third a reggaeton song.
m_s = 200 # Salsa songs
m_b = 80 # Bachata songs
m_r = 100 # Reggaeton songs
m_s * m_b * m_r1600000
While counting things may seem simple, it can become quite when the number of things to count grows. For example, consider the following problem:
As the example of the garden of forking data shows, we need to be able to count the number of ways to choose a subset of elements from a set. This is the subject of combinatorics and there are four basic formulas that we will use to solve these problems.
In the garden of forking data we needed to count the number of ways a collection of three white and one blue balls could produce samples of three balls, selected with replacement.
The first selected ball can be any of the four balls in the box and there are four possibilities. The second selected ball can also be any of the four balls in the box and there are again four possibilities.
Up to now we have a total of 4 \times 4 = 16 possibilities. The last selected balls can also be any of the four balls in the box and there are four possibilities. The total number of possibilities is 4 \times 4 \times 4 = 64.
More generally, there are n^s ways to select s ordered elements from a set of n elements, with replacement.
Even more generally, consider a DJ who has a set of m_s salsa songs, m_b bachata songs and m_r reggaeton songs. The DJ plans to play a set of three songs, each of a different genre: first a salsa, then a bachata, and finally a reggaeton song. How many possible sequences of songs can the DJ play?
The total number of possible sequences of songs is m_s \times m_b \times m_r where the first song is a salsa song, the second song is a bachata song, and the third song is a reggaeton song.
# An example: all possible 3-song playlists where the first sort is a salsa song, the second a bachata song, and the third a reggaeton song.
m_s = 200 # Salsa songs
m_b = 80 # Bachata songs
m_r = 100 # Reggaeton songs
m_s * m_b * m_r1600000
A permutation is an arrangement of objects in a specific order. If we have a set of n (distinct) objects and we want to arrange them in a specific order, there are n! ways to do this. The factorial function is defined as:
n! = n \times (n-1) \times (n-2) \times \ldots \times 2 \times 1
How to explain this formula? Let’s set n = 5 and consider a set of five songs that a DJ wants to play in sequence. The first song can be any of the five songs, the second song can be any of the remaining four songs, the third song can be any of the remaining three songs, and so on. The total number of ways to arrange the songs is 5 \times 4 \times 3 \times 2 \times 1 = 5! = 120.
# To compute the factorial you can use the factorial function from the math module (which is imported here at the top of the notebook).
math.factorial(5)120
Example 5.1 (Permutations) You have 4 books on your shelf at home. In how many ways can you arrange them?
# Create a list of 3 books
books3 = ["Framed", "Malas", "Ellipses"]
print("The number of possible arrangements is 3! = 3*2*1 = 6")
print("Check it out:", math.factorial(3))
print("All possible arrangements of the books:")
# NOTE: code for illustration only
# Generate all permutations of the books in the list above and print them
for p in itertools.permutations(books3):
print(p)The number of possible arrangements is 3! = 3*2*1 = 6
Check it out: 6
All possible arrangements of the books:
('Framed', 'Malas', 'Ellipses')
('Framed', 'Ellipses', 'Malas')
('Malas', 'Framed', 'Ellipses')
('Malas', 'Ellipses', 'Framed')
('Ellipses', 'Framed', 'Malas')
('Ellipses', 'Malas', 'Framed')
# We can repeat this for a list of 4 books
books4 = ["Framed", "Malas", "Ellipses", "Beautiful Days"]
print("The number of possible arrangements is 4! = 4*3*2*1 = 24")
print("Compare it with the result from math.factorial:", math.factorial(4))
print("All possible arrangements of the books:")
# NOTE: # NOTE: code for illustration only
for p in itertools.permutations(books4):
print(p)The number of possible arrangements is 4! = 4*3*2*1 = 24
Compare it with the result from math.factorial: 24
All possible arrangements of the books:
('Framed', 'Malas', 'Ellipses', 'Beautiful Days')
('Framed', 'Malas', 'Beautiful Days', 'Ellipses')
('Framed', 'Ellipses', 'Malas', 'Beautiful Days')
('Framed', 'Ellipses', 'Beautiful Days', 'Malas')
('Framed', 'Beautiful Days', 'Malas', 'Ellipses')
('Framed', 'Beautiful Days', 'Ellipses', 'Malas')
('Malas', 'Framed', 'Ellipses', 'Beautiful Days')
('Malas', 'Framed', 'Beautiful Days', 'Ellipses')
('Malas', 'Ellipses', 'Framed', 'Beautiful Days')
('Malas', 'Ellipses', 'Beautiful Days', 'Framed')
('Malas', 'Beautiful Days', 'Framed', 'Ellipses')
('Malas', 'Beautiful Days', 'Ellipses', 'Framed')
('Ellipses', 'Framed', 'Malas', 'Beautiful Days')
('Ellipses', 'Framed', 'Beautiful Days', 'Malas')
('Ellipses', 'Malas', 'Framed', 'Beautiful Days')
('Ellipses', 'Malas', 'Beautiful Days', 'Framed')
('Ellipses', 'Beautiful Days', 'Framed', 'Malas')
('Ellipses', 'Beautiful Days', 'Malas', 'Framed')
('Beautiful Days', 'Framed', 'Malas', 'Ellipses')
('Beautiful Days', 'Framed', 'Ellipses', 'Malas')
('Beautiful Days', 'Malas', 'Framed', 'Ellipses')
('Beautiful Days', 'Malas', 'Ellipses', 'Framed')
('Beautiful Days', 'Ellipses', 'Framed', 'Malas')
('Beautiful Days', 'Ellipses', 'Malas', 'Framed')
Exercise 5.1 (Permutations) A small company has 16 employees. They want to line up for a group photo. In how many ways can they arrange themselves?
math.factorial(16)20922789888000
# NOTE: code for illustration only. The code here only prettifies the large number (16!) so that we can read it more easily. You don't need to remember the syntax.
print(f"{math.factorial(16):,}")
print(f"{math.factorial(16):e}")20,922,789,888,000
2.092279e+13
# To make some sense of this large number, let's say that the employees are very fast and can
# make a photo of each arrangement 10 milliseconds. How many years would it take to make all the photos?
10 * math.factorial(16) / 60 / 60 / 24 / 3656634573.150684931
# Let's say that half of the members of parliament take a group photo. How many arrangements of 120 people are there?
print(f"{math.factorial(120):e}")6.689503e+198
It is hard to imagine the magnitude of extremely large numbers such as 120! for example. To gain some perspective, take a look at the following short documentary (it goes to 10^{26}).
The empoyees of our small company decide that it would take too long to go through all the possible group photo arrangements. They decide to take a group photo with only 5 employees. In how many ways can they arrange themselves?
Again, think about the group photo as having five slots. The first slot can be filled by any of the 16 employees, the second slot can be filled by any of the remaining 15 employees, and so on.
The total number of ways to arrange the employees is
\underbrace{16}_{\text{Slot 1}} \times \underbrace{15}_{\text{Slot 2}} \times \underbrace{14}_{\text{Slot 3}} \times \underbrace{13}_{\text{Slot 4}} \times \underbrace{12}_{\text{Slot 5}} = 16 \times 15 \times 14 \times 13 \times 12 = 524160
16 * 15 * 14 * 13 * 12524160
More generally, we can arrange k elements from a set of n elements in
n \times (n-1) \times \ldots \times (n-(k - 1))
ways. In the previous example we had n = 16 and k = 5.
(16 - \underbrace{0}_{5 - 5}) \times (16 - \underbrace{1}_{5 - 4}) \times (16 - \underbrace{2}_{5 - 3}) \times (16 - \underbrace{3}_{5 - 2}) \times (16 - \underbrace{4}_{5 - 1}) = 16 \times 15 \times 14 \times 13 \times 12 = 524160
Usually we write this in a more compact form:
\frac{16!}{(16-5)!} = \frac{16!}{11!} = \frac{16 \times 15 \times 14 \times 13 \times 12 \times \cancel{11!}}{\cancel{11!}}
Or more generally there are
\frac{n!}{(n - k)!}
ways to arrange k elements from a set of n elements (the order of the elements matters).
math.factorial(16) / math.factorial(11)524160.0
math.comb(9, 2)36
There are occasions when we are not interested in the order of the elements in a subset. For example, the employees of the small company may not care about the order in which they are arranged in the group photo, only that they are in the photo. How many arrangements are there?
We computed that there are 524160 ways to arrange the employees in the group photo of 5 persons. However, some of these arrangements only differ in the order of the employees. For example, the arrangements
\begin{align*} (A,B,C,D,E) \\ (B,A,C,D,E) \end{align*}
are the same for the purpose of the group photo (both have the same employees, only in a different order).
For a group of 5 persons, there are 5! = 120 ways to arrange the employees. This means that there are 524160 / 120 = 4368 ways to arrange the employees in the group photo.
More generally, the number of ways to select a set (no order) of k elements from a set of n elements, without replacement:
\frac{n!}{k!(n-k)!} = \binom{n}{k}
You may have seen this notation before. The symbol
\binom{n}{k}
is read as “n choose k” and is called a binomial coefficient or combination. In the previous example we had n = 16 and k = 5.
math.comb(16, 5)4368
math.factorial(16) / (math.factorial(5) * math.factorial(16 - 5))4368.0
Now that the employees of the small company have agreed to ignore the order of the persons in the photos, they decide to take a group photo with 4 employees. To avoid the injustice of some employees being left out of the photo, they decide to form four groups of 4 employees. In how many ways can they form these groups?
We can approach this problem by considering the formation of groups sequentially.
The total number of ways to form the groups is:
\binom{16}{4} \times \binom{12}{4} \times \binom{8}{4}\times \binom{4}{4}
We can generalize this to the case where we have a set of n elements that must be partitioned into k groups of sizes n_1, n_2, \ldots, n_k. As every element must be assigned to a group, the sum of the sizes of the groups must be equal to n:
n_1 + n_2 + \ldots + n_k = n
The general formula for the number of ways to partition a set follow the same steps as the students example, except that now the group sizes may be different.
The total number of ways to partition the set is:
\binom{n}{n_1} \times \binom{n - n_1}{n_2} \times \ldots \times \binom{n - n_1 - n_2 - \ldots - n_{k-1}}{n_k}
We can simplify this expression as it contains many terms that cancel out:
\frac{n!}{n_1! (n - n_1)!} \times \frac{(n - n_1)!}{n_2! (n - n_1 - n_2)!} \times \frac{(n - n_1 - n_2)!}{n_3! (n - n_1 - n_2 - n_3)!} \times \ldots \times \frac{(n - n_1 - n_2 - \ldots - n_{k-1})!}{n_k! (n - n_1 - n_2 - \ldots - n_k)!}
Notice that the denominator of the first term is the numerator of the second term, the denominator of the second term is the numerator of the third term, and so on. This means that all the terms cancel out except for the numerator of the first term and the denominator of the last term. The total number of ways to partition the set is:
\frac{n!}{n_1! \cancel{(n - n_1)!}} \times \frac{\cancel{(n - n_1)!}}{n_2! \cancel{(n - n_1 - n_2)!}} \times \frac{\cancel{(n - n_1 - n_2)!}}{n_3! \cancel{(n - n_1 - n_2 - n_3)!}} \times \ldots \times \frac{\cancel{(n - n_1 - n_2 - \ldots - n_{k-1})!}}{n_k! (n - n_1 - n_2 - \ldots - n_k)!} = \frac{n!}{n_1! n_2! \ldots n_k!}
The term in the last denominator does not cancel out but it is equal to 1, because the group sizes must sum to n and 0! = 1 (by definition).
(n - (n_1 + n_2 + \ldots + n_k))! = 0! = 1
This means that the total number of ways to partition the set is:
\frac{n!}{n_1! n_2! \ldots n_k!}
This expression is called a multinomial coefficient and is denoted as
\binom{n}{n_1, n_2, \ldots, n_k} .
((2 * 26 + 10 + 20) ** 14) / 100e9 / 60 / 60 / 24 / 365197054939.86602503
We have already seen that there are n^k ways to select k elements from a set of n elements, with replacement and keeping track of the order of the elements.
Now we want to see how many ways we can select k elements from a set of n elements, with replacement, and without regard to order. This formula may be the most difficult to figure out.
Imagine that you have a lottery ticket with 2 numbers which can be any of the numbers from 1 to 4 and can be selected more than once. How many possible lottery tickets can you have with this setup if you ignore the order of the numbers on the ticket (e.g. 2 and 3 is the same as 3 and 2)?
For this problem it is useful to think of all the numbers from 1 to 4 as representing a box.
| 1 | 2 | 3 | 4 |
|---|
Now, selecting 2 numbers with replacement is the same as placing 2 marker on the numbers in the box. For example, the ticket with the numbers 1 and 3 can be represented as:
|X_1||X_2||
or
|X_2||X_1||
A selection of the first number twice can be represented as:
|X_1 X_2||||
Therefore, all the possible tickets will correspond to all the possible ways to arrange the two M and the three | in the box. We don’t count the outer |, e.g for selecting the first and the fourth numbers:
X_1 |_{1} |_{2} |_{3} X_2
We can arrange the 2 + 3 objects in (2 + 3)! = 5! ways. As the number of permutations accounts for the order of the elements, we need to divide by the number of ways to arrange the two M and the three |. There are 2! ways to arrange the two M and 3! ways to arrange the three |. The total number of ways to select two numbers from the box is:
\frac{5!}{2! 3!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{2 \times 1 \times 3 \times 2 \times 1} = 10
We can generalize this to the case of selecting k elements from a set of n elements, with replacement, and without regard to order.
\frac{(n + k - 1)!}{k!(n - 1)!} = \binom{n + k - 1}{k}
| Without replacement | With replacement | |
|---|---|---|
| Ordered | \frac{n!}{(n - k)!} | n^k |
| Unordered | \binom{n}{k} | \binom{n + k - 1}{k} |
math.comb(2 + 4 - 1, 2)10
Exercise 5.2 (Number of Songs) Consider the DJ with the salsa (m_s), bachata (m_b), and reggaeton (m_r) songs.
We have already established that there are m_s \times m_b \times m_r ways to select a sequence of songs where the first song is a salsa song, the second song is a bachata song, and the third song is a reggaeton song.
How many different sequences can the DJ play if he can play the genres in any order?
The DJ can play the songs in any order, so there are 3! = 6 ways to select the sequence of songs. With only three genres we can even list all the possibilities:
For each of the six sequences of genres there are m_s \times m_b \times m_r ways to select the songs. The total number of ways to select the songs is 6 \times m_s \times m_b \times m_r.
Exercise 5.3 (Password Cracking) A site asks you to generate a password with exactly 6 characters. The password must contain only ASCII lowercase letters (a-z).
For a password of length 6 with only lowercase letters, there are 26^6 possible passwords, because the order of the letters matters and because you can repeat the letters, for example “aajump” is a valid password.
To see this, think about each of the positions of the characters of the password as a box
\underbrace{\_}_{\text{Box 1}} \quad \underbrace{\_}_{\text{Box 2}} \quad \underbrace{\_}_{\text{Box 3}} \quad \underbrace{\_}_{\text{Box 4}} \quad \underbrace{\_}_{\text{Box 5}} \quad \underbrace{\_}_{\text{Box 6}}
You can place any of the 26 letters in each box, so there are 26 \times 26 \times 26 \times 26 \times 26 \times 26 = 26^6 possible passwords.
If an attacker can try one billion passwords per second, it would take 26^6 / 10^9 seconds to try all possible passwords. This is approximately 0.3 seconds (see the code below).
# The number of 6-letter passwords (26 letters) divided by 1 billion
(26 ** 6) / 1e90.308915776
# The number of 6-letter passwords (26 lowcase letters, 26 uppercase letters, 10 digits and 12 special characters) divided by 1 billion
((26 + 26 + 10 + 12) ** 6) / 1e9164.206490176
#The number of possible passwords of length 14 (26 lowcase letters, 26 uppercase letters, 10 digits and 12 special characters) divided by 1 billion
((26 + 26 + 10 + 12) ** 14) / 1e91.476536122735822e+17
Exercise 5.4 (Number of Student Committees) Let’s say that you want to form a student council with 5 members out of a class of 35 students.
Exercise 5.5 (Power Set) The power set of a set S is the set of all subsets of S, including the empty set and S itself. For a set with n elements, how much elements does the power set have?
Let’s show the logic for a set with 3 elements. The set S = \{a, b, c\} has the following subsets:
Now think about how we can represent each of these subset. Let’s create a list of zeroes and ones, where a one in the i-th position means that the i-th element of the set is in the subset.
With these indicators we can reduce the question about counting the number of all subsets to the question of counting the number of arrangements of 0s and 1s. with replacement (i.e. we can have more than one 1 in the arrangement). Again, think of the positions in the indicator as boxes. We have two choices for each box:
So, we have two choices for each of the three boxes, and the total number of arrangements is 2^3 = 8.
\underbrace{2}_{\text{Box 1}} \quad \underbrace{2}_{\text{Box 2}} \quad \underbrace{2}_{\text{Box 3}}
For a set with n elements, the power set has 2^n elements, because we will have 2 choices for each of the n boxes.
Exercise 5.6 (Binomial Coefficients) Argue why the following sum of binomial coefficients for a fixed n is equal to 2^n:
\sum_{k = 0}^{n} \binom{n}{k} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \ldots + \binom{n}{n} = 2^n
In the previous exercise we saw that the power set of a set with n elements has 2^n elements. To see why the above equation is true, consider the following example with n = 3:
S = \{a, b, c\}
The power set of S has 2^3 elements. These subsets can have either zero, one, two, or three elements. Now there are
The number of all subsets is the sum of the numbers of zero, one, two, and three element subsets, which is 1 + 3 + 3 + 1 = 8 = 2^3.
You can generalize this to a set with n elements.
\sum_{k = 0}^{n} \binom{n}{k} = \underbrace{\binom{n}{0}}_{\text{\# zero el. sets}} + \underbrace{\binom{n}{1}}_{\text{\# one el. sets}} + \underbrace{\binom{n}{2}}_{\text{\# two el. sets}} + \ldots + \underbrace{\binom{n}{n}}_{\text{\# n el. sets}} = 2^n
Exercise 5.7 An IT department has four lead developers and 12 junior developers. The company wants to form four teams of equal size out of all developers. How many ways can the teams be formed?
If the teams are formed at random (i.e. each group of four developers is equally likely), what is the probability that all teams include a lead developer?
Exercise 5.8 You roll three dice (6-sided) and sum the results. Which event is more probable: rolling a sum of 11 or rolling a sum of 12?
# (Optional) Write a simulation to estimate the probabilities of the sum of three dice rolls being A) 11 and B) 12.
Exercise 5.9 (Birthdays in a Classroom) In our class of (let’s say) n students, what is the probability (as a function of n) that at least two students share the same birthday? For simplicity, assume that there are 365 days in a year and that each student is equally likely to be born on any day of the year. What is the smallest value of n for which the probability is greater than 1/2?
It is easier to compute the probability that no two students share the same birthday and later subtract this probability from 1 to get the probability that at least two students share the same birthday.
It is easier to calculate the probability that no two students share the same birthday and then subtract this probability from 1 (complement rule).
First, there are 365^{20} possible sequence of birthdays for the 20 students.
The number of sequences where no two students share a birthday is 365 \times 364 \times 363 \times \ldots \times 346 (k-permutations).
So the probability that no two students share a birthday is: (number of sequences where no two students share a birthday) / (total number of sequences), because we have assumed that all sequences are equally likely. So the probability of no two students sharing a birthday is:
\frac{365 \times 364 \times 363 \times \ldots \times 346}{365^{20}}
The probability that at least two students share a birthday is 1 - \text{probability of no two students sharing a birthday}.
The following code computes this probability using the fact that
\frac{365 \times 364 \times 363 \times \ldots \times 346}{365^{20}} = \frac{365!}{(365 - 20)!} \times \frac{1}{365^{20}}
# Probability of no shared birthdays in a group of 20 people
p_no_shared = (math.factorial(365) / math.factorial(365 - 20)) / (365 ** 20)
print("The probability of no shared birthdays in a group of 23 people is approximately", p_no_shared)
# Probability of at least one shared birthday in a group of 20 people
p_shared = 1 - p_no_shared
print("The probability of at least one shared birthday in a group of 23 people is approximately", p_shared)The probability of no shared birthdays in a group of 23 people is approximately 0.58856161641942
The probability of at least one shared birthday in a group of 23 people is approximately 0.41143838358058005
# Simulation
# The following simulation creates `n_groups` groups of `group_size` people and assigns them
# a random number between 1 and 365 (representing the day of the year).
# We then check if there are any shared birthdays in each group.
n_groups = 1000
group_size = 20
# We select one number between 1 and 365 for each person in each group. The reason we set the upper limit to 366 is that this boundary is not included in the random selection,
# so the last number we can get is 365.
groups = np.random.randint(1, 366, (n_groups, group_size))
groupsarray([[ 43, 224, 287, ..., 295, 147, 25],
[ 68, 157, 160, ..., 164, 49, 270],
[124, 158, 310, ..., 353, 141, 135],
...,
[ 82, 47, 297, ..., 115, 150, 172],
[355, 363, 94, ..., 315, 306, 223],
[198, 352, 194, ..., 45, 326, 364]])
groups.shape(1000, 20)
# Now we need to check if a group (row in the groups matrix) has any shared birthdays.
# We can do this by checking if the number of unique elements in a row is less than the number of elements in the row.
# If this is the case, it means that there are shared birthdays in the group.
n_shared = 0
for row in groups:
# np.unique() returns the sorted unique elements of an array
# len counts the number of elements in the array
if len(np.unique(row)) != len(row):
# If the number of unique elements is less than the number of elements, there are shared birthdays
# and we increment the counter n_shared
n_shared = n_shared + 1
# After we have checked all groups, we print the result
print("Found shared birthdays in", n_shared, "out of", n_groups, "groups.")
print("The proportion of groups with shared birthdays is", n_shared / n_groups)Found shared birthdays in 419 out of 1000 groups.
The proportion of groups with shared birthdays is 0.419
Exercise 5.10 (Cards) 
You play a game of poker with a standard deck of 52 cards. You are dealt a hand from a well-shuffled deck of cards. What are the probabilities of the following events:
First, note that with poker hands, we draw the cards without replacement and the order of the cards does not matter (it only matters which cards we have in the hand, not the order in which we draw them).
Therefore, there are \binom{52}{5} ways to draw a hand of 5 cards from a deck of 52 cards.
There are \binom{4}{4} = 1 way to draw four aces and there are five cards in the hand. The fifth card can be any of the remaining (52 - 4 = 48) cards in the deck. The probability of drawing a hand with four aces is:
\frac{\binom{4}{4} \times \binom{48}{1}}{\binom{52}{5}} = \frac{48}{\binom{52}{5}}
Four of a kind is more common than four aces, because we can have four cards of any rank (four 2s, four 3s, four jacks, etc.). There are \binom{13}{1} ways to select the rank of the four cards. As in the case of the aces, we still need to fill the fifth card and we can do it in 48 ways, because there are 48 cards left in the deck after we have drawn the four cards of the same rank. The probability of drawing a hand with four of a kind is:
\frac{\binom{13}{1} \times \binom{48}{1}}{\binom{52}{5}} = \frac{13 \times 48}{\binom{52}{5}}
For a full house we need to select two ranks, one for the three cards and one for the two cards. There are \binom{13}{1} ways to select the rank of the three cards and \binom{4}{3} ways to select the three cards of that rank. There are 12 remaining ranks and \binom{4}{2} ways to select the two cards of the second rank. The probability of drawing a full house is:
\frac{\binom{13}{1} \times \binom{4}{3} \times \binom{12}{1} \times \binom{4}{2}}{\binom{52}{5}}
Detailed explanation:
For example if we select the rank 2, we have \binom{4}{3} = 4 choices for the three of a kind:
Next we need to select the rank of the two cards and there are \binom{12}{1} ways to do this. As in the previous step, we must think about the number of ways to select the rank of these two cards. If we have selected the rank 2 for the three of a kind, we can have the two of a kind in the hand: 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A. There are 12 choices for the rank of the two of a kind.
Within each rank of the two cards we need to select two of the four suits: \binom{4}{2}.
For example if we select the rank 3, we can have the two of a kind in the hand:
For the probabilities of other events that we have not covered, look up the number of hands and probabilities in the Wikipedia article on poker probabilities.