6  Simulations Exercise

Open in Google Colab: Open in Colab

Exercise 6.1 (Dice Rolls Simulation) You roll two six-sided dice 1000 times. Assume that the dice are fair and that the trials are independent. For the following events first compute their probability by counting the number of outcomes in the sample space that satisfy the event and dividing by the total number of outcomes. After that, simulate the experiment and compare the theoretical probability with the empirical probability for each event using 10 and 10,000 simulations.

  • Event A: The sum of the dice is greater than 10.
  • Event B: The sum of the deice is 7 and the first die is 3.
  • Event C: The sum of the dice is greater than 7 and the first die is 5.
  • Event D: The sum of the dice is greater than 8 and the first roll is smaller or equal to the second roll.

Use Exercise 3.3 as a starting point.

We can consider a sample space of ordered pairs (first die, second die) with 36 equally likely outcomes. The probability of each outcome is 1/36.

The sample space is:

\Omega = \left\{ \begin{array}{cccccc} (1,1) & (1, 2) & (1, 3) & (1, 4) & (1, 5) & (1, 6) \\ (2,1) & (2, 2) & (2, 3) & (2, 4) & (2, 5) & (2, 6) \\ (3,1) & (3, 2) & (3, 3) & (3, 4) & (3, 5) & (3, 6) \\ (4,1) & (4, 2) & (4, 3) & (4, 4) & (4, 5) & (4, 6) \\ (5,1) & (5, 2) & (5, 3) & (5, 4) & (5, 5) & (5, 6) \\ (6,1) & (6, 2) & (6, 3) & (6, 4) & (6, 5) & (6, 6) \\ \end{array} \right\}

Let A be the event that the sum of the dice is greater than 10. The event A is:

A = \left\{ \begin{array}{cccccc} (5,6) & (6,5) & (6,6) \\ \end{array} \right\}

It has 3 outcomes, so the probability of A is 3/36 = 1/12.

Let B be the event that the sum of the dice is 7 and the first die is 3. The event B is:

B = \left\{ \begin{array}{c} (3,4) \\ \end{array} \right\}

It has 1 outcome, so the probability of B is 1/36.

Let C be the event that the sum of the dice is greater than 7 and the first die is 5. The event C is:

C = \left\{ \begin{array}{cccc} (5,3) & (5,4) & (5,5) & (5,6) \\ \end{array} \right\}

It has 4 outcomes, so the probability of C is 4/36 = 1/9.

Let D be the event that the sum of the dice is greater than 8 and the first die is smaller or equal to the second die. The event D is:

D = \left\{ \begin{array}{cccc} (6, 6) & (5, 6) & (4, 6) & (3, 6) \\ (5, 5) & (4, 5) \\ \end{array} \right\}

It has 6 outcomes, so the probability of D is 6/36 = 1/6.

import numpy as np

# First, throw a pair of dice 10 times.
# The result is a 10x2 matrix where each row is one throw.
# The first column is the result of the first die, the second column is the result of the second die.
# Each row corresponds to one repetition of the game

rolls = np.random.choice([1, 2, 3, 4, 5, 6], size=[10, 2])
rolls
array([[2, 3],
       [4, 5],
       [3, 3],
       [6, 4],
       [3, 6],
       [4, 3],
       [4, 2],
       [1, 5],
       [4, 1],
       [2, 4]])

As the events in the exercise involve the sum of both dice, it will be convenient to be able to calculate the sum for each game (row in the rolls matrix). The np.sum() function will help us here. To understand how it works, consider the following picture of a matrix:

Matrix

Think of a matrix as having two dimensions (axes): the rows and the columns. Because in Python we start counting from 0, the first axis (dimension) has index 0 and the second axis has index 1. The np.sum() function accepts an argument axis that specifies the axis along which the sum is calculated. If axis=0, the sum is calculated along the rows (downwards), and if axis=1, the sum is calculated along the columns (horizontally).

# The following uses the function np.sum()
# Run the cell to see what it does (it sums all the elements of the matrix rolls)

np.sum(rolls)
69
# The np.sum function can also be applied column-wise (i.e. it sums all the elements of each column)
# This is done by setting the axis parameter to 0

np.sum(rolls, axis=0)
array([33, 36])
# The np.sum function can also be applied row-wise (i.e. it sums all the elements of each row)
# This is done by setting the axis parameter to 1

np.sum(rolls, axis=1)
array([ 5,  9,  6, 10,  9,  7,  6,  6,  5,  6])
# The event A is defined as the sum of the two dice being greater than 10.
# To check if the event A occurred in each repetition, we can use the comparison operator >.
# It compares the sum of the two dice to 10 and returns a boolean array with True if the sum is greater than 10 and False otherwise.

np.sum(rolls, axis=1) > 10
array([False, False, False, False, False, False, False, False, False,
       False])
# For convenience, we can store the result in a variable called eventA.

eventA = (np.sum(rolls, axis=1) > 10)
eventA
array([False, False, False, False, False, False, False, False, False,
       False])
# Count the number of games in which event A occurred by summing the elements of the eventA array.

np.sum(eventA)
0
# Calculate the proportion of games in which that event A occurred. Because the average (mean) is simply the sum of the elements divided by the number of elements, 
# it gives the proportion of games in which event A occurred. Multiply by 100 to get the percentage.

100 * np.mean(eventA)
0.0
# Compare the result to the theoretical probability of the event A (here multiplied by 100 to get the percentage).

100 * 3 / 36
8.333333333333334
# You can also print the results with an explanatory text.

print("Event A occurred in", np.sum(eventA), "games.")
print("Event A occurred in", 100 * np.mean(eventA), "percent of the games.")
Event A occurred in 0 games.
Event A occurred in 0.0 percent of the games.

For the next events we need to access the first and the second of rolls. In numpy we can access the first column with rolls[:, 0] and the second column with rolls[:, 1]. The syntax works in the following way: a matrix with two dimensions is accessed with matrix[row, column]. If we want to access all the rows of a column, we use :. So, rolls[:, 0] means “all the rows of the first column of the matrix rolls”. Remember that we are counting from zero in Python, so the first column is the column with index 0.

# Selects the first column of the array (same as above)

rolls[:, 0]
array([2, 4, 3, 6, 3, 4, 4, 1, 4, 2])
rolls[:, 0] == 3
array([False, False,  True, False,  True, False, False, False, False,
       False])
# Create a new array called eventB that is True if:
# - the first die equals 3
# AND
# - the roll sum is equal to 7
# Hint: Use the & operator (logical AND)
 
eventB = (rolls[:, 0] == 3) & (np.sum(rolls, axis=1) == 7)
eventB
array([False, False, False, False, False, False, False, False, False,
       False])
# Create a new array called eventC that is True if:
# - the roll sum is greater than 7
# AND
# - the second die is equal to 5

eventC = (np.sum(rolls, axis=1) == 7) & (rolls[:, 0] == 3)
eventC
array([False, False, False, False, False, False, False, False, False,
       False])
# Create a new array called eventD that is True if:
# - the roll sum is greater than 8
# AND
# - the first die is smaller or equal to the second die

eventD = (np.sum(rolls, axis=1) > 8) & (rolls[:, 0] <= rolls[:, 1])

6.1 The Matching Problem

Some results in probability theory can be confusing and counter-intuitive. A classic example is the matching problem. Consider a room with n dancing couples. If the couples are randomly paired, what is the probability that after the pairing, no one is matched with their original partner?

Imagine that the leads are numbered from 1 to n and the follows are also numbered from 1 to n.

The original arrangement of the couples and two possible rearrangements are shown in the table below (for five couples numbered from 0 to 4).

Leads 0 1 2 3 4 Number of matches
Follows (original) 0 1 2 3 4 5
Follows (rearranged 1) 2 4 1 3 0 1 (the fourth pair)
Follows (rearranged 2) 0 1 3 4 2 2 (first and second pairs)
Follows (rearranged 2) 1 0 4 3 2 0 (no matches)

We will approach the problem with a simulation. We will randomly pair the couples and count the number of times no one is matched with their original partner.

How to Read the Simulation Code

The code below is a simulation of the matching problem. It has two parameters

  • n: the number of couples
  • repetitions: the number of repetitions (how many time we will randomly pair the couples)

The code right now sets n = 5 and repetitions = 3 so that you can inspect the data and understand the data flow (how the simulation actually works).

After you understand the code, you can change the parameters to for example n = 10000 and repetitions = 10000. You should play around with the parameters and look at what happens to the proportion of repetitions where no one is matched with their original partner when you vary n and repetitions.

n = 5 # Number of pairs
repetitions = 3 # Number of simulations

# Create a list (an array) of integers from 0 to n-1 (corresponding to the partners)

partners = np.arange(n)
partners
array([0, 1, 2, 3, 4])
# NOTE: code for illustration purposes only

# A single experiment: shuffle the partners and check if there are any matches

partners_shuffled = np.random.permutation(partners)
print("Partners after shuffling:")
partners_shuffled
Partners after shuffling:
array([0, 3, 1, 2, 4])
# If the value in the shuffled array is the same as in the original array, it means that the partners are the same
# as in the original couple

# We can check if there are any matches by comparing the two arrays element-wise

matches = (partners == partners_shuffled)

print("Matches:")
matches
Matches:
array([ True, False, False, False,  True])
# We are interested in the event "No partner gets their original partner".
# 

n_matches = np.sum(matches)

print("Number of matches:", n_matches)
Number of matches: 2
# Now we can repeat the experiment a large number of times (repetitions) 
# and count how many times there were no matches

# Set the number of matches to zero
no_matches_counter = 0

for i in range(repetitions):
    # Shuffle the partners
    partners_shuffled = np.random.permutation(partners)
    
    # Check if no one got their original partner.
    no_match_in_permutation = np.sum(partners == partners_shuffled) == 0
    
    # If yes, increment the no_matches counter
    if no_match_in_permutation:
        no_matches_counter += 1

print("Number of times (simulations) where no partner got their original partner:", no_matches_counter)
print("Proportion of no partner getting their original partner:", no_matches_counter / repetitions)
Number of times (simulations) where no partner got their original partner: 0
Proportion of no partner getting their original partner: 0.0

6.2 The for Loop in Python

The following pieces of code are not part of the simulation. They are just to show you how to use a for loop in Python and how to use the range function.

# The range starting from 0 to n-1
# Try it out

list(range(5))
[0, 1, 2, 3, 4]
# The previous code loops over the number of simulations and counts the number of matches in each simulation
# It uses a for loop and the np.random.permutation() function

# To see what a loop does you can use the print() function to print the value of a variable inside the loop

# range(3) will create a list of integers from 0 to 2
for i in range(3):
    print("Simulation", i)
Simulation 0
Simulation 1
Simulation 2
# We will use a variable to count the number of matches in each simulation

# See how this works in a simple example

counter = 0

for i in range(10):
    print("Counter before incrementing =", counter)
    counter = counter + 1
    print("Counter after incrementing =", counter)
    
print("Final counter value =", counter)
Counter before incrementing = 0
Counter after incrementing = 1
Counter before incrementing = 1
Counter after incrementing = 2
Counter before incrementing = 2
Counter after incrementing = 3
Counter before incrementing = 3
Counter after incrementing = 4
Counter before incrementing = 4
Counter after incrementing = 5
Counter before incrementing = 5
Counter after incrementing = 6
Counter before incrementing = 6
Counter after incrementing = 7
Counter before incrementing = 7
Counter after incrementing = 8
Counter before incrementing = 8
Counter after incrementing = 9
Counter before incrementing = 9
Counter after incrementing = 10
Final counter value = 10
# We can choose when to increment the counter by using an if statement

counter = 0

for i in range(10):
    if i % 2 == 0:
        print(i, "is even, incrementing counter")
        counter = counter + 1
    else:
        print(i, "is odd, not incrementing counter")

    print("Counter:", counter)

print("Final counter:", counter)
0 is even, incrementing counter
Counter: 1
1 is odd, not incrementing counter
Counter: 1
2 is even, incrementing counter
Counter: 2
3 is odd, not incrementing counter
Counter: 2
4 is even, incrementing counter
Counter: 3
5 is odd, not incrementing counter
Counter: 3
6 is even, incrementing counter
Counter: 4
7 is odd, not incrementing counter
Counter: 4
8 is even, incrementing counter
Counter: 5
9 is odd, not incrementing counter
Counter: 5
Final counter: 5