Suppose that we want to estimate the share of persons who have read at least one of our magazines. One way to do this is to conduct a survey and simply ask a sample of randomly chosen persons whether they have read one of our magazines or not.
Assume that the share of persons who have read one of our magazines is p. We select n persons at random and record their answer to the question of whether they have read one of our magazines (1) or not (0).
We sample 1,000 persons and find that 300 of them answer yes to our question. What is the best estimate for the share of persons who have read one of our magazines?
In order to answer this question, we need to define the sense in which we mean “best estimate”. One way to define this is to say that the best estimate is the value of p that maximizes the likelihood of observing the data that we have observed. This is known as the maximum likelihood estimate.
The likelihood of the observed data is given by the probability mass function of the binomial distribution,
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k},
where X is the random variable that counts the number of persons who have read one of our magazines, n is the number of persons that we have sampled, and k is the observed number of persons who have read one of our magazines. In our case, n = 1,000 and k = 300. So what is the value of p that maximizes this probability?
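To get a feel for how the likelihood depends on p, the following sketch (assuming scipy is available; the candidate values of p are chosen only for illustration) evaluates the probability mass function at n = 1,000 and k = 300 for a few values of p:

# Evaluate the binomial likelihood P(X = 300) for n = 1000 at a few candidate values of p
from scipy.stats import binom

n, k = 1000, 300
for p in [0.1, 0.2, 0.3, 0.4, 0.5]:
    print('p =', p, '  P(X = 300) =', binom.pmf(k, n, p))
# The probability is largest near p = 0.3, the value we derive below as the ML estimate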
We can calculate the first derivative of the likelihood function with respect to p and set it equal to zero. This will give us the value of p that maximizes the likelihood function. This is known as maximum likelihood (ML) estimation.
The first derivative of the likelihood function with respect to p, set equal to zero, is
\binom{n}{k} \left[ k p^{k - 1} (1 - p)^{n - k} - (n - k) p^k (1 - p)^{n - k - 1} \right] = 0.
Dividing both sides by \binom{n}{k} p^{k - 1} (1 - p)^{n - k - 1}, we get
k (1 - p) - p (n - k) = 0.
Now we can solve for p: the kp terms cancel, leaving k - np = 0, and therefore
p = \frac{k}{n}.
We call this solution the maximum likelihood estimate (MLE) of p and denote it by \hat{p}_{ML}.
\hat{p}_{ML} = \frac{k}{n}
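As a quick check of the algebra, here is a small symbolic sketch (assuming sympy is available) that differentiates the log-likelihood and solves for p:

# Symbolic check of the derivation above with sympy
import sympy as sp

p, n, k = sp.symbols('p n k', positive=True)
# Log-likelihood; the binomial coefficient does not depend on p, so it can be dropped
log_likelihood = k * sp.log(p) + (n - k) * sp.log(1 - p)
# Set the first derivative with respect to p to zero and solve
print(sp.solve(sp.Eq(sp.diff(log_likelihood, p), 0), p))  # [k/n]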
For the binomial distribution the MLE is simply the share of persons who have read one of our magazines in the sample. In our case this is 300/1000 = 0.3.
13.1 Expected Value and Variance of the Maximum Likelihood Estimate
When we study the statistical properties of an estimator (its expected value and variance in our case), we are also interested in how the estimator behaves in large samples (asymptotically). Here, we want to know how the sample proportion of readers behaves as the sample size increases.
Estimator
When we study the statistical properties of an estimator (a function of the observations), we must view it as a random variable: before the data are observed, the observations are random, and so is any function of them.
Since X follows a binomial distribution, \text{E}(X) = np, and therefore
\text{E}(\hat{p}_{ML}) = \text{E}\left(\frac{X}{n}\right) = \frac{np}{n} = p.
So the sample proportion of readers is an unbiased estimator of the share of persons who have read one of our magazines.
What is the variance of this estimate? The variance of the sample proportion of readers is given by
\text{Var}(\hat{p}_{ML}) = \text{Var}\left(\frac{X}{n}\right) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}.
For example, with p = 0.3 and n = 1,000 this is 0.3 \cdot 0.7 / 1000 = 0.00021.
We see that the variance of the sample proportion of readers decreases as the sample size increases. This is a general property of ML estimators: we say that the ML estimator is asymptotically unbiased and consistent.
# Let's see how the ML estimate behaves in a simulation
import numpy as np
import matplotlib.pyplot as plt

# Number of samples
R = 10
# Sample size
n = 7
# True parameter
p = 0.3

# Generate the samples
simulations = np.random.binomial(1, p, (R, n))
print(simulations)

# Compute the ML estimate for each sample (this is simply the sample mean)
ml_estimates = np.mean(simulations, axis=1)
print(ml_estimates)

# The mean of the ML estimates is equal to the true parameter (you need a lot of samples to see this)
print('Mean of the ML estimates:', np.mean(ml_estimates))
# The variance of the ML estimates is the true variance divided by the sample size
print('Variance of the ML estimates:', np.var(ml_estimates))

# Plot the histogram of the ML estimates
plt.hist(ml_estimates, bins=50)
plt.xlabel('ML estimate')
plt.ylabel('Density')
plt.title('Histogram of ML estimates')
plt.show()
# The distribution of the ML estimates is approximately normal for large sample sizes (central limit theorem)
# Let's plot the histogram of the ML estimates and the normal distribution with the same mean and variance
plt.hist(ml_estimates, bins=50, density=True)
x = np.linspace(0, 1, 100)
y = 1 / np.sqrt(2 * np.pi * np.var(ml_estimates)) * np.exp(- (x - np.mean(ml_estimates))**2 / (2 * np.var(ml_estimates)))
plt.plot(x, y, color='red')
plt.xlabel('ML estimate')
plt.ylabel('Density')
plt.title('Histogram of ML estimates and normal distribution')
plt.show()
13.2 Estimation of a Normal Distribution Mean
The ML estimation method can be used to estimate the parameters of other distributions. For example, suppose that we have a random sample of n observations x_1, \ldots, x_n from a normal distribution with unknown mean \mu and known variance \sigma^2. The likelihood function of the sample is given by
L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2 \sigma^2} \right).
As in the case of the binomial distribution, maximizing the (log-)likelihood shows that the ML estimator of the mean of a normal distribution is the sample mean,
\hat{\mu}_{ML} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}.
The ML estimator of the mean is unbiased and consistent. To see that, calculate the expected value of the sample mean:
\text{E}(\hat{\mu}_{ML}) = \text{E}\left( \frac{1}{n} \sum_{i=1}^{n} x_i \right) = \frac{1}{n} \sum_{i=1}^{n} \text{E}(x_i) = \frac{1}{n} \cdot n \mu = \mu.
The variance of the ML estimator of the mean of a normal distribution is given by
\text{Var}(\hat{\mu}_{ML}) = \frac{\sigma^2}{n},
which obviously decreases as the sample size increases.
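As a quick simulation check (a minimal sketch; the values of R, n, and \sigma below are chosen only for illustration), we can compare the simulated variance of the sample mean with \sigma^2 / n:

# Check Var(mu_hat_ML) = sigma^2 / n by simulation
import numpy as np

R = 100_000  # number of simulated samples
n = 10       # sample size
sigma = 1.0  # known standard deviation

samples = np.random.normal(0, sigma, (R, n))
ml_estimates = np.mean(samples, axis=1)  # ML estimate of the mean for each sample

print('Simulated variance of the ML estimates:', np.var(ml_estimates))
print('Theoretical variance sigma^2 / n:', sigma**2 / n)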
13.3 Estimation of a Normal Distribution Variance
The second parameter of the normal distribution is the variance (\sigma^2). We can derive the ML estimator in the same way as we did for the mean, by differentiating the log-likelihood function with respect to \sigma^2 and setting the derivative equal to zero. Note that we are differentiating with respect to \sigma^2 and not \sigma. Setting the derivative to zero,
-\frac{n}{2 \sigma^2} + \frac{1}{2 \sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0,
and solving for \sigma^2 gives
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2.
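As with the mean, we can verify this solution symbolically (a minimal sketch assuming sympy is available; the symbol S stands for \sum_i (x_i - \mu)^2):

# Symbolic check: solve d(log-likelihood)/d(sigma^2) = 0 for sigma^2
import sympy as sp

n, S, sigma2 = sp.symbols('n S sigma2', positive=True)  # S = sum of squared deviations from mu
log_likelihood = -n / 2 * sp.log(2 * sp.pi * sigma2) - S / (2 * sigma2)
print(sp.solve(sp.Eq(sp.diff(log_likelihood, sigma2), 0), sigma2))  # [S/n]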
Unfortunately, we cannot use this result directly because it involves the unknown parameter \mu. What we can do is substitute the ML estimator of \mu into the equation and obtain the ML estimator of the variance:
\hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{ML})^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.
However, the ML estimator of the variance is biased (though asymptotically unbiased): its expected value is \text{E}(\hat{\sigma}^2_{ML}) = \frac{n - 1}{n} \sigma^2. To correct for this bias, we can use the following (corrected) estimator, which you have already seen under the name of sample variance:
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2.
Note that the only difference between the sample variance and the ML estimator of the variance is the denominator (n - 1 instead of n), and that this difference matters only for small samples. For large samples, the difference is negligible.
# Sample from a normal distribution with mean 0 and variance 1
R = 10000
n = 10
normal_samples = np.random.normal(0, 1, (R, n))

# Compute the ML estimate for mu for each sample (this is simply the sample mean)
ml_estimates = np.mean(normal_samples, axis=1)

# Print the average of the ML estimates
print('Mean of the ML estimates:', np.mean(ml_estimates))
Mean of the ML estimates: 0.001350875403208062
# Calculate the ML estimate for the variance
ml_estimate_variance = np.var(normal_samples, axis=1, ddof=0)

# Print the average of the ML estimates for the variance
print('Mean of the ML estimates for the variance:', np.mean(ml_estimate_variance))
Mean of the ML estimates for the variance: 0.8977580120279688
# Calculate the corrected ML estimate for the variance
ml_estimate_variance_corrected = np.var(normal_samples, axis=1, ddof=1)

# Print the average of the corrected ML estimates for the variance
print('Mean of the corrected ML estimates for the variance:', np.mean(ml_estimate_variance_corrected))
Mean of the corrected ML estimates for the variance: 0.9975089022532988
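This matches the theory: with n = 10 the uncorrected ML estimator has expected value \frac{n - 1}{n} \sigma^2 = 0.9, close to the simulated average of about 0.898, while the corrected estimator (with denominator n - 1) averages close to the true value of 1.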