muchen 牧辰

Random Variable Generation

Updated 2017-11-05

Given a uniform random variable, how do we generate random numbers with a specific probability distribution?

Inverse Transform Algorithm

Discrete Random Variables

To generate discrete random variables with distribution \(\mathbb P(X=x_i)=p_i\), for \(i=1,\dotsc,n\), we use the following algorithm.

  1. Generate continuous uniform random variable \(Y\sim U(0,1)\)
  2. Set \(X\) based on where \(Y\) is on the distribution:
\[X=\begin{cases} x_1& Y\in[0,p_1)\\ x_2 & Y\in[p_1, p_1+p_2)\\ x_3 & Y\in[p_1 + p_2, p_1 + p_2+p_3)\\ \vdots & \vdots\\ x_n & Y\in[\sum_{i=1}^{n-1}p_i,1) \end{cases}\]
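
For instance, here is a minimal Python sketch of this procedure (the function name is my own choice):

```python
import random

def sample_discrete(xs, ps):
    """Inverse transform sampling for a discrete distribution.

    xs -- outcomes x_1, ..., x_n
    ps -- probabilities p_1, ..., p_n (must sum to 1)
    """
    y = random.random()          # Y ~ U(0, 1)
    cumulative = 0.0
    for x, p in zip(xs, ps):
        cumulative += p          # right edge of the interval assigned to x
        if y < cumulative:       # Y fell into [previous edge, cumulative)
            return x
    return xs[-1]                # guard against floating-point round-off

# Example: P(X=1) = 0.2, P(X=2) = 0.5, P(X=3) = 0.3
samples = [sample_discrete([1, 2, 3], [0.2, 0.5, 0.3]) for _ in range(10)]
```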

Continuous Random Variables

To generate continuous random variables with distribution \(F_X(x)\), follow these steps:

  1. Generate continuous uniform random variable \(Y\sim U(0,1)\)
  2. Find the inverse of the continuous distribution \(F_X^{-1}(x)\)
  3. Set \(X=F_X^{-1}(Y)\)
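
For example, for an exponential random variable with rate \(\lambda\), \(F_X(x)=1-e^{-\lambda x}\) and \(F_X^{-1}(y)=-\frac{1}{\lambda}\ln(1-y)\). A minimal Python sketch of this case (the function name is my own):

```python
import math
import random

def sample_exponential(lam):
    """Exponential(lam) via inverse transform: F(x) = 1 - exp(-lam * x)."""
    y = random.random()                # Y ~ U(0, 1)
    return -math.log(1.0 - y) / lam    # X = F^{-1}(Y)

samples = [sample_exponential(2.0) for _ in range(10)]
```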

“Pseudo” Proof: Suppose we have a uniform random variable \(Y\sim U(0,1)\) and we use the inverse distribution function to set \(X=F_X^{-1}(Y)\). Note that the distribution function of \(Y\) is, by definition, \(\mathbb P(Y\leq y)=y\) for \(y\in[0,1]\).

\[\begin{align} \mathbb P(X\leq x)&=\mathbb P(F_X^{-1}(Y)\leq x)\\ &=\mathbb P(Y\leq F_X(x))\\ &=F_X(x) \end{align}\]

This is exactly the distribution we wanted.

Note: This method does not work directly for Gaussian random variables, since the Gaussian CDF has no closed-form inverse; use the Polar Algorithm for Gaussian random variables instead.

Polar Algorithm

As noted above, we can’t use the inverse transform method for Gaussian random variables. To simulate a continuous random variable with distribution \(N(0,1)\), we follow these steps:

  1. Generate continuous uniform random variables \(Y_1\sim U(0,1)\) and \(Y_2\sim U(0, 1)\)
  2. Set \(D=-2\ln(Y_1)\)
  3. Set \(\Theta=2\pi Y_2\)
  4. Set \(X=\sqrt{D}\cos(\Theta)\); \(X\) is now a random variable with distribution \(N(0,1)\)
  5. Set \(Z=\sqrt{D}\sin(\Theta)\); \(Z\) is now a random variable with distribution \(N(0,1)\) that is also independent from \(X\)
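
A minimal Python sketch of these steps (the function name is my own):

```python
import math
import random

def sample_standard_normal_pair():
    """Two independent N(0, 1) samples from two U(0, 1) samples."""
    y1 = 1.0 - random.random()            # in (0, 1], avoids log(0)
    y2 = random.random()
    d = -2.0 * math.log(y1)               # D = -2 ln(Y1)
    theta = 2.0 * math.pi * y2            # Theta = 2*pi*Y2
    x = math.sqrt(d) * math.cos(theta)    # X ~ N(0, 1)
    z = math.sqrt(d) * math.sin(theta)    # Z ~ N(0, 1), independent of X
    return x, z
```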

To obtain a random variable with a non-standard normal distribution \(Z\sim N(\mu, \sigma^2)\), we use the fact that a linear transformation of a Gaussian random variable is still Gaussian.

  1. Generate \(X\sim N(0,1)\) using steps 1-4 above
  2. Set \(Z=\sigma X + \mu\); \(Z\) is now a random variable with distribution \(Z\sim N(\mu, \sigma^2)\)
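
A short sketch of the rescaling, reusing sample_standard_normal_pair from the sketch above (mu and sigma are the desired parameters):

```python
def sample_normal(mu, sigma):
    """N(mu, sigma^2) via a linear transform of a standard normal sample."""
    x, _ = sample_standard_normal_pair()  # defined in the sketch above
    return sigma * x + mu

# e.g. one sample from N(5, 2^2):
z = sample_normal(5.0, 2.0)
```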

Composition Method

Use this method if the distribution function \(F_X\) is a weighted sum (a mixture) of other distribution functions \(F_{X_i}\):

\[F_X(x)=\sum_{i=1}^{n}p_i F_{X_i}(x)\qquad\sum_{i=1}^n p_i=1,\qquad p_i\geq 0\]

where the \(F_{X_i}\) are CDFs.

To get a random variable with distribution \(F_X\), follow these steps:

  1. Generate a random variable \(I\) with discrete distribution \(\mathbb P(I=i)=p_i\) for \(i=1,2,\dotsc,n\) using the Inverse Transform method for discrete random variables
  2. Generate a random variable \(Y_I\) with distribution \(F_{X_I}\) using the Inverse Transform method for continuous random variables
  3. Set \(X=Y_I\); \(X\) is now a random variable with distribution \(F_X\)
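
A minimal Python sketch, assuming each component distribution is specified by its inverse CDF (the function name is my own):

```python
import math
import random

def sample_composition(ps, inverse_cdfs):
    """Sample from the mixture F_X = sum_i p_i * F_{X_i}.

    ps           -- mixture weights p_1, ..., p_n (must sum to 1)
    inverse_cdfs -- functions with inverse_cdfs[i](u) = F_{X_i}^{-1}(u)
    """
    # Step 1: pick a component I with P(I = i) = p_i (discrete inverse transform).
    u = random.random()
    cumulative = 0.0
    index = len(ps) - 1
    for i, p in enumerate(ps):
        cumulative += p
        if u < cumulative:
            index = i
            break
    # Step 2: sample from the chosen component (continuous inverse transform).
    return inverse_cdfs[index](random.random())

# Example: mixture of Exp(1) and Exp(5) with weights 0.3 and 0.7.
x = sample_composition([0.3, 0.7],
                       [lambda u: -math.log(1.0 - u),
                        lambda u: -math.log(1.0 - u) / 5.0])
```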

Acceptance-Rejection Method

We can use this method when it is hard to generate a random variable \(X\) with distribution \(F_X\) using the previous methods.

Instead, we find a proposal distribution \(F_Y\) that is easy to sample from. Note that we must ensure the supports of the density functions \(f_X\) and \(f_Y\) are the same.

Next, we need an upper bound \(a\) such that \(a\geq \frac{f_X(x)}{f_Y(x)}\) for all \(x\).

With all that in mind, we may begin the algorithm:

  1. Generate a random variable \(Y\) from the distribution \(F_Y\) using methods previously described
  2. Generate a uniform random variable \(Z\sim U(0,1)\)
  3. Check if \(Z \leq \frac{f_X(Y)}{a f_Y(Y)}\)
    • If true (accept), then set \(X=Y\)
    • Else (reject), go back to step 1 and try again

Note: The probability that \(Y\) is accepted is \(\frac{1}{a}\), so we should always choose \(a\) as small as possible (necessarily \(a\geq 1\)).
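
A minimal Python sketch of the general algorithm, assuming we can sample from \(F_Y\) and evaluate both densities (the function name is my own):

```python
import random

def sample_acceptance_rejection(sample_proposal, f_x, f_y, a):
    """Acceptance-rejection sampling.

    sample_proposal -- function returning one draw Y ~ F_Y
    f_x, f_y        -- target and proposal densities
    a               -- constant with a >= f_x(x) / f_y(x) for all x
    """
    while True:
        y = sample_proposal()              # step 1: Y ~ F_Y
        z = random.random()                # step 2: Z ~ U(0, 1)
        if z <= f_x(y) / (a * f_y(y)):     # step 3: accept with prob f_X(Y) / (a f_Y(Y))
            return y                       # accept: X = Y
        # otherwise reject and try again
```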

Example: half-normal distribution

The half-normal distribution has density \(f_X(x)=\frac{2}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\) for \(x\geq 0\). Its inverse CDF is difficult to compute, so we use the A-R method.

The support of this distribution is \([0,\infty)\), so the exponential distribution is an appropriate proposal. Its density is \(f_Y(x)=e^{-x}\) for \(x\geq 0\).

Thus

\[\frac{f_X(x)}{f_Y(x)}=\frac{2}{\sqrt{2\pi}}e^{-\frac{x^2}{2}+x}\]

Setting the first derivative to zero (and checking the second derivative) shows the maximum is at \(x=1\). Thus, our upper bound is

\[a=\frac{f_X(1)}{f_Y(1)}=\frac{2}{\sqrt{2\pi}}e^{\frac12}\approx1.32\]

Now we apply the A-R algorithm. First we generate an exponential random variable, \(Y\sim f_Y\).

Next, we generate \(Z\sim U(0,1)\).

Lastly, we set \(X=Y\) if \(Z\leq \frac{1}{a}\cdot\frac{2}{\sqrt{2\pi}}e^{Y-\frac{Y^2}{2}}\); otherwise we reject and generate a new \(Y\).
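
Putting the example together, a minimal Python sketch (the constant and function names are my own):

```python
import math
import random

A = 2.0 / math.sqrt(2.0 * math.pi) * math.exp(0.5)   # a = f_X(1) / f_Y(1), about 1.32

def f_half_normal(x):
    """Half-normal density f_X(x) = 2/sqrt(2*pi) * exp(-x^2/2), x >= 0."""
    return 2.0 / math.sqrt(2.0 * math.pi) * math.exp(-x * x / 2.0)

def sample_half_normal():
    """Half-normal sample via A-R with an Exp(1) proposal, f_Y(x) = exp(-x)."""
    while True:
        y = -math.log(1.0 - random.random())   # Y ~ Exp(1) by inverse transform
        z = random.random()                    # Z ~ U(0, 1)
        if z <= f_half_normal(y) / (A * math.exp(-y)):
            return y                           # accept: X = Y
```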

Vector Random Variable Generation

Suppose we have a vector consisting of two random variables, \((X, Y)\sim F_{X,Y}(x,y)\).

If \(X\) and \(Y\) are independent, great! We can simply generate each of them independently.

If not, then the algorithm is as follows:

  1. Express the joint PDF of the two random variables as a conditional PDF

    \[f_{X,Y}(x,y)=f_{Y\vert X}(y\vert x)\,f_X(x)\]
  2. For \(i=1,2,\dotsc,n\) do:

    1. Generate \(x_i\sim f_X(x)\)
    2. Generate \(y_i\sim f_{Y\vert X}(y\vert x_i)\)

Example:

Suppose \(f_{X,Y}(x,y)=x+y\) and \(0\leq x,y \leq 1\), then the marginal PDF for \(X\) is

\[f_X(x)=\int_0^1 f_{X,Y}(x,y)\mathrm dy=x+\frac12\]

Next, using the marginal PDF for \(X\), we can find the conditional PDF:

\[f_{Y\vert X}(y\vert x)=\frac{f_{X,Y}(x,y)}{f_X(x)}=\frac{x+y}{x+\frac{1}{2}}\]

Following the rest of the algorithm, we sample \(x_i\) from the density \(f_X(x)=x+\frac12\) and then \(y_i\) from the density \(f_{Y\vert X}(y\vert x_i)=\frac{x_i + y}{x_i + \frac12}\), for \(i=1,\dotsc,n\).
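
A minimal Python sketch of this example; the closed-form inverses of \(F_X\) and \(F_{Y\vert X}\) used below are my own derivation, obtained by integrating the densities above and solving the resulting quadratics:

```python
import math
import random

def sample_xy():
    """One draw (x, y) from f(x, y) = x + y on the unit square."""
    # Marginal: f_X(x) = x + 1/2, so F_X(x) = x^2/2 + x/2.
    # Inverting F_X(x) = u gives x = (-1 + sqrt(1 + 8u)) / 2.
    u = random.random()
    x = (-1.0 + math.sqrt(1.0 + 8.0 * u)) / 2.0

    # Conditional: f_{Y|X}(y|x) = (x + y) / (x + 1/2),
    # so F_{Y|X}(y|x) = (x*y + y^2/2) / (x + 1/2).
    # Inverting F_{Y|X}(y|x) = v gives y = -x + sqrt(x^2 + v*(2x + 1)).
    v = random.random()
    y = -x + math.sqrt(x * x + v * (2.0 * x + 1.0))
    return x, y

samples = [sample_xy() for _ in range(5)]
```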