ELEC 321

Random Variables

Updated 2017-10-19

Random Variables are used to represent numerical features of a random experiment.

A random variable is a function that maps each outcome in the sample space to a real number.

Definition & Notation

A random variable $X$ is a function defined on the sample space $\Omega$: $X: \Omega \to \mathbb{R}$.



Then and

The event “random variable $X$ takes the value of $x$” is mathematically represented as $\{X = x\}$, which means $\{\omega \in \Omega : X(\omega) = x\}$.

Range of Random Variable

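A random variable can be discrete or continuous.

Discrete example: the number of defective items produced in a factory, whose range is a countable set of values.

Continuous example: the percentage yield of a chemical process, whose range is an interval of real numbers.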

Discrete Random Variables


$X$ is said to be a Bernoulli random variable if its probability mass function is given by:

$$f(0) = P(X = 0) = 1 - p, \qquad f(1) = P(X = 1) = p$$

where $p$ is the probability that the trial is a success ($X = 1$).

Expected Value

The expected value is a linear operator that summarizes a random variable with a single number, which can then be manipulated mathematically. Consider a function $g(X)$; the expected value operator (denoted by $E$) is defined as:

$$E[g(X)] = \sum_x g(x) f(x)$$

where $f$ is the PMF given by $f(x) = P(X = x)$. Since $X$ is random and $g$ is a fixed function (such as $g(x) = x^2$), $g(X)$ is also random.

The output of the expected value is the number $E[g(X)]$. (If $g(X) = X$, then the expected value is $E[X] = \mu$.)

The expected value is a single number: the probability-weighted average of the values the random variable can take.

A good physical interpretation of the expected value would be the center of mass given a rod with different density throughout. The density would be the probability (hence the name probability density function). And the center of mass takes consideration of not only the density, but also position.

Geometric Interpretation: minimizing sum of square of distances.


There is a 0.2 chance of getting 0, and a 0.8 chance of getting 1.

$E[X] = 0(0.2) + 1(0.8) = 0.8$, so the expected value is $0.8$, which can be misleading because the only two possible outcomes are 0 and 1.

Random Variable as List

  1. Find the mean (sum divided by the number of elements in the list). When every element of the list is equally likely, this mean is the expected value.

Random Variable as “X”

Suppose $X$ is the number of red balls drawn from a bucket of colored balls. Then use the following algorithm:

  1. Identify the possible output values for $X$
  2. Compute the probability of getting each value of $X$
  3. Compute $E[X] = \sum_x x \, P(X = x)$ (multiply each value of $X$ by its corresponding probability, then sum)
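The three steps above can be sketched in Python. The bucket contents (3 red balls, 7 others, 2 draws without replacement) and the function names are assumptions chosen only for illustration; they are not from the notes.

```python
from fractions import Fraction
from math import comb

def red_ball_pmf(red, other, draws):
    """PMF of X = number of red balls when drawing without replacement."""
    total_ways = comb(red + other, draws)
    pmf = {}
    # Step 1: enumerate the possible values of X.
    for x in range(max(0, draws - other), min(red, draws) + 1):
        # Step 2: probability of each value (count favourable draws).
        pmf[x] = Fraction(comb(red, x) * comb(other, draws - x), total_ways)
    return pmf

def expected_value(pmf):
    # Step 3: weight each value by its probability and sum.
    return sum(x * p for x, p in pmf.items())

# Hypothetical bucket: 3 red, 7 non-red, draw 2.
pmf = red_ball_pmf(red=3, other=7, draws=2)
mean = expected_value(pmf)
print(pmf, mean)
```

Exact fractions are used so that the probabilities can be checked to sum to exactly 1.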

Probability Mass Function

PMF: probability mass function (a.k.a. the discrete probability density function), denoted $f(x) = P(X = x)$; its values act as weights on the possible values of $X$.

The domain describes the possible values, and the range describes the probability of the possible values.

Properties of PMF:

  1. Each probability is non-negative: $f(x) \ge 0$
  2. The probabilities sum to 1: $\sum_x f(x) = 1$

Cumulative Distribution Function

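CDF: cumulative distribution functions are denoted by upper case letters such as $F$, and they are defined as

$$F(x) = P(X \le x) = \sum_{k \le x} f(k)$$

where $f$ is the PMF. Because of this we can also say: $P(a < X \le b) = F(b) - F(a)$.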

Properties of CDF:

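  1. $F$ is non-decreasing because it accumulates probability
  2. $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to \infty$

The CDF turns a statement about the random variable into a probability, as shown in $F(x) = P(X \le x)$. Also notice that the properties of the CDF follow from it being a running sum of the PMF probabilities.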

Example: sometimes the probabilities of a random variable are not completely known

2 1

The parameter can be chosen to obtain a desired configuration of probabilities

Law of Large Numbers

Suppose that $X_1, X_2, \dots, X_n$ are independent measurements of a random variable $X$. Then it can be shown that as $n \to \infty$,

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \to E[X] = \mu$$

Mind that $X_1, \dots, X_n$ are measurements of the random variable $X$, but once a measurement is made, realization occurs and they become the numbers $x_1, \dots, x_n$.

The Law of Large Numbers states that as $n \to \infty$ the sample average ($\bar{x}$) will approach the population average ($\mu$). Using this, we can approximate $\mu$, which is unknown in nature. More samples yield a more accurate approximation.
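A small simulation can make the convergence visible. This is only a sketch: it assumes a 0/1 random variable with success probability 0.8, and the function name and seed are illustrative choices.

```python
import random

def sample_mean(n, p=0.8, seed=0):
    """Average of n simulated 0/1 measurements with P(1) = p."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = sum(1 if rng.random() < p else 0 for _ in range(n))
    return total / n

# The sample average should drift toward the population average 0.8.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

With more samples, the printed averages settle closer to 0.8, although any single run can still deviate by chance.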


The expectation $E$ is a linear operator, which means that for constants $a$ and $b$:

$$E[a\,g(X) + b\,h(X)] = a\,E[g(X)] + b\,E[h(X)]$$


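For any $k = 1, 2, 3, \dots$, the $k$-th moment is given as

$$\mu_k = E[X^k]$$

The moment generating function (MGF) is, up to the sign of its argument, the Laplace transform of the PMF:

$$M(t) = E[e^{tX}] = \sum_x e^{tx} f(x)$$

By differentiating the MGF and evaluating at $t = 0$, we can get the first moment (first derivative), second moment (second derivative), third moment, and so on: $E[X^k] = M^{(k)}(0)$.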


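The mean is the first moment ($\mu = E[X]$) of the random variable.

The variance measures the spread:

$$\sigma^2 = \operatorname{Var}(X) = E[(X - \mu)^2]$$

Standard Deviation

$$\sigma = \sqrt{\operatorname{Var}(X)}$$

When $\sigma$ is large, there is a large dispersion.

In practice, we do not know $\sigma$ because we do not know the underlying distribution, so we resort to measurements.

Of course, the sample standard deviation converges to $\sigma$ as more data is collected through experiments.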

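Example: there are $N$ chips, numbered $1, 2, \dots, N$.

We are sampling without replacement, and $n$ is the number of chips drawn. We want the distribution of the maximum $X$ of the chips drawn.

Suppose $n$ chips are drawn: can the maximum be smaller than $n$? The answer is no. The smallest possible maximum is $n$ (drawing chips $1, 2, \dots, n$).

The possible values for the maximum = $\{n, n+1, \dots, N\}$

The distribution function is

$$F(x) = P(X \le x) = \frac{\binom{x}{n}}{\binom{N}{n}}, \quad x = n, n+1, \dots, N$$

The numerator is $x$ choose $n$ because we are choosing $n$ chips whose numbers are no larger than the maximum $x$. The denominator is the number of ways of choosing $n$ chips out of $N$.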
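The distribution of the maximum can be tabulated directly from the counting argument. The sketch below assumes chips numbered 1 through N with n drawn without replacement; the values N = 10 and n = 3 and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def max_cdf(x, N, n):
    """P(max <= x): all n drawn chips must come from chips 1..x."""
    if x < n:
        return Fraction(0)
    if x >= N:
        return Fraction(1)
    return Fraction(comb(x, n), comb(N, n))

def max_pmf(x, N, n):
    """P(max == x), obtained as a difference of the CDF."""
    return max_cdf(x, N, n) - max_cdf(x - 1, N, n)

N, n = 10, 3  # hypothetical sizes, for illustration only
pmf = {x: max_pmf(x, N, n) for x in range(n, N + 1)}
mean = sum(x * p for x, p in pmf.items())
variance = sum(x * x * p for x, p in pmf.items()) - mean**2
print(mean, variance)
```

Taking differences of the CDF recovers the PMF, from which the mean and variance follow by the usual weighted sums.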


For given values of $N$ and $n$, we may also want to calculate the mean, variance, and standard deviation of $X$.


Let , then

The mean minimizes the Mean Square Error

The variance and mean have this property:

$$\sigma^2 = E[X^2] - \mu^2$$

Binomial Random Variables


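$X$ is said to be a binomial random variable with parameters ($n$, $p$) if $X$ represents the number of “successes” that occur in $n$ independent trials, each with success probability $p$.

The PMF with parameters ($n$, $p$) is given by:

$$f(x) = P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \dots, n$$

The $p^x$ (probability of getting $x$ “successes”) multiplies with $(1-p)^{n-x}$ (probability of getting $n - x$ “fails”) since each trial is independent. They’re then multiplied by $\binom{n}{x}$, the total number of ways $x$ “successes” can happen given $n$ trials.

By the Binomial Theorem, the probabilities sum to $1$:

$$\sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = \big(p + (1 - p)\big)^n = 1$$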

Example: coin toss

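4 coins are flipped (independently). Find the probability of getting exactly 2 heads (H).

Let $X$ be the number of heads. $X$ is a binomial random variable with parameters ($n = 4$, $p = 1/2$), assuming fair coins.

$$P(X = 2) = \binom{4}{2} \left(\tfrac{1}{2}\right)^2 \left(\tfrac{1}{2}\right)^2 = \frac{6}{16} = 0.375$$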

Example: defective items

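Any item produced by a factory will be defective with probability 0.1 (items are independent). Find the probability that in a sample of 3 items, at most 1 is defective.

Let $X$ be the number of defective items in the sample. $X$ is a binomial random variable with parameters ($n = 3$, $p = 0.1$).

$$P(X \le 1) = P(X = 0) + P(X = 1) = (0.9)^3 + 3(0.1)(0.9)^2 = 0.729 + 0.243 = 0.972$$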
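Both binomial examples can be checked with a few lines of Python. This is a minimal sketch; `binom_pmf` is an illustrative helper name, and the coin example assumes fair coins.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters (n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Coin toss: exactly 2 heads in 4 flips of a fair coin.
two_heads = binom_pmf(2, 4, 0.5)

# Defective items: at most 1 defective in a sample of 3, with p = 0.1.
at_most_one_defective = binom_pmf(0, 3, 0.1) + binom_pmf(1, 3, 0.1)

print(two_heads, at_most_one_defective)
```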

Example: airplane engines

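For what values of $p$ is a four-engine plane preferable to a two-engine plane?

Let $X$ be the number of working engines, where each engine works independently with probability $p$. $X$ is a binomial random variable with two sets of parameters for the two types of planes: ($n = 4$, $p$) for the four-engine airplane and ($n = 2$, $p$) for the two-engine airplane.

Now calculate the probability that the four-engine airplane makes a successful flight ($X$ is at least 2):

$$P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - (1-p)^4 - 4p(1-p)^3$$

Also calculate the probability that the two-engine plane makes a successful flight ($X$ is at least 1):

$$P(X \ge 1) = 1 - P(X = 0) = 1 - (1-p)^2$$

The four-engine plane is safer if

$$1 - (1-p)^4 - 4p(1-p)^3 \ge 1 - (1-p)^2 \iff (1-p)^2 - (1-p)^4 - 4p(1-p)^3 \ge 0$$

Factoring the LHS of the inequality, we get

$$p(1-p)^2(3p - 2) \ge 0$$

For the first term $p(1-p)^2$, which is never negative, we only get the trivial solutions $p = 0$ and $p = 1$ (where both planes are equally safe). So let’s try the second term.

For the second term $3p - 2 \ge 0$, we solve for $p$ and get $p \ge 2/3$.

In conclusion, the four-engine airplane is safer to fly if the probability of an engine working is greater than or equal to $2/3$.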
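The threshold can be sanity-checked numerically. The sketch below assumes that $p$ is the probability that a single engine works, that engines are independent, and that a flight succeeds when at least half of the engines work (at least 2 of 4, or at least 1 of 2); the function names are illustrative.

```python
def four_engine_success(p):
    """P(at least 2 of 4 engines work) = 1 - P(0 work) - P(1 works)."""
    return 1 - (1 - p) ** 4 - 4 * p * (1 - p) ** 3

def two_engine_success(p):
    """P(at least 1 of 2 engines works) = 1 - P(0 work)."""
    return 1 - (1 - p) ** 2

# Compare the two planes on either side of p = 2/3.
for p in (0.5, 2 / 3, 0.8):
    print(p, four_engine_success(p), two_engine_success(p))
```

Below $p = 2/3$ the two-engine plane comes out ahead, at $p = 2/3$ the two probabilities coincide, and above it the four-engine plane is safer.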

Mean and Variance

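The MGF of a binomial random variable is

$$M(t) = (1 - p + p e^t)^n$$

The mean (first derivative at $t = 0$) is

$$E[X] = M'(0) = np$$

And the variance is

$$\operatorname{Var}(X) = M''(0) - \big(M'(0)\big)^2 = np(1 - p)$$

The variance is maximized when $p = 1/2$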
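The standard binomial results, mean $np$ and variance $np(1-p)$, can be cross-checked by computing the moments directly from the PMF instead of the MGF. The parameter values and the grid search over $p$ below are illustrative assumptions.

```python
from math import comb

def binomial_mean_variance(n, p):
    """Mean and variance computed by summing over the binomial PMF."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(pmf))
    second_moment = sum(x * x * f for x, f in enumerate(pmf))
    return mean, second_moment - mean**2

n, p = 10, 0.3  # hypothetical parameters
mean, variance = binomial_mean_variance(n, p)
print(mean, n * p)                # both close to 3.0
print(variance, n * p * (1 - p))  # both close to 2.1

# Scan p over a coarse grid to see where the variance peaks.
grid = [i / 20 for i in range(21)]
best_p = max(grid, key=lambda q: binomial_mean_variance(n, q)[1])
print(best_p)
```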

Example: urn of chips

An urn contains $N$ chips numbered 1 through $N$. Draw $n$ chips without replacement and let $X$ be the highest number among those drawn.

Example: oil corps

Suppose finding oil when digging at certain locations has probability

Poisson Random Variables

Suppose we wish to count the number of occurrences of a certain event over a period of time.

Then the rate of occurrence $\lambda$ is the rate at which the event of interest occurs per unit of time.


The notation is the probability of occurrences of event of interest in the time interval .

The Poisson model is appropriate when the following assumptions hold.

  1. Occurrences in disjoint time intervals are independent

  2. The expected number of occurrences is proportional to the length of the time interval (the longer the time, the more occurrences)

  3. There is at most 1 occurrence in a sufficiently small period of time (no two occurrences can happen at the exact same time)

Probability Mass Function

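Let $X$ be the number of occurrences, the quantity of interest. The possible values of $X$ are $0, 1, 2, \dots$ in a time interval.

For $x = 0, 1, 2, \dots$, the Poisson PMF is

$$f(x) = P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$$

where $\lambda$ is the expected number of occurrences in the interval.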

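Mean and Variance

$$E[X] = \lambda, \qquad \operatorname{Var}(X) = \lambda$$

Moment Generating Function

$$M(t) = E[e^{tX}] = e^{\lambda (e^t - 1)}$$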




Example: earthquakes

Suppose $X$ is the number of earthquakes over 5.0 in some area. The Poisson random variable is $X \sim \mathcal P(\lambda)$ with $\lambda=3.6/\text{year}$, that is, $\lambda=0.3/\text{month}$.

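  1. Probability of having at least two earthquakes over 5.0 in the next 6 months

    The rate for this case is the rate per month times the number of months: $\lambda = 0.3 \times 6 = 1.8$.

    Let $X$ be the number of earthquakes in the next 6 months, $X \sim \mathcal P(1.8)$.

    $$P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - e^{-1.8}(1 + 1.8) \approx 0.537$$

  2. Probability of having exactly 1 earthquake over 5.0 next month

    Let $X$ be the number of earthquakes in the next month, $X \sim \mathcal P(0.3)$.

    $$P(X = 1) = 0.3\,e^{-0.3} \approx 0.222$$

  3. Probability of waiting more than 3 months for the next earthquake over 5.0

    Waiting more than 3 months means that there are 0 earthquakes in the next three months. So let $X$ be the number of earthquakes in the next three months, $X \sim \mathcal P(0.9)$.

    $$P(X = 0) = e^{-0.9} \approx 0.407$$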
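The three earthquake probabilities can be reproduced with a short script. This sketch takes the rate of 0.3 earthquakes per month from the example; the helper name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when X is Poisson with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

rate_per_month = 0.3

# 1. At least two earthquakes in 6 months (mean 0.3 * 6 = 1.8).
lam6 = rate_per_month * 6
p_at_least_two = 1 - poisson_pmf(0, lam6) - poisson_pmf(1, lam6)

# 2. Exactly one earthquake next month (mean 0.3).
p_exactly_one = poisson_pmf(1, rate_per_month)

# 3. No earthquake in the next 3 months (mean 0.9).
p_wait_over_three = poisson_pmf(0, rate_per_month * 3)

print(p_at_least_two, p_exactly_one, p_wait_over_three)
```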

Continuous Random Variables

Unlike discrete random variables, continuous random variables can take any value in an interval of the real line.

Probability Density Function

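The PDF $f(x)$ is similar to the PMF but works on a continuous scale.

  1. The density is non-negative: $f(x) \ge 0$
  2. The density integrates to 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
  3. The density is used to compute probability (area under the curve is probability): $P(a \le X \le b) = \int_a^b f(x)\,dx$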

Continuous Distribution Function

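Integrating the PDF yields the distribution function:

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$

Furthermore, the CDF can be used to compute probabilities:

$$P(a < X \le b) = F(b) - F(a)$$

Note 1: Because $X$ is continuous, the probability that $X$ is exactly equal to some given number $x$ is zero: $P(X = x) = 0$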

Note 2: Sometimes , therefore

Note 3: For some small interval $\delta$, $P(x < X \le x + \delta) \approx f(x)\,\delta$

Note 4: Thanks to FTC, the derivative of the distribution function is the density function: $F'(x) = f(x)$

Note 5: Since ,

Mean, Variance, and Standard Deviation

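Similar to the discrete counterparts, but replace the summation with integration.

$$\mu = E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$

$$\sigma^2 = \operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$

Standard Deviation

$$\sigma = \sqrt{\operatorname{Var}(X)}$$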

Uniformly Distributed Random Variables

Uniformly distributed random variables (UDRV) are continuous and are denoted using the notation $X \sim \text{Unif}(\alpha, \beta)$: $X$ is a uniformly distributed random variable with $\alpha$ as lower bound and $\beta$ as upper bound.

Uniform Density Function

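The density function is mathematically represented as

$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \alpha \le x \le \beta \\ 0 & \text{otherwise} \end{cases}$$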

Uniform Distribution Function

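The distribution function is the integral of $f$:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \begin{cases} 0 & x < \alpha \\ \dfrac{x - \alpha}{\beta - \alpha} & \alpha \le x \le \beta \\ 1 & x > \beta \end{cases}$$

We can write the short form if $\alpha \le x \le \beta$:

$$F(x) = \frac{x - \alpha}{\beta - \alpha}$$

And thus

$$P(a < X \le b) = F(b) - F(a) = \frac{b - a}{\beta - \alpha}, \quad \alpha \le a \le b \le \beta$$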


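First Moment

$$E[X] = \int_{\alpha}^{\beta} \frac{x}{\beta - \alpha}\,dx = \frac{\alpha + \beta}{2}$$

Second Moment

$$E[X^2] = \int_{\alpha}^{\beta} \frac{x^2}{\beta - \alpha}\,dx = \frac{\alpha^2 + \alpha\beta + \beta^2}{3}$$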



Suppose ~ . Calculate and

Example: change of variable

Suppose ~

That is for and for .

Derive the distribution function and density function for .

First, notice the range of , which is . Thus for .

For , we substitute for :

Rearrange the inequality on the inside and we get


Finally, to get the density, we differentiate :

Exponential Random Variables

An exponential random variable (ERV) is used to model the waiting time until the occurrence of a certain event.

The random variable is denoted as $X \sim \text{Exp}(\lambda)$.

The exponential density function has a single parameter $\lambda$, with $\lambda > 0$. This is the rate of occurrence for the event.

Exponential Density Function

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

Exponential Distribution Function

$$F(x) = 1 - e^{-\lambda x}, \quad x \ge 0$$

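The general relation between the uniform and exponential random variables is: if $U \sim \text{Unif}(0, 1)$, then

$$X = -\frac{1}{\lambda} \ln(1 - U) \sim \text{Exp}(\lambda)$$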
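One common way to use the link between uniform and exponential random variables is inverse-transform sampling: feed a Unif(0, 1) draw through the inverse of the exponential distribution function. The sketch below assumes that form of the relation; the rate 2.0, the seed, and the function name are illustrative.

```python
import math
import random

def sample_exponential(lam, rng):
    """Draw one Exp(lam) value by inverting F(x) = 1 - exp(-lam * x)."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is in (0, 1]
    return -math.log(1 - u) / lam

rng = random.Random(42)  # fixed seed for repeatability
lam = 2.0                # hypothetical rate
samples = [sample_exponential(lam, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean, 1 / lam)  # the two numbers should be close
```

The sample mean landing near $1/\lambda$ is consistent with the known expected wait time of an exponential random variable.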

Mean and Variance

If the rate $\lambda$ at which the event occurs is known, the mean / expected wait time is given by

$$E[X] = \frac{1}{\lambda}$$

That said, the variance is given by

$$\operatorname{Var}(X) = \frac{1}{\lambda^2}$$

Memoryless Property

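Suppose that $X \sim \text{Exp}(\lambda)$ and $X$ represents “time to system fail”. For $t > 0$ and $s > 0$, compute the probability of $X > t$ and the probability of $X > s + t$ given that $X > s$.

It is straightforward for $P(X > t)$ (what is the probability of the system lasting longer than $t$):

$$P(X > t) = 1 - F(t) = e^{-\lambda t}$$

Also for the second part (what is the probability of the system lasting longer than $s + t$ given that the system has already been running for time $s$):

$$P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t}$$

As we can see, the probability of surviving additional time $t$ at age $s$ is the same for all $s$. This also means that the system doesn’t wear out as time increases.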
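The memoryless property is easy to check numerically from the survival function $P(X > t) = e^{-\lambda t}$. The rate and the ages tried below are arbitrary illustrative values, as are the function names.

```python
from math import exp, isclose

def survival(t, lam):
    """P(X > t) for an exponential random variable with rate lam."""
    return exp(-lam * t)

def conditional_survival(s, t, lam):
    """P(X > s + t | X > s), computed from the definition."""
    return survival(s + t, lam) / survival(s, lam)

lam, t = 0.5, 2.0  # hypothetical rate and additional time
for s in (0.0, 1.0, 5.0, 20.0):
    # The conditional value does not depend on the current age s.
    print(s, conditional_survival(s, t, lam), survival(t, lam))
```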

Failure Rate

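Failure rate is defined as

$$\lambda(t) = \lim_{\delta \to 0} \frac{P(t < X \le t + \delta \mid X > t)}{\delta} = \lim_{\delta \to 0} \frac{F(t + \delta) - F(t)}{\delta} \cdot \frac{1}{1 - F(t)} = \frac{f(t)}{1 - F(t)}$$

Notice the limit is in the form of differentiation of $F$.

Heuristic Interpretation: $P(t < X \le t + \delta \mid X > t) \approx \lambda(t)\,\delta$ for small $\delta$.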

Over time, there are three things that can happen to failure rate:

  1. Constant (failure doesn’t change with time)
  2. Increasing (failure rate increases as system wears out with time)
  3. Decreasing (failure rate decreases as system improves with time)

Thus, more generally, solving the failure rate equation $\lambda(t) = f(t) / (1 - F(t))$ for $F$, our distribution function is expressed as

$$F(t) = 1 - \exp\left(-\int_0^t \lambda(s)\,ds\right)$$

Constant Failure Rate

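Plugging the constant rate $\lambda(t) = \lambda$ into the equation we derived above, we see that

$$F(t) = 1 - e^{-\lambda t}$$

which is the exponential distribution function.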
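The relation $F(t) = 1 - \exp\left(-\int_0^t \lambda(s)\,ds\right)$ between a failure rate and its distribution function can be evaluated numerically. In this sketch the midpoint rule, the step count, and the two example rates (a constant rate and a hypothetical linearly increasing rate $\lambda(t) = 2t$) are all illustrative assumptions.

```python
from math import exp

def cdf_from_failure_rate(rate, t, steps=10_000):
    """F(t) = 1 - exp(-integral of rate from 0 to t), via the midpoint rule."""
    h = t / steps
    integral = sum(rate((i + 0.5) * h) for i in range(steps)) * h
    return 1 - exp(-integral)

# Constant failure rate: should match the exponential CDF 1 - exp(-lam * t).
lam, t = 2.0, 1.5
constant_case = cdf_from_failure_rate(lambda s: lam, t)
print(constant_case, 1 - exp(-lam * t))

# Hypothetical increasing failure rate 2s: the integral is t**2.
increasing_case = cdf_from_failure_rate(lambda s: 2 * s, t)
print(increasing_case, 1 - exp(-t**2))
```

A rate that grows with time makes the integral grow faster than linearly, so the system is increasingly likely to have failed as it ages.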

Increasing Failure Rate


This distribution is also known as the Weibull distribution.

Decreasing Failure Rate