Updated 2017-10-19
Random variables are used to represent numerical features of a random experiment.
A random variable is a function defined on the sample space: it maps each outcome in the sample space to a real number.
Example
- Experiment: flipping a coin 10 times
- Sample space: all possible sequences of ten flips
- Random variable : Number of heads
- Random variable : Largest run of tails
The event "random variable $X$ takes the value $a$" is mathematically represented as $\{X = a\}$, which means $\{s \in S : X(s) = a\}$.
A random variable can be discrete or continuous.
Example:
Discrete:
number of defective items in a factory: $X \in \{0, 1, 2, \ldots\}$
Continuous:
percentage yield of a chemical process: $X \in [0, 100]$
$X$ is said to be the Bernoulli random variable if its probability mass function is given by:
$$p(0) = P\{X = 0\} = 1 - p, \qquad p(1) = P\{X = 1\} = p$$
where $p$ is the probability that the trial is a success ($0 \le p \le 1$).
Expected value is a linear operator that summarizes a random variable as a single number and allows it to be manipulated mathematically. Consider a function $g$; the expected value operator (denoted $E$) is defined as:
$$E[g(X)] = \sum_x g(x)\,p(x)$$
where $p(x)$ is the PMF given by $p(x) = P\{X = x\}$. Since $X$ is random and $g$ is a fixed function (such as $g(x) = x^2$), $g(X)$ is also random.
The output of the expected value is a number. (If $g(x) = x$, then the expected value is simply $E[X] = \sum_x x\,p(x)$.)
The expected value is a number: the value we expect the random variable to take on average.
A good physical interpretation of the expected value is the center of mass of a rod with varying density. The density corresponds to the probability (hence the name probability density function), and the center of mass accounts not only for the density but also for the position.
Geometric interpretation: the mean minimizes the sum of squared distances.
Example:
There is a 0.2 chance of getting 0, and a 0.8 chance of getting 1.
$E[X] = 0(0.2) + 1(0.8) = 0.8$, which is misleading as a "typical value" because the only two possible outcomes are 0 and 1.
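As a quick sketch, the weighted sum can be computed in Python (the PMF dict below encodes the 0.2/0.8 example above):

```python
# Expected value of a discrete random variable: sum of value * probability.
def expected_value(pmf):
    """pmf: dict mapping each possible value x to its probability p(x)."""
    return sum(x * p for x, p in pmf.items())

# The example above: 0.2 chance of 0, 0.8 chance of 1.
pmf = {0: 0.2, 1: 0.8}
print(expected_value(pmf))  # 0.8
```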
If $X$ is the number of red balls drawn from a bucket of colored balls, the same recipe applies: weight each possible count by its probability and sum.
PMF: Probability Mass Function (the discrete analogue of a probability density function), denoted $p(x)$ because the values act as weights.
The domain describes the possible values, and the range describes the probabilities of those values.
Properties of PMF:
- $p(x) \ge 0$ for all $x$
- $\sum_x p(x) = 1$
CDF: cumulative distribution functions are denoted by upper-case letters such as $F$, and they are defined as
$$F(a) = P\{X \le a\} = \sum_{x \le a} p(x)$$
Because of this we can also say: $P\{a < X \le b\} = F(b) - F(a)$.
Properties of CDF:
- $F$ is non-decreasing
- $\lim_{a \to -\infty} F(a) = 0$ and $\lim_{a \to \infty} F(a) = 1$
Example:
Each possible value of the random variable is mapped to a probability by the PMF, and the CDF accumulates those probabilities. Notice that the CDF properties apply: the probabilities of the PMF sum to 1.
Example: sometimes the probabilities of a random variable are not completely known. The PMF may contain a parameter $\theta$, and $\theta$ can be chosen to obtain a desired configuration of probabilities.
Suppose that $x_1, x_2, \ldots, x_n$ are independent measurements of a random variable $X$. Then it can be shown that as $n \to \infty$,
$$\frac{1}{n}\sum_{i=1}^{n} x_i \to E[X]$$
Mind that $X_1, \ldots, X_n$ are random before measurement; once a measurement is made, a realization occurs and they become fixed numbers $x_1, \ldots, x_n$.
The Law of Large Numbers states that as $n \to \infty$, the sample average ($\frac{1}{n}\sum_i x_i$) will approach the population average ($E[X]$). Using this, we can approximate $E[X]$, which is unknown in nature. More samples will yield a more accurate approximation.
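A minimal Monte Carlo sketch of the Law of Large Numbers, assuming an illustrative Bernoulli variable with $E[X] = 0.3$:

```python
import random

# Law of Large Numbers sketch: the sample average of n independent draws
# approaches the population average E[X] as n grows.
# Here X ~ Bernoulli(0.3), so E[X] = 0.3 (an illustrative choice).
random.seed(0)

def sample_average(n, p=0.3):
    draws = (1 if random.random() < p else 0 for _ in range(n))
    return sum(draws) / n

for n in (10, 1000, 100_000):
    print(n, sample_average(n))  # gets closer to 0.3 as n grows

avg = sample_average(100_000)
```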
The $E$ operation is a linear operator, which means:
$$E[aX + b] = aE[X] + b$$
For any random variable $X$, where $n \ge 1$, the $n$th moment is given as
$$E[X^n] = \sum_x x^n p(x)$$
The moment generating function is (essentially) the Laplace transform of the PMF:
$$M(t) = E[e^{tX}] = \sum_x e^{tx} p(x)$$
By differentiating the MGF and evaluating at $t = 0$, we get the first moment (first derivative), the second moment (second derivative), the third moment, and so on: $M^{(n)}(0) = E[X^n]$.
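A small numerical sketch of this fact: approximate $M'(0)$ and $M''(0)$ by finite differences for an illustrative Bernoulli PMF and recover the moments.

```python
import math

# Numerical check that MGF derivatives at t = 0 give moments.
# PMF of a Bernoulli(0.8) variable (an illustrative choice).
pmf = {0: 0.2, 1: 0.8}

def mgf(t):
    return sum(math.exp(t * x) * p for x, p in pmf.items())

h = 1e-5
first_moment = (mgf(h) - mgf(-h)) / (2 * h)             # central difference ~ M'(0)
second_moment = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2  # ~ M''(0)
print(first_moment)   # ~ E[X]   = 0.8
print(second_moment)  # ~ E[X^2] = 0.8 (since 0^2 and 1^2 equal 0 and 1)
```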
The mean is the first moment of the random variable: $\mu = E[X]$.
The variance is the spread: $\text{Var}(X) = E[(X - \mu)^2]$.
When $\text{Var}(X)$ is large, there is a large dispersion.
In practice, we do not know $E[X]$ because we are missing $p(x)$, so we will resort to measurements: $\bar{x} = \frac{1}{n}\sum_i x_i$.
Of course, $\bar{x}$ converges to $E[X]$ as more data is collected through experiments.
Example: there are $N$ chips, numbered 1 through $N$:
We are sampling without replacement, and $n$ is the number of chips drawn. Let $X$ be the maximum of the chips drawn.
Suppose $n = 2$; can the maximum be 1? The answer is no. The maximum is at least 2 (drawing chips 1 and 2).
The possible values for the maximum: $\{n, n+1, \ldots, N\}$
The distribution function is
$$F(k) = P\{X \le k\} = \frac{\binom{k}{n}}{\binom{N}{n}}$$
The numerator is $k$ choose $n$ because we are choosing any $n$ numbers, none larger than the maximum value $k$. The denominator is the number of ways of choosing $n$ chips out of $N$.
Also
$$p(k) = P\{X = k\} = F(k) - F(k-1) = \frac{\binom{k}{n} - \binom{k-1}{n}}{\binom{N}{n}} = \frac{\binom{k-1}{n-1}}{\binom{N}{n}}$$
Suppose specific values of $N$ and $n$ are given; we probably also want to calculate the mean, variance, and standard deviation for $X$.
Let $c$ be any constant; then consider the mean square error $E[(X - c)^2]$.
The mean minimizes the mean square error: $E[(X - c)^2]$ is smallest at $c = E[X]$.
The variance and mean have this property:
$$\text{Var}(X) = E[X^2] - (E[X])^2$$
Consider performing $n$ independent trials, each of which is a "success" with probability $p$.
$X$ is said to be the binomial random variable with parameters ($n$, $p$) if $X$ represents the number of "successes" that occur in the $n$ trials.
The PMF with parameters ($n$, $p$) is given by:
$$p(i) = P\{X = i\} = \binom{n}{i} p^i (1-p)^{n-i}, \quad i = 0, 1, \ldots, n$$
The $p^i$ (probability of getting $i$ "successes") multiplies with $(1-p)^{n-i}$ (probability of getting $n - i$ "fails") since each trial is independent. They're then multiplied with $\binom{n}{i}$, the total number of ways $i$ "successes" can happen given $n$ trials.
By the Binomial Theorem, the probabilities sum to 1:
$$\sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = (p + (1-p))^n = 1$$
Example: coin toss
4 coins are flipped (independently). Find the probability of exactly 2 heads (H).
Let $X$ be the number of heads. $X$ is a binomial random variable with parameters ($n = 4$, $p = \frac{1}{2}$):
$$P\{X = 2\} = \binom{4}{2}\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^2 = \frac{6}{16} = \frac{3}{8}$$
Example: defective items
Any item produced by a factory will be defective with probability 0.1 (items are independent). Find the probability that in a sample of 3 items, at most 1 is defective.
Let $X$ be the number of defective items; $X$ is a binomial random variable with parameters ($n = 3$, $p = 0.1$):
$$P\{X \le 1\} = \binom{3}{0}(0.1)^0(0.9)^3 + \binom{3}{1}(0.1)^1(0.9)^2 = 0.729 + 0.243 = 0.972$$
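Both binomial examples above can be checked with a short sketch (Python's `math.comb` gives the binomial coefficient):

```python
from math import comb

# Binomial PMF: P{X = i} = C(n, i) * p^i * (1-p)^(n-i).
def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Coin toss example: n = 4, p = 1/2, exactly 2 heads.
p_two_heads = binom_pmf(2, 4, 0.5)
print(p_two_heads)  # 0.375 = 3/8

# Defective items example: n = 3, p = 0.1, at most 1 defective.
p_at_most_one = binom_pmf(0, 3, 0.1) + binom_pmf(1, 3, 0.1)
print(p_at_most_one)  # ~ 0.972
```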
Example: airplane engines
- There are four-engine airplanes and two-engine airplanes
- The probability of an engine failing is $1 - p$, independently for each engine (an engine works with probability $p$)
- An airplane makes a successful flight if at least 50% of its engines remain operative
For what values of $p$ is a four-engine plane preferable to a two-engine plane?
Let $X$ be the number of working engines; $X$ is a binomial random variable with two sets of parameters for the two types of planes: ($n = 4$, $p$) for the four-engine airplane and ($n = 2$, $p$) for the two-engine airplane.
Now calculate the probability that the four-engine airplane makes a successful flight ($X$ is at least 2):
$$P\{X \ge 2\} = \binom{4}{2}p^2(1-p)^2 + \binom{4}{3}p^3(1-p) + \binom{4}{4}p^4 = 6p^2(1-p)^2 + 4p^3(1-p) + p^4$$
Also calculate the probability that the two-engine plane makes a successful flight ($X$ is at least 1):
$$P\{X \ge 1\} = 1 - (1-p)^2 = 2p - p^2$$
The four-engine plane is safer if
$$6p^2(1-p)^2 + 4p^3(1-p) + p^4 \ge 2p - p^2$$
Moving everything to the left and expanding gives $3p^4 - 8p^3 + 7p^2 - 2p \ge 0$. Factoring the LHS of the inequality, we get
$$p(p-1)^2(3p-2) \ge 0$$
For the first two factors, $p \ge 0$ and $(p-1)^2 \ge 0$ hold trivially, so they impose no constraint. For the remaining factor, we solve $3p - 2 \ge 0$ and get $p \ge \frac{2}{3}$.
In conclusion, the four-engine airplane is safer to fly if the probability of an engine working is greater than or equal to $\frac{2}{3}$.
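A quick numerical check of the comparison and of the factorization (the crossover sits at $p = 2/3$):

```python
# Check the engine-reliability comparison numerically.
def p_four_engine_success(p):
    # P{X >= 2} for X ~ Binomial(4, p)
    return 6*p**2*(1-p)**2 + 4*p**3*(1-p) + p**4

def p_two_engine_success(p):
    # P{X >= 1} for X ~ Binomial(2, p)
    return 2*p - p**2

# The difference should factor as p * (p-1)^2 * (3p-2):
for p in (0.1, 0.5, 2/3, 0.9):
    diff = p_four_engine_success(p) - p_two_engine_success(p)
    factored = p * (p - 1)**2 * (3*p - 2)
    print(p, diff, factored)  # diff and factored agree; sign flips at p = 2/3
```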
The MGF of the binomial random variable is
$$M(t) = (pe^t + 1 - p)^n$$
The mean (first derivative at $t = 0$) is
$$E[X] = M'(0) = np$$
And the variance is
$$\text{Var}(X) = M''(0) - (M'(0))^2 = np(1-p)$$
The variance is maximized when $p = \frac{1}{2}$.
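These mean and variance formulas can be verified by direct summation over the PMF, using illustrative parameters $n = 10$, $p = 0.3$:

```python
from math import comb

# Verify E[X] = n*p and Var(X) = n*p*(1-p) for a binomial by direct summation.
n, p = 10, 0.3  # illustrative parameters

pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
mean = sum(i * pmf[i] for i in range(n + 1))
var = sum(i**2 * pmf[i] for i in range(n + 1)) - mean**2
print(mean, var)  # ~ 3.0 and ~ 2.1, i.e. n*p and n*p*(1-p)
```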
Example: urn of chips
An urn contains $N$ chips numbered 1 through $N$. Draw $n$ chips without replacement and let $X$ be the highest number among those drawn.
Range of $X$:
Since we are drawing $n$ distinct chips without replacement, the lower bound of $X$ is $n$. The upper bound of $X$ is $N$. Thus the range is $\{n, n+1, \ldots, N\}$.
CDF $F(k)$:
$X \le k$ means each of the $n$ chips drawn takes a value less than or equal to $k$ (a particular maximum number on a chip). Thus there are $\binom{k}{n}$ possibilities.
There are $N$ total chips, so the total number of possibilities is expressed as $\binom{N}{n}$.
Thus $F(k) = \dfrac{\binom{k}{n}}{\binom{N}{n}}$.
PMF $p(k)$:
By the definition of CDF and PMF, $p(k) = F(k) - F(k-1)$, where $n \le k \le N$.
Also, due to the range of $X$ stated earlier, $F(k) = 0$ for all $k < n$.
Thus we can say
$$p(k) = \frac{\binom{k}{n} - \binom{k-1}{n}}{\binom{N}{n}} = \frac{\binom{k-1}{n-1}}{\binom{N}{n}}$$
Given values of $N$ and $n$, calculate the mean, variance, and standard deviation for $X$:
- Mean: $E[X] = \sum_{k=n}^{N} k\,p(k)$
- Variance: $\text{Var}(X) = \sum_{k=n}^{N} k^2\,p(k) - (E[X])^2$
- Standard deviation: $\sigma = \sqrt{\text{Var}(X)}$
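A sketch that checks the distribution of the maximum, using illustrative values $N = 10$, $n = 3$ (the original numbers are not given):

```python
from math import comb, sqrt

# Distribution of the maximum chip when drawing n of N without replacement.
N, n = 10, 3  # illustrative values

def pmf(k):
    # p(k) = C(k-1, n-1) / C(N, n) for k in {n, ..., N}
    return comb(k - 1, n - 1) / comb(N, n)

support = range(n, N + 1)
total = sum(pmf(k) for k in support)       # should be 1
mean = sum(k * pmf(k) for k in support)
var = sum(k**2 * pmf(k) for k in support) - mean**2
print(total, mean, sqrt(var))
```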
Example: oil corps
Suppose finding oil when digging at a certain location has probability $p = 0.1$.
How many wells should we dig to find oil (at least one successful well) with probability $0.95$?
Assume that the diggings are independent. Let $X$ be the binomial random variable for the number of successful wells. The binomial random variable has the parameters ($n$, $p = 0.1$).
First, we know the requirement $P\{X \ge 1\} \ge 0.95$. We also need to find $n$.
Using $P\{X \ge 1\} = 1 - P\{X = 0\}$, where the probability of $X = 0$ can be found using the binomial distribution formula:
$$1 - \binom{n}{0}(0.1)^0(0.9)^n \ge 0.95$$
Thus we can isolate the term with exponent $n$:
$$(0.9)^n \le 0.05$$
and solve for $n$:
$$n \ge \frac{\ln 0.05}{\ln 0.9} \approx 28.4$$
Therefore, at least 29 wells should be dug.
How many wells should we dig to obtain at least 2 successful wells with probability $0.95$?
Assume again that the diggings are independent, and let $X$ be the binomial random variable for the number of successful wells, with parameters ($n$, $p = 0.1$).
Let's start with the probability:
$$P\{X \ge 2\} = 1 - P\{X = 0\} - P\{X = 1\} = 1 - (0.9)^n - n(0.1)(0.9)^{n-1}$$
We can then equate the RHS with our requirement:
$$1 - (0.9)^n - n(0.1)(0.9)^{n-1} \ge 0.95$$
And we can solve for $n$: there is no closed form this time, so we solve numerically, and the smallest $n$ satisfying the inequality is $n = 46$.
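Both well counts can be found by a small search, assuming the reconstructed numbers $p = 0.1$ and target probability $0.95$:

```python
from math import comb

# Find the smallest n with P{X >= k} >= target, where X ~ Binomial(n, p).
def smallest_n(k, p=0.1, target=0.95):
    n = k
    while True:
        # P{X < k} = sum of the first k binomial PMF terms
        p_less = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
        if 1 - p_less >= target:
            return n
        n += 1

print(smallest_n(1))  # 29: at least one successful well
print(smallest_n(2))  # 46: at least two successful wells
```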
Suppose we wish to count the number of occurrences of a certain event, such as earthquakes in a region or arrivals at a queue.
Then the rate of occurrence $\lambda$ is the rate at which the event of interest occurs per unit time.
The notation $P(k, t)$ is the probability of $k$ occurrences of the event of interest in the time interval $[0, t]$.
The Poisson model is appropriate when the following assumptions are true:
- Occurrences in disjoint time intervals are independent
- The rate of occurrence is proportional to the length of the time interval (the longer the time, the more occurrences)
- There is at most 1 occurrence in a small period of time (no two occurrences can happen at exactly the same time)
Let $N$ be the number of occurrences, the quantity of interest. The possible values of $N$ are $0, 1, 2, \ldots$ in a time interval.
For $k = 0, 1, 2, \ldots$, the Poisson PMF is
$$P(k, t) = e^{-\lambda t}\frac{(\lambda t)^k}{k!}$$
Derivation: take the binomial PMF with $n$ trials and success probability $\lambda t / n$, and let $n \to \infty$.
Example: earthquakes
Suppose $N$ is the number of earthquakes over 5.0 in some area. The Poisson random variable is $N \sim \mathcal{P}(\lambda)$ with $\lambda = 3.6/\text{year} = 0.3/\text{month}$.
Probability of having at least two earthquakes over 5.0 in the next 6 months
The rate for this case is the rate per month times the number of months: $\lambda t = 0.3 \times 6 = 1.8$.
Let $N$ be the number of earthquakes in the next 6 months, $N \sim \mathcal{P}(1.8)$:
$$P\{N \ge 2\} = 1 - P\{N = 0\} - P\{N = 1\} = 1 - e^{-1.8}(1 + 1.8) \approx 0.537$$
Probability of having exactly 1 earthquake over 5.0 next month
Let $N$ be the number of earthquakes in the next month, $N \sim \mathcal{P}(0.3)$:
$$P\{N = 1\} = e^{-0.3}\frac{(0.3)^1}{1!} \approx 0.222$$
Probability of waiting more than 3 months for the next earthquake over 5.0
Waiting more than 3 months means that we observe 0 earthquakes in the next three months. So let $N$ be the number of earthquakes in the next three months, $N \sim \mathcal{P}(0.9)$:
$$P\{N = 0\} = e^{-0.9} \approx 0.407$$
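The three earthquake probabilities can be checked numerically:

```python
from math import exp, factorial

# Poisson probabilities for the earthquake example (lambda = 0.3 per month).
def poisson_pmf(k, mu):
    return exp(-mu) * mu**k / factorial(k)

# At least 2 earthquakes in 6 months (mu = 0.3 * 6 = 1.8):
p_at_least_two = 1 - poisson_pmf(0, 1.8) - poisson_pmf(1, 1.8)
# Exactly 1 earthquake next month (mu = 0.3):
p_one = poisson_pmf(1, 0.3)
# Zero earthquakes in 3 months, i.e. waiting more than 3 months (mu = 0.9):
p_wait = poisson_pmf(0, 0.9)
print(p_at_least_two, p_one, p_wait)  # ~ 0.537, ~ 0.222, ~ 0.407
```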
Unlike discrete random variables, continuous random variables can take any value in the real domain.
The PDF $f(x)$ is similar to the PMF but works on a continuous scale.
Properties:
- $f(x) \ge 0$ for all $x$
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Integrating the PDF yields the distribution function:
$$F(a) = P\{X \le a\} = \int_{-\infty}^{a} f(x)\,dx$$
Furthermore, the CDF can be used to compute probabilities:
$$P\{a \le X \le b\} = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$$
Note 1: Because $X$ is continuous, it is impossible for $X$ to be exactly equal to some number $a$:
$$P\{X = a\} = \int_{a}^{a} f(x)\,dx = 0$$
Note 2: Since $P\{X = a\} = 0$, therefore $P\{X \le a\} = P\{X < a\} = F(a)$.
Note 3: For some small interval $\varepsilon$, $P\{a \le X \le a + \varepsilon\} \approx f(a)\,\varepsilon$.
Note 4: Thanks to the FTC, the derivative of the distribution function is the density function: $F'(a) = f(a)$.
Note 5: Since $P\{X < \infty\} = 1$, $\lim_{a \to \infty} F(a) = 1$.
Expectation is similar to the discrete counterpart, but with the summation replaced by integration:
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$
UDRV (uniformly distributed random variables) are continuous and are denoted using the notation $X \sim \text{Unif}(\alpha, \beta)$: $X$ is a uniformly distributed random variable with $\alpha$ as the lower bound and $\beta$ as the upper bound.
The density function is mathematically represented as
$$f(x) = \begin{cases} \frac{1}{\beta - \alpha} & \alpha \le x \le \beta \\ 0 & \text{otherwise} \end{cases}$$
The distribution function is the integral of $f$:
$$F(a) = \begin{cases} 0 & a < \alpha \\ \frac{a - \alpha}{\beta - \alpha} & \alpha \le a \le \beta \\ 1 & a > \beta \end{cases}$$
We can write the short form if $\alpha = 0$ and $\beta = 1$: $F(a) = a$ for $0 \le a \le 1$.
And thus $P\{a \le X \le b\} = b - a$ for $0 \le a \le b \le 1$.
First Moment
$$E[X] = \int_{\alpha}^{\beta} \frac{x}{\beta - \alpha}\,dx = \frac{\alpha + \beta}{2}$$
Second Moment
$$E[X^2] = \int_{\alpha}^{\beta} \frac{x^2}{\beta - \alpha}\,dx = \frac{\alpha^2 + \alpha\beta + \beta^2}{3}$$
Variance
$$\text{Var}(X) = E[X^2] - (E[X])^2 = \frac{(\beta - \alpha)^2}{12}$$
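The moment formulas can be checked by numeric integration, using illustrative bounds $\alpha = 2$, $\beta = 5$:

```python
# Check the Unif(alpha, beta) moment formulas by simple numeric integration.
alpha, beta = 2.0, 5.0  # illustrative bounds

def integrate(g, a, b, steps=100_000):
    # midpoint rule
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

density = 1 / (beta - alpha)
mean = integrate(lambda x: x * density, alpha, beta)
second = integrate(lambda x: x * x * density, alpha, beta)
var = second - mean**2
print(mean, var)  # ~ (alpha+beta)/2 = 3.5 and (beta-alpha)^2/12 = 0.75
```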
Example:
Suppose $X \sim \text{Unif}(\alpha, \beta)$. Calculate $E[X]$ and $\text{Var}(X)$.
Example: change of variable
Suppose $X \sim \text{Unif}(0, 1)$ and consider a transformed variable such as $Y = e^X$.
That is, $f_X(x) = 1$ for $0 \le x \le 1$ and $f_X(x) = 0$ otherwise.
Derive the distribution function and density function for $Y$.
First, notice the range of $Y = e^X$, which is $[1, e]$. Thus $F_Y(y) = 0$ for $y < 1$ and $F_Y(y) = 1$ for $y > e$.
For $1 \le y \le e$, we substitute $e^X$ for $Y$:
$$F_Y(y) = P\{Y \le y\} = P\{e^X \le y\}$$
Rearrange the inequality on the inside and we get $P\{X \le \ln y\} = F_X(\ln y)$.
So $F_Y(y) = \ln y$ for $1 \le y \le e$.
Finally, to get the density, we differentiate $F_Y$:
$$f_Y(y) = \frac{d}{dy}\ln y = \frac{1}{y}, \quad 1 \le y \le e$$
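As a simulation sketch of this change-of-variable technique (taking $Y = e^X$ with $X \sim \text{Unif}(0, 1)$ as an illustrative choice), compare the empirical CDF of $Y$ against $F_Y(y) = \ln y$ on $[1, e]$:

```python
import random
from math import exp, log

# Simulate Y = e^X with X ~ Unif(0, 1) and compare the empirical CDF of Y
# against the derived F_Y(y) = ln(y) on [1, e].
random.seed(42)
n = 200_000
ys = [exp(random.random()) for _ in range(n)]

def empirical_cdf(y):
    return sum(1 for v in ys if v <= y) / n

for y in (1.5, 2.0, 2.5):
    print(y, empirical_cdf(y), log(y))  # the two columns should match closely
```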
ERV (the exponential random variable) is used to model the waiting time until the occurrence of a certain event, such as the failure of a machine or the arrival of the next customer.
The random variable is denoted as $X \sim \text{Exp}(\lambda)$.
The exponential density function has a single parameter $\lambda$ for the rate of occurrence of the event:
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
The general relation between the uniform and exponential random variables is that if $U \sim \text{Unif}(0, 1)$, then $-\frac{1}{\lambda}\ln U \sim \text{Exp}(\lambda)$ (the inverse-transform relation).
If the rate $\lambda$ at which the event occurs is known, the mean / expected wait time is given by
$$E[X] = \frac{1}{\lambda}$$
That said, the variance is given by
$$\text{Var}(X) = \frac{1}{\lambda^2}$$
Suppose that $X \sim \text{Exp}(\lambda)$ and $X$ represents "time to system failure". For $s, t > 0$, compute the probability of $X > t$ and the probability of $X > s + t$ given that $X > s$.
It is straightforward for the first part (what is the probability of the system lasting longer than $t$):
$$P\{X > t\} = 1 - F(t) = e^{-\lambda t}$$
Also for the second part (what is the probability of the system lasting longer than $s + t$ given that the system is already running for time $s$):
$$P\{X > s + t \mid X > s\} = \frac{P\{X > s + t\}}{P\{X > s\}} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t}$$
As we can see, the probability of surviving an additional time $t$ at age $s$ is the same for all $s$, which means the system doesn't wear out as time increases (the memoryless property).
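The memoryless property can be checked both analytically and by simulation (using the inverse-transform relation $X = -\ln U / \lambda$), with illustrative values for $\lambda$, $s$, $t$:

```python
import random
from math import exp, log

# Memorylessness check for X ~ Exp(lmbda): P{X > s+t | X > s} = P{X > t}.
lmbda, s, t = 0.5, 2.0, 3.0  # illustrative values

# Analytically:
lhs = exp(-lmbda * (s + t)) / exp(-lmbda * s)
rhs = exp(-lmbda * t)
print(lhs, rhs)  # equal

# By simulation, drawing exponentials via X = -ln(U) / lmbda:
random.seed(7)
n = 200_000
xs = [-log(1 - random.random()) / lmbda for _ in range(n)]
survivors = [x for x in xs if x > s]
cond = sum(1 for x in survivors if x > s + t) / len(survivors)
print(cond)  # ~ exp(-lmbda * t)
```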
Failure rate is defined as
$$r(t) = \lim_{\varepsilon \to 0} \frac{P\{t < X \le t + \varepsilon \mid X > t\}}{\varepsilon} = \frac{f(t)}{1 - F(t)}$$
Notice the limit is in the form of a derivative: $r(t) = -\frac{d}{dt}\ln(1 - F(t))$.
Heuristic interpretation: $P\{t < X \le t + \varepsilon \mid X > t\} \approx r(t)\,\varepsilon$ for small $\varepsilon$.
Over time, there are three things that can happen to the failure rate: it can increase (wear-out), stay constant (the memoryless exponential case), or decrease (burn-in).
Thus, more generally, using the failure rate equation from above, our distribution function is expressed as
$$F(t) = 1 - \exp\left(-\int_0^t r(s)\,ds\right)$$
Plugging a power-law failure rate $r(t) = \alpha \beta t^{\beta - 1}$ into the equation we derived above, we see that
$$\int_0^t r(s)\,ds = \alpha t^{\beta}$$
Thus
$$F(t) = 1 - e^{-\alpha t^{\beta}}$$
This distribution is known as the Weibull distribution.
Naturally, the density is
$$f(t) = F'(t) = \alpha \beta t^{\beta - 1} e^{-\alpha t^{\beta}}$$
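A numerical sketch confirming that this $F$ has the power-law failure rate, with illustrative parameters:

```python
from math import exp

# Check that F(t) = 1 - exp(-alpha * t**beta) (the Weibull CDF) has failure
# rate r(t) = f(t) / (1 - F(t)) = alpha * beta * t**(beta - 1).
alpha, beta = 1.5, 2.0  # illustrative parameters

def F(t):
    return 1 - exp(-alpha * t**beta)

def f(t, h=1e-6):
    # numerical derivative of F
    return (F(t + h) - F(t - h)) / (2 * h)

def hazard(t):
    return f(t) / (1 - F(t))

for t in (0.5, 1.0, 2.0):
    print(t, hazard(t), alpha * beta * t**(beta - 1))  # last two columns match
```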