# Modeling Chronic Costly Patients Using Data Mining Techniques

CSC 323, Quarter: Winter 02/03

Daniela Stan Raicu, School of CTI, DePaul University

02/09/20 Daniela Stan - CSC323

## Outline

Chapter 4: Probability

- The Study of Randomness
- The Law of Large Numbers
- Random Variables
- Means and Variances of Random Variables

## The Law of Large Numbers

Probability describes the long-term proportion with which a certain outcome will occur in situations with short-term uncertainty. Probability is a measure of the likelihood of a random phenomenon or a chance behavior.

Example: Flip a coin 1000 times and compute the proportion of heads observed after each toss of the coin. Early on the proportion swings widely (1/1, 1/2, 1/3, 1/4, ...), but after all 1000 tosses it is close to one half (for example, 501/1000). As the number of flips increases, the graph of the proportion tends toward 0.5.

Therefore, we say that the probability of observing a head is 1/2, or 50%, because as the number of repetitions of the experiment increases, the proportion of heads tends toward 1/2. This phenomenon is referred to as the Law of Large Numbers.

The law of large numbers is about regularity in the long run, and it forms the foundation of gambling casinos and insurance companies. For instance, at the craps table the chance of winning the "don't pass" bet is 49.29%, making it almost a fair game. Some people win and some people lose, but on average the casino takes in 1.4 cents per dollar bet. In the long run, as tens of thousands of people play, this 1.4 cents per dollar is as predictable as a paycheck. In the same way, an insurance company balances the risk of a huge payout by collecting many small premiums: the regularity over a large group of customers makes up for the occasional unpredictable disaster. Other examples: Internet Service Providers use the law of large numbers for resource allocation.

## Random Variable

The longer a random process is repeated under the same conditions, the closer the observed proportion of each outcome is to its actual probability of occurrence.

A convenient way of representing a random phenomenon is through a random variable. We can associate a variable with each random process; the values of such a variable are the possible outcomes of the random process.
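The coin-flip experiment from the Law of Large Numbers example can be reproduced with a short simulation. This is a minimal Python sketch, assuming a fair coin modeled with Python's `random` module; the function name `running_proportions` and the seed are illustrative choices, not part of the original notes. Individual runs differ, but the printed proportions should drift toward 0.5 as the number of tosses grows.

```python
import random

def running_proportions(n_tosses, seed=0):
    """Toss a fair coin n_tosses times; return the proportion of heads after each toss."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # heads with probability 1/2
            heads += 1
        proportions.append(heads / toss)
    return proportions

props = running_proportions(1000)
for n in (1, 2, 3, 4, 5, 10, 100, 1000):
    print(f"after {n:4d} tosses: proportion of heads = {props[n - 1]:.3f}")
```

Changing the seed gives a different path, but the late proportions stay near 0.5 while the early ones jump around.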

## Random variables

For instance, let X = number of heads in 4 tosses of a coin. These are the 16 possible outcomes, grouped by the value of X:

| X | Outcomes |
|---|---|
| 0 | TTTT |
| 1 | HTTT, THTT, TTHT, TTTH |
| 2 | HHTT, HTHT, HTTH, THHT, THTH, TTHH |
| 3 | HHHT, HHTH, HTHH, THHH |
| 4 | HHHH |

A probability value can be associated with each value of X. All 16 outcomes are equally likely, so:

- If X=0, no head comes up; the probability is 1/16.
- If X=1, exactly one head comes up; the probability is 4/16.
- If X=2, exactly two heads come up; the probability is 6/16.
- If X=3, exactly three heads come up; the probability is 4/16.
- If X=4, all four tosses come up heads; the probability is 1/16.

| X | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Probability | 0.0625 | 0.25 | 0.375 | 0.25 | 0.0625 |

## Probability Histograms

The probability table associated with a random process or a random variable can be displayed as a probability histogram. For example, the probability histogram of the number of heads in 4 tosses of a coin is displayed below:

[Figure: probability histogram of the number of heads in 4 tosses; x-axis X = 0 to 4, y-axis Chance (%).]

The number of heads in 4 tosses of a coin would be a number around 2.
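The probability table and histogram above can be checked by brute force: list all 16 equally likely toss sequences, count the heads in each, and divide by 16. A minimal Python sketch; the variable names are illustrative, and the `#` bars are only a rough text stand-in for the histogram.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**4 = 16 equally likely sequences of H/T in 4 tosses.
outcomes = list(product("HT", repeat=4))

# X = number of heads in each sequence; count how many sequences give each value.
counts = Counter(seq.count("H") for seq in outcomes)

# Each sequence has probability 1/16, so P(X = x) = counts[x] / 16.
distribution = {x: Fraction(counts[x], len(outcomes)) for x in range(5)}

for x, p in distribution.items():
    bar = "#" * counts[x]  # crude text histogram: one '#' per outcome
    print(f"X={x}: {counts[x]:2d}/16 = {float(p):.4f} {bar}")
```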

## Expected value & standard error of a random variable

Suppose we flip a coin 100 times and expect to get 50 heads; however, we might get 57 heads, which is 7 heads above the expected value of 50. Toss the coin 100 times again and you might get 55, which is 5 heads above 50. Again, you might get 48, which is 2 heads below 50, and so on. The numbers delivered by the process vary around the expected value, with the amount off being similar in size to the standard error. Thus, in 100 tosses the expected value for the number of heads is 50, and the standard error is a measure of the chance error. We will now define these two quantities:

- Expected value
- Standard error
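The variation of the head count around 50 can be seen by repeating the 100-toss experiment many times. This is a minimal Python sketch, assuming a fair coin simulated with `random.Random`; the helper `heads_in_tosses`, the seed, and the 10,000 repetitions are illustrative choices. The average count should come out close to 50, and the standard deviation of the counts close to 5, which is the standard error of the number of heads in 100 tosses (√(100 × 0.5 × 0.5) = 5, a standard binomial result not derived in these notes).

```python
import random
import statistics

def heads_in_tosses(n, rng):
    """Count the heads in n tosses of a fair coin (True counts as 1 in the sum)."""
    return sum(rng.random() < 0.5 for _ in range(n))

rng = random.Random(1)  # fixed seed so the run is reproducible
counts = [heads_in_tosses(100, rng) for _ in range(10_000)]  # 10,000 runs of 100 tosses

print("first five runs:", counts[:5])
print("average number of heads:", statistics.mean(counts))
print("spread (standard deviation) of the counts:", statistics.pstdev(counts))
```

Counts such as 57, 55, or 48 from the text are all within about one or two standard errors of 50.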

## The Expected Value

The expected value of the number of heads in 4 tosses of a coin is 2. The expected value of the number of heads in 100 tosses of a coin is 50.

In statistical terms, the expected value is calculated by multiplying each possible value by its probability, then adding all the products.

| Number of heads in 4 tosses (X) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Probability | 0.0625 | 0.25 | 0.375 | 0.25 | 0.0625 |

$$\text{Expected value of } X = 0 \cdot 0.0625 + 1 \cdot 0.25 + 2 \cdot 0.375 + 3 \cdot 0.25 + 4 \cdot 0.0625 = 0 + 0.25 + 0.75 + 0.75 + 0.25 = 2$$
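The multiply-and-add recipe translates directly into code. A minimal Python sketch using exact fractions so the result is not affected by rounding; the `{value: probability}` dictionary layout and the name `expected_value` are illustrative assumptions.

```python
from fractions import Fraction

# Probability table for X = number of heads in 4 tosses of a coin.
table = {0: Fraction(1, 16), 1: Fraction(4, 16), 2: Fraction(6, 16),
         3: Fraction(4, 16), 4: Fraction(1, 16)}

def expected_value(table):
    """Multiply each value by its probability and add the products."""
    return sum(x * p for x, p in table.items())

print(expected_value(table))  # prints 2
```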

The expected value is the mean (average) of the probability histogram. For example, the expected value of the number of heads in 4 tosses of a coin is 2.

[Figure: probability histogram of the number of heads in 4 tosses (Chance (%) vs. X = 0 to 4), with the expected value X = 2 marked.]

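## Standard Error

The standard error measures the spread of the probability histogram. The standard error of the number of heads in 4 tosses of a coin is 1, so one standard error on either side of the expected value runs from 2 - 1 = 1 to 2 + 1 = 3.

[Figure: probability histogram of the number of heads in 4 tosses (Chance (%) vs. X = 0 to 4), with X = 2 marked and 1 s.e. shown on each side, from 1 to 3.]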
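The value 1 can be checked with the usual formula: the square root of the probability-weighted squared distances from the expected value. The minimal Python sketch below (names are illustrative) also adds up how much probability lies within one standard error of the expected value, that is, between 1 and 3 heads.

```python
import math
from fractions import Fraction

# Probability table for X = number of heads in 4 tosses.
table = {0: Fraction(1, 16), 1: Fraction(4, 16), 2: Fraction(6, 16),
         3: Fraction(4, 16), 4: Fraction(1, 16)}

mu = sum(x * p for x, p in table.items())                    # expected value
variance = sum((x - mu) ** 2 * p for x, p in table.items())  # weighted squared deviations
se = math.sqrt(variance)                                     # standard error

# Total probability of the values within one standard error of mu (here X = 1, 2, 3).
within_one_se = sum(p for x, p in table.items() if abs(x - mu) <= se)

print("expected value:", mu)
print("standard error:", se)
print("P(|X - mu| <= 1 s.e.):", within_one_se)
```

Here the expected value is 2, the standard error is 1.0, and 7/8 (87.5%) of the probability lies within one standard error of the expected value.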

Remark: Observed values are rarely more than 2 or 3 standard errors away from the expected value!

## Mathematical Expressions

Given a random variable X with probability table

| $X$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | ... | $x_k$ |
|---|---|---|---|---|---|---|
| Probability | $p_1$ | $p_2$ | $p_3$ | $p_4$ | ... | $p_k$ |

the expected value is

$$\mu_X = x_1 p_1 + x_2 p_2 + x_3 p_3 + \cdots + x_k p_k$$

and the standard error is

$$S.E.(X) = \sqrt{(x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + (x_3 - \mu_X)^2 p_3 + \cdots + (x_k - \mu_X)^2 p_k}$$

## Expected payoff in gambling

A game is fair if the expected value of the net gain equals 0: on average, players neither win nor lose.

Keno: In the game of Keno there are 80 balls, numbered 1 to 80. On each play, the casino chooses 20 balls at random. Suppose you bet \$1 on the number 17 in each Keno play. When you win, the casino gives you your dollar back and 3 dollars more; when you lose, the casino keeps your dollar. That is, the bet pays 3 to 1. Is the bet fair?

Create the random variable X = net gain on one play:

| Event | X | Probability |
|---|---|---|
| You win: 17 is among the 20 balls | +3 | 20/80 = 0.25 |
| You lose: 17 is not among the 20 balls | -1 | 60/80 = 0.75 |

The expected value of X is $(-1)(0.75) + (3)(0.25) = 0$. The game is fair!

## Remarks on random processes

- An observed value should be somewhere around the expected value; the difference is the chance error.
- The likely size of the chance error is the standard error.
- Observed values are rarely more than two or three standard errors away from the expected value.
- The standard error is defined for random processes and measures the chance error. (Subtle difference)
- The standard error makes more sense if the probability histogram of the random variable is bell-shaped (similar to the normal distribution).
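The two formulas from Mathematical Expressions can be packaged as one helper and applied to the gambling examples. A minimal Python sketch; `mean_and_se` is a hypothetical helper name, and the craps line assumes the "don't pass" bet is modeled as winning 1 dollar with probability 0.4929 and losing 1 dollar otherwise, which is the simplification behind the 1.4-cent figure.

```python
import math

def mean_and_se(table):
    """Expected value and standard error of X, given a {value: probability} table."""
    mu = sum(x * p for x, p in table.items())
    se = math.sqrt(sum((x - mu) ** 2 * p for x, p in table.items()))
    return mu, se

# Keno: net gain on a 1-dollar bet on one number that pays 3 to 1.
p_win = 20 / 80                      # 17 is among the 20 balls drawn
keno = {+3: p_win, -1: 1 - p_win}

# Craps "don't pass", simplified (assumption): win 1 dollar w.p. 0.4929, else lose 1 dollar.
craps = {+1: 0.4929, -1: 1 - 0.4929}

keno_mu, keno_se = mean_and_se(keno)
craps_mu, _ = mean_and_se(craps)
print(f"Keno:  expected net gain {keno_mu:+.4f}, standard error {keno_se:.4f}")
print(f"Craps: expected net gain {craps_mu:+.4f} dollars per dollar bet")
```

The Keno bet has an expected net gain of 0 but a standard error of about 1.73 dollars per play, so individual plays swing widely even though the game is fair on average; the craps line recovers the casino's roughly 1.4 cents per dollar.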

## Recommended problems

- Problems 4.60, 4.61 (page 334)
- Problems 4.96, 4.101 (page 355)
- Problems 4.113, 4.115 (page 360)