Descriptive Statistics
Measures of Central Tendency
Mean ($\mu$, population)
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where:
$N$ = total number of data points, $x_i$ = each data point
Mean ($\bar{x}$, sample)
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where:
$n$ = number of data points in the sample
Median
For ordered data $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$:
- If $n$ is odd: $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$
- If $n$ is even: $\tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$
Mode
- Value that appears most frequently in the dataset.
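As a minimal sketch (assuming Python with NumPy and the standard library's statistics module, and an illustrative dataset), the three measures can be computed directly:

```python
import numpy as np
from statistics import multimode

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = np.mean(data)        # arithmetic mean
median = np.median(data)    # middle value of the sorted data
modes = multimode(data)     # most frequent value(s)

print(mean, median, modes)  # 5.0 4.5 [4]
```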
Measures of Dispersion
Variance ($\sigma^2$, population)
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
Standard Deviation ($\sigma$, population)
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
Variance ($s^2$, sample)
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Standard Deviation ($s$, sample)
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Range
$$R = x_{\max} - x_{\min}$$
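A short sketch of the dispersion measures, assuming NumPy; the `ddof=1` argument switches from the population formulas (divide by $N$) to the sample formulas (divide by $n-1$):

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

pop_var = np.var(data)             # population variance (divides by N)
pop_std = np.std(data)             # population standard deviation
sample_var = np.var(data, ddof=1)  # sample variance (divides by n - 1)
sample_std = np.std(data, ddof=1)  # sample standard deviation
data_range = data.max() - data.min()

print(pop_var, sample_var, data_range)  # 4.0 4.571... 7
```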
Probabilities
Basic Concepts
Probability of an event
$$P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}$$
Complementary Probability
$$P(A^{c}) = 1 - P(A)$$
Probability Rules
Addition Rule (for mutually exclusive events)
$$P(A \cup B) = P(A) + P(B)$$
Addition Rule (for non-mutually exclusive events)
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Multiplication Rule (for independent events)
$$P(A \cap B) = P(A) \cdot P(B)$$
Multiplication Rule (for dependent events)
$$P(A \cap B) = P(A) \cdot P(B \mid A)$$
where $P(B \mid A)$ is the conditional probability of $B$ given that $A$ has occurred.
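As an illustrative check (plain Python enumerating the 36 outcomes of two dice; the events A, B, and C are example choices, not from the original text), the addition and multiplication rules can be verified exactly with fractions:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))  # all 36 rolls of two dice

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 6    # first die shows 6
B = lambda o: o[1] == 6    # second die shows 6 (independent of A)
C = lambda o: sum(o) == 7  # total is 7 (not mutually exclusive with A)

# Addition rule (non-mutually exclusive): P(A or C) = P(A) + P(C) - P(A and C)
print(prob(lambda o: A(o) or C(o)) == prob(A) + prob(C) - prob(lambda o: A(o) and C(o)))  # True

# Multiplication rule (independent events): P(A and B) = P(A) * P(B)
print(prob(lambda o: A(o) and B(o)) == prob(A) * prob(B))  # True
```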
Law of Large Numbers
As the size of a sample increases, the sample mean approaches the expected value (the mean) of the population.
Weak Form
For a sequence of independent and identically distributed (i.i.d.) random variables $X_1, X_2, \dots$ with mean $\mu$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ converges in probability to $\mu$:
$$\lim_{n \to \infty} P\left(\left|\bar{X}_n - \mu\right| > \epsilon\right) = 0 \quad \text{for any } \epsilon > 0$$
This means that the sample mean tends to get closer to the population mean, but there is no guarantee that this will happen in every case.
Strong Form
States that the sample mean converges almost surely (with probability 1) to the population mean as the sample size approaches infinity:
$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$
This means that, with probability 1, the sequence of sample means converges to the population mean as the number of trials grows without bound.
Example
We roll a die 1000 times and calculate the mean of the results. As the number of rolls $n$ increases, the mean of the results will approach the expected value $E[X] = 3.5$.
If we roll the die once, we can get any number from 1 to 6, but as we increase the number of rolls, the mean of those rolls will approach 3.5.
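A minimal simulation sketch of this example, assuming NumPy (the seed is arbitrary); the running mean of the simulated rolls drifts toward 3.5 as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
rolls = rng.integers(1, 7, size=1000)           # 1000 fair-die rolls (values 1..6)
running_mean = np.cumsum(rolls) / np.arange(1, 1001)

for n in (1, 10, 100, 1000):
    print(n, running_mean[n - 1])               # running mean approaches E[X] = 3.5
```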
Combinatorics
Fundamental Counting Principle
If one task can be performed in $m$ ways and a second task can be performed in $n$ ways, then the two tasks can be performed together in $m \times n$ ways.
Permutations
Permutations of $n$ distinct elements
$$P_n = n!$$
Permutations of $n$ elements taken $r$ at a time
$$P(n, r) = \frac{n!}{(n - r)!}$$
Combinations
Combinations of $n$ elements taken $r$ at a time
$$C(n, r) = \binom{n}{r} = \frac{n!}{r!\,(n - r)!}$$
Combinations with repetition
$$C^{R}(n, r) = \binom{n + r - 1}{r}$$
Binomial Theorem
Binomial expansion
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k}\, a^{n-k}\, b^{k}$$
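A short sketch using Python's standard library (`math.factorial`, `math.perm`, `math.comb`, available since Python 3.8) to evaluate these counts; the values n = 5, r = 3 are illustrative:

```python
import math

n, r = 5, 3

print(math.factorial(n))        # permutations of n distinct elements: 5! = 120
print(math.perm(n, r))          # P(5, 3) = 5!/2! = 60
print(math.comb(n, r))          # C(5, 3) = 10
print(math.comb(n + r - 1, r))  # combinations with repetition: C(7, 3) = 35

# Binomial theorem: expand (a + b)^n term by term and compare with direct evaluation
a, b = 2, 3
expansion = sum(math.comb(n, k) * a**(n - k) * b**k for k in range(n + 1))
print(expansion == (a + b)**n)  # True
```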
Bernoulli Trials
A Bernoulli trial is a random experiment that has the following characteristics:
- Discrete outcomes: it has only two possible outcomes, typically called success (usually represented by 1) and failure (represented by 0).
- Constant probability: the probability of success $p$ is constant in each trial; consequently, the probability of failure is $q = 1 - p$.
- Independence: the trials are independent, meaning the outcome of one trial does not affect the outcome of another.
Examples
- Tossing a coin, where “heads” can be considered a success and “tails” a failure
- Taking an exam, where “passing” is considered a success and “failing” a failure
- Measuring the effectiveness of a medical treatment in which the outcome can be “effective” (success) or “ineffective” (failure)
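A tiny simulation sketch of repeated Bernoulli trials, assuming NumPy; the success probability p = 0.3 and the seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.3                             # probability of success (illustrative value)
trials = rng.random(10_000) < p     # True = success (1), False = failure (0)

print(trials[:10].astype(int))      # outcomes of the first ten independent trials
print(trials.mean())                # empirical success rate, close to p
```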
Probability Distributions
Discrete Distribution
Discrete distributions describe the probabilities of variables that can take only specific, countable values, such as integers.
Binomial Distribution
Models the number of successes in a sequence of independent trials, each with the same probability of success.
$$P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n-k}$$
where:
$n$ = number of trials, $k$ = number of successes, $p$ = probability of success in a single trial, $\binom{n}{k}$ = binomial coefficient
Negative Binomial Distribution
Models the number of failures before achieving a fixed number of successes in Bernoulli trials.
$$P(X = k) = \binom{k + r - 1}{k}\, p^{r} (1 - p)^{k}$$
where:
$r$ is the number of successes required, $p$ is the probability of success, $k$ is the number of failures
Geometric Distribution
Models the number of trials until the first success in a sequence of Bernoulli trials.
$$P(X = k) = (1 - p)^{k - 1}\, p$$
where:
$p$ is the probability of success, $k$ is the number of trials until the first success
Hypergeometric Distribution
Models the number of successes in a fixed-size sample drawn without replacement from a finite population.
$$P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}$$
where:
$N$ is the population size, $K$ is the number of successes in the population, $n$ is the sample size, $k$ is the number of successes in the sample
Poisson Distribution
Models the number of events that occur in a fixed interval of time or space when events occur with a constant average rate.
$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$
where:
$\lambda$ is the average rate of occurrence of the events, $k$ is the number of events
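The PMFs above can be evaluated numerically with `scipy.stats`; a sketch, assuming SciPy is installed and using illustrative parameter values:

```python
from scipy import stats

# Binomial: P(X = 3) for n = 10 trials, p = 0.5
print(stats.binom.pmf(3, n=10, p=0.5))

# Negative binomial: P(3 failures before the 5th success), p = 0.5
print(stats.nbinom.pmf(3, n=5, p=0.5))

# Geometric: P(first success on trial 4), p = 0.2
print(stats.geom.pmf(4, p=0.2))

# Hypergeometric: P(k = 2 successes) in a sample of N = 5
# from a population of M = 20 containing n = 7 successes
print(stats.hypergeom.pmf(2, M=20, n=7, N=5))

# Poisson: P(X = 4) with average rate lambda = 3
print(stats.poisson.pmf(4, mu=3))
```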
Continuous Distribution
Normal (Gaussian) Distribution
Models many variables in nature and society. It is symmetric and bell-shaped.
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$
where:
$\mu$ is the mean, $\sigma^{2}$ is the variance
Cumulative Distribution Function of the Normal (CDF)
$$F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(t - \mu)^{2}}{2\sigma^{2}}}\, dt$$
It has no closed-form expression and is evaluated numerically or from tables.
Exponential Distribution
Models the time between events in a Poisson process. It is used in reliability theory and waiting times.
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
where:
$\lambda$ is the rate of occurrence of the events.
Uniform Distribution
Models a variable that has the same probability of taking any value within a defined interval.
$$f(x) = \frac{1}{b - a}, \quad a \le x \le b$$
where:
$a$ and $b$ are the limits of the interval.
Gamma Distribution
Generalizes the exponential distribution. Models the time until $\alpha$ events occur in a Poisson process.
$$f(x) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x > 0$$
where:
$\alpha$ is a shape parameter, $\beta$ is a rate parameter
Beta Distribution
Primarily used in Bayesian statistics to model probability distributions in proportions or probabilities.
$$f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 \le x \le 1$$
where:
$\alpha$ and $\beta$ are shape parameters, $B(\alpha, \beta)$ is the beta function
Cauchy Distribution
Models phenomena where the mean and variance are undefined or infinite.
$$f(x) = \frac{1}{\pi \gamma \left[1 + \left(\frac{x - x_0}{\gamma}\right)^{2}\right]}$$
where:
$x_0$ is the location parameter, $\gamma$ is the scale parameter
Student’s t-distribution
Used to estimate the mean of a normally distributed population when the sample size is small and the variance is unknown.
$$f(t) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu + 1}{2}}$$
where:
$\nu$ is the degrees of freedom, which determine the shape of the distribution. As $\nu$ increases, the t-distribution converges to a standard normal distribution.
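Similarly, the continuous densities and CDFs can be evaluated with `scipy.stats`; a sketch with illustrative parameters (note that SciPy parameterizes the exponential and gamma distributions by scale = 1/rate):

```python
from scipy import stats

# Normal: density and CDF at x = 1.0 for mu = 0, sigma = 1
print(stats.norm.pdf(1.0, loc=0, scale=1))
print(stats.norm.cdf(1.0, loc=0, scale=1))     # ~0.8413

# Exponential with rate lambda = 2 (scale = 1/lambda)
print(stats.expon.pdf(0.5, scale=1/2))

# Uniform on [a, b] = [2, 5] (loc = a, scale = b - a)
print(stats.uniform.pdf(3.0, loc=2, scale=3))  # 1/(b - a) = 1/3

# Gamma with shape alpha = 2 and rate beta = 3 (scale = 1/beta)
print(stats.gamma.pdf(1.0, a=2, scale=1/3))

# Student's t with nu = 10 degrees of freedom
print(stats.t.pdf(0.0, df=10))
```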
Correlation and Regression
Correlation
Pearson Correlation Coefficient ($r$)
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}}$$
Linear Regression
Regression Line Equation
$$\hat{y} = a + b x$$
where:
$b$ = slope, $a$ = intercept
Slope
$$b = r\, \frac{s_y}{s_x}$$
where:
$s_x$ and $s_y$ are the standard deviations of $x$ and $y$, respectively.
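A sketch of both computations using `scipy.stats.linregress`, which returns the Pearson correlation, slope, and intercept together; the data below is illustrative:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])   # roughly y = 2x

result = stats.linregress(x, y)
print(result.rvalue)       # Pearson correlation coefficient r
print(result.slope)        # slope b
print(result.intercept)    # intercept a

# The slope formula from above, computed by hand for comparison
r = np.corrcoef(x, y)[0, 1]
print(r * np.std(y, ddof=1) / np.std(x, ddof=1))   # matches result.slope
```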
Statistical Inference
Parameter Estimation
Point Estimation (sample mean)
$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Confidence Intervals
Confidence Interval for the Mean
$$\bar{x} \pm z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the critical value of the standard normal distribution for a confidence level of $1 - \alpha$, $\sigma$ is the population standard deviation, and $n$ is the sample size.
Hypothesis Testing
Hypothesis Test (p-value)
- If $H_0$ is the null hypothesis and $H_1$ is the alternative hypothesis, the p-value is the probability of obtaining a result at least as extreme as the one observed, assuming $H_0$ is true. Reject $H_0$ if the p-value is less than the significance level $\alpha$.
Test Statistic for the Mean
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
where $\mu_0$ is the hypothesized population mean (use $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with $n - 1$ degrees of freedom when $\sigma$ is unknown).
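A sketch of the confidence interval and a one-sample test, assuming SciPy; since the population standard deviation is unknown here, the t-based versions of the formulas are used, and the data and hypothesized mean mu_0 = 50 are illustrative:

```python
import numpy as np
from scipy import stats

data = np.array([48.2, 51.3, 49.8, 50.9, 47.5, 52.1, 50.4, 49.0])
mu_0 = 50.0                       # hypothesized population mean

n = len(data)
x_bar = data.mean()
s = data.std(ddof=1)              # sample standard deviation

# 95% confidence interval for the mean (t critical value, sigma unknown)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (x_bar - t_crit * s / np.sqrt(n), x_bar + t_crit * s / np.sqrt(n))
print(ci)

# Test statistic and p-value for H0: mu = mu_0 vs H1: mu != mu_0
t_stat, p_value = stats.ttest_1samp(data, popmean=mu_0)
print(t_stat, p_value)            # reject H0 if p_value < alpha
```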