Applied Statistical
2023-11-13 06:06:49
161111 Applied Statistics study notes
Outline
Exploring Data
Types of Data
Categorical Data
Table
Insert pivot table in Excel
Bar plot
Insert bar or column graph in Excel
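Outside Excel, the same frequency table and bar plot can be sketched in a few lines of Python. This is only an illustration: the `colours` data are made up for the example and do not come from the notes.

```python
from collections import Counter

# Hypothetical categorical observations (not from the notes).
colours = ["red", "blue", "red", "green", "blue", "red"]

# Frequency table: category -> count, like a pivot table of counts.
counts = Counter(colours)

# Text bar plot: one row per category, most common first.
for category, n in counts.most_common():
    print(f"{category:<6} {'#' * n} ({n})")
```

The first row printed is the most common category, which is the mode of the categorical variable.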
Numerical Data
Histogram
How to describe a histogram
Center (mode = tallest peak)
. What does the peak of the distribution tell us?
Spread (range)
. What does the range tell us?
Shape (skew? one peak?)
. Is the distribution symmetric or skewed?
. Is there more than one peak?
Outliers (extreme values?)
. Are there any extreme values?
Boxplot
(Five-number summary)
IQR (interquartile range): UQ minus LQ
Minimum, Lower Quartile, Median, Upper Quartile, Maximum
. Minimum is the smallest value.
. Lower Quartile (LQ) is the 25% value.
. Median is the halfway value.
. Upper Quartile (UQ) is the 75% value.
. Maximum is the largest value.
boxplots
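A minimal sketch of the five-number summary and IQR that a boxplot draws, using Python's standard library. The `data` values are hypothetical, and the "inclusive" quartile method is one convention among several; other software may give slightly different quartiles.

```python
import statistics

# Hypothetical numerical data (not from the notes).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]

# Cut points that split the sorted data into four equal parts.
lq, median, uq = statistics.quantiles(data, n=4, method="inclusive")

five_number = (min(data), lq, median, uq, max(data))
iqr = uq - lq  # interquartile range: the width of the box

print("five-number summary:", five_number)
print("IQR:", iqr)
```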
Bar charts vs pie charts
Key words
The mode
the most common group
Mean
Median
Maximum and Minimum
Standard deviation (SD)
measures how close the data are to the mean value
Shape
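The key words above can be computed directly. The sketch below uses hypothetical `values` and the sample standard deviation (dividing by n - 1), which is an assumption about which SD the notes intend.

```python
import statistics

# Hypothetical data (not from the notes).
values = [3, 5, 5, 6, 8, 9]

mode = statistics.mode(values)      # most common value
mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # halfway value
sd = statistics.stdev(values)       # sample SD: typical distance from the mean

print(mode, mean, median, round(sd, 3))
print("min:", min(values), "max:", max(values))
```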
Sampling
Collecting Data
Census vs sample
Census: collecting data from all individuals in the population.
Sample: collecting data from a subset of the individuals in the population.
Parameter
Numerical summaries calculated from a population are called parameters.
Statistic
Numerical summaries calculated from a sample are called statistics.
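To make the parameter/statistic distinction concrete, here is a small simulation. The population of heights is simulated, so every number in it is an assumption made for the illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 heights in cm (simulated).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Census: use every individual, giving a parameter.
parameter = statistics.mean(population)

# Sample: use a subset of 50 individuals, giving a statistic.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

# The statistic estimates the parameter but rarely equals it exactly.
sampling_error = statistic - parameter
print(round(parameter, 2), round(statistic, 2), round(sampling_error, 2))
```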
Statistical Sampling Methods
Simple Random
Systematic
Stratified
Cluster
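The four methods differ only in how individuals are chosen. A rough sketch, assuming a hypothetical population of 100 individuals split into four equal groups that serve as strata or clusters:

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 individuals in 4 groups of 25.
population = [{"id": i, "group": i // 25} for i in range(100)]
groups = sorted({p["group"] for p in population})

# Simple random: every individual has the same chance of selection.
simple = rng.sample(population, 20)

# Systematic: random starting point, then every k-th individual.
k = 5
start = rng.randrange(k)
systematic = population[start::k]

# Stratified: a simple random sample taken inside every group.
stratified = []
for g in groups:
    members = [p for p in population if p["group"] == g]
    stratified.extend(rng.sample(members, 5))

# Cluster: randomly choose whole groups and take everyone in them.
chosen = rng.sample(groups, 2)
cluster = [p for p in population if p["group"] in chosen]

print(len(simple), len(systematic), len(stratified), len(cluster))
```

Stratified sampling guarantees every group is represented, whereas cluster sampling includes only the chosen groups.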
Sampling methods
Representative sample
Error
Sampling error
Non-sampling error
Coverage error
Some members of the population are not in the sampling frame.
Non-response error
People may refuse to answer some or all of the questions.
Measurement error
Badly worded or misleading questions.
Experiments and Observational studies
Experiment
Experimental unit
Paper plane example: the experimental units are the paper planes themselves.
• We try to keep the paper planes similar by making them the same size, with the same material, etc.
Response variable
Paper plane example: the response variable is the length of time the plane remains in flight.
Treatment variable
Paper plane example: the treatment variable is the plane design, with two levels: simple and complex.
Example, to aid understanding
Experimental Design
Replication
Paper plane example: we would build 30 planes of the simple design and 30 planes of the complex design.
Randomisation
Paper plane example: we would randomise the order in which we throw the planes. This avoids potential bias due to fatigue in the thrower.
Blinding
Paper plane example: blinding is difficult to include, because it would require the thrower not to know which plane design is being thrown.
Pairing
Example: testing chocolate bar recipes
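Replication and randomisation for the paper plane experiment could be set up roughly as follows. The labels and the seed are assumptions made for the illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Replication: 30 planes of each design.
planes = [("simple", i) for i in range(30)] + [("complex", i) for i in range(30)]

# Randomisation: shuffle the throwing order so that thrower fatigue
# is spread evenly across both designs.
throw_order = planes[:]
rng.shuffle(throw_order)

for position, (design, plane_id) in enumerate(throw_order[:5], start=1):
    print(position, design, plane_id)
```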
Observational study
• explain the difference between an experiment and an observational study.
• identify lurking variables and explain how to control for them in an experiment.
Experiment vs Observational study
Normal Distribution
Properties of the normal curve
- Shape: symmetric, one peak, flat tails, no outliers.
- Mean = median = mode
Example
z-score
The empirical rule
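A z-score counts how many standard deviations a value lies from the mean, and the empirical rule says roughly 68%, 95% and 99.7% of a normal distribution lies within 1, 2 and 3 SDs of the mean. The sketch below checks this with the standard library; the mean of 250 and SD of 5 are hypothetical.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd


# Hypothetical example: a value of 260 when the mean is 250 and the SD is 5.
z = z_score(260, 250, 5)

# Proportion of a normal distribution within k SDs of the mean.
std_normal = NormalDist()  # mean 0, SD 1
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print("z =", z)
for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.4f}")
```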
Sampling Distribution and Confidence
Sampling Distribution
Standard error (SE)
Sampling distribution from a normal population
Central limit theorem
Standard error (SE)
Two worked examples
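The central limit theorem and the standard error can be seen in a simulation. The sketch assumes a skewed (exponential) population with mean 1 and SD 1, a sample size of 30 and 2000 repeated samples; all of these choices are illustrative.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

n = 30          # size of each sample (assumed)
repeats = 2000  # number of samples drawn (assumed)

# Draw many samples from the skewed population and record each sample mean.
sample_means = []
for _ in range(repeats):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

centre = statistics.mean(sample_means)        # close to the population mean
observed_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = 1.0 / math.sqrt(n)           # SE = SD / sqrt(n)

print(f"mean of sample means: {centre:.3f}")
print(f"observed SE: {observed_se:.3f}, SD/sqrt(n): {theoretical_se:.3f}")
```

Although the population is skewed, a histogram of `sample_means` would look roughly normal, with a spread close to SD divided by the square root of n.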
Confidence intervals
Worked example
CI = mean ± 2 × SE (calculation example)
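A minimal sketch of this calculation, taking SE as the sample SD divided by the square root of the sample size. The `sample` of weights in grams is hypothetical.

```python
import math
import statistics

# Hypothetical sample of 10 weights in grams (not from the notes).
sample = [248, 251, 250, 247, 253, 249, 252, 250, 246, 254]

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))  # SE = SD / sqrt(n)

# Approximate 95% confidence interval: mean plus or minus 2 standard errors.
ci = (mean - 2 * se, mean + 2 * se)

print(f"mean = {mean}, SE = {se:.3f}")
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```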
Conditions for a CI
Two conditions need to be satisfied for a CI to be valid:
The sample is representative of the population.
The sampling distribution is normal.
If both conditions are satisfied, then we trust the CI.
Interpreting confidence intervals
(Use everything from this section to solve practical problems)
95% chance of what?
There is a 95% chance that the CI contains the true population mean.
This is not the same as saying there is a 95% chance that the population mean lies within the CI.
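The "95%" describes the procedure: the population mean is fixed, while the interval changes from sample to sample. A simulation makes this visible. The population mean of 250, SD of 5, sample size of 40 and 1000 trials are all assumptions made for the illustration.

```python
import math
import random
import statistics

rng = random.Random(123)  # fixed seed so the sketch is repeatable

true_mean, true_sd = 250.0, 5.0  # hypothetical population values
n, trials = 40, 1000             # assumed sample size and number of samples

hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Does this sample's interval capture the fixed true mean?
    if m - 2 * se <= true_mean <= m + 2 * se:
        hits += 1

coverage = hits / trials
print(f"proportion of intervals containing the true mean: {coverage:.3f}")
```

The proportion should land close to 0.95, because about 95% of the intervals built this way capture the true mean.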
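Interpreting confidence intervals: rules for stating the interpretation
Key worked example: understand this problem and you understand the whole chapter
Answer 1
Answer 2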
Hypothesis testing
Write the hypotheses
The null: The true mean weight of chocolate bars is 250g.
The alternative: The true mean weight of chocolate bars is not 250g.
Test statistic
Use p-value to make decision
Write conclusion in context
Check conditions
The sample is representative.
The sample mean is normally distributed.
If both conditions are satisfied, the conclusion can be trusted.
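The steps for the chocolate bar test can be sketched as below. The `weights` are hypothetical, and the p-value uses a normal approximation to the t distribution, which is a simplification made to keep the sketch within the standard library; statistical software or t tables would give the exact value.

```python
import math
import statistics
from statistics import NormalDist

null_mean = 250.0  # the null: true mean weight is 250 g

# Hypothetical sample of 12 bar weights in grams (not from the notes).
weights = [251.2, 249.8, 252.1, 250.9, 251.7, 250.4,
           252.3, 251.0, 250.6, 251.5, 249.9, 251.8]

n = len(weights)
mean = statistics.mean(weights)
se = statistics.stdev(weights) / math.sqrt(n)

# Test statistic: how many standard errors the sample mean is from 250.
t_stat = (mean - null_mean) / se

# Two-sided p-value, ASSUMING a normal approximation to the t distribution.
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

decision = "alternative" if p_approx < 0.05 else "null"
print(f"t = {t_stat:.2f}, approximate p = {p_approx:.5f}")
print("decide in favour of the", decision)
```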
t-test of differences
Compute test statistic
CI with t-test of differences
Are conditions met
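For paired data, the test works on the differences within each pair. The sketch below assumes hypothetical taste scores from eight tasters who each tried two chocolate bar recipes; the scores and variable names are invented for the illustration.

```python
import math
import statistics

# Hypothetical paired taste scores: the same 8 tasters rate both recipes.
recipe_a = [6, 7, 5, 8, 6, 7, 5, 6]
recipe_b = [7, 9, 6, 8, 8, 8, 6, 8]

# One difference per taster; the pairing removes taster-to-taster variation.
differences = [b - a for a, b in zip(recipe_a, recipe_b)]

n = len(differences)
mean_diff = statistics.mean(differences)
se = statistics.stdev(differences) / math.sqrt(n)

# Test statistic under the null that the true mean difference is 0.
t_stat = mean_diff / se

# Approximate 95% CI for the true mean difference.
ci = (mean_diff - 2 * se, mean_diff + 2 * se)

print(f"mean difference = {mean_diff}, SE = {se:.3f}, t = {t_stat:.2f}")
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

If the CI does not contain 0, this agrees with a test that decides in favour of a non-zero mean difference.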
Chi-squared test
Write the hypotheses
The null: Infection status is not related to lake.
The alternative: Infection status is related to lake.
Test statistic
The test statistic here is 7.6988.
Use p-value to make decision
We decide in favour of the null, because the p-value of 0.05266 is greater than 0.05.
Write the conclusion in context
We have no evidence to suggest that infection status is related to lake.
Check conditions
The sample is representative.
All expected counts are at least 5.
Calculate expected counts
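Each expected count is (row total × column total) / grand total. The sketch below uses a hypothetical 2 × 4 table of counts, not the data behind the result quoted in the notes. The p-value helper is valid only for 3 degrees of freedom; the notes do not state the degrees of freedom, but the quoted p-value of 0.05266 for a statistic of 7.6988 is consistent with 3.

```python
import math

# Hypothetical counts (NOT the data behind the notes' result):
# rows = infected / not infected, columns = four lakes.
observed = [
    [12, 18, 9, 21],
    [28, 22, 31, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under the null of no relationship.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Condition check: all expected counts should be at least 5.
condition_met = all(e >= 5 for row in expected for e in row)

# Test statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 df only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(chi2)  # (2 - 1) * (4 - 1) = 3 degrees of freedom
print(expected, condition_met)
print(f"chi-squared = {chi2:.4f}, p = {p_value:.4f}")
```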
Linear Modelling
Residuals and conditions
Checking conditions
Linearity.
Equal spread.
Normality.
Independence.
Calculating residuals
Correlation coefficient
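A residual is the observed value minus the value fitted by the line. The sketch below fits a least-squares line by hand and computes the residuals and the correlation coefficient; the `x` and `y` data are hypothetical.

```python
import math

# Hypothetical paired data (not from the notes).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares line: y = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y minus fitted y.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Correlation coefficient: strength and direction of the linear relationship.
r = sxy / math.sqrt(sxx * syy)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.4f}")
print([round(e, 3) for e in residuals])
```

Plotting `residuals` against `fitted` is the usual way to check linearity and equal spread.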
One-sample t-test of slope
Compute test statistic
Use p-value to make decision
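The test statistic for the slope is the estimated slope divided by its standard error, under the null that the true slope is 0. A minimal sketch with hypothetical data follows; the p-value would come from a t distribution with n - 2 degrees of freedom, read from tables or software rather than computed here.

```python
import math

# Hypothetical paired data (not from the notes).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard deviation, using n - 2 degrees of freedom.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
residual_sd = math.sqrt(sse / (n - 2))

# Standard error of the slope and the test statistic for slope = 0.
se_slope = residual_sd / math.sqrt(sxx)
t_stat = slope / se_slope

print(f"slope = {slope:.3f}, SE = {se_slope:.4f}, t = {t_stat:.2f}")
```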