14 The infer package

14.1 Introduction

library(infer)
  • specify() allows you to specify the variable, or relationship between variables, that you’re interested in.
  • hypothesize() allows you to declare the null hypothesis.
  • generate() allows you to generate data reflecting the null hypothesis.
  • calculate() allows you to calculate a distribution of statistics from the generated data to form the null distribution.
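
The four verbs are designed to be chained into one pipeline. The following is a minimal sketch, assuming a point null hypothesis about the mean of the hours variable in gss; the variable, the value mu = 40, and reps = 100 are illustrative choices rather than anything prescribed by this chapter.

```r
library(infer)

set.seed(1)  # make the resampling reproducible

# specify -> hypothesize -> generate -> calculate
null_dist <- gss %>%
  specify(response = hours) %>%                   # variable of interest
  hypothesize(null = "point", mu = 40) %>%        # assumed null: mean hours = 40
  generate(reps = 100, type = "bootstrap") %>%    # 100 resamples under the null
  calculate(stat = "mean")                        # one mean per resample

null_dist
```

The result has one row per replicate, so it can be treated as an approximate null distribution of the sample mean.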

All vignettes in this chapter use the gss (General Social Survey) dataset, which contains 11 variables.
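
To get a feel for the data before specifying anything, a quick inspection along these lines may help. It is only a sketch and relies on base R functions plus the gss object that ships with infer.

```r
library(infer)

# Number of rows and columns in the dataset
gss_dim <- dim(gss)
gss_dim

# Names of the variables available for specify()
names(gss)
```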

14.1.1 SPECIFY(): specifying the response and explanatory variables

The specify() function selects the variables of interest in a dataset. If you are only interested in the respondents' age, you can write:

gss %>%
  specify(response = age)
## Response: age (numeric)
## # A tibble: 500 × 1
##      age
##    <dbl>
##  1    36
##  2    34
##  3    24
##  4    42
##  5    31
##  6    32
##  7    48
##  8    36
##  9    30
## 10    33
## # ℹ 490 more rows
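
The example above specifies only a response. When a relationship between two variables is of interest, specify() also accepts an explanatory variable, either as a formula or through named arguments. The sketch below pairs age with partyid purely for illustration; any other pair of gss variables could be used instead.

```r
library(infer)

# Formula interface: response ~ explanatory
by_formula <- gss %>%
  specify(age ~ partyid)

# Equivalent call with named arguments
by_args <- gss %>%
  specify(response = age, explanatory = partyid)

by_formula
```

Both forms keep only the two named columns, with the response first.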

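If you are doing inference on a proportion or a difference in proportions, use the success argument to specify which level of the response variable counts as a success. For example, if you are interested in the proportion of the population with a college degree, you can use the following code: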

# specifying for inference on proportions
gss %>%
  specify(response = college, success = "degree")
## Response: college (factor)
## # A tibble: 500 × 1
##    college  
##    <fct>    
##  1 degree   
##  2 no degree
##  3 degree   
##  4 no degree
##  5 degree   
##  6 no degree
##  7 no degree
##  8 degree   
##  9 degree   
## 10 no degree
## # ℹ 490 more rows
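
Once success is set, the specified data can be passed straight to calculate() to obtain the observed sample proportion. This is a small sketch of that idea; the name p_hat is an illustrative choice.

```r
library(infer)

# Observed proportion of respondents whose college level is "degree"
p_hat <- gss %>%
  specify(response = college, success = "degree") %>%
  calculate(stat = "prop")

p_hat
```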

14.1.2 HYPOTHESIZE(): declaring the null hypothesis
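
As a sketch of how a null hypothesis can be declared, hypothesize() takes a null argument that is either "independence" (for a relationship between two variables) or "point" (for a single parameter, supplied through an argument such as mu or p). The variables and the value mu = 40 below are illustrative assumptions.

```r
library(infer)

# Null hypothesis of independence between college degree and party affiliation
indep_null <- gss %>%
  specify(college ~ partyid, success = "degree") %>%
  hypothesize(null = "independence")

# Point null hypothesis: mean weekly hours worked equals 40
point_null <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40)

indep_null
```

Declaring the hypothesis does not change the data rows; it records the null so that later steps know how to resample.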

14.1.3 GENERATE(): generating data reflecting the null hypothesis
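
A minimal sketch of generate(), assuming an independence null between age and college: with type = "permute", each replicate is a reshuffled copy of the data, so the association between the two variables is broken as the null requires. The choice of variables and reps = 5 are illustrative only; real analyses typically use many more replicates.

```r
library(infer)

set.seed(1)  # make the permutations reproducible

permuted <- gss %>%
  specify(age ~ college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute")   # 5 permuted copies of the data

permuted
```

The output stacks all replicates and adds a replicate column identifying which resample each row belongs to.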

14.2 Chi-squared test