Stata and R Equivalent Commands: A Cheat Sheet

Author: Ren Jianhui (Shanxi University of Finance and Economics)
Email: jianhui1986_love@126.com

Lianxianghui (连享会) econometric methods topics……, https://gitee.com/arlionn/Course

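1. Introduction

"Stata in the left hand, Python / R in the right": master one, and be conversant in the others.

This cheat sheet summarizes common Stata commands for econometric analysis and gives their R equivalents. For more on importing and cleaning data, transforming variables, and other basic commands, see Hanck et al. (2019), Econometrics with R, and Wickham and Grolemund (2017), R for Data Science. The examples are drawn from Wooldridge's Introductory Econometrics: A Modern Approach. The Stata datasets can be downloaded online (the Stata code below reads them from http://fmwww.bc.edu/ec-p/data/wooldridge/), while in R they are available directly by installing the wooldridge package, which is more convenient. Unless otherwise noted, all R commands come from base R. Each section below is presented in two parts: a Stata code block first, followed by the equivalent R code block.

Note: the source material is https://github.com/rstudio/cheatsheets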

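2. Installation

Note: Stata mainly relies on log files to store commands and output, whereas R does not. In R, the usual approach is to write an R Markdown file, using the R Markdown syntax developed by Yihui Xie, to capture code and output together.

Stata code block

ssc install outreg2
// install the outreg2 package. Note: an installed Stata package does not need to be loaded each time it is used
// in R, a package must be loaded with library(package_name) in each session before it is used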

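R code block

install.packages("wooldridge")
# install the `wooldridge` package
library(wooldridge)
# load the package for the current session
data(package = "wooldridge")
# list datasets in the `wooldridge` package
data("wage1", package = "wooldridge")
# load the `wage1` dataset into the session
?wage1
# consult documentation on the `wage1` dataset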
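
As a rough point of comparison with Stata's log files, base R's `sink()` can divert console output to a text file. This is only a minimal sketch and not the R Markdown workflow mentioned in the note at the top of this section. The temporary file is chosen purely for illustration, and the built-in `mtcars` data stand in for `wage1` so that the snippet runs without extra packages.

```r
# Divert printed output to a text file, loosely like `log using` in Stata.
# Assumption: a temporary file and the built-in `mtcars` data are used for illustration.
log_file <- tempfile(fileext = ".txt")

sink(log_file)                 # start capturing console output
print(summary(mtcars$mpg))     # output goes to the file instead of the console
sink()                         # stop capturing, loosely like `log close`

log_lines <- readLines(log_file)
print(log_lines)               # inspect what was captured
```

Unlike a Stata log, `sink()` records only the output and not the commands that produced it, which is one reason R Markdown is usually preferred for keeping code and results together.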

3. Basic Plots

This section demonstrates histograms, scatter plots, scatter plots with a fitted line, and grouped box plots, using the wage1 data.

Stata code block

use http://fmwww.bc.edu/ec-p/data/wooldridge/wage1
histogram wage
// histogram of `wage`
histogram wage, by(nonwhite)
// histogram of `wage` by `nonwhite`
scatter wage educ
// scatter plot of `wage` by `educ`
twoway (scatter wage educ) (lfit wage educ)
// scatter plot with fitted line
graph box wage, by(nonwhite)
// boxplot of `wage` by `nonwhite`

R code block

library(wooldridge)
# all remaining R code blocks assume the wooldridge package is already loaded; this is not repeated below
hist(wage1$wage)
# histogram of `wage`
plot(y = wage1$wage, x = wage1$educ)
# scatter plot of `wage` by `educ`
abline(lm(wage1$wage ~ wage1$educ), col = "red")
# add fitted line to scatter plot
boxplot(wage1$wage ~ wage1$nonwhite)
# boxplot of `wage` by `nonwhite`

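4. Summarizing Data

A limitation of Stata is that it works with only one dataset at a time, whereas R can hold several datasets at once, so the dataset must be named in every function call. Note: R has no direct equivalent of Stata's codebook command. In R, installing the AER package also installs other useful packages automatically: car, lmtest, and sandwich.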

Stata code block

browse
// open browser for loaded data
describe
// describe structure of loaded data
summarize
// display summary statistics for all variables in dataset
list in 1/6
// display first 6 rows
tabulate educ
// tabulate `educ` variable frequencies
tabulate educ female
// cross-tabulate `educ` and `female` frequencies

R code block

View(wage1)
# open browser for loaded `wage1` data
str(wage1)
# describe structure of `wage1` data
summary(wage1)
# display summary statistics for `wage1` variables
head(wage1)
# display first 6 (default) rows of data
tail(wage1)
# display last 6 rows
table(wage1$educ)
# tabulate `educ` frequencies
table("yrs_edu" = wage1$educ, "female" = wage1$female)
# cross-tabulate `educ` and `female` frequencies, naming the table dimensions
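
The point about several datasets coexisting in one R session can be sketched as follows. The built-in `mtcars` and `iris` data are assumptions standing in for Wooldridge datasets, and `codebook_like()` is a hypothetical helper written for this illustration (not a base R or package function) that gives only a very rough approximation of what Stata's codebook reports.

```r
# Two data frames can live in the same session; each call names the one it uses.
# Assumption: the built-in `mtcars` and `iris` data stand in for Wooldridge datasets.
cars_df   <- mtcars
flower_df <- iris

mean(cars_df$mpg)               # refers explicitly to `cars_df`
table(flower_df$Species)        # refers explicitly to `flower_df`

# Hypothetical helper (not a built-in function): a rough, codebook-like overview
# giving each variable's class, number of missing values, and number of distinct values.
codebook_like <- function(df) {
  data.frame(
    variable   = names(df),
    class      = vapply(df, function(x) class(x)[1], character(1)),
    n_missing  = vapply(df, function(x) sum(is.na(x)), integer(1)),
    n_distinct = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names  = NULL
  )
}

overview <- codebook_like(flower_df)
print(overview)
```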

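5. Generating or Editing Variables

This section covers creating new variables, computing a variable's mean, keeping a subset of variables, and generating dummy variables.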

Stata code block

gen exper2 = exper^2
// create `exper` squared variable
egen wage_avg = mean(wage)
// create average wage variable
drop tenursq
// drop `tenursq` variable
keep wage educ exper nonwhite
// keep selected variables
tab numdep, gen(numdep)
// create dummy variables for `numdep`
recode exper (1/20 = 1 "1 to 20 years") (21/40 = 2 "21 to 40 years") (41/max = 3 "41+ years"), gen(experlvl)
// recode `exper` and gen new variable

R code block

wage1$exper2 <- wage1$exper^2
# create `exper` squared variable
wage1$wage_avg <- mean(wage1$wage)
# create average wage variable
wage1$tenursq <- NULL
# drop `tenursq`
wage1 <- wage1[ , c("wage", "educ", "exper", "nonwhite")]
# keep selected variables
wage1 <- fastDummies::dummy_cols(wage1, select_columns = "numdep")
# create dummy variables for `numdep`, use {fastDummies} package
# (`numdep` must still be in the data, so run this on a copy of `wage1` that has not had columns dropped)
wage1$experlvl <- 3
# recode `exper`
wage1$experlvl[wage1$exper < 41] <- 2
wage1$experlvl[wage1$exper < 21] <- 1
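
As an alternative to the three assignment steps used for the recode, base R's `cut()` can build the same bands in one call and attach labels, which is closer in spirit to Stata's recode with value labels. This is a sketch under stated assumptions: `exper` below is a small made-up vector rather than the `wage1` column.

```r
# Alternative recode of experience into three bands using base R's cut().
# Assumption: `exper` here is a small made-up vector, not the `wage1` column.
exper <- c(1, 20, 21, 40, 41, 51)

experlvl <- cut(
  exper,
  breaks = c(0, 20, 40, Inf),   # intervals (0,20], (20,40], (40,Inf]
  labels = c("1 to 20 years", "21 to 40 years", "41+ years")
)

table(experlvl)
as.integer(experlvl)            # underlying integer codes of the factor
```

With these breaks a value of exactly 0 falls outside the first interval and becomes `NA`, so the treatment of zero experience should be decided explicitly before using this on real data.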

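6. Estimating Models, 1/2

This section deals with cross-sectional data: OLS estimation when the dependent variable is continuous, and Logit, Probit, and Tobit models when the dependent variable is binary or censored.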

6.1 OLS

Stata code block

reg wage educ
// simple regression of `wage` by `educ` (results printed automatically)
reg wage educ if nonwhite==1
// add condition with if statement
reg wage educ exper, robust
// multiple regression using HC1 robust standard errors
reg wage educ exper, cluster(numdep)
// use clustered standard errors

R code block

mod1 <- lm(wage ~ educ, data = wage1)
# simple regression of `wage` by `educ`, store results in `mod1`
summary(mod1)
# print summary of `mod1` results
mod2 <- lm(wage ~ educ, data = wage1[wage1$nonwhite == 1, ])
# add condition by subsetting the data (equivalent of Stata's if statement)
mod3 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, se_type = "stata")
# multiple regression with HC1 (Stata default) robust standard errors, use {estimatr} package
mod4 <- estimatr::lm_robust(wage ~ educ + exper, data = wage1, clusters = numdep)
# use clustered standard errors

6.2 MLE (Logit/Probit/Tobit)

Example data: mroz

Stata code block

use http://fmwww.bc.edu/ec-p/data/wooldridge/mroz
logit inlf nwifeinc educ
// estimate logistic regression
probit inlf nwifeinc educ
// estimate probit regression
tobit hours nwifeinc educ, ll(0)
// estimate tobit regression, lower limit of y censored at zero

R code block

mod_log <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "logit"), data = mroz)
# estimate logistic regression
mod_pro <- glm(inlf ~ nwifeinc + educ, family = binomial(link = "probit"), data = mroz)
# estimate probit regression
mod_tob <- AER::tobit(hours ~ nwifeinc + educ, left = 0, data = mroz)
# estimate tobit regression, lower limit of y censored at zero, use {AER} package

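7. Statistical Tests and Diagnostics

This section covers tests for heteroskedasticity, tests for omitted variables, and the between-group t-test.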

Stata code block

reg lwage educ exper
// estimation used for examples below
estat hettest
// Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat ovtest
// Ramsey RESET test for omitted variables
ttest wage, by(nonwhite)
// independent group t-test, compare means of same variable between groups

R code block

mod <- lm(lwage ~ educ + exper, data = wage1)
# estimation used for examples below
lmtest::bptest(mod)
# Breusch-Pagan / Cook-Weisberg test for heteroskedasticity using the {lmtest} package
lmtest::resettest(mod)
# Ramsey RESET test
t.test(wage ~ nonwhite, data = wage1)
# independent group t-test

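8. Interactions, Categorical/Continuous Variables

In Stata, special operators mark a variable as continuous (c.) or categorical (i.). Likewise, the "#" operator, in its different forms, controls how interaction terms between variables are returned. Here we show common uses of these operators and their R equivalents.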

Stata code block

reg lwage i.numdep
// treat `numdep` as a factor variable
reg lwage c.educ#c.exper
// return interaction term only
reg lwage c.educ##c.exper
// return full factorial specification
reg lwage c.exper##i.numdep
// return full, interact continuous and categorical

R code block

lm(lwage ~ as.factor(numdep), data = wage1)
# treat `numdep` as factor
lm(lwage ~ educ:exper, data = wage1)
# return interaction term only
lm(lwage ~ educ*exper, data = wage1)
# return full factorial specification
lm(lwage ~ exper*as.factor(numdep), data = wage1)
# return full, interact continuous and categorical
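
To see what the `:` and `*` formula operators actually put into the regression, one option is to inspect the column names of the design matrix with base R's `model.matrix()`. The sketch below uses a tiny made-up data frame (an assumption for illustration; the variable names merely mirror those in `wage1`).

```r
# Compare the design-matrix columns produced by `:` and `*` in an R formula.
# Assumption: a tiny made-up data frame; variable names mirror `wage1` for readability.
d <- data.frame(
  educ   = c(12, 16, 10, 14),
  exper  = c(5, 2, 20, 8),
  numdep = c(0, 1, 2, 1)
)

cols_interaction_only <- colnames(model.matrix(~ educ:exper, data = d))
cols_full_factorial   <- colnames(model.matrix(~ educ * exper, data = d))
cols_factor           <- colnames(model.matrix(~ as.factor(numdep), data = d))

print(cols_interaction_only)  # intercept plus the product term only
print(cols_full_factorial)    # intercept, both main effects, and the product term
print(cols_factor)            # intercept plus one dummy per non-baseline level
```

In this sense `educ:exper` plays the role of Stata's `c.educ#c.exper`, and `educ*exper` the role of `c.educ##c.exper`.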

9. Estimating Models, 2/2

9.1 Panel / Longitudinal

Example data: murder

Stata code block

xtset id year
// set `id` as entities (panel) and `year` as time variable
xtdescribe
// describe pattern of xt data
xtsum
// summarize xt data
xtreg mrdrte unem, fe
// fixed effects regression

R code block

plm::is.pbalanced(murder$id, murder$year)
# check panel balance with {plm} package
modfe <- plm::plm(mrdrte ~ unem, index = c("id", "year"), model = "within", data = murder)
# estimate fixed effects ("within") model
summary(modfe)
# display results

9.2 Instrumental Variables (2SLS)

Endogeneity is a widely shared concern. Example data: mroz

Stata code block

ivregress 2sls lwage (educ = fatheduc), first
// show results of first-stage regression
estat firststage
// test IV and endogenous variable
ivregress 2sls lwage (educ = fatheduc)
// show results of 2SLS directly

R code block

modiv <- AER::ivreg(lwage ~ educ | fatheduc, data = mroz)
# estimate 2SLS with {AER} package
summary(modiv, diagnostics = TRUE)
# get diagnostic tests of IV and endogenous variable

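10. Postestimation

In Stata, postestimation commands must come directly after the estimation they refer to. R stores each fitted model as an object, so it has no such restriction. This section covers displaying regression results and marginal effects.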

Stata code block

reg lwage educ c.exper##c.exper
// estimation used for following postestimation commands
estimates store mod1
// stores in memory the last estimation results to `mod1`
margins
// get average predictive margins
margins, dydx(*)
// get average marginal effects for all variables
marginsplot
// plot marginal effects
margins, dydx(exper)
// average marginal effects of experience
margins, at(exper=(1(10)51))
// average predictive margins over `exper` range at 10-year increments
estimates restore mod1
// loads `mod1` back into working memory
estimates table mod1 mod2
// display table with stored estimation results

R code block

mod1 <- lm(lwage ~ educ + exper + I(exper^2), data = wage1)
# Note: in R, mathematical expressions inside a formula call must be isolated with `I()`
margins::prediction(mod1)
# get average predictive margins with {margins} package
m <- margins::margins(mod1)
# get average marginal effects for all variables
plot(m)
# plot marginal effects
summary(m)
# get detailed summary of marginal effects
margins::prediction(mod1, at = list(exper = seq(1, 51, 10)))
# predictive margins over `exper` range at 10-year increments
stargazer::stargazer(mod1, mod2, type = "text")
# use {stargazer} package, with `type = "text"` to display results within R. Note: `type` can also be changed for LaTeX and HTML output.
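
The contrast drawn at the start of this section, that fitted models in R are ordinary objects which can be queried at any later point, can be sketched with base R alone. The data below are simulated with made-up coefficients (an assumption for illustration, not the `wage1` data), and the object names `fit_small` and `fit_full` are likewise hypothetical.

```r
# Fit two models, keep both as objects, and query them later in any order.
# Assumption: simulated data with made-up coefficients, used only for illustration.
set.seed(1)
n <- 200
sim <- data.frame(educ = rnorm(n, 12, 2), exper = runif(n, 1, 40))
sim$lwage <- 0.5 + 0.08 * sim$educ + 0.03 * sim$exper -
  0.0005 * sim$exper^2 + rnorm(n, sd = 0.3)

fit_small <- lm(lwage ~ educ, data = sim)
fit_full  <- lm(lwage ~ educ + exper + I(exper^2), data = sim)

# Postestimation on the first model, even though another model was fitted after it
coef(fit_small)
confint(fit_full)

# Predictions from the full model at chosen experience levels, holding educ at its mean
new_obs <- data.frame(educ = mean(sim$educ), exper = seq(1, 31, 10))
pred <- predict(fit_full, newdata = new_obs)
print(pred)
```

Because each model is a named object, there is no need for an equivalent of `estimates store` and `estimates restore` before returning to an earlier fit.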
