

This report investigates whether the police are racially biased in favor of White individuals compared with members of minority racial and ethnic groups, such as Black or Hispanic individuals. I analyzed data from the Stanford Open Policing Project, which collected standardized data on police stops in Philadelphia, Pennsylvania, between 2013 and 2017. I assessed bias in police decisions by examining the rate at which the police searched stopped drivers and whether those searches succeeded in finding contraband. The raw data contain the driver’s age, sex, and race/ethnicity, the location of the stop, and whether the police searched the driver.


Racial bias in the police force has been a controversial topic for decades. Recent news, however, has highlighted police stops and searches that led to violence against innocent Black Americans. For example, George Floyd, a 46-year-old unarmed Black American man, was killed by a police officer after allegedly using a counterfeit bill. This event sparked protests and outrage worldwide over discrimination in policing against Black people.

Assessing the rate of stops and searches could provide evidence on whether there is racial discrimination against minorities. Another strategy is to assess the proportion of searches that successfully identify contraband. If the proportion of successful searches is lower for minorities than for Whites, this could suggest that officers are searching minorities based on less evidence. To narrow the scope of the analysis, I subset the data to vehicular police stops in 2017.
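The subsetting step can be sketched in R as follows. This is only a minimal illustration on a made-up data frame: the column names `date` and `type` and the value `"vehicular"` are assumptions about the data layout, not taken from the article's code.

```r
# Toy stand-in for the full stops data (hypothetical rows and column names).
stops <- data.frame(
  date = as.Date(c("2016-05-01", "2017-03-14", "2017-08-02", "2017-11-30")),
  type = c("vehicular", "vehicular", "pedestrian", "vehicular"),
  stringsAsFactors = FALSE
)

# Keep only vehicular stops that occurred in 2017.
in_2017 <- format(stops$date, "%Y") == "2017"
stops.subset <- stops[in_2017 & stops$type == "vehicular", ]

nrow(stops.subset)
```

On the real data, the same filter would be expected to leave the 294,060 vehicular stops analyzed in the rest of the report.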


In this data, 1,756,587 police stops occurred between January 1, 2014, and December 31, 2017, in Philadelphia, Pennsylvania. 294,060 of these police stops were vehicular police stops in 2017.

Figure 1

The histogram (Figure 1) shows the distribution of the drivers’ ages. The distribution is skewed to the right, which suggests that younger people are more likely to be stopped by the police than older people. The average age of a stopped driver is approximately 35 years.

Figure 2

Figure 2 shows the distribution of the drivers’ race/ethnicity. Black drivers account for about 69% of stops, compared with 17% for White drivers and 10% for Hispanic drivers. On its own, this does not demonstrate overrepresentation, because it may partly reflect the large share of Black residents in the Philadelphia population.

70% of drivers who got stopped are male and 30% are female.






Below is the code chunk for a chi-square test. It checks whether the police’s decision to search a driver is associated with the driver’s race/ethnicity. ‘False’ represents the number of drivers who were not searched and ‘True’ represents the number of drivers who were searched.

##  Pearson's Chi-squared test
##
## data:  search.conducted.table
## X-squared = 430.36, df = 4, p-value < 2.2e-16

# Expected values for each race/ethnicity
chisq.test(search.conducted.table)$expected
##
##             white     black  hispanic
##   FALSE 47745.775 190926.34 28194.879
##   TRUE   2723.225  10889.66  1608.121

# Pearson residuals
chisq.test(search.conducted.table)$resid
##
##               white      black   hispanic
##   FALSE   3.0764323 -2.4243816  0.6081798
##   TRUE  -10.3478119 10.1514171 -2.5465819



H0: There is no association between the police conducting a search and the race/ethnicity of the driver


H1: There is an association between the police conducting a search and the race/ethnicity of the driver


Let alpha = 0.05. The χ² test statistic is highly significant (p < 2.2e-16). There is sufficient evidence to reject the null hypothesis and conclude that there is an association between the police conducting a search and the driver’s race/ethnicity.

Based on the residuals, more Black individuals than expected were searched, and fewer White individuals than expected were searched.
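As a rough check on how such a residual arises, the Pearson residual for searched Black drivers can be recomputed by hand as (observed − expected) / √expected. The sketch below assumes the observed count of searched Black drivers equals the row total for Black drivers in the article's contraband table (9276 + 2673); the variable names are illustrative.

```r
# Pearson residual = (observed - expected) / sqrt(expected).
# Observed: searched Black drivers, assumed to equal the Black row total
# of the contraband table reported in the article.
observed_black <- 9276 + 2673

# Expected count under independence, copied from the printed chi-square output.
expected_black <- 10889.66

pearson_resid <- (observed_black - expected_black) / sqrt(expected_black)
round(pearson_resid, 2)  # about 10.15, in line with the printed residual
```

A large positive residual means the observed count sits well above what independence between race/ethnicity and searches would predict.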



The assumptions for the chi-square test are met. It is reasonable to believe the observations are independent, as the individuals were sampled across different parts of Philadelphia. All expected values are greater than 10.
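The expected counts follow from the usual formula, row total × column total / grand total. The sketch below reproduces one cell (searched Hispanic drivers) from figures reported in this article; it assumes the total number of searches equals the sum of the article's contraband table, and the variable names are illustrative.

```r
# Grand total: vehicular stops in 2017, as reported in the article.
n_total <- 294060

# Row total: all searched drivers, assumed to equal the sum of the
# contraband table (FALSE and TRUE counts across all groups).
n_searched <- sum(147, 9276, 1202, 137, 1538) + sum(48, 2673, 304, 29, 513)

# Column total: Hispanic drivers, recovered from the printed expected counts.
n_hispanic <- 28194.879 + 1608.121

expected_hispanic_searched <- n_searched * n_hispanic / n_total
round(expected_hispanic_searched, 1)

# The chi-square condition requires every expected count to exceed 10.
expected_hispanic_searched > 10
```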


Below is the code chunk for a logistic regression analysis that predicts whether a driver will be searched after being stopped. The model estimates the association between the police searching the driver and the driver’s race/ethnicity, adjusting for the confounding variables sex and age.

##
## Call:
## glm(formula = search_conducted ~ subject_race + subject_age +
##     subject_sex, family = binomial(link = "logit"), data = stops.subset)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -0.5829  -0.4007  -0.2894  -0.2184   3.4213
##
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)
## (Intercept)               -1.9656439  0.0774345 -25.385  < 2e-16 ***
## subject_raceblack          0.7817956  0.0735729  10.626  < 2e-16 ***
## subject_racehispanic       0.5594270  0.0777094   7.199 6.07e-13 ***
## subject_raceother/unknown  0.2212841  0.1079140   2.051   0.0403 *
## subject_racewhite          0.4758386  0.0764091   6.228 4.74e-10 ***
## subject_age               -0.0418716  0.0008089 -51.762  < 2e-16 ***
## subject_sexfemale         -1.0650471  0.0234808 -45.358  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 123230  on 293539  degrees of freedom
## Residual deviance: 117002  on 293533  degrees of freedom
##   (520 observations deleted due to missingness)
## AIC: 117016
##
## Number of Fisher Scoring iterations: 6



The model intercept does not have a meaningful interpretation because age is included in the model. It represents the predicted log odds of the police searching a male driver of Asian/Pacific Islander race aged 0 years, and it is not reasonable to predict the log odds of the police searching a newborn.

The slope coefficient for Black race indicates that the estimated log odds of the police searching a Black individual are 0.78 higher than for an Asian/Pacific Islander individual, holding age and sex constant.


The slope coefficients for Black race and Hispanic ethnicity are higher than the slope coefficient for White race. For example, the odds of the police searching a Black individual are about 1.36 times the odds of the police searching a White individual, since exp(0.782 − 0.476) ≈ 1.36. This holds when both individuals have the same age and sex, for example two 35-year-old males.
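The 1.36 figure is an odds ratio, obtained by exponentiating the difference between the two race coefficients in the printed model summary. A minimal sketch of that arithmetic, with illustrative variable names:

```r
# Coefficients copied from the printed logistic regression summary.
b_black <- 0.7817956
b_white <- 0.4758386

# Both coefficients are log odds relative to the same reference group,
# so their difference is the Black-vs-White log odds ratio.
odds_ratio_black_vs_white <- exp(b_black - b_white)
round(odds_ratio_black_vs_white, 2)  # 1.36
```

Because age and sex enter the model additively on the log-odds scale, this ratio does not depend on the particular age or sex chosen, as long as both drivers share them.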

This suggests that the police are more likely to search a minority individual than a White individual.
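To put the comparison on a probability scale, the fitted coefficients can be plugged into the inverse logit. The sketch below does this for a 35-year-old male driver, using the coefficients from the printed summary; the choice of age 35 follows the average age reported earlier, and the variable names are illustrative.

```r
# Coefficients copied from the printed model summary.
intercept <- -1.9656439
b_black   <-  0.7817956
b_white   <-  0.4758386
b_age     <- -0.0418716

age <- 35  # roughly the average age of stopped drivers

# Male is the reference level for sex, so no sex term is added.
# plogis() is the inverse logit: 1 / (1 + exp(-x)).
p_black <- plogis(intercept + b_black + b_age * age)
p_white <- plogis(intercept + b_white + b_age * age)

round(c(black = p_black, white = p_white), 3)
```

Under these assumptions the predicted search probability comes out at roughly 6.6% for a Black driver and 4.9% for a White driver.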


The slope coefficient for each race/ethnicity has a p-value that is less than alpha=0.05. Therefore, this data provides evidence that the police searching the driver is significantly associated with the driver’s race/ethnicity.




Below is the code chunk for a two-sample proportion test. It is used to determine whether the proportion of searched individuals who possess contraband differs between Black and White individuals. If the proportion of successful searches of Black individuals is lower than that of Whites, this suggests that officers are searching minorities based on less evidence.

# conduct test
prop.test(successes, n)
##
##  2-sample test for equality of proportions with continuity correction
##
## data:  successes out of n
## X-squared = 6.8023, df = 1, p-value = 0.009104
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.046884129 -0.005958199
## sample estimates:
##    prop 1    prop 2
## 0.2237007 0.2501219



p1 represents the proportion of Black individuals who possess contraband, among the searched Black individuals.


p2 represents the proportion of White individuals who possess contraband, among the searched White individuals.


H0: p1 = p2

HA: p1 is not equal to p2

Let alpha= 0.05. The p-value of the test statistic is significant (p= 0.009104). There is sufficient evidence to reject the null hypothesis and conclude that there is a difference in proportions between both racial groups.

Comparing p1 and p2, the proportion of searched White individuals who possess contraband (about 0.250) is higher than that of searched Black individuals (about 0.224).
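The test can be reproduced from the counts in the article's table of contraband by race/ethnicity. The article does not show how `successes` and `n` were defined, so the definitions below are an assumption, chosen because they match the reported sample estimates.

```r
# Searches that found contraband (assumed definition of `successes`).
successes <- c(black = 2673, white = 513)

# Total searches per group: not found + found (assumed definition of `n`).
n <- c(black = 9276 + 2673, white = 1538 + 513)

# Two-sided test of equal proportions; prop.test applies a
# continuity correction by default.
result <- prop.test(successes, n)

result$estimate   # sample proportions for Black and White drivers
result$statistic  # X-squared
result$p.value
```

With these inputs the statistic and p-value should agree with the printed output (X-squared of about 6.80, p of about 0.009).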




The 95% confidence interval for the difference in proportions, p1 − p2 (Black minus White), is (-0.047, -0.006). With 95% confidence, the proportion of searched Black individuals who possess contraband is between 0.006 and 0.047 lower than the proportion of searched White individuals who possess contraband. The interval does not contain 0, which is consistent with statistically significant evidence of a difference.
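The interval can also be derived by hand as the difference in proportions plus or minus a normal-approximation margin, widened by the continuity correction that `prop.test` applies by default. The sketch below assumes the counts from the article's contraband table; the variable names are illustrative.

```r
# Counts assumed from the article's contraband table (Black, White).
x <- c(2673, 513)     # searches that found contraband
n <- c(11949, 2051)   # total searches per group

p     <- x / n
delta <- p[1] - p[2]                  # Black minus White

se <- sqrt(sum(p * (1 - p) / n))      # unpooled standard error
cc <- 0.5 * sum(1 / n)                # continuity correction
z  <- qnorm(0.975)                    # 95% two-sided critical value

ci <- c(lower = delta - (z * se + cc), upper = delta + (z * se + cc))
round(ci, 3)
```

Both endpoints are negative, which is what the statement that the interval excludes 0 refers to.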



Below is the code chunk to check the number of drivers who possess contraband for each race/ethnicity. ‘False’ represents the number of drivers who do not possess contraband, and ‘True’ represents the number of drivers who possess contraband.

(table(searches$subject_race, searches$contraband_found))
##
##                          FALSE TRUE
##   asian/pacific islander   147   48
##   black                   9276 2673
##   hispanic                1202  304
##   other/unknown            137   29
##   white                   1538  513

The success-failure condition is met. For both confidence intervals and hypothesis testing, each racial/ethnic group’s expected number of successes and failures to find contraband is over 10. It is reasonable to assume that the samples are independent.
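One way to check the success-failure condition is to multiply each group's number of searches by the pooled proportion of successful searches and by its complement. The sketch below does this with the counts from the table above; pooling across all five groups is an assumption made for illustration.

```r
# Counts copied from the contraband table (rows: race/ethnicity).
counts <- matrix(
  c(147, 9276, 1202, 137, 1538,   # contraband not found
    48,  2673,  304,  29,  513),  # contraband found
  ncol = 2,
  dimnames = list(
    c("asian/pacific islander", "black", "hispanic", "other/unknown", "white"),
    c("FALSE", "TRUE")
  )
)

n_group <- rowSums(counts)                      # searches per group
pooled  <- sum(counts[, "TRUE"]) / sum(counts)  # pooled success proportion

expected <- cbind(
  success = n_group * pooled,
  failure = n_group * (1 - pooled)
)
round(expected, 1)

# Condition: every expected success and failure count exceeds 10.
all(expected > 10)
```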


Below is the code chunk for a logistic regression analysis that predicts whether a searched driver possesses contraband. The model estimates the association between the police identifying contraband and the driver’s race/ethnicity, adjusting for the confounding variables sex and age.

##
## Call:
## glm(formula = contraband_found ~ subject_race + subject_age +
##     subject_sex, family = binomial(link = "logit"), data = stops.subset)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -0.8486  -0.7355  -0.7030  -0.6152   2.0279
##
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)
## (Intercept)               -0.767719   0.177718  -4.320 1.56e-05 ***
## subject_raceblack         -0.156001   0.168090  -0.928 0.353365
## subject_racehispanic      -0.290847   0.178694  -1.628 0.103605
## subject_raceother/unknown -0.452432   0.263905  -1.714 0.086460 .
## subject_racewhite          0.074991   0.174405   0.430 0.667210
## subject_age               -0.010238   0.001929  -5.308 1.11e-07 ***
## subject_sexfemale         -0.215671   0.058023  -3.717 0.000202 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 16875  on 15827  degrees of freedom
## Residual deviance: 16816  on 15821  degrees of freedom
##   (278232 observations deleted due to missingness)
## AIC: 16830
##
## Number of Fisher Scoring iterations: 4



The slope coefficient for Black race indicates that the estimated log odds of the police identifying contraband for a Black individual are 0.156 lower than for an Asian/Pacific Islander individual, holding age and sex constant.


The slope coefficient for White race indicates that the estimated log odds of the police identifying contraband for a White individual are 0.075 higher than for an Asian/Pacific Islander individual, holding age and sex constant.


The slope coefficient for Black race is negative, and the slope coefficient for White race is positive. This suggests that a White individual is more likely to possess contraband than a Black individual.

The slope coefficient for each race/ethnicity has p-values greater than alpha=0.05. Therefore, there is insufficient evidence to conclude that the police identifying contraband is significantly associated with the driver’s race/ethnicity.
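The same conclusion can be read off approximate 95% Wald intervals, computed as estimate ± z × standard error from the printed summary. This is a sketch for the Black and White coefficients only, with illustrative variable names; each interval compares that group with the Asian/Pacific Islander reference group.

```r
# Estimates and standard errors copied from the printed model summary.
est <- c(black = -0.156001, white = 0.074991)
se  <- c(black =  0.168090, white = 0.174405)

z <- qnorm(0.975)  # 95% two-sided critical value

wald_ci <- cbind(lower = est - z * se, upper = est + z * se)
round(wald_ci, 3)

# An interval that spans 0 matches a p-value above 0.05.
wald_ci[, "lower"] < 0 & wald_ci[, "upper"] > 0
```

Both intervals span 0, in line with the non-significant p-values reported for these coefficients.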


Based on 294,060 vehicular stops in 2017, drawn from data on 1,756,587 police stops in Philadelphia, Pennsylvania, between 2013 and 2017, the results suggest bias against Black drivers. This study examined whether the police’s decision to search a driver is associated with the driver’s race/ethnicity. Black and Hispanic individuals were more likely to be searched than White individuals. The analysis also compared the proportion of searched individuals from each race/ethnicity who possessed contraband. It showed that the proportion of searched White individuals who possessed contraband was higher than that of searched Black individuals. Therefore, it is reasonable to conclude that officers are searching minorities based on less evidence.

Strictly speaking, the results are generalizable to people living in Philadelphia, Pennsylvania. It could also be reasonable to generalize these results to other parts of Pennsylvania because the policing strategies are the same in this state. It would be inappropriate to generalize to all of the other states due to differences in the rate at which stopped drivers are searched. So the police force in other states may not necessarily be biased against minorities.

Translated from: https://towardsdatascience.com/racial-disparities-in-police-stops-and-searches-e58319f278a2




