
The entire quarter-billion-record GDELT Event Database is now available as a public dataset in Google BigQuery.

That sentence opens the release post, and it’s a really big deal.


GDELT

The Global Database of Events, Language and Tone (GDELT) is one of the largest datasets on the planet. It is a quantitative database of human society, built from thousands of news sources from every corner of the globe, with records dating back to 1979.

It is the brainchild of Kalev Leetaru, who is also the author of the Google release post quoted above. GDELT covers every country in the world over roughly a third of a century, with daily updates throughout that period. It holds hundreds of millions of records, each with 59 fields describing in detail the actors involved and the events that took place. Every record is georeferenced, so you can place it on a map, and actors are tagged with their ethnic and religious affiliation. All of this is free and available for your perusal, and you don’t even need your own computing power to handle it.

Google BigQuery, “Google’s powerful cloud-based analytical database service”, is, basically, one of the fastest SQL engines around, and it’s completely free for any and all uses of GDELT. Thanks to the sheer power of BigQuery, you can get results from GDELT queries in near real-time, and no permutation of fields and values you can think of should be enough to grind it to a halt, unless you really mess things up. If you deal with databases in any capacity and the following paragraph from the release post doesn’t send chills down your spine, you’re probably dead:

For us, the most groundbreaking part of having GDELT in BigQuery is that it opens the door not only to fast complex querying and extracting of data, but also allows for the first time real-world analyses to be run entirely in the database. Imagine computing the most significant conflict interaction in the world by month over the past 35 years, or performing cross-tabbed correlation over different classes of relationships between a set of countries. Such queries can be run entirely inside of BigQuery and return in just a handful of seconds. This enables you to try out “what if” hypotheses on global-scale trends in near-real time.
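To make the first of those ideas a little more concrete, here is a rough Python sketch of the aggregation itself, run over a handful of made-up rows rather than the real dataset. The field names (`month`, `actor1`, `actor2`, `is_conflict`) are simplified stand-ins chosen for illustration, not GDELT’s actual columns, and “most significant” is assumed here to simply mean “most frequent”. In BigQuery the same grouping would be written in SQL and run inside the database.

```python
from collections import Counter, defaultdict

# Made-up sample rows for illustration only; these are not real GDELT records.
SAMPLE_EVENTS = [
    {"month": "201401", "actor1": "AAA", "actor2": "BBB", "is_conflict": True},
    {"month": "201401", "actor1": "AAA", "actor2": "BBB", "is_conflict": True},
    {"month": "201401", "actor1": "CCC", "actor2": "DDD", "is_conflict": True},
    {"month": "201401", "actor1": "CCC", "actor2": "DDD", "is_conflict": False},
    {"month": "201402", "actor1": "CCC", "actor2": "DDD", "is_conflict": True},
]


def top_conflict_pair_by_month(events):
    """Return {month: ((actor1, actor2), count)} for the most frequent conflict pair."""
    counts = defaultdict(Counter)
    for event in events:
        if event["is_conflict"]:
            counts[event["month"]][(event["actor1"], event["actor2"])] += 1
    # most_common(1) yields a one-element list holding the (pair, count) leader.
    return {month: pairs.most_common(1)[0] for month, pairs in sorted(counts.items())}


if __name__ == "__main__":
    for month, (pair, count) in top_conflict_pair_by_month(SAMPLE_EVENTS).items():
        print(month, pair, count)
```

The point of the release is that you no longer need to pull hundreds of millions of rows into a script like this one; the grouping and counting happen where the data lives.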

Currently, GDELT on BigQuery is updated daily, but there are plans to move to a near real-time update schedule – updating the dataset every 15 minutes.


Before you get too excited: there is a limit on free usage, but it’s not a quota you’ll easily hit. To read more about free quotas, see here, and keep in mind you can always pay for more if you end up building a commercially viable application on top of this data.

Running a sample query

You can start playing around with GDELT on BigQuery by visiting this URL. You might have to create a new project if you don’t have one already. After gaining access, you should see a screen not unlike the following:

To run the sample query from the release post, click the red “Compose Query” button, paste the SQL into the newly opened textarea and click “Run Query”. Mine took 20 seconds; yours may take anywhere from 5 to 30 seconds, but you should get a result not unlike this one:
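The release post’s sample query isn’t reproduced here, so as a stand-in, here is a small Python sketch that builds a much simpler query you could paste into the same textarea: a count of events per year. The table id `gdelt-bq.full.events` and the `Year` column are assumptions about how the public dataset is named, and the backtick syntax assumes BigQuery’s standard SQL dialect; check both against the schema shown in the BigQuery UI before relying on them.

```python
# Assumed names: verify them against the schema shown in the BigQuery UI.
EVENTS_TABLE = "gdelt-bq.full.events"  # assumed public table id
YEAR_COLUMN = "Year"                   # assumed column holding the event year


def events_per_year_sql(start_year, end_year, table=EVENTS_TABLE):
    """Build a standard-SQL query counting events per year in a closed range."""
    start, end = int(start_year), int(end_year)  # int() rejects non-numeric input
    if start > end:
        raise ValueError("start_year must not be after end_year")
    return (
        f"SELECT {YEAR_COLUMN} AS event_year, COUNT(*) AS event_count\n"
        f"FROM `{table}`\n"
        f"WHERE {YEAR_COLUMN} BETWEEN {start} AND {end}\n"
        f"GROUP BY event_year\n"
        f"ORDER BY event_year"
    )


if __name__ == "__main__":
    print(events_per_year_sql(1979, 2014))
```

Keeping the years as validated integers means the only values interpolated into the SQL text are numbers, which is a reasonable habit even for throwaway exploration.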

Using it with PHP

To see how you can use BigQuery with PHP, stay tuned to SitePoint for articles that target that specific combination; they’re coming in June. For now, you can check out this excellent Lever.rs post, which runs through it in a very approachable manner.

In a nutshell, you need the PHP library Google provides, installed with Composer or through alternative means. Once that’s done, include the library in your project as you normally would, through Composer’s autoload file, and you can start using the API.
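The same install, load, query flow applies whichever client library you pick. As a rough illustration, here it is sketched in Python rather than PHP, assuming the `google-cloud-bigquery` package and its `Client.query(...).result()` calls. The `run_query` and `make_client` helpers are hypothetical names made up for this sketch, and the client is passed in as an argument so that it can be replaced by a stub when no credentials are available.

```python
def run_query(client, sql, max_rows=10):
    """Run sql through a BigQuery-style client and return up to max_rows rows as dicts."""
    job = client.query(sql)   # starts the query job
    rows = []
    for row in job.result():  # result() waits for the job, then iterates its rows
        if len(rows) >= max_rows:
            break
        rows.append(dict(row.items()))
    return rows


def make_client():
    """Create a real client; assumes `pip install google-cloud-bigquery` and credentials."""
    from google.cloud import bigquery  # imported lazily so the sketch loads without it
    return bigquery.Client()


# Usage with real credentials (not executed here):
#   rows = run_query(make_client(), "SELECT 1 AS answer")
```

The PHP version follows the same shape: build a client from your credentials, submit the SQL, then iterate over the returned rows.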

For a full introduction to getting started, obtaining API keys and using Google APIs to access BigQuery and similar services, please see this guide. You can also subscribe to the Google App Engine tag via RSS to be notified of new posts in that category.

Conclusion

The GDELT project has long been an admirable one, but its BigQuery release marks a new milestone: a level of public availability never seen before. Everyone can now query the world’s history, and we can’t wait to see what you build. Judging by what Kalev, the release post’s author, has written, neither can the GDELT team. They’re inviting you to share your queries and experiments with them, and if the results are impressive enough, they just might share them with the world on the official blog. If you do come up with anything stunning, let us know. We’re keen to publish tutorials and analyses on it!




