The Quick, Draw! Dataset

The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!.


Full dataset separated by categories

Sketch-RNN QuickDraw Dataset (the format used in the Sketch-RNN paper)

This data is also used for training the Sketch-RNN model. An open-source TensorFlow implementation of this model is available in the Magenta Project (link to GitHub repo). You can also read more about this model in this Google Research blog post. The data is stored in compressed .npz files, in a format suitable as input to a recurrent neural network.

In this dataset, 75K samples (70K Training, 2.5K Validation, 2.5K Test) have been randomly selected from each category and processed with RDP line simplification with an epsilon parameter of 2.0. Each category is stored in its own .npz file, for example, cat.npz.
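RDP refers to the Ramer-Douglas-Peucker algorithm, which drops points that lie within epsilon of the straight line joining their neighbours. The sketch below is a plain-Python illustration of that idea and is not the preprocessing code used to build the dataset. The function names are hypothetical, and measuring distance to the chord segment (rather than the infinite line) is an assumption of this version.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: both endpoints coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def rdp(points, epsilon=2.0):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point and simplify both halves recursively
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [first, last]

# Made-up stroke: two small wiggles and one sharp peak
stroke = [(0, 0), (1, 0.5), (2, 0), (10, 10), (20, 0)]
simplified = rdp(stroke, epsilon=2.0)
print(simplified)  # [(0, 0), (10, 10), (20, 0)]
```

With epsilon set to 2.0, the two small wiggles are removed and only the peak survives between the endpoints.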

We have also provided the full data for each category, if you want to use more than 70K training examples. These are stored with the .full.npz extension.


Each example in the dataset is stored as a list of coordinate offsets, ∆x and ∆y, plus a binary value indicating whether the pen is lifted away from the paper. This format, which we refer to as stroke-3, is described in this paper. Note that the data format described in the paper has 5 elements (stroke-5 format); the conversion is done automatically inside the DataLoader. Below is an example sketch of a turtle using this format:
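To make the stroke-3 layout concrete, here is a minimal Python sketch (not the turtle example, and not the Magenta DataLoader) that turns offsets back into absolute polylines. It assumes the pen starts at the origin and that a pen value of 1 marks the last point of a stroke; the function name and the sample rows are made up for illustration.

```python
def stroke3_to_polylines(sketch):
    """Convert stroke-3 rows (dx, dy, pen_lifted) into absolute polylines."""
    polylines, current = [], []
    x = y = 0  # assumption: the pen starts at the origin
    for dx, dy, pen_lifted in sketch:
        x += dx
        y += dy
        current.append((x, y))
        if pen_lifted:
            # Pen leaves the paper after this point: close the stroke
            polylines.append(current)
            current = []
    if current:
        # Keep a trailing stroke that was never explicitly closed
        polylines.append(current)
    return polylines

# Made-up sample: two strokes of two points each
sketch = [(10, 0, 0), (0, 10, 1), (5, 5, 0), (5, 0, 1)]
polylines = stroke3_to_polylines(sketch)
print(polylines)  # [[(10, 0), (10, 10)], [(15, 15), (20, 15)]]
```

Because every row is an offset, an error in one row shifts all later points, which is why the running sum is kept across strokes.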



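The .ndjson files, by contrast, store absolute coordinates (see below), so joining the points with straight segments (linear interpolation) directly gives the lines.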

Each line contains one drawing. Here's an example of a single drawing:





"timestamp":"2017-03-01 20:41:36.70725 UTC",




The format of the drawing array is as follows:


[
  [ // First stroke
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...],
    [t0, t1, t2, t3, ...]
  ],
  [ // Second stroke
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...],
    [t0, t1, t2, t3, ...]
  ],
  ... // Additional strokes
]
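As a rough illustration of walking this structure, the Python sketch below pairs up the x and y lists of each stroke and computes a bounding box. The helper names are hypothetical. The timing list is simply ignored, so the same code also works on records that carry only x and y, such as the cat record shown at the end of this post, from which the two sample strokes are taken.

```python
def strokes_to_points(drawing):
    """Turn [[xs, ys, ...], ...] into one list of (x, y) tuples per stroke."""
    return [list(zip(stroke[0], stroke[1])) for stroke in drawing]

def bounding_box(drawing):
    """Return (min_x, min_y, max_x, max_y) over all strokes."""
    xs = [x for stroke in drawing for x in stroke[0]]
    ys = [y for stroke in drawing for y in stroke[1]]
    return min(xs), min(ys), max(xs), max(ys)

# Two strokes copied from the cat record at the end of this post
drawing = [
    [[76, 28, 7], [136, 128, 128]],
    [[175, 255], [147, 168]],
]
points = strokes_to_points(drawing)
box = bounding_box(drawing)
print(points)  # [[(76, 136), (28, 128), (7, 128)], [(175, 147), (255, 168)]]
print(box)     # (7, 128, 255, 168)
```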


There is an example in examples/nodejs/simplified-parser.js showing how to read ndjson files in NodeJS.
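For comparison, the same files could probably be read from Python without going through NodeJS, since each line is one JSON object. This is only a sketch under that assumption: the function name is hypothetical, and the two sample records are made up and much shorter than real ones.

```python
import io
import json

def read_ndjson(stream):
    """Yield one parsed object per non-empty line of an ndjson stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Made-up stand-in for an opened .ndjson file
sample = io.StringIO(
    '{"word": "cat", "recognized": true, "drawing": [[[0, 10], [0, 5]]]}\n'
    '{"word": "cat", "recognized": false, "drawing": [[[3, 4], [1, 2]]]}\n'
)
drawings = list(read_ndjson(sample))
print(len(drawings))            # 2
print(drawings[0]["drawing"])   # [[[0, 10], [0, 5]]]
```

For a real file, the StringIO object would be replaced by an open file handle. Iterating lazily keeps memory use low, which matters because a single category file holds a very large number of drawings.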


git clone

cd nvm


source ./

nvm install v6.2.2


The provided simplified-parser.js can parse the files directly. I save the parsed data as JSON so that Python can read it. Run it with: node simplified-parser.js

var fs = require('fs');
var ndjson = require('ndjson'); // npm install ndjson

// Read an .ndjson file and collect every parsed drawing into an array.
function parseSimplifiedDrawings(fileName, callback) {
  var drawings = [];
  var fileStream = fs.createReadStream(fileName);
  fileStream
    .pipe(ndjson.parse())
    .on('data', function(obj) {
      drawings.push(obj);
    })
    .on("error", callback)
    .on("end", function() {
      callback(null, drawings);
    });
}

parseSimplifiedDrawings("dataset_path/full_simplified_cat.ndjson", function(err, drawings) {
  if (err) return console.error(err);
  drawings.forEach(function(d) {
    // Do something with the drawing
    console.log(d.key_id, d.countrycode);
  });
  console.log("# of drawings:", drawings.length);
  // Save the parsed drawings as a single JSON array so Python can load them
  var filename = "dataset_path/full_simplified_cat.json";
  fs.writeFileSync(filename, JSON.stringify(drawings));
});


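import json
import pylab as pl

# Load the JSON array written by simplified-parser.js
with open("dataset_path/full_simplified_cat.json") as f:
    setting = json.load(f)

for j in range(0, 10):    # try the first 10 drawings
    for i in range(0, len(setting[j]['drawing'])):
        x = setting[j]['drawing'][i][0]
        y = setting[j]['drawing'][i][1]
        # pl.plot joins consecutive points with straight segments, i.e. linear interpolation.
        # The interp1d call is dropped: x is not monotonic within a stroke, so it is not a function of x.
        pl.plot(x, y, 'k')
    ax = pl.gca()  # all strokes of one cat are drawn on the same axes
    ax.xaxis.set_ticks_position('top')  # without these ax lines the cat comes out upside down
    ax.invert_yaxis()  # assumed reconstruction of a missing line: image y grows downward
    pl.savefig("dataset_path/cat_%d.png" % j)  # hypothetical output name; the original line is missing
    pl.close()  # without closing, every drawing ends up in the same figure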

Before conversion, the data looks like this (for example, the first drawing): [{"word":"cat","countrycode":"VE","timestamp":"2017-03-02 23:25:10.07453 UTC","recognized":true,"key_id":"5201136883597312","drawing":[[[130,113,99,109,76,64,55,48,48,51,59,86,133,154,170,203,214,217,215,208,186,176,162,157,132],[72,40,27,79,82,88,100,120,134,152,165,184,189,186,179,152,131,114,100,89,76,0,31,65,70]],[[76,28,7],[136,128,128]],[[76,23,0],[160,164,175]],[[87,52,37],[175,191,204]],[[174,220,246,251],[134,132,136,139]],[[175,255],[147,168]],[[171,208,215],[164,198,210]],[[130,110,108,111,130,139,139,119],[129,134,137,144,148,144,136,130]],[[107,106],[96,113]]]},








