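I had already finished a basic PySpider project: the handler simply returned the data, and after a run I clicked results and exported a CSV.

Now I want to save the data to MySQL.

First I checked the official tutorial.

Does this mean I first have to create the database and table structure in the local MySQL server, and only then can the data be saved?

It looks, though, as if the data is saved by way of a ResultWorker.

Searched for: pyspider save to mysql

The earlier answers all override on_result, which seems to be the old approach.

Overriding ResultWorker is the new approach.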

At the same time, start the MySQL server:

➜  ~ /usr/local/mysql/support-files/mysql.server status
ERROR! MySQL is not running
➜  ~ /usr/local/mysql/support-files/mysql.server start
Starting MySQL
. SUCCESS!
➜  ~ /usr/local/mysql/support-files/mysql.server status
SUCCESS! MySQL running (61419)

Then create the corresponding MySQL databases:
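The databases can also be created from Python rather than from a GUI client. The following is only a minimal sketch, assuming PyMySQL is installed and that the three database names are the ones used in config.json further down; the helper names build_create_database_sql and create_databases are hypothetical and not part of pyspider:

```python
def build_create_database_sql(names, charset="utf8"):
    """Return one CREATE DATABASE statement per name, with backtick-quoted identifiers."""
    statements = []
    for name in names:
        quoted = "`%s`" % name.replace("`", "``")
        statements.append(
            "CREATE DATABASE IF NOT EXISTS %s DEFAULT CHARACTER SET %s" % (quoted, charset)
        )
    return statements


def create_databases(names, **connect_kwargs):
    """Run the statements against a MySQL server (assumption: PyMySQL is available)."""
    import pymysql  # imported lazily so the SQL builder works without the driver

    connection = pymysql.connect(**connect_kwargs)
    try:
        with connection.cursor() as cursor:
            for statement in build_create_database_sql(names):
                cursor.execute(statement)
        connection.commit()
    finally:
        connection.close()


if __name__ == "__main__":
    # Only prints the SQL; call create_databases(...) with your own credentials to run it.
    for statement in build_create_database_sql(
        ["AutohomeTaskdb", "AutohomeProjectdb", "AutohomeResultdb"]
    ):
        print(statement)
```

To actually execute the statements, something like create_databases([...], host="127.0.0.1", port=3306, user="root", password="...") should work, with the credentials replaced by your own.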

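Before writing any code against the database, a few things need to be figured out first.

Then write the config and the custom worker.

First create the config file: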

config.json

{
  "taskdb":     "mysql+taskdb://root:crifan_mysql@127.0.0.1:3306/AutohomeTaskdb",
  "projectdb":  "mysql+projectdb://root:crifan_mysql@127.0.0.1:3306/AutohomeProjectdb",
  "resultdb":   "mysql+resultdb://root:crifan_mysql@127.0.0.1:3306/AutohomeResultdb",
  "result_worker": {
    "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
  }
}
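Before starting pyspider it can help to check that the file is valid JSON and that each connection string splits into the expected parts. The sketch below uses only the standard library; the helper names describe_db_url and check_config are hypothetical. It assumes the URL layout engine+dbtype://user:password@host:port/database. The result_cls value reads as module.ClassName, so AutohomeResultWorker.py presumably has to be importable from the directory where pyspider is started.

```python
import json
from urllib.parse import urlsplit

# Same content as the config.json above.
CONFIG_TEXT = """
{
  "taskdb":    "mysql+taskdb://root:crifan_mysql@127.0.0.1:3306/AutohomeTaskdb",
  "projectdb": "mysql+projectdb://root:crifan_mysql@127.0.0.1:3306/AutohomeProjectdb",
  "resultdb":  "mysql+resultdb://root:crifan_mysql@127.0.0.1:3306/AutohomeResultdb",
  "result_worker": {
    "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
  }
}
"""


def describe_db_url(url):
    """Split an engine+dbtype://user:password@host:port/database URL into its parts."""
    parts = urlsplit(url)
    engine, _, dbtype = parts.scheme.partition("+")
    return {
        "engine": engine,
        "dbtype": dbtype,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


def check_config(text):
    """Parse the JSON and return {key: parsed URL} for the three database entries."""
    config = json.loads(text)  # raises ValueError on malformed JSON
    return {key: describe_db_url(config[key]) for key in ("taskdb", "projectdb", "resultdb")}


if __name__ == "__main__":
    for key, info in check_config(CONFIG_TEXT).items():
        print(key, info)
```

A typo such as a missing comma then shows up as a ValueError here, before pyspider is started.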

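Then try running it.

It failed with an error.

In the end it turned out that pyspider has to be started with:

pyspider -c config.json

That makes sure the webui is running, so the browser can open it.

After that the MySQL problem was gone, but a different problem showed up.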

Once that was solved, the code is:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Project: autohomeBrandData
# Function: implement custom result worker for autohome car data
# Author: Crifan Li
# Date: 20180512
# Note:
#   If you want to adapt this to your own mysql and table, you need to:
#   (1) change MysqlDb config to your mysql config
#   (2) change CurrentTableName to your table name
#   (3) change CreateTableSqlTemplate to the sql that creates your mysql table fields
#   (4) before using this ResultWorker, run this py file to execute testMysqlDb, to init db and create table
#   (5) if your table fields contain more types, edit insert to add more types at "TODO: add more type formatting if necessary"

import pymysql
import pymysql.cursors
from pyspider.result import ResultWorker

CurrentTableName = "tbl_autohome_car_info"

CreateTableSqlTemplate = """CREATE TABLE IF NOT EXISTS `%s` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '自增,主键',
  `cityDealerPrice` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '经销商参考价',
  `msrpPrice` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '厂商指导价',
  `mainBrand` char(20) NOT NULL DEFAULT '' COMMENT '品牌',
  `subBrand` varchar(20) NOT NULL DEFAULT '' COMMENT '子品牌',
  `brandSerie` varchar(20) NOT NULL DEFAULT '' COMMENT '车系',
  `brandSerieId` varchar(15) NOT NULL DEFAULT '' COMMENT '车系ID',
  `model` varchar(50) NOT NULL DEFAULT '' COMMENT '车型',
  `modelId` varchar(15) NOT NULL DEFAULT '' COMMENT '车型ID',
  `modelStatus` char(5) NOT NULL DEFAULT '' COMMENT '车型状态',
  `url` varchar(200) NOT NULL DEFAULT '' COMMENT '车型url',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;"""

class AutohomeResultWorker(ResultWorker):
    def __init__(self, resultdb, inqueue):
        """init mysql db"""
        print("AutohomeResultWorker init: resultdb=%s, inqueue=%s" % (resultdb, inqueue))
        ResultWorker.__init__(self, resultdb, inqueue)

        self.mysqlDb = MysqlDb()
        print("self.mysqlDb=%s" % self.mysqlDb)

    def on_result(self, task, result):
        """override pyspider on_result to save data into mysql"""
        # assert task['taskid']
        # assert task['project']
        # assert task['url']
        # assert result
        print("AutohomeResultWorker on_result: task=%s, result=%s" % (task, result))
        insertOk = self.mysqlDb.insert(result)
        print("insertOk=%s" % insertOk)

class MysqlDb:
    config = {
        'host': '127.0.0.1',
        'port': 3306,
        'user': 'root',
        'password': 'crifan_mysql',
        'database': 'AutohomeResultdb',
        'charset': "utf8"
    }

    defaultTableName = CurrentTableName
    connection = None

    def __init__(self):
        """init mysql"""
        # 1. connect db first
        if self.connection is None:
            isConnected = self.connect()
            print("Connect mysql return %s" % isConnected)

        # 2. create table for db
        createTableOk = self.createTable(self.defaultTableName)
        print("Create table %s return %s" %(self.defaultTableName, createTableOk))

    def connect(self):
        try:
            self.connection = pymysql.connect(**self.config, cursorclass=pymysql.cursors.DictCursor)
            print("connect mysql ok, self.connection=", self.connection)
            return True
        except pymysql.Error as err:
            print("Connect mysql with config=", self.config, " error=", err)
            return False

    def quoteIdentifier(self, identifier):
        """
        for mysql, it is better to quote identifier xxx with backticks as `xxx`
        in case the identifier:
            contains a special char, such as a space
            or is the same as a reserved word, like select
        """
        quotedIdentifier = "`%s`" % identifier
        # print("quotedIdentifier=", quotedIdentifier)
        return quotedIdentifier

    def executeSql(self, sqlStr, actionDescription=""):
        print("executeSql: sqlStr=%s, actionDescription=%s" % (sqlStr, actionDescription))

        if self.connection is None:
            print("Please connect mysql first before %s" % actionDescription)
            return False

        cursor = self.connection.cursor()
        print("cursor=", cursor)

        try:
            cursor.execute(sqlStr)
            self.connection.commit()
            return True
        except pymysql.Error as err:
            print("Execute sql %s occur error %s for %s" % (sqlStr, err, actionDescription))
            return False

    def createTable(self, newTablename):
        print("createTable: newTablename=", newTablename)

        createTableSql = CreateTableSqlTemplate % (newTablename)
        print("createTableSql=", createTableSql)

        return self.executeSql(sqlStr=createTableSql, actionDescription=("Create table %s" % newTablename))

    def dropTable(self, existedTablename):
        print("dropTable: existedTablename=", existedTablename)

        dropTableSql = "DROP TABLE IF EXISTS %s" % (existedTablename)
        print("dropTableSql=", dropTableSql)

        return self.executeSql(sqlStr=dropTableSql, actionDescription=("Drop table %s" % existedTablename))

    # def insert(self, **valueDict):
    def insert(self, valueDict, tablename=defaultTableName):
        """
        insert dict value into mysql table
        make sure the value is a dict, and its keys are column names of the table
        """
        print("insert: valueDict=%s, tablename=%s" % (valueDict, tablename))

        dictKeyList = valueDict.keys()
        dictValueList = valueDict.values()
        print("dictKeyList=", dictKeyList, "dictValueList=", dictValueList)

        keyListSql = ", ".join(self.quoteIdentifier(eachKey) for eachKey in dictKeyList)
        print("keyListSql=", keyListSql)

        # valueListSql = ", ".join(eachValue for eachValue in dictValueList)
        valueListSql = ""
        formattedDictValueList = []
        for eachValue in dictValueList:
            # print("eachValue=", eachValue)
            eachValueInSql = ""
            valueType = type(eachValue)
            # print("valueType=", valueType)
            # NOTE: values are interpolated into the SQL text as-is, so a string
            # containing a double quote will break the statement
            if valueType is str:
                eachValueInSql = '"%s"' % eachValue
            elif valueType is int:
                eachValueInSql = '%d' % eachValue
            # TODO: add more type formatting if necessary
            print("eachValueInSql=", eachValueInSql)
            formattedDictValueList.append(eachValueInSql)

        valueListSql = ", ".join(eachValue for eachValue in formattedDictValueList)
        print("valueListSql=", valueListSql)

        insertSql = """INSERT INTO %s (%s) VALUES (%s)""" % (tablename, keyListSql, valueListSql)
        print("insertSql=", insertSql)
        # INSERT INTO tbl_car_info_test (`url`, `mainBrand`, `subBrand`, `brandSerie`, `brandSerieId`, `model`, `modelId`, `modelStatus`, `cityDealerPrice`, `msrpPrice`) VALUES ("https://www.autohome.com.cn/spec/5872/#pvareaid=2042128", "宝马", "华晨宝马", "宝马3系", "66", "2010款 320i 豪华型", "5872", "停售", 325000, 375000)

        return self.executeSql(sqlStr=insertSql, actionDescription=("Insert value to table %s" % tablename))

    def delete(self, modelId, tablename=defaultTableName):
        """
        delete item by car model id from existing table of autohome car info
        """
        print("delete: modelId=%s, tablename=%s" % (modelId, tablename))

        # modelId is a varchar column, so compare against a quoted string
        deleteSql = """DELETE FROM %s WHERE modelId = "%s" """ % (tablename, modelId)
        print("deleteSql=", deleteSql)

        return self.executeSql(sqlStr=deleteSql, actionDescription=("Delete value from table %s by model id %s" % (tablename, modelId)))

def testMysqlDb():
    """test mysql"""

    testDropTable = True
    testCreateTable = True
    testInsertValue = True
    testDeleteValue = True

    # 1. test connect mysql
    mysqlObj = MysqlDb()
    print("mysqlObj=", mysqlObj)

    # testTablename = "autohome_car_info"
    # testTablename = "tbl_car_info_test"
    testTablename = CurrentTableName
    print("testTablename=", testTablename)

    if testDropTable:
        # 2. test drop table
        dropTableOk = mysqlObj.dropTable(testTablename)
        print("dropTable", testTablename, "return", dropTableOk)

    if testCreateTable:
        # 3. test create table
        createTableOk = mysqlObj.createTable(testTablename)
        print("createTable", testTablename, "return", createTableOk)

    if testInsertValue:
        # 4. test insert value dict
        valueDict = {
            "url": "https://www.autohome.com.cn/spec/5872/#pvareaid=2042128",  # model url
            "mainBrand": "宝马",  # brand
            "subBrand": "华晨宝马",  # sub-brand
            "brandSerie": "宝马3系",  # car series
            "brandSerieId": "66",  # car series ID
            "model": "2010款 320i 豪华型",  # model
            "modelId": "5872",  # model ID
            "modelStatus": "停售",  # model status
            "cityDealerPrice": 325000,  # dealer reference price
            "msrpPrice": 375000  # manufacturer's suggested retail price
        }
        print("valueDict=", valueDict)
        insertOk = mysqlObj.insert(valueDict=valueDict, tablename=testTablename)
        print("insertOk=", insertOk)

    if testDeleteValue:
        # 5. test delete value by model id
        toDeleteModelId = "5872"
        deleteOk = mysqlObj.delete(modelId=toDeleteModelId, tablename=testTablename)
        print("deleteOk=", deleteOk)


def testAutohomeResultWorker():
    """just test for create mysql db is ok or not"""
    autohomeResultWorker = AutohomeResultWorker(None, None)
    print("autohomeResultWorker=%s" % autohomeResultWorker)


if __name__ == '__main__':
    testMysqlDb()
    # testAutohomeResultWorker()
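The insert method above builds the SQL text by string interpolation, so a value containing a double quote (or any type other than str and int) produces a broken statement. One alternative, shown here only as a sketch and not as the code this post actually ran, is to let the driver bind the values through %s placeholders; build_insert is a hypothetical helper name:

```python
def build_insert(tablename, value_dict):
    """Return (sql, params) for a parameterized INSERT.

    Identifiers are backtick-quoted; values are left for the driver to escape.
    """
    columns = list(value_dict.keys())
    column_sql = ", ".join("`%s`" % column.replace("`", "``") for column in columns)
    placeholder_sql = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (
        tablename.replace("`", "``"),
        column_sql,
        placeholder_sql,
    )
    params = [value_dict[column] for column in columns]
    return sql, params


if __name__ == "__main__":
    sql, params = build_insert(
        "tbl_autohome_car_info",
        {"model": '2010款 "320i" 豪华型', "modelId": "5872", "msrpPrice": 375000},
    )
    print(sql)
    print(params)
```

With a PyMySQL connection, the pair would then be passed as cursor.execute(sql, params) followed by connection.commit(), which takes care of quoting and escaping for each value.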

Run it:

pyspider -c config.json

But after it had been running for a long time, an error occurred.

Along the way I also noticed another issue.
