Preface

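This post records a small Scrapy project that crawls the Maoyan board page (https://maoyan.com/board/4), extracts each movie's name, actors, and release time, and saves the results to a CSV file and a MySQL table.

The full code is listed below and can be used as a reference.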

I. Requirements

II. Steps

1. Import libraries

The code is as follows (example):

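import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import ssl

# Suppress all warnings (this also hides deprecation notices)
warnings.filterwarnings('ignore')
# Disable HTTPS certificate verification globally (insecure; local experiments only)
ssl._create_default_https_context = ssl._create_unverified_context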

2. maoyanspider.py

# -*- coding: utf-8 -*-
import urllib.parse

import scrapy

from ..items import MaoyanItem


class MaoyanspiderSpider(scrapy.Spider):
    name = 'maoyanspider'
    allowed_domains = ['maoyan.com']
    start_urls = ['https://maoyan.com/board/4']

    def parse(self, response):
        # Each <dd> under the board wrapper is one movie entry
        dls = response.xpath("//dl[@class='board-wrapper']/dd")
        for dl in dls:
            # Shared path to the block holding name, actors, and release time
            info = dl.xpath("div[@class='board-item-main']"
                            "/div[@class='board-item-content']"
                            "/div[@class='movie-item-info']")
            item = MaoyanItem()
            item['name'] = info.xpath("p[@class='name']/a/text()").extract_first()
            # default='' avoids calling strip() on None when the node is missing
            item['actors'] = info.xpath("p[@class='star']/text()").extract_first(default='').strip()
            item['releasetime'] = info.xpath("p[@class='releasetime']/text()").extract_first()
            yield item

        # Follow the pager link whose text contains "下一页" ("next page")
        next_page = response.xpath(
            '//div[@class="pager-main"]/ul/li/a[contains(text(), "下一页")]/@href'
        ).extract_first()
        if next_page is not None:
            new_link = urllib.parse.urljoin(response.url, next_page)
            yield scrapy.Request(new_link, callback=self.parse)
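
The pagination step in `parse` relies on `urllib.parse.urljoin` to turn the pager's `href` into an absolute URL. The sketch below isolates that step using only the standard library. The helper name `next_page_url` and the sample `href` values such as `?offset=10` are assumptions made for illustration; they are not taken from the spider or confirmed against the live site.

```python
from urllib.parse import urljoin


def next_page_url(current_url, href):
    """Resolve a pager href against the URL of the page it was found on.

    Returns None when there is no next-page link, which mirrors the
    `if next_page is not None` check in the spider.
    """
    if href is None:
        return None
    return urljoin(current_url, href)


if __name__ == "__main__":
    base = "https://maoyan.com/board/4"
    # Hypothetical query-only href: the base path is kept, the query is replaced
    print(next_page_url(base, "?offset=10"))
    # Hypothetical absolute-path href: resolved against the site root
    print(next_page_url(base, "/board/4?offset=20"))
    # No link found on the last page
    print(next_page_url(base, None))
```

Inside a spider, `response.urljoin(next_page)` does the same join against `response.url`, so either form should work.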

3. items.py

import scrapy


class MaoyanItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    actors = scrapy.Field()
    releasetime = scrapy.Field()

4. pipelines.py

import csv

import pymysql


class MaoyanPipeline(object):
    """Append each item as one row of maoyan.csv."""

    def process_item(self, item, spider):
        data_list = [item['name'], item['actors'], item['releasetime']]
        # Column names matching data_list (header writing is left disabled below)
        head = ('name', 'actors', 'releasetime')
        with open('maoyan.csv', 'a+', encoding='utf-8', newline='') as file:
            writer = csv.writer(file)
            # writer.writerow(head)  # write the header row, i.e. the column titles
            writer.writerow(data_list)
        return item


class MaoyanMysqlPipeline(object):
    """Insert each item into the MySQL table `maoyan`."""

    def open_spider(self, spider):
        print('Spider started')
        self.db = pymysql.connect(host='localhost', user='root', password='123456',
                                  database='test', port=3306, charset='utf8')
        # Cursor object used to execute statements
        self.cursor = self.db.cursor()
        # Opening in 'w' mode truncates maoyan.csv at the start of each run
        self.df = open("maoyan.csv", "w", newline="")

    def process_item(self, item, spider):
        t = (item['name'], item['actors'], item['releasetime'])
        # Assumes the maoyan table has exactly three columns, in this order
        sql = 'insert into maoyan values (%s, %s, %s)'
        self.cursor.execute(sql, t)
        self.db.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.db.close()
        self.df.close()
        print('Spider finished')
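
Scrapy only runs pipelines that are registered in the project's `settings.py` under `ITEM_PIPELINES`. The post does not show that file, so the fragment below is a minimal sketch: the package name `maoyan` is an assumption (use your own project's package name), and the priority numbers are arbitrary examples. Lower numbers run first.

```python
# settings.py (fragment)
# "maoyan" is an assumed project package name; replace it with your own.
ITEM_PIPELINES = {
    # Runs first: appends the item to maoyan.csv
    "maoyan.pipelines.MaoyanPipeline": 300,
    # Runs second: inserts the item into the MySQL table
    "maoyan.pipelines.MaoyanMysqlPipeline": 400,
}
```

Each pipeline's `process_item` returns the item, so the item is passed on to the next pipeline in priority order.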

