python3爬取豆瓣最火电影

首先我们需要导入的有requests

bs4和BeautifulSoup

还有就是导入xlwt这个是excel文件

import requests
from bs4 import BeautifulSoup
import xlwtdef request_douban(url):try:response = requests.get(url)if response.status_code == 200:return response.textexcept requests.RequestException:return None
# 创建一个 excel 的 sheet
#
#
#
# 每一列就是我们要的关键内容book = xlwt.Workbook(encoding='utf-8', style_compression=0)sheet = book.add_sheet('豆瓣电影Top1000', cell_overwrite_ok=True)
sheet.write(0, 0, '名称')
sheet.write(0, 1, '图片')
sheet.write(0, 2, '排名')
sheet.write(0, 3, '评分')
sheet.write(0, 4, '作者')
sheet.write(0, 5, '简介')n = 1def save_to_excel(soup):list = soup.find(class_='grid_view').find_all('li')for item in list:item_name = item.find(class_='title').stringitem_img = item.find('a').find('img').get('src')item_index = item.find(class_='').stringitem_score = item.find(class_='rating_num').stringitem_author = item.find('p').textif (item.find(class_='inq') != None):item_intr = item.find(class_='inq').string# print('爬取电影：' + item_index + ' | ' + item_name +' | ' + item_img +' | ' + item_score +' | ' + item_author +' | ' + item_intr )print('爬取电影：' + item_index + ' | ' + item_name + ' | ' + item_score + ' | ' + item_intr)# global只做全局变量的声明，而不是一般的执行语句global nsheet.write(n, 0, item_name)sheet.write(n, 1, item_img)sheet.write(n, 2, item_index)sheet.write(n, 3, item_score)sheet.write(n, 4, item_author)sheet.write(n, 5, item_intr)n = n + 1def main(page):url = 'https://movie.douban.com/top250?start=' + str(page * 25) + '&filter='html = request_douban(url)soup = BeautifulSoup(html, 'lxml')save_to_excel(soup)if __name__ == '__main__':for i in range(0, 40):main(i)book.save(u'豆瓣最受欢迎的1000部电影.xlsx')

python3爬取豆瓣最火电影相关推荐

Python3 爬取豆瓣电影信息
原文链接: Python3 爬取豆瓣电影信息上一篇: python3 爬取电影信息下一篇: neo4j 查询豆瓣api https://developers.douban.com/wiki/?t ...
第一次写爬虫程序爬取豆瓣5W条电影数据
第一次写爬虫程序爬取豆瓣5W条电影数据最近工作比较不是很忙,想到之前使用httpclient和jsoup爬取过一次豆瓣电影TOP250,但总觉得数据量太小,不过瘾.于是趁着最近不是很忙的机会,重新写 ...
Python爬取豆瓣热映电影
Python爬取豆瓣热映电影 # encoding: utf-8import requests from lxml import etree# 1. 将目标网站上的页面抓取下来 headers = { ...
Python爬虫小白教程（二）—— 爬取豆瓣评分TOP250电影
文章目录前言安装bs4库网站分析获取页面爬取页面页面分析其他页面爬虫系列前言经过上篇博客Python爬虫小白教程(一)-- 静态网页抓取后我们已经知道如何抓取一个静态的页面了,现在 ...
使用python3爬取豆瓣电影top250
经过一个多星期的学习,对python3的语法有了一定了解,马上动手做了一个爬虫,检验学习效果目标爬取豆瓣电影top250中每一部电影的名称.排名.链接.名言.评分准备工作运行平台:window ...
使用python3 爬取豆瓣电影热映和即将上映
使用python3爬取都摆即将上映和正在热映的电影,代码如下直接使用bs4获取页面,使用css 获取到对应的信息后,使用字符串拼接的方式,将正在热映和即将上映的信息拼接出来并写入到html页面中,在 ...
用Python爬取豆瓣首页所有电影名称、每部电影影评及生成词云
1.爬取环境: window 7 Chrome 浏览器注册豆瓣.注册超级鹰 2.安装第三方库:安装第三方库: 主程序用到的库有 import sys, time import pytesseract ...
Python 爬取周杰伦歌曲信息，爬取豆瓣top250的电影，并保存至excel中
使用requests.BeautifulSoup模块,在网上爬取信息.有的网页可以直接爬取到,有些则需要分步加载,这时就需要使用network进行分析找到信息对应的请求. 有的会反爬虫,则需要添加he ...
python爬虫实践之爬取豆瓣高评分电影
目录概述准备所需模块涉及知识点运行效果完成爬虫 1. 分析网页 2. 爬虫代码 3. 整理总结概述爬取豆瓣的高评分的电影. 准备所需模块 re模块 requests模块涉及知识点 ...
python scrapy爬取豆瓣即将上映电影用邮件定时推送给自己
本文不是python.scrapy的教程,而是分享一个好玩的点子. python教程请看python教程,scrapy教程请看scrapy教程爬取豆瓣高分电影教程参考python爬虫入门笔记:用sc ...

python3爬取豆瓣最火电影

python3爬取豆瓣最火电影相关推荐

最新文章

热门文章