In Python 3.x, use urllib.request to fetch network resources.
The simplest approach:
#coding=utf-8
import urllib.request
response = urllib.request.urlopen('http://python.org/')
buff = response.read()
# decode the bytes and display
html = buff.decode("utf8")
response.close()
print(html)
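On Python 3 the response object is also a context manager, so the read/close pair above can be written more safely with a `with` statement. A minimal sketch — the `data:` URL is just a network-free stand-in for a real address like `http://python.org/`:

```python
import urllib.request

# a data: URL stands in for a real http:// address so the sketch runs offline
with urllib.request.urlopen('data:text/plain;charset=utf-8,hello') as response:
    html = response.read().decode('utf-8')  # response is closed automatically

print(html)  # → hello
```

The `with` form guarantees the connection is released even if `read()` or `decode()` raises.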
Using a Request object:
#coding=utf-8
import urllib.request
req = urllib.request.Request('http://www.voidspace.org.uk')
response = urllib.request.urlopen(req)
buff = response.read()
# decode the bytes and display
the_page = buff.decode("utf8")
response.close()
print(the_page)
The same approach also works for other URL schemes, such as FTP:
#coding=utf-8
import urllib.request
req = urllib.request.Request('ftp://ftp.pku.edu.cn/')
response = urllib.request.urlopen(req)
buff = response.read()
# decode the bytes and display
the_page = buff.decode("utf8")
response.close()
print(the_page)
Sending a POST request:
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
# urlopen requires the POST body to be bytes, so encode the urlencoded string
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()
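On Python 3, `urlopen` rejects a `str` body — the urlencoded string must be encoded to bytes first. A `Request` built this way can be inspected offline before it is ever sent (a sketch reusing the placeholder URL from the example):

```python
import urllib.parse
import urllib.request

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request('http://www.someserver.com/cgi-bin/register.cgi', data)

print(req.get_method())  # → POST  (a Request with a body defaults to POST)
print(req.data)          # the raw bytes that will be sent as the body
```

Without a `data` argument the same `Request` would report `GET`.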
Sending a GET request:
import urllib.request
import urllib.parse
data = {}
data['name'] = 'Somebody Here'
data['location'] = 'Northampton'
data['language'] = 'Python'
url_values = urllib.parse.urlencode(data)
print(url_values)
# → name=Somebody+Here&location=Northampton&language=Python  (dicts preserve insertion order on Python 3.7+)
url = 'http://www.example.com/example.cgi'
full_url = url + '?' + url_values
data = urllib.request.urlopen(full_url)
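To check a hand-built query string, `urllib.parse.parse_qs` decodes it back into a dict — a small self-contained sketch:

```python
import urllib.parse

data = {'name': 'Somebody Here', 'location': 'Northampton', 'language': 'Python'}
url_values = urllib.parse.urlencode(data)

# parse_qs reverses urlencode, mapping each key to a list of values
parsed = urllib.parse.parse_qs(url_values)
print(parsed['name'])  # → ['Somebody Here']
```

Values come back as lists because a query string may repeat a key.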
Adding headers:
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}
# the POST body must again be encoded to bytes
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()
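Headers can also be attached after construction with `Request.add_header`, which is convenient when they are computed later. A sketch with the same placeholder URL — note that `Request` normalizes header names internally (it capitalizes only the first letter), so look them up the same way:

```python
import urllib.request

req = urllib.request.Request('http://www.someserver.com/cgi-bin/register.cgi')
req.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')

# Request stores header names capitalized, e.g. 'User-agent'
print(req.get_header('User-agent'))
```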
Error handling:
import urllib.request
import urllib.error

req = urllib.request.Request('http://www.pretend_server.org')
try:
    urllib.request.urlopen(req)
except urllib.error.URLError as e:
    print(e.reason)
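`urllib.error.HTTPError` is a subclass of `URLError` that additionally carries the server's status code, so a more complete handler checks it first. A sketch — the `data:` URL only demonstrates the success path without touching the network:

```python
import urllib.request
import urllib.error

def fetch(url):
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # the server answered, but with an error status (404, 500, ...)
        print('HTTP error:', e.code)
    except urllib.error.URLError as e:
        # the server could not be reached at all (bad DNS, connection refused, ...)
        print('failed to reach server:', e.reason)

# a network-free data: URL demonstrates the success path
print(fetch('data:text/plain,ok'))  # → b'ok'
# fetch('http://www.pretend_server.org') would take the URLError branch
```

Because `HTTPError` subclasses `URLError`, the more specific `except` clause must come first.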
The response codes that may be returned:
# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
    100: ('Continue', 'Request received, please continue'),
    101: ('Switching Protocols',
          'Switching to new protocol; obey Upgrade header'),
    200: ('OK', 'Request fulfilled, document follows'),
    201: ('Created', 'Document created, URL follows'),
    202: ('Accepted', 'Request accepted, processing continues off-line'),
    203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
    204: ('No Content', 'Request fulfilled, nothing follows'),
    205: ('Reset Content', 'Clear input form for further input.'),
    206: ('Partial Content', 'Partial content follows.'),
    300: ('Multiple Choices', 'Object has several resources -- see URI list'),
    301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
    302: ('Found', 'Object moved temporarily -- see URI list'),
    303: ('See Other', 'Object moved -- see Method and URL list'),
    304: ('Not Modified', 'Document has not changed since given time'),
    305: ('Use Proxy',
          'You must use proxy specified in Location to access this resource.'),
    307: ('Temporary Redirect', 'Object moved temporarily -- see URI list'),
    400: ('Bad Request', 'Bad request syntax or unsupported method'),
    401: ('Unauthorized', 'No permission -- see authorization schemes'),
    402: ('Payment Required', 'No payment -- see charging schemes'),
    403: ('Forbidden', 'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed', 'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required',
          'You must authenticate with this proxy before proceeding.'),
    408: ('Request Timeout', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone', 'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable', 'Cannot satisfy request range.'),
    417: ('Expectation Failed', 'Expect condition could not be satisfied.'),
    500: ('Internal Server Error', 'Server got itself in trouble'),
    501: ('Not Implemented', 'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service Unavailable',
          'The server cannot process the request due to a high load'),
    504: ('Gateway Timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
}
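You don't have to maintain such a table by hand: the standard library ships an equivalent mapping in `http.client.responses`, keyed by status code:

```python
import http.client

# code -> short reason phrase, maintained by the standard library
print(http.client.responses[404])  # → Not Found
print(http.client.responses[503])  # → Service Unavailable
```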
Reblogged from: https://www.cnblogs.com/mmbbflyer/p/6340375.html