In Python 3.x, you fetch network resources with urllib.request.

The simplest approach:

#coding=utf-8
import urllib.request
response = urllib.request.urlopen('http://python.org/')
buff = response.read()
# decode the raw bytes for display
html = buff.decode("utf8")
response.close()
print(html)
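read() returns bytes, so a decode step is always needed. When a page's encoding is not guaranteed to be UTF-8, a hedged sketch is to decode with errors="replace" so bad bytes do not abort the program (the sample bytes below are illustrative, standing in for response.read()):

```python
# read() returns bytes; decode them explicitly.
# errors="replace" keeps the program running when the page
# is not valid UTF-8 (invalid bytes become U+FFFD).
raw = b'hello \xff world'   # sample bytes standing in for response.read()
text = raw.decode('utf8', errors='replace')
```

A strict decode('utf8') would raise UnicodeDecodeError on the same input.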

Using a Request object:

#coding=utf-8
import urllib.request
req = urllib.request.Request('http://www.voidspace.org.uk')
response = urllib.request.urlopen(req)
buff = response.read()
# decode the raw bytes for display
the_page = buff.decode("utf8")
response.close()
print(the_page)
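A Request object can be inspected before it is sent, which is handy for debugging. A minimal sketch (no network access needed):

```python
import urllib.request

req = urllib.request.Request('http://www.voidspace.org.uk')
# With no data attached, urlopen would issue a GET.
method = req.get_method()   # 'GET'
url = req.full_url          # the URL the request targets
```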

The same approach also works for other URL schemes, such as FTP:

#coding=utf-8
import urllib.request
req = urllib.request.Request('ftp://ftp.pku.edu.cn/')
response = urllib.request.urlopen(req)
buff = response.read()
# decode the raw bytes for display
the_page = buff.decode("utf8")
response.close()
print(the_page)
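urlopen picks its handler based on the URL scheme, which urllib.parse.urlparse exposes. A small sketch:

```python
import urllib.parse

parts = urllib.parse.urlparse('ftp://ftp.pku.edu.cn/')
scheme = parts.scheme   # 'ftp' -- this is what selects the FTP handler
host = parts.netloc     # 'ftp.pku.edu.cn'
```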

Sending a POST request:

import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
# In Python 3 the POST body must be bytes, not str
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()
response.close()
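Attaching data to a Request is also what switches the method from GET to POST, which can be checked offline before sending anything. A sketch using the placeholder URL from above:

```python
import urllib.parse
import urllib.request

values = {'name': 'Michael Foord', 'location': 'Northampton', 'language': 'Python'}
body = urllib.parse.urlencode(values).encode('utf-8')   # str -> bytes
req = urllib.request.Request('http://www.someserver.com/cgi-bin/register.cgi', body)
method = req.get_method()   # becomes 'POST' once data is attached
```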

Sending a GET request:

import urllib.request
import urllib.parse
data = {}
data['name'] = 'Somebody Here'
data['location'] = 'Northampton'
data['language'] = 'Python'
url_values = urllib.parse.urlencode(data)
print(url_values)
# name=Somebody+Here&language=Python&location=Northampton
url = 'http://www.example.com/example.cgi'
full_url = url + '?' + url_values
response = urllib.request.urlopen(full_url)
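urlencode percent-encodes every key and value, which makes it safer than concatenating the query string by hand. A small sketch (the parameters are illustrative):

```python
import urllib.parse

params = {'q': 'python urllib', 'page': 2}
query = urllib.parse.urlencode(params)   # spaces become '+', values become str
full_url = 'http://www.example.com/example.cgi?' + query
```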

Adding a header:

import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}
# In Python 3 the POST body must be bytes, not str
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()
response.close()
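One quirk worth knowing: Request normalizes header names with str.capitalize(), so 'User-Agent' is stored as 'User-agent', and lookups must use the normalized form. A sketch:

```python
import urllib.request

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
req = urllib.request.Request('http://www.someserver.com/cgi-bin/register.cgi',
                             headers={'User-Agent': user_agent})
# Request stores header names capitalized: 'User-Agent' -> 'User-agent'
ua = req.get_header('User-agent')
```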

Error handling:

import urllib.request
import urllib.error

req = urllib.request.Request('http://www.pretend_server.org')
try:
    urllib.request.urlopen(req)
except urllib.error.URLError as e:
    print(e.reason)
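HTTPError is a subclass of URLError, so catching HTTPError first lets you distinguish server status codes from connection failures. Constructing one by hand (no network involved) shows the relationship:

```python
import urllib.error

# HTTPError carries the status code; a plain URLError only has a reason.
e = urllib.error.HTTPError('http://www.pretend_server.org',
                           404, 'Not Found', None, None)
code = e.code                                         # 404
is_url_error = isinstance(e, urllib.error.URLError)   # True: HTTPError subclasses URLError
```

In a real handler this means `except HTTPError` must come before `except URLError`, or the more specific case is never reached.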

The error codes that may be returned:

# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
    100: ('Continue', 'Request received, please continue'),
    101: ('Switching Protocols',
          'Switching to new protocol; obey Upgrade header'),
    200: ('OK', 'Request fulfilled, document follows'),
    201: ('Created', 'Document created, URL follows'),
    202: ('Accepted', 'Request accepted, processing continues off-line'),
    203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
    204: ('No Content', 'Request fulfilled, nothing follows'),
    205: ('Reset Content', 'Clear input form for further input.'),
    206: ('Partial Content', 'Partial content follows.'),
    300: ('Multiple Choices', 'Object has several resources -- see URI list'),
    301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
    302: ('Found', 'Object moved temporarily -- see URI list'),
    303: ('See Other', 'Object moved -- see Method and URL list'),
    304: ('Not Modified', 'Document has not changed since given time'),
    305: ('Use Proxy',
          'You must use proxy specified in Location to access this resource.'),
    307: ('Temporary Redirect', 'Object moved temporarily -- see URI list'),
    400: ('Bad Request', 'Bad request syntax or unsupported method'),
    401: ('Unauthorized', 'No permission -- see authorization schemes'),
    402: ('Payment Required', 'No payment -- see charging schemes'),
    403: ('Forbidden', 'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed', 'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required',
          'You must authenticate with this proxy before proceeding.'),
    408: ('Request Timeout', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone', 'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable', 'Cannot satisfy request range.'),
    417: ('Expectation Failed', 'Expect condition could not be satisfied.'),
    500: ('Internal Server Error', 'Server got itself in trouble'),
    501: ('Not Implemented', 'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service Unavailable',
          'The server cannot process the request due to a high load'),
    504: ('Gateway Timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
}
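The standard library already ships an equivalent code-to-phrase mapping in http.client.responses, so the table above need not be copied by hand when only the short message is needed. A sketch:

```python
import http.client

# http.client.responses maps status codes to their short reason phrases.
short = http.client.responses[404]   # 'Not Found'
```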

Reposted from: https://www.cnblogs.com/mmbbflyer/p/6340375.html
