Python mechanize study notes
1: Basic usage
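
    import mechanize

    # response = mechanize.urlopen("http://www.hao123.com/")  # passing the URL string directly also works
    request = mechanize.Request("http://www.hao123.com/")
    response = mechanize.urlopen(request)
    print(response.geturl())   # URL of the page actually retrieved (after any redirects)
    print(response.info())     # response headers
    # print(response.read())   # response body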
2:mechanize.urlretrieve

    >>> import mechanize
    >>> help(mechanize.urlretrieve)
    Help on function urlretrieve in module mechanize._opener:

    urlretrieve(url, filename=None, reporthook=None, data=None, timeout=<object object>)
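- filename is the local path to save to (if it is not given, a temporary file is created to hold the data).
- reporthook is a callback function. It is called once when the connection to the server is established and again each time a block of data has been transferred, so it can be used to display the current download progress.
- data is the data to POST to the server. The function returns a two-element tuple (filename, headers): filename is the local path the data was saved to, and headers holds the server's response headers.
- timeout is the timeout in seconds. The default value shown in the signature is a sentinel object meaning the global default socket timeout is used.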
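reporthook(block_read, block_size, total_size) is the signature of the callback: block_read is the number of blocks read so far (0 on the first call), block_size is the size of each block in bytes, and total_size is the total size of the data in bytes, taken from the Content-Length response header (-1 if the server did not send one). The reporthook function can be used to display download progress.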
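A reporthook usually turns its three arguments into a percentage. The sketch below is one way to do it, not part of mechanize: the helper name download_progress is hypothetical, and it treats total_size <= 0 as "size unknown" because the hook receives -1 when there is no Content-Length header.

```python
def download_progress(block_read, block_size, total_size):
    """Return the percentage downloaded so far, or None if total_size is unknown."""
    if total_size <= 0:
        return None
    done = block_read * block_size
    # The last block is usually shorter than block_size, so cap the estimate at 100
    return min(100.0, 100.0 * done / total_size)


def cbk(block_read, block_size, total_size):
    # Hypothetical reporthook that prints progress instead of the raw arguments
    pct = download_progress(block_read, block_size, total_size)
    if pct is None:
        # Size unknown: block_read * block_size is only an upper bound on bytes read
        print("about %d bytes read" % (block_read * block_size))
    else:
        print("%.1f%%" % pct)


cbk(1, 8192, 20000)  # prints 41.0%
```

Passing cbk as the third argument of mechanize.urlretrieve would print one line per block.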
A simple example:
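
    import mechanize

    def cbk(block_read, block_size, total_size):
        # called once on connect, then after each block is transferred
        print(block_read, block_size, total_size)

    url = 'http://www.hao123.com/'
    local = 'd:/hao.html'   # local file to save the page to
    mechanize.urlretrieve(url, local, cbk)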
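To see the sequence of reporthook calls without network access, the sketch below uses the standard library's urllib.request.urlretrieve as a stand-in. It assumes mechanize.urlretrieve invokes the hook the same way (one call before reading, then one per block, with total_size from Content-Length); mechanize's help output lists BLOCK_SIZE = 8192, the same block size the stdlib uses. The temporary file names are made up.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve  # stdlib stand-in for mechanize.urlretrieve

calls = []

def cbk(block_read, block_size, total_size):
    # Record every invocation instead of printing, so the sequence can be inspected
    calls.append((block_read, block_size, total_size))

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.bin"
    src.write_bytes(b"x" * 20000)  # 20000 bytes -> blocks of 8192, 8192 and 3616
    dst = os.path.join(tmp, "copy.bin")
    filename, headers = urlretrieve(src.as_uri(), dst, cbk)  # file:// URL, no network
    copied = Path(filename).read_bytes()

for c in calls:
    print(c)
```

The first call reports 0 blocks read, and the last one reports 3 blocks even though only 20000 of the 24576 "block bytes" exist, which is why progress estimates should be capped.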
3: Logging in through a form
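
    import os
    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)          # do not obey robots.txt
    br.open("http://www.zhaopin.com/")
    br.select_form(nr=0)                 # select the first form on the page
    br['loginname'] = '**'               # register your own account and fill in its username
    br['password'] = '**'                # ...and password
    r = br.submit()

    # save the page returned after logging in next to this script
    path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'login.html')
    print(path)
    with open(path, 'wb') as h:
        h.write(r.read())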
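What br.submit() actually sends depends on the form. Assuming the login form uses method POST with the default enctype (application/x-www-form-urlencoded), the request body is the control names and values URL-encoded, roughly as built below with the standard library. The credentials are hypothetical, and mechanize may also include other controls of the form, such as hidden fields.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical credentials; loginname/password are the control names used in the login code
fields = [("loginname", "user@example.com"), ("password", "p@ss word")]
body = urlencode(fields)  # spaces become '+', '@' becomes '%40'
print(body)  # loginname=user%40example.com&password=p%40ss+word

# Decoding the body gives back the values the server would see
print(parse_qs(body))
```

If the form used GET instead, the same string would be appended to the action URL as the query string.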
4:Browser
Read through the help() documentation and you have pretty much mastered it:

    Help on class Browser in module mechanize._mechanize:

    class Browser(mechanize._useragent.UserAgentBase)
     |  Browser-like class with support for history, forms and links.
     |  
     |  BrowserStateError is raised whenever the browser is in the wrong state to
     |  complete the requested operation - e.g., when .back() is called when the
     |  browser history is empty, or when .follow_link() is called when the current
     |  response does not contain HTML data.
     |  
     |  Public attributes:
     |  
     |  request: current request (mechanize.Request)
     |  form: currently selected form (see .select_form())
     |  
     |  Method resolution order:
     |      Browser
     |      mechanize._useragent.UserAgentBase
     |      mechanize._opener.OpenerDirector
     |      mechanize._urllib2_fork.OpenerDirector
     |  
     |  Methods defined here:
     |  
     |  __getattr__(self, name)
     |  
     |  __init__(self, factory=None, history=None, request_class=None)
     |      Only named arguments should be passed to this constructor.
     |      
     |      factory: object implementing the mechanize.Factory interface.
     |      history: object implementing the mechanize.History interface. Note
     |       this interface is still experimental and may change in future.
     |      request_class: Request class to use. Defaults to mechanize.Request
     |      
     |      The Factory and History objects passed in are 'owned' by the Browser,
     |      so they should not be shared across Browsers. In particular,
     |      factory.set_response() should not be called except by the owning
     |      Browser itself.
     |      
     |      Note that the supplied factory's request_class is overridden by this
     |      constructor, to ensure only one Request class is used.
     |  
     |  __str__(self)
     |  
     |  back(self, n=1)
     |      Go back n steps in history, and return response object.
     |      
     |      n: go back this number of steps (default 1 step)
     |  
     |  clear_history(self)
     |  
     |  click(self, *args, **kwds)
     |      See mechanize.HTMLForm.click for documentation.
     |  
     |  click_link(self, link=None, **kwds)
     |      Find a link and return a Request object for it.
     |      
     |      Arguments are as for .find_link(), except that a link may be supplied
     |      as the first argument.
     |  
     |  close(self)
     |  
     |  encoding(self)
     |  
     |  find_link(self, **kwds)
     |      Find a link in current page.
     |      
     |      Links are returned as mechanize.Link objects.
     |      
     |      # Return third link that .search()-matches the regexp "python"
     |      # (by ".search()-matches", I mean that the regular expression method
     |      # .search() is used, rather than .match()).
     |      find_link(text_regex=re.compile("python"), nr=2)
     |      
     |      # Return first http link in the current page that points to somewhere
     |      # on python.org whose link text (after tags have been removed) is
     |      # exactly "monty python".
     |      find_link(text="monty python",
     |                url_regex=re.compile("http.*python.org"))
     |      
     |      # Return first link with exactly three HTML attributes.
     |      find_link(predicate=lambda link: len(link.attrs) == 3)
     |      
     |      Links include anchors (<a>), image maps (<area>), and frames (<frame>,
     |      <iframe>).
     |      
     |      All arguments must be passed by keyword, not position. Zero or more
     |      arguments may be supplied. In order to find a link, all arguments
     |      supplied must match.
     |      
     |      If a matching link is not found, mechanize.LinkNotFoundError is raised.
     |      
     |      text: link text between link tags: e.g. <a href="blah">this bit</a> (as
     |       returned by pullparser.get_compressed_text(), ie. without tags but
     |       with opening tags "textified" as per the pullparser docs) must compare
     |       equal to this argument, if supplied
     |      text_regex: link text between tag (as defined above) must match the
     |       regular expression object or regular expression string passed as this
     |       argument, if supplied
     |      name, name_regex: as for text and text_regex, but matched against the
     |       name HTML attribute of the link tag
     |      url, url_regex: as for text and text_regex, but matched against the
     |       URL of the link tag (note this matches against Link.url, which is a
     |       relative or absolute URL according to how it was written in the HTML)
     |      tag: element name of opening tag, e.g. "a"
     |      predicate: a function taking a Link object as its single argument,
     |       returning a boolean result, indicating whether the links
     |      nr: matches the nth link that matches all other criteria (default 0)
     |  
     |  follow_link(self, link=None, **kwds)
     |      Find a link and .open() it.
     |      
     |      Arguments are as for .click_link().
     |      
     |      Return value is same as for Browser.open().
     |  
     |  forms(self)
     |      Return iterable over forms.
     |      
     |      The returned form objects implement the mechanize.HTMLForm interface.
     |  
     |  geturl(self)
     |      Get URL of current document.
     |  
     |  global_form(self)
     |      Return the global form object, or None if the factory implementation
     |      did not supply one.
     |      
     |      The "global" form object contains all controls that are not descendants
     |      of any FORM element.
     |      
     |      The returned form object implements the mechanize.HTMLForm interface.
     |      
     |      This is a separate method since the global form is not regarded as part
     |      of the sequence of forms in the document -- mostly for
     |      backwards-compatibility.
     |  
     |  links(self, **kwds)
     |      Return iterable over links (mechanize.Link objects).
     |  
     |  open(self, url, data=None, timeout=<object object>)
     |  
     |  open_local_file(self, filename)
     |  
     |  open_novisit(self, url, data=None, timeout=<object object>)
     |      Open a URL without visiting it.
     |      
     |      Browser state (including request, response, history, forms and links)
     |      is left unchanged by calling this function.
     |      
     |      The interface is the same as for .open().
     |      
     |      This is useful for things like fetching images.
     |      
     |      See also .retrieve().
     |  
     |  reload(self)
     |      Reload current document, and return response object.
     |  
     |  response(self)
     |      Return a copy of the current response.
     |      
     |      The returned object has the same interface as the object returned by
     |      .open() (or mechanize.urlopen()).
     |  
     |  select_form(self, name=None, predicate=None, nr=None)
     |      Select an HTML form for input.
     |      
     |      This is a bit like giving a form the "input focus" in a browser.
     |      
     |      If a form is selected, the Browser object supports the HTMLForm
     |      interface, so you can call methods like .set_value(), .set(), and
     |      .click().
     |      
     |      Another way to select a form is to assign to the .form attribute. The
     |      form assigned should be one of the objects returned by the .forms()
     |      method.
     |      
     |      At least one of the name, predicate and nr arguments must be supplied.
     |      If no matching form is found, mechanize.FormNotFoundError is raised.
     |      
     |      If name is specified, then the form must have the indicated name.
     |      
     |      If predicate is specified, then the form must match that function. The
     |      predicate function is passed the HTMLForm as its single argument, and
     |      should return a boolean value indicating whether the form matched.
     |      
     |      nr, if supplied, is the sequence number of the form (where 0 is the
     |      first). Note that control 0 is the first form matching all the other
     |      arguments (if supplied); it is not necessarily the first control in the
     |      form. The "global form" (consisting of all form controls not contained
     |      in any FORM element) is considered not to be part of this sequence and
     |      to have no name, so will not be matched unless both name and nr are
     |      None.
     |  
     |  set_cookie(self, cookie_string)
     |      Request to set a cookie.
     |      
     |      Note that it is NOT necessary to call this method under ordinary
     |      circumstances: cookie handling is normally entirely automatic. The
     |      intended use case is rather to simulate the setting of a cookie by
     |      client script in a web page (e.g. JavaScript). In that case, use of
     |      this method is necessary because mechanize currently does not support
     |      JavaScript, VBScript, etc.
     |      
     |      The cookie is added in the same way as if it had arrived with the
     |      current response, as a result of the current request. This means that,
     |      for example, if it is not appropriate to set the cookie based on the
     |      current request, no cookie will be set.
     |      
     |      The cookie will be returned automatically with subsequent responses
     |      made by the Browser instance whenever that's appropriate.
     |      
     |      cookie_string should be a valid value of the Set-Cookie header.
     |      
     |      For example:
     |      
     |      browser.set_cookie(
     |          "sid=abcdef; expires=Wednesday, 09-Nov-06 23:12:40 GMT")
     |      
     |      Currently, this method does not allow for adding RFC 2986 cookies.
     |      This limitation will be lifted if anybody requests it.
     |  
     |  set_handle_referer(self, handle)
     |      Set whether to add Referer header to each request.
     |  
     |  set_response(self, response)
     |      Replace current response with (a copy of) response.
     |      
     |      response may be None.
     |      
     |      This is intended mostly for HTML-preprocessing.
     |  
     |  submit(self, *args, **kwds)
     |      Submit current form.
     |      
     |      Arguments are as for mechanize.HTMLForm.click().
     |      
     |      Return value is same as for Browser.open().
     |  
     |  title(self)
     |      Return title, or None if there is no title element in the document.
     |      
     |      Treatment of any tag children of <title> attempts to follow Firefox and IE
     |      (currently, tags are preserved).
     |  
     |  viewing_html(self)
     |      Return whether the current response contains HTML data.
     |  
     |  visit_response(self, response, request=None)
     |      Visit the response, as if it had been .open()ed.
     |      
     |      Unlike .set_response(), this updates history rather than replacing the
     |      current response.
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  default_features = ['_redirect', '_cookies', '_refresh', '_equiv', '_b...
     |  
     |  handler_classes = {'_basicauth': <class mechanize._urllib2_fork.HTTPBa...
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from mechanize._useragent.UserAgentBase:
     |  
     |  add_client_certificate(self, url, key_file, cert_file)
     |      Add an SSL client certificate, for HTTPS client auth.
     |      
     |      key_file and cert_file must be filenames of the key and certificate
     |      files, in PEM format. You can use e.g. OpenSSL to convert a p12 (PKCS
     |      12) file to PEM format:
     |      
     |      openssl pkcs12 -clcerts -nokeys -in cert.p12 -out cert.pem
     |      openssl pkcs12 -nocerts -in cert.p12 -out key.pem
     |      
     |      
     |      Note that client certificate password input is very inflexible ATM. At
     |      the moment this seems to be console only, which is presumably the
     |      default behaviour of libopenssl. In future mechanize may support
     |      third-party libraries that (I assume) allow more options here.
     |  
     |  add_password(self, url, user, password, realm=None)
     |  
     |  add_proxy_password(self, user, password, hostport=None, realm=None)
     |  
     |  set_client_cert_manager(self, cert_manager)
     |      Set a mechanize.HTTPClientCertMgr, or None.
     |  
     |  set_cookiejar(self, cookiejar)
     |      Set a mechanize.CookieJar, or None.
     |  
     |  set_debug_http(self, handle)
     |      Print HTTP headers to sys.stdout.
     |  
     |  set_debug_redirects(self, handle)
     |      Log information about HTTP redirects (including refreshes).
     |      
     |      Logging is performed using module logging. The logger name is
     |      "mechanize.http_redirects". To actually print some debug output,
     |      eg:
     |      
     |      import sys, logging
     |      logger = logging.getLogger("mechanize.http_redirects")
     |      logger.addHandler(logging.StreamHandler(sys.stdout))
     |      logger.setLevel(logging.INFO)
     |      
     |      Other logger names relevant to this module:
     |      
     |      "mechanize.http_responses"
     |      "mechanize.cookies"
     |      
     |      To turn on everything:
     |      
     |      import sys, logging
     |      logger = logging.getLogger("mechanize")
     |      logger.addHandler(logging.StreamHandler(sys.stdout))
     |      logger.setLevel(logging.INFO)
     |  
     |  set_debug_responses(self, handle)
     |      Log HTTP response bodies.
     |      
     |      See docstring for .set_debug_redirects() for details of logging.
     |      
     |      Response objects may be .seek()able if this is set (currently returned
     |      responses are, raised HTTPError exception responses are not).
     |  
     |  set_handle_equiv(self, handle, head_parser_class=None)
     |      Set whether to treat HTML http-equiv headers like HTTP headers.
     |      
     |      Response objects may be .seek()able if this is set (currently returned
     |      responses are, raised HTTPError exception responses are not).
     |  
     |  set_handle_gzip(self, handle)
     |      Handle gzip transfer encoding.
     |  
     |  set_handle_redirect(self, handle)
     |      Set whether to handle HTTP 30x redirections.
     |  
     |  set_handle_refresh(self, handle, max_time=None, honor_time=True)
     |      Set whether to handle HTTP Refresh headers.
     |  
     |  set_handle_robots(self, handle)
     |      Set whether to observe rules from robots.txt.
     |  
     |  set_handled_schemes(self, schemes)
     |      Set sequence of URL scheme (protocol) strings.
     |      
     |      For example: ua.set_handled_schemes(["http", "ftp"])
     |      
     |      If this fails (with ValueError) because you've passed an unknown
     |      scheme, the set of handled schemes will not be changed.
     |  
     |  set_password_manager(self, password_manager)
     |      Set a mechanize.HTTPPasswordMgrWithDefaultRealm, or None.
     |  
     |  set_proxies(self, proxies=None, proxy_bypass=None)
     |      Configure proxy settings.
     |      
     |      proxies: dictionary mapping URL scheme to proxy specification. None
     |        means use the default system-specific settings.
     |      proxy_bypass: function taking hostname, returning whether proxy should
     |        be used. None means use the default system-specific settings.
     |      
     |      The default is to try to obtain proxy settings from the system (see the
     |      documentation for urllib.urlopen for information about the
     |      system-specific methods used -- note that's urllib, not urllib2).
     |      
     |      To avoid all use of proxies, pass an empty proxies dict.
     |      
     |      >>> ua = UserAgentBase()
     |      >>> def proxy_bypass(hostname):
     |      ...     return hostname == "noproxy.com"
     |      >>> ua.set_proxies(
     |      ...     {"http": "joe:password@myproxy.example.com:3128",
     |      ...      "ftp": "proxy.example.com"},
     |      ...     proxy_bypass)
     |  
     |  set_proxy_password_manager(self, password_manager)
     |      Set a mechanize.HTTPProxyPasswordMgr, or None.
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from mechanize._useragent.UserAgentBase:
     |  
     |  default_others = ['_unknown', '_http_error', '_http_default_error']
     |  
     |  default_schemes = ['http', 'ftp', 'file', 'https']
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from mechanize._opener.OpenerDirector:
     |  
     |  add_handler(self, handler)
     |  
     |  error(self, proto, *args)
     |  
     |  retrieve(self, fullurl, filename=None, reporthook=None, data=None, timeout=<object object>, open=<built-in function open>)
     |      Returns (filename, headers).
     |      
     |      For remote objects, the default filename will refer to a temporary
     |      file. Temporary files are removed when the OpenerDirector.close()
     |      method is called.
     |      
     |      For file: URLs, at present the returned filename is None. This may
     |      change in future.
     |      
     |      If the actual number of bytes read is less than indicated by the
     |      Content-Length header, raises ContentTooShortError (a URLError
     |      subclass). The exception's .result attribute contains the (filename,
     |      headers) that would have been returned.
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes inherited from mechanize._opener.OpenerDirector:
     |  
     |  BLOCK_SIZE = 8192
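The find_link docstring stresses two details: text_regex is applied with .search() rather than .match(), and nr counts from 0 among the links that match everything else, so nr=2 means the third match. The plain-Python sketch below mimics that selection rule on a list of link texts; find_link_text is a hypothetical helper, not part of mechanize, and it raises LookupError where mechanize would raise LinkNotFoundError.

```python
import re

def find_link_text(link_texts, text_regex, nr=0):
    """Hypothetical helper mimicking find_link(text_regex=..., nr=...) on plain strings."""
    # Keep every text the regex .search()-matches anywhere, in document order
    matches = [t for t in link_texts if text_regex.search(t)]
    if nr >= len(matches):
        raise LookupError("link not found")  # mechanize raises LinkNotFoundError instead
    return matches[nr]  # nr is 0-based

texts = ["python.org", "Monty Python", "learn python", "jython", "PyPI", "python tutorial"]
rx = re.compile("python")

print(find_link_text(texts, rx, nr=2))  # python tutorial
print(rx.match("learn python"))         # None: .match() only matches at the start
print(rx.search("learn python"))        # a match object: .search() scans the whole string
```

Note that the regex is case-sensitive, so "Monty Python" is skipped unless the pattern uses re.IGNORECASE.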
Reposted from: https://www.cnblogs.com/qwj-sysu/p/3892043.html