1: Basic usage

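import mechanize

# response = mechanize.urlopen("http://www.hao123.com/")  # urlopen() also accepts a plain URL string
request = mechanize.Request("http://www.hao123.com/")
response = mechanize.urlopen(request)
print(response.geturl())  # the URL actually retrieved (after any redirects)
print(response.info())    # the response headers
# print(response.read())  # the page body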
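mechanize.urlopen() and mechanize.Request are modelled on the standard library's urllib2 (urllib.request in Python 3), so the same geturl()/info()/read() calls can be tried without a network. The sketch below assumes Python 3, uses the standard library in place of mechanize, and opens a temporary local file through a file: URL instead of hao123.com; fetch_info is a hypothetical helper name.

```python
import pathlib
import tempfile
from urllib.request import Request, urlopen


def fetch_info(url):
    """Open url through a Request object and return (final URL, headers, body)."""
    request = Request(url)
    response = urlopen(request)
    try:
        # geturl(): the URL actually retrieved; info(): the response headers
        return response.geturl(), response.info(), response.read()
    finally:
        response.close()


if __name__ == "__main__":
    # Write a small local page so the example needs no network access.
    with tempfile.TemporaryDirectory() as tmp:
        page = pathlib.Path(tmp) / "hao.html"
        page.write_bytes(b"<html><title>demo</title></html>")
        final_url, headers, body = fetch_info(page.as_uri())
        print(final_url)
        print(headers)
        print(body)
```

Passing "http://www.hao123.com/" to fetch_info instead reproduces the original example, provided the machine has network access.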

2: mechanize.urlretrieve

>>> import mechanize
>>> help(mechanize.urlretrieve)
Help on function urlretrieve in module mechanize._opener:

urlretrieve(url, filename=None, reporthook=None, data=None, timeout=<object object>)

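  • filename: the local path to save the data to (if it is not given, a temporary file is created to hold the data).
  • reporthook: a callback function, triggered once when the connection to the server is established and again each time a block of data has been transferred; it can be used to display the current download progress.
  • data: data to POST to the server.
  • timeout: the timeout for the request in seconds; the default shown as <object object> is a sentinel meaning "use the global default socket timeout".

The function returns a two-element tuple (filename, headers): filename is the local path the data was saved to, and headers holds the server's response headers.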

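The callback is called as reporthook(block_read, block_size, total_size): block_read is the number of blocks transferred so far, block_size is the size of each block in bytes, and total_size is the total size of the file in bytes as reported by the server (-1 if the server does not report it). block_read * block_size therefore approximates the number of bytes downloaded so far, which is what makes reporthook useful for displaying download progress.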
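Turning those three arguments into a percentage takes only a few lines. This is a minimal sketch of a progress hook; download_percent and show_progress are hypothetical names, and the "total_size <= 0 means unknown" check assumes the -1 convention described above.

```python
import sys


def download_percent(block_count, block_size, total_size):
    """Return progress in percent (0 to 100), or None if total_size is unknown."""
    if total_size <= 0:
        # no usable Content-Length from the server, so no percentage
        return None
    downloaded = block_count * block_size
    # the last block is usually only partly filled, so cap the result at 100
    return min(100.0, 100.0 * downloaded / total_size)


def show_progress(block_count, block_size, total_size):
    """A reporthook that rewrites a single progress line on stdout."""
    percent = download_percent(block_count, block_size, total_size)
    if percent is None:
        sys.stdout.write("\r%d bytes" % (block_count * block_size))
    else:
        sys.stdout.write("\r%5.1f%%" % percent)
    sys.stdout.flush()
```

show_progress can then be passed as the third argument, e.g. mechanize.urlretrieve(url, local, show_progress).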

A simple example

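import mechanize

def cbk(a, b, c):
    # a: blocks transferred so far, b: block size, c: total size
    print("%d %d %d" % (a, b, c))

url = 'http://www.hao123.com/'
local = 'd:/hao.html'
mechanize.urlretrieve(url, local, cbk)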
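The sequence of callback arguments can be observed without a network. Python 3's standard urllib.request.urlretrieve takes a reporthook with the same three arguments and also accepts local file: URLs, so this sketch (assuming Python 3, with the standard library standing in for mechanize) copies a 20000-byte temporary file and records every call the hook receives. record_hook_calls is a hypothetical helper name.

```python
import pathlib
import tempfile
from urllib.request import urlretrieve


def record_hook_calls(url, filename):
    """Download url to filename and return every (block_count, block_size, total_size) seen by the hook."""
    calls = []

    def hook(block_count, block_size, total_size):
        calls.append((block_count, block_size, total_size))

    urlretrieve(url, filename, hook)
    return calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "source.bin"
        src.write_bytes(b"x" * 20000)  # a 20000-byte local stand-in for a remote file
        dst = pathlib.Path(tmp) / "copy.bin"
        for call in record_hook_calls(src.as_uri(), str(dst)):
            print(call)
```

The first recorded call has block_count 0 and is made before any data is read; each later call follows one block. total_size comes from the Content-Length header, so it stays fixed at 20000 throughout.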

3: Logging in through a form

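import os
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # don't observe robots.txt rules
br.open("http://www.zhaopin.com/")
br.select_form(nr=0)  # select the first form on the page
br['loginname'] = '**'  # just register an account of your own and put its username here
br['password'] = '**'   # ...and its password here
r = br.submit()
# save the page returned after logging in next to this script
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'login.html')
print(path)
with open(path, "wb") as h:
    h.write(r.read())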
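For a form with method="post" and the default enctype, submit() essentially URL-encodes the form's controls into the request body and POSTs it to the form's action URL. The sketch below builds such a request by hand with the Python 3 standard library, only to show what gets sent; it is an illustration, not mechanize's own code, and build_form_post, the action URL http://www.example.com/login and the field values are all hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(action_url, fields):
    """Build (but do not send) a POST request like a plain HTML form submission.

    fields maps control names to values, e.g. {"loginname": ..., "password": ...}.
    """
    body = urlencode(fields).encode("ascii")  # application/x-www-form-urlencoded body
    return Request(
        action_url,
        data=body,  # supplying a body makes the request a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    req = build_form_post("http://www.example.com/login",  # hypothetical action URL
                          {"loginname": "alice", "password": "secret"})
    print(req.get_method(), req.full_url)
    print(req.data)
```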

4: Browser

Read through the help documentation and you will know essentially everything there is to know:

Help on class Browser in module mechanize._mechanize:

class Browser(mechanize._useragent.UserAgentBase)
 |  Browser-like class with support for history, forms and links.
 |
 |  BrowserStateError is raised whenever the browser is in the wrong state to
 |  complete the requested operation - e.g., when .back() is called when the
 |  browser history is empty, or when .follow_link() is called when the current
 |  response does not contain HTML data.
 |
 |  Public attributes:
 |
 |  request: current request (mechanize.Request)
 |  form: currently selected form (see .select_form())
 |
 |  Method resolution order:
 |      Browser
 |      mechanize._useragent.UserAgentBase
 |      mechanize._opener.OpenerDirector
 |      mechanize._urllib2_fork.OpenerDirector
 |
 |  Methods defined here:
 |
 |  __getattr__(self, name)
 |
 |  __init__(self, factory=None, history=None, request_class=None)
 |      Only named arguments should be passed to this constructor.
 |
 |      factory: object implementing the mechanize.Factory interface.
 |      history: object implementing the mechanize.History interface.  Note
 |       this interface is still experimental and may change in future.
 |      request_class: Request class to use.  Defaults to mechanize.Request
 |
 |      The Factory and History objects passed in are 'owned' by the Browser,
 |      so they should not be shared across Browsers.  In particular,
 |      factory.set_response() should not be called except by the owning
 |      Browser itself.
 |
 |      Note that the supplied factory's request_class is overridden by this
 |      constructor, to ensure only one Request class is used.
 |
 |  __str__(self)
 |
 |  back(self, n=1)
 |      Go back n steps in history, and return response object.
 |
 |      n: go back this number of steps (default 1 step)
 |
 |  clear_history(self)
 |
 |  click(self, *args, **kwds)
 |      See mechanize.HTMLForm.click for documentation.
 |
 |  click_link(self, link=None, **kwds)
 |      Find a link and return a Request object for it.
 |
 |      Arguments are as for .find_link(), except that a link may be supplied
 |      as the first argument.
 |
 |  close(self)
 |
 |  encoding(self)
 |
 |  find_link(self, **kwds)
 |      Find a link in current page.
 |
 |      Links are returned as mechanize.Link objects.
 |
 |      # Return third link that .search()-matches the regexp "python"
 |      # (by ".search()-matches", I mean that the regular expression method
 |      # .search() is used, rather than .match()).
 |      find_link(text_regex=re.compile("python"), nr=2)
 |
 |      # Return first http link in the current page that points to somewhere
 |      # on python.org whose link text (after tags have been removed) is
 |      # exactly "monty python".
 |      find_link(text="monty python",
 |                url_regex=re.compile("http.*python.org"))
 |
 |      # Return first link with exactly three HTML attributes.
 |      find_link(predicate=lambda link: len(link.attrs) == 3)
 |
 |      Links include anchors (<a>), image maps (<area>), and frames (<frame>,
 |      <iframe>).
 |
 |      All arguments must be passed by keyword, not position.  Zero or more
 |      arguments may be supplied.  In order to find a link, all arguments
 |      supplied must match.
 |
 |      If a matching link is not found, mechanize.LinkNotFoundError is raised.
 |
 |      text: link text between link tags: e.g. <a href="blah">this bit</a> (as
 |       returned by pullparser.get_compressed_text(), ie. without tags but
 |       with opening tags "textified" as per the pullparser docs) must compare
 |       equal to this argument, if supplied
 |      text_regex: link text between tag (as defined above) must match the
 |       regular expression object or regular expression string passed as this
 |       argument, if supplied
 |      name, name_regex: as for text and text_regex, but matched against the
 |       name HTML attribute of the link tag
 |      url, url_regex: as for text and text_regex, but matched against the
 |       URL of the link tag (note this matches against Link.url, which is a
 |       relative or absolute URL according to how it was written in the HTML)
 |      tag: element name of opening tag, e.g. "a"
 |      predicate: a function taking a Link object as its single argument,
 |       returning a boolean result, indicating whether the links
 |      nr: matches the nth link that matches all other criteria (default 0)
 |
 |  follow_link(self, link=None, **kwds)
 |      Find a link and .open() it.
 |
 |      Arguments are as for .click_link().
 |
 |      Return value is same as for Browser.open().
 |
 |  forms(self)
 |      Return iterable over forms.
 |
 |      The returned form objects implement the mechanize.HTMLForm interface.
 |
 |  geturl(self)
 |      Get URL of current document.
 |
 |  global_form(self)
 |      Return the global form object, or None if the factory implementation
 |      did not supply one.
 |
 |      The "global" form object contains all controls that are not descendants
 |      of any FORM element.
 |
 |      The returned form object implements the mechanize.HTMLForm interface.
 |
 |      This is a separate method since the global form is not regarded as part
 |      of the sequence of forms in the document -- mostly for
 |      backwards-compatibility.
 |
 |  links(self, **kwds)
 |      Return iterable over links (mechanize.Link objects).
 |
 |  open(self, url, data=None, timeout=<object object>)
 |
 |  open_local_file(self, filename)
 |
 |  open_novisit(self, url, data=None, timeout=<object object>)
 |      Open a URL without visiting it.
 |
 |      Browser state (including request, response, history, forms and links)
 |      is left unchanged by calling this function.
 |
 |      The interface is the same as for .open().
 |
 |      This is useful for things like fetching images.
 |
 |      See also .retrieve().
 |
 |  reload(self)
 |      Reload current document, and return response object.
 |
 |  response(self)
 |      Return a copy of the current response.
 |
 |      The returned object has the same interface as the object returned by
 |      .open() (or mechanize.urlopen()).
 |
 |  select_form(self, name=None, predicate=None, nr=None)
 |      Select an HTML form for input.
 |
 |      This is a bit like giving a form the "input focus" in a browser.
 |
 |      If a form is selected, the Browser object supports the HTMLForm
 |      interface, so you can call methods like .set_value(), .set(), and
 |      .click().
 |
 |      Another way to select a form is to assign to the .form attribute.  The
 |      form assigned should be one of the objects returned by the .forms()
 |      method.
 |
 |      At least one of the name, predicate and nr arguments must be supplied.
 |      If no matching form is found, mechanize.FormNotFoundError is raised.
 |
 |      If name is specified, then the form must have the indicated name.
 |
 |      If predicate is specified, then the form must match that function.  The
 |      predicate function is passed the HTMLForm as its single argument, and
 |      should return a boolean value indicating whether the form matched.
 |
 |      nr, if supplied, is the sequence number of the form (where 0 is the
 |      first).  Note that control 0 is the first form matching all the other
 |      arguments (if supplied); it is not necessarily the first control in the
 |      form.  The "global form" (consisting of all form controls not contained
 |      in any FORM element) is considered not to be part of this sequence and
 |      to have no name, so will not be matched unless both name and nr are
 |      None.
 |
 |  set_cookie(self, cookie_string)
 |      Request to set a cookie.
 |
 |      Note that it is NOT necessary to call this method under ordinary
 |      circumstances: cookie handling is normally entirely automatic.  The
 |      intended use case is rather to simulate the setting of a cookie by
 |      client script in a web page (e.g. JavaScript).  In that case, use of
 |      this method is necessary because mechanize currently does not support
 |      JavaScript, VBScript, etc.
 |
 |      The cookie is added in the same way as if it had arrived with the
 |      current response, as a result of the current request.  This means that,
 |      for example, if it is not appropriate to set the cookie based on the
 |      current request, no cookie will be set.
 |
 |      The cookie will be returned automatically with subsequent responses
 |      made by the Browser instance whenever that's appropriate.
 |
 |      cookie_string should be a valid value of the Set-Cookie header.
 |
 |      For example:
 |
 |      browser.set_cookie(
 |          "sid=abcdef; expires=Wednesday, 09-Nov-06 23:12:40 GMT")
 |
 |      Currently, this method does not allow for adding RFC 2986 cookies.
 |      This limitation will be lifted if anybody requests it.
 |
 |  set_handle_referer(self, handle)
 |      Set whether to add Referer header to each request.
 |
 |  set_response(self, response)
 |      Replace current response with (a copy of) response.
 |
 |      response may be None.
 |
 |      This is intended mostly for HTML-preprocessing.
 |
 |  submit(self, *args, **kwds)
 |      Submit current form.
 |
 |      Arguments are as for mechanize.HTMLForm.click().
 |
 |      Return value is same as for Browser.open().
 |
 |  title(self)
 |      Return title, or None if there is no title element in the document.
 |
 |      Treatment of any tag children of attempts to follow Firefox and IE
 |      (currently, tags are preserved).
 |
 |  viewing_html(self)
 |      Return whether the current response contains HTML data.
 |
 |  visit_response(self, response, request=None)
 |      Visit the response, as if it had been .open()ed.
 |
 |      Unlike .set_response(), this updates history rather than replacing the
 |      current response.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  default_features = ['_redirect', '_cookies', '_refresh', '_equiv', '_b...
 |
 |  handler_classes = {'_basicauth': <class mechanize._urllib2_fork.HTTPBa...
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from mechanize._useragent.UserAgentBase:
 |
 |  add_client_certificate(self, url, key_file, cert_file)
 |      Add an SSL client certificate, for HTTPS client auth.
 |
 |      key_file and cert_file must be filenames of the key and certificate
 |      files, in PEM format.  You can use e.g. OpenSSL to convert a p12 (PKCS
 |      12) file to PEM format:
 |
 |      openssl pkcs12 -clcerts -nokeys -in cert.p12 -out cert.pem
 |      openssl pkcs12 -nocerts -in cert.p12 -out key.pem
 |
 |
 |      Note that client certificate password input is very inflexible ATM.  At
 |      the moment this seems to be console only, which is presumably the
 |      default behaviour of libopenssl.  In future mechanize may support
 |      third-party libraries that (I assume) allow more options here.
 |
 |  add_password(self, url, user, password, realm=None)
 |
 |  add_proxy_password(self, user, password, hostport=None, realm=None)
 |
 |  set_client_cert_manager(self, cert_manager)
 |      Set a mechanize.HTTPClientCertMgr, or None.
 |
 |  set_cookiejar(self, cookiejar)
 |      Set a mechanize.CookieJar, or None.
 |
 |  set_debug_http(self, handle)
 |      Print HTTP headers to sys.stdout.
 |
 |  set_debug_redirects(self, handle)
 |      Log information about HTTP redirects (including refreshes).
 |
 |      Logging is performed using module logging.  The logger name is
 |      "mechanize.http_redirects".  To actually print some debug output,
 |      eg:
 |
 |      import sys, logging
 |      logger = logging.getLogger("mechanize.http_redirects")
 |      logger.addHandler(logging.StreamHandler(sys.stdout))
 |      logger.setLevel(logging.INFO)
 |
 |      Other logger names relevant to this module:
 |
 |      "mechanize.http_responses"
 |      "mechanize.cookies"
 |
 |      To turn on everything:
 |
 |      import sys, logging
 |      logger = logging.getLogger("mechanize")
 |      logger.addHandler(logging.StreamHandler(sys.stdout))
 |      logger.setLevel(logging.INFO)
 |
 |  set_debug_responses(self, handle)
 |      Log HTTP response bodies.
 |
 |      See docstring for .set_debug_redirects() for details of logging.
 |
 |      Response objects may be .seek()able if this is set (currently returned
 |      responses are, raised HTTPError exception responses are not).
 |
 |  set_handle_equiv(self, handle, head_parser_class=None)
 |      Set whether to treat HTML http-equiv headers like HTTP headers.
 |
 |      Response objects may be .seek()able if this is set (currently returned
 |      responses are, raised HTTPError exception responses are not).
 |
 |  set_handle_gzip(self, handle)
 |      Handle gzip transfer encoding.
 |
 |  set_handle_redirect(self, handle)
 |      Set whether to handle HTTP 30x redirections.
 |
 |  set_handle_refresh(self, handle, max_time=None, honor_time=True)
 |      Set whether to handle HTTP Refresh headers.
 |
 |  set_handle_robots(self, handle)
 |      Set whether to observe rules from robots.txt.
 |
 |  set_handled_schemes(self, schemes)
 |      Set sequence of URL scheme (protocol) strings.
 |
 |      For example: ua.set_handled_schemes(["http", "ftp"])
 |
 |      If this fails (with ValueError) because you've passed an unknown
 |      scheme, the set of handled schemes will not be changed.
 |
 |  set_password_manager(self, password_manager)
 |      Set a mechanize.HTTPPasswordMgrWithDefaultRealm, or None.
 |
 |  set_proxies(self, proxies=None, proxy_bypass=None)
 |      Configure proxy settings.
 |
 |      proxies: dictionary mapping URL scheme to proxy specification.  None
 |        means use the default system-specific settings.
 |      proxy_bypass: function taking hostname, returning whether proxy should
 |        be used.  None means use the default system-specific settings.
 |
 |      The default is to try to obtain proxy settings from the system (see the
 |      documentation for urllib.urlopen for information about the
 |      system-specific methods used -- note that's urllib, not urllib2).
 |
 |      To avoid all use of proxies, pass an empty proxies dict.
 |
 |      >>> ua = UserAgentBase()
 |      >>> def proxy_bypass(hostname):
 |      ...     return hostname == "noproxy.com"
 |      >>> ua.set_proxies(
 |      ...     {"http": "joe:password@myproxy.example.com:3128",
 |      ...      "ftp": "proxy.example.com"},
 |      ...     proxy_bypass)
 |
 |  set_proxy_password_manager(self, password_manager)
 |      Set a mechanize.HTTPProxyPasswordMgr, or None.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from mechanize._useragent.UserAgentBase:
 |
 |  default_others = ['_unknown', '_http_error', '_http_default_error']
 |
 |  default_schemes = ['http', 'ftp', 'file', 'https']
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from mechanize._opener.OpenerDirector:
 |
 |  add_handler(self, handler)
 |
 |  error(self, proto, *args)
 |
 |  retrieve(self, fullurl, filename=None, reporthook=None, data=None, timeout=<object object>, open=<built-in function open>)
 |      Returns (filename, headers).
 |
 |      For remote objects, the default filename will refer to a temporary
 |      file.  Temporary files are removed when the OpenerDirector.close()
 |      method is called.
 |
 |      For file: URLs, at present the returned filename is None.  This may
 |      change in future.
 |
 |      If the actual number of bytes read is less than indicated by the
 |      Content-Length header, raises ContentTooShortError (a URLError
 |      subclass).  The exception's .result attribute contains the (filename,
 |      headers) that would have been returned.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from mechanize._opener.OpenerDirector:
 |
 |  BLOCK_SIZE = 8192
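The matching rules in the find_link() docstring (keyword-only arguments, every supplied criterion must match, regexes are applied with .search() rather than .match(), nr counts matches from 0, LinkNotFoundError when nothing matches) can be sketched in plain Python. This is a simplified re-implementation for illustration only, not mechanize's code: Link here is a hypothetical namedtuple with just url, text and tag, LinkNotFoundError is a local stand-in, and name/name_regex are left out.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for mechanize.Link, keeping only the fields used here.
Link = namedtuple("Link", ["url", "text", "tag"])


class LinkNotFoundError(Exception):
    """Local stand-in for mechanize.LinkNotFoundError."""


def find_link(links, *, text=None, text_regex=None, url=None, url_regex=None,
              tag=None, predicate=None, nr=0):
    """Return the nr-th link (0-based) for which every supplied criterion matches."""
    # Regexes may be given as compiled objects or as strings.
    if isinstance(text_regex, str):
        text_regex = re.compile(text_regex)
    if isinstance(url_regex, str):
        url_regex = re.compile(url_regex)
    matches = 0
    for link in links:
        if text is not None and link.text != text:
            continue
        if text_regex is not None and not text_regex.search(link.text):
            continue
        if url is not None and link.url != url:
            continue
        if url_regex is not None and not url_regex.search(link.url):
            continue
        if tag is not None and link.tag != tag:
            continue
        if predicate is not None and not predicate(link):
            continue
        if matches == nr:
            return link
        matches += 1
    raise LinkNotFoundError("no link matches the given criteria")


if __name__ == "__main__":
    page_links = [
        Link("http://www.python.org/", "Python", "a"),
        Link("/doc/", "python docs", "a"),
        Link("http://docs.python.org/", "monty python", "a"),
    ]
    print(find_link(page_links, text_regex="python", nr=1).url)
```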

Reposted from: https://www.cnblogs.com/qwj-sysu/p/3892043.html
