
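import mechanize

# Either pass the URL straight to urlopen() ...
# response = mechanize.urlopen("http://www.hao123.com/")
# ... or build a Request object first and open that.
request = mechanize.Request("http://www.hao123.com/")
response = mechanize.urlopen(request)
print(response.geturl())   # final URL of the response
print(response.info())     # response headers
# print(response.read())   # response body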


>>> import mechanize
>>> help(mechanize.urlretrieve)
Help on function urlretrieve in module mechanize._opener:

urlretrieve(url, filename=None, reporthook=None, data=None, timeout=<object object>)

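  • The filename parameter gives the local path to save to. If it is omitted, a temporary file is created to hold the data.
  • The reporthook parameter is a callback function. It is triggered once the connection to the server is established and again each time a block of data has been transferred, so it can be used to display the current download progress.
  • The data parameter is the data to POST to the server. The function returns a two-element tuple (filename, headers), where filename is the path the data was saved to locally and headers holds the server's response headers.
  • The timeout parameter sets the timeout.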
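As a rough illustration of the reporthook parameter described above, here is a minimal sketch of a progress callback. It assumes the hook is called with three values (number of blocks transferred so far, block size in bytes, total size in bytes, with a non-positive total when the size is unknown), as urllib does. The names format_progress and report are made up for this example, and the download only runs when a URL and a target path are given on the command line.

```python
import sys


def format_progress(block_count, block_size, total_size):
    """Turn the three reporthook arguments into a short progress string."""
    downloaded = block_count * block_size
    if total_size > 0:
        # The last block is usually partial, so cap the values at the total.
        downloaded = min(downloaded, total_size)
        percent = downloaded * 100.0 / total_size
        return "%.1f%% (%d of %d bytes)" % (percent, downloaded, total_size)
    # Assumption: a non-positive total means the size is unknown.
    return "%d bytes read" % downloaded


def report(block_count, block_size, total_size):
    """reporthook-style callback: rewrite one status line on stdout."""
    sys.stdout.write("\r" + format_progress(block_count, block_size, total_size))
    sys.stdout.flush()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python progress.py <url> <local-file>
    import mechanize

    filename, headers = mechanize.urlretrieve(sys.argv[1], sys.argv[2], report)
    sys.stdout.write("\nsaved to %s\n" % filename)
```

Keeping the formatting in its own function means the callback can be checked without touching the network.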



import mechanize

def cbk(a, b, c):
    # a: blocks transferred so far, b: block size, c: total size
    print(a, b, c)

url = 'http://www.hao123.com/'
local = 'd://hao.html'
mechanize.urlretrieve(url, local, cbk)


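import os
import mechanize

br = mechanize.Browser()
# Note: a login page has to be opened with br.open() and its form selected
# with br.select_form() before the fields below can be set; this snippet
# does not show those steps.
br['loginname'] = '**'  # register an account of your own and use its name and password
br['password'] = '**'
r = br.submit()
path = os.path.join(os.path.dirname(__file__), 'login.html')
print(path)
h = open(path, "w")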
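The login snippet above leaves out how the page is opened and how the form is chosen. The following is only a sketch of how the whole flow might look, under stated assumptions: the login form is assumed to be the first form on the page (nr=0), the control names loginname and password are taken from the snippet, and login_url, output_path and login_and_save are hypothetical names introduced here.

```python
import os


def output_path(script_file, name="login.html"):
    """Return the path of `name` in the directory that holds `script_file`."""
    return os.path.join(os.path.dirname(os.path.abspath(script_file)), name)


def login_and_save(login_url, username, password, target):
    """Submit a login form and save the resulting page to `target`."""
    import mechanize  # imported here so output_path() works without it

    br = mechanize.Browser()
    br.open(login_url)
    br.select_form(nr=0)        # assumption: the login form comes first
    br["loginname"] = username  # control names as used in the snippet
    br["password"] = password
    response = br.submit()
    # Write the page returned after the login to disk.
    with open(target, "wb") as handle:
        handle.write(response.read())
    return response.geturl()
```

A call might look like login_and_save(login_url, "me", "secret", output_path(__file__)), with login_url set to the address of the site's login page.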



Help on class Browser in module mechanize._mechanize:

class Browser(mechanize._useragent.UserAgentBase)
 |  Browser-like class with support for history, forms and links.
 |
 |  BrowserStateError is raised whenever the browser is in the wrong state to
 |  complete the requested operation - e.g., when .back() is called when the
 |  browser history is empty, or when .follow_link() is called when the current
 |  response does not contain HTML data.
 |
 |  Public attributes:
 |
 |  request: current request (mechanize.Request)
 |  form: currently selected form (see .select_form())
 |
 |  Method resolution order:
 |      Browser
 |      mechanize._useragent.UserAgentBase
 |      mechanize._opener.OpenerDirector
 |      mechanize._urllib2_fork.OpenerDirector
 |
 |  Methods defined here:
 |
 |  __getattr__(self, name)
 |
 |  __init__(self, factory=None, history=None, request_class=None)
 |      Only named arguments should be passed to this constructor.
 |
 |      factory: object implementing the mechanize.Factory interface.
 |      history: object implementing the mechanize.History interface.  Note
 |       this interface is still experimental and may change in future.
 |      request_class: Request class to use.  Defaults to mechanize.Request
 |
 |      The Factory and History objects passed in are 'owned' by the Browser,
 |      so they should not be shared across Browsers.  In particular,
 |      factory.set_response() should not be called except by the owning
 |      Browser itself.
 |
 |      Note that the supplied factory's request_class is overridden by this
 |      constructor, to ensure only one Request class is used.
 |
 |  __str__(self)
 |
 |  back(self, n=1)
 |      Go back n steps in history, and return response object.
 |
 |      n: go back this number of steps (default 1 step)
 |
 |  clear_history(self)
 |
 |  click(self, *args, **kwds)
 |      See mechanize.HTMLForm.click for documentation.
 |
 |  click_link(self, link=None, **kwds)
 |      Find a link and return a Request object for it.
 |
 |      Arguments are as for .find_link(), except that a link may be supplied
 |      as the first argument.
 |
 |  close(self)
 |
 |  encoding(self)
 |
 |  find_link(self, **kwds)
 |      Find a link in current page.
 |
 |      Links are returned as mechanize.Link objects.
 |
 |      # Return third link that .search()-matches the regexp "python"
 |      # (by ".search()-matches", I mean that the regular expression method
 |      # .search() is used, rather than .match()).
 |      find_link(text_regex=re.compile("python"), nr=2)
 |
 |      # Return first http link in the current page that points to somewhere
 |      # on python.org whose link text (after tags have been removed) is
 |      # exactly "monty python".
 |      find_link(text="monty python",
 |                url_regex=re.compile("http.*python.org"))
 |
 |      # Return first link with exactly three HTML attributes.
 |      find_link(predicate=lambda link: len(link.attrs) == 3)
 |
 |      Links include anchors (<a>), image maps (<area>), and frames (<frame>,
 |      <iframe>).
 |
 |      All arguments must be passed by keyword, not position.  Zero or more
 |      arguments may be supplied.  In order to find a link, all arguments
 |      supplied must match.
 |
 |      If a matching link is not found, mechanize.LinkNotFoundError is raised.
 |
 |      text: link text between link tags: e.g. <a href="blah">this bit</a> (as
 |       returned by pullparser.get_compressed_text(), ie. without tags but
 |       with opening tags "textified" as per the pullparser docs) must compare
 |       equal to this argument, if supplied
 |      text_regex: link text between tag (as defined above) must match the
 |       regular expression object or regular expression string passed as this
 |       argument, if supplied
 |      name, name_regex: as for text and text_regex, but matched against the
 |       name HTML attribute of the link tag
 |      url, url_regex: as for text and text_regex, but matched against the
 |       URL of the link tag (note this matches against Link.url, which is a
 |       relative or absolute URL according to how it was written in the HTML)
 |      tag: element name of opening tag, e.g. "a"
 |      predicate: a function taking a Link object as its single argument,
 |       returning a boolean result, indicating whether the links
 |      nr: matches the nth link that matches all other criteria (default 0)
 |
 |  follow_link(self, link=None, **kwds)
 |      Find a link and .open() it.
 |
 |      Arguments are as for .click_link().
 |
 |      Return value is same as for Browser.open().
 |
 |  forms(self)
 |      Return iterable over forms.
 |
 |      The returned form objects implement the mechanize.HTMLForm interface.
 |
 |  geturl(self)
 |      Get URL of current document.
 |
 |  global_form(self)
 |      Return the global form object, or None if the factory implementation
 |      did not supply one.
 |
 |      The "global" form object contains all controls that are not descendants
 |      of any FORM element.
 |
 |      The returned form object implements the mechanize.HTMLForm interface.
 |
 |      This is a separate method since the global form is not regarded as part
 |      of the sequence of forms in the document -- mostly for
 |      backwards-compatibility.
 |
 |  links(self, **kwds)
 |      Return iterable over links (mechanize.Link objects).
 |
 |  open(self, url, data=None, timeout=<object object>)
 |
 |  open_local_file(self, filename)
 |
 |  open_novisit(self, url, data=None, timeout=<object object>)
 |      Open a URL without visiting it.
 |
 |      Browser state (including request, response, history, forms and links)
 |      is left unchanged by calling this function.
 |
 |      The interface is the same as for .open().
 |
 |      This is useful for things like fetching images.
 |
 |      See also .retrieve().
 |
 |  reload(self)
 |      Reload current document, and return response object.
 |
 |  response(self)
 |      Return a copy of the current response.
 |
 |      The returned object has the same interface as the object returned by
 |      .open() (or mechanize.urlopen()).
 |
 |  select_form(self, name=None, predicate=None, nr=None)
 |      Select an HTML form for input.
 |
 |      This is a bit like giving a form the "input focus" in a browser.
 |
 |      If a form is selected, the Browser object supports the HTMLForm
 |      interface, so you can call methods like .set_value(), .set(), and
 |      .click().
 |
 |      Another way to select a form is to assign to the .form attribute.  The
 |      form assigned should be one of the objects returned by the .forms()
 |      method.
 |
 |      At least one of the name, predicate and nr arguments must be supplied.
 |      If no matching form is found, mechanize.FormNotFoundError is raised.
 |
 |      If name is specified, then the form must have the indicated name.
 |
 |      If predicate is specified, then the form must match that function.  The
 |      predicate function is passed the HTMLForm as its single argument, and
 |      should return a boolean value indicating whether the form matched.
 |
 |      nr, if supplied, is the sequence number of the form (where 0 is the
 |      first).  Note that control 0 is the first form matching all the other
 |      arguments (if supplied); it is not necessarily the first control in the
 |      form.  The "global form" (consisting of all form controls not contained
 |      in any FORM element) is considered not to be part of this sequence and
 |      to have no name, so will not be matched unless both name and nr are
 |      None.
 |
 |  set_cookie(self, cookie_string)
 |      Request to set a cookie.
 |
 |      Note that it is NOT necessary to call this method under ordinary
 |      circumstances: cookie handling is normally entirely automatic.  The
 |      intended use case is rather to simulate the setting of a cookie by
 |      client script in a web page (e.g. JavaScript).  In that case, use of
 |      this method is necessary because mechanize currently does not support
 |      JavaScript, VBScript, etc.
 |
 |      The cookie is added in the same way as if it had arrived with the
 |      current response, as a result of the current request.  This means that,
 |      for example, if it is not appropriate to set the cookie based on the
 |      current request, no cookie will be set.
 |
 |      The cookie will be returned automatically with subsequent responses
 |      made by the Browser instance whenever that's appropriate.
 |
 |      cookie_string should be a valid value of the Set-Cookie header.
 |
 |      For example:
 |
 |      browser.set_cookie(
 |          "sid=abcdef; expires=Wednesday, 09-Nov-06 23:12:40 GMT")
 |
 |      Currently, this method does not allow for adding RFC 2986 cookies.
 |      This limitation will be lifted if anybody requests it.
 |
 |  set_handle_referer(self, handle)
 |      Set whether to add Referer header to each request.
 |
 |  set_response(self, response)
 |      Replace current response with (a copy of) response.
 |
 |      response may be None.
 |
 |      This is intended mostly for HTML-preprocessing.
 |
 |  submit(self, *args, **kwds)
 |      Submit current form.
 |
 |      Arguments are as for mechanize.HTMLForm.click().
 |
 |      Return value is same as for Browser.open().
 |
 |  title(self)
 |      Return title, or None if there is no title element in the document.
 |
 |      Treatment of any tag children of <title> attempts to follow Firefox and IE
 |      (currently, tags are preserved).
 |
 |  viewing_html(self)
 |      Return whether the current response contains HTML data.
 |
 |  visit_response(self, response, request=None)
 |      Visit the response, as if it had been .open()ed.
 |
 |      Unlike .set_response(), this updates history rather than replacing the
 |      current response.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  default_features = ['_redirect', '_cookies', '_refresh', '_equiv', '_b...
 |
 |  handler_classes = {'_basicauth': <class mechanize._urllib2_fork.HTTPBa...
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from mechanize._useragent.UserAgentBase:
 |
 |  add_client_certificate(self, url, key_file, cert_file)
 |      Add an SSL client certificate, for HTTPS client auth.
 |
 |      key_file and cert_file must be filenames of the key and certificate
 |      files, in PEM format.  You can use e.g. OpenSSL to convert a p12 (PKCS
 |      12) file to PEM format:
 |
 |      openssl pkcs12 -clcerts -nokeys -in cert.p12 -out cert.pem
 |      openssl pkcs12 -nocerts -in cert.p12 -out key.pem
 |
 |
 |      Note that client certificate password input is very inflexible ATM.  At
 |      the moment this seems to be console only, which is presumably the
 |      default behaviour of libopenssl.  In future mechanize may support
 |      third-party libraries that (I assume) allow more options here.
 |
 |  add_password(self, url, user, password, realm=None)
 |
 |  add_proxy_password(self, user, password, hostport=None, realm=None)
 |
 |  set_client_cert_manager(self, cert_manager)
 |      Set a mechanize.HTTPClientCertMgr, or None.
 |
 |  set_cookiejar(self, cookiejar)
 |      Set a mechanize.CookieJar, or None.
 |
 |  set_debug_http(self, handle)
 |      Print HTTP headers to sys.stdout.
 |
 |  set_debug_redirects(self, handle)
 |      Log information about HTTP redirects (including refreshes).
 |
 |      Logging is performed using module logging.  The logger name is
 |      "mechanize.http_redirects".  To actually print some debug output,
 |      eg:
 |
 |      import sys, logging
 |      logger = logging.getLogger("mechanize.http_redirects")
 |      logger.addHandler(logging.StreamHandler(sys.stdout))
 |      logger.setLevel(logging.INFO)
 |
 |      Other logger names relevant to this module:
 |
 |      "mechanize.http_responses"
 |      "mechanize.cookies"
 |
 |      To turn on everything:
 |
 |      import sys, logging
 |      logger = logging.getLogger("mechanize")
 |      logger.addHandler(logging.StreamHandler(sys.stdout))
 |      logger.setLevel(logging.INFO)
 |
 |  set_debug_responses(self, handle)
 |      Log HTTP response bodies.
 |
 |      See docstring for .set_debug_redirects() for details of logging.
 |
 |      Response objects may be .seek()able if this is set (currently returned
 |      responses are, raised HTTPError exception responses are not).
 |
 |  set_handle_equiv(self, handle, head_parser_class=None)
 |      Set whether to treat HTML http-equiv headers like HTTP headers.
 |
 |      Response objects may be .seek()able if this is set (currently returned
 |      responses are, raised HTTPError exception responses are not).
 |
 |  set_handle_gzip(self, handle)
 |      Handle gzip transfer encoding.
 |
 |  set_handle_redirect(self, handle)
 |      Set whether to handle HTTP 30x redirections.
 |
 |  set_handle_refresh(self, handle, max_time=None, honor_time=True)
 |      Set whether to handle HTTP Refresh headers.
 |
 |  set_handle_robots(self, handle)
 |      Set whether to observe rules from robots.txt.
 |
 |  set_handled_schemes(self, schemes)
 |      Set sequence of URL scheme (protocol) strings.
 |
 |      For example: ua.set_handled_schemes(["http", "ftp"])
 |
 |      If this fails (with ValueError) because you've passed an unknown
 |      scheme, the set of handled schemes will not be changed.
 |
 |  set_password_manager(self, password_manager)
 |      Set a mechanize.HTTPPasswordMgrWithDefaultRealm, or None.
 |
 |  set_proxies(self, proxies=None, proxy_bypass=None)
 |      Configure proxy settings.
 |
 |      proxies: dictionary mapping URL scheme to proxy specification.  None
 |        means use the default system-specific settings.
 |      proxy_bypass: function taking hostname, returning whether proxy should
 |        be used.  None means use the default system-specific settings.
 |
 |      The default is to try to obtain proxy settings from the system (see the
 |      documentation for urllib.urlopen for information about the
 |      system-specific methods used -- note that's urllib, not urllib2).
 |
 |      To avoid all use of proxies, pass an empty proxies dict.
 |
 |      >>> ua = UserAgentBase()
 |      >>> def proxy_bypass(hostname):
 |      ...     return hostname == "noproxy.com"
 |      >>> ua.set_proxies(
 |      ...     {"http": "joe:password@myproxy.example.com:3128",
 |      ...      "ftp": "proxy.example.com"},
 |      ...     proxy_bypass)
 |
 |  set_proxy_password_manager(self, password_manager)
 |      Set a mechanize.HTTPProxyPasswordMgr, or None.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from mechanize._useragent.UserAgentBase:
 |
 |  default_others = ['_unknown', '_http_error', '_http_default_error']
 |
 |  default_schemes = ['http', 'ftp', 'file', 'https']
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from mechanize._opener.OpenerDirector:
 |
 |  add_handler(self, handler)
 |
 |  error(self, proto, *args)
 |
 |  retrieve(self, fullurl, filename=None, reporthook=None, data=None, timeout=<object object>, open=<built-in function open>)
 |      Returns (filename, headers).
 |
 |      For remote objects, the default filename will refer to a temporary
 |      file.  Temporary files are removed when the OpenerDirector.close()
 |      method is called.
 |
 |      For file: URLs, at present the returned filename is None.  This may
 |      change in future.
 |
 |      If the actual number of bytes read is less than indicated by the
 |      Content-Length header, raises ContentTooShortError (a URLError
 |      subclass).  The exception's .result attribute contains the (filename,
 |      headers) that would have been returned.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from mechanize._opener.OpenerDirector:
 |
 |  BLOCK_SIZE = 8192
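The help text above lists a number of set_* switches and the find_link() method. As a small sketch of how they might be combined, the helper below applies a few of those switches to a Browser-like object and wraps one find_link() call taken from the docstring. The function names configure_browser and first_python_link, and the particular settings chosen, are assumptions made for this example and are not part of mechanize.

```python
import re


def configure_browser(br, observe_robots=True, debug=False):
    """Apply a few of the documented set_* switches to a Browser-like object."""
    br.set_handle_robots(observe_robots)  # observe rules from robots.txt
    br.set_handle_redirect(True)          # handle HTTP 30x redirections
    br.set_handle_referer(True)           # add a Referer header to each request
    br.set_handle_equiv(True)             # treat http-equiv like HTTP headers
    br.set_debug_http(debug)              # print HTTP headers to sys.stdout
    return br


def first_python_link(br):
    """Return the first link whose text .search()-matches "python"."""
    return br.find_link(text_regex=re.compile("python"), nr=0)


# Usage (needs mechanize installed and a page opened first):
#   import mechanize
#   br = configure_browser(mechanize.Browser())
#   br.open(some_url)              # some_url: any page you want to inspect
#   link = first_python_link(br)   # raises LinkNotFoundError if none matches
```

Because the helpers only call methods on whatever object they are given, they can be tried out with a stand-in object before pointing them at a real Browser.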


