0. References




1. RFC 1738

2.1. The main parts of URLs

A full BNF description of the URL syntax is given in Section 5.

In general, URLs are written as follows:

    <scheme>:<scheme-specific-part>

A URL contains the name of the scheme being used (<scheme>) followed by a colon and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.

Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
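The rule quoted above can be sketched in a few lines of Python. This is only an illustration of the `<scheme>:<scheme-specific-part>` split and the case-insensitive scheme check; the helper name `split_scheme` is hypothetical and not part of any library.

```python
import re

# Characters allowed in a scheme name according to the RFC 1738 text above:
# lower case letters, digits, "+", "." and "-".
_SCHEME_RE = re.compile(r"[a-z0-9+.-]+")


def split_scheme(url):
    """Split a URL into (scheme, scheme_specific_part).

    Hypothetical helper for illustration; the scheme is lowercased so that
    "HTTP" and "http" are treated as equivalent.
    """
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("no ':' separator found in %r" % url)
    scheme = scheme.lower()
    if not _SCHEME_RE.fullmatch(scheme):
        raise ValueError("invalid scheme name %r" % scheme)
    return scheme, rest


print(split_scheme("HTTP://web page.com"))
```

In real code the standard library's `urllib.parse.urlsplit` is usually the better choice; the sketch only makes the grammar from the RFC visible.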


2. Python 2


>>> import urllib
>>> url = 'http://web page.com'
>>> url_en = urllib.quote(url)    # space is encoded as "%20"
>>> url_plus = urllib.quote_plus(url)    # space is encoded as "+"
>>> url_en_twice = urllib.quote(url_en)
>>> url
'http://web page.com'
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> url_en_twice    # "%25" in the output indicates double encoding
'http%253A//web%2520page.com'
>>> # corresponding decoding
>>> urllib.unquote(url_en)
'http://web page.com'
>>> urllib.unquote_plus(url_plus)
'http://web page.com'
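Since "%25" in the output signals double encoding, one possible way to undo it is to unquote repeatedly until the text stops changing. The sketch below is an assumption-laden illustration, not a library feature: `unquote_fully` and `max_rounds` are hypothetical names, and the import fallback is there only so the same code runs under Python 2 and Python 3.

```python
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote        # Python 2


def unquote_fully(text, max_rounds=5):
    """Unquote repeatedly until the text no longer changes.

    Hypothetical helper; max_rounds is an arbitrary safety limit.
    """
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text


once = quote('http://web page.com')
twice = quote(once)
print(twice)                 # contains "%25", i.e. encoded twice
print(unquote_fully(twice))  # back to the original string
```

Note that this approach is lossy when the original text legitimately contains a percent-encoded sequence such as a literal "%20", so it is only suitable when the input is known to be over-encoded.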

2.2 URLs containing Chinese characters

>>> import urllib
>>> url_zh = u'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.quote(url_zh.encode('utf-8'))    # the argument must be a byte string (str)
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> print urllib.unquote(url_zh_en).decode('utf-8')
http://movie.douban.com/tag/美国

3. Python 3


>>> import urllib.parse    # "import urllib" alone does not reliably load the parse submodule
>>> url = 'http://web page.com'
>>> url_en = urllib.parse.quote(url)    # note: the function is urllib.parse.quote
>>> url_plus = urllib.parse.quote_plus(url)
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> urllib.parse.unquote(url_en)
'http://web page.com'
>>> urllib.parse.unquote_plus(url_plus)
'http://web page.com'
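The results above show that `quote` also encodes the colon after the scheme, which is rarely what you want for a complete URL. A minimal sketch of two common workarounds follows: passing a wider `safe` set, or quoting only the path component. The `example.com` URL is a made-up placeholder.

```python
from urllib.parse import quote, urlsplit, urlunsplit

url = 'http://web page.com'

# By default only "/" is left unescaped, so ":" becomes "%3A".
print(quote(url))             # 'http%3A//web%20page.com'
# Widening the safe set keeps the scheme separator intact.
print(quote(url, safe=':/'))  # 'http://web%20page.com'
# An empty safe set escapes "/" as well.
print(quote(url, safe=''))    # 'http%3A%2F%2Fweb%20page.com'

# Alternative: split the URL and quote only the path.
parts = urlsplit('http://example.com/a b/c d.html?q=1')
encoded = urlunsplit(parts._replace(path=quote(parts.path)))
print(encoded)                # 'http://example.com/a%20b/c%20d.html?q=1'
```

Quoting per component tends to be the safer option, because each part of a URL has its own set of reserved characters.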

3.2 URLs containing Chinese characters

>>> import urllib.parse
>>> url_zh = 'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.parse.quote(url_zh)    # str is encoded as UTF-8 by default
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> urllib.parse.unquote(url_zh_en)
'http://movie.douban.com/tag/美国'
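In Python 3, `quote` and `unquote` assume UTF-8 unless told otherwise. Some older sites expect a different encoding such as GBK; the sketch below shows the `encoding` parameter for that case. Whether a given site actually uses GBK is an assumption you would need to verify yourself.

```python
from urllib.parse import quote, unquote

text = '美国'

utf8_quoted = quote(text)                  # UTF-8 is the default encoding
gbk_quoted = quote(text, encoding='gbk')   # same text, GBK byte values

print(utf8_quoted)   # '%E7%BE%8E%E5%9B%BD'
print(gbk_quoted)

# Decoding must use the same encoding that was used for quoting.
print(unquote(gbk_quoted, encoding='gbk'))
```

Decoding GBK-quoted text with the default UTF-8 settings does not raise an error; invalid bytes are replaced, so a mismatch shows up as garbled output rather than an exception.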

4. Miscellaneous

>>> help(urllib.urlencode)    # Python 2; in Python 3 the function is urllib.parse.urlencode
Help on function urlencode in module urllib:

urlencode(query, doseq=0)
    Encode a sequence of two-element tuples or dictionary into a URL query string.

    If any values in the query arg are sequences and doseq is true, each
    sequence element is converted to a separate parameter.

    If the query arg is a sequence of two-element tuples, the order of the
    parameters in the output will match the order of parameters in the
    input.
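The help text describes two behaviours worth seeing in action: `doseq` expands sequence values into repeated parameters, and a list of tuples keeps its order. The sketch below uses the Python 3 location of the function, `urllib.parse.urlencode`; the parameter names `tag`, `page` and `sort` are made up for the example.

```python
from urllib.parse import urlencode, parse_qs

# A list of two-element tuples preserves parameter order.
params = [('tag', '美国'), ('page', 2), ('sort', ['time', 'rank'])]

# With doseq=True each element of a sequence becomes its own parameter.
query = urlencode(params, doseq=True)
print(query)   # 'tag=%E7%BE%8E%E5%9B%BD&page=2&sort=time&sort=rank'

# Without doseq the list is converted with str() and quoted as one value.
print(urlencode(params))

# parse_qs reverses the encoding; every value comes back as a list of strings.
print(parse_qs(query))
```

`urlencode` uses `quote_plus` by default, so spaces in values become "+" rather than "%20".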



