My first scraping code with the requests library

1. Scrape the main page of the Academic Affairs Office (教务处) and save it to a txt file.

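```python
import requests

file_path = r"E:\教务处.txt"
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # identify as a browser instead of python-requests
    r = requests.get("http://jwch.sdut.edu.cn/", headers=kv)
    r.raise_for_status()                 # raise on 4xx/5xx status codes
    r.encoding = r.apparent_encoding     # decode using the encoding detected from the content
    # Write UTF-8 explicitly so the result doesn't depend on the OS default encoding
    with open(file_path, 'w', encoding='utf-8') as file_obj:
        file_obj.write(r.text)
except Exception:
    print("Scraping failed")
```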
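The line `r.encoding = r.apparent_encoding` matters because of a fallback in requests. When a `text/*` response has no charset in its `Content-Type` header, requests decodes `r.text` as ISO-8859-1. The offline sketch below reproduces that situation with plain bytes, so no network request is made. It then writes the correctly decoded text to a temporary file with an explicit UTF-8 encoding. The sample string and the helper name `save_text` are illustrative only.

```python
import os
import tempfile


def save_text(path, text):
    """Write text as UTF-8 so the file does not depend on the OS default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Bytes as a server might send them: UTF-8 encoded Chinese text (illustrative)
raw = "教务处首页".encode("utf-8")

# Decoding with the ISO-8859-1 fallback produces mojibake...
wrong = raw.decode("iso-8859-1")
# ...while decoding with the real encoding gives the intended text
right = raw.decode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "page.txt")
save_text(path, right)
with open(path, encoding="utf-8") as f:
    saved = f.read()

print(wrong == right, saved == right)  # False True
```

ISO-8859-1 maps every byte to exactly one character, so the mojibake can still be repaired with `wrong.encode("iso-8859-1").decode("utf-8")`. Setting the right encoding up front avoids the detour.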

2. Search Baidu for a keyword.

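```python
import requests

try:
    kv = {'wd': 'Python'}
    # Baidu's keyword search endpoint: http://www.baidu.com/s?wd=keyword
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)   # the final URL requests built from params
    r.raise_for_status()
    print(len(r.text))
except Exception:
    print("Scraping failed")
```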
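requests turns the `params` dictionary into a URL-encoded query string and appends it to the URL. That is what `print(r.request.url)` shows. The minimal offline sketch below does much the same with the standard library's `urllib.parse.urlencode`, swapped in so it runs without a network call. `build_search_url` is a hypothetical helper. Non-ASCII keywords are percent-encoded as UTF-8.

```python
from urllib.parse import urlencode


def build_search_url(base, params):
    """Append params to base as a URL-encoded query string."""
    return base + "?" + urlencode(params)


print(build_search_url("http://www.baidu.com/s", {"wd": "Python"}))
# http://www.baidu.com/s?wd=Python
print(build_search_url("http://www.baidu.com/s", {"wd": "爬虫"}))
# http://www.baidu.com/s?wd=%E7%88%AC%E8%99%AB
```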

3. Download an image.

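```python
import os
import requests

url = "http://img1001.pocoimg.cn/image/poco/works/36/2018/0307/21/15204284272111499_46378737_H1920.jpg"
root = 'E:/pics/'
# Use the last path segment of the URL as the file name
image_path = os.path.join(root, url.split('/')[-1])
try:
    r = requests.get(url)
    r.raise_for_status()
    os.makedirs(root, exist_ok=True)   # create the folder if it doesn't exist
    if not os.path.exists(image_path):
        # Reuse the response we already have instead of downloading the image twice
        with open(image_path, 'wb') as file_obj:
            file_obj.write(r.content)  # r.content is the raw bytes of the image
        print('Image saved successfully')
    else:
        print('Image already exists')
except Exception:
    print("Scraping failed")
```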
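Two refinements to the image download are sketched below under stated assumptions:

- `url.split('/')[-1]` keeps any `?query` string in the file name. Parsing the URL first avoids that.
- For large files, requests can stream the body with `requests.get(url, stream=True)` and `r.iter_content(chunk_size=...)`.

The helpers `filename_from_url` and `save_chunks` are hypothetical. The chunk list is an offline stand-in for `r.iter_content(...)`, and the `example.com` URL is made up.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of url, ignoring any ?query or #fragment."""
    return posixpath.basename(urlsplit(url).path)


def save_chunks(chunks, path):
    """Write an iterable of bytes chunks to path and return the number of bytes written.

    With requests, chunks could be r.iter_content(chunk_size=8192) from
    requests.get(url, stream=True), so a large file is never held in memory at once.
    """
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


url = "http://img1001.pocoimg.cn/image/poco/works/36/2018/0307/21/15204284272111499_46378737_H1920.jpg"
print(filename_from_url(url))                                      # 15204284272111499_46378737_H1920.jpg
print(filename_from_url("http://example.com/a/b.jpg?size=large"))  # b.jpg

# Offline stand-in for a streamed response body
path = os.path.join(tempfile.mkdtemp(), filename_from_url(url))
print(save_chunks([b"\xff\xd8", b"", b"\xff\xd9"], path))  # 4
```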

4. Look up an IP address on ip138

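```python
import requests

url = "http://m.ip138.com/ip.asp?ip="  # ip138 lookup endpoint
ip = '202.204.80.112'
try:
    r = requests.get(url + ip)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])  # print only the last 500 characters of the page
    print("Scraping succeeded!")
except Exception:
    print("Scraping failed!")
```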
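Printing `r.text[-500:]` shows raw HTML, which is hard to read. The minimal sketch below uses the standard library's `html.parser` to pull out only the visible text and skip `<script>` and `<style>` contents. It is a generic text extractor, not ip138-specific parsing. The sample HTML is hypothetical, and the real ip138 markup will differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.parts.append(text)


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


# Hypothetical fragment standing in for r.text
sample = ("<html><head><style>p{color:red}</style></head><body>"
          "<h1>查询结果</h1><p>IP: 202.204.80.112</p>"
          "<script>var x = 1;</script></body></html>")
print(visible_text(sample))  # ['查询结果', 'IP: 202.204.80.112']
```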

5. Scrape the Chinese university rankings

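```python
import bs4
import requests
from bs4 import BeautifulSoup

kv = {"user-agent": "Mozilla/5.0"}


def getHTMLText(url):
    try:
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except Exception:
        print("getHTMLText fail")
        return ""


def fillUnivList(ulist, html):
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:  # page failed to load or its layout changed
        return
    for tr in tbody.children:
        # Skip the whitespace strings between <tr> tags; keep only Tag objects
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')  # shorthand for tr.find_all('td')
            # Columns 0, 1 and 3: rank, university name, total score
            ulist.append([tds[0].string, tds[1].string, tds[3].string])


def printUnivList(ulist, num):
    # {3} is the fill character; chr(12288) is the full-width (CJK) space,
    # which keeps columns of Chinese names aligned
    prmod = "{0:^10}\t {1:{3}^10}\t {2:{3}^10}\t"
    print(prmod.format("排名", "学校", "总分", chr(12288)))  # rank, school, total score
    for i in range(min(num, len(ulist))):  # don't index past the rows we actually got
        print(prmod.format(ulist[i][0], ulist[i][1], ulist[i][2], chr(12288)))


def main():
    uinfo = []
    url = "http://www.zuihaodaxue.com/zuihaodaxuepaiming2016.html"
    html = getHTMLText(url)
    fillUnivList(uinfo, html)
    printUnivList(uinfo, 20)


main()
```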
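`fillUnivList` walks the `<tr>` rows under `<tbody>` with BeautifulSoup and keeps columns 0, 1 and 3. The sketch below shows the same row-and-column logic without a network call or a third-party dependency, using the standard library's `html.parser` in place of BeautifulSoup. It also exercises the nested format spec `{1:{3}^10}`: centre the value in a width of 10, padded with the character passed as argument 3. The class `TbodyRows`, the function `fill_univ_list` and the sample table are hypothetical.

```python
from html.parser import HTMLParser


class TbodyRows(HTMLParser):
    """Collect the text of every <td> in each <tr> inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_tbody = False
        self._row = None   # cells of the <tr> being read
        self._cell = None  # text of the <td> being read

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._in_tbody = True
        elif tag == "tr" and self._in_tbody:
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "tbody":
            self._in_tbody = False


def fill_univ_list(html):
    parser = TbodyRows()
    parser.feed(html)
    parser.close()
    # Keep columns 0, 1 and 3 (rank, name, total score), as fillUnivList does
    return [[r[0], r[1], r[3]] for r in parser.rows if len(r) >= 4]


# Hypothetical table; the real ranking page has more columns and rows
sample = """<table><tbody>
<tr><td>1</td><td>大学A</td><td>城市</td><td>90.0</td></tr>
<tr><td>2</td><td>大学B</td><td>城市</td><td>80.0</td></tr>
</tbody></table>"""

ulist = fill_univ_list(sample)
prmod = "{0:^10}\t {1:{3}^10}\t {2:{3}^10}\t"
for row in ulist:
    # chr(12288) (full-width space) is used as the fill character
    print(prmod.format(row[0], row[1], row[2], chr(12288)))
```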

6. Scrape product information from Taobao

```python
import re
import requests


def getHtml(url):
    try:
        kv = {"user-agent": "Mozilla/5.0"}
        r = requests.get(url, timeout=30, headers=kv)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except Exception:
        print("getHtml failed.")
        return ""


def parserHtml(html, good_list):
    # Titles and prices appear in the page source as "raw_title":"..." and
    # "view_price":"..."; capturing groups extract just the values (no eval needed)
    names = re.findall(r'"raw_title":"(.*?)"', html)
    prices = re.findall(r'"view_price":"([\d.]+)"', html)
    for name, price in zip(names, prices):  # zip stops at the shorter list
        good_list.append([name, price])


def display(good_list):
    print_mode = "{0:{3}<4}\t{1:{3}<16}\t {2:{3}<8}\t"
    for cnt, (name, price) in enumerate(good_list, start=1):
        print(print_mode.format(cnt, price, name, chr(12288)))


def main():
    name = input("Enter a product name: ")
    raw_url = "https://s.taobao.com/search?q=" + name
    base = 44  # items per results page; s= is the offset of the first item
    num = int(input("Enter the search depth (number of pages): "))
    good_list = []
    print_mode = "{0:{3}<4}\t{1:{3}<16}\t {2:{3}<8}\t"
    print(print_mode.format("序号", "价格", "商品名", chr(12288)))  # No., price, product name
    for i in range(num):
        # Page i starts at item base * i (the original used num * i, which skips pages)
        html = getHtml(raw_url + '&s=' + str(base * i))
        parserHtml(html, good_list)
    good_list.sort(key=lambda a: float(a[1]))  # cheapest first
    display(good_list)


main()
```
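Calling `eval()` on scraped text executes whatever the page contains as Python code. Regex capturing groups extract the values directly instead. The offline sketch below pairs titles with prices, sorts them by price, and builds one URL per results page. The page size of 44 comes from the `base` value above. The helpers `parse_goods` and `page_urls` and the sample text are hypothetical, and the real page source is much larger. The keyword is URL-encoded with `urllib.parse.urlencode` so Chinese product names are escaped safely.

```python
import re
from urllib.parse import urlencode

TITLE_RE = re.compile(r'"raw_title":"(.*?)"')
PRICE_RE = re.compile(r'"view_price":"([\d.]+)"')


def parse_goods(html):
    """Pair each captured title with its price; no eval involved."""
    return [[name, price]
            for name, price in zip(TITLE_RE.findall(html), PRICE_RE.findall(html))]


def page_urls(keyword, pages, per_page=44):
    """One search URL per page; s= is the offset of the page's first item."""
    base = "https://s.taobao.com/search?" + urlencode({"q": keyword})
    return [base + "&s=" + str(per_page * i) for i in range(pages)]


# Hypothetical fragment of page source
sample = ('{"raw_title":"咖啡机 A","view_price":"399.00"},'
          '{"raw_title":"咖啡机 B","view_price":"129.50"}')
goods = parse_goods(sample)
goods.sort(key=lambda g: float(g[1]))  # cheapest first
print(goods)              # [['咖啡机 B', '129.50'], ['咖啡机 A', '399.00']]
print(page_urls("Python", 3))
```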

7. Scrape stock information

```python
import re
import requests
from bs4 import BeautifulSoup

urllist = "http://quote.eastmoney.com/stocklist.html"  # page listing the stock codes
urlbaidu = "https://gupiao.baidu.com/stock/"            # per-stock detail pages


def getHtml(url):
    kv = {"user-agent": "Mozilla/5.0"}
    try:
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except Exception:
        return ''


def getStockList():
    html = getHtml(urllist)
    soup = BeautifulSoup(html, "html.parser")
    tmp = soup.find('div', attrs={'class': 'qox'})
    if tmp is None:  # page failed to load or its layout changed
        return []
    tagA = tmp.find_all('a')
    # Codes look like sh + 6 digits (Shanghai) or sz + 6 digits (Shenzhen)
    regex = re.compile(r'[s][hz]\d{6}')
    stockid = []
    for a in tagA:
        match = regex.search(a.attrs.get('href', ''))
        if match:
            stockid.append(match.group(0))
    return stockid


def getinfoDict():
    stockid = getStockList()
    stockList = []
    for sid in stockid:
        try:
            infoDict = {}
            url = urlbaidu + sid + '.html'
            html = getHtml(url)
            if html == '':
                continue
            soup = BeautifulSoup(html, 'html.parser')
            tables = soup.find('div', attrs={'class': 'stock-bets'})
            name = tables.find(attrs={'class': 'bets-name'}).text.split()[0]
            infoDict["股票名称"] = name  # key means "stock name"
            print("Stock name: %s %s" % (name, sid))
            div = tables.find('div', attrs={'class': 'bets-content'})
            dts = div.find_all('dt')  # field names
            dds = div.find_all('dd')  # field values
            for dt, dd in zip(dts, dds):
                # get_text() never returns None, unlike .string
                key, value = dt.get_text(strip=True), dd.get_text(strip=True)
                print(key + ':' + value)
                infoDict[key] = value
            stockList.append(infoDict)
        except Exception:
            continue  # skip pages whose layout doesn't match
    return stockList


getinfoDict()
```
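The stock scraper does two things:

- It pulls `sh`/`sz` codes out of link targets with the regex `[s][hz]\d{6}`.
- It pairs each `<dt>` (field name) with the matching `<dd>` (field value).

The offline sketch below reproduces both steps. The standard library's `html.parser` stands in for BeautifulSoup so it runs without dependencies or network access. The helper names are hypothetical. The sample hrefs and the HTML fragment (今开 "today's open", 成交量 "volume") are made-up stand-ins, and real pages may differ.

```python
import re
from html.parser import HTMLParser

# Same pattern as getStockList: "sh" or "sz" followed by six digits
STOCK_CODE = re.compile(r'[s][hz]\d{6}')


def stock_codes(hrefs):
    codes = []
    for href in hrefs:
        match = STOCK_CODE.search(href)
        if match:
            codes.append(match.group(0))
    return codes


class DtDdPairs(HTMLParser):
    """Collect <dt> texts as keys and <dd> texts as values, in document order."""

    def __init__(self):
        super().__init__()
        self.keys, self.values = [], []
        self._current = None  # "dt" or "dd" while inside one of them

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current = tag
            (self.keys if tag == "dt" else self.values).append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "dt":
            self.keys[-1] += data.strip()
        elif self._current == "dd":
            self.values[-1] += data.strip()


def dt_dd_dict(html):
    parser = DtDdPairs()
    parser.feed(html)
    parser.close()
    return dict(zip(parser.keys, parser.values))


# Hypothetical inputs
hrefs = ["http://quote.eastmoney.com/sh600000.html",
         "http://quote.eastmoney.com/sz000001.html",
         "http://quote.eastmoney.com/center.html"]
print(stock_codes(hrefs))  # ['sh600000', 'sz000001']

sample = ('<div class="bets-content"><dl><dt>今开</dt><dd>10.00</dd></dl>'
          '<dl><dt>成交量</dt><dd>1.2万手</dd></dl></div>')
print(dt_dd_dict(sample))  # {'今开': '10.00', '成交量': '1.2万手'}
```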
