Let's look at the result first.

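What I implemented:

1. Crawl flight information between popular cities.

2. Store it in a MySQL database.

You can of course crawl flights between all cities as well, and I provide the complete city table at the end of this post. I limited the crawl to popular cities only because what I needed was a job that runs automatically.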


The point is to learn how to crawl, not to copy code, get stuck when you can't adapt it, and then blame the author for writing garbage.

First, look at what Ctrip shows.

From the page above we can read two things:

1. The trip is from Shanghai to Chengdu.

2. There are 7 flights.

How to crawl
I won't cover crawling static pages because that is too easy; we'll go straight to Ctrip.

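Once you are on the page shown above, I recommend using Google Chrome because its developer tools are convenient.

1. Open Chrome.

2. Go to the Ctrip page that shows the flight results.

3. Press F12.

4. Refresh the page; the Network tab now lists many requests.

Select the XHR filter, and only a handful of requests remain.

Click one of them, then click Preview on the right.

The response data is shown there. The request we want is not necessarily the first one, so check each response to see whether it contains the information we need.

For example, click pageData: the Preview shows very little data, and none of it is what we need, so pageData is definitely not the one.

The one we want is actually the first request, products.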
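When there are many XHR responses, checking each Preview by eye gets tedious. As a rough sketch, you can paste each response body into a script and search it for a field you expect to see, such as `flightNumber`. The helper name `find_key` and both sample bodies below are made up for illustration; they are not real Ctrip responses.

```python
import json


def find_key(node, key):
    """Return True if `key` appears anywhere inside a decoded JSON value."""
    if isinstance(node, dict):
        return key in node or any(find_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(find_key(item, key) for item in node)
    return False


# Hypothetical, heavily trimmed bodies standing in for what Preview shows.
page_data_body = '{"data": {"title": "flights", "banners": []}}'
products_body = (
    '{"data": {"routeList": [{"legs": [{"flight": {"flightNumber": "XX0000"}}]}]}}'
)

for name, body in [("pageData", page_data_body), ("products", products_body)]:
    # Only the response that carries flight data contains the key.
    print(name, find_key(json.loads(body), "flightNumber"))
```

Only the body that actually carries flight records reports the key, which is the same conclusion the manual check reaches.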

Now we have a lead: we need the data in this products response. Click Headers.

From there we can see that the address to crawl is url = 'http://flights.ctrip.com/itinerary/api/12808/products'

This is a POST request. So what data do we have to send to get this result?
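As a minimal sketch, the request body can be built as a plain dictionary and serialized to JSON. The field names are the ones the crawler in this post sends; the function name `build_payload` is hypothetical, and whether the site still accepts exactly these fields is an assumption this sketch cannot confirm.

```python
import json


def build_payload(dcity, acity, dcityname, acityname, date):
    """Build the JSON body for a one-way search between two cities."""
    return {
        "flightWay": "Oneway",
        "classType": "ALL",
        "hasChild": "false",
        "hasBaby": "false",
        "searchIndex": 1,
        "airportParams": [
            {
                "dcity": dcity,          # departure city code, e.g. SHA
                "acity": acity,          # arrival city code, e.g. CTU
                "dcityname": dcityname,  # departure city name
                "acityname": acityname,  # arrival city name
                "date": date,            # travel date, YYYY-MM-DD
            }
        ],
    }


payload = build_payload("SHA", "CTU", "上海", "成都", "2020-05-10")
body = json.dumps(payload)  # this string is what goes into the POST body
print(body)
```

The serialized string is sent together with a `Content-Type: application/json` header, which tells the server to parse the body as JSON.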

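Without further ado, the code is posted below.

Remember not to crawl too frequently. If you do, the site will require a CAPTCHA before it lets you search, and then you won't get any data at all.

If that has already happened, switch to a different Wi-Fi network or a different computer; all that matters is that the IP address is different.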
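One way to keep the request rate low is to add a little randomness to the pause between requests and to wait longer after each failure. The following is only a sketch: the 20 second base matches the sleep used in this post's crawler, while the jitter, the backoff numbers, and all three function names are assumptions chosen for illustration.

```python
import random
import time


def polite_delay(base=20.0, jitter=5.0):
    """Return a delay in seconds: base plus a random extra in [0, jitter]."""
    return base + random.uniform(0, jitter)


def backoff_delays(first=60.0, factor=2.0, retries=4):
    """Return the waits to use after consecutive blocked or failed requests."""
    return [first * factor ** i for i in range(retries)]


def throttled(items, base=20.0, jitter=5.0, sleep=time.sleep):
    """Yield items one by one, sleeping before every item except the first."""
    for index, item in enumerate(items):
        if index:
            sleep(polite_delay(base, jitter))
        yield item


print(backoff_delays())
```

A crawl loop could then read `for pair in throttled(pairs): ...` so the pause lives in one place instead of being scattered through the request code.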

My table structure looks like this.

Because I process this table further later on, I generated it with a Django model.

from django.db import models


# Create your models here.
class airport(models.Model):
    id = models.AutoField(primary_key=True)  # primary key
    fcity = models.CharField(max_length=32)  # departure city
    tcity = models.CharField(max_length=32)  # destination city
    date = models.CharField(max_length=32)  # travel date
    airlineName = models.CharField(max_length=32)  # airline
    flightNumber = models.CharField(max_length=32)  # flight number
    airportName = models.CharField(max_length=32)  # departure airport name
    departureDate = models.CharField(max_length=32)  # departure time
    arrivalDate = models.CharField(max_length=32)  # arrival time
    punctualityRate = models.CharField(max_length=32)  # on-time rate
    jprice = models.CharField(max_length=32)  # economy class price
    fprice = models.CharField(max_length=32)  # business class price

    def __str__(self):
        # One %s per argument; the original format string had bare % signs
        # and one placeholder fewer than the values passed in.
        return ("<airport:{id=%s,fcity=%s,tcity=%s,date=%s,airlineName=%s,"
                "flightNumber=%s,airportName=%s,departureDate=%s,arrivalDate=%s,"
                "punctualityRate=%s,jprice=%s,fprice=%s}>"
                % (self.id, self.fcity, self.tcity, self.date, self.airlineName,
                   self.flightNumber, self.airportName, self.departureDate,
                   self.arrivalDate, self.punctualityRate, self.jprice, self.fprice))

import json
import time
from operator import itemgetter

import pymysql
import requests


class FLIGHT(object):
    def __init__(self):
        self.url = 'http://flights.ctrip.com/itinerary/api/12808/products'
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0",
            "Content-Type": "application/json",  # the request body is JSON
            "referer": r"https://flights.ctrip.com/itinerary/oneway/SHA-TAO?date=2020-04-11",
        }
        self.city = {
            "BJS": "北京", "SHA": "上海", "CAN": "广州", "SZX": "深圳", "CTU": "成都",
            "HGH": "杭州", "WUH": "武汉", "SIA": "西安", "CKG": "重庆", "TAO": "青岛",
            "CSX": "长沙", "NKG": "南京", "XMN": "厦门", "KMG": "昆明", "DLC": "大连",
            "TSN": "天津", "CGO": "郑州", "SYX": "三亚", "TNA": "济南", "FOC": "福州",
        }

    def insert(self, value):
        # Keyword arguments: newer PyMySQL releases reject positional ones.
        db = pymysql.connect(host="localhost", user="root",
                             password="123456", database="python")
        cursor = db.cursor()
        sql = ("INSERT INTO airport_airport(fcity,tcity,date,airlineName,flightNumber,"
               "airportName,departureDate,arrivalDate,punctualityRate,jprice,fprice) "
               "VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)")
        try:
            cursor.execute(sql, value)
            db.commit()
            print("Row inserted")
        except Exception:
            db.rollback()
            print("Insert failed")
        finally:
            db.close()

    def xiecheng(self, date):
        # Loop over every ordered pair of distinct cities
        for fromcode, fromcity in sorted(self.city.items(), key=itemgetter(0)):
            for tocode, tocity in sorted(self.city.items(), key=itemgetter(0)):
                if fromcode != tocode:
                    request_payload = {
                        "flightWay": "Oneway",
                        "classType": "ALL",
                        "hasChild": 'false',
                        "hasBaby": 'false',
                        "searchIndex": 1,
                        "airportParams": [{"dcity": fromcode, "acity": tocode,
                                           "dcityname": fromcity, "acityname": tocity,
                                           "date": date}],
                    }
                    time.sleep(20)  # pause between requests to avoid being blocked
                    print(fromcode, fromcity, tocode, tocity, date)
                    self.findall(request_payload, fromcity, tocity, date)

    def findall(self, request_payload, fromcity, tocity, date):
        # The request body must be sent as a JSON string
        response = requests.post(self.url, data=json.dumps(request_payload),
                                 headers=self.headers).text
        # 'data' can be missing when the request is rejected, so fall back to {}
        data = json.loads(response).get('data') or {}
        routeList = data.get('routeList')
        print("Received route list:", routeList)
        if not routeList:
            return
        for route in routeList:
            if len(route.get('legs')) == 1:  # direct flights only
                print("Saving to the database....")
                legs = route.get('legs')[0]
                flight = legs.get('flight')
                airlineName = flight.get('airlineName')  # airline
                flightNumber = flight.get('flightNumber')  # flight number
                airportName = flight.get('departureAirportInfo').get('airportName')  # departure airport
                departureDate = flight.get('departureDate')  # departure time
                arrivalDate = flight.get('arrivalDate')  # arrival time
                punctualityRate = flight.get('punctualityRate')  # on-time rate
                lowestPrice = legs.get('characteristic').get('lowestPrice')  # economy class price
                fprice = legs.get('characteristic').get('lowestCfPrice')  # business class price
                # Column order is (..., jprice, fprice): economy first, then business.
                # The original list had these two values swapped.
                value = [fromcity, tocity, date, airlineName, flightNumber, airportName,
                         departureDate, arrivalDate, punctualityRate, lowestPrice, fprice]
                print("Route:", *value)
                self.insert(value)


if __name__ == "__main__":
    fly = FLIGHT()
    date = "2020-05-10"
    # Crawl every city pair for this date
    fly.xiecheng(date)
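Chained `.get()` calls raise an error as soon as one level of the response is missing, which is likely to happen once the site starts rejecting requests. The sketch below walks the nested dictionaries defensively and builds one row in the column order of the INSERT statement. The helper names `dig` and `route_to_row` are hypothetical, and the sample route is invented data shaped like the fields the crawler reads, not a real response.

```python
def dig(node, *keys):
    """Follow keys through nested dicts; return None as soon as one is missing."""
    for key in keys:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def route_to_row(route, fromcity, tocity, date):
    """Turn one direct-flight route into a row in the INSERT column order."""
    legs = route.get("legs") or []
    if len(legs) != 1:  # skip connecting itineraries, as the crawler does
        return None
    leg = legs[0]
    return [
        fromcity, tocity, date,
        dig(leg, "flight", "airlineName"),
        dig(leg, "flight", "flightNumber"),
        dig(leg, "flight", "departureAirportInfo", "airportName"),
        dig(leg, "flight", "departureDate"),
        dig(leg, "flight", "arrivalDate"),
        dig(leg, "flight", "punctualityRate"),
        dig(leg, "characteristic", "lowestPrice"),    # jprice (economy)
        dig(leg, "characteristic", "lowestCfPrice"),  # fprice (business)
    ]


# Invented sample with several fields deliberately missing.
sample = {
    "legs": [
        {
            "flight": {"airlineName": "Example Air", "flightNumber": "XX0000"},
            "characteristic": {"lowestPrice": 500},
        }
    ]
}
row = route_to_row(sample, "上海", "成都", "2020-05-10")
print(row)
```

Missing fields come back as `None`, which the database driver stores as NULL, so one incomplete record does not abort the whole crawl.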

The complete city table looks like this:

city={"AAT":"阿勒泰","ACX":"兴义","AEB":"百色","AKU":"阿克苏","AOG":"鞍山","AQG":"安庆","AVA":"安顺","AXF":"阿拉善左旗",
"BAV":"包头","BFJ":"毕节","BHY":"北海","BJS":"北京","BPE":"秦皇岛","BPL":"博乐","BPX":"昌都","BSD":"保山",
"CAN":"广州","CDE":"承德","CGD":"常德","CGO":"郑州","CGQ":"长春","CHG":"朝阳","CIF":"赤峰","CIH":"长治",
"CKG":"重庆","CSX":"长沙","CTU":"成都","CWJ":"沧源","CYI":"嘉义","CZX":"常州","DAT":"大同","DAX":"达县",
"DBC":"白城","DCY":"稻城","DDG":"丹东","DIG":"香格里拉(迪庆)","DLC":"大连","DLU":"大理","DNH":"敦煌","DOY":"东营",
"DQA":"大庆","DSN":"鄂尔多斯","DYG":"张家界","EJN":"额济纳旗","ENH":"恩施","ENY":"延安","ERL":"二连浩特","FOC":"福州",
"FUG":"阜阳","FUO":"佛山","FYJ":"抚远","GOQ":"格尔木","GYS":"广元","GYU":"固原","HAK":"海口","HDG":"邯郸",
"HEK":"黑河","HET":"呼和浩特","HFE":"合肥","HGH":"杭州","HIA":"淮安","HJJ":"怀化","HKG":"香港","HLD":"海拉尔",
"HLH":"乌兰浩特","HMI":"哈密","HPG":"神农架","HRB":"哈尔滨","HSN":"舟山","HTN":"和田","HUZ":"惠州","HYN":"台州",
"HZG":"汉中","HZH":"黎平","INC":"银川","IQM":"且末","IQN":"庆阳","JDZ":"景德镇","JGD":"加格达奇","JGN":"嘉峪关",
"JGS":"井冈山","JHG":"西双版纳","JIC":"金昌","JIQ":"黔江","JIU":"九江","JJN":"晋江","JMJ":"澜沧","JMU":"佳木斯",
"JNG":"济宁","JNZ":"锦州","JSJ":"建三江","JUH":"池州","JUZ":"衢州","JXA":"鸡西","JZH":"九寨沟","KCA":"库车",
"KGT":"康定","KHG":"喀什","KHN":"南昌","KJH":"凯里","KMG":"昆明","KNH":"金门","KOW":"赣州","KRL":"库尔勒",
"KRY":"克拉玛依","KWE":"贵阳","KWL":"桂林","LCX":"龙岩","LDS":"伊春","LFQ":"临汾","LHW":"兰州","LJG":"丽江",
"LLB":"荔波","LLF":"永州","LLV":"吕梁","LNJ":"临沧","LPF":"六盘水","LUM":"芒市","LXA":"拉萨","LYA":"洛阳",
"LYG":"连云港","LYI":"临沂","LZH":"柳州","LZO":"泸州","LZY":"林芝","MDG":"牡丹江","MFK":"马祖","MFM":"澳门",
"MIG":"绵阳","MXZ":"梅州","NAO":"南充","NBS":"白山","NDG":"齐齐哈尔","NGB":"宁波","NGQ":"阿里","NKG":"南京",
"NLH":"宁蒗","NNG":"南宁","NNY":"南阳","NTG":"南通","NZH":"满洲里","OHE":"漠河","PZI":"攀枝花","RHT":"阿拉善右旗",
"RIZ":"日照","RKZ":"日喀则","RLK":"巴彦淖尔","SHA":"上海","SHE":"沈阳","SIA":"西安","SJW":"石家庄","SWA":"揭阳",
"SYM":"普洱","SYX":"三亚","SZX":"深圳","TAO":"青岛","TCG":"塔城","TCZ":"腾冲","TEN":"铜仁","TGO":"通辽",
"THQ":"天水","TLQ":"吐鲁番","TNA":"济南","TSN":"天津","TVS":"唐山","TXN":"黄山","TYN":"太原","URC":"乌鲁木齐",
"UYN":"榆林","WEF":"潍坊","WEH":"威海","WMT":"遵义(茅台)","WNH":"文山","WNZ":"温州","WUA":"乌海","WUH":"武汉",
"WUS":"武夷山","WUX":"无锡","WUZ":"梧州","WXN":"万州","XFN":"襄阳","XIC":"西昌","XIL":"锡林浩特","XMN":"厦门",
"XNN":"西宁","XUZ":"徐州","YBP":"宜宾","YCU":"运城","YIC":"宜春","YIE":"阿尔山","YIH":"宜昌","YIN":"伊宁",
"YIW":"义乌","YNJ":"延吉","YNT":"烟台","YNZ":"盐城","YTY":"扬州","YUS":"玉树","YZY":"张掖","ZAT":"昭通",
"ZHA":"湛江","ZHY":"中卫","ZQZ":"张家口","ZUH":"珠海","ZYI":"遵义(新舟)"}
"""{"KJI":"布尔津"}"""
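Before swapping in the full table, it helps to estimate how long a run will take, because the number of ordered city pairs grows as n(n-1). The sketch below counts the pairs for the 20 popular cities used in the crawler and multiplies by its 20 second pause. The function names `city_pairs` and `estimated_seconds` are hypothetical, and the estimate ignores request and database time.

```python
from itertools import permutations


def city_pairs(city):
    """Return every ordered (from_code, to_code) pair of distinct cities."""
    return list(permutations(sorted(city), 2))


def estimated_seconds(n_cities, sleep_seconds=20):
    """Lower bound on run time: one sleep per ordered pair."""
    return n_cities * (n_cities - 1) * sleep_seconds


# The 20 popular city codes from the crawler in this post.
hot = ["BJS", "SHA", "CAN", "SZX", "CTU", "HGH", "WUH", "SIA", "CKG", "TAO",
       "CSX", "NKG", "XMN", "KMG", "DLC", "TSN", "CGO", "SYX", "TNA", "FOC"]

pairs = city_pairs(hot)
print(len(pairs), estimated_seconds(len(hot)))  # 380 pairs, 7600 seconds
```

With 20 cities that is 380 requests and a little over two hours of sleeping per travel date. The full table is far larger, so expect the run time to grow roughly with the square of the number of cities.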
