Using Python to fetch data from the Asset Management Association of China (中国证券投资基金业协会)

Target site: http://www.amac.org.cn/

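The site only offers a query interface, but the business side wants the data behind it for more effective analysis.

So here is a write-up.

Take private funds, which are the relatively hard case. The difficulty is not getting at the data; it is that the data we want sits behind several layers. One complete record takes three page requests, and each later page's request URL comes from data on the page before it. As shown in the figure below: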

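Look at the request this page makes:

The request turns out to be fairly simple. The only thing that counts as an "anti-scraping" measure is rand=XXXXX, and on closer inspection it is just a generated random number.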
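As a minimal sketch of that idea, the list URL can be assembled with a fresh random value on every call. The endpoint and the rand, page and size parameters are taken from the script later in this post; the function name build_list_url is only an illustrative choice.

```python
import random
from urllib.parse import urlencode

# List endpoint used by the script in this post.
API_URL = "http://gs.amac.org.cn/amac-infodisc/api/pof/fund"


def build_list_url(page, size=20):
    """Return the list-API URL for one page with a fresh random `rand` value."""
    query = urlencode({"rand": random.random(), "page": page, "size": size})
    return API_URL + "?" + query


if __name__ == "__main__":
    print(build_list_url(1))
```

Using urlencode instead of string concatenation keeps the query string correctly escaped if more parameters are added later.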

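The remaining parameters need no explanation.

Look at the response, as shown in the figure below:

If all you need is the data on this page, it is simple: send a POST request and read the fields out in code.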
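A rough sketch of that single-page case follows. It uses only the standard library rather than the requests package, and the helper names post_json, fetch_fund_page and summarize are hypothetical. The empty JSON body, the "content" array and the field names come from the script later in this post. The post argument exists so the network call can be swapped out when trying the code offline.

```python
import json
import random
import urllib.request

API_URL = "http://gs.amac.org.cn/amac-infodisc/api/pof/fund"
FIELDS = ("id", "fundName", "managerName", "url", "managerUrl")


def post_json(url, payload, timeout=30):
    """POST `payload` as JSON and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json;charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


def fetch_fund_page(page, size=20, post=post_json):
    """Return the records of one list page (the "content" array)."""
    url = "%s?rand=%s&page=%d&size=%d" % (API_URL, random.random(), page, size)
    return post(url, {})["content"]


def summarize(record):
    """Keep only the fields the later requests depend on."""
    return {key: record[key] for key in FIELDS}


if __name__ == "__main__":
    for record in fetch_fund_page(1):
        print(summarize(record))
```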

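But I also need each company's disclosure information, as shown in the figure below:

That page is reached through the managerUrl field returned by the first page, so another request is needed.

The public disclosure information is obtained the same way, as shown in the figure below:

So here is the source code directly: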
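One way to derive the follow-up addresses is to resolve the relative links against the list page with urljoin, instead of slicing the string by hand. This is a sketch under assumptions: the record values in the usage example are made up, and the "../manager/..." shape of managerUrl is inferred from the slice the script in this post applies to it.

```python
from urllib.parse import urljoin

# The list page the relative links are resolved against (the Referer used in this post).
LIST_PAGE = "http://gs.amac.org.cn/amac-infodisc/res/pof/fund/index.html"


def detail_urls(record):
    """Return (fund detail URL, manager page URL) for one list record."""
    fund_url = urljoin(LIST_PAGE, record["url"])
    manager_url = urljoin(LIST_PAGE, record["managerUrl"])
    return fund_url, manager_url


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = {"url": "123.html", "managerUrl": "../manager/456.html"}
    print(detail_urls(sample))
```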


# Private funds
import csv
import json
import random

import requests
from lxml import etree

HEADERS = {
    'Accept': 'application/json,text/javascript,*/*; q=0.01',
    'Accept-Encoding': 'gzip,deflate',
    'Connection': 'keep-alive',
    'Host': 'gs.amac.org.cn',
    'Content-Type': 'application/json;charset=UTF-8',
    'Origin': 'http://gs.amac.org.cn',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': 'http://gs.amac.org.cn/amac-infodisc/res/pof/fund/index.html',
    'User-Agent': 'your own User-Agent',
}


def get_html(url):
    # Keep requesting until the server answers 200, then parse the page.
    while True:
        try:
            response = requests.get(url, headers=HEADERS)
        except requests.RequestException:
            continue
        if response.status_code == 200:
            response.encoding = "utf-8"
            return etree.HTML(response.text)


def table_text(html, xpath, tab=""):
    # Flatten the first matching table body into one string without whitespace.
    text = html.xpath(xpath)[0].xpath('string(.)')
    return text.replace('\r\n', '').replace(" ", "").replace("\t", tab)


def sminfo():
    rows = []
    for i in range(1, 6397):  # 2382-
        print("Page %s =====================" % i)
        # First request: one page of the fund list
        r = random.random()
        url = ("http://gs.amac.org.cn/amac-infodisc/api/pof/fund?rand=" + str(r)
               + "&page=" + str(i) + "&size=20")
        response = requests.post(url=url, data=json.dumps({}), headers=HEADERS)
        datas = json.loads(response.text)["content"]
        for count, data1 in enumerate(datas, start=1):
            print("Scraping record " + str(count))
            managerurl = data1['managerUrl']  # manager page URL
            fundName = data1['fundName']  # fund name
            managename = data1['managerName']  # fund manager name
            # Second request: the fund detail page
            html = get_html('http://gs.amac.org.cn/amac-infodisc/res/pof/fund/' + data1['url'])
            # Basic fund information
            basicinfo = table_text(html, '/html/body/div[3]/div/div[2]/div[1]/div/table//tbody')
            # Disclosure information
            t2 = table_text(html, '/html/body/div[3]/div/div[2]/div[2]/div[2]/table//tbody')
            # Third request: the fund manager page
            html3 = get_html('http://gs.amac.org.cn/amac-infodisc/res/pof/' + managerurl[3:])
            # Fund manager information
            html3managedata = table_text(html3, '/html/body/div[3]/div/div[4]/div[1]/div/table//tbody')
            # Integrity information
            html3belidata = table_text(html3, '/html/body/div[3]/div/div[4]/div[10]/div[2]/table//tbody', tab="：")
            rows.append((fundName, managename, basicinfo, t2, html3managedata, html3belidata))
    # Text mode with newline='' is what csv.writer expects in Python 3.
    with open('company26.csv', 'w', encoding='gbk', errors='ignore', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["Fund name", "Fund manager name", "Fund details", "Disclosure status",
                         "Institution integrity information", "Integrity information special notice"])
        writer.writerows(rows)
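The script above repeats a request for as long as the status code is not 200, which can spin forever if the site starts refusing requests. A possible alternative, sketched here as an assumption and not as part of the original script, is a bounded retry with a growing pause. The names fetch_with_retry, max_attempts and base_delay are illustrative; fetch is any callable that returns a (status, body) pair.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns status 200, or give up after max_attempts."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            status, body = fetch(url)
            if status == 200:
                return body
            last_error = RuntimeError("HTTP status %s for %s" % (status, url))
        except OSError as exc:  # network-level failures (urllib and socket errors)
            last_error = exc
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise last_error
```

With the requests package, fetch could be a small wrapper that returns (response.status_code, response.text); requests.RequestException derives from OSError, so it is caught by the same except clause.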

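I also scraped the securities companies' collective asset management products, the fund companies' and their subsidiaries' collective asset management products, the asset-backed special plans, and so on. Those are simpler, so I won't write them up here.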
