Python: read the required fields from a question-bank Excel file into JSON, then scrape a Wenjuanxing (问卷星) quiz and match up the answers

1. The Excel file
https://download.csdn.net/download/qq_42972591/74125316

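import pandas as pd
import re
import json

df=pd.read_excel('文化题库.xlsx',sheet_name ='Sheet1')
k='[A-Z]'
dic={}
# Empty base.txt
with open('base.txt','w') as f:
    pass
# The first row of the sheet is read as the column headers, so start from 1
for i in range(1,161):
    # Convert the row to a plain list so that integer indexes are positional
    line=df.iloc[i].tolist()
    # line[8] (the answer column) can be NaN, which would make list(line[8]) fail.
    # NaN is the only value not equal to itself, so line[8]!=line[8] detects it.
    if line[0]=='题型' or line[8]!=line[8]:
        continue
    answer=list(line[8])  # split a multiple-choice answer into single letters
    answers=''
    # Choice questions: the answer consists of option letters
    if re.search(k,line[8]):
        for it in answer:
            pos=ord(it)-63  # ord('A') is 65, so -63 maps 'A' to column 2, the first option column
            answers+=line[pos]+';'
    else:
        answers=line[8]  # true/false questions: use the answer as it is
    line[1]=line[1].replace('\n','')  # strip line breaks from the question text
    key=str(i)+'.'+line[1]
    value=answers.split(';')
    value=[x for x in value if x]  # drop empty strings
    dic[key]=value
    with open('base.txt','a') as f:
        print(i,'.',line[1],file=f)
        if line[0]=='判断题':
            print('答案：',answers+'\n',file=f)
        else:
            print('答案：',line[8],answers+'\n',file=f)
# ensure_ascii=False keeps the Chinese text readable instead of escaping it.
# 'ANSI' is a Windows-only codec alias (the system code page).
with open('base.json','w',encoding='ANSI') as file:
    file.write(json.dumps(dic,ensure_ascii=False))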
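The letter-to-column arithmetic in the loop above is the part most likely to go wrong, so here is a minimal, self-contained sketch of that step alone. It assumes the same layout as the script (question type in column 0, question text in column 1, options from column 2 onward, answer letters in column 8). The `row` data and the helpers `is_nan` and `options_for` are hypothetical and exist only for illustration; the sketch covers choice questions, not true/false ones.

```python
def is_nan(value):
    """NaN is the only value that is not equal to itself."""
    return value != value


def options_for(row, answer_col=8, first_option_col=2):
    """Return the option texts named by the answer letters of one row.

    Hypothetical helper: assumes options A, B, C... sit in consecutive
    columns starting at first_option_col, as in the script above.
    """
    letters = row[answer_col]
    if is_nan(letters):
        return []
    # 'A' -> first_option_col, 'B' -> first_option_col + 1, ...
    return [row[ord(ch) - ord('A') + first_option_col] for ch in letters]


# Made-up row, for illustration only.
row = ['多选题', 'Which are primary colours?',
       'Red', 'Green', 'Purple', 'Blue',
       float('nan'), float('nan'), 'ABD']
print(options_for(row))
```

With the defaults, `ord(ch) - ord('A') + 2` is the same as the script's `ord(it) - 63`; writing it out just makes the offset easier to adjust if the sheet layout differs.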

2. Scraping the Wenjuanxing page source with selenium

from selenium import webdriver
import time
from lxml import etree

url='https://ks.wjx.top/vm/trKN70Z.aspx'
browser=webdriver.Edge()
browser.get(url)
time.sleep(5)  # give the page time to finish rendering
# Read the source before closing the window; it cannot be read afterwards
pageSource = browser.page_source
browser.close()
with open('题目code.txt','w',encoding='ANSI') as f:
    f.write(pageSource)

html=etree.HTML(pageSource)
result=html.xpath('//div[@class="field-label"]/text()')
with open('题目.txt','w',encoding='ANSI') as f:
    for line in result:
        print(line,'\n',file=f)

3. Extracting questions and options from the page source

import re
from lxml import etree
import json

dic={}
with open('题目code.txt','r',encoding='ANSI')as f:
    html=f.read()
ht=etree.HTML(html)
result=ht.xpath('//div[@class="field-label"]/text()')

# "{}" is filled with the question number; group 2 captures the option text
k=r'(<div class="label" for="q{}_.">)(.*?)(</div>)'
# result[i] is the label of question number i+1
for i in range(3,53):
    ans_=[]
    ans=re.findall(k.format(str(i+1)),html)
    for it in ans:
        ans_.append(it[1])
    dic[result[i]]=ans_

# Mode 'w' already truncates the file, so no separate clearing step is needed
with open('all.json','w',encoding='ANSI')as f:
    f.write(json.dumps(dic,ensure_ascii=False))
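To see what the option pattern captures, here is a small sketch that runs it against a hand-written HTML fragment. The fragment is made up to imitate the markup the pattern expects and is not taken from the real page; `options_of` is a hypothetical helper name.

```python
import re

# Made-up fragment imitating the markup the pattern expects.
html = (
    '<div class="label" for="q4_1">Beijing</div>'
    '<div class="label" for="q4_2">Shanghai</div>'
    '<div class="label" for="q5_1">对</div>'
    '<div class="label" for="q5_2">错</div>'
)

# Same pattern as in the script; "{}" is filled with the question number.
pattern = r'(<div class="label" for="q{}_.">)(.*?)(</div>)'


def options_of(question_no, source):
    """Return the option texts of one question, in page order."""
    matches = re.findall(pattern.format(question_no), source)
    # With three groups, findall yields 3-tuples; the text is the middle one.
    return [groups[1] for groups in matches]


print(options_of(4, html))
print(options_of(5, html))
```

Note that the single `.` after the underscore matches exactly one character, so a question with ten or more options would lose the options numbered 10 and up unless the pattern is widened (for example to `\d+`).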

4. Matching the questions to get the answers

import json

def compare(answer_key,i):
    global count
    t=True  # stays True if the question is not found in the bank
    for data_key in data_keys:
        # Drop the first 4 characters of the scraped label before searching the bank keys
        if data_key.find(answer_key[4:])!=-1:
            t=False
            with open('m.txt','a',encoding='ANSI')as f:
                f.write(str(i)+'.')
                for pos,it in enumerate(answer[answer_key]):
                    if it=='对':
                        # True/false question: write the bank answer directly
                        f.write(data[data_key][0])
                        break
                    elif it in data[data_key]:
                        # Choice question: write the letter of each correct option
                        f.write(chr(ord('A')+pos))
                f.write('\n')
    if t:
        count+=1
        with open('m.txt','a',encoding='ANSI')as f:
            f.write(str(i)+'\n')

i=1
count=0
with open('base.json','r',encoding='ANSI') as f:
    data=json.loads(f.read())
with open('all.json','r',encoding='ANSI') as f:
    answer=json.loads(f.read())
data_keys=data.keys()
answer_keys=answer.keys()
# Empty m.txt
with open('m.txt','w',encoding='ANSI') as f:
    pass
for answer_key in answer_keys:
    compare(answer_key,i)
    i+=1
# Final line: the number of questions that were not found in the bank
with open('m.txt','a',encoding='ANSI') as f:
    f.write(str(count)+'题未找到')
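The matching step can also be sketched as a pure function that returns its results instead of writing to a file, which makes it easier to check on small inputs. This is a simplified illustration, not the script above: it does not strip a prefix from the scraped question text and has no special case for true/false questions. The name `match_answers` and the sample data are hypothetical.

```python
def match_answers(bank, scraped):
    """Map each scraped question to its answer letters, or None if not found.

    bank:    {question text from the bank: [correct option texts]}
    scraped: {question text from the page: [option texts in page order]}
    """
    results = {}
    for question, options in scraped.items():
        results[question] = None
        for bank_question, correct in bank.items():
            # Substring match, because bank keys carry a "12." style prefix.
            if question in bank_question:
                # Letter of every on-page option that is a correct answer.
                results[question] = ''.join(
                    chr(ord('A') + pos)
                    for pos, option in enumerate(options)
                    if option in correct
                )
                break
    return results


# Made-up data, for illustration only.
bank = {
    '12.Capital of China?': ['Beijing'],
    '13.Primary colours?': ['Red', 'Blue'],
}
scraped = {
    'Capital of China?': ['Shanghai', 'Beijing'],
    'Primary colours?': ['Red', 'Green', 'Blue'],
    'Unknown question': ['x', 'y'],
}
print(match_answers(bank, scraped))
```

Questions mapped to `None` correspond to the ones the script counts as not found.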
