1 Preface

Course selection always feels like going into battle.
Everyone has courses they want, but the slots are scarce.
So people resort to every trick: some use JS, some use Chrome's console.
Life is short, I use Python.

2 Environment Dependencies

  • Python 2.7.12
  • (NEW) Python 3.3 & Python 3.6
  • Dependencies pinned with pip freeze > Requirement.txt

Requirement.txt

beautifulsoup4==4.6.0
bs4==0.0.1
configparser==3.5.0
lxml==3.7.3
requests==2.13.0
tqdm==4.11.2

3 Usage

Getting the program

You can git clone the latest version directly:

$ git clone https://github.com/okcd00/CDSelector.git
$ cd CDSelector
$ vim config     # edit your login information
$ vim courseid   # edit your course selections
$ python CDSelector.py

Alternatively, you can download the latest stable release from the Releases page:
https://github.com/okcd00/CDSelector/releases

Editing the config file

[info]
username = [your account name / email]
password = [your password]
runtime  = [seconds to wait between enrollment attempts]

[action]
debug = true        [debug mode dumps intermediate variables; set to false to save resources]
enroll = true       [keep retrying in an endless polling loop; hard to imagine when you would want false]
evaluate = true     [verify whether enrollment succeeded; recommended on]
select_bat = false  [batch selection, for special cases like English B where courses cannot be selected one at a time and two must be submitted in one form]
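As a quick sanity check, the config layout above parses with configparser.RawConfigParser, the same class the script uses. This is a minimal sketch; the username, password, and runtime values below are placeholders, not real settings:

```python
# A minimal sketch of reading the config layout described above with
# RawConfigParser -- the sample credentials are placeholder values.
from configparser import RawConfigParser

SAMPLE = """
[info]
username = student@example.com
password = secret
runtime  = 5

[action]
debug = true
enroll = true
evaluate = true
select_bat = false
"""

cf = RawConfigParser()
cf.read_string(SAMPLE)  # the script itself uses cf.read('config')
print(cf.get('info', 'username'))             # -> student@example.com
print(cf.getint('info', 'runtime'))           # -> 5
print(cf.getboolean('action', 'select_bat'))  # -> False
```

Note that the bracketed notes in the template above are explanations, not part of the file: a real value like `debug = true [comment]` would make getboolean raise a ValueError, so keep each value bare in your actual config.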

Editing the courseid file

List one course ID per line, like this:

091M7014H
091M7021H

In particular, if a course should be taken as a degree course,

091M7014H
091M7021H on

append a space and "on", as in the second line above.
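The format above boils down to a mapping from course ID to a degree-course flag. This sketch shows that parsing; parse_courseid is a hypothetical helper name for illustration, not a function from the script:

```python
# A small sketch of the courseid format: one course ID per line, with an
# optional " on" suffix marking a degree course.
def parse_courseid(text):
    courses = {}
    for line in text.splitlines():
        parts = line.strip().split()
        if not parts:
            continue
        # maps course ID -> True if it should be selected as a degree course
        courses[parts[0]] = len(parts) > 1 and parts[1] == 'on'
    return courses

print(parse_courseid("091M7014H\n091M7021H on"))
# -> {'091M7014H': False, '091M7021H': True}
```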

Then run CDSelector.py:

$ python CDSelector.py
Debug Mode: True
Login success
Enrolling start
> Course Selection is unreachable or not started. <1134> Thu Jun 01 08:43:42 2017

If you see an ImportError, some Python packages are missing. A Requirement.txt is provided, so from the current directory you can run

$ pip install -r Requirement.txt

to install all dependencies at once.
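The ImportError check can also be automated. This sketch reports which required packages fail to import; the list mirrors Requirement.txt, noting that the beautifulsoup4 package installs the `bs4` import name, and that on Python 3 `configparser` is in the standard library:

```python
# A sketch that reports which required packages cannot be imported.
import importlib

def missing_packages(names):
    """Return the subset of `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# import names corresponding to the entries in Requirement.txt
REQUIRED = ['requests', 'bs4', 'lxml', 'configparser', 'tqdm']
print(missing_packages(REQUIRED))  # an empty list means all dependencies are present
```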

4 Source Code

Since the program still gets updated from time to time, the listing below is only the v1.0.0 initial release.
The web-access part draws on scusjs, and the feature enhancements draw on zoecur.

(Updated 2017-09-07) Now refreshed to v1.0.7.

# coding=utf8
# =====================================================
#   Copyright (C) 2016 All rights reserved.
#
#   filename : CDSelector.py
#   author   : okcd00 / okcd00@qq.com
#   refer    : scusjs@foxmail.com
#   date     : 2017-01-06
#   desc     : UCAS Course_Selection Program
# =====================================================
import os
import sys
import time
import requests
from bs4 import BeautifulSoup
from configparser import RawConfigParser


class UCASEvaluate:
    def __init__(self):
        self.__readCoursesId('./courseid')
        cf = RawConfigParser()
        cf.read('config')
        self.username = cf.get('info', 'username')
        self.password = cf.get('info', 'password')
        self.runtime = cf.getint('info', 'runtime')
        self.debug = cf.getboolean('action', 'debug')
        self.enroll = cf.getboolean('action', 'enroll')
        self.evaluate = cf.getboolean('action', 'evaluate')
        self.select_bat = cf.getboolean('action', 'select_bat')

        self.loginPage = 'http://sep.ucas.ac.cn'
        self.loginUrl = self.loginPage + '/slogin'
        self.courseSystem = self.loginPage + '/portal/site/226/821'
        self.courseBase = 'http://jwxk.ucas.ac.cn'
        self.courseIdentify = self.courseBase + '/login?Identity='
        self.courseSelected = self.courseBase + '/courseManage/selectedCourse'
        self.courseSelectionBase = self.courseBase + '/courseManage/main'
        self.courseCategory = self.courseBase + '/courseManage/selectCourse?s='
        self.courseSave = self.courseBase + '/courseManage/saveCourse?s='
        self.studentCourseEvaluateUrl = 'http://jwjz.ucas.ac.cn/Student/DeskTopModules/'
        self.selectCourseUrl = 'http://jwjz.ucas.ac.cn/Student/DesktopModules/Course/SelectCourse.aspx'

        self.enrollCount = {}
        self.headers = {
            'Host': 'sep.ucas.ac.cn',
            'Connection': 'keep-alive',
            'Pragma': 'no-cache',
            'Cache-Control': 'no-cache',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36',
            'Accept-Encoding': 'gzip, deflate, sdch',
            'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',
        }
        self.s = requests.Session()
        loginPage = self.s.get(self.loginPage, headers=self.headers)
        self.cookies = loginPage.cookies

    def login(self):
        postdata = {
            'userName': self.username,
            'pwd': self.password,
            'sb': 'sb'
        }
        self.s.post(self.loginUrl, data=postdata, headers=self.headers)
        if 'sepuser' in self.s.cookies.get_dict():
            return True
        return False

    def getMessage(self, restext):
        css_soup = BeautifulSoup(restext, 'html.parser')
        text = css_soup.select(
            '#main-content > div > div.m-cbox.m-lgray > div.mc-body > div')[0].text
        return "".join(line.strip() for line in text.split('\n'))

    def __readCoursesId(self, filename):
        coursesFile = open(filename, 'r')
        self.coursesId = {}
        for line in coursesFile.readlines():
            # accept "courseId on" (or "courseId:on") to mark a degree course
            line = line.strip().replace(' ', ':').split(':')
            courseId = line[0]
            isDegree = False
            if len(line) == 2 and line[1] == 'on':
                isDegree = True
            self.coursesId[courseId] = isDegree

    def enrollCourses(self):
        response = self.s.get(self.courseSystem, headers=self.headers)
        soup = BeautifulSoup(response.text, 'html.parser')
        try:
            identity = str(soup).split('Identity=')[1].split('"')[0]
            coursePage = self.courseIdentify + identity
            response = self.s.get(coursePage)
            response = self.s.get(self.courseSelected)
            idx, lastMsg = 0, ""
            while True:
                msg = ""
                if self.select_bat:
                    result, msg = self.__enrollCourses(self.coursesId)
                    if result:
                        self.coursesId.clear()
                else:
                    for eachCourse in self.coursesId:
                        if eachCourse in response.text:
                            print("Course " + eachCourse + " has been selected.")
                            continue
                        if (eachCourse in self.enrollCount and
                                self.enrollCount[eachCourse] == 0):
                            continue
                        self.enrollCount[eachCourse] = 1
                        result, msg = self.__enrollCourse(
                            eachCourse, self.coursesId[eachCourse])
                        if result:
                            self.enrollCount[eachCourse] = 0
                    for enroll in self.enrollCount:
                        if self.enrollCount[enroll] == 0:
                            self.coursesId.pop(enroll)
                    self.enrollCount.clear()
                if not self.coursesId:
                    return 'INVALID COURSES_ID'
                idx += 1
                time.sleep(self.runtime)
                showText = "\r> " + "%s <%d> %s" % (
                    msg, idx, time.asctime(time.localtime(time.time())))
                lastMsg = msg
                sys.stdout.write(showText)
                sys.stdout.flush()
        except KeyboardInterrupt:
            print("\nKeyboardInterrupt Detected, bye!")
            return "STOP"
        except Exception as exception:
            return "Course_Selection_Port is not open, waiting..."

    def __enrollCourse(self, courseId, isDegree):
        response = self.s.get(self.courseSelectionBase)
        if self.debug:
            with open('./check.html', 'wb+') as f:
                f.write(response.text.encode('utf-8'))
        soup = BeautifulSoup(response.text, 'html.parser')
        categories = dict([(label.contents[0][:2], label['for'][3:])
                           for label in soup.find_all('label')[2:]])
        categoryId = categories[courseId[:2]]
        identity = soup.form['action'].split('=')[1]
        postdata = {
            'deptIds': categoryId,
            'sb': 0
        }
        categoryUrl = self.courseCategory + identity
        response = self.s.post(categoryUrl, data=postdata)
        if self.debug:
            print("Now Posting, save snapshot in check2.html.")
            with open('./check2.html', 'wb+') as f:
                f.write(response.text.encode('utf-8'))
        soup = BeautifulSoup(response.text, 'html.parser')
        courseTable = soup.body.form.table
        if courseTable:
            courseTable = courseTable.find_all('tr')[1:]
        else:
            return False, "Course Selection is unreachable or not started."
        courseDict = dict([(c.span.contents[0], c.span['id'].split('_')[1])
                           for c in courseTable])
        if courseId in courseDict:
            postdata = {
                'deptIds': categoryId,
                'sids': courseDict[courseId]
            }
            if isDegree:
                postdata['did_' + courseDict[courseId]] = courseDict[courseId]
            courseSaveUrl = self.courseSave + identity
            response = self.s.post(courseSaveUrl, data=postdata)
            print("Now Checking, save snapshot in result.html.")
            with open('result.html', 'wb+') as f:
                f.write(response.text.encode('utf-8'))
            if 'class="error' not in response.text:
                return True, '[Success] ' + courseId
            else:
                return False, self.getMessage(response.text).strip()
        else:
            return False, "No such course"

    def __enrollCourses(self, courseIds):  # batch selection, e.g. for English courses
        response = self.s.get(self.courseSelectionBase)
        if self.debug:
            with open('./check.html', 'wb+') as f:
                f.write(response.text.encode('utf-8'))
        soup = BeautifulSoup(response.text, 'html.parser')
        categories = dict([(label.contents[0][:2], label['for'][3:])
                           for label in soup.find_all('label')[2:]])
        identity = soup.form['action'].split('=')[1]
        categoryIds = []
        for courseId in courseIds:
            categoryIds.append(categories[courseId[:2]])
        postdata = {
            'deptIds': categoryIds,
            'sb': 0
        }
        categoryUrl = self.courseCategory + identity
        response = self.s.post(categoryUrl, data=postdata)
        if self.debug:
            print("Now Posting, save snapshot in check2.html.")
            with open('./check2.html', 'wb+') as f:
                f.write(response.text.encode('utf-8'))
        soup = BeautifulSoup(response.text, 'html.parser')
        courseTable = soup.body.form.table
        if courseTable:
            courseTable = courseTable.find_all('tr')[1:]
        else:
            return False, "Course Selection is unreachable or not started."
        courseDict = dict([(c.span.contents[0], c.span['id'].split('_')[1])
                           for c in courseTable])
        postdata = {
            'deptIds': categoryIds,
            'sids': [courseDict[courseId] for courseId in courseIds]
        }
        courseSaveUrl = self.courseSave + identity
        response = self.s.post(courseSaveUrl, data=postdata)
        print("Now Checking, save snapshot in result.html.")
        with open('result.html', 'wb+') as f:
            f.write(response.text.encode('utf-8'))
        if 'class="error' not in response.text:
            return True, '[Success] ' + courseId
        else:
            return False, self.getMessage(response.text).strip()


if __name__ == "__main__":
    print("starting...")
    os.system('MODE con: COLS=128 LINES=32 & TITLE Welcome to CDSelector')
    from logo import show_logo
    show_logo()  # delete this for a faster start
    os.system('cls')
    time.sleep(1)
    os.system("color 0A")
    os.system('MODE con: COLS=80 LINES=10 & TITLE CD_Course_Selecting is working')
    while True:
        try:
            ucasEvaluate = UCASEvaluate()
            break
        except Exception as e:
            if e.args[0] == "Connection aborted.":
                ucasEvaluate = UCASEvaluate()
    if ucasEvaluate.debug:
        print("Debug Mode: %s" % str(ucasEvaluate.debug))
        print("In debug mode, you can check snapshot with html files.")
        print("By the way, Ctrl+C to stop.")
    if not ucasEvaluate.login():
        print('Login error. Please check your username and password.')
        exit()
    print('Login success: ' + ucasEvaluate.username)
    print('Enrolling starts')
    while ucasEvaluate.enroll:
        status = ucasEvaluate.enrollCourses()
        if status == 'STOP':
            break
        else:
            status += time.asctime(time.localtime(time.time()))
            sys.stdout.write("%s\r" % status)
    print('Enrolling finished')

5 Where to Get It

  • Github: https://github.com/okcd00/CDSelector
  • Release: https://github.com/okcd00/CDSelector/releases
  • Documentation: http://blog.csdn.net/okcd00/article/details/72827861
