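Scraping Zhilian Zhaopin Job Listings in Python with Selenium and Chromedriver

Steps:

1. On the Zhilian Zhaopin site, choose the job keyword and the work location.
2. Run the code.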
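
The keyword and location chosen in step 1 end up in the query string of the search URL that the code opens. The sketch below takes that URL apart and rebuilds it with the standard library. Note the assumptions: `kw` decodes to the job keyword, but reading `jl` as the location code is a guess from the URL alone, and `sf`, `st` and `kt` are copied unchanged because the post does not explain them. `read_params` and `build_search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# The search URL used in the post's code.
SEARCH_URL = ("https://sou.zhaopin.com/?jl=768&sf=0&st=0"
              "&kw=%E7%A0%94%E5%8F%91%E5%B7%A5%E7%A8%8B%E5%B8%88&kt=3")


def read_params(url):
    """Return the query parameters of a URL as a flat dict of strings."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def build_search_url(keyword, location_code):
    """Build a search URL for another keyword or location.

    Assumption: `jl` is the location code. The remaining parameters are
    copied from the post's URL without knowing what they mean.
    """
    params = {"jl": location_code, "sf": 0, "st": 0, "kw": keyword, "kt": 3}
    return "https://sou.zhaopin.com/?" + urlencode(params)


if __name__ == "__main__":
    print(read_params(SEARCH_URL))
    print(build_search_url("研发工程师", 768))
```

Building the URL this way avoids pasting percent-encoded text by hand when the keyword changes.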

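Things to note:
1. When the driver loads the home page, a pop-up window appears (see figure). You can have the code sleep for 2 seconds and close the window by hand during that pause.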

2. In this crawl, the pagination button could not be clicked with a plain buttonTag.click(), so the call was changed to self.driver.execute_script("arguments[0].click()",nextBtn)
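
The post does not say why the native click failed. One common cause is another element covering the button, in which case Selenium raises an exception and a click issued from JavaScript still goes through. The sketch below tries the native click first and falls back to the script only when that click raises. `click_with_js_fallback` and its `click_errors` parameter are hypothetical names, and the function is duck-typed so it only needs an object with `execute_script` and an element with `click`.

```python
def click_with_js_fallback(driver, element, click_errors=(Exception,)):
    """Click an element natively, falling back to a JavaScript click.

    `driver` is expected to be a Selenium WebDriver and `element` a WebElement.
    `click_errors` lists the exception types that trigger the fallback; with
    Selenium one would typically pass
    (selenium.common.exceptions.ElementClickInterceptedException,).
    Returns which path was taken, which is handy for logging.
    """
    try:
        element.click()
        return "native"
    except click_errors:
        # Same script as in the post: the element arrives as arguments[0].
        driver.execute_script("arguments[0].click()", element)
        return "javascript"
```

In the crawler this would replace the direct execute_script call, for example `click_with_js_fallback(self.driver, nextBtn)`.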

The code is as follows:

# encoding: utf-8
import csv
import time

from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class ZhiLian(object):
    driver_path = r"C:\...\chromedriver.exe"

    def __init__(self):
        # executable_path is the Selenium 3 style; Selenium 4 passes a Service object instead.
        self.driver = webdriver.Chrome(executable_path=ZhiLian.driver_path)
        self.url = "https://sou.zhaopin.com/?jl=768&sf=0&st=0&kw=%E7%A0%94%E5%8F%91%E5%B7%A5%E7%A8%8B%E5%B8%88&kt=3"
        # newline='' stops the csv module from writing blank lines between rows on Windows.
        self.fp = open('zhilian_yanfa.csv', 'a', encoding='utf-8', newline='')
        self.writer = csv.DictWriter(self.fp, ['title', 'salary', 'address', 'work_years',
                                               'education', 'desc', 'detail_address', 'link'])
        self.positions = []

    def run(self):
        self.driver.get(self.url)
        # Pause so the home-page pop-up can be closed by hand.
        time.sleep(2)
        while True:
            WebDriverWait(self.driver, timeout=10).until(
                EC.presence_of_element_located((By.XPATH, "//button[contains(@class,'soupager__btn')]"))
            )
            source = self.driver.page_source
            self.parse_list_page(source)
            time.sleep(2)
            nextBtn = self.driver.find_element(By.XPATH, "//div[@class='soupager']/button[2]")
            # Stop when the next-page button is marked as disabled.
            if "disable" in nextBtn.get_attribute('class'):
                break
            # nextBtn.click() did not work here, so click through JavaScript.
            self.driver.execute_script("arguments[0].click()", nextBtn)
        # Write the rows still in the buffer, then release the file and the browser.
        self.writer.writerows(self.positions)
        self.fp.close()
        self.driver.quit()

    def parse_list_page(self, source):
        html = etree.HTML(source)
        links = html.xpath("//div[@class='contentpile__content__wrapper__item clearfix']//@href")
        for link in links:
            # Only follow links that point to job detail pages.
            if "jobs" in link:
                self.parse_detail_page(link)
                time.sleep(1)

    def parse_detail_page(self, url):
        # Open the detail page in a new tab and switch to it.
        self.driver.execute_script("window.open('%s')" % url)
        self.driver.switch_to.window(self.driver.window_handles[1])
        WebDriverWait(self.driver, timeout=10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "describtion__detail-content"))
        )
        source = self.driver.page_source
        html = etree.HTML(source)
        title = html.xpath("//h3[@class='summary-plane__title']/text()")[0]
        salary = html.xpath("//span[@class='summary-plane__salary']/text()")[0]
        address = "".join(html.xpath("//ul[@class='summary-plane__info']/li[1]//text()"))
        work_years = html.xpath("//ul[@class='summary-plane__info']/li[2]//text()")[0]
        education = html.xpath("//ul[@class='summary-plane__info']/li[3]//text()")[0]
        desc = "".join(html.xpath("//div[@class='describtion']//text()"))
        detail_address = html.xpath("//span[@class='job-address__content-text']//text()")[0]
        position = {'title': title, 'salary': salary, 'address': address,
                    'work_years': work_years, 'education': education, 'desc': desc,
                    'detail_address': detail_address, 'link': url}
        # Close the detail tab and return to the list page.
        self.driver.close()
        self.driver.switch_to.window(self.driver.window_handles[0])
        self.write_position(position)

    def write_position(self, position):
        # Write to disk in batches of 10 rows instead of once per row.
        if len(self.positions) >= 10:
            self.writer.writerows(self.positions)
            self.positions.clear()
        self.positions.append(position)
        print(position['title'])


if __name__ == '__main__':
    spider = ZhiLian()
    spider.run()
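
A batch-writing scheme like the one in write_position needs a final flush, otherwise the last few rows stay in memory and never reach the file. The sketch below isolates that idea so it can be tried without a browser. `BatchedCsvWriter`, `demo` and the shortened field list are illustrative names, not part of the original code, and writing a header row is an extra choice the original does not make.

```python
import csv
import io

# Shortened field list, for illustration only.
FIELDS = ["title", "salary", "link"]


class BatchedCsvWriter:
    """Buffer rows in memory and write them to CSV in batches."""

    def __init__(self, fp, fieldnames, batch_size=10):
        self.writer = csv.DictWriter(fp, fieldnames)
        self.writer.writeheader()
        self.batch_size = batch_size
        self.buffer = []

    def add(self, row):
        """Queue one row and write the batch once it is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write whatever is still buffered; safe to call when empty."""
        if self.buffer:
            self.writer.writerows(self.buffer)
            self.buffer.clear()


def demo():
    """Write three rows with a batch size of two and return the CSV lines."""
    fp = io.StringIO()
    out = BatchedCsvWriter(fp, FIELDS, batch_size=2)
    for i in range(3):
        out.add({"title": f"job{i}", "salary": "10K", "link": f"/jobs/{i}"})
    out.flush()  # without this call the third row would be lost
    return fp.getvalue().splitlines()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

With a real file, open it with newline='' and call flush() before closing it, for example in a try/finally block around the crawl loop.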
