Scraping Lagou with Python + Selenium + Abuyun on macOS

Without further ado, here is the code.

import time
import csv
from selenium import webdriver
import string
import zipfile
from lxml import etree

# Proxy server
proxyHost = "http-dyn.abuyun.com"
proxyPort = "9020"

# Proxy tunnel credentials
proxyUser = "HD2A47190U2xxxx"
proxyPass = "CB1FDB9303ABxxxx"


def create_proxy_auth_extension(proxy_host, proxy_port,
                                proxy_username, proxy_password,
                                scheme='http', plugin_path=None):
    # Build a small Chrome extension (a zip file) that sets the proxy
    # and answers the proxy's authentication prompt.
    if plugin_path is None:
        plugin_path = '{}_{}@http-dyn.abuyun.com_9020.zip'.format(proxy_username, proxy_password)

    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Abuyun Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version": "22.0.0"
    }
    """

    background_js = string.Template("""
    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "${scheme}",
                host: "${host}",
                port: parseInt(${port})
            },
            bypassList: ["foobar.com"]
        }
    };

    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

    function callbackFn(details) {
        return {
            authCredentials: {
                username: "${username}",
                password: "${password}"
            }
        };
    }

    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {urls: ["<all_urls>"]},
        ['blocking']
    );
    """).substitute(
        host=proxy_host,
        port=proxy_port,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )

    with zipfile.ZipFile(plugin_path, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)

    return plugin_path


proxy_auth_plugin_path = create_proxy_auth_extension(
    proxy_host=proxyHost,
    proxy_port=proxyPort,
    proxy_username=proxyUser,
    proxy_password=proxyPass)

option = webdriver.ChromeOptions()
prefs = {
    'profile.managed_default_content_settings.images': 2,
    'permissions.default.stylesheet': 2,
}
option.add_argument("--start-maximized")
option.add_extension(proxy_auth_plugin_path)
option.add_experimental_option('prefs', prefs)
path = '/Users/lin/Desktop/demo/chromedriver'
# Note: this uses the Selenium 3 API (executable_path, chrome_options,
# find_element_by_id); Selenium 4 replaces these with Service, options=
# and find_element(By.ID, ...).
driver = webdriver.Chrome(executable_path=path, chrome_options=option)

link = 'https://www.lagou.com/'
driver.get(link)
time.sleep(5)

# Type the search keyword
driver.find_element_by_id("search_input").send_keys("python")
# Click the search button
driver.find_element_by_id("search_button").click()
time.sleep(5)

pageSource = driver.page_source
et = etree.HTML(pageSource)
list_li = et.xpath('//*[@id="s_position_list"]/ul/li')
list_csvs = []
for i in list_li:
    headline = i.xpath('./div[1]/div[1]/div/a/h3/text()')[0]  # job title
    place = i.xpath('./div[1]/div[1]/div[1]/a/span/em/text()')[0]  # location
    company = i.xpath('./div[1]/div[2]/div[1]/a/text()')[0]  # company
    pay = i.xpath('./div[1]/div[1]/div[2]/div/span/text()')[0]  # salary
    experience_background = i.xpath('./div[1]/div[1]/div[2]/div/text()[3]')  # experience / education
    experience = experience_background[0].split('/')[0].strip()  # experience
    background = experience_background[0].split('/')[1].replace('\t', '').replace(' ', '').replace('\n', '')  # education
    release = i.xpath('./div[1]/div[1]/div[1]/span/text()')[0]  # posting date
    list_csvs.append([headline, place, company, pay, experience, background, release])

# In Python 2, file can be used instead of open
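with open("test.csv", "a", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    # Write the header row first; the column order matches the rows built above
    writer.writerow(["Title", "Location", "Company", "Salary", "Experience", "Education", "Posted"])
    # Use writerows to write multiple rows
    writer.writerows(list_csvs)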
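
Two parts of the script above are easy to get wrong: splitting the combined experience/education text, and appending to test.csv, which writes a new header row on every run. Below is a minimal sketch of both steps as standalone helpers. The names `split_experience_education` and `append_rows` are hypothetical (they are not in the original script), and the sample text format ("experience / education" in one string) is an assumption inferred from the XPath handling above.

```python
import csv
import os


def split_experience_education(raw):
    """Split text such as ' 经验3-5年 / 本科 ' into (experience, education)."""
    # Remove all whitespace (spaces, tabs, newlines) before splitting.
    cleaned = "".join(raw.split())
    # partition never raises: education is '' when there is no slash.
    experience, _, education = cleaned.partition("/")
    return experience, education


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)
```

In the loop above, `experience, background = split_experience_education(experience_background[0])` could stand in for the two `split('/')` lines, and `append_rows("test.csv", header, list_csvs)` for the final `with` block, so repeated runs keep a single header row.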

Result: the scraped rows are appended to test.csv.
