Splash Lua脚本http://localhost:8050,端口为8050

入口及返回值

function main(splash, args)splash:go("http://www.baidu.com")splash:wait(0.5)local title = splash:evaljs("document.title")return {title=title}
end
通过 evaljs()方法传人 JavaSer刷脚本, 而 document.title 的执行结果就是返回网页标题,执行完毕后将其赋值给一个 title 变盘,随后将其 返回 。

异步处理

按照不同步的程序处理问题

function main(splash, args)local example_urls = {"www.baidu.com", "www.taobao.com", "www.zhihu.com"}local urls = args.urls or example_urlslocal results = {}for index, url in ipairs(urls) dolocal ok, reason = splash:go("http://" .. url)if ok thensplash:wait(2)results[url] = splash:png()endendreturn results
end
wait(2)    等待2秒
字符串拼接符使用的是..操作符
go()方法    返回加载页面的结果状态
运行结果:(如果页面州现 4xx 或5xx状态码, ok变量就为空,就不会返回加载后的图片。)

Splash对象属性

args属性

获取加载时配置的参数

运行:

输出:

js_enableb属性

js_enabled属性是Splash的JavaScript执行开关
可以将其配置为true或false来控制是否执行JavaScript代码,默认为true。

function main(splash, args)splash:go("https://www.baidu.com")splash.js_enabled = falselocal title = splash:evaljs("document.title")return {title=title}
end
go()方法,加载页面
js_enabled = false,禁止执行JavaScript代码

运行情况:

HTTP Error 400 (Bad Request)Type: ScriptError -> JS_ERRORError happened while executing Lua script[string "function main(splash, args)
..."]:4: unknown JS error: None{"type": "ScriptError","info": {"splash_method": "evaljs","line_number": 4,"js_error_message": null,"type": "JS_ERROR","error": "unknown JS error: None","message": "[string \"function main(splash, args)\r...\"]:4: unknown JS error: None","source": "[string \"function main(splash, args)\r...\"]"},"description": "Error happened while executing Lua script","error": 400
}

resource_timeout属性

resource_timeout属性设置加载的超时时间,单位是秒。

function main(splash)splash.resource_timeout = 0.1assert(splash:go('https://www.taobao.com'))return splash:png()
end
png()方法,返回页面截图
resource_timeout = 0.1    表示设置的加载超时时间为0.1s

images_enabled属性

images_enabled属性设置图片是否加载,默认情况下是加载的。不加载图片,加载的速度会快很多。

function main(splash, args)splash.images_enabled = falseassert(splash:go('https://www.jd.com'))return {png=splash:png()}
end

运行后请求加载的网页不加载图片

plugins_enabled属性

plugins_enabled属性可以控制浏览器插件(如Flash插件)是否开启。默认情况下,此属性是false,表示不开启。

splash.plugins_enabled = true/false

scoll_position属性

scroll_position属性可以控制页面上下或左右滚动

function main(splash, args)assert(splash:go('https://www.taobao.com'))splash.scroll_position = {y=400}return {png=splash:png()}end
# 向下滚动400像素

Splash对象的方法

go()方法:请求某个链接

  1. go 方法参数

    ok, reason = splash:go{url, baseurl=nil, headers=nil, http_method="GET", body=nil, formdata=nil}baseurl----资源加载的相对路径
    headers----请求头
    http_method----请求方法,GET或POST
    body----发POST请求时的表单数据,使用的Content-type为application/json。
    formdata----POST请求时的表单数据,,使用的Content-type为application/x-www-form-urlencoded。
    
  2. go 方法实例

    function main(splash, args)local ok, reason = splash:go{"http://httpbin.org/post", http_method="POST", body="name=Germey"}if ok thenreturn splash:html()end
    end
    

    输出结果:

    <html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"args": {}, "data": "", "files": {}, "form": {"name": "Germey"}, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Encoding": "gzip, deflate", "Accept-Language": "en,*", "Content-Length": "11", "Content-Type": "application/x-www-form-urlencoded", "Host": "httpbin.org", "Origin": "null", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) splash Version/9.0 Safari/602.1"}, "json": null, "origin": "120.239.195.171, 120.239.195.171", "url": "https://httpbin.org/post"
    }
    </pre></body></html>
    

wait()方法:控制页面的等待时间

  1. wait 方法参数

    ok, reason = splash:wait{time, cancel_on_redirect=false, cancel_on_error=true}time----等待时间/s
    cancel_on_redirect----如果发生重定向就停止等待,并返回重定向结果
    cancel_on_error----如果发生了加载错我,就停止等待
    
  2. wait 方法实例

    function main(splash)splash:go("https://www.taobao.com")splash:wait(2)return {html=splash:html()}
    end
    

jsfunc()方法

jsfunc()方法,直接调用JavaScript定义的方法,即实现JavaScript方法到Lua脚本的转换。

function main(splash, args)local get_div_count = splash:jsfunc([[function () {var body = document.body;var divs = body.getElementsByTagName('div');return divs.length;}]])splash:go("https://www.baidu.com")return ("There are %s DIVs"):format(get_div_count())
end
运行结果:
"There are 22 DIVs"

evaljs()方法

执行JavaScript代码,并返回最后一条JavaScript语句的返回结果

result = splash:evalijs(js)

runjs()方法

runjs()方法于evaljs()方法功能类似

function main(splash, args)splash:go("https://www.baidu.com")splash:runjs("foo = function() { return 'bar' }")local result = splash:evaljs("foo()")return result
end
输出:
"bar"

autoload()方法:sutoload()设置每个页面访问时自动加载的对象

  1. autoload()方法参数

    ok, reason = splash:autoload{source_or_url, source=nil, url=nil}source_or_url----JavaScript代码或者JavaScript库链接。
    source----JavaScript代码。
    url----JavaScript库链接。
    
  2. autoload()方法例子

    function main(splash, args)splash:autoload([[function get_document_title(){return document.title;}]])splash:go("https://www.baidu.com")return splash:evaljs("get_document_title()")
    end
    
    结果
    Splash Response: "百度一下,你就知道"
    

call_later()方法

通过设置定时任务和延迟时间来实现任务延时执行,并且可以再执行前通过cancel()方法重新执行定时任务。

function main(splash, args)local snapshots = {}local timer = splash:call_later(function()snapshots["a"] = splash:png()splash:wait(1.0)snapshots["b"] = splash:png()end, 0.2)splash:go("https://www.taobao.com")splash:wait(3.0)return snapshots
end

http_get()方法

模拟发送HTTP的GET请求

  1. http_get()方法参数

    response = splash:http_get{url, headers=nil, follow_redirects=true}url----请求URL。
    headers----可选参数,默认为空,请求头。
    follow_redirects----可选参数,表示是否启动自动重定向,默认为true。
    
  2. http_get()方法例子

    function main(splash, args)local treat = require("treat")local response = splash:http_get("http://httpbin.org/get")return {html=treat.as_string(response.body),url=response.url,status=response.status}
    end
    
    输出结果:
    html: String (length 347)
    {"args": {}, "headers": {"Accept-Encoding": "gzip, deflate", "Accept-Language": "en,*", "Host": "httpbin.org", "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) splash Version/9.0 Safari/602.1"}, "origin": "120.239.195.171, 120.239.195.171", "url": "https://httpbin.org/get"
    }
    status: 200
    url: "http://httpbin.org/get"
    

http_post()方法

模拟发送HTTP的POST请求

  1. http_post()方法参数

    response = splash:http_post{url, headers=nil, follow_redirects=true, body=nil}url----请求URL。
    headers----可选参数,默认为空,请求头。
    follow_redirects----可选参数,表示是否启动自动重定向,默认为true。
    body----可选参数,即表单数据,默认为空。
    
  2. http_post()方法例子

    function main(splash, args)local treat = require("treat")local json = require("json")local response = splash:http_post{"http://httpbin.org/post",    body=json.encode({name="Germey"}),headers={["content-type"]="application/json"}}return {html=treat.as_string(response.body),url=response.url,status=response.status}
    end
    

set_content()方法:设置页面的内容

function main(splash)assert(splash:set_content("<html><body><h1>hello</h1></body></html>"))return splash:png()
end

html()方法:获取网页源代码

获取https://httpbin.org/get的源代码

function main(splash, args)splash:go("https://httpbin.org/get")return splash:html()
end

png()方法:获取png格式的网页截图

function main(splash, args)splash:go("https://www.taobao.com")return splash:png()
end

jpeg()方法:获取jpng格式的网页截图

function main(splash, args)splash:go("https://www.taobao.com")return splash:jpeg()
end

har()方法:获取页面的加载过程

function main(splash, args)splash:go("https://www.baidu.com")return splash:har()
end

url()方法:获取当前页面正在访问的URL

function main(splash, args)splash:go("https://www.baidu.com")return splash:url()
end
// 输出:
https://www.baidu.com/

get_cookies()方法:获取当前页面的Cookies

function main(splash, args)splash:go("https://www.baidu.com")return splash:get_cookies()
end
// 输出:
0: Object
domain: ".baidu.com"
expires: "2087-08-08T12:53:28Z"
httpOnly: false
name: "BAIDUID"
path: "/"
secure: false
value: "B556658F0EAB497638556503063F6AEE:FG=1"
1: Object
domain: ".baidu.com"
expires: "2087-08-08T12:53:28Z"
httpOnly: false
name: "BIDUPSID"
path: "/"
secure: false
value: "B556658F0EAB497638556503063F6AEE"
2: Object
domain: ".baidu.com"
expires: "2087-08-08T12:53:28Z"
httpOnly: false
name: "PSTM"
path: "/"
secure: false
value: "1563701961"
3: Object
domain: ".baidu.com"
httpOnly: false
name: "delPer"
path: "/"
secure: false
value: "0"
4: Object
domain: "www.baidu.com"
httpOnly: false
name: "BD_HOME"
path: "/"
secure: false
value: "0"
5: Object
domain: ".baidu.com"
httpOnly: false
name: "H_PS_PSSID"
path: "/"
secure: false
value: "29547_1434_21089_18560_29522_29518_28518_29099_28833_29220_26350_29459"
6: Object
domain: "www.baidu.com"
expires: "2019-07-31T09:39:21Z"
httpOnly: false
name: "BD_UPN"
path: "/"
secure: false
value: "143354"

add_cookie()方法:为当前页面添加Cookie

  1. add_cookie()方法参数

    cookies = splash:add_cookie{name, value, path=nil, domain=nil, expires=nil, httpOnly=nil, secure=nil}
    
  2. add_cookie()方法例子

    function main(splash)splash:add_cookie{"sessionid", "237465ghgfsd", "/", domain="http://example.com"}splash:go("http://example.com/")return splash:html()
    end
    
    // 输出:
    <!DOCTYPE html><html><head><title>Example Domain</title><meta charset="utf-8"><meta http-equiv="Content-type" content="text/html; charset=utf-8"><meta name="viewport" content="width=device-width, initial-scale=1"><style type="text/css">body {background-color: #f0f0f2;margin: 0;padding: 0;font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;}div {width: 600px;margin: 5em auto;padding: 50px;background-color: #fff;border-radius: 1em;}a:link, a:visited {color: #38488f;text-decoration: none;}@media (max-width: 700px) {body {background-color: #fff;}div {width: auto;margin: 0 auto;border-radius: 0;padding: 1em;}}</style>
    </head><body>
    <div><h1>Example Domain</h1><p>This domain is established to be used for illustrative examples in documents. You may use thisdomain in examples without prior coordination or asking for permission.</p><p><a href="http://www.iana.org/domains/example">More information...</a></p>
    </div></body></html>
    

clear_cookies()方法:清除所有Cookies

function main(splash)splash:go("https://www.baidu.com/")splash:clear_cookies()return splash:get_cookies()
end
// 输出:
Array[0]

get_viewport_size()方法:获取页面的宽高

function main(splash)splash:go("https://www.baidu.com/")return splash:get_viewport_size()
end

set_viewport_size()方法:设置页面的宽高

  1. set_viewport_size()参数

    splash:set_viewport_size(width, height)
    
  2. set_viewport_size()方法例子

    function main(splash)splash:set_viewport_size(400, 700)assert(splash:go("http://cuiqingcai.com"))return splash:png()
    end
    

set_viewport_full()方法:浏览器全频显示

function main(splash)splash:set_viewport_full()assert(splash:go("http://cuiqingcai.com"))return splash:png()
end

set_user_agent()方法:设置浏览器的User_agent

function main(splash)splash:set_user_agent('Splash')splash:go("http://httpbin.org/get")return splash:html()
end
// 这里我们将浏览器的User-Agent设置为Splash

set_custom_headers()方法:设置请求头

function main(splash)splash:set_custom_headers({["User-Agent"] = "Splash",["Site"] = "Splash",})splash:go("http://httpbin.org/get")return splash:html()
end
// 输出:
<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"args": {}, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Encoding": "gzip, deflate", "Accept-Language": "en,*", "Host": "httpbin.org", "Site": "Splash", "User-Agent": "Splash"}, "origin": "120.239.195.171, 120.239.195.171", "url": "https://httpbin.org/get"
}
</pre></body></html>

select()方法:选中符合条件的第一个节点----参数为CSS选择器

function main(splash)splash:go("https://www.baidu.com/")input = splash:select("#kw")input:send_text('Splash')splash:wait(3)return splash:png()
end
// 首先访问了百度,然后选中了搜索框,随后调用了send_text()方法填写了文本,然后返回网页截图。

select_all()方法

选中符合条件的所有节点(参数为CSS选择器)

function main(splash)local treat = require('treat')assert(splash:go("http://quotes.toscrape.com/"))assert(splash:wait(0.5))local texts = splash:select_all('.quote .text')local results = {}for index, text in ipairs(texts) doresults[index] = text.node.innerHTMLendreturn treat.as_array(results)
end
// 输出:
Array[10]
0: "“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”"
1: "“It is our choices, Harry, that show what we truly are, far more than our abilities.”"
2: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
3: "“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”"
4: "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”"
5: "“Try not to become a man of success. Rather become a man of value.”"
6: "“It is better to be hated for what you are than to be loved for what you are not.”"
7: "“I have not failed. I've just found 10,000 ways that won't work.”"
8: "“A woman is like a tea bag; you never know how strong it is until it's in hot water.”"
9: "“A day without sunshine is like, you know, night.”"

mouse_click()方法

模拟鼠标点击操作,传入的参数为坐标值xy。此外,也可以直接选中某个节点,然后调用此方法

function main(splash)splash:go("https://www.baidu.com/")input = splash:select("#kw")input:send_text('Splash')submit = splash:select('#su')submit:mouse_click()splash:wait(3)return splash:png()
end
首先选中页面的输入框,输入了文本,然后选中“提交”按钮,调用了mouse_click()方法提交查询,然后页面等待三秒,返回截图。

本文作者:Lee Hua

Splash的简单使用相关推荐

  1. WinForm Splash的简单实现

    WinForm Splash的简单实现 从Form派生一个新类 设置一张图片到BackgroundImage属性. 重写OnPaintBackground 画上版本号 protected overri ...

  2. Splash的爬虫应用

    Splash的爬虫应用 Splash是一个JavaScript渲染服务,它是一个带有HTTP API的轻型Web浏览器.Python可以通过HTTP API调用Splash中的一些方法实现对页面的渲染 ...

  3. 2020年8个效率最高的爬虫框架

    一些较为高效的Python爬虫框架.分享给大家. 1.Scrapy Scrapy是一个为了爬取网站数据,提取结构性数据而编写的应用框架. 可以应用在包括数据挖掘,信息处理或存储历史数据等一系列的程序中 ...

  4. Python爬虫:常用的爬虫工具汇总

    按照网络爬虫的的思路: #mermaid-svg-YOkYst4FalQf6wUn {font-family:"trebuchet ms",verdana,arial,sans-s ...

  5. Qt for Android Splash启动页最简单延时关闭

    前言 随着 Qt 版本的更新,对移动端的开发接口也越来越多,这给 Qt 开发移动端提供了极大的便利,也越来越爱上了这种跨平台的开发.今天要讲的是关于 Qt for Android 启动页显示的问题,首 ...

  6. splash安装和简单使用

    需要:安装docker. 安装: docker pull scrapinghub/splash 运行: docker run -p 8050:8050 scrapinghub/splash 安装完的访 ...

  7. Android App启动图启动界面(Splash)的简单实现

    第一步:创建一个Activity 第二步:创建一个新的Activity 命名为Splash new -> Activity -> Empty Activity 第三步:将准备好的启动图片放 ...

  8. [转]WinForm下Splash(启动画面)制作

    本文转自:http://www.smartgz.com/blog/Article/1088.asp 原文如下: 本代码可以依据主程序加载进度来显示Splash.     static class Pr ...

  9. android os开机画面,Android简单实现启动画面的方法

    本文实例讲述了Android简单实现启动画面的方法.分享给大家供大家参考,具体如下: 核心代码: package com.demo.app; import android.app.Activity; ...

最新文章

  1. Java并发学习二:编译优化带来的有序性问题导致的并发Bug
  2. Winform中设置Dialog的显示位置居中
  3. 微信能远程控制电脑吗_牛皮!微信远程控制电脑这个神器太厉害了!
  4. Linux入门笔记——less
  5. openEuler Developer Day 启动大会招募环节,报名通道同步开启!
  6. cesium+ geoserverTerrainProvide+png展示3D高程图展示
  7. matlab2c使用c++实现matlab函数系列教程-histc函数
  8. css之div内部靠右
  9. Struts 2中文件上传
  10. PR视频剪辑软件教程
  11. secure CRT 信号灯超时时间已到
  12. sgu 309 Real Fun
  13. 中国十大会计师事务所排名公布!刚刚,中注协正式通知!
  14. ReactNative第三方组件库
  15. Linux系统分区备份工具,linux系统备份工具:clonezilla
  16. 模式识别基本概念小结(学习笔记)
  17. 焊接机器人VS传统焊接的优势
  18. 盘复分支语句和循环语句的那些知识
  19. 西瓜皮——被丢掉的真金白银,夏天的健康守护神
  20. db2advis DB2索引优化建议

热门文章

  1. linux命令:echo
  2. python 提取复杂 json 的数据
  3. openresty lua-resty-balancer动态负载均衡
  4. EasyPusher安卓直播推流到EasyDarwin开源流媒体服务器工程简析
  5. 用计算机弹seve,使用shell写简单的计算机
  6. 跨越速运 x DorisDB:统一查询引擎,强悍性能带来极速体验
  7. ThreadPool.QueueUserWorkItem的性能问题
  8. strdup和strcpy比较
  9. 氮化镓HUB,把氮化镓适配器于HUB合为一体,带笔记本出去只需带一个适配器就够了
  10. 移相功分器仿真及测试(二)