Translated from: Headless Browser and scraping - solutions [closed]

I'm trying to put together a list of possible solutions for automated browser test suites and headless browser platforms capable of scraping.


  • Selenium - polyglot flagship in browser automation, with bindings for Python, Ruby, JavaScript, C#, Haskell and more, plus an IDE for Firefox (as an extension) for faster test deployment. It can act as a server and has tons of features.


  • PhantomJS - JavaScript, headless testing with screen capture and automation, uses WebKit. As of version 1.8, Selenium's WebDriver API is implemented, so you can use any WebDriver binding and tests will be compatible with Selenium
  • SlimerJS - similar to PhantomJS, uses Gecko (Firefox) instead of WebKit
  • CasperJS - JavaScript, built on both PhantomJS and SlimerJS, has extra features
  • Ghost Driver - JavaScript implementation of the WebDriver Wire Protocol for PhantomJS.
  • new PhantomCSS - CSS regression testing. A CasperJS module for automating visual regression testing with PhantomJS and Resemble.js.
  • new WebdriverCSS - plugin for Webdriver.io for automating visual regression testing
  • new PhantomFlow - Describe and visualize user flows through tests. An experimental approach to Web user interface testing.
  • new trifleJS - ports the PhantomJS API to use the Internet Explorer engine.
  • new CasperJS IDE (commercial)
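
To make the WebDriver compatibility mentioned above a little more concrete, here is a minimal sketch of the kind of HTTP requests a WebDriver binding sends under the JSON Wire Protocol. It only builds the requests and does not send them. The address `http://127.0.0.1:8910` and the helper names (`build_request`, `new_session_request`, `navigate_request`, `screenshot_request`) are assumptions made for illustration, not part of any of the tools listed.

```python
import json
from urllib import request

# Assumption: a WebDriver-compatible server (for example PhantomJS with its
# Ghost Driver enabled) is listening at this hypothetical address.
BASE_URL = "http://127.0.0.1:8910"


def build_request(method, path, payload=None):
    """Build (but do not send) an HTTP request for a WebDriver endpoint."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json;charset=UTF-8")
    return req


def new_session_request(browser_name="phantomjs"):
    # JSON Wire Protocol: POST /session with the desired capabilities.
    payload = {"desiredCapabilities": {"browserName": browser_name}}
    return build_request("POST", "/session", payload)


def navigate_request(session_id, url):
    # JSON Wire Protocol: POST /session/:sessionId/url
    return build_request("POST", "/session/%s/url" % session_id, {"url": url})


def screenshot_request(session_id):
    # JSON Wire Protocol: GET /session/:sessionId/screenshot
    return build_request("GET", "/session/%s/screenshot" % session_id)
```

Sending one of these with `urllib.request.urlopen(req)` would require a running server. In practice a binding such as the Selenium client does this for you, which is why tests written against one WebDriver implementation can usually be pointed at another.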


  • Node-phantom - bridges the gap between PhantomJS and node.js
  • WebDriverJs - Selenium WebDriver bindings for node.js by the Selenium team
  • WD.js - node module for WebDriver/Selenium 2
  • yiewd - WD.js wrapper using the latest Harmony generators! Get rid of the callback pyramid with yield
  • ZombieJs - Insanely fast, headless full-stack testing using node.js
  • NightwatchJs - Node.js based testing solution using Selenium WebDriver
  • Chimera - can do everything that PhantomJS does, but in a full JS environment
  • Dalek.js - Automated cross-browser testing with JavaScript through Selenium WebDriver
  • Webdriver.io - better implementation of WebDriver bindings with 50+ predefined actions
  • Nightmare - Electron bridge with a high-level API.
  • jsdom - Tailored towards web scraping. A very lightweight DOM implemented in Node.js; it supports pages with JavaScript.
  • new Puppeteer - Node library which provides a high-level API to control Chrome or Chromium. Puppeteer runs headless by default.


  • Scrapy - Python, mainly a scraper/miner. Fast, well documented, and built on top of Twisted. It can be linked with Django Dynamic Scraper for nice mining deployments or with Scrapy Cloud for PaaS (server-less) deployment, works in a terminal or as a stand-alone server process, and can be used with Celery
  • Snailer - node.js module, untested yet.
  • Node-Crawler - node.js module, untested yet.


  • new Web Scraping Language - Simple syntax to crawl the web

  • new Online HTTP client - Dedicated SO answer

  • dead CasperBox - Run CasperJS scripts online


  • Comparison of web scraping software
  • new Resemble.js - Image analysis and comparison

Questions:

  • Is there any pure Node.js solution, or a Node.js-to-PhantomJS/CasperJS module, that actually works and is documented?

Answer: Chimera seems to go in that direction; check out Chimera.

  • Are there other solutions capable of easier JavaScript injection than Selenium?

  • Do you know any pure Ruby solutions?

Answer: Check out the list of Ruby-based solutions created by rjk.

  • Do you know any related tech or solution?

Feel free to edit this question and add content as you wish! Thank you for your contributions!




If Ruby is your thing, you may also try:

  • https://github.com/chriskite/anemone (dev stopped)
  • https://github.com/sparklemotion/mechanize
  • https://github.com/postmodern/spidr
  • https://github.com/stewartmckee/cobweb
  • http://watirwebdriver.com/ (Selenium)

Also, the Nokogiri gem can be used for scraping:

  • http://nokogiri.org/

There is a dedicated book, published by Packt Publishing, about how to use Nokogiri for scraping.
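
For readers who have not seen parser-based scraping before, the general idea (parse static HTML, then pull out the elements you care about) can be sketched as follows. This is not Nokogiri and not Ruby: it is a stand-in using only Python's standard library, and the names `LinkExtractor` and `extract_links` are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (or with an empty one).
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Unlike the headless browsers listed in the question, this approach does not execute JavaScript, so it only sees what is present in the HTML as delivered by the server.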


A kind of JS-based Selenium is Dalek.js. It aims not only at automated frontend tests; you can also take screenshots with it. It has webdrivers for all important browsers. Unfortunately, those webdrivers could use some improvement (not to say they are "buggy", at least the Firefox one).




