Tutorial

This document describes how to use LuaWebDriver step by step. If you haven't installed LuaWebDriver yet, install it before reading this document.

Visit a website

You can use Session:navigate_to to visit a specific website with the web browser.

First, write a callback function that visits the website. Pass the URL as the argument of Session:navigate_to.

Second, pass your callback as the argument of Firefox:start_session and call it. The session is destroyed automatically after your callback returns.

Example:

local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()

local URL = "https://clear-code.gitlab.io/lua-web-driver/sample/"

-- Make your callback and start session
driver:start_session(function(session)
  session:navigate_to(URL)
end)

Serialize a website

You can use Session:xml to serialize a website as XML.

First, visit the website you want to serialize, as in the example below.

Second, call Session:xml.

This serializes the current website as XML. You can use the result as a Lua string.

Example:

local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()

local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/"

-- Make your callback and start session
driver:start_session(function(session)
  session:navigate_to(URL)
  -- Serialize the current website as XML.
  local xml = session:xml()
  print(xml)
end)
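
Because the serialized XML is an ordinary Lua string, it can be written to a file with Lua's standard io library. The following is a minimal sketch under stated assumptions: save_string is a hypothetical helper that is not part of LuaWebDriver, and the xml variable holds a placeholder string standing in for the value returned by session:xml().

```lua
-- Hypothetical helper (not part of LuaWebDriver): write a string to a file.
-- Returns true on success, or nil and an error message on failure.
local function save_string(path, content)
  local file, err = io.open(path, "w")
  if not file then
    return nil, err
  end
  file:write(content)
  file:close()
  return true
end

-- Placeholder standing in for the string returned by session:xml().
local xml = "<html><body><p>sample</p></body></html>"

-- A temporary path is used here; any writable path works.
local path = os.tmpname()
assert(save_string(path, xml))
print("Saved XML to " .. path)
```

Inside a start_session callback, you could call save_string with the value of session:xml() instead of the placeholder.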

Save a screenshot

You can use Session:save_screenshot to save a screenshot of the current website. The screenshot is saved in PNG format.

First, visit the website you want to capture, as in the example below.

Second, call Session:save_screenshot.

Example:

local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()

local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/"

driver:start_session(function(session)
  session:navigate_to(URL)
  -- Save screenshot in PNG format
  session:save_screenshot("sample.png")
end)

Move around a website

You can move around a website with the following features.

This example logs in, clicks a link, and gets text, in that order.

First, visit the target website.

Second, enter the user name and password, because this website requires authentication. You can enter them with Session:css_select and ElementSet:send_keys.

Get the element objects for the user name and password fields with Session:css_select. This example uses a CSS selector, but you can also use XPath with Session:xpath_search.

Call ElementSet:send_keys on the acquired ElementSet object. Pass the input string as the argument of ElementSet:send_keys.

Third, press the login button with Session:css_select and ElementSet:click.

Fourth, click a link on the page shown after login with Session:link_search and ElementSet:click.

Fifth, get the text of a specific element on the destination page with ElementSet:text. Get the element object with Session:css_select, then call ElementSet:text on the acquired ElementSet object. You can use the returned text as a Lua string.

Example:

local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()

local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/move.html"

-- Make your callback and start session
driver:start_session(function(session)
  session:navigate_to(URL)

  -- Get forms in a website
  local form = session:css_select('form')
  -- Get form for inputting username
  local text_form = form:css_select('input[name=username]')
  -- Input username to form
  text_form:send_keys("username")
  -- Get form for inputting password
  local password_form = form:css_select('input[name=password]')
  -- Input password to form
  password_form:send_keys("password")

  -- Get button for submitting username and password
  local button = form:css_select("input[type=submit]")
  -- Submit username and password
  button:click()

  -- Get element object for link operating
  local link = session:link_search("1")
  -- Click the link
  link:click()
  -- Get elements on the page opened by the link
  local elements = session:css_select("p")
  -- Get text of acquired element
  print(elements:text())
end)
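
The text above mentions that Session:xpath_search can be used instead of a CSS selector. The following sketch shows how the login fields might be filled that way. It is written under assumptions: fill_login_form is a hypothetical helper, the XPath expressions are guesses that mirror the CSS selectors in the example, and Session:xpath_search is assumed to take an XPath string and return an ElementSet that supports send_keys.

```lua
-- Hypothetical helper (not part of LuaWebDriver).
-- `session` is the session object passed to the start_session callback.
-- Assumption: session:xpath_search(xpath) returns an ElementSet.
local function fill_login_form(session, username, password)
  -- Locate the user name field with XPath and type into it
  local user_field = session:xpath_search("//input[@name='username']")
  user_field:send_keys(username)
  -- Locate the password field with XPath and type into it
  local password_field = session:xpath_search("//input[@name='password']")
  password_field:send_keys(password)
end
```

You could call fill_login_form(session, "username", "password") inside the callback, right after session:navigate_to(URL).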

Click a button on a specific form

You can use Session:css_select and ElementSet:click to click a button on a specific form.

First, visit the website that contains the button, as in the example below.

Second, get the element object for the button with Session:css_select. This example uses a CSS selector, but you can also use XPath.

Third, call ElementSet:click on the acquired element object.

Example:

local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()

local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/button.html"

-- Make your callback and start session
driver:start_session(function(session)
  session:navigate_to(URL)
  -- Get elementset object for button operating
  local elements = session:css_select('#announcement')
  -- Click the acquired button object
  elements:click()

  -- Get text of specific elements on the page shown after the click
  elements = session:css_select('a[name=announcement]')
  local informations_summary = elements:texts()
  for _, summary in ipairs(informations_summary) do
    print(summary)
  end
end)

Input a string into a specific form

You can use ElementSet:send_keys to input a string into a specific form.

First, visit the website that contains the form.

Second, get the element object for the input field with Session:css_select. This example uses a CSS selector, but you can also use XPath.

Third, call ElementSet:send_keys on the acquired element object. Pass the input string as the argument of ElementSet:send_keys.

Example:

local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()

local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/index.html"

-- Make your callback and start session
driver:start_session(function(session)
  session:navigate_to(URL)
  -- Get elementset object for inputting string
  local elements = session:css_select('input[name=name]')
  -- Input string to form
  elements:send_keys("This is test")
  print(elements[1].value)
end)

Get an attribute of an element

You can use Element:get_attribute to get an attribute of a specific element.

First, get the element object with Session:css_select.

Second, call Element:get_attribute on the acquired element object. Pass the attribute name as the argument of Element:get_attribute. You can use the returned attribute value as a Lua string.

Example:

local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()

local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/get-attribute.html"

-- Make your callback and start session
driver:start_session(function(session)
  session:navigate_to(URL)
  -- Get elementset object for getting attribute
  local elements = session:css_select('p')
  for _, element in ipairs(elements) do
    -- Get attribute of acquired element
    if element["data-value-type"] == "number" then
      print(element:text())
    end
  end
end)

Get the text of an element

You can use ElementSet:text to get the text of a specific element.

First, get the element object with Session:css_select.

Second, call ElementSet:text on the acquired element object. You can use the returned text as a Lua string.

Example:

local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()

local URL = "https://clear-code.gitlab.io/lua-web-driver/sample/"

-- Make your callback and start session
driver:start_session(function(session)
  session:navigate_to(URL)
  -- Get elementset object for getting text
  local element_set = session:css_select('#p2')
  -- Get text of acquired element
  local text = element_set:text()
  print(text)
end)

Customize the user agent

You can customize the user agent of the web browser with an option of web-driver.Firefox.new(). For example, this is useful when crawling websites designed for smartphones.

First, set the user agent string in options.preferences.

Second, pass the options as the argument of web-driver.Firefox.new().

The following example sets the user agent to an iPhone user agent.

Example:

local web_driver = require("web-driver")

local user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X)"..
                   " "..
                   "AppleWebKit/602.3.12 (KHTML, like Gecko)"..
                   " "..
                   "Version/10.0 Mobile/14C92 Safari/602.1"
local options = {
  preferences = {
    ["general.useragent.override"] = user_agent,
  }
}
local driver = web_driver.Firefox.new(options)

local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/"

driver:start_session(function(session)
  session:navigate_to(URL)
  print(session:request_headers()["User-Agent"])
  -- Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.3.12 (KHTML, like Gecko) Version/10.0 Mobile/14C92 Safari/602.1
end)
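
If you switch between several user agents, it may be convenient to build the options table in one place. The following is only a sketch: user_agent_options is a hypothetical helper that is not part of LuaWebDriver. It joins the user agent parts with table.concat instead of the .. operator and returns a table with the same shape as options in the example above.

```lua
-- Hypothetical helper (not part of LuaWebDriver): build the options table
-- for web_driver.Firefox.new() from a list of user agent parts.
local function user_agent_options(parts)
  return {
    preferences = {
      -- Join the parts with single spaces
      ["general.useragent.override"] = table.concat(parts, " "),
    },
  }
end

local options = user_agent_options({
  "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X)",
  "AppleWebKit/602.3.12 (KHTML, like Gecko)",
  "Version/10.0 Mobile/14C92 Safari/602.1",
})
print(options.preferences["general.useragent.override"])
```

The resulting table can be passed to web_driver.Firefox.new(options) in the same way as in the example above.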

If you use LuaWebDriver with multiple threads, you can customize the user agent by setting firefox_options.preferences in the options passed to web-driver.ThreadPool.new(), as in the example below.

local web_driver = require("web-driver")
local log = require("log")

local url =
  "https://clear-code.gitlab.io/lua-web-driver/sample/"
local log_level = "info"
local n_threads = 2

local logger = log.new(log_level)
local function crawler(context)
  local logger = context.logger
  local session = context.session
  local url = context.job
  local prefix = url:match("^https?://[^/]+/")
  logger:debug("Opening...: " .. url)
  session:navigate_to(url)
  local status_code = session:status_code()
  if status_code and status_code ~= 200 then
    logger:notice(string.format("%s: Error: %d",
                                url,
                                status_code))
    return
  end
  logger:notice(string.format("%s: Title: %s",
                              url,
                              session:title()))
  local anchors = session:css_select("a")
  for _, anchor in pairs(anchors) do
    local href = anchor.href
    local normalized_href = href:gsub("#.*$", "")
    logger:notice(string.format("%s: Link: %s (%s): %s",
                                url,
                                href,
                                normalized_href,
                                anchor:text()))
    if normalized_href:sub(1, #prefix) == prefix then
      context.job_pusher:push(normalized_href)
    end
  end
end

local user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X)"..
                   " "..
                   "AppleWebKit/602.3.12 (KHTML, like Gecko)"..
                   " "..
                   "Version/10.0 Mobile/14C92 Safari/602.1"
local options = {
  logger = logger,
  size = n_threads,
  firefox_options = {
    preferences = {
      ["general.useragent.override"] = user_agent,
    },
  }
}
local pool = web_driver.ThreadPool.new(crawler, options)
logger.debug("Start crawling: " .. url)
pool:push(url)
pool:join()
logger.debug("Done crawling: " .. url)

Logger

LuaWebDriver uses lua-log as its logger.

You can share one logger object between the caller and LuaWebDriver by creating the logger object in the caller and passing it as an argument of web-driver.Firefox.new().

Specify the log level as a string, for example "trace", "info", or "notice".

Example:

local web_driver = require("web-driver")
local log = require("log")


local logger = log.new("trace")
local options = { logger = logger }
local driver = web_driver.Firefox.new(options)

local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/"

driver:start_session(function(session)
  session:navigate_to(URL)
end)

If you use LuaWebDriver with multiple threads, pass the logger object as an argument of web-driver.ThreadPool.new().

Example:

local web_driver = require("web-driver")
local log = require("log")

local url =
  "https://clear-code.gitlab.io/lua-web-driver/sample/"
local log_level = "trace"
local n_threads = 2

local logger = log.new(log_level)
local function crawler(context)
  local logger = context.logger
  local session = context.session
  local url = context.job
  local prefix = url:match("^https?://[^/]+/")
  logger:debug("Opening...: " .. url)
  session:navigate_to(url)
  local status_code = session:status_code()
  if status_code and status_code ~= 200 then
    logger:notice(string.format("%s: Error: %d",
                                url,
                                status_code))
    return
  end
  logger:notice(string.format("%s: Title: %s",
                              url,
                              session:title()))
  local anchors = session:css_select("a")
  for _, anchor in pairs(anchors) do
    local href = anchor.href
    local normalized_href = href:gsub("#.*$", "")
    logger:notice(string.format("%s: Link: %s (%s): %s",
                                url,
                                href,
                                normalized_href,
                                anchor:text()))
    if normalized_href:sub(1, #prefix) == prefix then
      context.job_pusher:push(normalized_href)
    end
  end
end
local options = {
  logger = logger,
  size = n_threads,
}
local pool = web_driver.ThreadPool.new(crawler, options)
logger.debug("Start crawling: " .. url)
pool:push(url)
pool:join()
logger.debug("Done crawling: " .. url)

You can also set the log level with an environment variable, as shown below.

Example:

export LUA_WEB_DRIVER_LOG_LEVEL="trace"

If you set neither a logger object nor the environment variable, LuaWebDriver outputs the Firefox log and the geckodriver log at the "info" level.
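
If you create your own logger object, you may want it to follow the same environment variable. The following is a sketch only: resolve_log_level is a hypothetical helper, and falling back to "info" is an assumption chosen to match the default level mentioned above. It does not describe how LuaWebDriver reads the variable internally.

```lua
-- Hypothetical helper (not part of LuaWebDriver): pick a log level string,
-- preferring LUA_WEB_DRIVER_LOG_LEVEL when it is set and not empty.
local function resolve_log_level(default_level)
  local level = os.getenv("LUA_WEB_DRIVER_LOG_LEVEL")
  if level == nil or level == "" then
    return default_level
  end
  return level
end

-- Assumption: use "info" when the environment variable is not set.
local log_level = resolve_log_level("info")
print(log_level)
```

The resulting string can then be passed to log.new() as in the examples above.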

Multithread

You can use LuaWebDriver with multiple threads. To do so, use a web-driver.ThreadPool object, as shown below.

The following example crawls web pages, using the URL passed to web-driver.ThreadPool:push() as the starting point.

Example:

local web_driver = require("web-driver")
local log = require("log")


local URL =
  "https://clear-code.gitlab.io/lua-web-driver/sample/"

local log_level = "notice"

local logger = log.new(log_level)
local function crawler(context)
  local web_driver = require("web-driver")
  local logger = context.logger
  local session = context.session
  local url = context.job
  local prefix = url:match("^https?://[^/]+/")
  logger:debug("Opening...: " .. url)
  session:navigate_to(url)
  logger:notice(string.format("%s: Title: %s",
                              url,
                              session:title()))
  local anchors = session:css_select("a")
  for _, anchor in pairs(anchors) do
    local href = anchor.href
    local normalized_href = href:gsub("#.*$", "")
    logger:notice(string.format("%s: Link: %s (%s): %s",
                                url,
                                href,
                                normalized_href,
                                anchor:text()))
    if normalized_href:sub(1, #prefix) == prefix then
      context.job_pusher:push(normalized_href)
    end
  end
end
local pool = web_driver.ThreadPool.new(crawler, {logger = logger})
logger.debug("Start crawling: " .. URL)
pool:push(URL)
pool:join()
logger.debug("Done crawling: " .. URL)

Write the processing you want to execute in the function passed to web-driver.ThreadPool.new().

When you call web-driver.JobPusher:push() (context.job_pusher:push() in the example above) inside that function, idle threads execute the pushed jobs one by one.

The function passed to web-driver.ThreadPool.new() (crawler in the example above) takes one argument. This argument (context in the example above) holds all the information needed to crawl web pages.

If you register the same job more than once, LuaWebDriver ignores the duplicates by default.

A job can only receive a string. We suggest passing a URL as the job.
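
Because a job is a plain string, two URLs that differ only in their fragment are different strings. The crawler in the example above strips the fragment before pushing a link, and pushes only links that start with the site prefix of the current URL. The sketch below isolates those string operations so that you can try them without a browser. The helper names strip_fragment, site_prefix, and is_same_site are hypothetical; the patterns are the ones used in crawler.

```lua
-- Hypothetical helpers that isolate the string handling used in crawler.

-- Remove a trailing "#fragment" from a URL.
local function strip_fragment(href)
  -- gsub returns two values; the parentheses keep only the string.
  return (href:gsub("#.*$", ""))
end

-- Return the "scheme://host/" part of a URL, or nil if it does not match.
local function site_prefix(url)
  return url:match("^https?://[^/]+/")
end

-- True when href starts with prefix.
local function is_same_site(href, prefix)
  return href:sub(1, #prefix) == prefix
end

local start_url = "https://clear-code.gitlab.io/lua-web-driver/sample/"
local prefix = site_prefix(start_url)
print(prefix)
print(strip_fragment(start_url .. "move.html#top"))
print(is_same_site(start_url .. "move.html", prefix))
print(is_same_site("https://example.com/", prefix))
```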

A failed job is retried automatically. The number of retries is three by default. If a job fails more times than the number of retries, LuaWebDriver deletes it.

You can also specify the number of retries as an argument of web-driver.ThreadPool.new(), as shown below.

Example:

local pool = web_driver.ThreadPool.new(crawler, {max_n_failures = 5})
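
The retry behavior can be pictured with a small stand-alone loop. This is a conceptual sketch only and not LuaWebDriver's implementation: run_with_retries and flaky_job are hypothetical names, and the sketch assumes that a job is dropped once it has failed max_n_failures times.

```lua
-- Conceptual sketch only; this is not LuaWebDriver's implementation.
-- Assumption: a job is dropped once it has failed max_n_failures times.
local function run_with_retries(job, max_n_failures)
  local n_failures = 0
  while n_failures < max_n_failures do
    -- pcall catches the error raised by a failing job
    local ok, result = pcall(job)
    if ok then
      return true, result
    end
    n_failures = n_failures + 1
  end
  -- The job failed too many times and is given up
  return false, n_failures
end

-- A job that fails twice and then succeeds.
local n_calls = 0
local function flaky_job()
  n_calls = n_calls + 1
  if n_calls < 3 then
    error("temporary failure")
  end
  return "done"
end

print(run_with_retries(flaky_job, 5))
```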

Some notes as below for use LuaWebDriver with multiple threads

Next step

Now you know all the major LuaWebDriver features! To understand each feature in more detail, see the reference manual for that feature.