This document describes how to use LuaWebDriver step by step. If you haven't installed LuaWebDriver yet, install it before reading this document.
You can use Session:navigate_to to visit a specific website with the web browser.
First, write a callback function that visits the website. Inside the callback, pass the URL as the argument of Session:navigate_to.
Second, pass your callback as the argument of Firefox:start_session and call Firefox:start_session.
The session is destroyed automatically after your callback returns.
Example:
local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()
local URL = "https://clear-code.gitlab.io/lua-web-driver/sample/"
-- Make your callback and start session
driver:start_session(function(session)
session:navigate_to(URL)
end)
You can use Session:xml to serialize a website as XML.
First, visit the website you want to serialize, as in the example below.
Second, call Session:xml.
This serializes the current web page as XML and returns it as a Lua string.
Example:
local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/"
-- Make your callback and start session
driver:start_session(function(session)
session:navigate_to(URL)
-- Serialize the current page as XML.
local xml = session:xml()
print(xml)
end)
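To keep the serialized XML, you can write the string to a file with Lua's standard io library. Below is a minimal sketch. save_text is a hypothetical helper and is not part of LuaWebDriver.

```lua
-- Hypothetical helper (not part of LuaWebDriver): write a string, such as
-- the result of session:xml(), to a file.
local function save_text(path, text)
  local file = assert(io.open(path, "wb"))
  file:write(text)
  file:close()
end

-- Inside the session callback of the example above:
--   save_text("sample.xml", session:xml())
```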
You can use Session:save_screenshot to save a screenshot of the current website.
The screenshot is saved in PNG format.
First, visit the website you want to capture, as in the example below.
Second, call Session:save_screenshot with the name of the file to save the screenshot to.
Example:
local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/"
driver:start_session(function(session)
session:navigate_to(URL)
-- Save screenshot in PNG format
session:save_screenshot("sample.png")
end)
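Because the screenshot is saved in PNG format, you can sanity-check the saved file in plain Lua. Compare its first eight bytes with the PNG signature. Below is a minimal sketch. is_png is a hypothetical helper and is not part of LuaWebDriver. It only checks the file header, not whether the whole image is valid.

```lua
-- The 8-byte signature every PNG file starts with (0x89 "PNG" CR LF 0x1A LF).
local PNG_SIGNATURE = "\137PNG\r\n\26\n"

-- Hypothetical helper (not part of LuaWebDriver): return true when the file
-- at path exists and begins with the PNG signature.
local function is_png(path)
  local file = io.open(path, "rb")
  if not file then
    return false
  end
  local header = file:read(#PNG_SIGNATURE)
  file:close()
  return header == PNG_SIGNATURE
end

-- After session:save_screenshot("sample.png"):
--   assert(is_png("sample.png"))
```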
You can navigate through a website with the features below.
This example logs in, clicks a link, and then gets some text.
First, visit the target website.
Second, enter a user name and password, because this website requires authentication.
You can enter them with Session:css_select and ElementSet:send_keys.
Get the element objects for entering the user name and password with Session:css_select.
This example gets the elements with CSS selectors, but you can also get them with XPath using Session:xpath_search.
Then call ElementSet:send_keys on each acquired element set, passing the string to enter as its argument.
Third, press the login button with Session:css_select and ElementSet:click.
Fourth, click a link on the page shown after logging in with Session:link_search and ElementSet:click.
Fifth, get the text of a specific element on the new page with ElementSet:text.
Get the element object with Session:css_select, then call ElementSet:text on the acquired element set.
You can use the returned text as a Lua string.
Example:
local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/move.html"
-- Make your callback and start session
driver:start_session(function(session)
session:navigate_to(URL)
-- Get forms in a website
local form = session:css_select('form')
-- Get form for inputting username
local text_form = form:css_select('input[name=username]')
-- Input username to form
text_form:send_keys("username")
-- Get form for inputting password
local password_form = form:css_select('input[name=password]')
-- Input password to form
password_form:send_keys("password")
-- Get button for submitting username and password
local button = form:css_select("input[type=submit]")
-- Submit username and password
button:click()
-- Get element object for link operating
local link = session:link_search("1")
-- Click the link
link:click()
-- Get the paragraphs on the page shown after the click
local elements = session:css_select("p")
-- Get text of acquired element
print(elements:text())
end)
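When a form has several fields, you can keep the field names and values in a table and build the selectors from it. Below is a minimal sketch. input_selector and credentials are hypothetical names and are not part of LuaWebDriver. The loop in the comment assumes the same form object as in the example above. The sketch also assumes field names contain no double quotes.

```lua
-- Hypothetical helper (not part of LuaWebDriver): build a CSS selector that
-- matches an <input> by its name attribute. Quoting the value keeps the
-- selector valid even for names that are not plain CSS identifiers.
local function input_selector(name)
  return string.format('input[name="%s"]', name)
end

-- Hypothetical field table: keys are input names, values are what to type.
local credentials = {
  username = "username",
  password = "password",
}

-- Inside the session callback you could then fill the form in a loop:
--   local form = session:css_select("form")
--   for name, value in pairs(credentials) do
--     form:css_select(input_selector(name)):send_keys(value)
--   end
```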
You can use Session:css_select and ElementSet:click to operate a button on a specific form.
First, visit the website with the button, as in the example below.
Second, get the element object for the button with Session:css_select.
This example gets the element with a CSS selector, but you can also get it with XPath.
Third, call ElementSet:click on the acquired element set.
Example:
local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/button.html"
-- Make your callback and start session
driver:start_session(function(session)
session:navigate_to(URL)
-- Get elementset object for button operating
local elements = session:css_select('#announcement')
-- Click the acquired button object
elements:click()
-- Get the text of the matching elements on the page after the click
elements = session:css_select('a[name=announcement]')
local informations_summary = elements:texts()
for _, summary in ipairs(informations_summary) do
print(summary)
end
end)
You can use ElementSet:send_keys to enter a string into a specific form field.
First, visit the website with the form.
Second, get the element object for the input field with Session:css_select.
This example gets the element with a CSS selector, but you can also get it with XPath.
Third, call ElementSet:send_keys on the acquired element set, passing the string to enter as its argument.
Example:
local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/index.html"
-- Make your callback and start session
driver:start_session(function(session)
session:navigate_to(URL)
-- Get elementset object for inputting string
local elements = session:css_select('input[name=name]')
-- Input string to form
elements:send_keys("This is test")
-- Print the value of the first input to confirm the entered string
print(elements[1].value)
end)
You can use Element:get_attribute to get an attribute of a specific element.
First, get the element object with Session:css_select.
Second, call Element:get_attribute on the acquired element, passing the attribute name as its argument.
You can use the returned attribute value as a Lua string.
Example:
local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/get-attribute.html"
-- Make your callback and start session
driver:start_session(function(session)
session:navigate_to(URL)
-- Get elementset object for getting attribute
local elements = session:css_select('p')
for _, element in ipairs(elements) do
-- Get the data-value-type attribute of the acquired element
if element:get_attribute("data-value-type") == "number" then
print(element:text())
end
end
end)
You can use ElementSet:text to get the text of a specific element.
First, get the element object with Session:css_select.
Second, call ElementSet:text on the acquired element set.
You can use the returned text as a Lua string.
Example:
local web_driver = require("web-driver")
local driver = web_driver.Firefox.new()
local URL = "https://clear-code.gitlab.io/lua-web-driver/sample/"
-- Make your callback and start session
driver:start_session(function(session)
session:navigate_to(URL)
-- Get elementset object for getting text
local element_set = session:css_select('#p2')
-- Get text of acquired element
local text = element_set:text()
print(text)
end)
You can customize the web browser's user agent with an option of web-driver.Firefox.new().
For example, this is useful when crawling websites built for smartphones.
First, set the user agent string in options.preferences under the "general.useragent.override" key.
Second, pass the options as the argument of web-driver.Firefox.new().
Here is an example that sets the user agent to an iPhone's user agent.
Example:
local web_driver = require("web-driver")
local user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X)"..
" "..
"AppleWebKit/602.3.12 (KHTML, like Gecko)"..
" "..
"Version/10.0 Mobile/14C92 Safari/602.1"
local options = {
preferences = {
["general.useragent.override"] = user_agent,
}
}
local driver = web_driver.Firefox.new(options)
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/"
driver:start_session(function(session)
session:navigate_to(URL)
print(session:request_headers()["User-Agent"])
-- Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.3.12 (KHTML, like Gecko) Version/10.0 Mobile/14C92 Safari/602.1
end)
If you use LuaWebDriver with multiple threads, you can customize the user agent by setting preferences inside the firefox_options field of the options passed to web-driver.ThreadPool.new(), as in the example below.
local web_driver = require("web-driver")
local log = require("log")
local url =
"https://clear-code.gitlab.io/lua-web-driver/sample/"
local log_level = "info"
local n_threads = 2
local logger = log.new(log_level)
local function crawler(context)
local logger = context.logger
local session = context.session
local url = context.job
local prefix = url:match("^https?://[^/]+/")
logger:debug("Opening...: " .. url)
session:navigate_to(url)
local status_code = session:status_code()
if status_code and status_code ~= 200 then
logger:notice(string.format("%s: Error: %d",
url,
status_code))
return
end
logger:notice(string.format("%s: Title: %s",
url,
session:title()))
local anchors = session:css_select("a")
local anchor
for _, anchor in pairs(anchors) do
local href = anchor.href
local normalized_href = href:gsub("#.*$", "")
logger:notice(string.format("%s: Link: %s (%s): %s",
url,
href,
normalized_href,
anchor:text()))
if normalized_href:sub(1, #prefix) == prefix then
context.job_pusher:push(normalized_href)
end
end
end
local user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X)"..
" "..
"AppleWebKit/602.3.12 (KHTML, like Gecko)"..
" "..
"Version/10.0 Mobile/14C92 Safari/602.1"
local options = {
logger = logger,
size = n_threads,
firefox_options = {
preferences = {
["general.useragent.override"] = user_agent,
},
}
}
local pool = web_driver.ThreadPool.new(crawler, options)
logger.debug("Start crawling: " .. url)
pool:push(url)
pool:join()
logger.debug("Done crawling: " .. url)
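The two user agent examples differ only in where the preferences table goes. web-driver.Firefox.new() takes preferences at the top level of its options. web-driver.ThreadPool.new() nests the same Firefox options under firefox_options, next to pool settings such as size and logger. Below is a minimal sketch that builds both tables from one helper. user_agent_preferences is a hypothetical name and is not part of LuaWebDriver. The user agent string is shortened here.

```lua
-- Hypothetical helper (not part of LuaWebDriver): build the Firefox
-- preferences table that overrides the user agent.
local function user_agent_preferences(user_agent)
  return { ["general.useragent.override"] = user_agent }
end

-- Shortened iPhone user agent, for illustration only.
local user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X)"

-- Options for web_driver.Firefox.new(): preferences sit at the top level.
local firefox_options = {
  preferences = user_agent_preferences(user_agent),
}

-- Options for web_driver.ThreadPool.new(): the same Firefox options are
-- nested under firefox_options, next to pool settings such as size.
local pool_options = {
  size = 2,
  firefox_options = firefox_options,
}

-- Usage:
--   web_driver.Firefox.new(firefox_options)
--   web_driver.ThreadPool.new(crawler, pool_options)
```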
LuaWebDriver uses lua-log for logging.
You can make LuaWebDriver use the same logger object as the caller. Create the logger object in the caller and pass it in the logger option of web-driver.Firefox.new().
You can use the log levels provided by lua-log. Specify the log level as a string, such as "notice", "info", or "trace" as used in the examples in this document.
Example:
local web_driver = require("web-driver")
local log = require("log")
local logger = log.new("trace")
local options = { logger = logger }
local driver = web_driver.Firefox.new(options)
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/"
driver:start_session(function(session)
session:navigate_to(URL)
end)
If you use LuaWebDriver with multiple threads, pass the logger object in the logger option of web-driver.ThreadPool.new().
Example:
local web_driver = require("web-driver")
local log = require("log")
local url =
"https://clear-code.gitlab.io/lua-web-driver/sample/"
local log_level = "trace"
local n_threads = 2
local logger = log.new(log_level)
local function crawler(context)
local logger = context.logger
local session = context.session
local url = context.job
local prefix = url:match("^https?://[^/]+/")
logger:debug("Opening...: " .. url)
session:navigate_to(url)
local status_code = session:status_code()
if status_code and status_code ~= 200 then
logger:notice(string.format("%s: Error: %d",
url,
status_code))
return
end
logger:notice(string.format("%s: Title: %s",
url,
session:title()))
local anchors = session:css_select("a")
local anchor
for _, anchor in pairs(anchors) do
local href = anchor.href
local normalized_href = href:gsub("#.*$", "")
logger:notice(string.format("%s: Link: %s (%s): %s",
url,
href,
normalized_href,
anchor:text()))
if normalized_href:sub(1, #prefix) == prefix then
context.job_pusher:push(normalized_href)
end
end
end
local options = {
logger = logger,
size = n_threads,
}
local pool = web_driver.ThreadPool.new(crawler, options)
logger.debug("Start crawling: " .. url)
pool:push(url)
pool:join()
logger.debug("Done crawling: " .. url)
You can also set the log level with an environment variable, as below.
Example:
export LUA_WEB_DRIVER_LOG_LEVEL="trace"
If you set neither a logger object nor the environment variable, LuaWebDriver outputs Firefox's log and geckodriver's log at the "info" level.
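If you want your own application logger to follow the same environment variable, you can read it yourself with os.getenv. Below is a minimal sketch. log_level_from_env is a hypothetical helper and is not part of LuaWebDriver. It only mirrors the documented fallback to "info".

```lua
-- Hypothetical helper (not part of LuaWebDriver): choose a log level from
-- LUA_WEB_DRIVER_LOG_LEVEL, falling back to "info" when it is unset or empty.
-- getenv can be replaced for testing; it defaults to os.getenv.
local function log_level_from_env(getenv)
  getenv = getenv or os.getenv
  local level = getenv("LUA_WEB_DRIVER_LOG_LEVEL")
  if level == nil or level == "" then
    return "info"
  end
  return level
end

-- Usage with lua-log, as in the examples above:
--   local log = require("log")
--   local logger = log.new(log_level_from_env())
```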
You can use LuaWebDriver with multiple threads. To do so, use a web-driver.ThreadPool object, as below.
Here is an example that crawls web pages, starting from the URL given as the argument of web-driver.ThreadPool:push().
Example:
local web_driver = require("web-driver")
local log = require("log")
local URL =
"https://clear-code.gitlab.io/lua-web-driver/sample/"
local log_level = "notice"
local logger = log.new(log_level)
local function crawler(context)
local web_driver = require("web-driver")
local logger = context.logger
local session = context.session
local url = context.job
local prefix = url:match("^https?://[^/]+/")
logger:debug("Opening...: " .. url)
session:navigate_to(url)
logger:notice(string.format("%s: Title: %s",
url,
session:title()))
local anchors = session:css_select("a")
local anchor
for _, anchor in pairs(anchors) do
local href = anchor.href
local normalized_href = href:gsub("#.*$", "")
logger:notice(string.format("%s: Link: %s (%s): %s",
url,
href,
normalized_href,
anchor:text()))
if normalized_href:sub(1, #prefix) == prefix then
context.job_pusher:push(normalized_href)
end
end
end
local pool = web_driver.ThreadPool.new(crawler, {logger = logger})
logger.debug("Start crawling: " .. URL)
pool:push(URL)
pool:join()
logger.debug("Done crawling: " .. URL)
Write the processing you want to execute in the function given as the argument of web-driver.ThreadPool.new().
Inside that function, you can call web-driver.JobPusher:push() to add jobs (context.job_pusher:push() in the example above). Idle threads then execute the pushed jobs one by one.
The function given to web-driver.ThreadPool.new() (crawler in the example above) takes one argument.
This argument (context in the example above) holds all the information needed to crawl web pages, such as context.session, context.logger, context.job (the pushed job), and context.job_pusher.
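You can pull the link handling in crawler out into a pure function and test it without a browser. Below is a minimal sketch. normalize_link is a hypothetical helper and is not part of LuaWebDriver. Note that prefix contains only the scheme and host (https://clear-code.gitlab.io/). As a result, the crawler queues any page on that host, not only pages under /lua-web-driver/sample/.

```lua
-- Hypothetical helper (not part of LuaWebDriver): the link handling from
-- crawler, as a pure function. Returns the href without its #fragment and
-- whether it starts with prefix (that is, whether it should be pushed).
local function normalize_link(href, prefix)
  -- gsub returns (string, count); the parentheses keep only the string.
  local normalized_href = (href:gsub("#.*$", ""))
  local same_site = normalized_href:sub(1, #prefix) == prefix
  return normalized_href, same_site
end

local url = "https://clear-code.gitlab.io/lua-web-driver/sample/"
-- Same pattern as in crawler: scheme and host, ending with "/".
local prefix = url:match("^https?://[^/]+/")
```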
If you push the same job more than once, LuaWebDriver ignores the duplicates by default.
A job can only be a string. We suggest using a URL as the job.
A failed job is retried automatically. The number of retries is three by default. If a job fails more times than that, LuaWebDriver deletes it.
You can change the number of retries with the max_n_failures option of web-driver.ThreadPool.new(), as below.
Example:
local pool = web_driver.ThreadPool.new(crawler, {max_n_failures = 5})
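The duplicate and retry rules above can be pictured with a small single-threaded model. Below is a minimal sketch under stated assumptions. run_jobs is hypothetical and does not reflect how LuaWebDriver implements ThreadPool. It assumes a job is dropped once it has failed max_n_failures times, defaulting to 3. The exact boundary in LuaWebDriver may differ.

```lua
-- Simplified, single-threaded model (NOT LuaWebDriver's implementation):
-- duplicate jobs are ignored, and a failing job is re-queued until it has
-- failed max_n_failures times, after which it is dropped.
local function run_jobs(initial_jobs, handler, max_n_failures)
  max_n_failures = max_n_failures or 3
  local queue, seen, failures = {}, {}, {}
  local completed, dropped = {}, {}

  -- Plays the role of context.job_pusher:push() in the examples above.
  local function push(job)
    if seen[job] then
      return -- ignore duplicate jobs
    end
    seen[job] = true
    queue[#queue + 1] = job
  end

  for _, job in ipairs(initial_jobs) do
    push(job)
  end

  while #queue > 0 do
    local job = table.remove(queue, 1)
    local ok = pcall(handler, job, push)
    if ok then
      completed[#completed + 1] = job
    else
      failures[job] = (failures[job] or 0) + 1
      if failures[job] < max_n_failures then
        queue[#queue + 1] = job -- retry later
      else
        dropped[#dropped + 1] = job
      end
    end
  end
  return completed, dropped, failures
end
```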
Some notes on using LuaWebDriver with multiple threads:
You need to add libpthread.so to LD_PRELOAD.
A function given as the argument of web-driver.ThreadPool.new() must not reference anything defined outside of the function itself.
If you stop the luajit process in the middle, the jobs are executed again from the beginning, and the duplicate job check is reset as well.
Call web-driver.ThreadPool:join() after pushing jobs, as the examples above do.
Now you know all the major LuaWebDriver features! To learn more about each feature, see the reference manual.