Automating the Web with Selenium: Complete Tasks Automatically and Write Test Cases!

While teaching Software Engineering I during the Winter 2013 term, I learned of a web testing suite named Selenium.

I was on the lookout for a good unit testing suite for Javascript. I had previously been introduced to the YUI Testing Framework, which provides a console and enables you to easily write and run tests from a browser window, but one limitation of YUI is that, out of the box, it doesn’t support interaction with the site itself. So, while the basic usage is good for verifying libraries and similar, I wanted something that was able to interact with page elements in a different way.

Enter Selenium. Now, here’s a way to not only automate web interactions, but to make them into automated unit tests for my research projects. This post is going to serve as a brief introduction to Selenium and how to start using it to interact with websites.

Selenium

Selenium is a web automation framework that enables a user to essentially script a web site. In this post we’ll talk about using Selenium version 2, which is the main, up-to-date version of the software.

There are two main ways for you to interact with a website.

The Selenium IDE: Recording actions by Demonstration

First, you can use the Selenium IDE, which is a Firefox plugin that allows you to demonstrate interactions with a web site. You essentially record actions and then as you click around the page, type in form elements, and push buttons, the IDE records the actions for you in a pseudo-markup language. (If you happen to follow my other research, it is actually somewhat similar to CoScripter, another program-by-demonstration tool for the web, but with better web interaction and fewer features. For example, Selenium cannot store temporary data into tables.)

Thus, even with minimal web development or programming experience, you could create a script for Selenium that plays back, for example, a series of clicks on a page, completes a survey, or similar.

There are some limitations with the IDE. The main one that led me to using the WebDriver (below) is that it doesn’t interact very well with “contentEditable” div tags and other HTML5 elements.

Note that the Selenium IDE is version 1.x, which does NOT correspond to Selenium itself (which is version 2.x).

The Selenium WebDriver: Programming actions in code

The second way to interact with a website is using the Selenium Webdriver, which is a driver that essentially launches a website and then enables you to look through that website’s DOM to interact with elements on the page. Thus, you can use your own browser, like Firefox or Internet Explorer to load and navigate a web site.

I wanted Selenium to be able to work with a highly interactive web app: Gidget. Gidget is a programming game for kids and teenagers that I am working on in collaboration with Andy Ko and Michael Lee. Gidget runs with a lot of HTML5, JQuery, and Javascript and is extremely visual – something that seems perfect for Selenium.

Unfortunately, since Gidget isn’t available publicly yet, I can’t actually put tests on the blog that’ll directly run Gidget so instead I’ll use Google.

Building a Selenium Script

Selenium has a number of bindings in Java, Ruby, Python, and Javascript. I personally chose to use Python – I like its concise syntax and the fact that it has pretty good library support. I’ll focus exclusively on Python in this particular post. Most of the Java bindings can be derived from the Python commands if you remove the underscores and instead use CamelCase – for example, “element.is_enabled()” in Python would be “element.isEnabled()” in Java.

To install Selenium, you generally need to only type

pip install selenium

assuming that Python exists already on your system.

The first place I’m going to point you at is the Selenium Documentation Example of WebDriver, where they already include an example of making a query on Google. Here’s a reproduction of the code.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# go to the google home page
driver.get("http://www.google.com")

# find the element that's name attribute is q (the google search box)
inputElement = driver.find_element_by_name("q")

# type in the search
inputElement.send_keys("Cheese!")

# submit the form (although google automatically searches now without submitting)
inputElement.submit()

# the page is ajaxy so the title is originally this:
print driver.title

try:
    # we have to wait for the page to refresh, the last thing that seems to be updated is the title
    WebDriverWait(driver, 10).until(EC.title_contains("cheese!"))

    # You should see "cheese! - Google Search"
    print driver.title

finally:
    driver.quit()

It works exactly as advertised and opens up a web browser, goes to Google, and then searches for “cheese!”. However, there are a number of features that are essential to actually testing web pages.

Waiting for Pages

The unfortunate reality of web pages nowadays is that you have to wait a lot. Whether it’s a form submit where you have to wait for the POST request to complete, or some really slow JQuery fading box, not everything you want to interact with is available. To get around this, you have to use the “wait” commands in Selenium.

The Selenium documents do provide a few examples but I had to do a lot of searching and testing to get things working so I’ll just provide my use cases here directly.

To wait for an element to appear in Selenium, you need to provide an explicit wait along with a condition. It basically waits until either the identified element loads, or until the timeout passes (at which point it will throw an exception). There is an example of that in the Selenium example above, but if you have Google Instant turned on, you’ll realise notice that Google now returns search results to you as soon as you start typing. How can you interact with page elements if you don’t even know what’s going to pop up, when?

In this case, we’re going to wait until the “Search Results” text pops up.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException

if __name__ == "__main__":
    driver = webdriver.Firefox()
    wait = WebDriverWait(driver, 100)
    driver.get("http://google.com")
    
    inputElement = driver.find_element_by_name("q")
    inputElement.send_keys("Irwin Kwan")
    
    wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@href='https://irwinhkwan.wordpress.com/']")))
    blog = driver.find_element_by_xpath("//a[@href='https://irwinhkwan.wordpress.com/']")
    blog.click()
    
    driver.quit()

Basically, the driver loads, we set up a “wait” that waits up to 100ms, and then we do a search on Google for my name. Note that we don’t actually submit the form – instead, we wait for the link to my blog to appear in the Google search results, find the element, then click on it. And yes, this is a cheap way to rack up those hits for my blog, so everyone run this code. 😉

There are a lot of expected conditions available in Selenium and this is probably the most important thing to be aware of when first starting. You can’t interact with page elements unless they’re available, after all. Here are some of the useful ones:

  • EC.element_to_be_clickable: Wait for the element to be clickable. Good for elements that aren’t always visible or enabled. I use this a lot in the game.
  • EC.visibility_of_element_located: Wait for the element to be visible. A lot of pages load all of their content, but hide it from the user. Here’s how you ensure that what’s being interacted with is actually visible.
  • EC.presence_of_element_located: The element exists somewhere on the page.

There are a lot of these expected conditions. A full list is available on the selenium.webdriver.support.expected_conditions API document page.

The second key aspect here is the interaction. To do this, you have to “find” the element. So far, I’ve used two main ways of finding an element: ID, and XPath.

Finding elements by ID

Finding an element by ID is pretty much what it sounds like: searching for an element using its ID tag. ID tags are unique in the DOM and are therefore ideal for searching and testing. You can use it like this:

driver.find_element_by_id("menubutton_id")

Finding elements by XPath

XPath is a query language designed to navigate XML. I won’t go through all of the details of XPath here, but I’ll present a few basic use cases. You basically can search on a number of conditions that you specify in the language so you can see if that element exists in the DOM.

  • If you want to search if a specific tag exists:
    driver.find_element_by_xpath("//h1")
  • If you want to search for nested tags:
    driver.find_element_by_xpath("/html/body/h1")
  • If you want to search that text within the tags matches:
    driver.find_element_by_xpath("//h1[text()='Heading 1']")
  • If you want to search for text in attributes
    driver.find_element_by_xpath("//img[@title='Irwin Kwan']")

With these two find_element commands, you can find most of what you need in your web pages. Selenium supports a number of other ways to search, including searching by CSS, but I haven’t needed to use it yet.

Dynamic Interactions with a Page

Another issue I encountered with the Gidget game is clicking through introduction text when I didn’t know how many pages were present. Essentially, the problem is that there’s a button that I have to push on the page, and if I push it a certain number of times, it’ll be disabled. However, I don’t know how many times I have to push it because it might be different each time.

I managed to get around this with a little fragment of code below:

wait.until(EC.element_to_be_clickable((By.ID, 'main_buttonMissionTextNext')))
while EC.element_to_be_clickable((By.ID,'main_buttonMissionTextNext')):
    driver.find_element_by_id("main_buttonMissionTextNext").click()
    if not driver.find_element_by_id("main_buttonMissionTextNext").click().is_enabled():
        break
    wait.until(EC.element_to_be_clickable((By.ID, 'main_buttonMissionTextNext')))

There were two gotchas here: First, I wasn’t aware of it at the time, but the button was actually regenerated whenever you clicked it, so I had to search for it again when I clicked it. Second, I didn’t realise that there was an is_enabled() function you could use to test if an element is enabled or not. But now, you know!

I posted a StackOverflow question about this (which I ended up answering myself).

Using the Selenium IDE and the Selenium WebDriver Together To Save Time

While writing code is nice and fun, HTML pages are very large and are covered with tags with various IDs. It becomes tedious very quickly, even with good web development debugging tools, to search through the DOM elements to identify what you need to interact with, then writing the code to search and click on it.

So work smarter, not harder and use the Selenium IDE. If you record your actions with a page and save it in the Selenium IDE, you can use “Export Test Case As… > Python 2 / unittest / WebDriver”. Now you have Python code for that series of actions that you just performed and can integrate it into your other tests.

In my case, I used the Selenium IDE to automatically complete an exit survey in Gidget, then exported it and used the code in my other test suites. It’s a great way to save time writing code.

Making your code into a Test Suite

In Python, the unit testing is built in. You simply have to import unittest, create a class that extends it, and then write your setup and teardown functions, along with a method beginning with “test”. Here’s the previous code for interacting with Google converted into a Python test suite.

import unittest

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException

class WebTester(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.wait = WebDriverWait(self.driver, 100)
        
    def test_load_blog(self):
        self.driver.get("http://google.com")
    
        inputElement = self.driver.find_element_by_name("q")
        inputElement.send_keys("Irwin Kwan")
    
        self.wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@href='https://irwinhkwan.wordpress.com/']")))
        blog = self.driver.find_element_by_xpath("//a[@href='https://irwinhkwan.wordpress.com/']")
        blog.click()
        
        self.wait.until(EC.visibility_of_element_located((By.XPATH, "//h2[text()='Irwin Kwan']")))
        self.assertTrue(self.driver.find_element_by_xpath("//h2[text()='Irwin Kwan']"))
    
    def tearDown(self):
        self.driver.quit()
    
if __name__ == "__main__":
    unittest.main()

If you place multiple test suite classes in the file, unittest will run them all. Selenium will launch a new, fresh browser instance for each one as well.

Conclusion: Web Automation is Pretty Cool

I’m really just a new user. Selenium has a number of features that I haven’t needed and therefore don’t know much about. There’s a “remote control” mode that allows you to use a separate server to run tests for you. There are ways to store session variables, load specific Firefox profiles with add-ons, and there is a “Selenium server” mode as well. If you need these features, chances are you’ll be able to find information about them on the Selenium site documents.

I feel that this information should set most people up with enough information to get started with Selenium and making it work for them in a useful way. I hope that this post is useful to you guys!

Happy automating!

8 thoughts on “Automating the Web with Selenium: Complete Tasks Automatically and Write Test Cases!

  1. Shawn Sellen

    I am just starting to learn Python as my first language. I have been a “script kiddie” for years with wordpress/php. My main interest now is web automation. I was thinking of becoming more skilled in php. I know there is some automation abilities there, but I hear that multithreading is a pain. I read somewhere it’s better to learn by diving in and playing with the code than starting out with “hello world” stuff (I did, and still am *sigh*). Being an older guy in internet terms (40), I figured python is probably the way to go. I hear c++ and some other languages will make you want to stab your own eyes out. lol. Thanks for the code examples and the clear way you explained it. After researching a half a dozen different frameworks over two days, (selenium being one of them) I found your post. Two questions: is it possible to make Python and Selenium scripts like this into a .exe?, and can it have a user interface?. Thanks again for your post. I hope you continue to write on the subject or do a series on it…This “old script kiddie” got bit by the programming bug late in life. 🙂

    Reply
    1. irwinhkwan Post author

      Hi Shawn, you might want to look at one of my other posts on PyInstaller and Python EXEs (https://irwinkwan.com/2013/04/29/python-executables-pyinstaller-and-a-48-hour-game-design-compo/). I haven’t tried using PyInstaller to package a Selenium script so you won’t be able to follow the instructions directly, but maybe you can use that as a place to start and figure it out. If you do, let me know, I’d be interested to know what the procedure is. If I have a moment I’ll look into it a bit more.

      Reply
  2. Alison Benson

    It was nice to read your post.Selenium automates browsers. That’s it. What you do with that power is entirely up to you. Primarily it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well.I am sure many people will like your blog.

    Reply
  3. Elvis Papas

    Hello my friend! I am using selenium in order to export some data from a website with python.My problem is that i can’t handle the window which it show up when a button is clicked.The window is the windows save as window (this : http://s18.postimg.org/yx0ycaujd/example.png). Can i handle this window with selenium,in order to fill out the “File Name” field and then click to the Save button?

    Reply
  4. Anil Jeeyani

    Hi Irwin,
    One strange thing happens with me is when I try to read gmail mail, It hangs up and freeze all thing. I have to then stop script and close firefox manually. This is one of the step I need to perform in automation. Case is, I need to process forgot password functionality and some how I need to read password from gmail account. I used Google account. I am opening gmail page in same window where website is running for which I am doing testing. I can open password email successfully but after that it hangs. I can get password from email successfully because I can see password is logged in terminal. Do not know what exactly is causing this to freeze.

    Please let me know if you have encountered such scenario and know how to deal with it.

    Thanks

    Reply
  5. sarieta123

    This page is so informative, would you by any chance know how to run these tests on Jenkins and Chrome driver combination from mac? I am total newbie to selenium and python and the only QA in my company and am exploring this entire set up as I go. I have managed to create a few scripts in unittest but not able to run these from Jenkins.

    Reply
    1. Riz Hossain

      Hi Sarieta,
      your question requires a lot more additional questions to be answered first. Of the many ways you can use Jenkins, I usually like to separate it from running test cases. I would rather use the python https://pypi.python.org/pypi/jenkinsapi to figure out when I want to run a set of test cases. For scheduling: https://cloud.google.com/appengine/docs/python/config/cron
      For your question about chrome driver and combination from Mac, its really not much of a relevant how you want to setup and work with them. Chrome driver is just another extension of WebDriver and Mac is just another OS. From webdriver point of view, you really dont need to worry about it. For and executing test cases, scheduling and such … you can try http://www.automationsolutionz.com which comes with all the stuff that you are looking for.

      Reply

Leave a comment