Automated testing with GUI clients

2010-04-10

Tobias Schlitt

sikuli, test, php, qa, ez components, webdav, gui, screenshot, matching, quality, client

Using the Sikuli project, you can script GUIs based on screenshots. I used this approach to automate the client tests for the eZ Webdav component.

Like all eZ Components, the eZ Webdav component is thoroughly unit tested. But since WebDAV clients all interpret the standard a little (or a lot) differently, we implemented custom regression tests for this component, based on the PHPUnit framework. Until now, generating these regression tests with each and every supported client was tedious manual work, and it had to be redone whenever the server's responses changed due to bug fixes or new features. In this article I present Sikuli, a screenshot-based GUI testing framework, which I have now used to automate this last part of the testing process.

Sikuli click

To ensure client compatibility in the eZ Webdav component, we use custom regression tests, based on PHPUnit. I won't go into detail here on how these are implemented. If you are curious, you can read about that in the PHP QA book I contributed to. In short: we run a manual test with a client and store its request data together with the response information generated by the server. In the regression test, we then replay the request data to the server and check whether the response still matches the recording. If it does not, we need to manually re-check the client and, if it still works correctly, re-generate the regression test.
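The record-and-replay idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the actual PHPUnit-based implementation: handle_request, record and replay are hypothetical names, and the stand-in server simply maps a request dictionary to a response dictionary.

```python
# Minimal record/replay sketch. All names are hypothetical and only
# illustrate the principle, not the eZ Webdav test suite itself.

def handle_request(request):
    """Stand-in for the server under test: request dict in, response dict out."""
    if request["method"] == "PROPFIND":
        return {"status": 207, "body": "<multistatus/>"}
    return {"status": 405, "body": ""}


def record(requests, handler):
    """Run each client request once and store it with the server's response."""
    return [{"request": r, "response": handler(r)} for r in requests]


def replay(recording, handler):
    """Re-send every recorded request; return the entries whose response changed."""
    return [entry for entry in recording
            if handler(entry["request"]) != entry["response"]]


# Record once (the formerly manual step), then replay as a regression test.
recording = record([{"method": "PROPFIND", "path": "/"}], handle_request)
mismatches = replay(recording, handle_request)
```

An empty mismatches list means the server still answers as recorded. Any entry in it marks a request whose client needs to be re-checked by hand.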

Manually generating the client regression tests is quite some work: we currently support Cadaver (a Unix shell client), Nautilus and Konqueror (both X Window clients), different Internet Explorer versions and some more. Doing the testing work is not difficult at all, but it is really, really boring. And boring test work is the biggest enemy of quality assurance. Basically, the relation is exponential: the more boring the test work, the more sloppily it is performed.

I finally fixed another part of this manual testing fiasco by automating the client test run. To achieve this I used a fairly new project developed at MIT, called Sikuli. Sikuli lets you script GUI applications by matching against screenshots. Originally created to test GUI applications, Sikuli even supports "unit testing for GUI elements".

Sikuli is written in Java and just hooks into platform-specific GUI libraries. It is therefore said to work on Linux, Windows and MacOS. So far I have only tested it on Linux, but the screenshots in the manual come from MacOS, so I assume it works fine there, too.

To write scripts, you use the Sikuli IDE. The term IDE is a bit over the top here, so let's call it an editor for now. In this editor you program in Python, as Sikuli includes Jython for interpreting scripts. Besides the typical Python programming functionality, you have some special functions and objects available to access the desired GUI. These operations are all based on screenshots, which gives Sikuli its simple and elegant charm. It also ensures cross-platform operation of scripts, provided the GUI looks the same on all platforms.

I uploaded the Sikuli script I use to generate a client test for Nautilus without authentication. This HTML export is generated directly by Sikuli. The script shows impressively how Sikuli lets you work with screenshot matching. A short extract can be seen below:

Sikuli match

In this case, Sikuli is instructed to perform a click on the area of the screen that matches the screenshot of the file. The match needs to have a similarity of at least 95%. If multiple areas of the screen match the pattern, only the first one - i.e. the best match - is chosen. The Sikuli editor encapsulates the creation of Pattern objects and the calls to similar() and firstN() for you, so you don't need to mess around with this. Furthermore, the editor assists you with creating the necessary screenshots.
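The selection rule described above (keep only matches above a similarity threshold, then take the best one) can be illustrated in plain Python. This is a sketch of the principle only, not Sikuli's actual matching code: best_matches and the candidate list are hypothetical, and the scores stand in for whatever the image matcher would report.

```python
# Sketch of "similarity threshold, then best match first".
# This is NOT Sikuli's implementation; names and data are made up.

def best_matches(candidates, min_similarity=0.95, first_n=1):
    """candidates: list of (region, score) tuples with score between 0 and 1."""
    # Drop everything below the required similarity.
    good = [c for c in candidates if c[1] >= min_similarity]
    # Best score first, so "first" means "best".
    good.sort(key=lambda c: c[1], reverse=True)
    return good[:first_n]


candidates = [
    ("icon at (40, 80)", 0.97),
    ("icon at (300, 80)", 0.99),
    ("icon at (10, 10)", 0.60),
]
chosen = best_matches(candidates)
```

Here chosen contains only the candidate with score 0.99: the 0.60 candidate fails the threshold, and of the two remaining ones the better match wins.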

Using this screenshot-based programming approach, you can perform almost any action on your GUI. Sikuli not only allows you to click on elements, but also to type, including emulation of key combinations, and to paste. You can match inside a sub-region of the screen, wait until a certain GUI element appears, and more. Using OOP in Python, you can encapsulate certain actions for re-use.
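As a rough sketch of such encapsulation, the class below groups a multi-step action behind one method. Everything here is an assumption for illustration: FileManager, RecordingGui, the image file names and the URL are hypothetical. The GUI backend is passed in so that the example runs outside of Sikuli. Inside a real script, the backend would presumably forward to Sikuli's own click, type and wait functions.

```python
# Encapsulating GUI actions in a class. All names and image files are
# hypothetical; the fake backend only logs calls instead of driving a GUI.

class RecordingGui(object):
    """Fake backend that records what would have been done on screen."""

    def __init__(self):
        self.calls = []

    def click(self, image):
        self.calls.append(("click", image))

    def type(self, text):
        self.calls.append(("type", text))

    def wait(self, image, timeout):
        self.calls.append(("wait", image, timeout))


class FileManager(object):
    """Reusable actions for a file manager window."""

    def __init__(self, gui):
        self.gui = gui

    def open_location(self, url):
        # Focus the location bar, enter the URL, wait for the listing.
        self.gui.click("location_bar.png")
        self.gui.type(url + "\n")
        self.gui.wait("folder_listing.png", 10)


gui = RecordingGui()
FileManager(gui).open_location("dav://localhost/")
```

If a screenshot has to be replaced because the GUI changed, only the method that references it needs to be touched, not every script that opens a location.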

The Sikuli project is currently in beta, so don't expect it to work 100% flawlessly. During my work with it on Linux, it occasionally had hiccups matching a certain screenshot, or the IDE simply died. However, it already works great, and once it matures, you will have a powerful tool to automate any kind of GUI interaction on any platform.

Comments

Hi Tobi,
Google has a project called "ringo" that enhances Selenium with a library of UI elements. This means UI elements aren't referenced directly in Selenium tests but via pointers to the library. If a UI element changes, you only need to make the change once in the library and not in all Selenium tests. Do you know whether Sikuli supports something similar?
Thanks for the interesting pointer!

Thomas Koch at 2010-03-23

You can of course encapsulate your GUI elements in a library, aka a Python class, and reference them from there. See my example script, where I also encapsulate complete actions like copy&paste.

Toby at 2010-03-23

Just tested it in Windows and all is well. Every now and then it crashes or behaves unexpectedly (especially when running Unit Tests) but all in all looks very promising so far.

Anton at 2010-03-23