This blog post was first published on the Qafoo blog and is duplicated here since I wrote it or participated in writing it.

Practical PHPUnit: Testing XML generation

Testing classes which generate XML can be cumbersome work, at least if you don't know the right tricks to make your life easier. In this article, I will shed some light on different approaches and show you how XML generation can be tested quite easily using XPath.


The test code shown can be optimized further, e.g. by using a data provider in some places, but this is out of scope for this article. In addition, I do not obey the one-assertion-per-test principle, since I disagree with it. Please feel free to discuss both in the comments anyway. ;)

For better illustration, I use a real-world example we had in a customer's project. Of course, I anonymized and simplified the code for this blog entry. In that project, XML was generated in order to communicate with a small web service. One part of the XML structure consisted of person data, which was encapsulated in a simple class:

    class qaPerson
    {
        const GENDER_MALE = 1;
        const GENDER_FEMALE = 2;

        protected $firstName;
        protected $lastName;
        protected $gender;
        protected $dateOfBirth;

        public function __construct( $lastName, $firstName )
        {
            $this->setLastName( $lastName );
            $this->setFirstName( $firstName );
        }

        public function getLastName()
        {
            return $this->lastName;
        }

        public function getFirstName()
        {
            return $this->firstName;
        }

        // ...
    }

As you can see, there is nothing special in it: some properties and corresponding getters. Of course there are also some setters, but I left them out here because they are not significant.

In order to generate an XML representation of the person data, the common visitor pattern was used. The qaPersonVisitor takes a DOMElement in its constructor, which serves as the root for the XML serialization process. The method visitPerson() receives a qaPerson object and generates the corresponding XML elements:

    class qaPersonVisitor
    {
        protected $document;
        protected $root;
        protected $currentElement;

        public function __construct( DOMElement $root )
        {
            $this->root = $root;
            $this->currentElement = $root;
            $this->document = $root->ownerDocument;
        }

        public function visitPerson( qaPerson $person )
        {
            $this->currentElement = $this->currentElement->appendChild(
                $this->document->createElement( 'Person' )
            );

            $this->currentElement->appendChild(
                $this->document->createElement( 'LastName', $person->getLastName() )
            );
            $this->currentElement->appendChild(
                $this->document->createElement( 'FirstName', $person->getFirstName() )
            );

            // Gender and date of birth are optional
            if ( null !== ( $gender = $person->getGender() ) ) {
                $this->currentElement->appendChild(
                    $this->document->createElement( 'Gender', $gender )
                );
            }
            if ( null !== ( $dateOfBirth = $person->getDateOfBirth() ) ) {
                $this->currentElement->appendChild(
                    $this->document->createElement(
                        'DateOfBirth',
                        $dateOfBirth->format( 'Y-m-d' )
                    )
                );
            }
        }
    }

So, the code does no magic at all and is pretty straightforward, as a visitor should be. How would you attempt to write a unit test for the visitPerson() method?

In the following, I first present two basic approaches to this task, which I've seen and written myself many times. After that, I show you two more decent approaches, which ship with PHPUnit, based on CSS selectors and a custom tag matching data structure. Finally, I will show how we solved the task in the customer project with a very slick and clean XPath-based approach.

For each approach presented, I will briefly discuss the pros and cons. Please feel free to tell us about additional approaches you might know and to discuss your impressions of the described approaches.

You can find the complete code examples for testing XML generation in our GitHub repository for blog code examples.

Test case basics

In order to test the XML generation, the qaPersonVisitor class has the precondition of receiving a DOMElement object in its constructor. In addition to that, the visitPerson() method always needs a valid instance of qaPerson to be serialized. Therefore, the test class qaPersonVisitorTest makes use of PHPUnit's setUp() method to set up the DOM basics, and implements a custom method to return a fixture for the qaPerson class:

    class qaPersonVisitorTest extends PHPUnit_Framework_TestCase
    {
        protected $domDocument;
        protected $rootElement;

        protected function setUp()
        {
            $this->domDocument = new DOMDocument( '1.0', 'utf-8' );
            $this->rootElement = $this->domDocument->appendChild(
                $this->domDocument->createElement( 'root' )
            );
        }

        protected function getDomDocument()
        {
            return $this->domDocument;
        }

        protected function getDomRootElement()
        {
            return $this->rootElement;
        }

        protected function getPersonFixture()
        {
            $person = new qaPerson( 'My Last Name', 'Some First Name' );
            $person->setGender( qaPerson::GENDER_FEMALE );
            $person->setDateOfBirth( new DateTime( '2000-01-01 00:00:00+00:00' ) );
            return $person;
        }

        // ...
    }

This class is used as the basis for all of the test methods described in the following.

The naive way

The naive way of testing if XML is generated correctly is to manually rebuild the expected DOM structure in the test case and assert that this equals the generated one:

    public function testVisitPersonNaive()
    {
        $person = $this->getPersonFixture();

        $expDom = new DOMDocument( '1.0', 'utf-8' );

        $expRoot = $expDom->appendChild(
            $expDom->createElement( 'root' )
        );
        $expPersonElem = $expRoot->appendChild(
            $expDom->createElement( 'Person' )
        );

        $expPersonElem->appendChild(
            $expDom->createElement(
                'LastName',
                $person->getLastName()
            )
        );
        $expPersonElem->appendChild(
            $expDom->createElement(
                'FirstName',
                $person->getFirstName()
            )
        );
        $expPersonElem->appendChild(
            $expDom->createElement(
                'Gender',
                $person->getGender()
            )
        );
        $expPersonElem->appendChild(
            $expDom->createElement(
                'DateOfBirth',
                $person->getDateOfBirth()->format( 'Y-m-d' )
            )
        );

        $visitor = new qaPersonVisitor( $this->getDomRootElement() );
        $visitor->visitPerson( $person );

        $this->assertEquals( $expDom, $this->getDomDocument() );
    }

Before I inspect this technique a bit deeper, it is important to note one thing: In a real-life environment, this test case should not be the only one for the visitPerson() method. Since the method contains two conditions, you would need at least four test cases to fully test its functionality: one for each combination of the conditions.
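The four combinations can be enumerated mechanically. The following is only a sketch: the function name provideOptionalFieldCombinations() and the row layout are assumptions made for illustration and are not part of the original test suite. Each row holds a gender value, a date of birth string and the child element names expected below <Person>.

```php
<?php
// Hypothetical helper: one row per combination of the two optional fields.
function provideOptionalFieldCombinations()
{
    $rows = array();
    foreach ( array( null, 2 ) as $gender ) {
        foreach ( array( null, '2000-01-01' ) as $dateOfBirth ) {
            // LastName and FirstName are always generated
            $expected = array( 'LastName', 'FirstName' );
            if ( $gender !== null ) {
                $expected[] = 'Gender';
            }
            if ( $dateOfBirth !== null ) {
                $expected[] = 'DateOfBirth';
            }
            $rows[] = array( $gender, $dateOfBirth, $expected );
        }
    }
    return $rows;
}

foreach ( provideOptionalFieldCombinations() as $row ) {
    echo implode( ', ', $row[2] ), "\n";
}
```

In PHPUnit, a method like this could serve as a data provider for a single parametrized test method.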

Comparing the code of the test method to the tested visitPerson() method, you can see that they have quite some code in common. In fact, their code is equal except for the two conditions. There are two flaws hidden in this observation:

First, there is the risk that developers get lazy and just copy & paste between test and actual code. Why should you re-write code that already exists in either of the methods? Even if no real C&P happens, a developer still remembers how they wrote one of the methods and will copy it mentally when developing the counterpart.

But why is this actually dangerous? Whenever a human tends to duplicate code instead of writing new code for testing, logic errors are likely to be present in both versions. This is pretty much like reading your own writing: You read what you intended to write, not what you actually wrote. So a goal of each test should be to reformulate the expectations regarding the tested method in an independent way.

The second problem with the method shown is that it involves quite some code. 43 lines is not what one would call a short and simple test method. Even if the code is still simple, there is a danger that people will not have the patience to read the code again when touching the method. Again, if you need to write a lot of code twice, it is pretty tempting to just copy & paste it.

The lazy way

One of the flaws of the naive testing method, where the XML structure is rebuilt by hand for comparison, is the amount of code to write for testing purposes. When testing generated XML structures, it is easy to work around this by storing the expected XML on disk and comparing it to the structure generated by the visitPerson() method:

    public function testVisitPersonCompareFile()
    {
        $person = $this->getPersonFixture();
        $visitor = new qaPersonVisitor( $this->getDomRootElement() );
        $visitor->visitPerson( $person );

        $this->assertXmlStringEqualsXmlFile(
            'data/' . __CLASS__ . '__' . __FUNCTION__ . '.xml',
            $this->getDomDocument()->saveXml()
        );
    }

The first thing to note here is the shortness of the code: 11 lines compared to 43 in the naive test method. This is the shortest test method presented in the article, which is quite appealing. But this is only half the truth, since the expected XML structure has to be written as well. This way of testing XML also involves a serious danger: Why would you write XML by hand, if you are about to write (or have already implemented) a method for this? So chances are good that you switch to the dark side and just store the generated XML and inspect it manually. A no-go for serious testing.

Of course, you can force yourself to write the comparison XML by hand. If you manage to do so, this way of testing XML is very convenient. You can even automate the assertions some more and thereby shorten your code. However, I suggest that you take a look at the rest of the proposed testing approaches, just to make sure you don't lose self-control once in a while. Remember: when a project gets rushed, will you still write the expected XML by hand?
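To make the idea concrete, here is a minimal sketch of what a hand-written expectation for the person fixture might look like and how it could be compared independently of whitespace. The helper normalizeXml() is hypothetical, the DOM built here merely stands in for the visitor's output, and this is not how PHPUnit implements assertXmlStringEqualsXmlFile().

```php
<?php
// Hypothetical hand-written expectation, as it could be stored in the data file
$expectedXml = <<<XML
<root>
  <Person>
    <LastName>My Last Name</LastName>
    <FirstName>Some First Name</FirstName>
    <Gender>2</Gender>
    <DateOfBirth>2000-01-01</DateOfBirth>
  </Person>
</root>
XML;

// Stand-in for the document the visitor would generate
$actual = new DOMDocument( '1.0', 'utf-8' );
$root   = $actual->appendChild( $actual->createElement( 'root' ) );
$person = $root->appendChild( $actual->createElement( 'Person' ) );
$person->appendChild( $actual->createElement( 'LastName', 'My Last Name' ) );
$person->appendChild( $actual->createElement( 'FirstName', 'Some First Name' ) );
$person->appendChild( $actual->createElement( 'Gender', '2' ) );
$person->appendChild( $actual->createElement( 'DateOfBirth', '2000-01-01' ) );

// Hypothetical helper: drop indentation-only text nodes and canonicalize
function normalizeXml( $xml )
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadXML( $xml );
    return $dom->C14N();
}

var_dump( normalizeXml( $expectedXml ) === normalizeXml( $actual->saveXML() ) );
```

The point of the sketch is that the expectation is plain, readable XML. Whether it was really written by hand is something no tool can enforce.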

Selecting with CSS

The third way of testing XML involves some not so well known functionality of PHPUnit: assertions which allow you to select parts of the XML structure by CSS selectors. Since I expect you have not used the corresponding assertSelectCount() and assertSelectEquals() methods before, I'll explain the code in more detail in the following. If you already know these methods, feel free to skip directly to the discussion part, which follows after the next section.

The code

    public function testVisitPersonSelectCSS()
    {
        $person = $this->getPersonFixture();

        $visitor = new qaPersonVisitor( $this->getDomRootElement() );
        $visitor->visitPerson( $person );

        $this->assertSelectCount(
            'Person',
            1,
            $this->getDomDocument(),
            'Invalid number of Person elements',
            false
        );
        $this->assertSelectEquals(
            'Person > FirstName',
            $person->getFirstName(),
            1,
            $this->getDomDocument(),
            'Invalid content of FirstName element',
            false
        );
        $this->assertSelectEquals(
            'Person > LastName',
            $person->getLastName(),
            1,
            $this->getDomDocument(),
            'Invalid content of LastName element',
            false
        );
        $this->assertSelectEquals(
            'Person > Gender',
            $person->getGender(),
            1,
            $this->getDomDocument(),
            'Invalid content of Gender element',
            false
        );
        $this->assertSelectEquals(
            'Person > DateOfBirth',
            $person->getDateOfBirth()->format( 'Y-m-d' ),
            1,
            $this->getDomDocument(),
            'Invalid content of DateOfBirth element',
            false
        );
    }

There are two assert methods called in this code, which I suspect many people did not know about: The assertSelectCount() method selects a set of XML nodes through a CSS selector and ensures that the selected number of nodes equals a defined count. The method assertSelectEquals() also asserts the number of selected nodes, but in addition makes an assertion of their contents.

You can find more information about the CSS select assertions in the PHPUnit documentation.

But let's look at the usage of these methods step by step: The first assertion in the example ensures that exactly one <Person> element is generated. The first parameter to the assertSelectCount() method defines the CSS selector to be matched. Note that the parser implemented in PHPUnit does of course not support the full CSS 3 specification, but the implemented subset works quite nicely for testing purposes.

The second parameter defines the number of XML nodes to be found by this selector and the third one gives the DOMDocument to be tested. The fourth parameter provides the typical optional failure message, which is printed by PHPUnit if the test fails. The fifth and last parameter indicates whether the given DOMDocument contains an HTML structure (default) or some custom XML dialect.

Note that there were some flaws in the PHPUnit code regarding the CSS assertions on XML instead of HTML documents, which have been fixed recently. So, in order to use this way of testing right now, you should use a recent PHPUnit checkout from GitHub.

The other assertion used, assertSelectEquals(), works quite similarly: The first parameter is again the CSS selector. In the first call to this method, this selects the <FirstName> elements below the <Person> element. Remember that the previous assertion already ensured that there is only one <Person> element. The second parameter contains the expected string content of the selected XML elements. In the first use of the assertion, this is the person's first name. Parameter three determines how many elements are expected to be matched by the assertion. The last three parameters are the same as for assertSelectCount(): the DOMDocument, a message and the indicator for XML instead of HTML.

There is a third method in the family of CSS selector based assertions in PHPUnit: assertSelectRegExp() allows you to provide a regular expression for the expected content instead of just a plain string. This is quite useful e.g. for generated lists of elements which have a common text structure, but slightly different actual content.


Now that the code has been explained, let's return to the actual topic: How well suited is this method of testing?

First of all, the potential flaw of copying and pasting code between the visitPerson() method and the test does not exist here. By making use of CSS selector matches, you make sure that you write your test code completely independently of the code you're testing. This is a very good thing, because you need to re-think what you actually expect and re-formulate these thoughts in a completely different way. This fact makes the CSS selector way of testing much more robust than the previously shown test strategies.

However, this way of testing is not flawless either. The number of code lines to write is still quite large: 47 lines is even more code than generating the expected XML structure by hand in the test method. In addition to that, the typical developer dealing with XML might not be familiar with selecting nodes through CSS selectors, although this technique became quite common through e.g. jQuery in the JavaScript world. For someone not familiar with this, your test code is harder to read than any of the previously shown methods.

Finally, the code behind these assertions in the depths of PHPUnit is quite large. It's not a simple equality or type assertion, but involves parsing the CSS selector and performing the corresponding selection of DOM nodes. Involving that much code raises the potential for flaws in the test code itself, so it is hard to guarantee that your test actually asserts what you are looking for.

Tag match assertions

Built on the same background code base, PHPUnit also offers a custom way of matching XML structures through the assertTag() method. In fact, if you use one of the CSS selecting assertions, your selectors will be parsed into the structure used by this assertion. Again, if you already know the assertTag() method, feel free to skip the following subsection and go directly to the discussion part.

The code

    public function testVisitPersonTag()
    {
        $person = $this->getPersonFixture();
        $visitor = new qaPersonVisitor( $this->getDomRootElement() );
        $visitor->visitPerson( $person );

        $this->assertTag(
            array(
                'tag' => 'Person',
                'child' => array(
                    'tag' => 'LastName',
                    'content' => $person->getLastName(),
                ),
            ),
            $this->getDomDocument(),
            'Incorrect LastName tag',
            false
        );

        $this->assertTag(
            array(
                'tag' => 'Person',
                'child' => array(
                    'tag' => 'FirstName',
                    'content' => $person->getFirstName(),
                ),
            ),
            $this->getDomDocument(),
            'Incorrect FirstName tag',
            false
        );

        $this->assertTag(
            array(
                'tag' => 'Person',
                'child' => array(
                    'tag' => 'Gender',
                    'content' => (string) $person->getGender(),
                ),
            ),
            $this->getDomDocument(),
            'Incorrect Gender tag',
            false
        );

        $this->assertTag(
            array(
                'tag' => 'Person',
                'child' => array(
                    'tag' => 'DateOfBirth',
                    'content' => $person->getDateOfBirth()->format( 'Y-m-d' ),
                ),
            ),
            $this->getDomDocument(),
            'Incorrect DateOfBirth tag',
            false
        );
    }

A tag assertion is defined by an array, potentially reflecting structure and contents of XML elements at the same time. In the shown example, only two of the possible definitions are used: Definition of a tag and one of its children and definition of the contents of this child.

The first call to assertTag() in the above example asserts that there exists a tag <Person> which has a child <LastName> with the expected content. In order to define this, the given array contains the key tag, assigned to the name of the tag to look for (Person in this case). The second array key, child, defines an assertion on a child of this tag, again using an array structure: In this second level array, again a specific tag is required (LastName). However, in this case, it is not the structure of this tag that is defined, but its content, indicated by the corresponding key.

There are various array keys which allow you to define different XML constructs. Assertions are possible on attributes, the parent of a tag, the number of children and some more. You can find an overview of the possibilities in the PHPUnit docs on assertTag(), as well as in the API docs of the method.


As with the assertion through CSS selectors shown in the previous section, the assertTag() method allows you to formulate your expectations in a completely different way than the code to be tested is written. This is, as explained, a good thing.

A fundamental drawback of assertTag() is the syntax of the tag matches. I'm not aware of any construct in PHP (or any other language I know) which uses a similar way of defining what an XML structure should look like. Therefore, it might be really hard for other developers to read your test cases.

Furthermore, while this kind of assertion offers quite a lot of possibilities, it still cannot cover all edge cases, and you might be forced to choose an alternative way of testing for these. The amount of code for testing is also quite high: With 58 lines, this is the longest test method presented in this article.

Using XPath

XPath is very well suited for working with XML: it is a W3C recommendation designed specifically for addressing parts of an XML document.

A very decent, but not really documented, way of testing the generation of XML content is using XPath. For this purpose, we implemented a very simple custom assertion method in the customer project mentioned above, which is presented below. After that, I show two ways in which this method can be used in practice in our visitor example. Finally, the pros and cons of using XPath are discussed.

I won't go into details on XPath itself in this article, so if you are not familiar with this standard language for addressing XML nodes, please refer to one or more of the following resources:

Please note that DOMXPath only supports XPath version 1.0 and does not support XPath 2.0.

Custom assertion

The custom assertion we used is quite simple and implemented in a new base test case class, so it can be re-used for different XML generation tests:

    abstract class qaXmlTestCase extends PHPUnit_Framework_TestCase
    {
        abstract protected function getDomDocument();

        protected function assertXpathMatch(
            $expected,
            $xpath,
            $message = null,
            DOMNode $context = null
        ) {
            $dom = $this->getDomDocument();
            $xpathObj = new DOMXPath( $dom );

            // Fall back to the root element if no context node is given
            $context = $context === null
                ? $dom->documentElement
                : $context;

            $res = $xpathObj->evaluate( $xpath, $context );

            $this->assertEquals( $expected, $res, $message );
        }
    }

The abstract getDomDocument() method is required in order to instantiate the DOMXPath object properly. The actual assertion method assertXpathMatch() receives two mandatory parameters: the result expected to be produced by the XPath query and the query itself. Furthermore, it optionally receives the typical PHPUnit message to be displayed if the assertion fails (third parameter). The last parameter can optionally be a DOMNode that serves as the context for the XPath query. If this parameter is left out, the query uses the root element as its context.
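To get a feeling for what DOMXPath::evaluate() hands over to assertEquals(), the following standalone sketch runs a few queries against a tiny stand-in document. The document content is made up for illustration. Note that count() comes back as a float, which is why the loose comparison of assertEquals() is convenient here.

```php
<?php
// Minimal stand-in document; element names follow the article's example
$dom = new DOMDocument( '1.0', 'utf-8' );
$dom->loadXML( '<root><Person><LastName>Doe</LastName></Person></root>' );

$xpath   = new DOMXPath( $dom );
$context = $dom->documentElement;

// count() yields an XPath number, which PHP represents as a float
$count = $xpath->evaluate( 'count(/root/Person)', $context );

// string() yields a PHP string
$lastName = $xpath->evaluate( 'string(/root/Person/LastName)', $context );

// A plain location path yields a DOMNodeList
$nodes = $xpath->evaluate( '/root/Person', $context );

// Relative paths are resolved against the context node (<root> here)
$relative = $xpath->evaluate( 'string(Person/LastName)', $context );

var_dump( $count, $lastName, $nodes instanceof DOMNodeList, $relative );
```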

Extensive XPath

The usage of this custom assertion is quite simple, too. Below you find a quite extensive code example which asserts the correctness of the generated XML structure step by step:

    public function testVisitPersonXpathExtensive()
    {
        $person = $this->getPersonFixture();

        $visitor = new qaPersonVisitor( $this->getDomRootElement() );
        $visitor->visitPerson( $person );

        $this->assertXpathMatch(
            1,
            'count(/root/Person)',
            'Incorrect number of Person elements.'
        );
        $this->assertXpathMatch(
            $person->getLastName(),
            'string(/root/Person/LastName)',
            'Incorrect or missing LastName element.'
        );
        $this->assertXpathMatch(
            $person->getFirstName(),
            'string(/root/Person/FirstName)',
            'Incorrect or missing FirstName element.'
        );
        $this->assertXpathMatch(
            $person->getGender(),
            'string(/root/Person/Gender)',
            'Incorrect or missing Gender element.'
        );
        $this->assertXpathMatch(
            $person->getDateOfBirth()->format( 'Y-m-d' ),
            'string(/root/Person/DateOfBirth)',
            'Incorrect or missing DateOfBirth element.'
        );
    }

The first assertion ensures that exactly one <Person> element was actually generated. The XPath function count() counts the number of nodes in the selected node set. The query inside this function looks from the root of the document for all <root> elements and selects all of their children which are <Person> elements.

The subsequent assertions take care of each child element of the <Person> and assert the correctness of their content. The XPath function string() converts a selected node set into a string by returning the string-value of its first node in document order, which for an element is the concatenation of its descendant text content. So the assertion fails if the element is missing or has the wrong content. Note that it does not by itself guarantee that only a single such element exists; a count() assertion can cover that.

You can make use of any of the XPath 1.0 functions here to make the XPath query return something other than a node set, i.e. a DOMNodeList in PHP. If you don't want to use XPath functions or have an assertion which is not possible with them, you can still write an additional assertion method performing your custom operations on the XPath match.
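As an illustration of such a custom operation on the XPath match, the following sketch collects the text content of every matched node. The function xpathTextContents() and the sample document are hypothetical and not part of the project code. The sketch also shows that string() in XPath 1.0 only reports the first node of a node set.

```php
<?php
/**
 * Hypothetical helper: returns the text content of every node matched by
 * an XPath query, so a test can compare the whole list at once.
 */
function xpathTextContents( DOMDocument $dom, $query )
{
    $xpath    = new DOMXPath( $dom );
    $contents = array();
    foreach ( $xpath->query( $query ) as $node ) {
        $contents[] = $node->textContent;
    }
    return $contents;
}

$dom = new DOMDocument( '1.0', 'utf-8' );
$dom->loadXML(
    '<root><Person>'
    . '<FirstName>Ada</FirstName><FirstName>Grace</FirstName>'
    . '</Person></root>'
);

// Every matched node is visible to the test ...
$all = xpathTextContents( $dom, '/root/Person/FirstName' );

// ... whereas string() only reports the first node of the set
$xpath = new DOMXPath( $dom );
$first = $xpath->evaluate( 'string(/root/Person/FirstName)' );

var_dump( $all, $first );
```

Inside a test case, the returned array could be passed to assertEquals() together with the expected list of values.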

Short XPath

While the last example was quite long, in terms of code lines, a similar assertion could have been achieved with some more complex XPath in fewer lines of code:

    public function testVisitPersonXpathShort()
    {
        $person = $this->getPersonFixture();
        $visitor = new qaPersonVisitor( $this->getDomRootElement() );
        $visitor->visitPerson( $person );
        $this->assertXpathMatch(
            1,
            sprintf(
                'count(/root/Person[' .
                'LastName = "%s" and ' .
                'FirstName = "%s" and ' .
                'Gender = "%s" and ' .
                'DateOfBirth = "%s"])',
                $person->getLastName(),
                $person->getFirstName(),
                $person->getGender(),
                $person->getDateOfBirth()->format( 'Y-m-d' )
            ),
            'Mismatching XPath.'
        );
    }

The example uses a single XPath expression to count only those <Person> elements with correct content, i.e. a <LastName> element with correct text content and a <FirstName> element with correct content and so on. For this purpose, the XPath matches the <Person> element and puts conditions on it, which are called predicates in XPath terminology.

It has to be noted that this single expression is not as accurate as the multiple ones from the previous example were: For example, it could happen here, that two <Person> elements are generated of which only one matches the given conditions.
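The following standalone sketch makes this caveat concrete. The document with two <Person> elements is made up for illustration: the predicate query still counts exactly one match, although a second, unexpected element is present.

```php
<?php
// Stand-in document containing one expected and one unexpected <Person>
$dom = new DOMDocument( '1.0', 'utf-8' );
$dom->loadXML(
    '<root>'
    . '<Person><LastName>Doe</LastName><FirstName>Jane</FirstName></Person>'
    . '<Person><LastName>Oops</LastName><FirstName>Duplicate</FirstName></Person>'
    . '</root>'
);
$xpath = new DOMXPath( $dom );

// Counts only the <Person> elements fulfilling all predicates
$matching = $xpath->evaluate(
    'count(/root/Person[LastName = "Doe" and FirstName = "Jane"])'
);

// Counts all <Person> elements, regardless of their content
$total = $xpath->evaluate( 'count(/root/Person)' );

var_dump( $matching == 1, $total == 2 );
```

Combining the predicate query with a plain count(/root/Person) assertion, as in the extensive example, closes this gap.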


So, let's review all the pros and cons I talked about for the previously presented testing methods:

With XPath you need to re-formulate the expectations for your XML structure and contents explicitly and you cannot copy & paste between test code and actual code. Furthermore, you can hardly auto-generate an XPath query from the generated XML, so you are actually forced to think for yourself and write your test code on your own.

The amount of code to write for your test cases is rather flexible, depending on your custom needs for the test case. The two examples above have 33 and 21 lines of code, which is neither particularly short nor particularly long.

As with the CSS selector and tag match examples, XPath queries are quite flexible in what you want to expect from the generated XML. In fact, XPath is even more powerful here. In addition to that, XPath has the clear benefit of being a standardized language which many developers who deal with XML know well.


In this article, I've shown different ways of testing XML generation on the basis of PHPUnit. In my eyes, the most effective, easy and sensible way is the XPath solution shown in the last section. What you actually use in your projects depends much on the amount of XML generation to test and the circumstances, but the XPath way should always be a reasonable choice.

So, how do you test XML generation? Do you use techniques similar to the ones presented here? Did you learn something useful from this blog entry? Please leave a comment!