A week ago Sebastian pointed me to an article on LinuxJournal about documentation coverage. His question, "Isn't that exactly what tobyS' tool does?", reminded me that I wanted to blog about the little tool I wrote for eZ Components a while ago. Since this draft then sat in my blog for another week, you get my write-up even later.
The actual idea was inspired by a blog post by Lukas Smith, which put the term "documentation coverage" into my mind. We (as in "the eZ Components team") are very keen on documentation, which is reflected in extensive API docs, additional tutorials and lots of example code. While the latter two can still be checked by hand, API docs are not that easy to validate because of the huge number of classes and class members in eZ Components. Typos, missing doc tags and violations of our documentation standards creep in easily during development and are hard to spot. Checking every doc block by hand is a lifetime's work, and even if you try, you will miss many small issues.
Therefore I wrote a little tool to assist us with checking the consistency of our API documentation. The tool uses PHP's reflection API to retrieve the OO elements of a component and a simple (regex-based) parser to extract the doc block elements assigned to them. The parsing alone already hints at broken documentation tags, as far as that is possible with a regex-based parser. A simple visitor interface can be used to perform checks on the tree of API elements and their documentation.
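To make the idea a bit more concrete, here is a minimal sketch of regex-based tag extraction plus a visitor walking a tree of API elements. It is written in Python purely for illustration; the actual tool is PHP and gets its input from the reflection API. All names here (`parse_doc_block`, `Element`, `UndocumentedVisitor`, the `Mailer` sample) are made up for this example and are not taken from docanalysis.php.

```python
import re

# Matches one tag line inside a doc block: " * @name optional text".
TAG_PATTERN = re.compile(r"^[ \t]*\*[ \t]*@(\w+)[ \t]*(.*)$", re.MULTILINE)


def parse_doc_block(doc_block):
    """Return the (tag, text) pairs found in a doc block string."""
    return [(tag, text.strip()) for tag, text in TAG_PATTERN.findall(doc_block)]


class Element:
    """Node in the tree of API elements (class, method, property, ...)."""

    def __init__(self, kind, name, doc_block="", children=()):
        self.kind = kind
        self.name = name
        self.doc_block = doc_block
        self.tags = parse_doc_block(doc_block)
        self.children = list(children)

    def accept(self, visitor):
        # Visit this node first, then walk the children depth-first.
        visitor.visit(self)
        for child in self.children:
            child.accept(visitor)


class UndocumentedVisitor:
    """Collects elements that have no doc block at all."""

    def __init__(self):
        self.found = []

    def visit(self, element):
        if not element.doc_block.strip():
            self.found.append(f"{element.kind} {element.name}")


# Hypothetical input; the real tool would obtain this via PHP's reflection API.
tree = Element(
    "class",
    "Mailer",
    "/**\n * Sends mail.\n *\n * @package Mail\n * @version 1.0\n */",
    children=[
        Element("method", "send", "/**\n * @param string $to\n * @return bool\n */"),
        Element("method", "reset"),
    ],
)

visitor = UndocumentedVisitor()
tree.accept(visitor)
print(tree.tags)      # tags parsed from the class doc block
print(visitor.found)  # elements without any doc block
```

Each check lives in its own visitor, so adding a new rule means writing another small class with a `visit` method rather than touching the parser or the tree.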
(Almost) 100% of the eZ Components API elements are documented using phpDocumentor syntax, so our major concern is not doc coverage itself but the syntactic correctness of the phpDocumentor annotations and (as far as possible) their semantic correctness. While the former can be checked quite easily with a "real" parser (and even with the current one), the latter is quite tricky, as every semantic check is.
The current implementation mainly checks that certain tags are present for certain elements (like a @package for each class) and that annotations like @copyright carry correct values. Besides that, it checks whether all parameters of a method are documented, whether the documented types match any type hints that are present, and whether the parameters are documented in the correct order. Although this does not sound like much, we were amazed how many small and bigger issues with the documentation were already found this way.
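The checks described above can be sketched roughly as follows. Again this is an illustrative Python sketch, not the code of the real tool: `check_required_tags`, `check_params` and the sample `send()` signature are hypothetical, and the reflected parameters are passed in as plain `(name, type_hint)` pairs where the PHP tool would read them from reflection.

```python
import re

# Tag names at the start of a doc block line: " * @name ...".
TAG_NAME_PATTERN = re.compile(r"^[ \t]*\*[ \t]*@(\w+)", re.MULTILINE)
# "@param <type> $<name>"; the type may be a union such as "string|null".
PARAM_PATTERN = re.compile(r"@param[ \t]+(\S+)[ \t]+\$(\w+)")


def check_required_tags(doc_block, required):
    """Report every required tag that does not occur in the doc block."""
    present = set(TAG_NAME_PATTERN.findall(doc_block))
    return [f"missing @{tag}" for tag in required if tag not in present]


def check_params(doc_block, reflected_params):
    """Compare @param tags with the reflected parameters of a method.

    reflected_params is a list of (name, type_hint) pairs in declaration
    order; type_hint is None when the parameter has no type hint.
    """
    documented = PARAM_PATTERN.findall(doc_block)  # [(type, name), ...]
    doc_types = {name: doc_type for doc_type, name in documented}
    doc_names = list(dict.fromkeys(name for _, name in documented))
    real_names = [name for name, _ in reflected_params]
    issues = []

    for name, hint in reflected_params:
        if name not in doc_types:
            issues.append(f"${name}: not documented")
        elif hint is not None and hint not in doc_types[name].split("|"):
            issues.append(
                f"${name}: documented as {doc_types[name]}, type hint is {hint}"
            )

    for name in doc_names:
        if name not in real_names:
            issues.append(f"${name}: documented but not declared")

    # Order check, restricted to parameters that exist on both sides.
    if [n for n in doc_names if n in real_names] != [
        n for n in real_names if n in doc_names
    ]:
        issues.append("@param tags are not in declaration order")
    return issues


# Hypothetical method: function send( Message $message, $to, $retries = 3 )
DOC = """/**
 * Sends a message.
 *
 * @param string $to
 * @param Envelope $message
 * @return bool
 */"""
PARAMS = [("message", "Message"), ("to", None), ("retries", None)]

for issue in check_required_tags(DOC, ["param", "return", "throws"]):
    print(issue)
for issue in check_params(DOC, PARAMS):
    print(issue)
```

For this sample the sketch reports a missing `@throws`, a type mismatch for `$message`, an undocumented `$retries` and a wrong `@param` order. The type comparison is a plain string match, so it would not recognize subclasses or aliases; that is the kind of limitation a semantic check quickly runs into.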
Since this proof-of-concept implementation works quite well, I started implementing a real parser for the docs, to get a better tree structure and perform more valuable checks. But this is at a very early stage and not publicly available yet. Anyway, although the current main script is very eZ Components specific, the whole thing might be valuable to others, too, which is the main reason for this blog post.
You can check the script out of our SVN, where it is called docanalysis.php. Adjusting this stuff to your own project should be easy. Hope this is valuable for someone.
Chuck Burgess
You might get some assistance from one of the new command-line arguments in v1.4.0, "--undocumentedelements on" (http://manual.phpdoc.org/HTMLSmartyConverter/HandS/phpDocumentor/tutorial_phpDocumentor.howto.pkg.html#using.command-line.undocumentedelements).
This outputs warnings (that are compiled along with other warning/error output in the errors.html that is produced) listing what documentable elements are without docblocks.
gggeek
Nice tool, but there are often cases when documenting that will trip up such tools.
First and foremost, if the coder just puts a std 'this is a param' comment for every param of every function, the checker is defeated. In fact, you could just as well go by reflection only and analyze all the code to provide a useless but correct doc.
Second, there are some cases where the actual parameters are slightly different from what is specified in code. This is very true in typeless land such as PHP.
Crooked example: api change of a function, first version takes string,string,string, improved version takes object. A little php magic and the very same function can be backward compatible and support both syntaxes (we love our users, don't we). The php code still reads 3 parameters, with 2nd and 3rd defaulting to NULL, javadoc says 1 parameter, of class thisobject...