
Liking regular expressions, loving regular expressions

During my vacation I finally found time for some reading and worked through the book "Mastering Regular Expressions" by Jeffrey E. F. Friedl. I first came across the book while browsing on Amazon, and I got a closer look at it when I visited Arnaud in Paris. That visit finally convinced me to buy it, and I can say: it was really worth it.

Regular expressions ("regex" for short) are a tool (or maybe a kind of language) for describing and matching patterns in text, and they are available in nearly every programming language today. I guess everyone in the web application business has used regex from time to time, and most developers working on *nix systems will have done so while coding.

The common problem with regex is the documentation. Most languages provide only a rudimentary overview of their regex facilities, and finding good documentation on what an implementation can really do is hard. Beyond that, documentation on how to use regular expressions well is rare.

"Mastering Regular Expressions" fills this gap admirably, and I strongly recommend the book to every developer out there. Besides a deep look at every available feature of regular expressions, Jeffrey provides real-life examples of common problems to solve with regex, gives a great overview of which implementation provides which features (the so-called regex flavors) and shows how to emulate features that are not available in an implementation. Furthermore, he gives an introduction to how a regex engine works internally (for NFA and DFA driven engines, as well as mixed implementations and POSIX NFA), which enables the reader to optimize expressions with respect to both performance and accurate matching.

"Mastering Regular Expressions" consists of nine large chapters that provide the information described above and much, much more. You can find an overview of the contents below.

All in all, I did not find a single useless sentence in the book, and I will definitely keep it at hand as a reference whenever I work with regex (which will be much more often in the future). Many thanks to Jeffrey E. F. Friedl for an absolutely great book!

  • Introduction to Regular Expressions

This chapter gives a general overview of regex and introduces the inexperienced developer to the topic using egrep. Although I have used regex often in my tools and daily work, I decided to read this chapter, and it was not a mistake to do so.
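Chapter 1 works with egrep, which prints every input line containing a match for a pattern. Here is a rough Python sketch of that behaviour, assuming a made-up helper named `egrep_like` and invented sample lines (this is only an analogy, not how egrep itself is implemented):

```python
import re

def egrep_like(pattern, lines, ignore_case=False):
    """Hypothetical helper: return the lines containing a match, in the spirit of `egrep [-i] pattern`."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    # re.search looks for a match anywhere in the line, as egrep does.
    return [line for line in lines if regex.search(line)]

# Invented sample input for illustration.
mail_headers = [
    "From: alice@example.com",
    "To: bob@example.com",
    "subject: lunch?",
    "Date: Fri, 20 Aug 2004",
]

print(egrep_like(r"^(From|Subject): ", mail_headers, ignore_case=True))
# ['From: alice@example.com', 'subject: lunch?']
```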

  • Extended Introductory Examples

The second chapter switches from egrep to Perl as the tool of choice. After a short introduction to Perl and its regex features, the reader becomes familiar with more advanced regex concepts such as lookaround.
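Lookaround matches a position rather than characters. As a minimal Python sketch (the helper `add_commas` is hypothetical, and it assumes the input is a plain string of digits), lookbehind and lookahead together can find every spot where a thousands separator belongs:

```python
import re

def add_commas(digits):
    """Hypothetical helper: insert thousands separators into a string of digits."""
    # (?<=\d)          : there must be a digit to the left of this position
    # (?=(?:\d{3})+$)  : and a multiple of three digits to its right, up to the end
    # The match is zero-width, so re.sub inserts "," without consuming any digits.
    return re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", digits)

print(add_commas("1234567"))  # 1,234,567
print(add_commas("123"))      # 123
```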

  • Overview of Regular Expressions Features and Flavors

Here the reader gets a complete overview of the available features and their support in the different flavors. Besides a deeper look at all previously introduced concepts, the reader is introduced to even more advanced features such as atomic grouping and possessive quantifiers.
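Python's `re` module only gained native atomic groups and possessive quantifiers in version 3.11. The sketch below instead uses a well-known emulation that also works in older versions: a lookahead captures text, and a backreference then consumes exactly that text. It relies on the fact that Python's `re`, like Perl, does not backtrack into a lookahead once it has succeeded; the pattern names are made up for illustration:

```python
import re

# Normal greedy quantifier: \w+ may give characters back so the final "c" can match.
greedy = re.compile(r"\w+c")

# Emulated atomic group: the lookahead grabs as many word characters as possible,
# and \1 consumes exactly that text. Since the lookahead is not re-entered on
# backtracking, the \w+ part can no longer give characters back.
atomic_emulated = re.compile(r"(?=(\w+))\1c")

print(greedy.search("abc").group())   # abc
print(atomic_emulated.search("abc"))  # None

# The emulation still matches when nothing needs to be given back.
print(re.search(r"(?=(\w+))\1:", "key: value").group())  # key:

# On Python 3.11+ the native forms are (?>\w+)c and the possessive \w++c.
```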

  • The Mechanics of Expression Processing

Now it really gets down to the fundamentals. Jeff explains the concepts behind the major engine implementations (NFA and DFA), shows how an expression is actually processed and gives information on topics like backtracking, greediness/laziness/possessiveness of quantifiers and even more interesting stuff.
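The difference between greedy and lazy quantifiers is easy to see on a small, made-up HTML snippet. A short Python sketch:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ first runs to the end of the string, then backtracks until a ">"
# can follow, so the match stretches to the LAST ">" in the string.
print(re.search(r"<.+>", html).group())   # <b>bold</b> and <i>italic</i>

# Lazy: .+? takes as little as possible and grows one character at a time,
# so the match stops at the FIRST ">" it reaches.
print(re.search(r"<.+?>", html).group())  # <b>

# A negated character class states exactly what may appear between the brackets,
# so a successful match reaches the closing ">" without backtracking.
print(re.findall(r"<[^>]+>", html))       # ['<b>', '</b>', '<i>', '</i>']
```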

  • Practical Regex Techniques

In this chapter the reader learns best practices for creating regular expressions. Building on the understanding of how engines work, Jeff shows a large number of practical examples.

  • Crafting an Efficient Expression

Here the focus is on the efficiency and performance of expressions. Using the knowledge from the two preceding chapters, Jeff shows how to optimize regex for specific engines, how to squeeze the last bit of performance out of an expression and which common mistakes slow an expression down.
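A well-known optimization in this spirit is "unrolling the loop", usually written as opening normal* (special normal*)* closing. The Python sketch below applies it to double-quoted strings with backslash escapes. Both patterns are illustrations written for this example and are meant to accept the same text; the unrolled one typically does less work on an NFA engine because runs of ordinary characters are consumed by one character class instead of one alternation per character:

```python
import re

# Straightforward version: each character inside the quotes is either an
# escaped pair (\\.) or any character that is neither a quote nor a backslash.
naive = re.compile(r'"(?:\\.|[^"\\])*"')

# Unrolled version: normal = [^"\\], special = \\.
# Pattern shape: " normal* (special normal*)* "
unrolled = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

text = r'say "hello \"world\"" and "bye"'

print(naive.findall(text))     # ['"hello \\"world\\""', '"bye"']
print(unrolled.findall(text))  # ['"hello \\"world\\""', '"bye"']
```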

  • Perl

  • Java

  • .NET

These three chapters are dedicated to the three named implementations of regular expressions. I did not dig into them that much, since I do not actively use any of these technologies right now. But a first glance promises detailed information on how to use and optimize expressions in each of these languages.

Comments

Your post reminded me how enlightening "Mastering Regular Expressions" was. I read the first edition 4 or 5 years ago.

Now that I know there's a second edition out, I'm going to have to read it again!

Aaron Wormus at 2004-08-20