Taint mode for PHP?

2008-11-19

Tobias Schlitt

Wietse Venema, the creator of the Postfix MTA, posted a proposal for a "taint mode" to the PHP internals list. Before commenting on his proposal, I'd like to give a short intro to what a "taint mode" is:

Consider the two main types of data you use in an application: the most significant division you can make is between "incoming" and "outgoing" data (there is possibly also "internal" data, which just stays inside your application, but that is not of interest here). "Incoming" data is everything that is received by, requested by, or injected into your application. For example, the $_GET/$_POST/$_COOKIE/... arrays in your PHP application contain "incoming" data, but so does everything you receive from a database, a file, a shell script, or anywhere else. "Outgoing" data, in contrast, is everything you provide to external resources, such as echoing a string, sending a query to a database, passing arguments to a shell command, or writing to a file.

As you should know, most (if not all) of your "incoming" data is potentially dangerous and insecure. This might apply more to the superglobal arrays and less to files and database results. But if you think a bit deeper and consider that your database might be compromised, or that somebody might have manipulated a file maliciously, this kind of "incoming" data carries a potential security risk, too. So every kind of "incoming" data has to be considered potentially bad (I think this is the most basic mantra of web application development). In turn, "outgoing" data (most commonly when it depends on incoming data) is potentially dangerous for your users and/or for your application directly (XSS, SQL injection, ...).

This is the point where "taint mode" comes into play: every single bit of "incoming" data is insecure; it is "tainted". In taint mode, your interpreter flags all incoming variables as "tainted". If you then perform a potentially insecure operation with the tainted data, you will be notified. For example, if you just take a POST variable and use it in an SQL query, you are using tainted incoming data and opening up a wide security hole. In "taint mode", the PHP interpreter would stop and inform you about this issue. In order to fix it, you have to use a specific mechanism to "clean" your data before using it. In our example, this would mean escaping the data properly before using it in SQL, or using variable binding. The same applies the other way around: if you retrieve data from a database and just echo it to the user, it might contain insecure HTML and script code. This data is tainted, too, so you need to escape the HTML characters properly (htmlspecialchars()) before sending it to the browser.
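To make the mechanism concrete, here is a minimal sketch of the idea in Python rather than PHP, so that it runs without any interpreter support for tainting. It is only a model under stated assumptions, not the proposed implementation: the names Tainted, TaintError, read_post(), escape_sql() and run_query() are all made up for this illustration, and the quote-doubling "escape" merely stands in for proper escaping or variable binding.

```python
class Tainted(str):
    """A string that came from outside the application and has not been cleaned."""

    def __add__(self, other):
        # Concatenation keeps the taint flag: tainted + anything is tainted.
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # Same for anything + tainted.
        return Tainted(str(other) + str(self))


class TaintError(Exception):
    """Raised when tainted data reaches a sensitive operation."""


def read_post(form, key):
    # Source (hypothetical): everything read from the request is flagged as tainted.
    return Tainted(form[key])


def escape_sql(value):
    # Cleaning step for this sketch only: double single quotes and return a
    # plain str, which drops the taint flag. Real code should prefer binding.
    return str(value).replace("'", "''")


def run_query(sql):
    # Sink (hypothetical): refuse to work with data that is still tainted.
    if isinstance(sql, Tainted):
        raise TaintError("tainted data used in SQL query")
    return "executed: " + sql


name = read_post({"name": "x' OR '1'='1"}, "name")

try:
    run_query("SELECT * FROM users WHERE name = '" + name + "'")
except TaintError as err:
    print("blocked:", err)

# After cleaning, the same query is accepted.
print(run_query("SELECT * FROM users WHERE name = '" + escape_sql(name) + "'"))
```

Note that the flag only says whether the data went through some cleaning step. It does not prove that the step was the right one for the operation.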

So, let us come back to Wietse's proposal for a "taint mode" for PHP. While this topic has been raised multiple times before on the internals list, I have never seen such a well-thought-out and detailed proposal so far. Remember that I'm neither a C, nor a Zend Engine, nor a security expert. But what I read there impressed me quite a lot. I don't want to repeat the whole proposal here, but I can give a short roundup: Wietse wants to have "taint mode" turned off by default, which makes sense in order to keep backwards compatibility. Turning it on is mainly meant for development and educational purposes. When "taint mode" is switched on, every bit of incoming data is marked as tainted internally by PHP itself. In a first step, every function/primitive (referred to as "function" from here on) in PHP will be marked as protected by default, which means that it will not accept tainted data and will always return tainted data. The second step will be to identify two further groups of functions: permeable and sanitizing functions. While permeable functions will only return tainted data if they received tainted data (like substr()), sanitizing functions are used to untaint data (like htmlspecialchars()).
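The three groups of functions can be modelled roughly like this. Again, this is a Python sketch of my reading of the roundup above, not code from the proposal: the decorators protected, permeable and sanitizing, the Tainted marker type and some_unreviewed_function() are hypothetical, and substr() and htmlspecialchars() are only simple stand-ins named after the PHP functions.

```python
import html


class Tainted(str):
    """Marker type for data that has not been cleaned yet."""


class TaintError(Exception):
    """Raised when a protected function receives tainted data."""


def _any_tainted(args):
    return any(isinstance(arg, Tainted) for arg in args)


def protected(func):
    # Default group: rejects tainted arguments and always returns tainted data.
    def wrapper(*args):
        if _any_tainted(args):
            raise TaintError(func.__name__ + "() received tainted data")
        return Tainted(func(*args))
    return wrapper


def permeable(func):
    # The result is tainted only if at least one argument was tainted.
    def wrapper(*args):
        result = func(*args)
        return Tainted(result) if _any_tainted(args) else result
    return wrapper


def sanitizing(func):
    # Accepts tainted data and returns untainted data (a plain str).
    def wrapper(*args):
        return str(func(*args))
    return wrapper


@permeable
def substr(text, start, length):
    return text[start:start + length]


@sanitizing
def htmlspecialchars(text):
    return html.escape(text)


@protected
def some_unreviewed_function(text):
    # Hypothetical function that nobody has classified yet.
    return text.upper()


user_input = Tainted("<b>hello</b> world")
part = substr(user_input, 0, 12)      # tainted in, tainted out
clean = htmlspecialchars(part)        # untainted afterwards
print(isinstance(part, Tainted), isinstance(clean, Tainted))
```

The point of the protected default is visible in some_unreviewed_function(): until somebody has looked at a function and moved it into one of the other two groups, it errs on the side of caution in both directions.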

Introducing "taint mode" smoothly in this way has two big advantages:

Because it is off by default, no application will break when upgrading.
Because every function is protected by default, there is no need to touch every single PHP function up front.

If you want to know more about the proposal in general, I'd suggest reading it directly in the internals archives (and possibly the huge thread it spawned, too). What follows now is my personal opinion:

As already stated, I think Wietse's proposal is really good and well thought out. He read a lot of literature beforehand and described the overall idea on a really solid foundation. Besides that, he already seems to have a working proof of concept, which is great! I really think that having an optional "taint mode" in PHP would be a huge benefit for all of us. There are two main reasons that make me think so:

a) PHP is easy to learn and the perfect tool for rapidly developing web applications. But exactly this is the danger: every inexperienced guy can just start off writing a web app and will most probably make his first security mistake within his first 10 minutes. Surely, this can be blamed on the inexperienced developer, who probably did not read a single bit of literature on web security beforehand. But anyway, with "taint mode", this guy gets a handy tool that tells him exactly where he might have done something seriously wrong. For sure, this is not the solution to all of our problems (like XML is, e.g. ;), but it still helps to identify a huge number of them.

b) Even if you are a highly professional PHP expert with many years of web development experience, even if you are a highly experienced hacker who knows every single bit about web and code security: everybody makes mistakes. Having a "taint mode" gives you a great way to simply check your application for a large number of mistakes you might have missed somewhere.

Surely, the basic implementation of "taint mode" for PHP would still have some drawbacks. For example, Wietse does not plan to distinguish between levels of taintedness for now. This means that you could clean a variable by running htmlspecialchars() on it, and the PHP interpreter would consider it clean, although this would not save you from SQL injection. The main reason for this is the overhead that is added to every single zval (the main PHP internal data structure) and to the function calls, which need to check for taintedness every time (remember, the latter should not affect your production environment much, since these checks only need to be performed when "taint mode" is switched on). Adding more information than just "tainted" or "clean" (a boolean flag, which could possibly cause just 1 bit of overhead) to the zval would cause a much higher memory overhead. But anyway, just knowing which variable is still tainted when it is passed to a potentially dangerous function is a great help! And as a first step, it would suffice to give the user some information on how to clean a variable correctly for the specific purpose (like htmlspecialchars() for echo and bindParam() for a PDO query). And if designed well (which I think will be the case, if it happens), "taint mode" should be extensible enough to add levels of taintedness later on.
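The difference between a single flag and levels of taintedness can be sketched as follows. This is once more a Python model with made-up names (Tracked, html_escape(), boolean_check(), level_check()); it says nothing about how a zval would actually store this information. The stand-in HTML escaper is called with quote=False so that quote characters are left untouched, which keeps the single quote that matters for SQL in the string.

```python
import html


class Tracked(str):
    """A string plus the set of output contexts it has been cleaned for."""

    def __new__(cls, value, safe_for=()):
        obj = super().__new__(cls, value)
        obj.safe_for = frozenset(safe_for)
        return obj


def html_escape(value):
    # Stand-in for htmlspecialchars(): cleans the value for HTML output only.
    return Tracked(html.escape(str(value), quote=False), {"html"})


def boolean_check(value):
    # Single-flag model: any cleaning step clears the one and only taint bit.
    return len(value.safe_for) > 0


def level_check(value, context):
    # Level model: the value must have been cleaned for this specific context.
    return context in value.safe_for


raw = Tracked("x' OR '1'='1 <script>")
escaped = html_escape(raw)

# The single flag lets the HTML-escaped value through to an SQL sink,
# although the single quotes are still in there.
print("flag says clean:", boolean_check(escaped))
print("clean for html:", level_check(escaped, "html"))
print("clean for sql:", level_check(escaped, "sql"))
```

The price of the second model is what the paragraph above describes: every value has to carry a set of contexts instead of one bit.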

Overall, I think this whole thing would be a great addition to PHP and I hope this could come for 6.0. What do you think?

More information about taint mode in other languages (like Perl and Ruby) can be found here: 1 2

Comments

"levels" of taintedness are silly, and just asking for trouble. Tainting is just raising a flag that shows that a variable has been used without ANY checks, once a check is done on the variable it is not tainted.
Taint mode doesn't protect anyone and shouldn't be considered a security feature of any kind. However it is nice to be able to turn on and get an idea of how much checking is being done on the variables.
Having already lived through one taint mode implementation (in perl5) I can attest that it is a major PITA to begin with, but once you learn to use it, it becomes a great tool and the overall code quality improves.
+1 on a perl-like tainting -1 taint levels

Aaron Wormus at 2006-12-16

What about the Filter extension? Did it drop off the face of the Earth?
Ongoing improvements to that extension as well as a mass education effort would require 1/100 the effort of implementing Taint Mode and get just as good results security-wise.
You're gonna have to teach them about Taint mode anyway, so why not do it with something you already have in place now?

Chris D at 2006-12-17

Like Aaron said: +1 perl-like tainting -999 taint levels

michael at 2006-12-17

The filter extension is not as useful as this proposal, because filtered strings are indistinguishable from unfiltered strings. The tainting proposal would let you know when you leak tainted strings; the filter extension alone has no way of tracking strings from validation to output.

Joshua at 2010-10-16