schlitt.info - php, photography and private stuff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Author: Tobias Schlitt
:Date: Wed, 19 Nov 2008 23:29:46 +0100
:Revision: 1
:Copyright: CC by-nc-sa
===================
Taint mode for PHP?
===================
Wietse Venema, the creator of the `Postfix MTA`__, posted a `proposal for a
"taint mode"`__ to the `PHP internals list`__. Before commenting on his
proposal, I'd like to give a short intro to what a "taint mode" is:
.. __: http://www.postfix.org/
.. __: http://news.php.net/php.internals/26979
.. __: http://news.php.net/php.internals
Consider the 2 main kinds of data you use in an application: The most
significant division you can make is between "incoming" and "outgoing" data
(there is also "internal" data, which just stays inside your application, but
this is not of interest here). "Incoming" data is everything that is received
by, requested by or injected into your application. The
$_GET/$_POST/$_COOKIE/... arrays in your PHP application contain "incoming"
data, but so does everything you receive from a database, a file, a shell
script or anywhere else. "Outgoing" data, in contrast, is everything you hand
to external resources: echo'ing a string, sending a query to a database,
passing arguments to a shell command or writing to a file.
As you should know, most (in fact all) of your "incoming" data is potentially
dangerous. This applies more obviously to the superglobal arrays than to files
and database results. But if you think a bit further and consider that your
database might be compromised or that somebody might have manipulated a file
maliciously, this kind of "incoming" data is a potential security risk, too.
So every kind of "incoming" data has to be considered potentially bad (I think
this is the most basic mantra of web application development). "Outgoing"
data, in turn, is potentially dangerous for your users and/or your application
itself, most commonly when it depends on incoming data (XSS, SQL injection,
...).
This is where "taint mode" comes into play: Every single bit of "incoming"
data is insecure, it is "tainted". In taint mode, your interpreter flags all
incoming variables as "tainted". If you then perform a potentially insecure
operation with tainted data, you are notified. For example, if you just take a
POST variable and use it in an SQL query, you are using tainted incoming data
and open up a wide security hole. In "taint mode", the PHP interpreter would
stop and inform you about this issue. To fix it, you have to use a specific
mechanism to "clean" your data before using it. In our example, this would
mean escaping the data properly before using it in SQL, or using variable
binding. The same applies in the other direction: If you retrieve data from a
database and just echo it to the user, it might contain insecure HTML and
script code. This data is tainted, too, so you need to escape the HTML
characters properly (htmlspecialchars()) before sending it to the browser.
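
To make the two fixes a bit more tangible, here is a minimal sketch in plain
PHP, without any taint mode involved. It assumes that the PDO SQLite driver is
available; the ``users`` table, the ``$incoming`` value and all variable names
are made up for illustration only.

```php
<?php
// Hypothetical incoming value; in a real application this would come from $_POST.
$incoming = "Robert'); DROP TABLE users; --<script>alert(1)</script>";

// In-memory SQLite database, used only to keep the sketch self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (name TEXT)');

// Outgoing to the database: bind the value instead of concatenating it into
// the SQL string.
$insert = $db->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->bindParam(':name', $incoming);
$insert->execute();

// Data read back from the database is incoming data again.
$stored = $db->query('SELECT name FROM users')->fetchColumn();

// Outgoing to the browser: escape the HTML special characters first.
$safeHtml = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $safeHtml, "\n";
```

The value ends up in the database unchanged, and only its escaped form reaches
the browser. A taint mode would not do this work for you, it would only
complain when one of the two steps is missing.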
So, let us come back to `Wietse's proposal for a "taint mode"`__ for PHP.
While this topic has been raised multiple times on the internals list, I have
never seen such a well thought-out and detailed proposal before. Remember that
I'm neither a C, nor a Zend Engine, nor a security expert. But what I read
there impressed me a lot. I don't want to repeat the whole proposal here, but
I can give a short roundup: Wietse wants "taint mode" to be turned off by
default, which makes sense for backwards compatibility. Turning it on is
mainly meant for development and educational purposes. When "taint mode" is
switched on, every bit of incoming data is marked as tainted internally by PHP
itself. In a first step, every function/primitive (referred to as "function"
from here on) in PHP is marked as protected by default, which means that it
does not accept tainted data and always returns tainted data. The second step
is to identify 2 further groups of functions: Permeable and sanitizing
functions. Permeable functions return tainted data only if they received
tainted data (like substr()), while sanitizing functions are used to untaint
data (like htmlspecialchars()).
.. __: http://news.php.net/php.internals/26979
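
The three groups of functions can be imitated roughly in userland. This is
only a sketch to illustrate the idea: the proposal tracks taint inside the
engine, not with a wrapper class, and every name used here (``TaintedString``,
``permeableSubstr()``, ``sanitizingHtml()``, ``protectedCall()``) is
hypothetical.

```php
<?php
// Userland simulation only; all names are made up for this sketch.
final class TaintedString
{
    public string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }
}

// Permeable: the result is tainted only if the input was tainted (like substr()).
function permeableSubstr($s, int $offset, int $length)
{
    if ($s instanceof TaintedString) {
        return new TaintedString(substr($s->value, $offset, $length));
    }
    return substr($s, $offset, $length);
}

// Sanitizing: accepts tainted input and returns clean output
// (like htmlspecialchars()).
function sanitizingHtml($s): string
{
    $raw = $s instanceof TaintedString ? $s->value : $s;
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// Protected (the default): refuses tainted input and always returns tainted data.
function protectedCall(callable $fn, ...$args): TaintedString
{
    foreach ($args as $arg) {
        if ($arg instanceof TaintedString) {
            throw new RuntimeException('tainted data passed to a protected function');
        }
    }
    return new TaintedString((string) $fn(...$args));
}

$input = new TaintedString('<b>hello</b> world'); // stands in for a $_GET value
$part  = permeableSubstr($input, 0, 12);          // still tainted
$clean = sanitizingHtml($part);                   // plain, untainted string

try {
    protectedCall('strtoupper', $part);           // tainted argument is rejected
} catch (RuntimeException $e) {
    echo 'rejected: ', $e->getMessage(), "\n";
}
echo $clean, "\n";
```

Treating everything as protected first is what keeps the initial effort small:
only the functions that are reclassified as permeable or sanitizing need
individual attention.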
Introducing "taint mode" smoothly in this way has 2 big advantages:
1. Because it is off by default, no application will break when upgrading.
2. Because every function is protected by default, there is no need to touch
every single PHP function right from the start.
If you want to know more about the proposal in general, I'd suggest reading it
`directly in the internals archives`__ (and possibly the huge thread it
spawned, too). What follows now is my personal opinion:
.. __: http://news.php.net/php.internals/26979
As already stated, I think Wietse's proposal is really good and well thought
out. He read a lot of literature beforehand and laid out the overall idea on a
solid foundation. Besides that, he seems to already have a working
proof-of-concept, which is great! I really think an optional "taint mode" in
PHP would be a huge benefit for all of us. There are 2 main reasons that make
me think so:
a) PHP is easy to learn and the perfect tool for rapidly developing web
applications. But exactly this is the danger: Every inexperienced guy can just
start off writing a web app and will most probably make his first security
mistake within his first 10 minutes. Surely, this can be blamed on the
inexperienced developer, who probably did not read a single bit of literature
on web security beforehand. But with "taint mode", this guy gets a handy tool
that tells him exactly where he might have done something seriously wrong. For
sure, this is not the solution to all of our problems (like XML is, e.g. ;),
but it still helps to identify a huge number of them.
b) Even if you are a highly professional PHP expert with many years of web
development experience, even if you are a highly experienced hacker who knows
every single bit about web and code security: Everybody makes mistakes. A
"taint mode" gives you a simple way to check your application for a large
number of mistakes you might have missed somewhere.
Surely, the basic implementation of "taint mode" for PHP would still have some
drawbacks. For example, Wietse does not plan to distinguish levels of
taintedness for now. This means that you could clean a variable by running
htmlspecialchars() on it, and the PHP interpreter would consider it clean,
although this does not save you from SQL injection. The main reason is the
overhead that is added to every single zval (the main PHP internal data
structure) and to the function calls, which need to check for taintedness
every time (remember, the latter should not affect your production environment
much, since these checks only need to be performed when "taint mode" is
switched on). Adding more information to the zval than just "tainted" or
"clean" (a boolean flag, which could possibly cost just 1 bit) would cause a
much higher memory overhead. But anyway, just knowing which variable is still
tainted when it is passed to a potentially dangerous function is a great help!
And as a first step, it would suffice to give the user some info on how to
clean a variable correctly for the specific purpose (like htmlspecialchars()
for echo and bindParam() for a PDO query). And if designed well (which I think
will be the case, if it happens), the "taint mode" should be extensible enough
to add levels of taintedness later on.
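
The drawback of a single flag can be shown with a small sketch. It assumes
the PDO SQLite driver again, and the ``users`` table and its rows are made up.
I pass ENT_COMPAT explicitly, which was the default of htmlspecialchars()
before PHP 8.1 and leaves single quotes untouched.

```php
<?php
// Hypothetical incoming value, standing in for something from $_GET.
$incoming = "x' OR '1'='1";

// In-memory SQLite database with a made-up table, just for this sketch.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (name TEXT)");
$db->exec("INSERT INTO users (name) VALUES ('alice'), ('bob')");

// "Cleaned" for HTML: ENT_COMPAT escapes double quotes but not single quotes.
$htmlClean = htmlspecialchars($incoming, ENT_COMPAT, 'UTF-8');

// With only one flag, $htmlClean would count as clean, but the quote still
// breaks out of the SQL string literal and the condition matches every row.
$unsafeSql  = "SELECT COUNT(*) FROM users WHERE name = '" . $htmlClean . "'";
$unsafeRows = (int) $db->query($unsafeSql)->fetchColumn();

// Cleaning for the SQL context: bind the value instead.
$stmt = $db->prepare('SELECT COUNT(*) FROM users WHERE name = :name');
$stmt->bindParam(':name', $incoming);
$stmt->execute();
$boundRows = (int) $stmt->fetchColumn();

echo "concatenated: $unsafeRows, bound: $boundRows\n";
```

The concatenated query matches both rows, while the bound one matches none.
Telling these two cases apart automatically would require exactly the kind of
per-context information that a single boolean flag cannot carry.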
Overall, I think this whole thing would be a great addition to PHP and I hope
this could come for 6.0. What do you think?
More information about taint mode in other languages (like Perl and Ruby) can
be found here: `1`__ `2`__
.. __: http://gunther.web66.com/FAQS/taintmode.html
.. __: http://www.rubycentral.com/book/taint.html
Trackbacks
==========
- "Tainted mode" pour php on Mon, 18 Dec 2006 08:59:00 +0100 in Kamelot Blog
(French translation of this post) Taint mode for PHP? (A mode for marking
data corruption) Wietse Venema, the creator of the Postfix MTA, posted a
proposal for a "taint mode" on the PHP internals mailing list...
Comments
========
- Aaron Wormus at Sun, 17 Dec 2006 00:35:28 +0100
"levels" of taintedness are silly, and just asking for trouble. Tainting is
just raising a flag that shows that a variable has been used without ANY
checks, once a check is done on the variable it is not tainted.
Taint mode doesn't protect anyone and shouldn't be considered a security
feature of any kind. However it is nice to be able to turn on and get an
idea of how much checking is being done on the variables.
Having already lived through one taint mode implementation (in perl5) I can
attest that it is a major PITA to begin with, but once you learn to use it
it becomes a great tool and the overall code quality improves.
+1 on a perl-like tainting -1 taint levels
- Chris D at Sun, 17 Dec 2006 14:40:19 +0100
What about the Filter extension? Did it drop off the face of the Earth?
Ongoing improvements to that extension as well as a mass education effort
would require 1/100 the effort of implementing Taint Mode and get just as
good results security-wise.
You're gonna have to teach them about Taint mode anyway, so why not do it
with something you already have in place now?
- michael at Sun, 17 Dec 2006 23:07:17 +0100
Like Aaron said: +1 perl-like tainting -999 taint levels
- Joshua at Sat, 16 Oct 2010 02:50:08 +0200
The filter extension is not as useful as this proposal, because filtered
strings are indistinguishable from unfiltered strings. The tainting proposal
would let you know when you leak tainted strings; the filter extension alone
has no way of tracking strings from validation to output.