This guy's
post about returning to his original focus (which would be Perl,
Python, and PHP) reminded me of a post I had long intended to publish.
It's (briefly) titled this: "How to correctly write PHP code in
conjunction with databases." Long title, but I can't think of anything
shorter.
Foreward: My database of choice would be MySQL (again, it's a PHP
focused post). Note, however, that anything I mention here is easily
applicable to any database. I have done developement on PGSQL and even,
gasp, MSSQL, and likewise I'm quite confident that everything I mention
here as far as databases go should apply to pretty much anything.
PHP is probably the most used language on the web. Why? It is fairly
easy to pick up, and has a huge amount of support on pretty much any
server anywhere, never mind the monstrosities of pre-existing PHP
scripts you can download from... pretty much everywhere. Forums,
download sites, e-commerce sites, you name it, you can probably find a
PHP script doing it. However, as a result, it's one of the most
commonly attacked languages out there, and while it is incredibly easy
to pick up, it's not as easy to write PHP properly and in a secure manner.
Uh-oh, I wrote "PHP" and "secure" in the same sentence. That's a bad
thing. PHP has recently recieved a fair amount of flak for it's
security, or lack thereof. One of the lead programmers for PHP whose
sole purpose was really just to make PHP more secure, stepped down. This article
is a good read on the matter, for anyone curious. From what I gather,
he stepped down for one main reason, with another one on the side. The
main reason would be the response time to security holes. The side
reason would be a view that PHP was built insecure from the ground up,
and try as he might, that wasn't going to change.
That debate really comes down to "it's the programmer's fault for
coding that way" vs. "you shouldn't even allow that to happen, ever."
And it's true: improperly coded PHP is a complete nightmare, riddled
with security holes, both in code and in function, and it turns into a
complete mess quickly.
Now, let's avoid all of that, shall we?
Point #1: Check all of your data for validity.
Now, this is no small order. PHP is typically used to take user
input, process it, and then output it. Right there, you have three
places a malicious user can attack your code: the input, the
processing/storage of the data, and finally, the output.
The input is one of the most commonly checked and parsed things,
and, in my opinion, wrongly so. Assume, for a moment, that everyone
everywhere is a non-malicious user. Why shouldn't they be allowed to
have a username of "admin' or 1=1--"?
Sure, it's a little weird, but then again, so is the internet as a
whole. End result, most people code to strip out less commonly used
characters. Others call htmlspecialchars(), and other flat-out deny
query execution when some special characters are detected. Oh, and then
you have the swarm of mysql_real_escape_string() people, and as a
result, every other line will be mysql_real_escape_string().
Now, admittedly, the last bit there (mysql_real_escape_string()) is
a real and valid solution to the bigger problem of SQL injection. But,
it is not optimal. Optimal would be using SQL statements in combination
with stored procedures. Currently, only MySQL is supported directly
with stored procedures, by ways of the mysqli functions. However, PHP's
PDO
project is coming along very nicely, and with that you can use SQL
statements with pretty much any database out there. SQL statments are,
to put it simply, a way of using SQL queries in a manner that
guarantees you will never be vulnerable to the most common type of SQL
injection attacks (which is generally input manipulation).
Now, before you touch the database (I know, I'm out of order), you
can do other sanity checks on your data. Take for instance, the ever
common URL, http://example/page.php?id=1. It's that ?id=1 part that is
common, and further, non-specific to PHP. My question to you is this:
why is it that you have countless webpages where ?id=1 is a security
hole? You can make your data checks really, really easy here: is_int().
Really, think about it. You can completly skip all forms of data
validation in this example, by checking the results of $_GET["id"] with
is_int(). Is there ANY reason why $_GET["id"] should be anything but a
number? No? So then why are you spending so much time checking for odd
characters that shouldn't be there, when you could merely check the
datatype and know for sure that it's legit?
This can also easily apply to strings, or to be more defined, UUIDs
(On that page, search for "UUID()", it's roughly 2/3 of the way down).
If you know that a UUID will always be in
theaaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee format, you could use a regular
expression to check that easily (not to mention to check that it only
contains a-Z and 0-9 chars).
Point #2: Proper database security and design
If you've ever installed a PHP script that used SQL, it asked you
for a login to a SQL server of your picking. In my opinion, it should
ask you for two seperate logins. "Why?" That's an easy answer.
As it stands now, any SQL username and password you provide to a
script, generally has complete reign over the database. SELECT, UPDATE,
DELETE, the works. This poses the question: Why should someone viewing
a page (not editing, adding, removing, etc.) even have the rights to
turn a query around and run a DELETE? Does that even make sense? I sure
hope not.
The solution to the problem is this: have two users, with differing
rights on the SQL server itself. The first user should really only have
SELECT, the ability to run a stored procedure, check out a view or two,
and... that's it. Even at that, stored procedures that only execute
SELECT. I can't name one reason why an anonymous page view should use a
connection to the SQL server with enough rights to run an UPDATE. It
just makes no logical sense. The second user, however, still shouldn't
have UPDATE rights: rather, it should only have the ability to execute
stored procedures. Stored procedures protect your data, and can
sanitize code everywhere. Again: there is no reason why an anonymous
page view should even have the rights, let alone ability, to run an
UPDATE or a DELETE query. So, um, please stop allowing such things. It
will make the world a nicer place.
Point #3: Taking point #2 a step further, and seperating the code that writes from the code that reads.
PHP 5.0 introduced a lot of very useful object oriented
capabilities. Take for example, a mock forum, with the following
functions: $forum->view->thread(),
$forum->update->thread(). Again, there is no reason at all why
view->thread() should be calling update->thread(). Functions and
classes can be marked as private, protected, and public: please do so.
Before someone goes off and states "but end users can't execute PHP
code that I didn't write!", think again. It can happen. It has happened before. It is preventable. Plus, this makes your code cleaner and enforces good coding habits all around.
Point #4: Check your SQL output
As yes, the ever popular login string of "admin' or 1=1--". Commonly
used is poorly coded web scripts to gain a login as the user "admin",
you can simply google around for this and see countless example of it
pretty much everywhere.
But there's a bigger problem with this. Let's assume for a moment
that someone used this on a poorly coded website and succeded with a
login as user "admin." The real problem? The "or 1=1" bit. You just selected every single user in the database.
Here's yet another sanity check, useful for such things: *_num_rows().
This is an incredibly simple check all around: if you're trying to log
in one user, and you get four rows back, you know something somewhere
is wrong. So.... why do you allow it?
All of the above can really come down to two main points: data
validation, and not allowing the database rights, but rather
abstracting the rights into stored procedures. Data validation is
usually taken too far in the wrong direction (stripping characters, not
allowing query execution), while database rights are completly
overlooked. All that data validation should be, is simple sanity checks
(like is_int() and *_num_rows()). Used in combination with SQL
statments and stored procedures (with limited database access on top),
you'll quickly come to see just how secure PHP can really be.
The biggest problem with PHP out there currently? People try to
secure their scripts, and then they do it incorrectly. "But I stripped
out all of the backslashes!..."