Sanitisation and Validation in PHP

Sanitisation and Validation are important terms to understand when writing PHP applications.



Step 1

What Do Sanitisation and Validation Mean?

In the context of this tutorial, both terms describe processes performed on user input. Sanitisation is cleaning user input to make it safe to process, and validation is checking the data to see whether it is in the correct format, of the correct type, and so on. It is important to sanitise and validate data coming in from users of your PHP applications because, left unchecked, the input may be used to facilitate an exploit. Some of the most common exploits involving user input are code injection, SQL injection and header injection. We will have a look at some of these during the tutorial.

Step 2

Validation is a vital topic when handling user input. It helps to improve security, improve usability and reduce the number of bugs in your program. To validate something, we first work out the criteria that our user input has to conform to. For example, we might want the user input to be a whole number between 10 and 99. We then test the user input against these rules, and if the input fails any of the checks we do not use the data and instead inform the user that they have entered something incorrect. OK, but what does that mean in terms of code? Here is an example of the code you might use to test whether a number is between 10 and 99.
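<?php
// read the input; a missing field becomes an empty string
$number = isset($_POST['number']) ? $_POST['number'] : '';

// check the input: it must contain only digits and be within range
// (without the digit check, older PHP versions would accept "50abc")
if(is_string($number) && ctype_digit($number) && $number >= 10 && $number <= 99)
{
   // the number is fine, continue
   echo $number;
}
else
{
   // the number provided is not a whole number within range
   die('The number provided is not valid.  Please provide a number between 10 and 99.');
}
?>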
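As an alternative to comparing the raw string, the same range check could be sketched with PHP's filter extension. This is only an illustration: the helper name validate_number_in_range() is made up for this tutorial, and the sketch assumes the filter extension is available (it is enabled by default).

```php
<?php
// Hypothetical helper (not part of PHP): returns the validated integer,
// or null if $raw is not a whole number between $min and $max.
function validate_number_in_range($raw, $min, $max)
{
    $options = array('options' => array('min_range' => $min, 'max_range' => $max));
    $number = filter_var($raw, FILTER_VALIDATE_INT, $options);

    // filter_var() returns false on failure. Compare strictly, because 0
    // could be a perfectly valid result for other ranges.
    return ($number === false) ? null : $number;
}

var_dump(validate_number_in_range('42', 10, 99));         // int(42)
var_dump(validate_number_in_range('100', 10, 99));        // NULL
var_dump(validate_number_in_range('42<script>', 10, 99)); // NULL
```

Returning the validated integer rather than true or false means the rest of the script can work with a real number instead of going back to the raw $_POST value.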

Step 3

Validation is especially useful because once we are certain what format the user input is in, we might not have to sanitise it. For example, once validation has confirmed that $_POST['number'] from the previous snippet contains nothing but digits, we no longer need to sanitise it: it has to be a number, and is therefore harmless. Validation matters just as much when user input ends up somewhere sensitive, such as an email form that takes user input and places it in an email header. For example, this script is vulnerable to header injection:
<?php
// the email to send to
$myemail = 'ted@platypus.org.uk';
 
// from header
$from = 'From: ' . $_POST['name'] . ' <' . $_POST['email'] . '>';
 
// send the email
mail($myemail,$_POST['subject'],$_POST['message'],$from);
?>
This script does not validate any of the input it is given, so a user could submit a name or email address containing a line break. That would allow them to add extra headers to the email, which is not desired. More fundamentally, it makes no sense not to validate the input: if someone supplies an address like "not_a_valid_email", the email should simply not be sent. To combat this we can validate the input provided by the user to see whether it makes sense to allow it. This could be done with string functions, but it is much easier to use regular expressions (see link to tutorial about regexps). We can use regular expressions in the previous example to make the script much more sensible:
<?php
// the email to send to
$myemail = 'ted@platypus.org.uk';
 
// the D modifier stops $ from matching just before a trailing line break
if(!preg_match('/^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$/D',$_POST['email']))
   die('Invalid email provided, the email must be in valid email format (such as name@domain.tld).');
// the + allows names longer than a single character
if(!preg_match('/^[-_ 0-9a-z]+$/iD',$_POST['name']))
   die('Invalid name provided, the name may only contain a-z, A-Z, 0-9, "-", "_" and spaces.');
 
// from header
$from = 'From: ' . $_POST['name'] . ' <' . $_POST['email'] . '>';
 
// send the email
mail($myemail,$_POST['subject'],$_POST['message'],$from);
?>
The script is now both safer and more suitable.
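The same checks can also be wrapped up in a small function so that the From header is only ever built from validated values. The following is a rough sketch rather than a definitive implementation: build_from_header() is a made-up name, the example addresses are placeholders, and it swaps the hand-written email pattern for PHP's built-in FILTER_VALIDATE_EMAIL filter.

```php
<?php
// Hypothetical helper: builds a From header, or returns null if either
// value fails validation.
function build_from_header($name, $email)
{
    // Reject anything that is not a string (name[]=x would send an array).
    if (!is_string($name) || !is_string($email)) {
        return null;
    }
    // \z matches only at the very end of the string, so a trailing line
    // break cannot slip through the way it can with a plain $.
    if (!preg_match('/^[-_ 0-9a-z]+\z/i', $name)) {
        return null;
    }
    // Line breaks are what make header injection possible, so refuse them
    // outright before checking the address format.
    if (preg_match('/[\r\n]/', $email) || filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }
    return 'From: ' . $name . ' <' . $email . '>';
}

var_dump(build_from_header('Ted', 'ted@example.org'));                          // the header string
var_dump(build_from_header('Ted', "ted@example.org\r\nBcc: spam@example.org")); // NULL
var_dump(build_from_header("Ted\n", 'ted@example.org'));                        // NULL
```

A caller would send the email only when the function returns a string, and show an error message otherwise.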

Step 4

Sanitisation is, as we said above, cleaning user input to make it safe to process further. But what does that actually mean? Below we have a vulnerable PHP/MySQL login form. First we'll show it in its vulnerable state, then improve on it and show why it is safer than it was previously.
<?php
// start the session so that $_SESSION is kept between requests
session_start();

// connection to MySQL server
mysql_connect('localhost','username','password');
mysql_select_db('database');
 
// User input
$username = $_POST['username'];
$password = md5($_POST['password']);
 
// Construct and run query.
$sql = 'SELECT id FROM users WHERE username="'.$username.'" AND password="'.$password.'"';
$result = mysql_query($sql);
 
// If there is a user, log them in.
if(mysql_num_rows($result) > 0)
{
   $_SESSION['login'] = true;
   // Redirect to admincp, then stop so that nothing else runs
   header('Location: http://somesite.com/admincp/');
   exit;
}
else
   die('Incorrect username or password.');
?>
Now, on the face of things that may look safe: it checks the username and password against the database, and only logs the user in if a matching user is found. However, if someone were to enter a username of '" OR password LIKE "%" -- ' then the query becomes:
SELECT id FROM users WHERE username="" OR password LIKE "%" -- " AND password="9cdfb439c7876e703e307864c9167a15"
That query fetches the id of every user in the users table (since LIKE "%" matches all rows and -- comments out the rest of the line), meaning the attacker is logged in regardless of the actual values in the database. To prevent things like this, we can use sanitisation functions like mysql_real_escape_string(). Applying this function to the user input means that characters like ", which can be used to inject SQL, are escaped with a backslash (e.g. \"). So with the following code:
<?php
// connection to MySQL server
mysql_connect('localhost','username','password');
mysql_select_db('database');
 
// User input
$username = mysql_real_escape_string($_POST['username']); // sanitised input
$password = md5($_POST['password']); // already safe due to md5()
 
// Construct and run query.
$sql = 'SELECT id FROM users WHERE username="'.$username.'" AND password="'.$password.'"';
$result = mysql_query($sql);
 
// etc...
?>
The same input is sanitised, and the query becomes this:
SELECT id FROM users WHERE username="\" OR password LIKE \"%\" -- " AND password="9cdfb439c7876e703e307864c9167a15"
The code is no longer vulnerable to that SQL injection exploit. mysql_real_escape_string() is only applied to $username because $password is hashed, and hashing also sanitises data. By default, anything passed through a hashing function like md5() or sha1() is returned as a hexadecimal string, meaning that only the characters 0-9 and a-f can be returned by the function. Any threatening characters like quotes and slashes are therefore gone, and we can use the resultant hash in a query without fear of injection.
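The mysql_* functions used in these examples were removed in PHP 7.0, so on current PHP versions the same idea is usually expressed with prepared statements, where the user input never becomes part of the SQL text at all. The sketch below is only one possible way to do it and makes several assumptions that are not in the examples above: it uses PDO with an in-memory SQLite database so that it is self-contained, find_user_id() is a made-up helper, and it stores passwords with password_hash() (PHP 5.5 or later) instead of md5().

```php
<?php
// Hypothetical helper: returns the user's id on success, or null.
function find_user_id(PDO $db, $username, $password)
{
    // The ? placeholder keeps the username out of the SQL text entirely.
    $stmt = $db->prepare('SELECT id, password FROM users WHERE username = ?');
    $stmt->execute(array($username));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    // password_verify() checks the input against a password_hash() value.
    if ($row !== false && password_verify($password, $row['password'])) {
        return (int) $row['id'];
    }
    return null;
}

// Demo set-up: an in-memory SQLite database (an assumption for this sketch,
// it requires the pdo_sqlite extension).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)');
$insert = $db->prepare('INSERT INTO users (username, password) VALUES (?, ?)');
$insert->execute(array('ted', password_hash('secret', PASSWORD_DEFAULT)));

var_dump(find_user_id($db, 'ted', 'secret'));                   // the stored id
var_dump(find_user_id($db, '" OR password LIKE "%" -- ', 'x')); // NULL
```

With a real MySQL server only the connection string passed to new PDO() would need to change; the query code stays the same.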

Step 5

There are also other ways of sanitising input, and a very useful one is typecasting. Taking another SQL example, say we were allowing users to specify an offset for displaying data, in a query something like this:
<?php
// code...
$sql = 'SELECT id,title FROM news LIMIT '.$_GET['offset'].',10';
$result = mysql_query($sql);
// more code...
?>
The same function as before would not help much here: the value is not placed inside quotes in the query, so an attacker needs no quote characters to inject SQL. It is more appropriate to use typecasting to force the value to be an integer. We can do this using intval(), which takes a variable and returns its value as an integer. So the string "14" becomes the number 14, and any input that does not begin with a number becomes 0, making the input safe to work with.
<?php
// code...
$sql = 'SELECT id,title FROM news LIMIT '.intval($_GET['offset']).',10'; // sanitised input
$result = mysql_query($sql);
// more code...
?>
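To see what intval() does with awkward input, here is a small sketch. The helper name sanitise_offset() is made up for this tutorial, and clamping negative values to 0 is an extra assumption on top of the example above (a negative offset is not an injection, but it is not a valid LIMIT offset either).

```php
<?php
// Hypothetical helper: turns raw request input into a non-negative integer.
function sanitise_offset($raw)
{
    // Arrays (e.g. ?offset[]=1) are not usable as an offset at all.
    if (!is_scalar($raw)) {
        return 0;
    }
    // intval() keeps any leading digits and drops the rest;
    // max() then rules out negative offsets.
    return max(0, intval($raw));
}

var_dump(sanitise_offset('14'));                  // int(14)
var_dump(sanitise_offset('abc'));                 // int(0)
var_dump(sanitise_offset('-5'));                  // int(0)
var_dump(sanitise_offset('14; DROP TABLE news')); // int(14)
```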
An important thing to remember about sanitisation is that it is not only required when putting data into something like a database. Outputting an unsanitised variable can be just as dangerous as taking it as input for another purpose. For example, say we had a simple script that took a $_GET variable called "name", then output "Hello, [name]!". If you do not sanitise the user input, a user can craft a malicious URL to your script that will send cookies associated with your domain to them. How could they do that? By placing HTML code in the URL which executes some JavaScript when the page is loaded.
<?php
// Dangerous!  $_GET['name'] has not been sanitised.
echo 'Hello, ',$_GET['name'],'!';
?>
Now that we know what the vulnerability is, how can we stop it? Luckily PHP provides a very useful function for just this purpose, called htmlspecialchars(). This function replaces possibly dangerous characters like < with their HTML entities; < becomes &lt;, for example. Below we can see htmlspecialchars() in use, sanitising our user input to make the script safe.
<?php
// This is now safe because the user input has been sanitised.
echo 'Hello, ',htmlspecialchars($_GET['name'], ENT_QUOTES),'!';
?>
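To make the effect concrete, the sketch below passes a typical script payload through the same call. The greet() function is a made-up wrapper for illustration, and passing 'UTF-8' as the character set is an assumption about how the page is encoded.

```php
<?php
// Hypothetical wrapper around the greeting from the example above.
function greet($name)
{
    // ENT_QUOTES also converts single and double quotes, which matters
    // when the value ends up inside an HTML attribute.
    return 'Hello, ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '!';
}

echo greet('<script>alert(1)</script>'), "\n";
// Hello, &lt;script&gt;alert(1)&lt;/script&gt;!
echo greet("O'Reilly"), "\n";
// Hello, O&#039;Reilly!
```

The browser displays the entities as literal text, so the payload is shown on the page instead of being run.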

Step 6

Conclusion: It is evident that both validation and sanitisation are very important considerations in any PHP application. Where possible, prefer validation over sanitisation, but if you are in doubt as to what you want to receive, or you want to allow possibly dangerous characters, then you should definitely sanitise the input. Sanitising where you shouldn't is much less trouble than not sanitising where you should! Sanitisation and validation should be part of your planning stages: consider jotting down all the input you are taking from the user, noting exactly what you expect and whether the input might require sanitisation. Do this and you will be making more secure and more useful PHP applications, with fewer bugs and less spam input from undesirable users.

Step 7

Closing Notes:
  • Be careful of validation by type when dealing with $_GET and $_POST variables. If someone inputs a number, is_int() will not be true, because all information passed as GET and POST vars arrives as strings. To find out whether a valid number has been supplied, you can use is_numeric(), which accepts numeric strings (including forms such as "1.5" and "1e3"), or cast the value and see if it is greater than 0. Alternatively, check that the string is made only of 0-9 characters.
  • Try to use application-specific sanitisation functions where possible. For example, mysql_real_escape_string() is more likely to return a safe string to use in a MySQL query than addslashes(), because the latter does not take the character set of the database connection into account. This can lead to inconsistencies that let injection through.
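  • All PHP examples are assumed to be run on a system with magic_quotes_gpc OFF (the setting was removed altogether in PHP 5.4). The mysql_* functions used here were removed in PHP 7.0, so the database examples apply to PHP 5 as written.
  • For more information, consult the PHP manual.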
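
The first note above can be illustrated with a few calls. This is just a sketch: is_whole_number_string() is a made-up helper, and the value '14' stands in for what a request such as ?n=14 would put in $_GET['n'].

```php
<?php
// Hypothetical helper: true only for non-empty strings made of 0-9.
function is_whole_number_string($value)
{
    return is_string($value) && ctype_digit($value);
}

$input = '14'; // request data always arrives as a string

var_dump(is_int($input));                 // bool(false): it is a string
var_dump(is_numeric($input));             // bool(true)
var_dump(is_numeric('1e3'));              // bool(true): exponent form counts
var_dump(is_whole_number_string($input)); // bool(true)
var_dump(is_whole_number_string('1e3'));  // bool(false)
var_dump(is_whole_number_string(''));     // bool(false)
```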
