MOPS Submission 04 – Generating Unpredictable Session IDs and Hashes

May 9th, 2010

Today we want to present the fourth external MOPS submission. It was submitted by Jordi Boggiano and explains how to generate unpredictable session ids and hashes in PHP.

Generating unpredictable session ids and hashes

Jordi Boggiano

It is not uncommon for web developers to have to generate random ids or hashes. For instance, large-scale projects or frameworks may want to implement their own PHP session handlers, either completely abstracted in their own API or overloading PHP’s internal API using session_set_save_handler(). If you do so, unless you want to entrust PHP’s core with it, one thing you will have to take care of is generating the unique session ids that are sent as a cookie to your users, allowing the session to persist. Other common use cases for such unique hashes are CSRF tokens inserted into forms or URLs, and authentication tokens for email validation and the like.
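
To make the use case concrete, here is a minimal sketch of how a custom session handler hooks into PHP. The storage callbacks are hypothetical no-op stubs, only the id handling matters here; it uses the generateUniqueId() function defined at the end of this article.

// hypothetical no-op storage callbacks; a real handler would talk to a database or cache
function my_open($savePath, $sessionName) { return true; }
function my_close() { return true; }
function my_read($id) { return ''; }
function my_write($id, $data) { return true; }
function my_destroy($id) { return true; }
function my_gc($maxLifetime) { return true; }

session_set_save_handler('my_open', 'my_close', 'my_read', 'my_write', 'my_destroy', 'my_gc');

// when you take over id generation as well, the id you feed PHP must be unpredictable
session_id(generateUniqueId(32));
session_start();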

Common misconceptions and human errors

The most common error programmers make when implementing this stems from the fact that we humans are very bad at judging randomness. Especially once you add a hashing algorithm on top, it very quickly becomes impossible to tell whether the hashes you create are really unique and random enough. To illustrate, here is the output of two calls to md5(rand(1,5)):

eccbc87e4b5ce2fe28308fd9f2a7baf3
a87ff679a2f3e71d9181a67b7542122c

If you don’t know better and just look at those values, they seem pretty random, but using this for your session ids basically means you will get collisions once you reach 5 concurrent users, and most likely fewer than that, since two consecutive calls can return the same value.
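
If you doubt it, a quick check makes the problem obvious, since md5(rand(1,5)) can only ever produce five distinct values:

$seen = array();
for ($i = 0; $i < 1000; $i++) {
    $seen[md5(rand(1, 5))] = true;
}
echo count($seen); // prints 5: a thousand "random" ids, only five unique values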

Most developers have some clue and wouldn’t make such an obvious mistake, but here comes another common one: “let’s use time!”. Indeed, time is linear, and if you use microtime(true) you will get fairly unique ids with very low chances of collisions, unless you run a high-load site on multiple servers. However, this is very insecure: anyone can learn the approximate time of your server, so an attacker could generate hashes for every micro-second around it and potentially hijack a session. I won’t go into further detail about how those attacks work, since this article is about securing your site rather than hijacking sessions.
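
Just to give an idea of how small the search space is, here is a rough sketch of the attacker’s side, assuming purely for illustration that the site derives its session id from md5 of microtime: knowing the server time to within a second leaves only about a million candidates to try.

// attacker's side (sketch): guess the server clock to the nearest second
$approxTime = time();
$candidates = array();
for ($usec = 0; $usec < 1000000; $usec++) {
    // rebuild every possible microsecond value within that second; the exact
    // string format has to match what the server hashed, but either way the
    // search space stays at roughly one million values
    $candidates[] = md5($approxTime . '.' . str_pad($usec, 6, '0', STR_PAD_LEFT));
}
// each candidate can then be replayed as a session cookie until one is accepted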

Another common approach is to use rand(), mt_rand(), or uniqid(). The first two are somewhat random, but more sophisticated attacks can render them quite dangerous to rely upon, while the latter is simply based on the time, so it offers no benefit over using microtime().

Randomness explained

The main thing to understand is how random values are generated. Computers are deterministic machines; they are built to be reliable and consistently produce the same output for a given input, so asking them to generate anything out of thin air is extremely difficult. To work around this, operating systems have ways to gather really random values that are virtually impossible to guess. They use entropy (i.e. noise or chaos) generated by the machine’s environment, aggregating various inputs like hard-drive I/O timings, fan speed, etc.

Two variants are available on most UNIX machines, exposed as files you can read randomness from. /dev/random is a high-quality, blocking random number generator: what you read from it is removed from the entropy pool, so the pool can run empty and block your application until it has been refilled. That may be fine on a lightly loaded server, but it’s quite dangerous on a busy one, since if the pool runs empty, new users waiting for a session id will have to wait seconds, or the request will time out entirely, which is not quite the first experience you’d want them to have. The alternative is /dev/urandom, the “unlocked” variant. It reuses the pool to serve you a chunk of data instantaneously. It is therefore of lesser quality, since it can potentially give you similar output twice, but it is the best you can get if you can’t rely on /dev/random being fast enough, and it will beat PHP’s mt_rand() any day.
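
For reference, reading a chunk of entropy from /dev/urandom in PHP takes only a few lines (a minimal sketch; the complete function at the end of this article does the same alongside the other fallbacks):

$entropy = '';
if (is_readable('/dev/urandom')) {
    $h = fopen('/dev/urandom', 'rb');
    if ($h !== false) {
        $entropy = fread($h, 64); // 64 raw bytes from the kernel's pool
        fclose($h);
    }
}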

The Windows counterpart is sadly not exposed as a file anyone can read from, but it is available through the OS API. From PHP, you can call it via the COM extension; see the example at the end of this post for more details.

Also, as of PHP 5.3.0, openssl_random_pseudo_bytes() provides you with a good algorithm from libssl that returns pseudo-random bytes. If you use it, do check the value of the $strong parameter, because if it is not true, the function could not use a cryptographically strong method, in which case you’re better off gathering entropy yourself.
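
In isolation, that check looks like this (a small sketch of the $strong handling described above):

$bytes = openssl_random_pseudo_bytes(32, $strong);
if ($strong !== true) {
    // libssl fell back to a weaker method: discard and gather entropy elsewhere
    $bytes = '';
}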

Hash algorithms

Getting random data is one thing, but you will most likely want to normalize it, which has two benefits: it hides what you used to create the unique id, and it makes the value safe to store and pass around as a cookie. Leaking raw random data to the user might hint to an attacker which method you used to generate it, so it’s better not to. And trying to set a cookie with data containing odd characters might leave you with corrupted headers or a PHP warning appearing on some pages, which is hard to debug.

For this purpose, PHP offers the hash extension, which is enabled by default and available since PHP 5.1.2, so it should run on most hosts by now. By calling hash_algos(), or via your phpinfo() page, you can see which algorithms your version supports. There are a lot of algorithms to choose from, and they vary a lot in quality. Some are not designed for cryptographic purposes and are therefore very prone to collision attacks, while others were once great but have been broken since. The other criterion for picking one is the speed at which it runs: a slow algorithm means an attacker needs more processing power to brute-force his way in, so you actually do not want an algorithm that performs too well. There is a benchmark in the user comments of the hash() function’s manual page which shows, at the time of this writing, that Whirlpool is one of the slowest, and it would be my recommendation since it offers good cryptographic quality on top of that. However, keep an eye out for security advisories: cryptographic algorithms are permanently under attack, and there is no guarantee that Whirlpool will still be a good choice five years from now.
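
You can check what your build supports and get a rough feel for the speed differences yourself (a quick sketch; the absolute timings only mean something relative to each other on your own machine):

var_dump(in_array('whirlpool', hash_algos())); // is Whirlpool available here?

$data = str_repeat('x', 1024);
foreach (array('md5', 'sha1', 'sha512', 'whirlpool') as $algo) {
    $start = microtime(true);
    for ($i = 0; $i < 10000; $i++) {
        hash($algo, $data);
    }
    printf("%-10s %.3fs\n", $algo, microtime(true) - $start);
}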

And while it is not really the topic of this article, please note that even though MD5 is very popular, it is not a good idea for password hashing since, due to its popularity, many people have built rainbow tables for it.

The final solution, for now...

To summarize, here is a basic implementation that should provide good results on any machine, while working best on UNIX; but until the Windows API is better supported by PHP, you can’t really do more there.

function generateUniqueId($maxLength = null) {
    $entropy = '';

    // try ssl first
    if (function_exists('openssl_random_pseudo_bytes')) {
        $entropy = openssl_random_pseudo_bytes(64, $strong);
        // skip ssl since it wasn't using the strong algo
        if ($strong !== true) {
            $entropy = '';
        }
    }

    // add some basic mt_rand/uniqid combo
    $entropy .= uniqid(mt_rand(), true);

    // try to read from the windows RNG
    if (class_exists('COM')) {
        try {
            $com = new COM('CAPICOM.Utilities.1');
            $entropy .= base64_decode($com->GetRandom(64, 0));
        } catch (Exception $ex) {
            // ignore failures, the sources gathered above remain in $entropy
        }
    }

    // try to read from the unix RNG
    if (is_readable('/dev/urandom')) {
        $h = fopen('/dev/urandom', 'rb');
        if ($h !== false) {
            $entropy .= fread($h, 64);
            fclose($h);
        }
    }

    $hash = hash('whirlpool', $entropy);
    if ($maxLength) {
        return substr($hash, 0, $maxLength);
    }
    return $hash;
}

This code takes care of providing decent baseline entropy through a combination of uniqid() and mt_rand(), and then adds more from libssl, Windows’ API and /dev/urandom, depending on what is available. Finally it hashes it all with the Whirlpool algorithm and returns the result. It also provides a $maxLength argument that lets you get shorter hashes, since Whirlpool returns 128-character strings, which might be a bit much if you plan to use the value as a URL token. However, cutting the hash reduces its uniqueness, so do it with caution.
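
Typical usage then looks like this (a short sketch; the lengths are arbitrary and the session handling is assumed to be set up as discussed above):

// as a session id for a custom session handler
session_id(generateUniqueId(32));
session_start();

// as a CSRF token stored in the session and echoed into a form
$_SESSION['csrf'] = generateUniqueId(40);
echo '<input type="hidden" name="csrf" value="' . $_SESSION['csrf'] . '" />';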



