Using uniqid() causes performance issues on certain VMs


HebRech GmbH & Co. KG

I'm not sure if this is the proper channel to address this. If it's not I'm sorry; please tell me where I should take this instead.

While investigating performance issues with an IPS install on a Windows VM, I profiled it. It turned out that a large chunk of the rendering time is spent running uniqid() over and over again, more than 140 times just on the forum overview page. When I grepped through the source code I noticed a number of issues related to PHP's uniqid() function that I'd like to raise with you. I think a lot of this could be done away with.

Firstly, uniqid() makes a hard assumption that PHP runs in an environment where the system time changes reliably at least every few microseconds. Every call to uniqid() causes PHP to busy-wait until the microsecond part of the system time has changed (see uniqid.c:61-68). However, as VMware points out in this white paper, timekeeping in a VM is a complex topic, and there is no guarantee that the guest OS is informed every time the system time changes. In practice this means that if the system timer does not have sufficient granularity, PHP will spend considerable time busy-waiting.
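
A rough way to see how coarse the clock looks from PHP is to spin until microtime(true) returns a new value, which loosely mimics the wait described above. This is only a sketch: observed_clock_step_us() is a made-up helper name, and it measures the clock as PHP sees it rather than reproducing uniqid()'s internals.

```php
<?php
// Hypothetical helper (not part of PHP or IPS): spin until the wall-clock
// reading changes and return the size of the observed step in microseconds.
function observed_clock_step_us(): float
{
    $start = microtime(true);
    do {
        $now = microtime(true);
    } while ($now === $start);

    return ($now - $start) * 1e6;
}

// Sample the step a few times; a coarse timer shows up as large values.
$steps = [];
for ($i = 0; $i < 20; $i++) {
    $steps[] = observed_clock_step_us();
}

printf("smallest step: %.1f us, largest step: %.1f us\n", min($steps), max($steps));
```

If the smallest step is in the range of milliseconds rather than microseconds, every wait for the clock to advance is likely to be correspondingly long on that system.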

I did a few test runs using a script that compares various means of generating random strings (see attached file test2.php; the results are attached as results.txt). While uniqid() can perform rather well on Linux (more on that later) and does moderately well on native Windows, it's always monstrously slow on the VM. Each run of the script also coincides with CPU load spikes, a further sign that uniqid()'s busy-waiting is at fault. Switching to a different means of generating unique identifiers would increase performance tremendously. I'd recommend something based on random_bytes(), which shows respectable performance across the board. A collision between two random strings of 16 bytes (128 bits, equivalent in length to an MD5 hash) is extremely unlikely.
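
For readers without the attachment, a comparison along these lines can be sketched as follows. This is not the attached test2.php: time_generator() and the candidate labels are invented for illustration, and the iteration count is kept small so that a slow VM does not stall for long.

```php
<?php
// Hypothetical helper: run $generator $iterations times and return the
// elapsed wall-clock time in milliseconds.
function time_generator(callable $generator, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $generator();
    }

    return (microtime(true) - $start) * 1000.0;
}

// Candidates to compare; all of these are standard PHP functions.
$candidates = [
    'uniqid()'                  => function () { return uniqid(); },
    "uniqid('', true)"          => function () { return uniqid('', true); },
    'bin2hex(random_bytes(16))' => function () { return bin2hex(random_bytes(16)); },
];

foreach ($candidates as $label => $generator) {
    printf("%-28s %10.3f ms\n", $label, time_generator($generator, 200));
}
```

The absolute numbers will vary by machine; the point of interest is the ratio between the candidates on the same system.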

uniqid() is especially a problem since its performance on Linux depends heavily on whether the second parameter ($more_entropy) is true. (In the test script, gen_uniqid2() represents uniqid() with $more_entropy = true.) Most calls to uniqid() in the IPS codebase do not set $more_entropy and are thus slow on Linux.

Secondly, a common pattern found throughout the codebase is to use md5(uniqid()) to generate a random hexadecimal string. This is not a good idea; it doesn't make the ID any more random and combines the busy waiting of uniqid() with the computational overhead of MD5 for no actual gain. There are faster ways of generating random hex strings that avoid uniqid()'s pitfalls, such as bin2hex(random_bytes()), which also has the advantage of being able to scale the length of the string arbitrarily. To use an example from /system/Login/Login.php:

// This is the currently used code. It involves two busy waits.
$return = \substr( md5( uniqid( microtime(), true ) ) . md5( uniqid( microtime(), true ) ), 0, $length );

// This doesn't drown a VM-based system in busy waits.
$return = \substr( bin2hex( random_bytes( (int) ceil( $length / 2 ) ) ), 0, $length );

// If you want to include the system time to make collisions less likely you could prepend the current time's microsecond part.
// Note that the last two digits of that might always be 0 on some systems. Perhaps restricting to 6 instead of 8 characters
// would be preferable to allow for more random bytes.
$return = \substr( \substr( microtime(), 2, 8 ) . bin2hex( random_bytes( (int) ceil( $length / 2 ) ) ), 0, $length );
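
The same idea can be wrapped in a small reusable function. The sketch below is an illustration only: random_hex() is a hypothetical name and does not exist in the IPS codebase.

```php
<?php
// Hypothetical helper, not part of IPS: return $length random hex characters.
function random_hex(int $length): string
{
    if ($length < 1) {
        throw new InvalidArgumentException('Length must be at least 1.');
    }

    // Each random byte yields two hex characters, so round up and trim.
    $bytes = random_bytes(intdiv($length + 1, 2));

    return substr(bin2hex($bytes), 0, $length);
}

// Usage: a 32-character identifier, the same length as an MD5 hex digest.
$id = random_hex(32);
```

Using intdiv() keeps the byte count an integer, which avoids passing a float to random_bytes().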

Thirdly, the biggest culprit in making my forum slow was /system/Lang/Lang.php. It repeatedly uses uniqid() (without $more_entropy = true) to generate tokens for string replacement. While a faster random string source would help here, do these tokens really have to be random strings, or would it be feasible to use an incrementing number that is tracked globally? That would be a lot faster than generating dozens of unique IDs every time a page is loaded.
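
To illustrate the counter idea, here is a minimal sketch. It assumes the tokens only need to be unique within a single request and unlikely to appear in page content; TokenCounter and its token format are made up for this example and say nothing about how Lang.php actually works.

```php
<?php
// Hypothetical class, not part of IPS: hands out placeholder tokens that are
// unique within one request without calling uniqid().
final class TokenCounter
{
    /** @var int Next counter value to hand out. */
    private static $next = 0;

    /** @var string|null Random prefix, generated once per request. */
    private static $prefix = null;

    public static function next(): string
    {
        if (self::$prefix === null) {
            // One random prefix per request keeps tokens from clashing
            // with ordinary text; the counter makes each token distinct.
            self::$prefix = bin2hex(random_bytes(8));
        }

        return self::$prefix . '_' . self::$next++;
    }
}

$first  = TokenCounter::next();
$second = TokenCounter::next();
```

This costs one random_bytes() call per request plus an integer increment per token, instead of one clock wait per token.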

This is excellent feedback, thanks. We'll take a look at this (although, I might also suggest simultaneously that running your site inside a VM may not be the most optimized approach no matter what we do, so you may want to think about that architecture separately).

Thanks for the quick reply.

I'm well aware that running Windows natively or running Linux on that VM might avoid that problem. I have talked to the admin (the forum belongs to a company that is mostly a Windows shop) and we'll try getting a Linux VM spun up during an IT reorganization later this year. It's still just a medium-term solution, but I think we can deal with the slowness for a bit if we already know it's going to be addressed one way or another.
