Jump to content

Optimize MySQL queries with limits


Recommended Posts

Posted

We have bigint(20) type of member_id field in db on old forums (3.x). Start convertion. But when script go deeper to member_id, then it do lesser and lesser members by one mr-step. I am found cause of that.

In /applications/core/setup/upg_40000/upgrade.php on line 240 we have this select:

$select = \IPS\Db::i()->select( 'core_members.*, profile_portal.*', 'core_members', NULL, 'core_members.member_id', array( $limit, $perCycle ) )
			->join( 'profile_portal', 'profile_portal.pp_member_id=core_members.member_id' );

where $limit - is last count of member_id offset (not last member id) and $perCycle is constant 350.

But this select create not optimised query, where MySQL select each of member_id for found next $limit needed. So in my test case we have 2.4kk users and in 1.8kk point of convertion script do 1 or 3 members per 20-30 sec. It's so slow because this select query get ~10 seconds for get data. We can fix this MySQL thing with additional join, where we can select only member_id field from core_members with offset. This subquery ends very fast. And after that MySQL select only needed users with profile_portal.

Additional info about this problem you can read in english here: https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/

With this change i got less then 0.5s for query with same result against 10s+ of standard. I think you can optimize all big table convertions (and may be other things).

Fast examples:

SELECT * from core_members LIMIT 1867311,350;
// 10.3811315 sec duration from mysql profiling

SELECT *
FROM core_members
  JOIN (SELECT member_id
        FROM core_members
        LIMIT 1867311, 350) AS b ON b.member_id = core_members.member_id;
// 2.48505275 sec duration from mysql profiling

same result, ofcouse.

  • 3 weeks later...
Posted

Good suggestion. Instead of the subquery, we'd probably approach this differently (if you return the last ID that was operated on, we can do on the next cycle WHERE member_id > x LIMIT y ORDER BY member_id ASC which is probably the most efficient approach), but either way I've raised this internally for discussion and investigation. Thanks for the suggestion. :)

 

  • 2 weeks later...
Posted

@bfarber, you right! In my case I am change convertion script to where.

This is /applications/core/setup/upg_40000/upgrade.php

line 238: 

$select = \IPS\Db::i()->select( 'core_members.*, profile_portal.*', 'core_members', NULL, 'core_members.member_id', array( $limit, $perCycle ) ) // old

to 

$select = \IPS\Db::i()->select( 'core_members.*, profile_portal.*', 'core_members', 'core_members.member_id > '.$limit, 'core_members.member_id ASC', $perCycle ) // new

line 247:

return ( $limit + $did );

to

return $did;

line 250:

$did++;

to

$did = $row['member_id'];

and finally line 786:

$_SESSION['_step1Count'] = \IPS\Db::i()->select( 'COUNT(*)', 'core_members' )->first();

to

$_SESSION['_step1Count'] = \IPS\Db::i()->select( 'member_id', 'core_members', null, 'member_id DESC', 1 )->first();

With that changes convertion don't lose speed, when it process very big id's

7269a0b41f27ddd4e0e3c6623742a365.png.7987c3bbe83a5e453548402a8e9282da.png

Posted

@bfarber

Is there any update on this? I feel like upgrading members is the longest process compared to the others. So if this was sped up, it would mean less downtime for communities with a big member table.

Posted

@David..: much more time take a post convertion process of convertion legacy post markup.. x10 more time, than members with updated method with where. Fix for post convertion is start queue in several streams. But for that I haven't create solution. In simply view I can get all of them in iteration and set $post->post = \IPS\Text\LegacyParser::parse($post->post); $post->save(); It's work well, but not affect RebuildingPosts background queue (

Posted

I just want to thank you for this snippet, I noticed that my custom conversion script seemed unusually slow and when I looked at the slow query log I realized it was precisely the problem you were talking about here. So I replaced

SELECT * FROM post ORDER BY postid ASC LIMIT 1000000, 1000;

with

SELECT * FROM post JOIN (SELECT postid FROM post ORDER BY postid ASC LIMIT 1000000, 1000) AS b on b.postid = post.postid;

and the select query is now an order of magnitude faster, which is nice.

Posted
On 3/6/2017 at 8:33 AM, David.. said:

@bfarber

Is there any update on this? I feel like upgrading members is the longest process compared to the others. So if this was sped up, it would mean less downtime for communities with a big member table.

 

On 2/20/2017 at 8:55 AM, bfarber said:

Good suggestion. Instead of the subquery, we'd probably approach this differently (if you return the last ID that was operated on, we can do on the next cycle WHERE member_id > x LIMIT y ORDER BY member_id ASC which is probably the most efficient approach), but either way I've raised this internally for discussion and investigation. Thanks for the suggestion. :)

 

There is no further update beyond this yet. I've raised this internally, and we will likely see this changed in an upcoming maintenance release.

Archived

This topic is now archived and is closed to further replies.

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...