Learn SQL, please

cthree · March 3, 2011

This is STUPID:

SELECT * FROM table LIMIT X, Y

Do you know why?

Matt · March 3, 2011

Because X and Y need to be integers?

Because "table" doesn't exist?

I presume you mean the "select *" part. The issue is that sometimes you do need "*". There were parts of IP.Board that used a field specific list but we got complaints from modders that needed the information for their hooks, so if it comes down to needing 95% of the fields, we do select them all.

cthree · March 3, 2011

No, no, because it is a table scan ALWAYS. Indexes are ignored and the table is scanned from the beginning to X+Y rows returning the last Y rows scanned.

Case in point:


EXPLAIN SELECT * FROM ibf_message_text ORDER BY msg_id ASC LIMIT 2768648,5000;

+----+-------------+------------------+-------+---------------+---------+---------+------+---------+-------+

| id | select_type | table            | type  | possible_keys | key     | key_len | ref  | rows    | Extra |

+----+-------------+------------------+-------+---------------+---------+---------+------+---------+-------+

|  1 | SIMPLE      | ibf_message_text | index | NULL          | PRIMARY | 4       | NULL | 2773648 |       |

+----+-------------+------------------+-------+---------------+---------+---------+------+---------+-------+

1 row in set (0.00 sec)

Thanks for playing!

OFFSET is death. LIMIT X, Y is death.

Joshua Hina · March 3, 2011

No, no, because it is a table scan ALWAYS. Indexes are ignored and the table is scanned from the beginning to X+Y rows returning the last Y rows scanned.

Case in point:


EXPLAIN SELECT * FROM ibf_message_text ORDER BY msg_id ASC LIMIT 2768648,5000;

+----+-------------+------------------+-------+---------------+---------+---------+------+---------+-------+

| id | select_type | table            | type  | possible_keys | key     | key_len | ref  | rows    | Extra |

+----+-------------+------------------+-------+---------------+---------+---------+------+---------+-------+

|  1 | SIMPLE      | ibf_message_text | index | NULL          | PRIMARY | 4       | NULL | 2773648 |       |

+----+-------------+------------------+-------+---------------+---------+---------+------+---------+-------+

1 row in set (0.00 sec)

Thanks for playing!

No offense but this reads like an insult. This area is for suggestions and feedback NOT insulting how Invision Power Services conducts it's operations or writes its products. I think this deserves a lock.

Matt · March 3, 2011

No you're wrong. :)

That's not a table scan. It will use indexes to select the rows before returning the data. If sufficient indexes are not available then it will perform a full table scan but that has nothing to do with SELECT * ...

Matt · March 3, 2011

EXPLAIN SELECT * FROM ibf_message_text ORDER BY msg_id ASC LIMIT 2768648,5000;

The lack of WHERE and the large OFFSET indicates to me that this query, if generated from IP.Board, is from a rebuilding function or an upgrader. It's certainly not something you'd come across in normal use.

1 row in set (0.00 sec)

In any case, it seems to be quite efficient.

cthree · March 3, 2011

it's obviously from an upgrade script, that's not my point. I've got 4 million PMs to convert, it matters to me, that also isn't the point. It is a query which is part of IPS code I'm growing old watching run and that makes it feedback. It's something that should not be done, that makes it a suggestion.

Are you sure it's not part of code that is run often or even periodically? Should it be part of any code? If it is and you change it and the software get faster and better is that not a positive thing?

That's my point.

Matt, sorry, you are wrong in this case. There is no WHERE clause and thus no index to choose from.

cthree · March 3, 2011

1 row in set (0.00 sec)
In any case, it seems to be quite efficient.

The EXPLAIN query is, the SELECT query is definitely not.


5000 rows in set (2 min 34.69 sec)

Matt · March 3, 2011

An administrative function won't be as efficient as daily SQL and it can take a while for it to process through. MySQL is fairly poor at large offsets and there's not much we can do about that.

Having said that, I believe we can squeeze a bit more out by removing the offset and instead storing the last processed ID and then simply doing SELECT * FROM table WHERE id > {num} LIMIT 100

Luke · March 4, 2011

This is STUPID:

SELECT * FROM table LIMIT X, Y

Do you know why?

An index is used for sorting, but after that LIMIT is done last. LIMIT is done on the resulting set. The reason why it's slow is because of the large number of rows. Yes you can optimize this by using WHERE on the primary key, but that can't always be done, especially on a dynamic data-set where data can be deleted. And as Matt mentioned, this is on a query that is used only once for upgrading purposes. If it bugs you that much you could always suggest a change and ask nicely. If you be a jerk about it they'll be less likely to listen to you. If they don't change it and it takes hours to convert PM's when you could get it done faster by changing the query yourself,.. that's your fault for not doing it. As a programmer/system admin I would do this myself and THEN suggest the change. I wouldn't come here with a condescending attitude.

I don't know if you've noticed but IPB has a query debug page that shows them EXPLAIN's for every query. They do pay attention to this during development. If I find a slow query I'll point it out; results can be vastly different from development to production.

cthree · March 4, 2011

An administrative function won't be as efficient as daily SQL and it can take a while for it to process through. MySQL is fairly poor at large offsets and there's not much we can do about that.

Having said that, I believe we can squeeze a bit more out by removing the offset and instead storing the last processed ID and then simply doing SELECT * FROM table WHERE id > {num} LIMIT 100

Thank you sir! I apologize for being 'Barneys Girlfriend'y. I've been upgrading my site since last Friday (7 days) and it has been very rough and I've had very little sleep or rest and have dealt with so much abuse and plain nastiness from some choice people that I guess it has made me a little cranky.

select * from table where id > X LIMIT Y

is not going to be "a little" faster, it will be orders of magnitude faster for even very large datasets. No reason anything should take any longer than it needs to.

In my work work I use automated unit testing to ensure that methods produce expected results. Something like this would likely be caught by a unit test around that method which tests it with 100 rows and with 10 million rows. Testing also makes code more modular because it needs to be broken into testable units. It forces the developer to answers the question "what does this code do?" or rather "what must this code do?". In this case you would probably end up refactoring it into a row iteration class or method that you could reuse everywhere you need to do something like this. unit testing will change what you code, how you code and prove that what you coded does what it is supposed to do. Makes your .0 releases a lot less scary.

I highly recommend you guys look into driving your development with a good unit testing platform like phpUnit.

Thanks again for hearing out my complaint/concern. My feedback was meant to be constructive.

cthree · March 4, 2011

An index is used for sorting, but after that LIMIT is done last. LIMIT is done on the resulting set. The reason why it's slow is because of the large number of rows. Yes you can optimize this by using WHERE on the primary key, but that can't always be done, especially on a dynamic data-set where data can be deleted. And as Matt mentioned, this is on a query that is used only once for upgrading purposes. If it bugs you that much you could always suggest a change and ask nicely. If you be a jerk about it they'll be less likely to listen to you. If they don't change it and it takes hours to convert PM's when you could get it done faster by changing the query yourself,.. that's your fault for not doing it. As a programmer/system admin I would do this myself and THEN suggest the change. I wouldn't come here with a condescending attitude.

I don't know if you've noticed but IPB has a query debug page that shows them EXPLAIN's for every query. They do pay attention to this during development. If I find a slow query I'll point it out; results can be vastly different from development to production.

I've already apologized for being cranky and a little sour and I explained why. It is not my fault for not validating the sanity of the code before running it. That is ridiculous. That is the same as saying that every bug is my fault because I didn't test and debug the software first. I've identified dozens of queries which take an exceptionally long time to the support department. Those are bugs.

Luke · March 4, 2011

I'm not saying they aren't valid concerns. I was just a little put off by the topic title "Learn SQL, please" because it implies they don't know what they're doing. They have been in the business a long time :). You can test a query backwards and forwards, but sometimes it really depends on the data in production. And nobody is perfect :).

Also your apology came after my post, so no need to get defensive.

Five Invision Community 5 features your team will love

Five Invision Community 5 features your members will love

Invision Community 4: SEO, prepare for v5 and dormant account notifications

Invision Community 5: Beta testing and latest updates

Learn SQL, please

Recommended Posts

cthree

Matt

cthree

Joshua Hina

Matt

Matt

cthree

cthree

Matt

Luke

cthree

cthree

Luke

Archived

Recently Browsing 0 members

More