Fast Lane! Posted September 9, 2009 Posted September 9, 2009 The main issue I found (after exhaustive log digesting and research) is that the "Find all member posts" mysql statement is SLOW as heck with large post tables (large forums with lots of posts). It also is even slower when run on members that have LOTS of posts. Despite indexing it is SLOW... like five minutes slow sometimes. The change I made below is similar to other methods I suggested (and IPB integrated into the mainline code as large board optimizations) previously. If the place I made the change is not "preferred" then I am sure another location can be found... So what did I do? Basically you can restrict (in this case 90 days) the amount of posts to find back in time. Because the "post_date" column is indexed the search goes from slow to SUPER FAST because it allows a large reduction in the data set. @IPB: Since you support 2.x still any chance to get this into a baseline (or in 3.x)? I am holding out myself for a few more minor rev's on 3.x to shake out most the early bugs/security issues any new release has. forums/sources/sql/mysql_quesries.php was: code execution time is now way less (from 5 minutes sometimes on my board to seconds). function search_get_all_user_count( $a ) { return "SELECT count(*) as count FROM ".SQL_PREFIX."posts p LEFT JOIN ".SQL_PREFIX."topics t ON (t.tid=p.topic_id) WHERE t.approved=1 AND t.forum_id IN({$a['forums']}) AND p.queued=0 AND p.author_id={$a['mid']}"; } is: function search_get_all_user_count( $a ) { $time_restrict = "p.post_date > " . ( time() - 60 * 60 * 24 * 90 ) . ""; return "SELECT count(*) as count FROM ".SQL_PREFIX."posts p LEFT JOIN ".SQL_PREFIX."topics t ON (t.tid=p.topic_id) WHERE t.approved=1 AND t.forum_id IN({$a['forums']}) AND p.queued=0 AND p.author_id={$a['mid']} AND {$time_restrict}"; }
Fast Lane! Posted September 11, 2009 Author Posted September 11, 2009 For the record you need to also edit this function (just realized this needs posted sorry): The speed up is real and is FAST! You can replace the "30" day limit to anything you want. Long is slower but if you have a database spanning several years this will improve the MYSQL query time dramatically if you limit it to any fraction of that. function search_get_all_user_query( $a ) { $time_restrict = "p.post_date > " . ( time() - 60 * 60 * 24 * 30 ) . ""; return "SELECT p.*, t.*, t.posts as topic_posts, t.title as topic_title, m.*, me.*, pp.* FROM ".SQL_PREFIX."posts p LEFT JOIN ".SQL_PREFIX."topics t ON (t.tid=p.topic_id) LEFT JOIN ".SQL_PREFIX."members m ON (m.id=p.author_id) LEFT JOIN ".SQL_PREFIX."member_extra me ON (me.id=p.author_id) LEFT JOIN ".SQL_PREFIX."profile_portal pp ON (m.id=pp.pp_member_id) WHERE t.approved=1 AND t.forum_id IN({$a['forums']}) AND p.queued=0 AND p.author_id={$a['mid']} AND {$time_restrict} ORDER BY post_date DESC"; }
Fast Lane! Posted September 11, 2009 Author Posted September 11, 2009 @IPB Staff, can you please consider adding this as a large board speed option in the ACP. It REALLY reduced the server spikes on my server which is a RAID 10 box with dual quad core xeons. If I was affected and needed this then I know other people are too.
bfarber Posted September 11, 2009 Posted September 11, 2009 I remember implementing something similar in 2.x based on recommendations - only to get my head chewed off by sites that aren't quite as active. Suddenly there were no posts on the portal, or only 1 or 2 (because it was limiting to 30 days). No posts in user profiles because they hadn't posted in over 30 days. etc. We could look into making it a setting at some point in the future perhaps.
Fast Lane! Posted September 12, 2009 Author Posted September 12, 2009 Awesome. A quick flag in the ACP to turn on (with date limit) would work. Even a flag in the source code with a variable time limit (that people can adjust) would work. I understand lesser active boards do not need this but for the larger ones the MYSQL query "as is" is pretty intensive when sorting through years of posts in a (in my case) > 5 GB table.
Fast Lane! Posted September 12, 2009 Author Posted September 12, 2009 I just thought of another improvement that may make this "more smart". I will probably code this up for my board shortly... I will go and first query the "last active" date which is a trivial query (work and time wise). I will then back date the "30" or XX days prior to that. This way members that have not been active in a while (and have the find all posts query run on them) will still get results returned. I will report my results EDIT: Also the "find my last 10 posts" query can have the time limit portion added too. It basically right now finds ALL your posts and then performs a sort and limit. Reducing the query using the post_date index should help as well.
Management Matt Posted September 14, 2009 Management Posted September 14, 2009 I'm actually currently implementing it with a default value of 365 but I'm doing it like this: It'll grab the user's last post (ibf_members.last_post) and then go back X days (last_post - ( 86400 * setting ). If there is no last post stored, it just uses the current timestamp to fire off the most efficient query. As expected, it didn't make any different locally. Will be interesting to see what it does on the company forums. This will be in 3.0.4
bfarber Posted September 14, 2009 Posted September 14, 2009 Just to add since Matt didn't specify, I believe there's going to be a setting to control the cut off.
Fast Lane! Posted September 16, 2009 Author Posted September 16, 2009 One of the things I noticed too (and I changed on my board for speed) was that members with larger post counts took longer to run this series of queries on. In 2.3.x (not sure on 3.x) The order of queries is, 1) Select count(*) as count, 2) Select data (with joins) then sort based on post date (desc) and then limit xx,25 (where xx is the page you want to view (0 page 1, 25 page 2, ...). I noticed that the select count(*) statement for users that have a lot of posts took a LONG time compared to the regular select/join/order/limit statement after it. The "count" is only required so you can calculate the number of "pages" to show. I thought there was more to it but that is it. It was such a resource intensive step for some of my members that to basically double the speed of this query I eliminated this query and hard coded 250 posts (which has the effect of populating 10 page counters even if there are not enough posts). I post a line in the search results saying "Last 250 posts are shown. Some pages may be blank if the user has less than 250 posts in the last xx days." Long story, I am not sure my method is really a good one. I would love a better idea. The important part is the realization that the "count" query was up to 50% of the total query time overall. Sometimes more! This simply to get the page count. There must be a more efficient way? If so we can potentially save 50% of the total query time which on a large board causes massive table locks on the post table (effectively taking the message forum offline as queries on the post table queue up on the locked table). Thoughts? I see some potential here for big gains. My only other thought was simply query on the user post count but for many reasons that is not always accurate over time.
Recommended Posts
Archived
This topic is now archived and is closed to further replies.