
Erratic server load spikes


Guest MarcusInMd


That is normal. The message I wrote was for the XML condition, which I do not fully understand, but I needed to have it there. You are getting the "$this->ipsclass->lang['search_off']" message. If you want, you can edit that and put your own text in there instead.



Either way, it looks like it is working on your server. You can test it by lowering the server load threshold and making sure it prevents searches.




Thanks. Yes, I tried several load parameters and it works perfectly. ;) Thank you.

We just had a radio program on with over 230 users streaming via SHOUTcast and over 600 on our forum, and we managed to keep the load under 4 the entire time, except for one blip to 16 for about 10 seconds over roughly 2 hours. I think the change I made to MaxClients (going from 32 to 100) made a huge difference, but I will continue to monitor the situation.
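A rough way to reason about a MaxClients value: by Little's law, the number of busy workers is about request rate × average time per request, and with Apache prefork each worker is a separate process, so RAM caps how many can run before the box starts swapping. This is a sketch, not a tuning rule; the thread gives no request rates or per-process memory figures, so every input below (`req_per_sec`, `avg_seconds`, `ram_mb`, `reserved_mb`, `per_process_mb`) is a hypothetical placeholder.

```python
import math

def workers_needed(req_per_sec: float, avg_seconds: float) -> int:
    """Little's law: average concurrency = arrival rate x time in system."""
    return math.ceil(req_per_sec * avg_seconds)

def max_clients_by_memory(ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """How many prefork children fit in the RAM left after other services."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    return max(0, (ram_mb - reserved_mb) // per_process_mb)

def suggest_max_clients(req_per_sec, avg_seconds, ram_mb, reserved_mb, per_process_mb):
    """Return (suggested MaxClients, whether demand exceeds what RAM allows)."""
    need = workers_needed(req_per_sec, avg_seconds)
    cap = max_clients_by_memory(ram_mb, reserved_mb, per_process_mb)
    # Past the memory cap, raising MaxClients further just pushes the box into swap.
    return min(need, cap), need > cap

if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    print(suggest_max_clients(req_per_sec=80, avg_seconds=0.5,
                              ram_mb=4096, reserved_mb=1024, per_process_mb=25))
```

If the second value comes back True, the fix is usually making each request cheaper (or adding memory), not a bigger MaxClients.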


The search function causes serious server load spikes, and it is more likely to do so when the server is already more loaded than normal. I wrote something that disables the search function when the server load exceeds a set threshold, which helps reduce the chance of a slow search leaving a bunch of tables locked (and pushing the server load even higher).



http://forums.invisionpower.com/index.php?showtopic=232336
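The idea behind that mod can be sketched roughly like this: check the 1-minute load average before running a search, and refuse if it is above a threshold. This is a Python illustration, not the PHP code in the linked topic; the function name `search_allowed` and the threshold value are hypothetical, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
from typing import Optional

def search_allowed(threshold: float, load: Optional[float] = None) -> bool:
    """Return False when the 1-minute load average exceeds `threshold`.

    `load` can be passed in for testing; otherwise it is read from the OS.
    """
    if load is None:
        try:
            load = os.getloadavg()[0]  # (1-min, 5-min, 15-min)
        except (OSError, AttributeError):
            # No load average available (e.g. Windows): fail open and allow search.
            return True
    return load <= threshold

if __name__ == "__main__":
    if search_allowed(threshold=8.0):
        print("run the search")
    else:
        print("search temporarily disabled due to high server load")
```

Setting a very low threshold is an easy way to exercise the refusal path, the same test suggested for the real mod.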


Fast Lane, it looks good. There is a similar feature built into IPB which takes the board offline completely if the server load reaches a level you specify in the ACP. It might be worth looking at how IPB measures the load (I seem to remember it handles both Windows and Linux) and outputs its error message. If I remember rightly (in v2.0), it's where IPB does all the session/cookie checking.
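On Linux, a PHP or Python script usually gets these figures from /proc/loadavg, a single line holding the 1-, 5- and 15-minute averages, the number of currently runnable scheduling entities over the total, and the most recently assigned PID. A minimal parser as a sketch (this is not IPB's code, and the sample line in the usage is made up):

```python
from dataclasses import dataclass

@dataclass
class LoadAvg:
    one: float
    five: float
    fifteen: float
    running: int   # runnable scheduling entities right now
    total: int     # total scheduling entities
    last_pid: int

def parse_loadavg(text: str) -> LoadAvg:
    """Parse the single line found in /proc/loadavg."""
    fields = text.split()
    if len(fields) != 5:
        raise ValueError(f"unexpected /proc/loadavg format: {text!r}")
    running, total = fields[3].split("/")
    return LoadAvg(float(fields[0]), float(fields[1]), float(fields[2]),
                   int(running), int(total), int(fields[4]))

def read_loadavg(path: str = "/proc/loadavg") -> LoadAvg:
    """Read and parse the live values (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())

if __name__ == "__main__":
    print(parse_loadavg("0.50 0.40 0.30 2/150 12345"))  # illustrative values
```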


Streaming: that explains why you need considerably more hardware than us poor souls, and it's a major justification for those fast disks too.

Glad my suggestion with MaxClients helped. :)

As for load, "On average, the UNIX load average metrics are certainly not your average average." (from the link I gave a few posts back).
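The point of that quote is that the Linux load average is an exponentially damped moving average, not a plain mean: roughly every 5 seconds the kernel folds the current count of runnable plus uninterruptible tasks into each of the 1-, 5- and 15-minute figures. A floating-point sketch of that update (the kernel itself uses fixed-point arithmetic) shows why a brief burst only moves the 1-minute number partway:

```python
import math

SAMPLE_SECONDS = 5  # Linux samples the task count about every 5 seconds

def decay_factor(window_seconds: float) -> float:
    """exp(-5/60), exp(-5/300), exp(-5/900) for the 1/5/15-minute averages."""
    return math.exp(-SAMPLE_SECONDS / window_seconds)

def simulate_load(samples, window_seconds=60.0, start=0.0):
    """Feed a sequence of task counts (one per 5 s tick) into the damped average."""
    e = decay_factor(window_seconds)
    load = start
    history = []
    for n in samples:
        load = load * e + n * (1.0 - e)
        history.append(load)
    return history

if __name__ == "__main__":
    # One minute (12 ticks) with a steady single runnable task:
    print(round(simulate_load([1] * 12)[-1], 3))  # 0.632, i.e. 1 - 1/e
```

After one minute at a constant count, the 1-minute average has only reached about 63% of that count.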

The radio show is not an everyday thing. With this major snowstorm getting ready to hit the East, we will do a radio show each night at 10:30 until about a day before the storm, or the day of the storm. This storm promises to be a big one and is really pushing the envelope of what we have done and what we can do.

We just broke our record for most users online in a 15-minute period: we hit 739. My projections earlier this year showed us hitting 800-1000 if an event like this hit. We will probably push 1000 before Monday.

If you ran dual Opteron 170s, RAID 10 SCSI 320 15k drives, 8 GB of RAM, PHP 5.2.1, lighttpd 1.4 or 1.5 and a tuned MySQL, I'm sure your load would be pretty stable.

We are still having major issues.

I dropped the post index on ibf_posts last night. We're still having problems, but instead of all the ibf_posts slow queries, the slow query log now shows these:

SELECT COUNT(*) as cnt, MIN(last_post) as min_last_post FROM ibf_topics t WHERE t.forum_id=15 and t.approved=1 AND t.tid NOT IN(0,122833,124309,124710,124830,124985,125021,125125,125162,125156,125134,12516
,125170,125173,125178,125186,125122) AND t.last_post > 0;
# Time: 070211 12:39:05
# User@Host: board_user[board_user] @ localhost []
# Query_time: 3 Lock_time: 0 Rows_sent: 1 Rows_examined: 42255
SELECT COUNT(*) as cnt, MIN(last_post) as min_last_post FROM ibf_topics t WHERE t.forum_id=15 and t.approved=1 AND t.tid NOT IN(0,121116,124710,125012,125021,125071,124309,124985,125173,125178,125170,12518
,125186,124233,125162,125122,125200,125192,125180,125202,125187,125201) AND t.last_post > 0;
# Time: 070211 12:39:07
# User@Host: board_user[board_user] @ localhost []
# Query_time: 3 Lock_time: 1 Rows_sent: 1 Rows_examined: 42255
SELECT COUNT(*) as cnt, MIN(last_post) as min_last_post FROM ibf_topics t WHERE t.forum_id=15 and t.approved=1 AND t.tid NOT IN(0,124982,125125) AND t.last_post > 0;
# User@Host: board_user[board_user] @ localhost []
# Query_time: 2 Lock_time: 0 Rows_sent: 1 Rows_examined: 42255
SELECT COUNT(*) as cnt, MIN(last_post) as min_last_post FROM ibf_topics t WHERE t.forum_id=15 and t.approved=1 AND t.tid NOT IN(0,124233,124710,124830,125071,125156,125021,125134,125162,125173,125178,12518
,125196,125192,125200) AND t.last_post > 1169430621;
# Time: 070211 12:40:18


Should I drop more indexes? Could I have found the root of our problems, only to have more index problems?

Can I remove ALL of the indexes from the tables with one command?

Thanks!


  • 1 month later...

Bump with an update.


Well, we peaked at almost 1200 members online in a 15-minute period shortly after my last post. I managed to keep the server working well with around 900 members online; however, when our servers started getting hit hard with requests for model data, the server load spiked like nobody's business.

We purchased another server, which is now our primary DB server: quad-core Xeon processors (8 cores total on this machine) and RAID 10 SCSI 15k RPM drives (Seagate 75 GB), for a total of 150 GB in the RAID 10 configuration. It runs 64-bit CentOS with 4 GB of RAM.

Anyway, once I moved the data over to this new server, with only about 400 members online the older server's load spiked again (AYE!).

This past week Linux started reporting that one of the SATA 10k drives was having problems, and it has now failed. I am going to replace it at the end of this week, and I am hoping that was the problem. The primary server still hosts images from our forum, and we have thousands of images in one folder because of the way IPB uploads images. I still think this is contributing to the problem.
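If one directory holding thousands of uploads is a suspect, a common approach is to spread files across hashed subdirectories so no single directory grows huge. This is a hypothetical layout: the `shard_path` and `store` helpers and the two-level depth are assumptions, not how IPB stores uploads, and existing image URLs in posts would need rewriting or redirects if files were moved. Whether it helps at all depends on the filesystem, so it is worth measuring first.

```python
import hashlib
import os

def shard_path(root: str, filename: str, depth: int = 2, width: int = 2) -> str:
    """Map a filename to root/ab/cd/filename using an MD5 of the name.

    MD5 is used only to spread names evenly across directories, not for security.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *parts, filename)

def store(root: str, filename: str, data: bytes) -> str:
    """Write `data` to its sharded location, creating directories as needed."""
    path = shard_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With two levels of two hex characters there are 65,536 leaf directories, so even millions of files leave each directory small.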

Disk I/O does not appear to be the problem either. Processes start stacking up before the load goes through the roof, and the primary server's iowait is anywhere between 0 and 5%, which is hardly anything at all. The DB server doesn't even blink when these spikes occur, so I know it's not the MySQL server.
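With high load but low iowait, one quick check is whether the stacked-up processes are runnable (R, waiting for CPU) or in uninterruptible sleep (D, usually waiting on a disk or network filesystem), since both count toward the Linux load average. A small sketch that tallies `ps -eo stat,comm` output by command and state; the sample text in the test is made up.

```python
import subprocess
from collections import Counter

def tally_states(ps_output: str) -> Counter:
    """Count (command, state letter) pairs from `ps -eo stat,comm` output."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        stat, comm = fields
        counts[(comm.strip(), stat[0])] += 1  # first letter is the process state
    return counts

def snapshot() -> Counter:
    """Run ps on this machine (Linux/procps) and tally the result."""
    out = subprocess.run(["ps", "-eo", "stat,comm"], capture_output=True,
                         text=True, check=True).stdout
    return tally_states(out)
```

Taking a few snapshots as a spike builds would show whether the Apache (httpd) children are piling up in R or in D.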

I may have to pull this server out of production and replace it with something faster and more modern. I don't want to, considering I just spent 4K on the new DB server.


http://www.ipsbeyond.com/forums/index.php?showtopic=21779

1,200 is a lot, but I'm pretty sure I could handle that on one box. If I total up the users online per x minutes across each of my IPB forums, it's around 200, with almost no PHP/MySQL load (it's all torrents right now).

I could create a solution with a cluster of MySQL and web servers, but you ditched the idea of lighttpd because of your Plesk setup. If you have more than one server, I recommend trying lighttpd and having forums.yourdomain (or /forums, or whatever) hosted by the lighttpd server.

when our servers started getting hit hard with requests for model data



What's that?

Anyway, post the output of top from when your server starts to spike.

Weather model data comes in at set intervals throughout the day. When a model set is coming out, members post links and upload images of the model data.

Sometimes we have 500 to 1000 members in one thread all hitting refresh at the same time.

Our forum is unlike all others in this way.

The amount of traffic we see in a minute probably crushes most other forums.


Top is filled with Apache processes when the server load spikes.


1200 members is a lot, but 400 is not, especially when we just added a new state-of-the-art DB server.
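When hundreds of readers refresh the same thread at once, a common mitigation is to cache the rendered page (for guests, at least) for a few seconds, so many identical requests cost one render. This is a hypothetical sketch, not an IPB feature: `MicroCache` and `render` are illustrative names, the clock is injectable for testing, and in a real multi-process server concurrent misses could still render in parallel, which this single-threaded sketch ignores.

```python
import time
from typing import Callable, Dict, Tuple

class MicroCache:
    """Cache rendered pages for `ttl` seconds, keyed by URL."""

    def __init__(self, ttl: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh copy: skip PHP/MySQL work entirely
        page = render()    # miss or stale: render once and remember it
        self._store[key] = (now, page)
        return page
```

Even a 5-second TTL turns 500 refreshes of one thread within that window into a single render, at the cost of readers seeing new posts a few seconds late.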


Could you PM me a link to your site so I could see for myself?

Also, how does your top look before and after the spikes? With that I could most likely help.

top - 20:19:07 up 26 days, 15:10,  5 users,  load average: 0.14, 0.18, 0.17
Tasks: 189 total,   1 running, 188 sleeping,   0 stopped,   0 zombie
Cpu0  :  5.7% us,  1.0% sy,  0.0% ni, 89.0% id,  4.3% wa,  0.0% hi,  0.0% si
Cpu1  :  5.0% us,  1.0% sy,  0.0% ni, 93.0% id,  0.3% wa,  0.3% hi,  0.3% si
Mem:   1024388k total,   999996k used,    24392k free,     4520k buffers
Swap:  1052248k total,   216140k used,   836108k free,   630716k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
26907 lighttpd  16   0  196m  45m  42m S    9  4.5   0:11.02 php
26735 fenix     15   0  161m  44m 2696 S    2  4.5   0:55.42 python
11615 lighttpd  15   0 33992  10m  760 S    1  1.0   6:43.34 lighttpd
 3524 mysql     15   0  180m  95m 2952 S    1  9.6 143:36.41 mysqld

That's about 50 req/s or so, I'm guessing, with 10-20 req/s to the actual forums (I hit F5 on my site roughly twice per second for a bit). There are usually 2 req/s to an IPB index.php page on my box at this time.

From the history of this, it looks as though you need a more detailed analysis of what people are doing when the problems occur. The problem does not seem to be related to the number of users online (and therefore, perhaps, not to the flavour of Apache, .CMANNS.). As you say above, you've coped with 1200 and managed OK with 900, but seen it all go Pete Tong with only 400 after moving the DB load off.

The primary server still hosts images from our forum and we have 1000s of images in one folder because of the way IPB uploaded images. I still think this is contributing to the cause.



Disk I/O does not appear to be the problem here either. Processes start stacking up before the load goes through the roof, and the primary server's iowait is anywhere between 0 and 5%, which is hardly anything at all. And the DB server is not even blinking when these spikes occur, so I know it's not the MySQL server.



So you think it's possible that the server is taking too long just to find the different images it wants to serve? Getting them from the disk is fine because the iowait is low? Or is it a case of lots of people all wanting the same image in a directory that contains thousands?

What filesystem are you using?

When you have the spikes, is it lots of people all wanting the same few images?

I may have to pull this server out of production and replace it with something faster and more modern now. Don't want to do it, considering I just spent 4K on the new DB server.



Not sure you need to spend anything at all? Do some root-cause analysis of the bottleneck, fix that, then move everything onto your shiny new server?
I have been trying to figure this out for two years now. I don't think it's related to the number of people online either, but to what they are all doing when the problems occur.

This is hard to diagnose now because we won't see traffic like this until a hurricane threatens the US, or next fall/winter when a winter storm threatens.

The filesystem is XFS, I believe. The OS is Fedora Core 4, though if I can offload this server's work to another machine, I will switch this one over to 64-bit CentOS too.


I honestly cannot pinpoint what the users are doing when problems occur. My only guess is that they start refreshing quickly when new model data arrives, to see if anyone has posted analysis, images, etc.

Most of those 1200 people DON'T post; they just read. I would say only about 100 or so are posting.

Let me give you an example: in one 15-minute period we had close to 1200 people on, and the server load went to hell quickly. I shut Apache down, people left, and I was able to stabilize things OK with around 900 people. But by then the model data was done, and I guess those 900 hung around, just occasionally hitting refresh.

During the snow/sleet storm we had about 700 to 900 on continuously without server load issues. In fact, the server load rarely got above 3.5, sitting around 1 to 2 most of the time.

I have looked at cron jobs, other things running on the server at the time, etc. Nothing has jumped out. Before I moved the data over to the new DB server, our slow query log was filling up fast, as you can see from my previous posts.

Here are some of the more recent entries (during the February storm):


SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  4:10:03

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 8  Rows_examined: 24

SELECT p.*,

				m.id,m.name,m.mgroup,m.email,m.joined,m.posts, m.last_visit, m.last_activity,m.login_anonymous,m.title,m.hide_email, m.warn_level, m.warn_lastwarn,

				me.msnname,me.aim_name,me.icq_number,me.signature, me.website,me.yahoo,me.location, me.avatar_location, me.avatar_type, me.avatar_size, m.members_display_name

				FROM ibf_posts p

				  LEFT JOIN ibf_members m ON (p.author_id=m.id)

				  LEFT JOIN ibf_member_extra me ON (me.id=m.id)

				WHERE p.pid IN(1754100,1754300,1754534,1754694,1754716,1754728,1754737,1755408) ORDER BY pid asc;

# Time: 070212  6:41:02

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42441

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  7:11:31

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 5  Rows_examined: 5

SELECT pid,topic_id FROM ibf_posts WHERE topic_id=124919 and queued=0 ORDER BY pid asc LIMIT 0,40;

# Time: 070212  7:49:07

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42443

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42443

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  7:49:08

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42443

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  7:56:00

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 1  Rows_examined: 42404

SELECT tid, title, last_poster_id, last_poster_name, last_post FROM ibf_topics WHERE approved=1 and forum_id=15 ORDER BY last_post DESC LIMIT 0,1;

# Time: 070212  8:02:09

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 1  Rows_examined: 42405

SELECT COUNT(*) as cnt, MIN(last_post) as min_last_post FROM ibf_topics t WHERE t.forum_id=15 and t.approved=1 AND t.tid NOT IN(0,125389) AND t.last_post > 0;

# Time: 070212  8:03:05

# User@Host: board_user[board_user] @ localhost []

# Query_time: 4  Lock_time: 0  Rows_sent: 1  Rows_examined: 2887

SELECT COUNT(DISTINCT(p.topic_id)) as max FROM ibf_topics t

				  LEFT JOIN ibf_posts p ON (p.topic_id=t.tid)

				 WHERE t.forum_id=15 AND p.author_id=4709 AND p.new_topic=0

				 and t.approved=1;

# Time: 070212  8:03:08

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 1  Rows_examined: 42405

SELECT COUNT(*) as cnt, MIN(last_post) as min_last_post FROM ibf_topics t WHERE t.forum_id=15 and t.approved=1 AND t.tid NOT IN(0,125021,124705,125389,125341,125399,125421,125431,125426,125437,125439,1254

8,125444) AND t.last_post > 0;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42445

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:03:09

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 150  Rows_examined: 42555

SELECT * FROM ibf_topics WHERE approved=1 and forum_id=15 ORDER BY pinned desc, last_post desc LIMIT 0,150;

# Time: 070212  8:03:41

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42445

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42445

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42445

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42445

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:11:31

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42446

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:11:32

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42446

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42446

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42446

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:11:35

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42446

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42446

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42446

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:11:36

# User@Host: board_user[board_user] @ localhost []

# Query_time: 3  Lock_time: 0  Rows_sent: 40  Rows_examined: 42446

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:20:57

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42447

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:26:41

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42447

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:46:29

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:49:05

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:49:28

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:52:02

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:55:31

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:56:40

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 1  Rows_examined: 42409

SELECT tid FROM ibf_topics WHERE forum_id=15 AND approved=1 AND state <> 'link' AND last_post < 1171242227 ORDER BY last_post DESC LIMIT 0,1;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:58:13

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42488

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 40,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  8:58:14

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42448

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved IN (0,1) ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  9:03:04

# User@Host: board_user[board_user] @ localhost []

# Query_time: 3  Lock_time: 0  Rows_sent: 1  Rows_examined: 42409

SELECT COUNT(*) as cnt, MIN(last_post) as min_last_post FROM ibf_topics t WHERE t.forum_id=15 and t.approved=1 AND t.tid NOT IN(0,124233) AND t.last_post > 0;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 3  Lock_time: 0  Rows_sent: 40  Rows_examined: 43956

SELECT * FROM ibf_topics t WHERE t.forum_id=6 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  9:03:05

# User@Host: board_user[board_user] @ localhost []

# Query_time: 3  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 3  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  9:10:05

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 150  Rows_examined: 46459

SELECT * FROM ibf_topics WHERE approved=1 and forum_id=15 ORDER BY pinned desc, last_post desc LIMIT 3900,150;

# Time: 070212  9:10:13

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  9:15:22

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  9:15:28

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  9:15:29

# User@Host: board_user[board_user] @ localhost []

# Query_time: 2  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# Time: 070212  9:15:30

# User@Host: board_user[board_user] @ localhost []

# Query_time: 3  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 3  Lock_time: 0  Rows_sent: 40  Rows_examined: 42449

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 0,40;

# User@Host: board_user[board_user] @ localhost []

# Query_time: 3  Lock_time: 0  Rows_sent: 40  Rows_examined: 57529

SELECT * FROM ibf_topics t WHERE t.forum_id=15 AND t.pinned IN (0,1) and t.approved=1 ORDER BY t.pinned DESC,  t.last_post DESC LIMIT 15080,40;

# Time: 070212  9:21:39
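Nearly every entry above is the forum-listing query, with Rows_examined around 42,400 to send 40 rows, which suggests the server reads every topic in forum 15 and sorts them instead of walking an index in pinned, last_post order. One way to check whether a composite index changes that is to compare query plans. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in because it runs anywhere; MySQL's planner is different, so on the real server you would compare MySQL's own EXPLAIN output instead. The cut-down table and the index name and column order (`idx_forum_listing` on forum_id, approved, pinned, last_post) are assumptions, not IPB's schema.

```python
import sqlite3

# The listing query from the 8:03:09 entry above.
LISTING = ("SELECT * FROM ibf_topics WHERE approved=1 AND forum_id=15 "
           "ORDER BY pinned DESC, last_post DESC LIMIT 0,40")

def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query plan as one string (the 'detail' column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    # Cut-down, hypothetical version of the topics table.
    conn.execute("CREATE TABLE ibf_topics (tid INTEGER PRIMARY KEY, forum_id INT,"
                 " approved INT, pinned INT, last_post INT, title TEXT)")
    before = plan(conn, LISTING)
    conn.execute("CREATE INDEX idx_forum_listing ON ibf_topics"
                 " (forum_id, approved, pinned, last_post)")
    after = plan(conn, LISTING)
    conn.close()
    return before, after

if __name__ == "__main__":
    before, after = demo()
    print("without index:", before)  # full scan plus a temporary sort
    print("with index:   ", after)   # index search, no separate sort step
```

Even with a suitable index, a deep offset such as the LIMIT 15080,40 entry above still makes the server step over all the skipped rows.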


Marcus, it sounds like you're wasting your money beyond belief.

That's seriously way too much power for an Invision board.

test2:/usr/local/bin # ./http_load -rate 10 -seconds 100 url.txt
996 fetches, 7 max parallel, 1.18503e+08 bytes, in 100 seconds
118979 mean bytes/connection
9.95997 fetches/sec, 1.18503e+06 bytes/sec
msecs/connect: 75.8775 mean, 9042.77 max, 40.096 min
msecs/first-response: 138.342 mean, 548.575 max, 125.362 min
HTTP response codes:
  code 200 -- 996
test2:/usr/local/bin #

That was around 10 req/s to a rather large topic on my site, and it had no adverse effects on the server besides:

top - 15:48:10 up 27 days, 10:39,  1 user,  load average: 0.55, 2.71, 3.38
Tasks: 181 total,   2 running, 179 sleeping,   0 stopped,   0 zombie
Cpu0  : 44.3% us, 15.7% sy,  0.0% ni, 39.3% id,  0.7% wa,  0.0% hi,  0.0% si
Cpu1  : 29.6% us,  9.3% sy,  0.0% ni, 56.8% id,  1.3% wa,  0.3% hi,  2.7% si
Mem:   1024388k total,  1002692k used,    21696k free,     4232k buffers
Swap:  1052248k total,   192692k used,   859556k free,   692588k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31877 lighttpd  17   0  198m  13m 8104 R   49  1.3   0:01.47 php
31849 lighttpd  16   0  196m  68m  65m S   11  6.8   0:18.74 php
27129 fenix     15   0  242m  74m 2072 S    5  7.4   9:14.92 python
31848 lighttpd  17   0  195m  47m  45m S    2  4.8   0:04.72 php
23468 fenix     15   0  163m  36m 1500 S    2  3.7  54:54.15 srcds_i686
29803 mysql     15   0  171m  38m 2408 S    2  3.9   1:42.34 mysqld
31797 lighttpd  16   0 27480 3900  880 S    1  0.4   0:02.41 lighttpd
Both servers are running on 100 Mbit with multiple carriers. This was performed while other sites were being posted on and such. If you'd like to try this, just ask.
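For anyone who wants to run a similar fixed-rate test without http_load (a separate C tool), here is a rough Python stand-in. It only mimics the `-rate`/`-seconds` mode. It starts requests on a fixed schedule and lets slow responses overlap, and then it summarizes fetches, bytes and status codes. This is a sketch under assumptions, not a replacement for http_load. The names `run_load` and `fetcher` are hypothetical. Errors are not handled: urlopen raises on timeouts and on non-2xx codes, and that aborts the run.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    """Fetch one URL and return (HTTP status, body length in bytes)."""
    with urlopen(url, timeout=30) as resp:
        return resp.status, len(resp.read())

def run_load(url, rate, seconds, fetcher=fetch, max_workers=50):
    """Start `rate` requests per second for `seconds` seconds and summarize.
    Requests overlap if the server is slow; beyond `max_workers` in flight,
    extra requests wait inside the pool (a limit of this sketch)."""
    interval = 1.0 / rate
    start = time.monotonic()
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i in range(int(rate * seconds)):
            # Request i is due at start + i * interval, regardless of how
            # long earlier requests take, so the offered rate stays fixed.
            delay = start + i * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(fetcher, url))
        results = [f.result() for f in futures]  # re-raises any fetch error
    elapsed = time.monotonic() - start
    codes = {}
    for status, _ in results:
        codes[status] = codes.get(status, 0) + 1
    total_bytes = sum(size for _, size in results)
    return {
        "fetches": len(results),
        "bytes": total_bytes,
        "mean_bytes": total_bytes / len(results) if results else 0,
        "fetches_per_sec": len(results) / elapsed,
        "codes": codes,
    }
```

For example, `run_load("http://your-forum/index.php?showtopic=...", rate=10, seconds=100)` would offer roughly the same 10 req/s as the http_load run above. Fill in a real topic URL of your own.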


Well, based on what I have read and been told, what we have is only enough horsepower to get us through about 1,400 to 1,600 users at peak times, so I am not sure how you are calculating our system requirements.

If I ever get this problem resolved, then when we need it I will begin clustering the front end to load-balance the web server.

I have been trying to figure this out for two years now. I don't think it's related to the number of people online, either, but to what they are all doing when the problems occur.

This is hard to diagnose now because we won't see traffic like this until a hurricane threatens the US, or next fall/winter when a winter storm threatens.

The filesystem is XFS, I believe. The OS is Fedora Core 4, though if I can offload this server to another one I will switch this one over to 64-bit CentOS too.

I honestly cannot pinpoint what the users are doing when problems occur. The only thing I can guess is that they start refreshing quickly when new model data arrives, to see if anyone has posted analysis/images etc.

Most of those 1,200 people DON'T post; they just read. I would say about 100 or so are posting.

Let me give you an example: in a 15-minute period we had close to 1,200 people on, and the server load went to hell quickly. I shut Apache down, people left, and I was able to stabilize things OK with around 900 people. But by then the model data was done, and I guess those 900 hung around, just occasionally hitting refresh etc.

During the snow/sleet storm, we had about 700 to 900 on continuously without server load issues. In fact, the server load rarely got above 3.5, sitting around 1 to 2 most of the time.

I have looked at cron jobs, other things running on the server at the time, etc. Nothing has jumped out. Before I moved the data over to the new DB server, our slow query log was filling up fast, as you can see from my previous posts.

Here are some of the more recent entries (during the Feb storm):

<snippity snip>



Now that you've isolated the DB onto its own server and know the DB server load is very low, I think perhaps you need to focus elsewhere.

A simple way to think of server load is the number of *NIX processes that are running or waiting to run (on Linux this also counts processes blocked in uninterruptible disk I/O), averaged over the last 1, 5 and 15 minutes. In my opinion, from your description, *something* is taking a long time to execute, causing processes to queue up. I would imagine that these are calls by PHP to the OS kernel, or to MySQL. This *might* be because the IPB code is making inefficient function calls for file access (for example), or because, once the calls are made, the kernel isn't caching frequently accessed files very efficiently.
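To make the "averaged over time" part concrete, here is a floating-point sketch of how the Linux kernel maintains its three load averages. The kernel samples about every 5 seconds and uses fixed-point arithmetic. This is a simplification, not kernel code. Each sample moves each average part of the way toward the current count of runnable plus uninterruptible tasks, with a decay factor of e^(-5/window). The names `update` and `simulate` are hypothetical.

```python
import math

SAMPLE_INTERVAL = 5.0            # the kernel samples roughly every 5 seconds
WINDOWS = (60.0, 300.0, 900.0)   # 1-, 5- and 15-minute averages

def update(loads, active):
    """One sampling step: move each average toward `active`, the number of
    tasks currently runnable or in uninterruptible (D-state) sleep."""
    new = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-SAMPLE_INTERVAL / window)
        new.append(load * decay + active * (1.0 - decay))
    return tuple(new)

def simulate(active_counts, start=(0.0, 0.0, 0.0)):
    """Apply update() once per 5-second sample in `active_counts`."""
    loads = start
    for active in active_counts:
        loads = update(loads, active)
    return loads
```

For example, `simulate([40, 40])` models 40 tasks piling up for about 10 seconds on an otherwise idle box. It gives a 1-minute load of about 6.1 (40 × (1 - e^(-1/6))), while the 15-minute figure stays under 0.5. A sustained count converges to that count in all three figures. This is also why, after a burst ends, the 1-minute number drops first and the 15-minute number lags behind.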

It seems it's not the SQL (the SQL problems you saw before were possibly an effect rather than a cause), and you know it's not server thrashing caused by excessive I/O waits.

The next time you see a major spike in server load, it would be worth capturing the server access and error logs to see what exactly users were trying to do, then walk those URL requests through to see where the lag is.
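One way to follow that advice is sketched below in Python. It assumes the access log is in Apache's common/combined format, with timestamps like `[12/Feb/2007:09:15:28 -0500]` and a request field of the form `"METHOD path PROTOCOL"`. It counts which URLs were hit most inside the spike window. The query string is kept, since most IPB pages are index.php with different query strings. The function name `top_requests` is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# host ident user [timestamp] "METHOD path PROTOCOL" status size ...
LINE = re.compile(r'\S+ \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

def top_requests(lines, start, end, n=10):
    """Return the n most requested paths (query string included) logged
    between the timezone-aware datetimes `start` and `end`."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        stamp = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if start <= stamp <= end:
            counts[m.group(3)] += 1
    return counts.most_common(n)
```

Feeding it the log lines from a spike window, for example `top_requests(open(path), start, end)`, should show quickly whether searches, one hot topic, or something else dominates.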

Please express your opinions in an adult manner. We do not tolerate profanity here.


Just getting a little sick of hearing he's got a load problem and wants to dump more money into it, when it looks like it's obviously Apache.

Oh well, I give up. Some people have too much money, I guess.

In an earlier post, you said you were running 2.1.7. Have you upgraded to 2.2.2 yet?

If not, do so. One of the major advances in 2.2.2 is optimized search code. For example, say someone on 2.1.7 searches for the term "[their city] weather". How many instances of the word "weather" would you guess are on your site? IPB will search the database for them ALL, and then display only the first 1,000 results. In 2.2.2, the query stops once 1,000 results are found.
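The difference described above can be shown with a toy Python model. It is not IPB's actual PHP/SQL code, and the post list and function names are hypothetical. "Find everything, then keep 1,000" examines every post. "Stop at 1,000" examines only as many posts as it takes to find 1,000 matches. Whether a real database can stop early depends on the query. An ORDER BY over all matches, for instance, still has to find every match first.

```python
from itertools import islice

class Scan:
    """Walks a list of posts and records how many it has examined."""
    def __init__(self, posts):
        self.posts = posts
        self.examined = 0

    def matches(self, term):
        for post in self.posts:
            self.examined += 1
            if term in post:
                yield post

def search_all_then_truncate(posts, term, limit):
    """Old behaviour in this model: collect EVERY match, then keep `limit`."""
    scan = Scan(posts)
    results = list(scan.matches(term))[:limit]
    return results, scan.examined

def search_stop_at_limit(posts, term, limit):
    """New behaviour in this model: stop as soon as `limit` matches are found."""
    scan = Scan(posts)
    results = list(islice(scan.matches(term), limit))
    return results, scan.examined
```

With 10,000 posts where every other one mentions "weather", both functions return the same first 1,000 matches. The first examines all 10,000 posts, and the second stops after 1,999.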

This is a HUGE issue with any site that has ~1,000,000+ posts. My site used to take 20+ seconds to search a common term. Since upgrading to 2.2.2, it's fractions of a second. There are also other improvements in 2.2.2 for large sites.

You also might want to consider offloading the search from MySQL and using Sphinx.

Finally, sorry if I missed it, but do you run a PHP cache program (MMCache, eAccelerator, APC)? If not, that will increase your PHP efficiency, and likely cut your loads in half.
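These PHP accelerators cache the compiled form of each script, so PHP does not re-parse and recompile the source on every request. The Python snippet below is only an analogy for that core idea. It compiles a file once and reuses the code object until the file's modification time changes. It is not how MMCache, eAccelerator or APC are implemented, and `load_compiled` is a hypothetical name.

```python
import os

# path -> (modification time, compiled code object)
_compiled = {}

def load_compiled(path):
    """Return a compiled code object for `path`, recompiling only when the
    file's modification time changes (the core idea behind an opcode cache)."""
    mtime = os.path.getmtime(path)
    cached = _compiled.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]                      # cache hit: skip parsing/compiling
    with open(path, encoding="utf-8") as f:
        code = compile(f.read(), path, "exec")
    _compiled[path] = (mtime, code)
    return code
```

On a busy board the same handful of scripts run thousands of times a minute, so skipping the compile step on every hit is where the savings come from.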

P.S. you have WAY too much server, don't invest any more in hardware.


Exactly, Blair.

Here's what I think the problem is:

You're seeing the web server's load, of course, and you think MySQL is the problem. Open two SSH windows, one for each server, and watch the loads. I bet the MySQL box is running below 1 and the web server box is running at 1 and higher.

Why? Because Apache can't handle such a high request rate without a very powerful server. You also have PHP on the same server, so Apache slows PHP down, and then PHP either crashes OR builds up until the load is well into double digits.
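The "builds up" effect can be sketched with a toy queueing model. It uses a fixed pool of workers (for example, Apache child processes capped by MaxClients) serving requests that arrive at a steady rate. The numbers and the function name `max_wait` are made up for illustration. While workers / service_time exceeds the arrival rate, nothing waits. Once each request takes longer than that, waits keep growing for as long as the traffic lasts.

```python
import heapq

def max_wait(arrival_rate, workers, service_time, seconds):
    """Toy model: one request every 1/arrival_rate seconds for `seconds` seconds,
    each holding one of `workers` for `service_time` seconds. Returns the longest
    time any request had to wait for a free worker."""
    free_at = [0.0] * workers          # when each worker next becomes free
    heapq.heapify(free_at)
    longest = 0.0
    for i in range(int(arrival_rate * seconds)):
        arrive = i / arrival_rate
        start = max(arrive, heapq.heappop(free_at))  # take the earliest-free worker
        longest = max(longest, start - arrive)
        heapq.heappush(free_at, start + service_time)
    return longest
```

With 10 requests/s and 32 workers, 2-second requests never wait (the pool's capacity is 16 requests/s). At 4 seconds per request, the capacity drops to 8 requests/s, and after one minute the longest wait is about 14.4 seconds and still climbing. That is one way a modest slowdown per request turns into a pile-up.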

You can fix this by running a lighter web server. lighttpd is the EASIEST web server to set up, so I suggested it, and you can run lighttpd for JUST your forums, since you said you need Apache because of Plesk.

If you have so much money/server power, why run Plesk? Or why run this site on the Plesk server?


Archived

This topic is now archived and is closed to further replies.
