Fcgid spawning new processes causes brief timeouts


It took me a while and a lot of trial and error, but I think I have almost optimized my VPS. The only annoying issue I have at the moment is with Apache and fcgid.

From what I understand, the number of fcgid processes started depends on the number of requests currently being handled. Normally 3 processes are enough for my website, but from time to time the count jumps to 6. I have allowed a maximum of 8. I suspect (not sure, though) it happens when a search bot (possibly Google) makes a lot of requests at once. When that happens, the load on the server jumps a lot and the website is inaccessible for about 1 minute, giving 500 Internal Server Error responses to visitors. Shortly after that the number of requests drops, the extra fcgid processes quit, and everything is back to normal. This happens about 1-2 times a day. Not a big deal, but still annoying.

I think limiting the maximum number of fcgid processes to 3 is not really an option, as it could block legitimate users. On the other hand, it seems that spawning more than 5-6 for a brief period of time completely kills my server, so I have no idea how to resolve the issue.

My VPS has 1 GB RAM and 2 CPU cores @ 3.4 GHz. I have plenty of RAM left; the problem seems to be the CPU, and only when the processes are spawning. Any suggestions?

Here is the output from my server-status page:

  Current Time: Friday, 30-May-2014 16:10:22 EEST
   Restart Time: Friday, 30-May-2014 12:23:14 EEST
   Parent Server Generation: 0
   Server uptime: 3 hours 47 minutes 7 seconds
   Total accesses: 31983 - Total Traffic: 340.4 MB
   CPU Usage: u.22 s.01 cu0 cs0 - .00169% CPU load
   2.35 requests/sec - 25.6 kB/second - 10.9 kB/request
   9 requests currently being processed, 0 idle workers

WRW..W...L.R...WR.R.............................................
................................................................
................................................................
................................................................


 Total FastCGI processes: 8
     _______________________________________________________________________________________________________________________________

   Process: php5.fcgi  (/home/.../fcgi-bin/php5.fcgi)

    Pid  Active Idle Accesses  State
   17158 482    47   33       Ready
   17168 477    47   19       Ready
   17182 476    48   30       Ready
   17108 518    21   48       Ready
   17156 482    21   14       Ready
   17159 482    2    28       Working
   17205 473    3    28       Working
   17157 482    13   40       Working


And this is the top output. I think the CPU is currently spiking because of the many processes:

top - 16:15:15 up 9 days, 15:59,  1 user,  load average: 1.00, 1.93, 1.70
Tasks:  49 total,   1 running,  48 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3%us,  0.7%sy,  0.0%ni, 73.4%id, 22.9%wa,  0.0%hi,  0.0%si,  0.7%st
Mem:   1048576k total,   404612k used,   643964k free,        0k buffers
Swap:  1048576k total,    88536k used,   960040k free,   128496k cached

I recommend dropping fcgid and setting up PHP-FPM to start, if you can. PHP-FPM is superior to standard FastCGI in almost every way. I've never personally used Apache with PHP-FPM, but I know it's possible.

Then I would just set up a static number of worker processes. I would say around 5-10 should generally be fine, depending on how much traffic you have.

In the worst-case scenario, regardless of which you use, your clients shouldn't receive a 503 error unless you exceed your maximum queue/backlog. With mod_fastcgi, this can be set with the -listen-queue-depth flag (as far as I know, mod_fcgid itself has no direct equivalent). With PHP-FPM, it's set with the listen.backlog directive. If you have 3 processes spawned and your server receives 5 connection requests at the exact same time, for example, the last two requests should normally be put on a queue while the server waits for the other 3 requests to finish.
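
To make the queueing example concrete, here is a minimal Python sketch. It is not fcgid or PHP-FPM code; the function name `admit` and its parameters are made up for illustration, and it only models a single burst of simultaneous requests hitting a fixed number of workers with a bounded backlog.

```python
def admit(burst, workers, backlog):
    """Split a burst of simultaneous requests into (running, queued, rejected).

    Hypothetical model: `workers` idle processes, a listen queue that holds
    at most `backlog` waiting connections, and everything beyond that refused.
    """
    running = min(burst, workers)
    queued = min(burst - running, backlog)
    rejected = burst - running - queued
    return running, queued, rejected


# The example above: 3 processes, 5 simultaneous requests, roomy backlog.
print(admit(burst=5, workers=3, backlog=100))   # (3, 2, 0)
# Errors should only appear once the backlog itself overflows.
print(admit(burst=12, workers=3, backlog=4))    # (3, 4, 5)
```

In other words, a short burst should show up as extra latency for the queued requests, not as errors, as long as the backlog is large enough.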

I wouldn't think simply spawning and killing processes should hurt you that much CPU-wise, though. Are you sure you're not just running into memory issues and swapping?

Yeah, I know, but my panel does not support PHP-FPM, so currently I am pretty much stuck with Apache + fcgid until I migrate to centminmod.

From the look of it I just assumed it was a CPU issue, because I always have more than 500 MB of RAM free. What is the other option, I/O delays? Is there any way I can properly check what the problem is during a slowdown? I only know "top", and it doesn't give me much information.

Ah, right.

I'd install htop to start. It's far better and easier to work with than regular old top.

If you want to monitor disk usage on your server, there's also iotop, but I don't think that's the case here.

To be honest, 500 MB of free memory isn't that much. I imagine a few extra fcgi requests shouldn't reasonably consume that much memory, but I'm not entirely sure, as I don't have a lot of experience with fcgid. From what I remember, though, it suffers from a number of efficiency problems, such as creating a separate APC cache for every process it spawns (which in turn can quickly burn up memory). Have you looked at how much memory you have free when you are suffering these performance issues?

OK, I'll get htop.

No problem with the memory. Right now the site is ticking along nicely with < 300 MB of RAM consumed and about 50 people online. At peaks it goes to 500-600 MB, but that's it. You can see the top output I pasted earlier; this was at a peak period:

Mem:   1048576k total,   404612k used,   643964k free,        0k buffers
Swap:  1048576k total,    88536k used,   960040k free,   128496k cached

I do have APC with an 80 MB pool, but I think it is not reserved immediately, is it?

I should say that when that happens everything is slow: logging in through SSH, executing basic commands, everything.

I do have APC with an 80 MB pool, but I think it is not reserved immediately, is it?

I believe you're right, I don't think the memory is reserved immediately, but during peak hours, if the activity level were high enough, it seems feasible that it could end up causing you issues.
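
As a rough back-of-the-envelope check, here is a small Python sketch of the worst case, assuming every fcgid process really does get its own APC cache and fills it completely (an assumption, not something measured on this server). The helper name is made up; the 80 MB pool, the 1 GB of RAM, and the process counts come from this thread.

```python
def worst_case_cache_mb(processes, cache_mb_per_process):
    """Upper bound on cache memory if every process fills its own cache."""
    return processes * cache_mb_per_process


TOTAL_RAM_MB = 1024   # 1 GB VPS, from the first post
APC_POOL_MB = 80      # APC pool size mentioned in the thread

# 3 = normal process count, 6 = observed spike, 8 = configured maximum.
for procs in (3, 6, 8):
    used = worst_case_cache_mb(procs, APC_POOL_MB)
    share = used / TOTAL_RAM_MB
    print(f"{procs} processes -> up to {used} MB of cache ({share:.0%} of RAM)")
```

This is only an upper bound on the cache itself; it says nothing about the memory the PHP processes use otherwise.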

I should say that when that happens everything is slow: logging in through SSH, executing basic commands, everything.

Yeah. These are the exact symptoms of a server running out of memory and swapping. That's why I keep going back to it. That's the only thing I can immediately think of that would be causing your problems.

Or something else somewhere might be causing your server to run out of memory during peak hours.

Have you been able to check how much memory is available during these slowdowns?

This happens only briefly, though. I assume the new processes get killed before the entire APC pool is filled. It is just this spawning/killing that causes the slowdown.

Yes, the top output I posted was from one of those slowdowns. It wasn't from one of the terrible ones that completely block the server, but it was still noticeable enough. Even in the terrible ones, though, I have never seen it run out of memory.
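
For anyone who wants to put numbers on this, below is a minimal Python sketch that turns the Mem: and Swap: lines pasted earlier into usage percentages. The function name `parse_top_line` is hypothetical, and the regular expression assumes the older procps top layout shown in this thread (values suffixed with `k`).

```python
import re


def parse_top_line(line):
    """Turn a top 'Mem:' or 'Swap:' line into {'total': kB, 'used': kB, ...}."""
    return {name: int(kb) for kb, name in re.findall(r"(\d+)k (\w+)", line)}


# Lines copied verbatim from the top output earlier in the thread.
mem = parse_top_line(
    "Mem:   1048576k total,   404612k used,   643964k free,        0k buffers")
swap = parse_top_line(
    "Swap:  1048576k total,    88536k used,   960040k free,   128496k cached")

print(f"RAM used:  {mem['used'] / mem['total']:.1%}")
print(f"Swap used: {swap['used'] / swap['total']:.1%}")
```

For the snapshot in this thread that works out to a bit under 40% of RAM in use and under 10% of swap, so that particular moment was not an out-of-memory situation, although some swap was in use.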

  • 2 weeks later...

Somewhat old post... but if you're seeing disk utilization that is far too high, be sure to check what's actually eating up the disk. Tools like iotop will help you.

I don't think the initial diagnosis of the problem is correct. That is, I don't think it's a sudden load spike on your site. Something is killing your disk, but it's not your front-end site: your Apache/fcgi numbers are too tame to be creating that problem, unless some badly behaved software is causing crazy disk usage.

Have you tried XCache instead of APC with fcgid?

It seems to me there were a bunch of people noticing this issue last year with newer PHP versions, IIRC PHP 5.4 and above with APC.

XCache for some reason didn't work for me before, but if the problem persists I might try it. I think I am still on 5.3.

Hm, is it possible that a slow disk can cause new fcgi processes to be spawned? Like if the current process is busy waiting on I/O, so a new one is started? The only thing I know for sure is that when it is slow I have 6-7 fcgi processes, while normally I have 3. It has been pretty stable for the last 10 days, by the way. I guess I will have to live with issues like this until I switch to centminmod.

Hm, is it possible that a slow disk can cause new fcgi processes to be spawned? Like if the current process is busy waiting on I/O, so a new one is started?

Yes. That is an expected side effect. It'll keep creating fcgi processes every time a request comes in and there are no available slots. So, if the existing slots are busy for a long period of time due to a busy disk, it'll keep creating more until it has reached the limit.
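
A toy Python model shows the effect. This is a sketch under simplifying assumptions, not how mod_fcgid is actually implemented: time moves in fixed ticks, every request occupies one process for `service_ticks` ticks, processes never exit, and requests that arrive once the limit is reached are simply dropped. All names here are made up for illustration.

```python
def peak_processes(arrivals_per_tick, service_ticks, max_processes, ticks=60):
    """Return the highest process count seen in a toy spawn-on-demand model."""
    busy_until = []   # one entry per process: the tick at which it is free again
    peak = 0
    for now in range(ticks):
        for _ in range(arrivals_per_tick):
            # Reuse an idle process if there is one ...
            idle = next((i for i, t in enumerate(busy_until) if t <= now), None)
            if idle is not None:
                busy_until[idle] = now + service_ticks
            # ... otherwise spawn a new one, up to the configured limit.
            elif len(busy_until) < max_processes:
                busy_until.append(now + service_ticks)
        peak = max(peak, len(busy_until))
    return peak


# Same request rate, only the time per request changes.
print(peak_processes(arrivals_per_tick=1, service_ticks=3, max_processes=8))   # 3
print(peak_processes(arrivals_per_tick=1, service_ticks=20, max_processes=8))  # 8
```

With an unchanged request rate, slower requests alone are enough to push the process count from 3 to the limit of 8, so a jump in fcgi processes does not by itself prove a traffic spike.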

It's possible that something like a backup is running, or some IPB cron job that's bigger than it should be. It's also possible that, since it's a VPS, some other user is hogging the disk temporarily (someone running a backup or a benchmark, etc.), so that even attempting to use a tiny bit of disk results in extreme utilization. (That's because %wa measures CPU time spent waiting on I/O, not how much you are actually reading from or writing to the disk.) Something like iostat will show whether you're actually doing a lot of reads/writes or only a few.
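
As an illustration of reading %wa, here is a short Python sketch that parses the Cpu(s) line from the top output posted at the start of the thread and reports which non-idle state dominates. The function names are hypothetical, and the regular expression assumes the older top format shown here (for example `22.9%wa`).

```python
import re


def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' line into {'us': 2.3, 'sy': 0.7, ...}."""
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)%(\w+)", line)}


def dominant_state(cpu):
    """Name the largest non-idle share of CPU time."""
    busy = {k: v for k, v in cpu.items() if k != "id"}
    return max(busy, key=busy.get)


# Line copied verbatim from the top output in the first post.
cpu = parse_cpu_line(
    "Cpu(s):  2.3%us,  0.7%sy,  0.0%ni, 73.4%id, 22.9%wa,  0.0%hi,  0.0%si,  0.7%st")

print(dominant_state(cpu), cpu["wa"])   # wa 22.9
```

In that snapshot user plus system time is only about 3%, while I/O wait is 22.9%, which points at processes waiting on the disk and not at a CPU that is busy computing.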

Archived

This topic is now archived and is closed to further replies.
