CheersnGears (Author) Posted February 23, 2012

So, I'm still having problems. The MySQL tuner reports everything as clean, but if/when the IPBoard software kicks off a table optimize on the Posts table, the whole thing comes crashing down. What happens is that the Posts table is locked while it is being optimized, so every Apache process that queries it gets wait-listed. Since it can take 30-40 minutes for the server to optimize my Posts table, the Apache processes keep building up, consuming more and more memory while they wait for MySQL to respond. They don't seem to time out (or if they do, they don't give up the memory), which leaves MySQL with less and less available memory, slowing it down further and sending the server into a death spiral. Eventually they fill all of the memory on the server and lock it up to the point of needing a forced power-off. I'm running with 3.3 GB right now. I've also tried running at 8 GB (at substantial extra cost) and all it did was marginally increase the time it took for the Apache processes to crash the system. Here is a screenshot of the Apache processes doing their thing. Note that the server was totally locked at this point and I had to force a power-off.
Lindy (Management) Posted February 23, 2012

VPSes are not known for strong performance -- especially the provider you're using (assuming I'm correct; we had several clients there until recently). SANs are everything, and theirs seem prone to constant issues. As a test, when the load climbs, please type 'sar' from the command line and paste the output. I feel you may have an I/O issue.
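A minimal way to capture what Lindy is asking for, assuming the sysstat package that provides sar is installed on this Ubuntu box (the sampling intervals below are arbitrary):

```bash
# Install sysstat if it isn't already present (provides sar, iostat, etc.)
sudo apt-get install sysstat

# CPU usage sampled every 2 seconds, 10 times; a consistently high
# %iowait column means the CPU is stalled waiting on the disk
sar -u 2 10

# Per-device I/O statistics over the same window (await / %util columns)
sar -d 2 10
```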
CheersnGears (Author) Posted February 24, 2012

The problem is that these symptoms only ever occur at times of very low traffic. 98% of our traffic is North American (and I have China blocked at the firewall), yet the issue occurs in the early morning hours when no one is on. The site runs great the rest of the time, and we handle spikes in traffic very well. If you wish, I can trigger a table optimize and then run that sar command as the resources run out. Would that be a suitable test for you?
Lindy (Management) Posted February 24, 2012

Approximately 0100 EDT (perhaps finishing around 0300-0400)? If you're using cPanel, that is the default start time for server backups, which would explain the lock on the Posts table, etc. That's a relatively intensive process on a VPS. You could try switching to weekly backups and wait a day or so to see if the problem still occurs. Or, if you have the disk space, you can turn incremental backups on and disable compressed backups; the gzip process is quite intensive as well.
CheersnGears (Author) Posted February 24, 2012

No cPanel. Just an Ubuntu LAMP install.
Clickfinity Posted February 24, 2012

Would switching the Posts table to InnoDB help - by removing the optimisation process?
CheersnGears (Author) Posted February 24, 2012

Disk throughput issues or not, my question is now this: what configuration change do I need to make to Apache so that it no longer hangs around very long if MySQL can't be buggered to give it the results it wants? I want the Apache child process to time out after, say, 15 seconds and then kill itself, releasing the memory back to the server. I'd much rather have the server timing out Apache processes for 30 minutes than have it crash and be down until I wake up in the morning.
p4guru Posted February 24, 2012

Disk throughput issues or not, my question is now this: what configuration change do I need to make to Apache so that it no longer hangs around very long if MySQL can't be buggered to give it the results it wants? I want the Apache child process to time out after, say, 15 seconds and then kill itself, releasing the memory back to the server. I'd much rather have the server timing out Apache processes for 30 minutes than have it crash and be down until I wake up in the morning.

If the table optimize runs are a scheduled cron task, a simple workaround would be a script that puts your web site into maintenance mode and redirects all traffic to a static HTML page with a notice that maintenance is underway for XX minutes. The maintenance-mode script would kick in just prior to the scheduled table optimization run, and after a set amount of time (XX minutes, i.e. however long the optimization usually takes) it would come out of maintenance mode and return all traffic to the PHP site. The usual method is a 503 "site unavailable" custom maintenance redirect.

I've done this with large private forum clients of mine as a temporary workaround: a forum with 2,400+ users online and a 10+ GB database whose scheduled daily backups just tip them over the memory available on their MySQL server, which does exactly what you're experiencing -- Apache process spawning climbs to the point of maxing out the MaxClients setting. From my analysis, it's also the time when a lot of Google, Baidu, Yandex and Boardreader bots hit their site, with more than 300 concurrent connections from a single IP!

Then, to reduce Apache processes, I installed Nginx as a front-end static-file reverse proxy to Apache. Nginx in turn sits behind my custom-configured HAProxy load balancer (on the same server as Apache/Nginx), with filtered session and concurrent-connection rate limiting to cap the bots at under 10-20 concurrent sessions per IP address. When the maintenance script kicks in, I have the maintenance HTML page served by Nginx, since a static-file reverse proxy is far more capable of handling high-concurrency access during the maintenance window.

End result: they went from Apache crashing and taking 90+ minutes to recover, to a 10-15 minute maintenance HTML page for visitors to stare at :smile: They're also going to add more memory to the MySQL server and replace the mysqldump backup method with a custom-tailored mydumper script I'm writing for them, which should speed up backups by at least 3x over mysqldump: http://vbtechsupport.com/1716/
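A rough sketch of the kind of maintenance-mode toggle p4guru describes, assuming Apache with mod_rewrite and a static maintenance.html already in the document root. The paths, file names and the rewrite rules shown in the comments are illustrative only, not part of any stock IPB setup:

```bash
#!/bin/bash
# maintenance.sh -- hypothetical wrapper run from cron just before the
# scheduled optimize. It assumes the vhost already contains rules like:
#   RewriteEngine On
#   RewriteCond %{DOCUMENT_ROOT}/maintenance.flag -f
#   RewriteCond %{REQUEST_URI} !^/maintenance\.html$
#   RewriteRule ^ - [R=503,L]
#   ErrorDocument 503 /maintenance.html
DOCROOT=/var/www/forum        # adjust to the real install path
MINUTES=${1:-40}              # roughly how long the optimize usually takes

touch "$DOCROOT/maintenance.flag"     # start answering with 503 + static page
sleep $(( MINUTES * 60 ))             # sit out the optimization window
rm -f "$DOCROOT/maintenance.flag"     # resume normal PHP traffic
```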
But if your VPS is in the cloud, definitely check your disk performance in terms of random disk I/O and IOPS: I've had quite a few private clients with very, very slow random disk I/O and IOPS in the cloud, at 1/10th the speed of a single slow SATA disk! Use the ioping.sh script I wrote to test your disk I/O: http://vbtechsupport.com/1239/ A single SATA disk should have a seek time of around 12-16 ms with around 75-100 IOPS; a single 15K SAS/SCSI disk around 4-7 ms with 100-250 IOPS; a single SSD around 0.1-3 ms with 1,000-8,000 IOPS.

Also, if you do switch from MyISAM to InnoDB, you don't want to be regularly running OPTIMIZE on InnoDB tables: they don't support the OPTIMIZE command and instead map it to ALTER TABLE, which completely rebuilds the table and drives up InnoDB-related resource usage. That could end up as bad as what you have now if you don't have properly tuned InnoDB settings and adequate memory and disk resources. Just my AUD$0.02 :smile:
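The ioping.sh script linked above isn't reproduced here, but assuming the stock ioping and dd utilities are available, a rough spot-check of latency and throughput to compare against p4guru's figures could look like this:

```bash
# Random-read latency against the current filesystem: compare the
# reported times with the SATA/SAS/SSD ranges quoted above
sudo apt-get install ioping
ioping -c 10 .

# Sequential write throughput (~256 MB), bypassing the page cache
dd if=/dev/zero of=ddtest.bin bs=1M count=256 oflag=direct
rm -f ddtest.bin
```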
Grumpy Posted February 25, 2012

If you want to run with very low memory usage, you need to reduce how often Apache is used in the first place. Apache is quite inefficient with memory. This is actually by design, for speed (despite Apache being the slowest thing... lol): instead of freeing memory, a process holds onto it so that the next request has RAM available without having to re-request a memory allocation. Very, very simply, you have too many Apache processes.

Okay... uh... I just looked at the top screenshot in your post and went wtf!!!! You've got 98.7% wa and a load average of 138! If wa is higher than 1%, you're being bottlenecked by I/O. You are also making COMPLETE use of swap. This is a definite no-no, and your high I/O wait is likely due to the fact that you're using swap. The swap drive shouldn't be the same drive as MySQL (unfortunately, you probably have no control over this), or you end up in a cyclic problem: MySQL is slow because you're using swap, and you're using swap because things aren't finishing because MySQL is slow. Then you go into wtfbbq mode and can't do anything any more.

Okay, shock aside, and getting myself back together:

1. You need more RAM, or you need to decrease RAM usage very significantly.

2. I believe you're on vps.net. My recommendation is to stop using them. It's cloud hosting and naturally uses network-attached storage, which is slower than local storage (it's physically further away) even if its throughput may be higher. A database is a latency-sensitive application; you need local storage for the highest performance. Get a dedicated server, or a normal VPS. You clearly don't have management right now, so get an inexpensive unmanaged dedicated server -- you can probably get one for the price you're paying now.

3. Install nginx as a reverse proxy in front of your current server. Let nginx handle all your static files: .html, .jpg, .css, .js, etc. This will significantly reduce the need to call your precious and expensive Apache at all.

4. I can't tell how many Apache processes you have right now -- probably around 100 -- so I can't give exact numbers, but you need to scale down Apache's MaxClients, maybe to something like 50. This will give everyone else a 503 message, as Apache refuses to handle more connections when more are requested; then it won't try to hog all your resources and screw everything up. I have an IPB forum that gets 4-5 million page views a day (seriously), and I only have around 20-30 Apache processes running at any given time, half of which are idling. That's a good enough number for my monster CPU -- you likely can't even handle 20 simultaneous active Apache processes. You have way more because each process is too slow, not because you have more requests than I do.

Also, here's another benefit of having nginx as your reverse proxy: people on slow connections. They will tie up your precious Apache connections. You've got 100 people asking? You've got a minimum of 100 Apache processes running just to hold the connection for the duration of the transfer. With a reverse proxy in place, nginx slurps the data from Apache nearly instantly and Apache closes its connections very rapidly, so 100 connections to Apache are unnecessary: visitors hold their connection open with nginx, not Apache. This makes a HUGE difference.

InnoDB is not a solve-all-problems database engine. It helps in some cases and not in others; it often helps people with tons of memory to spare, which doesn't seem to be your case.
You can also replace MySQL with MariaDB for MyISAM; it does give better performance. However, MySQL is not your problem. Not at all.

------------------

Quite regardless of your actual problem, I shall answer some of your questions.

What configuration change do I need to make to Apache so that it no longer hangs around very long if MySQL can't be buggered to give it the results it wants?

What MySQL does is none of Apache's business. Apache gives PHP time to do its work; that time is "Timeout xx" in seconds, found in /path/to/your/apache/settings/httpd.conf or similar. What MySQL does, and for how long, is PHP's concern; that is defined in your php.ini as "mysql.connect_timeout = xx" in seconds. MySQL's own timeouts are more complicated: two for the input methods, wait_timeout = xx and interactive_timeout = xx, and one for InnoDB, innodb_lock_wait_timeout = xx.

As for killing a query: that is plain dangerous and may screw up your database if you prematurely, outright kill a process, so such an option does not exist. Killing the socket to MySQL doesn't mean the MySQL process has stopped -- but your PHP will stop, and your Apache will move on to the next request.
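For reference, on a stock Ubuntu LAMP install of that era the settings Grumpy lists live roughly in the locations below; the paths and the example query are assumptions to check against your own layout, not recommendations:

```bash
# Apache: how long it will wait on a request before giving up
grep -n '^Timeout' /etc/apache2/apache2.conf

# PHP: how long mysql_connect() waits for the MySQL server
grep -n 'mysql.connect_timeout' /etc/php5/apache2/php.ini

# MySQL: idle/lock timeouts, usually under [mysqld] in my.cnf
grep -nE 'wait_timeout|interactive_timeout|innodb_lock_wait_timeout' /etc/mysql/my.cnf

# The live values can also be inspected without touching any files
mysql -u root -p -e "SHOW VARIABLES LIKE '%timeout%';"
```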
CheersnGears (Author) Posted February 26, 2012

I don't want to kill MySQL queries; I want Apache to kill its own processes if MySQL doesn't reply in time. I don't normally have anywhere near that number of Apache processes running -- right now there are only 3 of them and the board is running nice and smoothly. That screenshot above is from after MySQL starts a table optimize on the Posts table: Apache gets locked out, Apache processes build up and up and up, and the whole server goes into a death spiral. I never have server issues generated by traffic; I simply don't get enough traffic, even on my busiest days, to crash this server. The only time I have a server issue is when MySQL tries to chew through the Posts table in an OPTIMIZE query.
Grumpy Posted February 26, 2012

Then just set your Apache MaxClients to a more realistic number like 10. I would still suggest you install nginx, though.
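A sketch of where that change lives on a stock Ubuntu Apache 2.2 prefork setup; the numbers are only illustrative and the file location should be verified on the actual server:

```bash
# Confirm which MPM is in use (mod_php installs normally run prefork)
apache2ctl -V | grep -i mpm

# The limits sit in /etc/apache2/apache2.conf in a block like:
#   <IfModule mpm_prefork_module>
#       StartServers          2
#       MinSpareServers       2
#       MaxSpareServers       5
#       MaxClients           10    # hard cap on simultaneous Apache children
#       MaxRequestsPerChild 500    # recycle children to limit memory creep
#   </IfModule>

# After editing, check the syntax and reload without dropping connections
apache2ctl configtest && sudo apache2ctl graceful
```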
Clickfinity Posted February 27, 2012

Just as an experiment, try editing the cron job (or writing a separate one) so it stops Apache before the MySQL optimisation runs and restarts Apache afterwards. You should be able to see the stop and start times in your server log, which will indicate how long the process takes with Apache turned off. If it completes relatively quickly, it might be worth sacrificing 10-20 minutes of Apache downtime (at your quietest time for visitors) to stop it locking up completely. If it works better this way, at least you won't have to babysit it like you do at the moment.

How often are you running the optimisation process? Daily, weekly, monthly?

Cheers, Shaun :D
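A rough version of Clickfinity's experiment as a standalone script (with the built-in IPB optimize task disabled); the database name, table name, credentials file and schedule are all placeholders to adapt:

```bash
#!/bin/bash
# optimize-posts.sh -- hypothetical wrapper: stop Apache, optimize the posts
# table, start Apache again, logging timestamps so the run time is visible.
LOG=/var/log/optimize-posts.log

echo "$(date) stopping apache" >> "$LOG"
/etc/init.d/apache2 stop

echo "$(date) running OPTIMIZE" >> "$LOG"
mysql --defaults-extra-file=/root/.my.cnf \
      -e "OPTIMIZE TABLE forum_db.ibf_posts;" >> "$LOG" 2>&1

echo "$(date) starting apache" >> "$LOG"
/etc/init.d/apache2 start

# Example crontab entry -- 04:30 every Sunday:
#   30 4 * * 0 /usr/local/sbin/optimize-posts.sh
```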
CheersnGears (Author) Posted February 29, 2012

The optimization process is the default setting from Invision Power. I did change MaxClients from 150 to 60 (based on some other reading I did), but I've also disabled the optimization process for now, so no crashes since.
CheersnGears (Author) Posted April 11, 2012

VPSes are not known for strong performance -- especially the provider you're using (assuming I'm correct; we had several clients there until recently). SANs are everything, and theirs seem prone to constant issues. As a test, when the load climbs, please type 'sar' from the command line and paste the output. I feel you may have an I/O issue.

You ended up being partially right on this. I was on one of their clouds that had a slowly dying software RAID setup. The disk throughput rates grew slower and slower until eventually I was getting 200 KB/s disk transfer when measured with ioping.sh -- for comparison, a flash-based USB drive can do 2.9 MB/s. They moved me to a new cloud and everything has been fine since; ioping.sh now reports transfer rates over 200 MB/s. So yes, they were having I/O issues, but it wasn't company-wide.