Server crashes often, which never happened before.

*Salvo* · April 27, 2012

Lately I'm having tons of problems with a server that I own and manage.
At some point it slows down... until it goes down completely. From that moment no commands work; all I can do is log in to the provider's admin control panel and restart the machine. It runs a CentOS-based distro with cPanel and hosts a few sites.
From what I understood from the latest ticket with a cPanel tech, there may be a problem with a partition on a drive.
This is the response:

there is one thing that concerns me (from the output of dmesg):

EXT3-fs: sda2: 79 orphan inodes deleted

But this is beyond my competence. Has anyone experienced the same issue? The server hosts a few sites, but 99% of its disk space is empty.
Who can help me?

Grumpy · April 28, 2012

Sounds like you're hitting swap, stuff is getting too slow, people then start to press F5 madly, and then your server locks up thanks to awesomely stupid Apache just grabbing onto everything and never letting go, which makes your entire system freeze up. :D

Incompletely written data on the disk can be dropped when you power down unsafely. It can be heavily damaging depending on the data lost, and that's why rebooting a server via power cycle is bad compared with a soft reboot. You should also run a check on your MySQL tables, as that's what's most likely been dropped. If you have a dedicated server with RAID, you should also check that the RAID is in good condition.

My guesses could be wrong. You should look at your memory usage logs as well as swap usage, if you have any. If my guess is right, it's time to either
1. Upgrade server / Increase ram (simplest)
2. Set hard limits on apache so it's less stupid (less simple)
3. Optimize (most complicated of the 3)

You can do all 3, but any one of the 3 may solve your problem.
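One way to get the memory and swap history mentioned above is to log a snapshot at regular intervals. This is a minimal sketch, not something from the thread: the function name `mem_swap_snapshot` is made up for illustration, and it assumes a Linux `/proc/meminfo` with the usual `MemTotal`, `MemFree`, `Buffers`, `Cached`, `SwapTotal` and `SwapFree` fields.

```shell
# Print one timestamped line with used-memory and used-swap percentages.
# Reads /proc/meminfo by default; another file can be passed as $1.
# (mem_swap_snapshot is a hypothetical helper name.)
mem_swap_snapshot() {
    local meminfo="${1:-/proc/meminfo}"
    awk -v ts="$(date '+%Y-%m-%d %H:%M:%S')" '
        $1 == "MemTotal:"  { mem_total  = $2 }
        $1 == "MemFree:"   { mem_free   = $2 }
        $1 == "Buffers:"   { buffers    = $2 }
        $1 == "Cached:"    { cached     = $2 }
        $1 == "SwapTotal:" { swap_total = $2 }
        $1 == "SwapFree:"  { swap_free  = $2 }
        END {
            if (mem_total == 0) exit 1
            # Buffers and page cache are reclaimable, so they are not counted as used.
            mem_used  = mem_total - mem_free - buffers - cached
            swap_used = swap_total - swap_free
            swap_pct  = (swap_total > 0) ? 100 * swap_used / swap_total : 0
            printf "%s mem_used=%.1f%% swap_used=%.1f%%\n", ts, 100 * mem_used / mem_total, swap_pct
        }
    ' "$meminfo"
}
```

Appending the output to a log file once a minute (from cron or a simple loop) should be enough to see whether swap use climbs in the hours before a lock-up.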

*Salvo* · April 28, 2012

Grumpy,
Thank you for your perfect answer. As I write, this is what is happening on the server:

Operating System: CentOS release 5.8 (Final)
Processor: Quad-Core AMD Opteron Processor 2344 HE (1000.000MHz)
Load averages: 1.40, 0.79, 0.60
Current processes: 211 (208 sleeping, 2 running, 1 zombie)
Processes by CPU: php (3) 180.0%, mysqld (1) 0.9%, (other) (71) 0.1%
Memory usage: 88.994% (7279912k / 8180212k)
Swap: 0.013% (132k / 1052248k)

This is my server right now; it hosts 3 really busy websites.
About MySQL: yesterday I uploaded a script that checks the configuration, usage, etc., and I don't see any particular problems.
What can you tell me about the data above?

*Salvo* · April 28, 2012

Yes, and about the optimization, can you suggest some diagnostic tools? After a couple of years of this, I really feel stupid.

Luis Manson · April 28, 2012

Did you check the logs?
Does it always happen at the same time, or under the same load?
Do you know the memory/swap usage before the crash?
Is it a real hang, or a crash/panic?

*Salvo* · April 28, 2012

Thank you for the questions, Luis Manson.
I did, but there are so many logs... there may be an interruption before the crash, and I didn't check the load. Very, very high. The tech talks about a drive, in a partition. I don't know wtf. The only thing I did, tonight, was to restore the InnoDB lines in the /etc/my.cnf file, and today nothing bad is happening. But there's one thing: there are not that many people around in Italy today. The swap is always as above, and the server load is always under 1.
Yes, always at the same time of day, roughly from 16:00 to 21:00 (now).
The server has 8 GB of RAM.

Luis Manson · April 28, 2012

Just in case, next time check the site's scheduled tasks and the server's too... I had a problem with high SQL usage from some tasks/dumps, which left a lot of Apache children waiting for MySQL.

*Salvo* · April 29, 2012

I found a sh script to defragment all the tables on the MySQL server. I would like to partition the server differently and dedicate a well-monitored partition to the busiest databases. MySQL is always the user that consumes more resources than the others, and I saw it again today.
But I will need help from a professional to check everything. My guess is that it's time to do it. I don't have the money for a managed server, but I would like to sleep without any SMS telling me: "Salvo, the server is down".

*Salvo* · April 29, 2012

This is the current result:

-------- General Statistics --------------------------------------------------

[--] Skipped version check for MySQLTuner script

[OK] Currently running supported MySQL version 5.1.62-cll

[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------

[--] Status: +Archive -BDB -Federated -InnoDB -ISAM -NDBCluster

[--] Data in MyISAM tables: 408M (Tables: 1721)

[--] Data in MEMORY tables: 0B (Tables: 7)

[!!] Total fragmented tables: 23

-------- Security Recommendations -------------------------------------------

[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------

[--] Up for: 22h 28m 17s (2M q [26.203 qps], 71K conn, TX: 43B, RX: 593M)

[--] Reads / Writes: 68% / 32%

[--] Total buffers: 1.8G global + 6.0M per thread (500 max threads)

[OK] Maximum possible memory usage: 4.7G (60% of installed RAM)

[OK] Slow queries: 0% (0/2M)

[OK] Highest usage of available connections: 8% (40/500)

[OK] Key buffer size / total MyISAM indexes: 1.2G/212.2M

[OK] Key buffer hit rate: 99.8% (56M cached / 97K reads)

[OK] Query cache efficiency: 75.2% (1M cached / 1M selects)

[!!] Query cache prunes per day: 107541

[OK] Sorts requiring temporary tables: 0% (15 temp sorts / 67K sorts)

[!!] Joins performed without indexes: 1111

[!!] Temporary tables created on disk: 36% (19K on disk / 55K total)

[OK] Thread cache hit rate: 99% (40 created / 71K connections)

[!!] Table cache hit rate: 1% (1K open / 81K opened)

[OK] Open file limit used: 65% (2K/4K)

[OK] Table locks acquired immediately: 99% (1M immediate / 1M locks)

-------- Recommendations -----------------------------------------------------

General recommendations:

Run OPTIMIZE TABLE to defragment tables for better performance

MySQL started within last 24 hours - recommendations may be inaccurate

Enable the slow query log to troubleshoot bad queries

Adjust your join queries to always utilize indexes

Temporary table size is already large - reduce result set size

Reduce your SELECT DISTINCT queries without LIMIT clauses

Increase table_cache gradually to avoid file descriptor limits

Variables to adjust:

query_cache_size (> 74M)

join_buffer_size (> 2.0M, or always use indexes with joins)

table_cache (> 1350)

Grumpy · April 29, 2012

Quoting Salvo: "What can you tell me about the data above?"

What can I tell from those stats?
1. You have a low end processor from 5 years ago that I can buy on ebay for $5...
2. You spawned more processes than you can likely handle
3. Non-peak stats aren't too meaningful.

mysqltuner output

[!!] Joins performed without indexes: 1111 <---------- Problem~~~! You have a bad mod in your IPB or you have a bad program. Software issue. Should add indexes where needed.

-------------------------
Need to know apache and various other stats. More meaningful than above 2.

When in load, post the outputs of these

top -n 1 (if you omit the -n flag, it'll just show live stats that keep refreshing. This way it outputs once, so you can copy/paste more easily)
iostat -x 10 2 (post the 2nd batch)
apachectl status (if you have apache)
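Since the lock-ups happen in a fairly predictable time window, the three outputs above could be captured automatically whenever load is high. The following is only a sketch: `load_above` and `capture_diagnostics` are hypothetical helper names, the threshold is an arbitrary example, and `iostat` (sysstat) and a working `apachectl status` are assumed to be available.

```shell
# Succeeds when the 1-minute load average exceeds the given threshold.
# Reads /proc/loadavg by default; another file can be passed as $2.
load_above() {
    local threshold="$1" loadavg_file="${2:-/proc/loadavg}"
    awk -v limit="$threshold" '{ exit !($1 + 0 > limit + 0) }' "$loadavg_file"
}

# Save the three requested outputs into one timestamped file and print its path.
# top -b runs in batch mode so the output is plain text.
capture_diagnostics() {
    local out="${1:-${TMPDIR:-/tmp}/diag-$(date +%Y%m%d-%H%M%S).txt}"
    {
        top -b -n 1
        iostat -x 10 2
        apachectl status
    } > "$out" 2>&1
    echo "$out"
}

# Example use (threshold 4 is an assumption, roughly half of 8 cores):
#   load_above 4 && capture_diagnostics
```

Run from cron every minute or two during the 16:00 to 21:00 window, this would leave a file to post the next time the server starts to slow down.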

Further questions:
1. Are you using apache?
2. php mod? (mod_php, suphp, cgi, fcgi)?
3. Do you have opcache enabled? (only relevant if you have mod_php/fcgi)
4. Your httpd.conf
5. Your my.cnf (only the [mysqld] section. Not too interested in other sections.)
6. What disk are you using ( sudo hdparm -I /dev/sd[LETTER] )? Any raid setup?

*Salvo* · April 29, 2012


[mysqld]

# bind-address=xxx

set-variable=local-infile=0

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

max_connections = 500

key_buffer_size = 128M

myisam_sort_buffer_size = 64M

join_buffer_size = 16M

read_buffer_size = 1M

sort_buffer_size = 4M

table_open_cache = 1350

thread_cache_size = 32

wait_timeout = 300

interactive_timeout=300

open_files_limit = 4000

connect_timeout = 10

tmp_table_size = 256M

max_heap_table_size = 384M

max_allowed_packet = 64M

net_buffer_length = 16384

max_connect_errors = 100000

thread_concurrency = 8

concurrent_insert = 2

table_lock_wait_timeout = 30

read_rnd_buffer_size = 512k

bulk_insert_buffer_size = 8M

slow_query_log

query_cache_limit = 70M

query_cache_size = 32M

query_cache_type = 1

query_prealloc_size = 262144

query_alloc_block_size = 65536

transaction_alloc_block_size = 8192

transaction_prealloc_size = 4096

innodb_flush_log_at_trx_commit=0

innodb_buffer_pool_size=8G

innodb_buffer_pool_size = 1M

innodb_additional_mem_pool_size = 128M

innodb_log_buffer_size = 256M

innodb_log_file_size = 256M

innodb_flush_log_at_trx_commit = 4

innodb_lock_wait_timeout = 50

innodb_file_io_threads = 4

innodb_thread_concurrency = 4

default-storage-engine = MyISAM

Top:


26178 administ  15   0  160m 9308 6264 R  7.8  0.1   0:00.04 php			   

26177 root	  16   0 12740 1100  744 R  2.0  0.0   0:00.01 top			   

    1 root	  15   0 10352  692  576 S  0.0  0.0   0:02.25 init			  

    2 root	  RT  -5	 0    0    0 S  0.0  0.0   0:09.06 migration/0	   

    3 root	  34  19	 0    0    0 S  0.0  0.0   0:00.06 ksoftirqd/0	   

    4 root	  RT  -5	 0    0    0 S  0.0  0.0   0:00.00 watchdog/0		

    5 root	  RT  -5	 0    0    0 S  0.0  0.0   0:27.10 migration/1	   

    6 root	  34  19	 0    0    0 S  0.0  0.0   0:03.86 ksoftirqd/1	   

    7 root	  RT  -5	 0    0    0 S  0.0  0.0   0:00.00 watchdog/1		

    8 root	  RT  -5	 0    0    0 S  0.0  0.0   0:03.29 migration/2	   

    9 root	  34  19	 0    0    0 S  0.0  0.0   0:00.16 ksoftirqd/2	   

   10 root	  RT  -5	 0    0    0 S  0.0  0.0   0:00.00 watchdog/2		

   11 root	  RT  -5	 0    0    0 S  0.0  0.0   0:00.27 migration/3	   

   12 root	  34  19	 0    0    0 S  0.0  0.0   0:00.04 ksoftirqd/3	   

   13 root	  RT  -5	 0    0    0 S  0.0  0.0   0:00.00 watchdog/3		

   14 root	  RT  -5	 0    0    0 S  0.0  0.0   0:00.76 migration/4	   

   15 root	  34  19	 0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/4


Linux 2.6.18-194.26.1.el5 (my.server.com)  29/04/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

		   5,54    0,02    0,96    1,35    0,00   92,12

Device:		 rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util

sda			  23,18    57,56  2,10 11,13   258,62   549,65    61,11	 4,54  343,33   8,44  11,16

sda1			  0,19	 0,00  0,00  0,00	 0,38	 0,00   139,36	 0,00    3,13   2,55   0,00

sda2			 22,99    57,56  2,09 11,13   258,23   549,65    61,09	 4,54  343,41   8,44  11,16

sda3			  0,01	 0,00  0,00  0,00	 0,01	 0,00    46,77	 0,00   47,35  32,97   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

		   6,41    0,00    1,10    0,82    0,00   91,66

Device:		 rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util

sda			   0,00    38,30  0,00  7,10	 0,00   364,00    51,27	 1,23  173,55  10,14   7,20

sda1			  0,00	 0,00  0,00  0,00	 0,00	 0,00	 0,00	 0,00    0,00   0,00   0,00

sda2			  0,00    38,30  0,00  7,10	 0,00   364,00    51,27	 1,23  173,55  10,14   7,20

sda3			  0,00	 0,00  0,00  0,00	 0,00	 0,00	 0,00	 0,00    0,00   0,00   0,00

phpinfo.php

Yes, I'm using Apache.

Please let me know if you need further info. If you have a private contact, PM me.

Grumpy · April 30, 2012

Uh... you omitted the most important part of the top output... stuff that looks like this (this is a sample from a random server I have, please pay no mind to the values)

top - 08:56:02 up 218 days, 14:19,  1 user,  load average: 1.43, 1.43, 1.38
Tasks: 183 total, 1 running, 182 sleeping, 0 stopped, 0 zombie
Cpu(s): 9.4%us, 1.0%sy, 0.0%ni, 83.7%id, 5.4%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 8184516k total, 8026236k used, 158280k free, 1176208k buffers
Swap: 8388592k total, 24392k used, 8364200k free, 6224304k cached

--------------------------
But from your iostat, it's clearly visible that your disk is being asked for more than it can handle. That could be due to swap usage or regular usage, and thus either actually problematic or avoidable (which I can't tell because there's no swap info from top... and the Apache settings are missing).
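The column that stands out in the posted iostat output is `await` (173 to 343 ms on sda2). As a rough sketch, a filter like the one below could flag such devices automatically. The name `slow_devices` and the 100 ms default are assumptions for illustration, and the field position matches the older sysstat layout shown in this thread (newer versions split the column into `r_await` and `w_await`).

```shell
# Print devices from `iostat -x` output (on stdin) whose await exceeds a
# threshold in milliseconds. Assumes the column layout posted in this thread,
# where await is the 10th field. (slow_devices is a hypothetical helper name.)
slow_devices() {
    local threshold_ms="${1:-100}"
    awk -v limit="$threshold_ms" '
        /^Device:/ { in_table = 1; next }   # header row starts a device table
        NF == 0    { in_table = 0; next }   # blank line ends it
        in_table && NF >= 12 {
            await = $10
            sub(",", ".", await)            # tolerate comma decimals, as in the output above
            if (await + 0 > limit + 0) printf "%s await=%sms\n", $1, await
        }
    '
}

# Example use:
#   iostat -x 10 2 | slow_devices 100
```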

From what I know, serverloft commonly gives RAID 1... are you on hardware RAID? Or just 1 disk? If just 1 disk (or RAID 1), then you're undoubtedly screwed. It just can't handle the current load. You need more disks or a switch to SSD. It looks like it primarily can't handle the writes you're giving it. Although, it's not even that much... It depends on the write pattern, but if it IS a bad output, it may suggest that the disk health is deteriorating. If you're not on RAID, or on soft RAID, you can use SMART to check for it (see a tutorial online). If hardware RAID, you'll need to check through your RAID card, which I won't bother getting into.
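If the mirror turns out to be Linux software RAID (md), which the thread does not confirm, a degraded array shows up in `/proc/mdstat` as an underscore in the status brackets, for example `[U_]`. A minimal sketch of that check, with `degraded_arrays` as a made-up helper name:

```shell
# List md arrays whose member status contains "_" (a missing or failed member).
# Reads /proc/mdstat by default; another file can be passed as $1.
# Only applies to Linux software RAID; hardware RAID needs the controller's own tool.
degraded_arrays() {
    local mdstat="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/         { name = $1 }     # remember the array being described
        /\[[U_]*_[U_]*\]/     { print name }    # e.g. [U_] means one of two members is down
    ' "$mdstat"
}

# For the drive's own health verdict, smartmontools provides:
#   smartctl -H /dev/sda
```

An empty result from `degraded_arrays` only means the arrays are complete; it says nothing about whether the member disks themselves are wearing out, which is what the SMART check is for.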

For start, so you don't hard crash again, you can change these values... Btw, I'm doing extreme approx math. Don't expect perfect results.

in /etc/httpd/conf/httpd.conf
look for MaxClients and lower the value to 200 (this should be a decrease or as is)
look for ServerLimit and lower the value to 256 (this should be a decrease or as is) This should NOT be smaller than MaxClients. These 2 values will prevent apache from accepting too many people.
look for MaxRequestsPerChild and lower the value to 1000 (This should be a significant decrease). 1000 is a considerably low value. But, it should keep your memory runaways at a more controllable rate. BUT it will increase your CPU (seems like there's room here) and disk usage (which is already suffering). Even though disk usage may increase, I think your hard crash is due to swap entering. And I think avoiding that at all costs is more important. How much of an increase? I can't really say.

Hopefully, your Apache will be less stupid now. If more clients arrive than your server is set to handle, they will get an error 503, which means the service is temporarily unavailable. Don't forget to restart httpd afterwards.
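The post does not show the approximate math behind the value of 200. One common rule of thumb, given here only as a sketch and not as Grumpy's actual calculation, is to divide the RAM left over for Apache by the average size of one Apache child. The helper names and the example figures are assumptions.

```shell
# Average resident size in MB from a list of RSS values in KB on stdin,
# e.g.:  ps -C httpd -o rss= | avg_rss_mb
# RSS counts shared pages in every process, so this overestimates somewhat.
avg_rss_mb() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%d\n", sum / n / 1024; else print 0 }'
}

# Rough MaxClients estimate: (total RAM - RAM reserved for MySQL, OS, etc.) / child size.
# All three arguments are in MB.
estimate_max_clients() {
    local total_mb="$1" reserved_mb="$2" per_child_mb="$3"
    echo $(( (total_mb - reserved_mb) / per_child_mb ))
}

# Example with illustrative numbers (not measured on this server):
#   estimate_max_clients 8000 3000 25
```

With 8000 MB total, 3000 MB reserved and 25 MB per child, the estimate comes out at 200, which happens to match the suggested value; real measurements on the server may give a different figure.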

You can significantly decrease memory usage if you configure nginx/varnish to serve static content. There are plenty of threads about that in this forum and on the web (google "nginx reverse proxy"). Remember to set up the IP forwarding mod (there's more than one way), otherwise your forum will think everyone's coming from localhost. Or even better, ditch Apache and use nginx & php-fpm. But that's kinda complicated... not sure if you're up to the challenge. :P

edit:
I noticed a number of most likely unused PHP mods in your phpinfo... like, you even have APC AND XCache at the same time, with XCache disabled. And a bunch of others you probably don't use, but I won't get into it... I can't really know if you actually use them or not. But removing useless ones will save resources.

*Salvo* · April 30, 2012

Sorry, this is the missing part of top, and then I'll be back with the other info.


top - 17:45:23 up 2 days, 20:00,  1 user,  load average: 0.84, 0.96, 0.85

Tasks: 226 total,   3 running, 222 sleeping,   0 stopped,   1 zombie

Cpu(s):  5.9%us,  1.0%sy,  0.0%ni, 91.5%id,  1.5%wa,  0.0%hi,  0.0%si,  0.0%st

Mem:   8180212k total,  6409880k used,  1770332k free,   372472k buffers

Swap:  1052248k total,	  132k used,  1052116k free,  4829204k cached

And the server:


Harddisk 2x 500 GB SATA 3,5" 7.200 rpm

CPU 2x AMD Opteron 2344 HE Quadcore

Barebone  Fujitsu PRIMERGY Econel 230

RAM 8x Gigabyte RAM

With IPMI access.
It should be on RAID 1.
I'll be back with more data.

look for MaxClients and lower the value to 200 (this should be a decrease or as is)

look for ServerLimit and lower the value to 256 (this should be a decrease or as is) This should NOT be smaller than MaxClients. These 2 values will prevent apache from accepting too many people.

I don't have MaxClients, ServerLimit, or MaxRequestsPerChild in httpd.conf, Grumpy.

*Salvo* · April 30, 2012

I don't have MaxClients, ServerLimit, or MaxRequestsPerChild in httpd.conf, Grumpy.

OK, found them and adjusted. They were in the WHM panel, not in the httpd.conf file.

Grumpy · April 30, 2012

Since you have WHM, an nginx reverse proxy is going to be a bit easier. So, I'm going to suggest that to you.

Go to this link
http://nginxcp.com/f...n-v3-5-released
and follow the step by step instruction. It's very simple and free.
You'll probably need to install python first though (assuming you do not have python installed)
http://nginxcp.com/f...-python-upgrade (installation is at the bottom of first post) OR just type "yum install python2.5" (I might be wrong on the python name... you can do yum search python first to check. Just don't install python v3)

I've installed this on several ppl's whm before and never had any issues. Solves a ton with static files though.

That's a temporary solution for you. It should alleviate the problem a lot. (Could be a perm. solution if it alleviates enough)
HOWEVER, you seem to have high IO even without hitting swap due to writes primarily. It's unavoidable that you need a better disk / upgrade hardware. Your writes are too heavy for the disks to handle on raid 1 (raid 1 doesn't help with writes at all). We could optimize mysql to load more of the burden onto ram, but right now, I'm afraid of burdening your ram due to potential pitfalls.

Perm solutions:
1. I suggest you shop for some dedis on SSD. SSDs are very affordable nowadays. Or a hybrid: SSD and SATA, so you can have your big files on the SATA drive while you put your OS & MySQL on SSD. But that means at least 4 drives (2 SSD in RAID 1, 2 SATA in RAID 1). RAID 1 is the minimum imo for production. I wouldn't suggest a 2+ dedi split yet. A modern dedi alone is a significant jump from your current setup (in CPU & disk).
2. Look at optimization not on the system side, but on the script side. I don't know what applications you are running on your server. But that's a very high IO usage compared to CPU usage, and your CPU isn't even good. A certain application on your server takes too much IO resource and I'm guessing it's actually not IPB. IPB wouldn't give such terrible utilization ratios.

*Salvo* · April 30, 2012

Grumpy, thank you for the BIG help, which I appreciate a lot. If you know a company I can contact for a dedi, please link it here if possible. cPanel, or even I, can manage the transfer, even if it will be a pain to change DNS for the 12-14 websites hosted on this server. If it's worth it... I'm here. In Europe, except for serverloft, the prices for dedicated servers are too high for my budget.

Edit: I'm checking one by one your suggestions, and most of them are now configured. Again thank you.

Aussie Cable · April 30, 2012

I use Nginx as a reverse proxy on my server and notice a huge difference in performance.

If you do install it, make sure that you:

1. Generate a new access key before installing nginxadmin (WHM >> Cluster/Remote Access >> Setup Remote Access Key >> Generate New Key) or install will fail
2. After nginxadmin is installed - check that Apache is on the right port (8081) via (WHM >> Server Configuration >> Tweak Settings >> Apache non-SSL IP/port)

NOTE: If you try and restart Apache first (after install it will suggest to do this) without making sure Apache port is 8081, you will get an error.

3. Restart Apache - /etc/init.d/httpd restart using SSH (PUTTY)
4. If you use munin, you need to make some changes to the Apache plugins (change the ports to show graphs correctly):

Using SSH (PuTTY), this next command will open 3 files one after another (edit, save, and close each one; the edit is the same in all 3 files):

nano /etc/munin/plugins/apache_*

Find where it is written 80 and change to 8081 (it is mentioned 2 times in each file):

First is on about line 28 (Change from 80 to 8081)

Second is on about line 90 (Change from 80 to 8081)

Save and close that file (CTRL and X, Y to agree to make changes, and enter to go to the next file)

Make the exact same changes to the next 2 files that open.

Restart munin by running this command: /usr/sbin/munin-node restart

If you restart your server and the Apache graphs are not showing, you will need to restart munin to get it to start to collect data again.

If you use WHM 11.32.2 Build 15 or below, you may need to add some extra directory information to your munin plugin conf file (cPanel bug); if so, follow these instructions

I hope that helps with your Nginxadmin install.

Gary. · May 1, 2012

Hi, looking at your my.cnf I would say that's one major cause, as there are so many wrong values in place, and that is without looking at your website / connections.

How many connections does the server have at the time of the crash? You could be getting a small sys attack. Use this command to see how many connections you have to Apache:

netstat -an | grep 80 | wc -l
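Note that `grep 80` matches any line containing "80" anywhere, including port 8080, remote ports, and parts of IP addresses, so the count above is an upper bound. A stricter count could look like the sketch below; the helper names are made up, and the field positions assume the usual Linux `netstat -ant` layout (local address in column 4, remote address in column 5, state in column 6).

```shell
# Count ESTABLISHED TCP connections to a local port, reading `netstat -ant` on stdin.
count_established() {
    local port="${1:-80}"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# List remote IPs by number of connections (any state) to the port, busiest first.
# A single address holding most of the connections would hint at an attack.
top_remote_ips() {
    local port="${1:-80}"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { sub(/:[0-9]+$/, "", $5); print $5 }
    ' | sort | uniq -c | sort -rn
}

# Example use:
#   netstat -ant | count_established 80
#   netstat -ant | top_remote_ips 80 | head
```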

*Salvo* · May 1, 2012

Here you are, Gary.
The value is "94".
About my.cnf, I changed these values after many, many consultations, and according to the suggestions from the MySQL Tuner script.
I've noticed that the crashes stopped after I re-inserted the InnoDB values. Maybe it's wrong, maybe it's a coincidence (I don't know), but this is what happened. For now I've taken the forum offline until I find a good alternative for hardware (where the attention seems to be concentrated) and configuration.
My production sites are "alive", but I *must* take care of them, as I get the money for the server from them (liquidarea.com). Unfortunately the forum comes second; it's for fun.
I will contact you, Gary.

Gary. · May 1, 2012

Hi salvo,

I could be wrong, though with MySQL I'm not often, as it's what I mostly do, but I can be fairly sure that the values in place will not work well with each other. I will mention a few as examples:

Connect timeout: that is way too low; IPB needs at least 90.
Thread concurrency: this does not exist and is included as default, so that can be removed.
Join buffer size is extremely large.
Key buffer is 128M, yet you have 500 connections, so I would increase that to 512 if you're getting that many hits.

I could keep going but to be honest with you I would scrap all that my.cnf and start from scratch as there could be values you have added that may not be used and some that are using way more than they should which could cause bad page loads, High I/O wait times or even locking up tables.

Gary.

Ps, Is your website in your signature ?

I just visited that and it took around 12 seconds to fully load up, PHP response was average so I now think it's mysql.
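On the point about my.cnf values that may not be doing what was intended: the file posted earlier sets `innodb_buffer_pool_size` twice (8G, then 1M) and `innodb_flush_log_at_trx_commit` twice (0, then 4), and when an option is repeated MySQL uses the last occurrence. A small sketch for spotting such repeats follows; `duplicate_options` is a hypothetical helper name, and it assumes the file has a single section, as in the posted `[mysqld]` block.

```shell
# Print option names that appear more than once in a my.cnf-style file ($1).
# Dashes and underscores are treated as the same character in option names.
# Section headers are ignored, so this assumes one section per file.
duplicate_options() {
    awk -F'=' '
        /^[ \t]*$/     { next }                 # blank lines
        /^[ \t]*[#;[]/ { next }                 # comments and [section] headers
        {
            key = $1
            gsub(/[ \t]/, "", key)
            gsub(/-/, "_", key)
            if (++seen[key] == 2) print key     # report each repeated name once
        }
    ' "$1"
}

# Example use:
#   duplicate_options /etc/my.cnf
```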

*Salvo* · May 1, 2012

Thank you for your clarifications, Gary. Yes, that's one of my websites, but as I told you, there are other sites that may not be optimized, which I host for free; one of them is the site that helps me pay for the server (liquidarea.com, aqua-aquapress.com, discusclub.net, world-wodeweb.org).
I'm planning to move the setup to a different server, this one, even though I see in their forum that people complain about the slowness of the SSDs. Strange...
I'll contact you on MSN.

Again thank you, Gary :smile:

Grumpy · May 2, 2012

...even though I see in their forum that people complain about the slowness of the SSDs. Strange...

There's a reason why I said you shouldn't pick SSDs that are small...
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-320-series.html
Check out the speeds on 40gig (your choice) vs something like 120gig.

IOPS is what's important, and even at 40 GB it's a lot faster than SATA, but when testing the obvious sequentials, a small Intel 320 40 GB disk is going to do quite poorly.

*Salvo* · May 2, 2012

I'm learning more from you guys in this topic than in all my life.
I tried to get help from a forum of sysadmins in Italy, but nobody gave me any help... they only made fun of me.
Now, thanks to Grumpy and Gary, the scenario is clearer to me, and once some things are solved I'll change server anyway. But I can't until the software problems are solved. It would be useless otherwise.
