High server load rising slowly...

SecondSight · March 28, 2016

Hello !

I had all my board's tables converted to InnoDB after the upgrade to version 4. Once everything looked fine to me, I rebooted the server.

Since the reboot, the server load has been growing slowly but continuously, along with a high %wa value:

After server reboot :
top - 07:02:13 up 66 days, 20:12, 1 user, load average: 2.76, 1.63, 1.18
Tasks: 317 total, 3 running, 309 sleeping, 0 stopped, 5 zombie
Cpu(s): 0.6%us, 1.8%sy, 16.8%ni, 61.6%id, 19.1%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 62.887G total, 48.272G used, 14.615G free, 671.758M buffers
Swap: 9998.996M total, 544.266M used, 9454.730M free, 41.999G cached

Now :
top - 10:41:04 up 66 days, 23:51, 1 user, load average: 33.34, 32.84, 31.38
Tasks: 463 total,   4 running, 453 sleeping,   0 stopped,   6 zombie
Cpu(s): 30.1%us, 3.2%sy, 0.1%ni, 6.5%id, 60.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem:    62.887G total,   62.051G used, 855.363M free, 734.770M buffers
Swap: 9998.996M total, 526.285M used, 9472.711M free,   45.173G cached

In spite of the high server load, my web site is just doing fine...

The only modifications I made were :

- converting my database tables to InnoDB,

- modifying my.cnf.

So I stopped MySQL, and httpd as well, but it changed nothing...

I also noticed that although the server was supposed to have rebooted, it still reports an uptime of 66 days.

What can be the reason for this problem, in your opinion ? There must be something I forgot to do regarding innodb configuration...

Thank you for your help !

My my.cnf :

[~]# cat /etc/my.cnf
[mysqld]
slow_query_log=1
slow_query_log_file="/var/lib/mysql/slow_queries.log"
long_query_time =5
#log_queries_not_using_indexes

server-id=1
datadir="/var/lib/mysql"
socket="/var/lib/mysql/mysql.sock"
local-infile=0
low_priority_updates=1
concurrent_insert=2

innodb_large_prefix=ON
innodb_file_format=Barracuda
innodb_log_buffer_size=8M
innodb_file_per_table=1
innodb_log_file_size=64M
innodb_buffer_pool_instances=20
innodb_buffer_pool_size=20G
innodb_additional_mem_pool_size=32M
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=1
sync_binlog=1
innodb_max_dirty_pages_pct = 0

query_cache_type=1
query_cache_size=256M
query_cache_min_res_unit=2K
query_cache_limit=12M

tmp_table_size=256M
max_heap_table_size=256M

max_user_connections=950
max_connections=1000
max_allowed_packet=368435456
max_connect_errors=10

connect_timeout=170
wait_timeout=160
interactive_timeout=160

myisam_sort_buffer_size=2048M
key_buffer=2048M
read_buffer_size=4M
join_buffer=2M
sort_buffer_size=4M
read_rnd_buffer_size=4M

open_files_limit=16000
table_open_cache=8000
thread_concurrency=16

table_definition_cache=8000
thread_cache_size=1024

[isamchk]
key_buffer=256M
sort_buffer=256M
read_buffer=64M
write_buffer=64M

[safe_mysqld]
err-log="/var/log/mysqld.log"
pid-file="/var/lib/mysql/mysql.pid"

[mysqlhotcopy]
interactive-timeout

[myisamchk]
key_buffer=256M
sort_buffer=256M
read_buffer=64M
write_buffer=64M

[mysql]
no-auto-rehash

enigmapatrick · March 28, 2016

When you run top what processes are running?

Are you running something intensive on your disks? (this may help http://bencane.com/2012/08/06/troubleshooting-high-io-wait-in-linux/)
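
Editor's note: to compare the I/O wait figure across the snapshots posted above, the `Cpu(s)` summary line from top can be parsed mechanically. This is only a minimal sketch; the function name `parse_cpu_line` is made up for illustration, and the two input strings are copied from the first post.

```python
import re

def parse_cpu_line(line):
    """Map top's CPU state labels (us, sy, ni, id, wa, ...) to percentages."""
    return {label: float(value) for value, label in re.findall(r"([\d.]+)%(\w+)", line)}

# The two Cpu(s) lines quoted in the first post.
after_reboot = "Cpu(s): 0.6%us, 1.8%sy, 16.8%ni, 61.6%id, 19.1%wa, 0.0%hi, 0.1%si, 0.0%st"
now = "Cpu(s): 30.1%us, 3.2%sy, 0.1%ni, 6.5%id, 60.0%wa, 0.0%hi, 0.1%si, 0.0%st"

for name, line in (("after reboot", after_reboot), ("now", now)):
    cpu = parse_cpu_line(line)
    print(f"{name}: iowait={cpu['wa']}% idle={cpu['id']}%")
```

A %wa that climbs while %id falls suggests processes are waiting on disk rather than on CPU, which is why the question about disk activity matters here.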

SecondSight · March 28, 2016

I didn't run anything intensive on the disk (I checked the disks too and they are ok) and I didn't see anything abnormal : php, httpd, mysql,...

It was impossible to reboot using /sbin/reboot or using WHM, so I restarted the server once again from my hosting company's website interface, and so far it looks like it's back to normal:

top - 13:09:53 up 1:13, 1 user, load average: 1.19, 1.37, 1.43
Tasks: 336 total,   5 running, 329 sleeping,   0 stopped,   2 zombie
Cpu(s): 1.1%us, 2.1%sy, 21.3%ni, 74.8%id, 0.5%wa, 0.0%hi, 0.2%si, 0.0%st
Mem:    62.887G total,   12.202G used,   50.684G free, 201.469M buffers
Swap: 9998.996M total,    0.000k used, 9998.996M free, 3151.004M cached

enigmapatrick · March 28, 2016

When it starts hopping up again, screenshot the output of top including processes and PM me the screenshot.

Which provider is this server hosted with?

SecondSight · March 28, 2016

My hosting company is OVH.

I've just experienced a sudden high server load which froze everything (websites, WHM, SSH). I still managed to restart httpd and the load quickly went back to normal, except that I have a high number of tasks:

top - 20:27:14 up 6:15, 2 users, load average: 2.12, 2.46, 4.94
Tasks: 1074 total, 175 running, 894 sleeping,   0 stopped,   5 zombie
Cpu(s): 0.3%us, 4.5%sy, 48.4%ni, 45.9%id, 0.6%wa, 0.0%hi, 0.2%si, 0.0%st
Mem:    62.887G total,   32.305G used,   30.581G free, 833.207M buffers
Swap: 9998.996M total,    0.000k used, 9998.996M free,   13.878G cached

There is a large number of TIME_WAIT connections on the server: 2706 TIME_WAIT

There is a large number of hits on my website's user account from these IPs:
823 66.249.89.41
1179 82.233.72.139
7909 157.55.39.128
8937 40.77.167.19
2164 157.55.39.111
11243 207.46.13.63
14632 66.249.64.71

It seems like these IPs are Google and Bing bots (except the second one)...
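
Editor's note: a per-IP hit count like the one above can be reproduced from an access log with a few lines of Python. This is a rough sketch that assumes the client IP is the first whitespace-separated field of each line, as in Apache's common and combined log formats. The function name `hits_per_ip` and the sample lines are invented for illustration and are not taken from the poster's logs.

```python
from collections import Counter

def hits_per_ip(lines):
    """Count requests per client IP, taking the first field of each line as the IP."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts

# Made-up sample lines in combined log format, for illustration only.
sample = [
    '66.249.64.71 - - [28/Mar/2016:20:00:01 +0200] "GET /topic/1/ HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '207.46.13.63 - - [28/Mar/2016:20:00:02 +0200] "GET /topic/2/ HTTP/1.1" 200 4096 "-" "bingbot/2.0"',
    '66.249.64.71 - - [28/Mar/2016:20:00:03 +0200] "GET /topic/3/ HTTP/1.1" 200 6144 "-" "Googlebot/2.1"',
]

# Print the busiest clients first, mirroring the "count IP" layout used above.
for ip, count in hits_per_ip(sample).most_common():
    print(f"{count:>6} {ip}")
```

On a real server the same function can be fed an open log file object instead of the sample list, since it only iterates over lines.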

enigmapatrick · March 29, 2016

What are your access logs like for those IPs? Are they crawling something they shouldn't be?

SecondSight · March 30, 2016

Hello !

I was busy trying to get my board working... I played a little with the Apache configuration and reduced the number of TIME_WAIT socket connections, but it didn't fix the problem. I installed a program which restarts services when they use too many resources (prm).

I've had a look in the Google Webmaster tools section, then Crawl stats, and here is what I have :

There are high figures corresponding to the days I had problems. Is it possible that bots can prevent my server from working normally ? Anyway, I used the webmaster tools to slow them down.

I've also had a look in the server domlogs and, regarding my board, I noticed many lines such as these :

Quote

"GET /uploads/javascript_global/root_map.js.0420b05c64c4fac8da5317d262a66a47.js?v=8d7c4f5681 HTTP/1.1" 404 6114 "http://mywebsite.com/topic/280131-cm12-help/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

"GET /uploads/css_built_1/20446cf2d164adcc029377cb04d43d17_flags.css.ebec6fe6f9d9f3d6e74945522734c582.css?v=8d7c4f5681 HTTP/1.1" 200 12475 "http://forums-enseignants-du-primaire.com/topic/321361-chaussons-en-classe/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

"GET /uploads/av-34648.jpg HTTP/1.1" 304 - "-" "Googlebot-Image/1.0"

Are the bots crawling my uploads directory ?

Thank you !
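
Editor's note: one way to approach that question is to tally, per HTTP status, the requests under /uploads/ whose user agent mentions Googlebot. The sketch below works on the three log fragments quoted above, which start at the request string rather than at the client IP. The regular expression and the function name `uploads_status_by_bot` are illustrative assumptions, and matching on the user agent string does not by itself prove a request really came from Google.

```python
import re
from collections import Counter

# Matches: "METHOD PATH PROTOCOL" STATUS SIZE "REFERER" "USER-AGENT"
LINE_RE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The three fragments quoted in the post above.
lines = [
    '"GET /uploads/javascript_global/root_map.js.0420b05c64c4fac8da5317d262a66a47.js?v=8d7c4f5681 HTTP/1.1" 404 6114 "http://mywebsite.com/topic/280131-cm12-help/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '"GET /uploads/css_built_1/20446cf2d164adcc029377cb04d43d17_flags.css.ebec6fe6f9d9f3d6e74945522734c582.css?v=8d7c4f5681 HTTP/1.1" 200 12475 "http://forums-enseignants-du-primaire.com/topic/321361-chaussons-en-classe/" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '"GET /uploads/av-34648.jpg HTTP/1.1" 304 - "-" "Googlebot-Image/1.0"',
]

def uploads_status_by_bot(log_lines, agent_marker="Googlebot"):
    """Count /uploads/ requests per HTTP status for user agents containing agent_marker."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not follow the expected layout
        if match.group("path").startswith("/uploads/") and agent_marker in match.group("agent"):
            counts[match.group("status")] += 1
    return counts

print(dict(uploads_status_by_bot(lines)))
```

A large share of 404 responses in such a tally would point at stale asset URLs being re-requested, which is worth separating from ordinary crawling of pages that exist.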

Flitterkill · April 2, 2016

Is this a dedicated server or a VPS? Given the stats I assume dedicated.

Also, you haven't mentioned this yet, what version php and mysql are you using?

SecondSight · April 4, 2016

Hello !

Yes, it's a dedicated server with CentOS, PHP 5.5.31 and MySQL 5.6.29. I have CloudLinux and CageFS, so I can't use XCache, eAccelerator, etc.

In order to keep the server up, I modified the Apache configuration:

- I reduced ServerLimit from 800 to 600.

- I reduced MaxClients from 800 to 500.

- Max Requests Per Child is 300.

- I reduced KeepAliveTimeout from 5 to 3.

- TimeOut is 30 (has always been 30).

Now, at peak times, I have a number of tasks around 500, with sometimes up to more or less 55 tasks running, but most of the time, I have about 400 tasks and about 4 running.

In case there is a problem (especially at night), I installed PRM which restarts services : http://www.rfxnetworks.com/projects/process-resource-monitor/

Here is what tuning-primer says :

-- MYSQL PERFORMANCE TUNING PRIMER --
- By: Matthew Montgomery -

MySQL Version 5.6.29-log x86_64

Uptime = 5 days 12 hrs 48 min 58 sec
Avg. qps = 150
Total Questions = 71741380
Threads Connected = 12

Server has been running for over 48hrs.
It should be safe to follow these recommendations

To find out more information on how each of these
runtime variables effects performance visit:
http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html
Visit http://www.mysql.com/products/enterprise/advisors.html
for info about MySQL's Enterprise Monitoring and Advisory Service

SLOW QUERIES
The slow query log is NOT enabled.
Current long_query_time = 5.000000 sec.
You have 705 out of 71741432 that take longer than 5.000000 sec. to complete
Your long_query_time seems to be fine

BINARY UPDATE LOG
The binary update log is NOT enabled.
You will not be able to do point in time recovery
See http://dev.mysql.com/doc/refman/5.6/en/point-in-time-recovery.html

WORKER THREADS
Current thread_cache_size = 1024
Current threads_cached = 462
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine

MAX CONNECTIONS
Current max_connections = 1000
Current threads_connected = 14
Historic max_used_connections = 477
The number of used connections is 47% of the configured maximum.
Your max_connections variable seems to be fine.

INNODB STATUS
Current InnoDB index space = 4.68 G
Current InnoDB data space = 13.16 G
Current InnoDB buffer pool free = 39 %
Current innodb_buffer_pool_size = 25.00 G
Depending on how much space your innodb indexes take up it may be safe
to increase this value to up to 2 / 3 of total system memory

MEMORY USAGE
Max Memory Ever Allocated : 33.92 G
Configured Max Per-thread Buffers : 13.91 G
Configured Max Global Buffers : 27.28 G
Configured Max Memory Limit : 41.20 G
Physical Memory : 62.88 G
Max memory limit seem to be within acceptable norms

KEY BUFFER
Current MyISAM index space = 450 M
Current key_buffer_size = 2.00 G
Key cache miss rate is 1 : 156
Key buffer free ratio = 76 %
Your key_buffer_size seems to be fine

QUERY CACHE
Query cache is enabled
Current query_cache_size = 256 M
Current query_cache_used = 207 M
Current query_cache_limit = 12 M
Current Query cache Memory fill ratio = 81.20 %
Current query_cache_min_res_unit = 2 K
However, 2186984 queries have been removed from the query cache due to lack of memory
Perhaps you should raise query_cache_size
MySQL won't cache query results that are larger than query_cache_limit in size

SORT OPERATIONS
Current sort_buffer_size = 4 M
Current read_rnd_buffer_size = 4 M
Sort buffer seems to be fine

JOINS
Current join_buffer_size = 2.00 M
You have had 61225 queries where a join could not use an index properly
You have had 1071 joins without keys that check for key usage after each row
You should enable "log-queries-not-using-indexes"
Then look for non indexed joins in the slow query log.
If you are unable to optimize your queries you may want to increase your
join_buffer_size to accommodate larger joins in one pass.

Note! This script will still suggest raising the join_buffer_size when
ANY joins not using indexes are found.

OPEN FILES LIMIT
Current open_files_limit = 17010 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine

TABLE CACHE
Current table_open_cache = 8000 tables
Current table_definition_cache = 8000 tables
You have a total of 1235 tables
You have 2688 open tables.
The table_cache value seems to be fine

TEMP TABLES
Current max_heap_table_size = 256 M
Current tmp_table_size = 256 M
Of 720438 temp tables, 19% were created on disk
Created disk tmp tables ratio seems fine

TABLE SCANS
Current read_buffer_size = 4 M
Current table scan ratio = 210 : 1
read_buffer_size seems to be fine

TABLE LOCKING
Current Lock Wait ratio = 1 : 46101
Your table locking seems to be fine
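
Editor's note: the figures in this post allow a rough memory budget for Apache. The sketch below subtracts MySQL's "Configured Max Memory Limit" (41.20 G) from the physical memory (62.88 G) and divides what is left by the worker limit. It deliberately ignores the OS, the page cache and every other service, so treat the result as an upper bound under those assumptions rather than a recommendation. The function name `per_worker_budget_mb` is made up for illustration.

```python
def per_worker_budget_mb(physical_gb, mysql_max_gb, max_clients):
    """Memory left per Apache worker, in MB, if MySQL uses its configured maximum."""
    leftover_gb = physical_gb - mysql_max_gb
    return leftover_gb * 1024 / max_clients

PHYSICAL_GB = 62.88    # "Physical Memory" from tuning-primer
MYSQL_MAX_GB = 41.20   # "Configured Max Memory Limit" from tuning-primer

# Compare the old worker limit (800) with the new one (500).
for max_clients in (800, 500):
    budget = per_worker_budget_mb(PHYSICAL_GB, MYSQL_MAX_GB, max_clients)
    print(f"{max_clients} workers: about {budget:.1f} MB per Apache worker")
```

If the resident size of a typical httpd/PHP process on the machine is larger than this budget, the worker limit is high enough for a traffic spike to push the server into swap.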

RevengeFNF · April 4, 2016

Have you thought about replacing Apache with nginx + php-fpm?

Flitterkill · April 5, 2016

You left the link to your site up above so I took a look. It was mostly loading fine for me right now - some pages slower than others. Still, some things you can do.

1) Given how much memory you have you can increase query cache size. 500M up to 1G and see what happens.

2) What RevengeFNF said. Pushing over to nginx will help.

3) Your javascript is loading in the header. In the ACP go to Customization - Themes and edit your theme. Hit the custom tab and then choose to load JS at the bottom of the page.

4) The time_wait stuff. This is just a blog post by somebody but explains what's going on and what you might do about it.

http://www.fromdual.com/huge-amount-of-time-wait-connections

5) Are you running this with *no* web cache at all? None of these:

6) Is this really a dedicated server and not a really big VPS? I ask again because you mention CageFS. There aren't any other sites on this machine are there?

7) ZendOpcache on PHP enabled?

ASTRAPI · April 5, 2016

Things have changed: if you are using InnoDB tables, it is now better not to use the query cache at all, due to mutex contention with the InnoDB buffer pool.
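
Editor's note: the tuning-primer output posted earlier gives a rough idea of how much churn the query cache sees. The sketch below turns the reported uptime (5 days 12 hrs 48 min 58 sec) into seconds and divides the 2186984 queries "removed from the query cache due to lack of memory" by it. The variable names are illustrative, and the calculation says nothing by itself about whether enlarging or disabling the cache is the better choice.

```python
from datetime import timedelta

# Figures copied from the tuning-primer output in this thread.
uptime = timedelta(days=5, hours=12, minutes=48, seconds=58)
total_questions = 71741380
pruned_queries = 2186984  # removed from the query cache due to lack of memory

uptime_seconds = int(uptime.total_seconds())
prunes_per_second = pruned_queries / uptime_seconds
prune_share = pruned_queries / total_questions

print(f"uptime: {uptime_seconds} s")
print(f"average queries per second: {total_questions / uptime_seconds:.0f}")
print(f"cache evictions per second: {prunes_per_second:.2f}")
print(f"evictions as a share of all questions: {prune_share:.1%}")
```

Several evictions per second mean the cache is constantly being written to and invalidated, which is the kind of workload where its locking overhead is most likely to show.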

SecondSight · April 6, 2016

Yes, it's a dedicated server for my board mostly but I also have 10 small websites which generate no traffic at all and which work with Wordpress (that's why I have CloudLinux and CageFS).

Thank you for the tips. I'm going to try them now and see if things improve.
