
As the topic title asks, why is it so difficult to troubleshoot root causes of high server load?

I have had server load sitting around 15 and spiking all the way up to 50, 60, or even 80.

Then it goes back down after a while.

However, it's really difficult to troubleshoot.

There must be a better way to isolate the issue. Any tips?

Thank you

How are you troubleshooting today? What's the way you're looking to improve?

  • Author
 


Using top.

  • Community Expert

top is a good start. What is the biggest offender? (I'm guessing it's either Apache or MySQL.)

Is it CPU or memory constrained?

Have you looked at the server connections?  (netstat -plant)

Have you looked at the traffic logs and done any sort of correlation of traffic to load?
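
For example (assuming a reasonably recent procps top and the usual net tools; these are just starting points, not a definitive checklist):

top -o %MEM                                  # sort top by memory instead of CPU
netstat -plant | grep ESTABLISHED | wc -l    # rough count of established connections
free -h                                      # quick look at memory and swap pressure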

There is also htop (it's easier to read, has a better interface, and does everything top does), iotop for input/output on a per-process basis, and mytop for MySQL.

As stated above, how are you using top? There are specific command-line options to filter things.

[Screenshot attachment]
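
If any of those aren't installed, they are usually one package away (yum shown here since that's what this server appears to use; mytop may need the EPEL repo):

yum install htop iotop

iotop is particularly useful run as:

iotop -oPa    # -o only show things doing I/O, -P per process rather than per thread, -a accumulated totals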

 

(I'm guessing it's either Apache or MySQL.)

Not NGINX? 🤣

 

There is also a good read over at the cPanel forums with a bash script.

Edited by Muddy Boots

  • Author

Hi,

Thank you, both of you. I also have htop.

I have a nixstat subscription, and I was going through every single metric to see if anything jumped out at me, and I found one really weird thing.

Every 5 days there is a BIG (HUGE) spike in I/O READS that lasts 3 days, even though writes and TPS don't change. Not sure if this is correlated to the high load, as the patterns don't match exactly, but... this seems suspicious.

[Screenshot: disk I/O graph showing the recurring read spikes]

Edited by SJ77

The fact it lasts 3 days gives you plenty of time to track it down.

Anything relating to disk usage, like backups etc.?

  • Community Expert

This almost looks like some sort of automated activity. It seems to be spaced out like clockwork.

- Is this a dedicated or shared server? (If it's a VPS, are we sure there is not something stealing CPU and causing a lack of resources?)

- Is there anything else that runs on the server outside of IPB?

- When these spikes happen… we can absolutely see the disk I/O increase, but this does not describe what's causing it. What processes are consuming the most disk/CPU/RAM?
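
Since it is spaced out like clockwork, it's also worth dumping everything scheduled on the box, for example:

crontab -l                                          # this user's cron jobs (check root's too)
ls /etc/cron.d /etc/cron.daily /etc/cron.weekly     # system-wide cron entries
systemctl list-timers --all                         # systemd timers, if it's a systemd distro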

  • Author

I agree, 3 days will give me time to track it down. I don't think it's backup related; I have daily backups.

 

To answer your questions:

  • Dedicated.
  • Nothing but IPB runs on this machine.
  • Unfortunately, I noticed the spikes after they had stopped, so we will have to wait till the next one begins, and then I will be digging into all the running processes. I will hang tight for about 5 days.

Edited by SJ77

You should still be able to look at the logs, as you know when the spikes occurred. Try the sar command.

What's the memory usage like when this happens?
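
For example, assuming sysstat has been collecting (the data files live under /var/log/sa/ on RHEL-type systems, /var/log/sysstat/ on Debian-type; "16" below is a placeholder for the day of the month you care about):

sar -u -f /var/log/sa/sa16    # CPU utilisation for that day
sar -b -f /var/log/sa/sa16    # I/O and transfer rates for the same day
sar -r -f /var/log/sa/sa16    # memory usage, to see if the spikes line up with memory pressure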

  • 2 weeks later...
  • Author

Ok, I am back in the thick of my high server load cycle, which seems to happen every 5 days, as indicated by the high disk read patterns.

I used iotop to isolate these thread IDs as being the issue. How can I turn this into actual information?

I want to know exactly what these processes are doing so I can address them accordingly.

Having an arbitrary TID doesn't actually tell me much. Is there some way to investigate more from here?

Thank you in advance.

[Screenshot: iotop output listing the offending thread IDs]

  • Community Expert

You keep focusing on the activities that are showing disk reads/writes. You're not showing anything that indicates how many connections are established, what memory is being consumed, what CPU is being consumed, etc.

If the system is out of memory and is swapping, it would make 100 percent sense that the disk is thrashing.
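
A quick way to rule swapping in or out, for example:

free -h       # how much swap is actually in use?
vmstat 1 5    # watch the si/so columns; sustained nonzero values mean active swapping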

 

I used iotop to isolate these thread IDs as being the issue. How can I turn this into actual information?

 

ps aux

 

Will list the process IDs along with the command associated with each.
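
One caveat: iotop shows thread IDs by default, and a thread's TID won't necessarily appear as a PID in ps aux. You can map a TID back to its thread and process, for example (12345 stands in for one of the TIDs in your screenshot):

cat /proc/12345/comm           # the thread's name
ps -eLf | awk '$4 == 12345'    # -L lists threads; the fourth column (LWP) is the thread ID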

  • Author
 

 


But I have the command showing in iotop. Can I find anything more specific beyond "nginx worker process"?

 

You keep focusing on the activities that are showing disk reads/writes. You're not showing anything that indicates how many connections are established, what memory is being consumed, what CPU is being consumed, etc.

If the system is out of memory and is swapping, it would make 100 percent sense that the disk is thrashing.

Because I have high load that correlates to this strange pattern of high disk READ I/O.

[Screenshot: load and disk read graphs showing the recurring pattern]

5 days it's good, then 3 days it's bad... then it repeats.

I am trying to find out what is driving this bizarre pattern, because then I think I can stop the high load.

 

But I have the command showing in iotop. Can I find anything more specific beyond "nginx worker process"?

Try ps aux.

You could also use glances in SSH (if it's not already installed):

glances

[Screenshot: glances output]

 

If it's not installed, then:

yum install glances

 

Also use this. Put the process ID number in this command and it will return the name of the process:

ps -p PIDNAME -o comm=

Example:
ps -p 1572 -o comm=

 

  • Author
 


 

It shows "nginx worker process". That could be many things, unfortunately.

 


That's strange. Are you logged in as root in SSH?

Try

ps ax | egrep "^ PIDNAME"

Example:

ps ax | egrep "^ 1572"

 

  • Author
 


 

It returns this:

[Screenshot: egrep output for the PID]

Which of course doesn't mean anything to me.

I am trying to translate this into exactly what is going on, so that I can take action. Surely there is a way to see exactly what this process is actually doing.
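
For a closer look at what a specific worker is actually doing, something like the following can help (PIDNAME is a placeholder as above; both tools may need installing first):

lsof -p PIDNAME         # files and sockets the process currently has open
strace -p PIDNAME -c    # attach and summarise its system calls; Ctrl-C to stop and print the totals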

@SJ77 What's the full output of

ps aux | grep nginx

 

  • Author
 


 

[Screenshot: ps aux | grep nginx output]

This will give you details of the worker process memory usage:

pmap -x PIDNAME

Your last one would be 8789 for the PID if that's still high at 17%.

What OpenSSL version are you on?

openssl version

 

Have you got proxy_buffering set to on? Are you using mod_security?
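
For context, proxy_buffering matters here because, with buffering on, responses that don't fit nginx's memory buffers get spooled to temporary files on disk, which can show up as worker disk I/O. A minimal sketch of the relevant directives (example values only; adjust to your own nginx.conf):

# inside the http, server or location block that proxies to the backend
proxy_buffering on;
proxy_buffers 8 16k;                # number and size of in-memory buffers
proxy_buffer_size 16k;              # buffer for the response headers
proxy_max_temp_file_size 1024m;     # cap on disk spooling; 0 disables temp files entirely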

 
