
Why is it so difficult to troubleshoot the root causes of high server load?



As the topic title asks, why is it so difficult to troubleshoot root causes of high server load?

I have had server load go from around 15 and spike all the way up to 50, 60 or even 80.

Then it goes back down after a while.

However, it's really difficult to troubleshoot.

There must be a better way to isolate the issue. Any tips?

Thank you


There is also htop, which is easier to read, has a better interface and does everything top does. There is also iotop, which shows input/output on a per-process basis, and mytop for MySQL.

As stated above, how are you using top? There are specific command-line options to filter things.
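If top itself is awkward to filter, the same %CPU figure can be computed straight from /proc. This is a minimal sketch, assuming Linux and bash 4+; `top_cpu` and `cpu_ticks_snapshot` are hypothetical helper names, not existing tools. It samples each process's utime + stime twice and reports the difference as a whole-number percentage of one CPU, so the interval has to be whole seconds.

```bash
#!/usr/bin/env bash
# Sketch only: approximates top's %CPU column from /proc (Linux, bash 4+).
# Fields 14 (utime) and 15 (stime) of /proc/<pid>/stat are CPU time in clock ticks.

# Print "pid ticks name" for every process whose stat file is readable.
cpu_ticks_snapshot() {
  local stat line rest comm
  local -a f
  for stat in /proc/[0-9]*/stat; do
    read -r line 2>/dev/null < "$stat" || continue
    comm=${line#*"("}; comm=${comm%")"*}   # name sits in (...) and may contain spaces
    rest=${line##*") "}                    # fields 3 (state) onwards
    read -r -a f <<< "$rest"
    printf '%s %s %s\n' "${line%% *}" $(( f[11] + f[12] )) "$comm"
  done
}

# top_cpu [interval_seconds] [count]: busiest processes over the interval, as a
# whole-number percentage of one CPU (can exceed 100 for multi-threaded processes).
top_cpu() {
  local interval=${1:-5} count=${2:-10} hz pid ticks comm
  local -A before=()
  hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
  while read -r pid ticks comm; do
    before[$pid]=$ticks
  done < <(cpu_ticks_snapshot)
  sleep "$interval"
  while read -r pid ticks comm; do
    [[ -n ${before[$pid]:-} ]] || continue   # skip processes that started mid-sample
    printf '%s %s %s\n' "$pid" $(( (ticks - before[$pid]) * 100 / (hz * interval) )) "$comm"
  done < <(cpu_ticks_snapshot) | sort -k2,2nr | head -n "$count"
}
```

For example, `top_cpu 5 15` prints the 15 busiest processes over 5 seconds as `pid percent name`. Interactively, pressing P or M inside top sorts by CPU or memory, which gets most of the way there without any script.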

[Screenshot, 2022-02-16]

11 hours ago, Randy Calvert said:

 (I’m guessing it’s either Apache or MySQL)

Not NGINX? 🤣

 

There is also a good read over at the cPanel forums with a bash script.

Edited by Muddy Boots

Hi,

Thank you both. I also have htop.

I have a Nixstats subscription, and I was going through every single metric to see if anything jumped out at me. I found one really weird thing.

Every 5 days there is a BIG (HUGE) spike in I/O reads that lasts 3 days, even though writes and TPS don't change. I'm not sure this is correlated with the high load, as the patterns don't match exactly, but... this seems suspicious.

[Screenshot: I/O reads graph]
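To see which processes are behind a read spike like this, one option is the per-process I/O counters the kernel keeps in /proc/<pid>/io. This is a minimal sketch, assuming Linux with per-task I/O accounting enabled (usually the case on distro kernels) and bash 4+; `top_readers` and `io_read_snapshot` are hypothetical helper names. It uses `read_bytes`, which counts data actually fetched from storage, so reads served from the page cache do not inflate it.

```bash
#!/usr/bin/env bash
# Sketch only: which processes are really reading from disk (Linux, bash 4+).
# "read_bytes" in /proc/<pid>/io counts bytes fetched from storage, excluding
# page-cache hits. Other users' processes are only visible when run as root.

# Print "pid read_bytes name" for every process whose io file is readable.
io_read_snapshot() {
  local io pid key val rb comm
  for io in /proc/[0-9]*/io; do
    pid=${io#/proc/}; pid=${pid%/io}
    rb=
    while read -r key val; do
      if [[ $key == read_bytes: ]]; then rb=$val; break; fi
    done 2>/dev/null < "$io"
    [[ -n $rb ]] || continue
    read -r comm 2>/dev/null < "/proc/$pid/comm" || comm='?'
    printf '%s %s %s\n' "$pid" "$rb" "$comm"
  done
}

# top_readers [interval_seconds] [count]: processes that read the most from
# disk during the interval, in KiB. Processes with no reads are left out.
top_readers() {
  local interval=${1:-5} count=${2:-10} pid rb comm delta
  local -A before=()
  while read -r pid rb comm; do
    before[$pid]=$rb
  done < <(io_read_snapshot)
  sleep "$interval"
  while read -r pid rb comm; do
    [[ -n ${before[$pid]:-} ]] || continue
    delta=$(( (rb - before[$pid]) / 1024 ))
    (( delta > 0 )) || continue
    printf '%s %s KiB %s\n' "$pid" "$delta" "$comm"
  done < <(io_read_snapshot) | sort -k2,2nr | head -n "$count"
}
```

Run it as root during a spike, e.g. `top_readers 10 15`, so that processes owned by nginx, mysql and other users are included.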

Edited by SJ77

This almost looks like some sort of automated activity. It seems to be spaced out like clockwork.
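If it really is clockwork, a scheduled job is a good first suspect. Below is a minimal sketch that prints every active cron line together with the file it came from. `list_cron_jobs` is a hypothetical helper, and the spool paths assume either a Debian/Ubuntu layout (/var/spool/cron/crontabs) or a RHEL-style one (/var/spool/cron). systemd timers are not covered; `systemctl list-timers --all` lists those.

```bash
#!/usr/bin/env bash
# Sketch only: list active cron entries, prefixed with their source file.
# Optional argument: an alternative root directory (default /), handy for testing.
list_cron_jobs() {
  local root=${1:-/} f line
  local skip='^[[:space:]]*(#|$)'   # comment or blank line
  root=${root%/}
  for f in "$root"/etc/crontab "$root"/etc/cron.d/* \
           "$root"/var/spool/cron/crontabs/* "$root"/var/spool/cron/*; do
    [[ -f $f && -r $f ]] || continue   # skips directories and unmatched globs
    while IFS= read -r line || [[ -n $line ]]; do
      [[ $line =~ $skip ]] && continue
      printf '%s: %s\n' "${f#"$root"}" "$line"
    done < "$f"
  done
  # Scripts that run-parts executes from the periodic directories.
  for f in "$root"/etc/cron.{hourly,daily,weekly,monthly}/*; do
    if [[ -f $f ]]; then
      printf '%s (run-parts)\n' "${f#"$root"}"
    fi
  done
}
```

A multi-day cycle does not map neatly onto cron's fields, so if nothing obvious shows up here, timers and the application's own scheduled tasks are worth checking as well.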
 

- Is this a dedicated or shared server?  (If it’s a VPS, are we sure there is not something stealing CPU and causing a lack of resources?)

- Is there anything else that runs on the server outside of IPB?  

- When these spikes happen… we can absolutely see the disk IO increase but this does not describe what’s causing it. What processes are consuming the most disk/cpu/ram?
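For the RAM part of that question, here is a minimal sketch that ranks processes by resident memory straight from /proc, assuming Linux and bash. `top_rss` is a hypothetical helper. Values are VmRSS in kB; shared pages are counted in every process that maps them, so the column can add up to more than physical RAM.

```bash
#!/usr/bin/env bash
# Sketch only: processes with the largest resident set size (VmRSS, kB).
# Kernel threads have no VmRSS line in /proc/<pid>/status and are skipped.
# top_rss [count]
top_rss() {
  local count=${1:-10} status pid key val rest name rss
  for status in /proc/[0-9]*/status; do
    pid=${status#/proc/}; pid=${pid%/status}
    name= rss=
    while read -r key val rest; do
      case $key in
        Name:)  name="$val${rest:+ $rest}" ;;
        VmRSS:) rss=$val ;;
      esac
    done 2>/dev/null < "$status"
    [[ -n $rss ]] || continue
    printf '%s %s kB %s\n' "$pid" "$rss" "$name"
  done | sort -k2,2nr | head -n "$count"
}
```

`top_rss 15` lists the 15 largest processes as `pid rss kB name`.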


I agree; 3 days will give me time to track it down. I don't think it's backup-related. I have daily backups.


  • Dedicated
  • Nothing but IPB runs on this machine
  • Unfortunately, I noticed the spikes after they had stopped, so we will have to wait until the next one begins; then I will dig into all the running processes. I will hang tight for about 5 days.
Edited by SJ77

  • 2 weeks later...

OK, I am back in the thick of my high server load cycle, which seems to happen every 5 days, as indicated by the high disk read pattern.

I used iotop to isolate these thread IDs as the issue. How can I turn this into actual information?

I want to know exactly what these processes are doing so I can address it accordingly.

Having an arbitrary TID doesn't actually tell me much. Is there some way to investigate further from here?

Thank you in advance.

[Screenshot: iotop output]
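One way to turn a TID into something meaningful is to start from /proc/<tid>/status. Its Tgid line names the owning process, and that process's cmdline gives the full command. Below is a minimal sketch, assuming Linux and bash; `tid_info` is a hypothetical helper. Alternatively, `iotop -P` shows processes instead of individual threads in the first place.

```bash
#!/usr/bin/env bash
# Sketch only: map a thread ID (iotop's TID column) to its owning process.
# /proc/<tid>/ can be opened for any thread ID even though threads are not
# listed in /proc itself; the "Tgid" line is the owning process ID.
# tid_info <tid>
tid_info() {
  local tid=$1 key val rest tgid ppid cmd
  if [[ ! -r /proc/$tid/status ]]; then
    echo "no such thread: $tid" >&2
    return 1
  fi
  while read -r key val rest; do
    case $key in
      Tgid:) tgid=$val ;;
      PPid:) ppid=$val ;;
    esac
  done < "/proc/$tid/status"
  # cmdline is NUL-separated; kernel threads have an empty one, so fall back to comm.
  cmd=$(tr '\0' ' ' 2>/dev/null < "/proc/$tgid/cmdline")
  [[ -n $cmd ]] || cmd="[$(cat "/proc/$tgid/comm" 2>/dev/null)]"
  printf 'tid=%s pid=%s ppid=%s\ncmd: %s\n' "$tid" "$tgid" "$ppid" "$cmd"
}
```

Usage: `tid_info <tid>` with a TID copied from iotop. For an nginx worker, the ppid is the nginx master process.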


You keep focusing on the activities that show disk reads/writes. You’re not showing anything that indicates how many connections are established, what memory is being consumed, what CPU is being consumed, etc.

If the system is out of memory and swapping, it would make 100 percent sense that the disk is thrashing.
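A quick way to test the swapping theory is to read the kernel's counters of pages swapped in and out from /proc/vmstat. Below is a minimal sketch, assuming Linux and bash; `swap_activity` and `swap_counters` are hypothetical helpers. Watching the si/so columns of `vmstat 5` gives the same answer interactively.

```bash
#!/usr/bin/env bash
# Sketch only: is the box swapping right now? pswpin/pswpout in /proc/vmstat
# count pages swapped in/out since boot, so two samples give the recent activity.

# Print "pswpin pswpout" from /proc/vmstat (0 if a counter is missing).
swap_counters() {
  awk '$1 == "pswpin" {i = $2} $1 == "pswpout" {o = $2} END {print i + 0, o + 0}' /proc/vmstat
}

# swap_activity [interval_seconds]
swap_activity() {
  local interval=${1:-5} in1 out1 in2 out2
  read -r in1 out1 < <(swap_counters)
  sleep "$interval"
  read -r in2 out2 < <(swap_counters)
  printf 'pages swapped in: %s, out: %s (over %ss)\n' \
    $(( in2 - in1 )) $(( out2 - out1 )) "$interval"
  # Low MemAvailable together with non-zero swap traffic points at memory pressure.
  grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo
}
```

If both swap numbers stay at 0 during a spike, swapping is not what is driving the reads.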


16 minutes ago, Muddy Boots said:

ps aux

This will list the process IDs along with the command associated with each one.

But I already have the command showing in iotop. Can I find anything more specific than "nginx worker process"?
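Beyond the process name, /proc/<pid>/fd shows what a worker has open right now, which is roughly what `lsof -p <pid>` reports. Below is a minimal sketch, assuming Linux and bash. Run it as root for processes owned by other users. `open_files` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: list a process's open files by resolving the symlinks in
# /proc/<pid>/fd. Regular files appear as paths; sockets and pipes appear
# as socket:[inode] / pipe:[inode]. Unreadable entries are skipped.
# open_files <pid>
open_files() {
  local pid=$1 fd target
  for fd in /proc/"$pid"/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    printf '%s -> %s\n' "${fd##*/}" "$target"
  done
}
```

Running it a few times on a busy worker during a spike, e.g. `open_files 1572`, shows which files under the web root or cache directory keep turning up.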

23 minutes ago, Randy Calvert said:

You keep focusing on the activities that show disk reads/writes. You’re not showing anything that indicates how many connections are established, what memory is being consumed, what CPU is being consumed, etc.

If the system is out of memory and swapping, it would make 100 percent sense that the disk is thrashing.

Because I have high load that correlates with this strange pattern of high disk READ I/O.

[Screenshot: disk read I/O graph]

For 5 days it's good, then for 3 days it's bad... then it repeats.

I am trying to find out what is driving this bizarre pattern, because then I think I can stop the high load.
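One fact may connect the two. On Linux, the load average counts tasks in uninterruptible sleep (state D, typically blocked on disk I/O) as well as runnable ones. Heavy disk reads can therefore drive the load number up even when the CPUs are mostly idle. Below is a minimal sketch that counts tasks by state and names the ones in D, assuming Linux and bash 4+; `load_breakdown` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: break the current tasks down by state (Linux, bash 4+).
# The load average counts R (runnable) and D (uninterruptible, usually disk I/O)
# tasks, so many D tasks during a spike mean the load is I/O wait, not CPU.
load_breakdown() {
  local stat line rest state comm
  local -A counts=()
  local -a stuck=()
  for stat in /proc/[0-9]*/task/[0-9]*/stat; do
    read -r line 2>/dev/null < "$stat" || continue
    rest=${line##*") "}   # fields 3 (state) onwards
    state=${rest%% *}
    counts[$state]=$(( ${counts[$state]:-0} + 1 ))
    if [[ $state == D ]]; then
      comm=${line#*"("}; comm=${comm%")"*}
      stuck+=("tid ${line%% *} ($comm)")
    fi
  done
  echo "load average (1/5/15 min): $(cut -d' ' -f1-3 /proc/loadavg)"
  for state in "${!counts[@]}"; do
    printf 'state %s: %s tasks\n' "$state" "${counts[$state]}"
  done
  if (( ${#stuck[@]} > 0 )); then
    printf 'D: %s\n' "${stuck[@]}"
  fi
}
```

Running `load_breakdown` a few times during a bad day and a good day makes the comparison concrete.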


1 minute ago, Muddy Boots said:

Also use this. Put the process ID number in this command and it will return the name of the process:

ps -p PID -o comm=

Example:
ps -p 1572 -o comm=

 

It shows "nginx worker process". Unfortunately, that could be many things.
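Since ps only ever shows "nginx worker process", the actual requests are recorded in the access log. Below is a minimal sketch that tallies the busiest client IPs, request paths and user agents. It assumes nginx's default combined log format and the log path /var/log/nginx/access.log; both are assumptions, so check the access_log directive in nginx.conf. `nginx_summary` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: summarise an nginx access log. Assumes the default "combined" format:
#   $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
# so with whitespace splitting $1 is the client IP and $7 the request path,
# and with '"' as separator $6 is the user agent.
# nginx_summary [logfile] [count]
nginx_summary() {
  local log=${1:-/var/log/nginx/access.log} count=${2:-10}
  echo "== top client IPs =="
  awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n "$count"
  echo "== top request paths =="
  awk '{print $7}' "$log" | sort | uniq -c | sort -rn | head -n "$count"
  echo "== top user agents =="
  awk -F'"' '{print $6}' "$log" | sort | uniq -c | sort -rn | head -n "$count"
}
```

You can cut the log down to the spike window first (for example with grep on the date inside the [$time_local] field) and pass the filtered file. That makes it easy to compare a bad day with a good one, such as a crawler that only shows up during the 3 bad days.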


8 hours ago, Muddy Boots said:

That's strange. Are you logged in as root over SSH?

Try

ps ax | grep -E "^ *PID "

Example:

ps ax | grep -E "^ *1572 "

 

It returns this:

[Screenshot: ps output]

Which of course doesn't mean anything to me.

I am trying to translate this into exactly what is going on, so that I can take action. Surely there is a way to see exactly what this process is doing.
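For a closer look at a single PID, /proc exposes three useful things: the process state, the kernel function it is waiting in (wchan), and its I/O counters. Below is a minimal sketch, assuming Linux and bash; `proc_snapshot` is a hypothetical helper. A process sitting in state D while its read_bytes keeps growing points at disk reads. For a live trace, `strace -p <pid>` attaches to a running process and prints its system calls, at the cost of slowing it down.

```bash
#!/usr/bin/env bash
# Sketch only: one-shot view of what a process is doing, from /proc (Linux, bash).
#   State: R = running, S = sleeping, D = uninterruptible sleep (usually disk I/O)
#   wchan: kernel function it is waiting in (reads 0 if not waiting or not permitted)
#   io:    cumulative counters, readable only by the owner or root
# proc_snapshot <pid>
proc_snapshot() {
  local pid=$1
  if [[ ! -r /proc/$pid/status ]]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  grep -E '^(Name|State|PPid|Threads|VmRSS):' "/proc/$pid/status"
  printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
  # cmdline is NUL-separated; show it with spaces instead.
  printf 'cmdline: %s\n' "$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline")"
  grep -E '^(rchar|read_bytes|write_bytes):' "/proc/$pid/io" 2>/dev/null \
    || echo "io: not readable (try as root)"
}
```

Taking a few snapshots a second or two apart, e.g. `proc_snapshot 1572`, shows whether the worker is mostly running, sleeping on the network, or stuck waiting on the disk.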
