Posted

As the topic title asks, why is it so difficult to troubleshoot root causes of high server load?

I have had server load going from around 15 and spiking all the way up to 50, 60 or even 80

Then it goes back down after a while.

However, it's really difficult to troubleshoot.

There must be a better way to isolate the issue? Any tips?

Thank you

Posted

Top is a good start. What is the biggest offender?  (I’m guessing it’s either Apache or Mysql)

Is it CPU or memory constrained?

Have you looked at the server connections?  (netstat -plant)
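For the connection check, a per-state tally makes a pile-up of ESTABLISHED or TIME_WAIT sockets easier to spot than the raw list. This is only a rough sketch: the helper name count_tcp_states is made up here, and it assumes the Linux net-tools column layout, where the state is the sixth column.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of netstat): tally TCP sockets by state.
# Assumes the column order: Proto Recv-Q Send-Q Local Foreign State ...
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ }
         END { for (s in n) print n[s], s }' | sort -rn
}

# Usage (run as root so -p can show the owning program):
#   netstat -plant | count_tcp_states
```

Running it a few times during a spike and again during a quiet period gives two sets of numbers to compare.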

Have you looked at the traffic logs and tried to correlate traffic with load?
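One way to get something to correlate is to count requests per hour from the web server's access log and line that up against the load graph. A minimal sketch, assuming the common/combined log format (timestamp in the fourth field) and a log that is already in time order; the helper name and the log path in the usage comment are assumptions, not anything specific to this server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count requests per hour from an access log on stdin.
# Assumes field 4 looks like "[27/Feb/2022:23:51:56" (common/combined format).
requests_per_hour() {
    awk '{
            hour = substr($4, 2, 14)              # e.g. 27/Feb/2022:23
            if (!(hour in n)) order[++k] = hour   # remember first-seen order
            n[hour]++
         }
         END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}

# Usage (the path is an assumption; adjust to your setup):
#   requests_per_hour < /var/log/nginx/access.log
```

If the hourly counts stay flat while the load climbs, traffic volume alone is probably not the driver.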

Posted (edited)

There is also htop, which does everything top does with a better, easier-to-read interface, iotop for input/output on a per-process basis, and mytop for MySQL.

As asked above, how are you using top? It has specific command-line options for filtering things.

[Attached screenshot]

11 hours ago, Randy Calvert said:

 (I’m guessing it’s either Apache or Mysql)

Not NGINX ? 🤣

 

There is also a good read here over at cPanel forums with a bash script 

Edited by Muddy Boots
Posted (edited)

Hi

Thank you, both of you. I also have htop.

I have a nixstat subscription, and I was going through every single metric to see if anything jumped out at me. I found one really weird thing.

Every 5 days there is a BIG (HUGE) spike in I/O READS that lasts 3 days, even though writes and TPS don't change. I'm not sure whether this is correlated with the high load, as the patterns don't match exactly, but it seems suspicious.

[Attached image: graph of disk I/O reads]

Edited by SJ77
Posted

This almost looks like some sort of automated activity. It seems to be spaced out like clockwork.
 

- Is this a dedicated or shared server?  (If it’s a VPS, are we sure there is not something stealing CPU and causing a lack of resources?)

- Is there anything else that runs on the server outside of IPB?  

- When these spikes happen… we can absolutely see the disk IO increase but this does not describe what’s causing it. What processes are consuming the most disk/cpu/ram?
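For the CPU-steal question, the kernel's own counters in /proc/stat can be checked directly. The sketch below reports iowait and steal as a share of all CPU time. The helper name cpu_shares is made up for illustration, and the figures are averages since boot, so they only hint at whether either is a factor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report iowait and steal as a share of all CPU time.
# Reads /proc/stat-formatted text on stdin and uses the aggregate "cpu" line:
#   cpu user nice system idle iowait irq softirq steal guest guest_nice
cpu_shares() {
    awk '$1 == "cpu" {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # guest time is already counted in user
            printf "iowait=%.1f%% steal=%.1f%%\n", 100 * $6 / total, 100 * $9 / total
         }'
}

# Usage:
#   cpu_shares < /proc/stat
```

On a dedicated machine steal would be expected to sit at zero, while a high iowait share would point back at the disk.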

Posted (edited)

I agree, 3 days will give me time to track it down. I don't think it's backup-related. I have daily backups.

2 hours ago, Randy Calvert said:

This almost looks like some sort of automated activity. It seems to be spaced out like clockwork.
 

- Is this a dedicated or shared server?  (If it’s a VPS, are we sure there is not something stealing CPU and causing a lack of resources?)

- Is there anything else that runs on the server outside of IPB?  

- When these spikes happen… we can absolutely see the disk IO increase but this does not describe what’s causing it. What processes are consuming the most disk/cpu/ram?

  • Dedicated
  • Nothing but IPB runs on this machine
  • Unfortunately, I only noticed the spikes after they had stopped, so we will have to wait until the next one begins. Then I will dig into all the running processes. I will hang tight for about 5 days
Edited by SJ77
  • 2 weeks later...
Posted

OK, I am back in the thick of my high server load cycle, which seems to happen every 5 days, as indicated by the high disk read patterns.

I used iotop to isolate these thread IDs as being the issue. How can I turn this into actual information?

I want to know exactly what these processes are doing so I can address them accordingly.

Having an arbitrary TID doesn't actually tell me much. Is there some way to investigate more from here?

Thank you in advance.

[Attached image: iotop output]

Posted

You keep focusing on the activities that show disk reads/writes. You’re not showing anything that indicates how many connections are established, what memory is being consumed, what CPU is being consumed, etc.

If the system is out of memory and is swapping, it would make 100 percent sense that the disk is thrashing.
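A quick way to test the swapping theory is to read the swap figures straight from /proc/meminfo. This is a minimal sketch and the helper name swap_used_kb is made up here. It only gives a snapshot, so the vmstat hint in the comments is the better check for ongoing swap traffic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print swap currently in use, in kB.
# Reads /proc/meminfo-formatted text on stdin.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }'
}

# Usage:
#   swap_used_kb < /proc/meminfo
# For swap activity over time rather than a snapshot, watch the si/so
# columns of:
#   vmstat 5
```

If swap use is near zero and si/so stay at 0 during a spike, the disk reads are coming from somewhere other than swapping.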

Posted
11 minutes ago, SJ77 said:

I used iotop to isolate these thread IDs as being the issue. How can I turn this into actual information?

 

ps aux

 

This will list the process IDs along with the command associated with each one.
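One caveat: iotop reports thread IDs by default, and a TID only appears in plain ps aux output when it happens to be the main thread. On Linux the owning process can be looked up through /proc. The sketch below uses made-up helper names (tid_to_pid, pid_cmdline) and a made-up TID in the usage comment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on Linux /proc (names are made up for this sketch).

# Print the PID (thread group ID) of the process that owns a given TID.
tid_to_pid() {
    awk '/^Tgid:/ { print $2 }' "/proc/$1/status"
}

# Print the full command line of a PID; arguments are NUL-separated in /proc.
pid_cmdline() {
    tr '\0' ' ' < "/proc/$1/cmdline"
    echo
}

# Usage, with 12345 as a made-up TID taken from iotop:
#   pid=$(tid_to_pid 12345)
#   pid_cmdline "$pid"
#   ps -L -o pid,tid,pcpu,comm -p "$pid"   # list every thread of that process
```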

Posted
16 minutes ago, Muddy Boots said:

 

ps aux

 

This will list the process IDs along with the command associated with each one.

But I already have the command showing in iotop. Can I find anything more specific than "nginx worker process"?

23 minutes ago, Randy Calvert said:

You keep focusing on the activities that show disk reads/writes. You’re not showing anything that indicates how many connections are established, what memory is being consumed, what CPU is being consumed, etc.

If the system is out of memory and is swapping, it would make 100 percent sense that the disk is thrashing.

Because I have high load that correlates with this strange pattern of high disk READ I/O.

[Attached image: graph of disk read I/O]

5 days it's good then 3 days it's bad... then repeats.

I am trying to find out what is driving this bizarre pattern, because then I think I can stop the high load.

Posted
10 minutes ago, SJ77 said:

But I already have the command showing in iotop. Can I find anything more specific than "nginx worker process"?

Try ps aux

You could also use this over SSH:

glances

[Attached screenshot: glances]

 

If it's not installed, then:

yum install glances

 

Posted
1 minute ago, Muddy Boots said:

Also use this

Put the process ID in this command in place of PIDNAME and it will return the name of the process:

ps -p PIDNAME -o comm=

example:
ps -p 1572 -o comm=

 

It shows "nginx worker process". Unfortunately, that could be many things.

Posted
8 hours ago, Muddy Boots said:

That's strange. Are you logged in as root over SSH?

Try

ps ax | grep -E "^ *PIDNAME "

Example:

ps ax | grep -E "^ *1572 "

 

It returns this

[Attached image: command output]

Which of course doesn't mean anything to me.

I am trying to translate this into exactly what is going on, so that I can take action. Surely there is a way to see what this process is actually doing?

Posted

This will give you details of the worker process's memory usage:

pmap -x PIDNAME

For your last one, the PID would be 8789, if that's still high at 17%.
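Beyond the memory map, one way to see what a worker is actually touching is to look at its open file descriptors and its I/O counters under /proc. This is a rough sketch with made-up helper names (open_files, io_counters). Where they are installed, lsof -p and strace -p give similar and more detailed views.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on Linux /proc (names are made up for this sketch).

# List the files, sockets and pipes a process currently has open.
open_files() {
    local fd target
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd") || continue
        printf '%s -> %s\n' "${fd##*/}" "$target"
    done
}

# Show cumulative bytes the process has read from / written to storage.
io_counters() {
    grep -E '^(read_bytes|write_bytes):' "/proc/$1/io"
}

# Usage (as root, with PIDNAME replaced by the worker's PID):
#   open_files PIDNAME
#   io_counters PIDNAME
```

Running io_counters twice a minute apart shows how fast read_bytes is growing, and the file paths from open_files hint at which content is being read.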

What OpenSSL version are you on?

openssl version

 
