SJ77 Posted February 15, 2022
As the topic title asks, why is it so difficult to troubleshoot the root causes of high server load? My server load has been sitting at around 15 and spiking all the way up to 50, 60, or even 80, then coming back down after a while. It's really difficult to troubleshoot. There must be a better way to isolate the issue. Any tips? Thank you.
CoffeeCake Posted February 15, 2022
How are you troubleshooting today? What's the way you're looking to improve?
SJ77 (Author) Posted February 15, 2022
15 minutes ago, CoffeeCake said:
How are you troubleshooting today? What's the way you're looking to improve?

Using top.
Randy Calvert Posted February 15, 2022
Top is a good start. What is the biggest offender? (I’m guessing it’s either Apache or Mysql) Is it CPU or memory constrained? Have you looked at the server connections? (netstat -plant) Have you looked at traffic logs and done any sort of correlation of traffic to load?
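To put a number on the connection question above, the `netstat -plant` output can be tallied by TCP state. This is a minimal sketch rather than a definitive tool: the function name `count_tcp_states` is hypothetical, and it assumes the usual Linux net-tools layout, where the state is the sixth column of each `tcp`/`tcp6` row.

```shell
#!/bin/sh
# count_tcp_states (hypothetical helper): read netstat-style output on stdin
# and print "<count> <STATE>" lines, most common state first.
# Assumption: column 1 is the protocol and column 6 is the TCP state,
# as in `netstat -plant` on Linux. Header lines are skipped because their
# first field does not start with "tcp".
count_tcp_states() {
    awk '$1 ~ /^tcp/ { seen[$6]++ }
         END { for (s in seen) print seen[s], s }' | sort -rn
}

# Usage (run as root so -p can show every owning process):
#   netstat -plant | count_tcp_states
```

Running this once during a quiet period and once during a spike gives two tallies to compare, which is one rough way to correlate traffic with load.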
IveLeft... Posted February 16, 2022
There is also htop, which is easier to read, has a better interface, and does everything top does; iotop, for input/output on a per-process basis; and mytop, for MySQL. As stated above, how are you using top? There are specific command-line options to filter things.

11 hours ago, Randy Calvert said:
(I’m guessing it’s either Apache or Mysql)

Not NGINX? 🤣

There is also a good read here over at the cPanel forums, with a bash script.

Edited February 16, 2022 by Muddy Boots
SJ77 (Author) Posted February 16, 2022
Hi, thank you both. I also have htop.

I have a nixstat subscription, and I was going through every single metric to see if anything jumped out at me. I found one really weird thing: every 5 days there is a big (HUGE) spike in I/O reads that lasts 3 days, even though writes and TPS don't change.

I'm not sure whether this is correlated with the high load, as the patterns don't match exactly, but it seems suspicious.

Edited February 16, 2022 by SJ77
IveLeft... Posted February 16, 2022
The fact that it lasts 3 days gives you plenty of time to track it down. Is there anything related to disk usage, like backups?
Randy Calvert Posted February 16, 2022
This almost looks like some sort of automated activity. It seems to be spaced out like clockwork.
- Is this a dedicated or shared server? (If it’s a VPS, are we sure there is not something stealing CPU and causing a lack of resources?)
- Is there anything else that runs on the server outside of IPB?
- When these spikes happen… we can absolutely see the disk IO increase, but this does not describe what’s causing it. What processes are consuming the most disk/CPU/RAM?
SJ77 (Author) Posted February 16, 2022
I agree, 3 days will give me time to track it down. I don't think it's backup related. I have daily backups.

2 hours ago, Randy Calvert said:
This almost looks like some sort of automated activity. It seems to be spaced out like clockwork.
- Is this a dedicated or shared server? (If it’s a VPS, are we sure there is not something stealing CPU and causing a lack of resources?)
- Is there anything else that runs on the server outside of IPB?
- When these spikes happen… we can absolutely see the disk IO increase, but this does not describe what’s causing it. What processes are consuming the most disk/CPU/RAM?

Dedicated. Nothing but IPB runs on this machine. Unfortunately, I only noticed the spikes after they had stopped, so we will have to wait until the next one begins, and then I will dig into all the running processes. I will hang tight for about 5 days.

Edited February 16, 2022 by SJ77
IveLeft... Posted February 17, 2022
You should still be able to look at the logs, since you know when the spikes occurred. Try the sar command. What's the memory usage like when this happens?
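Since the spike dates are already known, the sysstat archives can be queried after the fact. The sketch below is only illustrative: `sa_file_for_day` and `sar_history` are hypothetical names, and the `/var/log/sa/saDD` location is an assumption based on RHEL/CentOS defaults (Debian-family systems typically keep these files under `/var/log/sysstat`). It also assumes sysstat data collection was already enabled when the spike happened.

```shell
#!/bin/sh
# Directory holding the daily sysstat archives (assumed RHEL/CentOS default).
SA_DIR="${SA_DIR:-/var/log/sa}"

# sa_file_for_day (hypothetical helper): print the archive path for a day of
# the month, zero-padded to two digits, e.g. 7 -> /var/log/sa/sa07.
sa_file_for_day() {
    day="${1#0}"    # strip one leading zero so printf does not read 08/09 as octal
    printf '%s/sa%02d\n' "$SA_DIR" "$day"
}

# sar_history (hypothetical helper): show load, memory, swapping and I/O
# figures recorded for one day of the month.
sar_history() {
    file=$(sa_file_for_day "$1")
    [ -r "$file" ] || { echo "no sysstat archive at $file" >&2; return 1; }
    sar -q -f "$file"   # run queue length and load averages
    sar -r -f "$file"   # memory utilisation
    sar -W -f "$file"   # pages swapped in and out per second
    sar -b -f "$file"   # I/O and transfer rate statistics
}

# Usage: sar_history 16    # look back at the 16th
```

Comparing a day inside the 3-day read spike with a day outside it should show whether load, memory pressure, and swapping move together with the reads.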
SJ77 (Author) Posted February 27, 2022
OK, I am back in the thick of my high server load cycle, which seems to happen every 5 days, as indicated by the high disk read patterns.

I used iotop to isolate these thread IDs as being the issue. How can I turn this into actual information? I want to know exactly what these processes are doing so I can address it accordingly. Having an arbitrary TID doesn't actually tell me much. Is there some way to investigate further from here?

Thank you in advance.
Randy Calvert Posted February 27, 2022
You keep focusing on the activities that are showing disk reads/writes. You’re not showing anything that indicates how many connections are established, what memory is being consumed, what CPU is being consumed, etc. If the system is out of memory and is swapping, it would make 100 percent sense that the disk is thrashing.
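One quick way to test the swapping theory is to read `/proc/meminfo` during a spike. This is a minimal sketch, assuming a Linux host; the function name `swap_used_kb` is hypothetical, and the optional file argument exists only so the parsing can be tried against a saved copy.

```shell
#!/bin/sh
# swap_used_kb (hypothetical helper): print the swap currently in use, in kB,
# computed as SwapTotal - SwapFree from a meminfo-format file.
# With no argument it reads the live /proc/meminfo.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free  = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

# Usage:
#   swap_used_kb    # a value that keeps growing during a spike points at swapping
#   vmstat 5        # non-zero si/so columns mean pages are moving to and from swap
```

If swap use stays flat while the read spike is running, memory pressure becomes a less likely explanation for the disk activity.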
IveLeft... Posted February 27, 2022
11 minutes ago, SJ77 said:
I used iotop to isolate these thread IDs as being the issue. How can I turn this into actual information?

ps aux will list the process IDs with the command associated with each.
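One wrinkle here: by default iotop lists thread IDs (TIDs) rather than process IDs, so a TID may not appear in plain `ps aux` output. The sketch below, assuming Linux with `/proc` mounted, looks up the owning process first; `tid_to_pid` and `describe_tid` are hypothetical helper names.

```shell
#!/bin/sh
# tid_to_pid (hypothetical helper): print the process ID (thread group ID)
# that owns a given thread ID, read from the Tgid field in /proc.
tid_to_pid() {
    awk '/^Tgid:/ { print $2 }' "/proc/$1/status"
}

# describe_tid (hypothetical helper): show the PID, owner and full command
# line of the process behind a thread ID.
describe_tid() {
    ps -p "$(tid_to_pid "$1")" -o pid,user,args
}

# Usage:
#   describe_tid 1572    # 1572 stands in for an ID copied from iotop
#   iotop -P             # or have iotop show processes instead of threads
```

For a single-threaded process such as a typical nginx worker, the TID and PID are the same number, so the lookup simply confirms it.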
SJ77 (Author) Posted February 27, 2022
16 minutes ago, Muddy Boots said:
ps aux will list the process IDs with the command associated with each.

But I have the command showing in iostat. Can I find anything more specific beyond "nginx worker process"?

23 minutes ago, Randy Calvert said:
You keep focusing on the activities that are showing disk reads/writes. You’re not showing anything that indicates how many connections are established, what memory is being consumed, what CPU is being consumed, etc. If the system is out of memory and is swapping, it would make 100 percent sense that the disk is thrashing.

Because I have high load that correlates with this strange pattern of high disk read I/O: 5 days it's good, then 3 days it's bad, then it repeats. I am trying to find out what is driving this bizarre pattern, because then I think I can stop the high load.
IveLeft... Posted February 27, 2022
10 minutes ago, SJ77 said:
But I have the command showing in iostat. Can I find anything more specific beyond "nginx worker process"?

Try
ps aux

You could also use glances over ssh. If it's not already installed:
yum install glances
IveLeft... Posted February 28, 2022
Also use this. Put the process ID number in this command and it will return the name of the process:
ps -p PIDNAME -o comm=
Example:
ps -p 1572 -o comm=
SJ77 (Author) Posted February 28, 2022
1 minute ago, Muddy Boots said:
Also use this. Put the process ID number in this command and it will return the name of the process:
ps -p PIDNAME -o comm=
Example:
ps -p 1572 -o comm=

It shows nginx worker process. That could be many things, unfortunately.
IveLeft... Posted February 28, 2022
7 hours ago, SJ77 said:
It shows nginx worker process. That could be many things, unfortunately.

That's strange. Are you logged in as root in ssh? Try:
ps ax|egrep "^ PIDNAME"
Example:
ps ax|egrep "^ 1572"
SJ77 (Author) Posted February 28, 2022
8 hours ago, Muddy Boots said:
That's strange. Are you logged in as root in ssh? Try:
ps ax|egrep "^ PIDNAME"
Example:
ps ax|egrep "^ 1572"

It returns this, which of course doesn't mean anything to me. I am trying to translate this into exactly what is going on, so that I can take action. Surely there is a way to see exactly what this process is doing.
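For an nginx worker the process name says little, but the files and sockets it holds open say more. This is a minimal sketch under assumptions: Linux with `/proc` mounted, run as root or as the process owner; `open_files` is a hypothetical helper name, and 1572 is just the example ID from this thread.

```shell
#!/bin/sh
# open_files (hypothetical helper): print the target of every file descriptor
# a process has open (regular files, sockets, pipes), one per line.
open_files() {
    for fd in /proc/"$1"/fd/*; do
        readlink "$fd"
    done
}

# Usage, with 1572 standing in for the busy worker's PID:
#   open_files 1572 | sort | uniq -c | sort -rn | head
# If they are installed, these tools go further (root is needed to inspect
# another user's process):
#   lsof -p 1572         # open files with type and path
#   strace -f -p 1572    # live system calls; stop with Ctrl-C
```

The file paths that show up repeatedly while reads are high are a reasonable first hint at what the worker is actually serving or scanning.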
IveLeft... Posted February 28, 2022
@SJ77 What's the full output of
ps -aux |grep nginx
SJ77 (Author) Posted February 28, 2022
5 minutes ago, Muddy Boots said:
@SJ77 What's the full output of
ps -aux |grep nginx
IveLeft... Posted February 28, 2022
This will give you details of the worker process memory usage:
pmap -x PIDNAME
Your last one would be 8789 for the PID name, if that's still high at 17%.

What openssl version are you on?
openssl version
IveLeft... Posted February 28, 2022
Have you got proxy_buffering set to on? Are you using mod_security?