
Too many connections! (Going the load balanced route maybe)



Hi All,

So, several times over the past few weeks I have been hit by something. Once or twice it was bots at inopportune times; other times, like tonight, it looked like something out of Russia.

It doesn't seem to be a proper DDoS attack, but there are a lot of connections. Add that to the 700-800 users on the boards and I run out of PHP processes really quickly.

On my current AWS EC2 instance (High-Memory XL: 33GB memory, 15 cores) I can maintain 200 connections before it goes to hell. I've played with the config and hammered the server using loader.io. I've tried Varnish in front of NGINX and tried going back to Apache, but I can't get beyond 200 connections.

Note this is 200 concurrent connections - not 200 users browsing. The equivalent of 200 people clicking a link at the exact same time.

So, I'm considering dropping my DB and Sphinx onto their own instance and also exporting an NFS share from it, then having two load-balanced web instances in front. I can then use the AWS load balancer to spin up new instances as I need them, and it can also maintain sticky sessions for users.
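For the NFS piece, I'm picturing something like the mount below on each front-end box (just a sketch; the hostname and paths are placeholders, not my actual layout):

    # /etc/fstab on each web instance - mount the shared uploads/ exported by the DB/NFS box
    # (db-internal and both paths are hypothetical placeholders)
    db-internal:/export/uploads   /var/www/forum/uploads   nfs   defaults,noatime   0 0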

So, my questions are:

Can anybody see anything wrong with this?

Am I missing something?

Besides uploads/, what else needs to be persistent across all HTTP servers? (Note: with session stickiness on AWS, user A will always go to the same box as long as they are logged in.)

Thanks guys, looking forward to some suggestions here. This has been doing my head in for weeks.

Christian



Sure sounds like something isn't configured correctly. 15 cores and 33GB memory should be more than enough for 200 concurrent connections. To put this into perspective, a Raspberry Pi can dish out 200 concurrent connections.

Are you hitting your max_connections in MySQL?

Anything in the nginx or MySQL error logs?

What are your values for worker_processes and max_clients in your nginx.conf file?
What are you using for PHP? FastCGI, CGI, FPM?
Can you copy and paste your conf files here for MySQL, nginx, and whatever you're using for PHP?
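For the max_connections question, a quick way to check is straight from a mysql prompt (these are just the standard variable/status queries):

    SHOW VARIABLES LIKE 'max_connections';      -- configured ceiling
    SHOW STATUS LIKE 'Max_used_connections';    -- high-water mark since the server last started
    SHOW STATUS LIKE 'Threads_connected';       -- connections open right now

If Max_used_connections is sitting at your max_connections value, you've been hitting the limit.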

I spun up my test server (same specs as prod), increased max_connections in my.cnf to 512 (from 256) and worker_processes from 4 to 10, and I still capped out at 200 connections.
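For reference, the two lines I changed on the test box were just these (same files as listed below):

    # my.cnf
    max_connections = 512

    # nginx.conf
    worker_processes 10;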

These are the production configs (so the numbers above are unchanged):

my.cnf

nginx.conf

php-fpm.conf

www.conf

Please excuse their appearance. They've been hacked so much lately, I got sick of making them look pretty.

Thanks in advance!


Have you thought about moving to a dedicated server? What you currently have is essentially a VPS. No matter how you slice the sales pitch and marketing of it, at the end of the day it's still a VPS, or a part/slice of a server. If it can't keep up, put some meat and potatoes behind it! :)


Have you thought about moving to a dedicated server? What you currently have is essentially a VPS. No matter how you slice the sales pitch and marketing of it, at the end of the day it's still a VPS, or a part/slice of a server. If it can't keep up, put some meat and potatoes behind it! :smile:

Yeah, I moved from a dedicated server to EC2 for various reasons. There are some pretty hefty sites and services using EC2, 'sales pitch' or not, so it really should be able to keep up.

For similar money I can have a single dedicated box or a few load-balanced EC2 instances.


EC2 is roughly 3-10x more expensive than a dedicated server at the hardware level. No matter how you slice or dice it, you're getting a bad deal. I don't know what your reasons are, but as far as I can see, you downgraded.

------------------------------

On to the general support side.

Your numbers and points are completely ambiguous. We can't really help with the kind of information given.

When you say 200 connections... What exactly are you saying?

  • 200 mysql connections?
  • 200 apache connections?
  • 200 apache clients?
  • 200 tcp connections?
  • 200 what?

And "200 people clicking at once" isn't really a helpful description, as it can result in so many different outcomes.

What have you tried with Varnish? It's not something you just install and forget.

Ditto on nginx.

Have you gotten any errors, as someone else asked previously?

---------------------------

On to the configs you showed. These are only the obviously wrong things. Without actual numbers or stats I can't give a final answer; the values I suggest are merely starting points.

my.cnf

These are all problems:

wait_timeout = 3000 - WAY WAY WAY too high. You are literally making things very bad with this. Bad queries need to be killed with haste, not left to wait 3000 seconds. Imagine this: you have a maximum of 256 connections. Let's say you get a bad query that's just not going to end; you wait 3,000 seconds for it. Another bad query comes along, and now you have 2 connections tied up. Give it more time and you have nothing but bad queries tying up all 256 of your connections. Your site is then dead.

interactive_timeout = 50 - way too high for a high-performance production server, though you won't actually use this much...

max_connect_errors = 999999999 - ... Well, this itself isn't a problem... Just that if you found yourself having to raise this, it's a huge signal of other problems.

It's questionable why you set this:

max_allowed_packet=128M - quite high.
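To make those concrete, a my.cnf excerpt along these lines might look something like the following - the specific values are illustrative guesses on my part, not numbers benchmarked for this box:

    [mysqld]
    max_connections     = 256
    wait_timeout        = 30     # guess: kill idle/hung connections quickly instead of after 3000s
    interactive_timeout = 30     # guess: only affects interactive mysql-client sessions
    max_allowed_packet  = 16M    # guess: raise only if you genuinely ship big blobs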

nginx.conf

These are problems:

keepalive_timeout 65; - way too high. Try 3
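In context that's a one-line change in the http block (minimal sketch, nothing else touched):

    http {
        # ...
        keepalive_timeout 3;    # was 65; cut idle keepalive connections loose quickly
        # ...
    }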

php-fpm.conf

Nothing here... other than it pointing at www.conf.

www.conf

pm.max_children = 500 - Unless you have one of those half-million-dollar servers, this number isn't going to work. I'd need better details on the specs (I don't have the same box to get stats from). Try 30 first.

pm.min_spare_servers = 25 - Try 5

pm.max_spare_servers = 50 - Try 10
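Put together, the pool section would start out roughly like this (pm.start_servers is a filler value I picked so it sits between the spare-server bounds):

    [www]
    pm = dynamic
    pm.max_children      = 30   ; starting point suggested above (was 500)
    pm.start_servers     = 8    ; filler value between min and max spare servers
    pm.min_spare_servers = 5
    pm.max_spare_servers = 10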

--------------------------

Post some stats - things like the output from top, etc. See the sticky.

On the general philosophy of management, this is what you have wrong: you see a lacking resource, so you increase maximums and get a bigger box. Sure, that works if you have infinite money. But the better philosophy is that if something is going to fail, it should fail fast, so you can serve the next thing right away. Waiting on a bad process isn't helping anyone; it's just eating resources.


I have to ask - I've read a lot of your posts and they are very good - why are you called Grumpy?

It's a handle I've been using for well over a decade. Frankly, I don't remember why I chose it back then.
And thanks for the compliments.

Anyone who works on servers will be grumpy from time to time! :smile: lol

Now you're talking about going from grumpy to downright pissed off and ready to kill someone! Windows needs to stay on the (desktop) porch.

It's so true, it brings tears to my eyes. ㅠ_ㅠ

  • 2 weeks later...

Ok, so, sorry to semi-abandon this thread.

I did a lot of reading, some soul-searching, and a little bit of meditation. Now I'm on a dedicated dual quad-core Dell with 32GB of memory. It's running much better.

Thanks guys.


I agree with Grumpy: AWS is not cost-effective at the moment. Their prices are extremely high and they are taking advantage of users with that pricing.

However, I expect a huge drop in AWS pricing within the next 12 months.

@Alex still working on LBT?


Even with their pricing, I could not get my board to accept 200 or more incoming connections (using loader.io, to answer your earlier question, Grumpy); it just couldn't handle it. On the new box (similar 'specs': 16 cores, 32GB) I wound loader.io up to 500 connections and it was still fine (I didn't find its 'limit', but I was happy with the result).
