Everything posted by Matt

  1. It's a cloud site, so he won't be able to do that. 😄 I'll take a look. It shouldn't be using datastore, it should be using Redis.
  2. I'll make a note to see if we can handle that more gracefully, but it looks like the user session that stores the flag marking a theme editing session as in progress simply timed out.
  3. .. is already fixed, but I posted this bug as I know others will notice. 😅
  4. Matt

    5.0.0 Alpha 16

    Changed Long ID to 5000025
  5. Matt

    5.0.0 Alpha 15

    Changed Current Release to No
  6. This is the latest version of Invision Community 5.
  7. Matt

    5.0.0 Alpha 14

    Changed Current Release to No
  8. This is the latest version of Invision Community 5.
  9. We had to deploy some new nodes, and it can take a few minutes for them to settle. Actual downtime was measured in seconds; I've been watching the response-time graph obsessively. 😅
  10. Thanks Mike. It's never nice when these issues arise, but we are responsive to them and work hard to resolve them. 99.9% of the time it's smooth sailing, but these 0.1% events sure are memorable. 😄
  11. Thank you all for your patience. I have approved every single post that was made, both good and bad, and will hopefully address some of the points raised and the issues from the past 2-3 days. It's been a very busy few days, but here's a brief rundown of events.

     We were alerted to short (5-8 minute) bursts of sharply increased response times starting on the 21st. These bursts didn't last long, which made them hard to track down. Our Cloud platform is made up of several components, each of which could cause latency issues. We have a WAF to filter traffic, a CDN to create short-term caches for guest traffic, ElastiCache (in a read/write cluster), MySQL database clusters (multiple read/write) and then the processing layer where PHP lives.

     The bursts between 9/21 and 9/23 only affected about 15% of our customers due to how the database clusters are segregated, but they coincided with increased uptake of 4.7.18. One of the main changes in 4.7.18 was to how often the write MySQL servers are used. The write servers are really good at writing (insert/delete/update) but less useful for complex select queries. One downside of using read/write separation is replication delay: you can insert a record on the write server, and it then has to be copied across to the read servers. So when we recount the last post in a topic and forum, recount the number of comments, etc., we run that select query on the write server so we know it has the latest data. This is fine, but it puts a heavy load on the write servers. So in .18 we removed those select queries from the write server and added a task to recount again every five minutes or so, just in case there are any odd issues from race conditions on busy sites (and we have some super busy sites - one currently has 36,000 active sessions).

     After a lot of debugging, we tracked the issue down to the use of ElastiCache to manage the locking flags when recounting. Busy sites couldn't lock fast enough with ElastiCache, as there is a very tiny window of replication lag. So instead of the expensive recount query running just once, it would run 3, 5, or 10 times before the lock was created. Multiply this across all sites and it increased load at the database level due to InnoDB locking and unlocking rows. We tried several interventions which seemed to work, but then randomly stopped working a day later. This is very frustrating for us, and very frustrating for you.

     Yesterday, we found a solution via a hotfix deployed to all 4.7.18 sites on our platform to use database locking (there's a rough sketch of that approach after this list). It drives up database I/O a little, but not enough to cause concern, and we have already rewritten this recounting feature for .19 to use a task which has more robust locking and is proven to avoid race conditions.

     Yesterday we did see random latency issues that affected most sites on and off between 10am and 12pm EST, with peaks around 10am and 11:30am. This is in the ElastiCache layer, which we're still working on, although we have made configuration changes to make it stable. It has taken a few days to get to the bottom of and has involved multiple members of our engineering team and some long days, so I thank you for your patience. These bursts happened so quickly (in relative terms; I know it feels like forever when you experience it on your own site) that our external status monitoring doesn't pick them up, but rest assured, our internal monitoring does. It's very loud and impossible to ignore. 😄

     I'll address some of the comments:

     As mentioned above, these bursts are over before our external status monitoring picks them up, and they often don't affect all sites due to the way the MySQL clusters are set up.

     Thanks, we had resolved most of the issues last night. We were running some very short-term ElastiCache configuration tests. We were monitoring the response times but needed a few minutes to gather some data. This lasted about 8 minutes in total.

     Again, thank you for your patience. I know it can seem like nothing is happening, but we have strong internal monitoring and have been focused on resolving these latency issues. A large, complex platform like ours can be quite organic and tough to diagnose, as GitLab found out when they experienced similar random latency issues.

     I can only apologise. The past few days are not indicative of our normal service. I'll reply to your ticket in more detail.
  12. Matt

    5.0.0 Alpha 13

    Changed Current Release to No
  13. This is the latest release of Invision Community 5.
  14. This is the latest release of Invision Community 5.
  15. Matt

    5.0.0 Alpha 12

    Changed Current Release to No
  16. A little update on development. Our public preview of the v5 alpha has been very busy, with lots of suggestions and bugs being posted. We released Alpha 12 last week, and you can check out all the bugs fixed in our release notes. We've also tested upgrading a clone of this community and a private site we use with custom apps built on v4, and we've created a new custom site with multiple requirements; those exercises have been informative. The next step is the public beta, and we're very close to that, so I wouldn't expect it to be far out at all. Thanks for all your patience. Although v5 is currently in alpha, it has proven incredibly stable already, and we're looking forward to the next step!
  17. Between v4 and v5? Or between alpha and beta? Unfortunately not, themes and apps made for v4 will need refactoring to work with v5.
  18. I'll add it to today's call list and get back to you later.
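For anyone curious how the database locking mentioned in post 11 sidesteps the cache replication window, here is a minimal sketch in Python. It is not Invision Community's actual code: the recount_topic() helper, the connection details and the lock naming are hypothetical stand-ins, and the only real mechanism it relies on is MySQL's named locks (GET_LOCK/RELEASE_LOCK), which are held on the database server itself.

import mysql.connector


def recount_topic(conn, topic_id):
    """Hypothetical placeholder for the expensive recount work
    (last post, comment totals, etc.)."""
    pass


def recount_with_db_lock(conn, topic_id):
    """Run the recount only if we can take the per-topic named lock right now.

    GET_LOCK() returns 1 when the lock is acquired, 0 on timeout and NULL on
    error, so anything other than 1 means another worker already holds it and
    we skip the duplicate recount; a periodic task can catch any stragglers.
    """
    lock_name = f"recount_topic_{topic_id}"
    cur = conn.cursor()
    try:
        # Timeout of 0: don't wait. The lock lives on the MySQL server itself,
        # so there is no replication window in which two workers can both
        # believe they acquired it.
        cur.execute("SELECT GET_LOCK(%s, 0)", (lock_name,))
        (acquired,) = cur.fetchone()
        if acquired != 1:
            return False  # another worker is recounting this topic right now

        try:
            recount_topic(conn, topic_id)
            return True
        finally:
            cur.execute("SELECT RELEASE_LOCK(%s)", (lock_name,))
            cur.fetchone()
    finally:
        cur.close()


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    conn = mysql.connector.connect(
        host="127.0.0.1", user="forum", password="secret", database="forum"
    )
    recount_with_db_lock(conn, topic_id=12345)
    conn.close()

The key difference from a cache-based flag is that the lock and the rows being recounted live on the same write server, so the lock is authoritative the instant it is granted; a flag written to a cache cluster still has to replicate, and in that window a second worker can also believe it holds the lock and fire off a duplicate recount.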