Squarespace experiences a major outage early today

UPDATE: Details of the severe outage that happened last Wednesday have been posted today on the Squarespace Status page. Here are the highlights on the root cause:

At 3:42 AM ET, we concluded that traffic to a particular website or traffic from a particular group of visitors was causing the issue. We successfully made a change in our web application firewall to block traffic to all URL patterns except to the root (home) page for all sites. We ruled out a problem from any visitors and knew it must be a particular URL pattern triggering a bug in our code. By 4:31 AM ET, the home pages of all Squarespace websites were loading properly.

For the next two hours, we methodically searched for the problematic URL pattern by replaying the requests found in our access logs against a subset of the servers in our web cluster. Our engineers continuously updated our web application firewall to allow more traffic to enter our systems. By 6:23 AM ET, we were successfully serving traffic at 95% of our normal level. At 7:48 AM ET, we identified and isolated the problematic URLs and all systems were back to normal.

Upon further investigation of the problematic URL, a certain data payload existing on a single high-traffic site was triggering an unexplored code path within our system. Whenever one of the servers in our web cluster received a request for that site, memory consumption would skyrocket immediately and the operating system would kill the underlying process. This event happened so quickly that no logging of the offending URL occurred, and the crash reports from the affected servers became corrupted, severely impeding our ability to determine the trigger for this issue.
— John Colton, Squarespace, VP Engineering

A code fix has already been deployed to address the problem. This was a serious outage, and hopefully it doesn't happen again anytime soon.
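
To get a feel for the search described in the quote, here is a rough sketch of how a single bad URL pattern could be narrowed down by replaying batches of logged requests and halving the batch each time. This is only my illustration, not Squarespace's actual tooling: `find_bad_pattern`, `fake_replay`, and the sample URLs are all made-up names, and the sketch assumes exactly one pattern triggers the crash.

```python
from typing import Callable, Optional, Sequence


def find_bad_pattern(
    patterns: Sequence[str],
    crashes: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Bisect `patterns` to find the one pattern that makes `crashes` return True.

    Assumes at most one culprit. Returns None if no batch crashes.
    """
    if not patterns or not crashes(patterns):
        return None
    lo, hi = 0, len(patterns)  # invariant: the culprit is in patterns[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(patterns[lo:mid]):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # culprit must be in the second half
    return patterns[lo]


# Hypothetical stand-in for replaying logged requests against an isolated
# test server and checking whether its process got killed.
def fake_replay(batch: Sequence[str]) -> bool:
    return "/blog/some-post" in batch


patterns = ["/", "/about", "/blog/some-post", "/store", "/contact"]
print(find_bad_pattern(patterns, fake_replay))
```

With thousands of URL patterns, halving like this needs only a handful of replays per culprit, which may be part of why traffic could be let back in gradually while the search was still going on.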

Source: Squarespace Status


As I was writing a post at noon today, local time here in Manila, I noticed something weird in Squarespace. I was uploading an image to my post but it would not load. Then I could not save the post I was writing (a common problem I encounter on the platform). I tried accessing the site's home page, and that's when I realized something was wrong. I was not able to save a screenshot of the message it returned, but I did verify from status.squarespace.com that there was definitely a problem. Their @SquarespaceHelp account on Twitter also started tweeting about it.

This was my first time encountering a major outage. I never ran into one when I was still writing on Blogger. Working in a software company, I knew that this was a major production environment problem, and I knew as well that their Operations team was working to fix the issue as quickly as possible. This is a P1 issue! The outage started at 1:22 am today Eastern time and lasted for more than 6 hours. I was monitoring their status page, which refreshes automatically whenever there are updates from Squarespace. I ended work at around 6:22 am, and the outage was still not resolved at that point. I subscribed to their SMS notifications to be updated on the outage and on when it gets resolved.

I checked online and there are a lot of people disappointed by the outage. It's an understandable reaction, especially if your site handles a lot of transactions. I'll check in the coming days for the details of the outage. I hope it's not something serious on the part of Squarespace.

Here are screenshots of the updates I captured from their status page:

[Screenshot: Squarespace status update, "Continued connectivity issues"]
[Screenshot: Squarespace status update, "Connectivity issues"]