How A Downtime Alert Saved My Online Infrastructure

It's Easter break, and in Sweden, we get both Friday and Monday off, so I decided to take a short trip and spend this time at a friend's place in the middle of Sweden. As I write this post, I am surrounded by nature, calmness, silence and relaxed vibes, something that can be hard to attain in the busy streets of Stockholm.

I avoided electronics as much as I could this weekend, but a couple of text messages sent me into panic mode and forced me to sit down and open my personal laptop. Here is the story of what happened.

I use Fathom Analytics for website analytics and also for uptime monitoring. I didn't think much about the monitoring feature when I set it up, but its notifications saved me from a lot of misery, hassle and trouble.

My friend and I wanted to do some snowmobiling today. We woke up a bit late but decided to go through with our plans anyhow. We loaded the snowmobile, layered our clothes and started driving to a nearby location where the snow was supposed to be perfect for snowmobiles. Unfortunately, we didn't find the amount and quality of snow we were looking for, and given that it's the Easter break, there were plenty of people in many parts of the location, so we decided to drive around and enjoy the views while chatting about random stuff.

We drove around, came upon some breathtaking views, and even stopped at a lovely shop where I bought a remarkably beautiful scented candle. Eventually, we made it back to this place, and as we were disembarking, I noticed three text messages from Fathom.

Despite being annoyed that my blog was offline, I wasn't distressed (at least not at first) for a couple of reasons. First, I run this blog on a rather cheap server, so I expect some downtime now and then, and wouldn't lose any sleep over it. Second, I have daily backups of the entire site, so even if the server burns down (fingers crossed that never happens), I can restore everything within a couple of hours max. With that in mind, I opened my laptop and started looking into what was going on, and it was way worse than I expected.

Did I Run Out Of Money?

My first thought was that a payment had failed and my hosting provider had shut down the server due to the missing payment, so I logged in to my provider and checked the status of the server, and to my delight, it was on and working as expected.

Was My Server Assigned A New Public IP?

With that out of the way, my second guess was that my server had rebooted and got a new public IP, which no longer matched the one I configured in my DNS records. I checked the public DNS A records for my domain and voilà, they were showing a different IP from the one assigned to my server. At this point, I didn't think much of it. I just logged in to my domain registrar and looked for that particular domain (I have a handful of domains, we all do), and that's when shit hit the fan.
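For anyone who would rather script that comparison than eyeball it, here is a minimal Python sketch. It only uses the standard library's `socket.gethostbyname_ex`; the function names and the `resolver` parameter are made up for illustration, and any hostname or IP you pass in is your own. It asks the local resolver, so cached answers can lag behind what the authoritative servers say.

```python
import socket
from typing import Callable, List


def resolve_a_records(hostname: str) -> List[str]:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def dns_matches_server(
    hostname: str,
    expected_ip: str,
    resolver: Callable[[str], List[str]] = resolve_a_records,
) -> bool:
    """Return True if expected_ip is among the addresses hostname resolves to.

    `resolver` is a hypothetical hook so the check can be tried without
    touching the network; by default it performs a real DNS lookup.
    """
    return expected_ip in resolver(hostname)
```

Calling `dns_matches_server("your-domain.example", "203.0.113.10")` with your real domain and server IP returns `False` when the records point somewhere else, which is what I was seeing here.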

It Wasn't Technical, It Was Me

The status column for the domain on which this blog is hosted said Grace, and that was enough to take me from chill to panic within a second or two. Apparently, I forgot to renew my domain registration on time, so it expired and went into the Grace state. Part of that state is suspending the active DNS records, so the IP showing in the public DNS A records wasn't my server's old IP, but rather one assigned by my domain registrar. Splendid.

3.1. With the exception of sponsored gTLD registries, all gTLD registries must offer a Redemption Grace Period ("RGP") of 30 days immediately following the deletion of a registration, during which time the deleted registration may be restored at the request of the RAE by the registrar that deleted it. Registrations deleted during a registry's add-grace period, if applicable, should not be subject to the RGP. (Source: ICANN)

The fact that my domain registration had expired and my DNS records were no longer active meant this site went offline and, more importantly, other services that matter more to me also stopped working. This was simply a nightmare, and if it weren't for the downtime alert I got from Fathom Analytics, I might not have remembered to renew my domain, and that would have been really terrible.

Crisis Averted

Thankfully, the domain was still in the grace period, so I promptly renewed it and within a few minutes, this site went back online, and other services followed shortly after. With that out of the way, I dug into why I missed the renewal notices, and it turns out I had used an email address that I don't check as frequently nowadays. So I updated my account to an email address that I monitor closely, and added a renewal reminder to my To-Do app.
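If a To-Do reminder feels too easy to ignore, the same idea can be sketched in a few lines of Python. This is only an illustration: the function names and the 30-day threshold are assumptions, and you have to supply the expiry date yourself from your registrar's dashboard.

```python
from datetime import date


def days_until_expiry(expiry: date, today: date) -> int:
    """Number of days from today until expiry (negative if already expired)."""
    return (expiry - today).days


def needs_renewal_reminder(expiry: date, today: date, warn_days: int = 30) -> bool:
    """Return True once the domain expires within warn_days, or has expired.

    warn_days=30 is an arbitrary default chosen for this sketch.
    """
    return days_until_expiry(expiry, today) <= warn_days
```

Running something like `needs_renewal_reminder(expiry, date.today())` from a daily scheduled job, and sending yourself a message when it returns `True`, would be one way to wire it up.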

Lessons Learned

I learned a few valuable lessons from this experience. Here they are, in no particular order:

  • Set up uptime monitoring for your services. It might save you in ways you don't expect.
  • Create a reminder to renew your important domains (even better, set them to auto-renew).
  • Have working backups of your critical services and sites.
  • Enjoy your weekends away from computers and electronics.
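
If you don't have a monitoring service yet, the first lesson can be approximated with a tiny Python check. To be clear, this is not how Fathom does it (I don't know its internals). It is a bare-bones sketch using the standard library's `urllib.request.urlopen`, and the `is_up` name, the `opener` parameter and the 10-second timeout are all assumptions made for the example.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_up(
    url: str,
    timeout: float = 10.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Return True if url answers with a 2xx or 3xx status within timeout.

    `opener` is a hypothetical hook so the check can be exercised without
    a real request; by default it is urllib.request.urlopen, which raises
    HTTPError for 4xx/5xx responses and URLError for connection failures.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # DNS failures, refused connections, timeouts and HTTP errors land here.
        return False
```

A check like this only tells you that something answered. A registrar's placeholder page can answer too, so comparing the response against content you expect is a sensible next step.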

Now I must go back to enjoying the few remaining hours I have in this calm and peaceful place. I hope you learn something from this story and don't repeat my mistake. Enjoy your break, and renew your domains.