Posted on November 22, 2013 by

It’s All Going to Fail

So just get used to it. The Internet, I mean. If you use it frequently, you probably already realize it breaks kind of a lot. But if you don’t use the Internet a lot, like all the people who use my software, you think that it always works. So when something bad happens, it must be because the website is broken. It could not possibly be poor connectivity or user error.

When I have user problems, I get reports like

“The website doesn’t work”

“I tried uploading when I signed up, but it didn’t work”

“My account doesn’t work”

All of these are totally useless. But those are the reports I get. So, really, it’s up to me to make a better application that users don’t have problems with. Then I won’t get these error reports.

Your JavaScript Isn’t Going to Load

I have a rock solid web app design. I minify my JavaScript and CSS. I deploy my assets to S3 to reduce requests to my server. I use CloudFront as a CDN to make sure my users have great download speeds no matter where they are. What could possibly go wrong? Well, for starters, sometimes the CDN just doesn’t load my JavaScript. Not very often at all, mind you. But if I have even a moderate number of requests coming through the app, then every couple of days a page load will fail.

If my site isn’t built right (and when is it ever?), then parts of the webpage just don’t work. Sometimes things fail silently. Sometimes the page doesn’t load correctly. While most people would refresh the page, my users say “The site stopped working, so I canceled”.

RequireJS, YepNopeJS, and HeadJS are all tools to help deal with this. I’ve been using HeadJS recently because I don’t have to rewrite much code, like you do with RequireJS.

<head>
<script type="text/javascript">
/* minified headjs code */
</script>
<script type="text/javascript">
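    // Fallback locations, tried from the last entry backwards.
    // Index 0 is a placeholder that is never loaded: reaching it means every CDN failed.
    // The leading // makes these protocol-relative; without it they resolve as relative paths.
    var scriptLocations = [
        '',
        '//cdn3.domain.com/js/app.min.js',
        '//cdn2.domain.com/js/app.min.js',
        '//cdn1.domain.com/js/app.min.js'
    ];
    // Start at the last valid index; starting at .length would read past the end.
    var loadtries = scriptLocations.length - 1;

    function loadapp(callback) {
        // Load the current fallback, then step down to the next one.
        head.load(scriptLocations[loadtries--], callback);
    }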
 
    function testReady() {
        if (loadtries == 0 ) {
            document.location.reload();  // Fail… what to do if we continually fail
        } else {
            if (loadFailed()) {
                loadapp(testReady);
            } else {
                docready();   // This should call whatever would normally be run first
            }
        }
    }
    head.ready('mainScriptLabel', function() {
        testReady();
   });
</script> </head>

That’s the code I use to attempt multiple loads. There is a commit pending to allow promises. That would be a lot better than this. But for now, this has solved the odd CDN load error.

You Will Have Phantom Errors

I’ve noticed that once I get over a few hundred users on a web application, they start having errors that I can never reproduce. But the errors do actually exist. I’ve recently started using Bugsnag to track them. I chose Bugsnag because it had the easiest setup that worked with my existing codebase. I was able to start receiving live JavaScript failures in minutes.

Now that I am tracking these odd problems, I have to deal with my minified JS code, which is loaded from the CDN. Whenever an error is thrown in a script served from another origin, the browser hides the details unless CORS is set up correctly, and I get a useless Script Error Line 0 message. I’m actually still getting this working in all browsers. Setting up CORS through AWS fixes it for some users, but certainly not everyone.
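For reference, here is a rough sketch of the S3 side of that CORS setup, written as the dict that boto3's put_bucket_cors call accepts. The helper name, the origin, and the bucket name are placeholders I made up for illustration, not my real configuration. As far as I can tell the bucket rule alone is not enough: the script tag also needs a crossorigin="anonymous" attribute, and CloudFront has to forward the Origin header to S3, otherwise browsers keep reporting the generic error.

```python
import json


def build_cors_configuration(allowed_origins):
    """Return an S3 CORS configuration that lets the given origins fetch assets."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(allowed_origins),  # sites that embed the scripts
                "AllowedMethods": ["GET", "HEAD"],        # read-only access is enough for assets
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,                    # how long browsers may cache the preflight
            }
        ]
    }


# "https://www.example.com" is a placeholder origin.
config = build_cors_configuration(["https://www.example.com"])
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied with:
#   boto3.client("s3").put_bucket_cors(Bucket="my-assets-bucket", CORSConfiguration=config)
# ("my-assets-bucket" is a hypothetical bucket name.)
```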

You Will Use Ugly Hacks

Is it just me, or does Heroku’s CloudAMQP Tough Tiger fail all the time? It’s too expensive for all this failure. See, I have problems connecting and uploading to S3 fairly regularly. So I put upload jobs in a queue and return a 202 to my users. However, queueing the job in CloudAMQP routinely fails with a pika.exceptions.AMQPConnectionError: 1. I could just tell the user to try again, but the entire point of this post is that users don’t try again. They say “The website is broken” and give up. So I have to create silly loops to increase the likelihood of the job being queued in CloudAMQP.

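    import pika

    # params, CLOUDAMQP_QUEUE, message_body and logger are defined elsewhere in the app.
    MAX_ATTEMPTS = 3

    queued = False
    for attempt in range(1, MAX_ATTEMPTS + 1):
        connection = None
        try:
            # Open a fresh connection on every attempt; a failed one is not reused.
            connection = pika.BlockingConnection(params)
            channel = connection.channel()
            channel.queue_declare(queue=CLOUDAMQP_QUEUE, durable=True)
            channel.basic_publish(
                exchange='',
                routing_key=CLOUDAMQP_QUEUE,
                body=message_body
            )
            queued = True
            break
        except Exception as e:
            logger.error("Queueing attempt %d of %d failed: %s", attempt, MAX_ATTEMPTS, e)
        finally:
            # Always release the connection, whether or not the publish worked.
            if connection is not None and connection.is_open:
                connection.close()
    # queued is still False here if every attempt failed, so the caller can tell.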
I find this rather ugly. But it seriously reduced upload failures. So it remains.
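If the same trick is needed in more than one place, the loop can be pulled out into a small helper. What follows is only a sketch under my own assumptions: the retry function, its parameters, and the flaky_publish stand-in are all made up for illustration, and the short exponential backoff between attempts is an addition that the loop above does not have.

```python
import logging
import random
import time

logger = logging.getLogger(__name__)


def retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Returns (True, result) on success and (False, last_exception) on failure.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return True, operation()
        except Exception as error:  # deliberately broad, like the loop above
            last_error = error
            logger.warning("attempt %d of %d failed: %s", attempt, attempts, error)
            if attempt < attempts:
                # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 100ms of jitter.
                sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
    return False, last_error


# Stand-in for the real publish call: fails twice, then succeeds.
calls = []


def flaky_publish():
    calls.append(1)
    if len(calls) < 3:
        raise IOError("connection refused")
    return "queued"


# sleep is swapped for a no-op so the demo finishes instantly.
ok, result = retry(flaky_publish, sleep=lambda seconds: None)
print(ok, result)  # prints "True queued"
```

Returning a flag instead of raising keeps the calling code honest: it can still answer with a 202 when the job was queued and do something else when it was not.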

I’m out of fuel for this rant. I wanted to remind myself that everything that can go wrong, will go wrong. The people reporting that something has gone wrong won’t be helpful. You have to plan for all this upfront and move back your deadlines if need be.