Website performance: Concurrency and its evil brother latency

Tuesday, July 6th, 2010

There was a lot of buzz about Ruby on Rails a few years ago; many would say there still is. I find Rails’ focus on productivity very attractive, so a few years ago I shifted my web work over from Java to Rails. One significant difference between the two environments is how concurrent requests are handled, and in particular the potential impact of high latency requests. This post isn’t about Java vs Ruby. However, the differences and similarities between their deployments did get me thinking about the impact of latency and concurrency on web performance, and wondering about other approaches such as the event-based approach of Node.js.

I use the term “high latency” to mean requests where the response doesn’t complete in under a couple of seconds, though arguably a couple of seconds is not particularly quick.

In our Java web systems we had lots of threads available. We didn’t tend to worry all that much about the odd high latency request, as there were always plenty of other threads to serve other requests. In reality, though, that was perhaps a little naive. In many applications there’s plenty of opportunity for a thread to consume and hold resources needed by other threads, e.g. database connections.
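To make the held-resource problem concrete, here is a minimal sketch in Python (standing in for the Java systems described above). It assumes a hypothetical pool with a single database connection, modelled as a semaphore, and four worker threads. The names `db_connections`, `handle_request` and `run_demo`, and all the timings, are invented for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical connection pool of size 1, modelled as a semaphore.
db_connections = threading.BoundedSemaphore(1)


def handle_request(work_s: float) -> float:
    """Return how long this request waited for a connection before doing any work."""
    queued_at = time.perf_counter()
    with db_connections:  # blocks until a connection is free
        waited = time.perf_counter() - queued_at
        time.sleep(work_s)  # stand-in for the database work itself
    return waited


def run_demo() -> Tuple[float, List[float]]:
    """One slow request takes the connection; three fast ones arrive just after."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        slow = pool.submit(handle_request, 0.3)
        time.sleep(0.05)  # give the slow request time to grab the connection
        fast = [pool.submit(handle_request, 0.01) for _ in range(3)]
        return slow.result(), [f.result() for f in fast]


if __name__ == "__main__":
    slow_wait, fast_waits = run_demo()
    print(f"slow request waited {slow_wait:.2f}s")
    print("fast requests waited", [f"{w:.2f}s" for w in fast_waits])
```

Even though a thread is free for every request, each fast request spends roughly a quarter of a second waiting, because the thing they are short of is the connection rather than threads.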

Moving to Ruby on Rails was an eye opener for lots of positive reasons, but at the time I was surprised to discover that Ruby servers are effectively single-threaded. There is threading in the de facto Ruby 1.8 implementation (Matz’s Ruby Interpreter, MRI), but these are green threads scheduled inside the interpreter rather than native OS threads, so a blocking call in one thread can stall the whole process. In practice, the threading model means that processes are used for concurrency: one request per process. On many systems (but by no means all) spawning processes can be slow, and processes consume memory, even if only for the web stack when idle. As such, there’s a relatively low limit to how many Ruby processes can co-exist compared with the number of threads available to a similar Java web system running on the same hardware.

The one request per process model tends to make me much more paranoid about high latency requests than in my Java days. However, looking back, I wonder if our old Java systems would have benefited from greater attention to eliminating high latency requests. Server crashes tend to creep up on you. Often it’s not obvious what is causing them, because the cause isn’t one thing but a gradual degradation in responsiveness that eventually reaches the point at which it tips into a server crash or stall. I rather grandly think of this as the “tipping point of crashes”.

You see a similar effect on busy motorways (highways, autobahns...). Think of the lanes as being like the processes available to service requests from browsers concurrently. If there is a slow car in one lane, the lane starts backing up and vehicles start switching to the other lanes. Capacity is reduced and the likelihood of a slow car entering one of the other lanes increases. The problem effectively cascades until suddenly everything stops. Once the congestion has passed and you’ve travelled a few miles more, you may wonder what caused the problem in the first place; there’ll be no obvious cause.

So how do we solve this? Continuing the motorway analogy, more lanes can obviously help. With more lanes you reduce the chance of a slow car affecting others, and you may move the tipping point further out, so a single slow car is less disruptive. However, the problem is still slow cars. Take the slow car off the road and the motorway copes with no problems. Here lies the danger: whilst lanes help, don’t let more lanes fool you into ignoring slow cars.
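To put rough numbers on the analogy, here is a back-of-the-envelope sketch in Python. It uses the standard estimate of utilisation as arrival rate times mean service time, divided by the number of workers (the "lanes"). Every figure below is invented purely for illustration: 40 requests per second, a typical request of 0.1 seconds, and 2% of requests taking 10 seconds.

```python
def mean_service_time(fast_s: float, slow_s: float, slow_fraction: float) -> float:
    """Average seconds a worker is tied up per request for a fast/slow mix."""
    return (1 - slow_fraction) * fast_s + slow_fraction * slow_s


def utilisation(arrival_rate: float, service_time_s: float, workers: int) -> float:
    """Share of worker capacity in use; at 1.0 or above the backlog keeps growing."""
    return arrival_rate * service_time_s / workers


# Hypothetical figures, chosen only to illustrate the analogy.
ARRIVAL_RATE = 40.0    # requests per second
FAST_S = 0.1           # a typical request
SLOW_S = 10.0          # the "slow car"
SLOW_FRACTION = 0.02   # 2% of requests are slow

with_slow = mean_service_time(FAST_S, SLOW_S, SLOW_FRACTION)
without_slow = mean_service_time(FAST_S, SLOW_S, 0.0)

print(f"10 workers, slow requests present: {utilisation(ARRIVAL_RATE, with_slow, 10):.2f}")  # 1.19
print(f"20 workers, slow requests present: {utilisation(ARRIVAL_RATE, with_slow, 20):.2f}")  # 0.60
print(f"10 workers, slow requests removed: {utilisation(ARRIVAL_RATE, without_slow, 10):.2f}")  # 0.40
```

With these made-up numbers, ten workers are overloaded, doubling the workers brings utilisation down to about 0.60, and removing the slow requests gets the original ten workers down to 0.40. This is only an average: real traffic is bursty, so queues tend to form well before utilisation reaches 1.0.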

Node.js is interesting because it doesn’t follow the common model of a process or thread per connection. It provides an asynchronous framework. Put in very plain language, that means the thread doesn’t sit around waiting for databases, filesystems, messaging systems etc. to reply to requests. Instead, it registers a callback to be called when the database, filesystem etc. responds, and the thread proceeds to process the next request.
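The idea can be sketched with Python’s `asyncio`, used here as a stand-in for Node.js. It is the same single-threaded event loop principle, though written with `await` rather than explicit callbacks. The functions `query_database`, `handle_request` and `serve`, and the delays, are hypothetical, and `asyncio.sleep` merely simulates waiting on a database.

```python
import asyncio
import time
from typing import List, Tuple


async def query_database(delay_s: float) -> str:
    """Stand-in for a database call; awaiting hands the thread back to the event loop."""
    await asyncio.sleep(delay_s)
    return f"rows after {delay_s}s"


async def handle_request(name: str, delay_s: float, finished: List[str]) -> None:
    """Handle one request, recording its name once the 'database' has replied."""
    await query_database(delay_s)
    finished.append(name)


async def serve() -> Tuple[List[str], float]:
    """Run three requests on one thread; return completion order and elapsed time."""
    finished: List[str] = []
    start = time.perf_counter()
    await asyncio.gather(
        handle_request("slow", 0.3, finished),
        handle_request("fast-1", 0.1, finished),
        handle_request("fast-2", 0.2, finished),
    )
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    order, elapsed = asyncio.run(serve())
    print(order)                 # the fast requests finish before the slow one
    print(f"{elapsed:.2f}s")     # close to the slowest request, not the sum of all three
```

The slow request is started first, yet the two fast ones complete before it, and the total time is roughly that of the slowest request alone. The caveat is that this only works while the waiting is genuinely asynchronous; a handler that blocks the thread would hold up everything else.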

Programming a system like this is a bit of a paradigm shift for most web developers; it is more like programming a GUI. However, decoupling concurrent processing of requests from the number of processes/threads should result in lower memory footprints and fewer problems with occasional high latency requests blocking out new requests and causing congestion problems.

I’m planning to have a go with Node.js pretty soon, so it’ll probably form the basis of a future post here. In the meantime, here’s a good example explaining Node.js.

So my tip: no matter what the system, always pay attention to high latency requests, particularly if you are having occasional “mysterious” crashes.
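As a starting point for acting on that tip, here is a minimal Python sketch that picks out high latency requests from a list of timings, using the "couple of seconds" definition from earlier in this post. The `RequestTiming` record, the sample paths and the durations are all hypothetical; in practice the timings would come from wherever your server logs them.

```python
from typing import List, NamedTuple


class RequestTiming(NamedTuple):
    path: str
    duration_s: float


# "High latency" as defined in this post: not done in under a couple of seconds.
HIGH_LATENCY_THRESHOLD_S = 2.0


def high_latency_requests(
    timings: List[RequestTiming],
    threshold_s: float = HIGH_LATENCY_THRESHOLD_S,
) -> List[RequestTiming]:
    """Return requests at or over the threshold, slowest first."""
    slow = [t for t in timings if t.duration_s >= threshold_s]
    return sorted(slow, key=lambda t: t.duration_s, reverse=True)


# Hypothetical sample data.
sample = [
    RequestTiming("/home", 0.12),
    RequestTiming("/reports/export", 8.4),
    RequestTiming("/search", 2.3),
]

for timing in high_latency_requests(sample):
    print(f"{timing.path}: {timing.duration_s:.1f}s")
```

For the sample above this lists `/reports/export` and then `/search`, which are the "slow cars" worth investigating first.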