Concurrency & Long running requests: Sinatra, EventMachine, fibers and Node

The problem – long running requests with blocking IO

Recently, I’ve been working on an alternative view onto Google Analytics data. This presents some challenges:

  1. Calls to the Google Analytics API vary, but some can take over 10s.
  2. At this point I only have a single VPS instance, which runs various Ruby and WordPress instances and has limited memory and db connections.

Problem 1, long running API requests (yes, 10s is long), normally means that a process is tied up (blocking), waiting for a reply for 10s.

Problem 2, the single VPS instance has limited resources. Memory-wise, I cannot have many processes running, each using up a chunk of memory.

1 + 2 = crappy performance: when there are multiple concurrent users, the limited number of processes available to service their requests may all be tied up waiting for replies from Google.
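To put rough numbers on it (all hypothetical, just to illustrate the arithmetic): while a process is blocked waiting on Google it can serve nobody else, so a handful of processes and 10s calls cap throughput brutally:

```ruby
# Hypothetical numbers, for illustration only.
processes    = 4     # worker processes the VPS can afford
request_time = 10.0  # seconds a process is tied up per slow API call

# Each blocked process serves nothing else, so the ceiling on
# throughput for these slow requests is simply:
max_throughput = processes / request_time
puts "max throughput: #{max_throughput} slow requests/second"  # → 0.4
```

Four concurrent users with slow queries are enough to monopolise every process for 10s at a time.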

The solution – get more concurrency from a process

Solution: get more concurrency from a limited number of processes. I can think of two solutions in the Ruby world:

A) Use JRuby, which has native rather than cooperative threads.
B) Use EventMachine and non-blocking async HTTP requests.

Like Node.js, EventMachine implements the reactor pattern – essentially, requests cooperate by handing control back to the reactor when they would otherwise block on something, e.g. an HTTP API call.

Solution A, using JRuby, is reasonable but has some drawbacks. In theory a single JRuby process can handle 100s of concurrent requests, processing each in its own native thread. However, in practice we still need to be careful about non-reentrant code, particularly in the gems and extensions we depend upon.

Solution B initially appears more complex but is perhaps more predictable. We’re explicitly recognising the blocking API calls issue and making it a first-class concern for our system. Whilst I don’t like to make things more complex than they need be, it’s often better to recognise a fundamental issue and address it explicitly.

How the evented/async approach works

So how does B work:

  1. server receives a request from browser
  2. server “queues” the request in EventMachine’s reactor
  3. EventMachine’s reactor gets back control from whatever was running and pulls the next event from its reactor queue and executes it
  4. our app gets control from the reactor and starts processing the request
  5. our app makes an HTTP request to the Google Analytics API using an async library (em-http) and provides a callback to be executed when a reply is received; control returns to EventMachine’s reactor
  6. the EM reactor pulls the next event from its queue and executes it
  7. our earlier HTTP API call (from 5) returns and the callback gets queued on the EM reactor
  8. the currently executing event finishes, returning control to EM
  9. EM picks up the callback (from 7) which processes the results of the API query, builds a response and sends it to the browser. The HTTP request is finished
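In code, steps 4–9 amount to registering a callback and handing control back to the reactor. Here is a toy sketch of that callback style, using a plain Ruby array and lambdas as stand-ins for EventMachine and em-http (these are not the real APIs):

```ruby
# Toy "reactor": a queue of events (lambdas) executed one at a time.
reactor = []

# Step 5: an "async" API call -- instead of blocking, it queues the
# reply callback (step 7) and returns immediately.
api_call = lambda do |url, queue, &on_reply|
  queue << lambda { on_reply.call("stats for #{url}") }
end

$response = nil

# Steps 4-5: our app handles a request, fires the async call with a
# callback, then hands control straight back to the reactor.
api_call.call("ga/visits", reactor) do |reply|
  # Step 9: the callback runs later, building the response.
  $response = "page built from #{reply}"
end

# Steps 6-9: the reactor drains its queue, eventually running our callback.
reactor.shift.call until reactor.empty?
puts $response
```

The real code would hang the callback on em-http’s request object (an EM::Deferrable with callback/errback); the toy queue here just makes the control handoffs visible.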
So where’s the pain? Well, async code with callbacks can quickly get messy, especially if there are callbacks nested within callbacks, nested within callbacks… and then an exceptional condition occurs. It is possible to structure code as sets of actions, where the callback code just links actions together, handling the flow of control between them. However, it’d be nicer if we could just write code that looks like normal sync server code. Ruby (1.9) Fibers and Sinatra::Synchrony to the rescue…

Fibers – explicit cooperation

Ruby 1.9 has Fibers. Threads enable implicit concurrency, with some layer deciding when to switch control from one thread to another. Fibers enable explicit cooperative concurrency: a piece of code is wrapped in a fiber and executed; the fiber can yield, freezing its current state and returning control to its caller; at some later point the caller can resume the fiber, which continues from where it yielded, with its pre-yield state intact.

Here’s an example:
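A minimal sketch (reconstructed to match the output shown below) looks like this:

```ruby
fiber = Fiber.new do
  x = 123
  puts "Hello from the fiber: #{x}"
  Fiber.yield                                # freeze here, hand control back
  puts "And we're back in the fiber: #{x}"   # state (x) is intact on resume
end

def main(fiber)
  y = 321
  puts "Hello from main: #{y}"
  fiber.resume                      # run the fiber until it yields
  puts "and we're back in main: #{y}"
  fiber.resume                      # resume the fiber from its yield point
  puts "back in main and finishing: #{y}"
end

main(fiber)
```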

Calling main( fiber ) gives the following output:

    Hello from main: 321
    Hello from the fiber: 123
    and we're back in main: 321
    And we're back in the fiber: 123
    back in main and finishing: 321

Fibers, making async code look like sync

I’ll just give a very brief overview. For more detail, here’s a great article explaining how Fibers can be used to make async code look like sync code.

Now we can just write async code that looks like sync code – no callbacks:
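A self-contained sketch of the trick (a simulated async call and a plain array stand in for em-http and EventMachine’s reactor – these are not the real APIs):

```ruby
$reactor = []  # toy stand-in for EventMachine's event queue

# "Async" call: queues its reply callback rather than blocking.
def async_get(url, &callback)
  $reactor << lambda { callback.call("response for #{url}") }
end

# The fiber trick: fire the async call, yield the current fiber until the
# callback fires, then resume with the result. Callers just see a return value.
def sync_get(url)
  fiber = Fiber.current
  async_get(url) { |result| fiber.resume(result) }
  Fiber.yield
end

# Application code now reads like ordinary blocking code -- no callbacks.
app = Fiber.new do
  $body = sync_get("ga/report")
  puts "Got: #{$body}"
end

app.resume                                 # run until sync_get yields
$reactor.shift.call until $reactor.empty?  # the "reactor" fires the callback
```

em-synchrony and Sinatra::Synchrony do essentially this wrapping for you, so route handlers read top-to-bottom.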

So this is great, async code without the nested callback headache! I’m a fan of the lightweight Ruby web framework Sinatra. This helpful and clever person has put Sinatra and EventMachine together (with a few other useful pieces) in the form of Sinatra::Synchrony.

Node Fibers

Ruby not your bag? Prefer JavaScript/CoffeeScript and Node.js? I’m finding myself writing more and more JavaScript/CoffeeScript code in the browser – CoffeeScript + backbone.js + Jasmine (for testing) is pretty good. I can certainly see the attraction of Node.js: using the same language on the server as the client, particularly if it comes with the high-concurrency goodness of async events. Well, there may be some good news in the form of Node-fibers – in particular, take a look at the Fiber/future API.

Website performance: Concurrency and its evil brother latency

There was a lot of buzz about Ruby on Rails a few years ago, and many would say there still is. I find Rails’ focus on productivity very attractive, so a few years ago, for web work, I shifted over from Java to Rails. One significant difference between the two environments is how concurrent requests are handled; in particular, the potential impact of high latency requests. This post isn’t about Java vs Ruby. However, the differences and similarities between their deployments did get me thinking about the impact of latency and concurrency on web performance, and wondering about other approaches such as the event-based approach of Node.js.

I use the term “high latency” to mean requests where the response doesn’t complete in under a couple of seconds, though arguably a couple of seconds is not particularly quick.

In our Java web systems we had lots of threads available. We didn’t tend to worry all that much about the odd high latency request, as there were always plenty of other threads to serve other requests. In reality, though, that was perhaps a little naive: in many applications there’s plenty of opportunity for a thread to consume and hold resources needed by other threads, e.g. db connections.

I shifted to using Ruby on Rails a few years ago now. Moving to Ruby on Rails was an eye-opener for lots of positive reasons, but (at the time) I was surprised to discover that Ruby servers are effectively single-threaded (there is threading in the de facto Ruby 1.8 implementation, Matz’s MRI, but it uses cooperative rather than pre-emptive switching). This threading model effectively means that processes are used for concurrency – one request per process. On many systems (but by no means all) spawning processes can be slow, and processes consume memory – even if it’s just the web stack when idle. As such there’s a relatively low limit to how many Ruby processes can co-exist, versus the number of threads available to a similar Java web system running on the same hardware.

The one-request-per-process model tends to make me much more paranoid about high latency requests than I was in my Java days. However, looking back, I wonder if our old Java systems would have benefited from greater attention to eliminating high latency requests. Server crashes tend to creep up on you, and often it’s not obvious what the one thing causing the crashes is – because it’s not one thing, but a gradual degradation in responsiveness that eventually reaches a point at which it tips into a server crash/stall. I rather grandly think of this as the “tipping point of crashes”.

You see a similar effect on busy motorways (highways, autobahns…). Think of the lanes as being like the processes available to service requests from browsers concurrently. If there is a slow car in one lane, the lane starts backing up and vehicles start switching to the other lanes. Capacity is reduced and the likelihood of a slow car entering one of the other lanes increases. The problem effectively cascades until suddenly everything stops. Once the congestion has passed and you’ve travelled a few miles more, you may wonder what caused the problem in the first place; there’ll be no obvious cause.

So how to solve this? Well, continuing the motorway analogy, more lanes can obviously help: with more lanes you’re reducing the chance of a slow car affecting others, and you may have moved the tipping point further out – a single slow car is less disruptive. However, the problem is still the slow cars. Take the slow car off the road and the motorway copes with no problems. Herein lies the danger: whilst lanes help, don’t let more lanes fool you into ignoring slow cars.
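The same cascade can be sketched as a toy, deterministic simulation (all numbers hypothetical): a pool of workers that comfortably absorbs fast traffic, until occasional slow requests start overlapping:

```ruby
WORKERS    = 4    # "lanes": processes serving requests
ARRIVALS   = 3    # new requests per tick -- below the fast-only capacity of 4
SLOW_EVERY = 20   # 1 request in 20 is a "slow car"
SLOW_TIME  = 15   # ticks a slow request holds a worker

def queue_length_after(ticks)
  busy_until = Array.new(WORKERS, 0)  # tick at which each worker frees up
  queue = served = 0
  ticks.times do |t|
    queue += ARRIVALS
    busy_until.each_index do |w|
      next if busy_until[w] > t || queue.zero?
      queue  -= 1
      served += 1
      busy_until[w] = t + (served % SLOW_EVERY == 0 ? SLOW_TIME : 1)
    end
  end
  queue
end

# With no slow requests the queue would stay at zero; with 5% of them
# slow, blocked workers pile up and the backlog keeps growing.
puts "backlog after 100 ticks: #{queue_length_after(100)}"
```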

The Node.js web framework is interesting because it doesn’t follow the common model of a process/thread per connection. Node.js provides an asynchronous framework, which, put in very plain language, means the thread doesn’t sit around waiting for databases, filesystems, messaging systems etc. to reply to requests; instead it registers a callback to be called when the database/filesystem etc. responds, and the thread proceeds to processing the next request.

Programming a system like this is a bit of a paradigm shift for most web developers; it is more like programming a GUI. However, decoupling the concurrent processing of requests from the number of processes/threads should result in lower memory footprints and fewer problems with occasional high latency requests blocking out new requests and causing congestion problems.

I’m planning to have a go with Node.js pretty soon so it’ll probably form the basis of a future post here. In the meantime, here’s a good example explaining node.js.

So my tip: no matter what the system, always pay attention to high latency requests particularly if you are having occasional “mysterious” crashes.