Concurrency & long-running requests: Sinatra, EventMachine, fibers and Node

The problem – long-running requests with blocking IO

Recently, I’ve been working on an alternative view onto Google Analytics data. This presents some challenges:

  1. Calls to the Google Analytics API vary but some can take over 10s.
  2. At this point I only have a single VPS instance, which runs various Ruby and WordPress instances and has limited memory and database connections.

Problem 1: long-running API requests (yes, 10s is long) normally mean that a process is tied up, blocked waiting for a reply for 10s.

Problem 2: the single VPS instance has limited resources. Memory-wise, I cannot have many processes running, each using up a chunk of memory.

1 + 2 = crappy performance. When there are multiple concurrent users, the limited number of processes available to service the users’ requests may all be tied up waiting for replies from Google.

The solution – get more concurrency from a process

Solution: get more concurrency from a limited number of processes. I can think of two solutions in the Ruby world:

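A) Use JRuby, which has native threads that can run in parallel, rather than MRI’s green threads (1.8) or threads serialised by its global interpreter lock (1.9)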
B) Use EventMachine and non-blocking async HTTP requests.

Like node.js, EventMachine implements the reactor pattern – essentially requests cooperate by giving back control to the reactor when they themselves are blocking on something, e.g. an HTTP API call.

Solution A, using JRuby, is reasonable but has some drawbacks. In theory a single JRuby process can handle 100s of concurrent requests, processing each in its own native thread. However, in practice we still need to be careful about non-reentrant code, particularly in gems and extensions we depend upon.

Solution B initially appears more complex but is perhaps more predictable. We’re explicitly recognising the blocking API calls issue and making it a 1st class issue for our system. Whilst I don’t like to make things more complex than they need be, often it’s better to recognise a fundamental issue and address it explicitly.

How evented/async approach works

So how does B work?

  1. server receives a request from browser
  2. server “queues” the request in EventMachine’s reactor
  3. EventMachine’s reactor gets back control from whatever was running and pulls the next event from its reactor queue and executes it
  4. our app gets control from the reactor and starts processing the request
  5. our app makes an HTTP request to the Google Analytics API using an async library (em-http) and provides a callback to be executed when a reply is received; control returns to EventMachine’s reactor
  6. the EM reactor pulls the next event from its queue and executes it
  7. our earlier HTTP API call (from 5) returns and the callback gets queued on the EM reactor
  8. the currently executing event finishes, returning control to EM
  9. EM picks up the callback (from 7) which processes the results of the API query, builds a response and sends it to the browser. The HTTP request is finished
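
The steps above can be sketched with a toy reactor in JavaScript. This is only an illustration of the flow of control, not EventMachine or em-http: the Reactor class, asyncApiCall and the user names are all made up for the example, and the "reply" is simply queued as a later event rather than arriving over the network.

```javascript
// Toy reactor: a FIFO queue of named events, drained one at a time.
class Reactor {
  constructor() {
    this.queue = [];
    this.trace = []; // names of events in the order they ran
  }
  enqueue(name, handler) {
    this.queue.push({ name, handler });
  }
  run() {
    while (this.queue.length > 0) {
      const event = this.queue.shift();
      this.trace.push(event.name);
      event.handler(); // the handler runs until it hands control back
    }
  }
}

// Hypothetical stand-in for an async HTTP client: it returns immediately
// and the reply is delivered later as a new event on the reactor.
function asyncApiCall(reactor, label, onReply) {
  reactor.enqueue(`reply:${label}`, () => onReply(`analytics for ${label}`));
}

const reactor = new Reactor();
const responses = [];

// Steps 1-2: two browser requests are queued.
for (const user of ["alice", "bob"]) {
  reactor.enqueue(`request:${user}`, () => {
    // Steps 4-5: start the slow call, register a callback, return at once.
    asyncApiCall(reactor, user, (data) => {
      // Step 9: the callback builds and "sends" the response.
      responses.push(`${user} <- ${data}`);
    });
  });
}

reactor.run();
console.log(reactor.trace.join(" | "));
```

The trace shows bob’s request being picked up before alice’s reply has been processed, which is the whole point: one process, two requests in flight.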
So where’s the pain? Well, async code with callbacks can quickly get messy, especially if there are callbacks nested within callbacks, nested within callbacks… and then an exceptional condition occurs. It is possible to structure code as sets of actions, where the callback code just links actions together, handling the flow of control between them. However, it’d be nicer if we could just write code that looks like normal sync server code. Ruby (1.9) Fibers and Sinatra-synchrony to the rescue…

Fibers – explicit cooperation

Ruby 1.9 has Fibers. Threads enable implicit concurrency, with some layer deciding when to switch control from one thread to another. Fibers enable explicit cooperative concurrency. A piece of code is wrapped in a fiber and executed. The fiber can yield, freezing its current state and returning control to its caller. At some later point the caller can resume the fiber, which carries on from where it yielded with its pre-yield state intact.

Here’s an example:
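
As a rough sketch of the same control flow in JavaScript (the language of the Node section further down), a generator can play the role of the fiber. This is an analogue, not Ruby’s Fiber API: fiber.next() stands in for resuming the fiber, and the names fiberBody, say and output are just for illustration.

```javascript
const output = [];
const say = (line) => {
  output.push(line);
  console.log(line);
};

// The "fiber": a generator whose local state survives each yield.
function* fiberBody() {
  const x = 123;
  say(`Hello from the fiber: ${x}`);
  yield; // freeze here and hand control back to the caller
  say(`And we're back in the fiber: ${x}`);
}

function main(fiber) {
  const y = 321;
  say(`Hello from main: ${y}`);
  fiber.next(); // start the fiber; it runs until its first yield
  say(`and we're back in main: ${y}`);
  fiber.next(); // resume the fiber from where it yielded
  say(`back in main and finishing: ${y}`);
}

const fiber = fiberBody();
main(fiber);
```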

Calling main( fiber ) gives the following output:

    Hello from main: 321
    Hello from the fiber: 123
    and we're back in main: 321
    And we're back in the fiber: 123
    back in main and finishing: 321


Fibers, making async code look like sync

I’ll just give a very brief overview. For more detail here’s a great article explaining how Fibers can be used to make async code look like sync code.

Now we can just write async code that looks like sync code – no callbacks:
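
To give a flavour of the trick, here is a minimal sketch in JavaScript, using a generator where Ruby would use a Fiber. It is not Sinatra::Synchrony or em-synchrony code: fetchReport, runAsSync and the pending queue are hypothetical stand-ins for an async HTTP call, the fiber wrapper and the reactor queue.

```javascript
// Pending callbacks, standing in for the reactor's event queue.
const pending = [];

// Hypothetical callback-style API call: the reply is delivered later.
function fetchReport(name, callback) {
  pending.push(() => callback(`report:${name}`));
}

// Drive a generator. Each yielded value is a function that accepts a
// callback; when that callback fires, the generator is resumed with the result.
function runAsSync(generatorFn) {
  const gen = generatorFn();
  function step(value) {
    const { done, value: waitFor } = gen.next(value);
    if (!done) waitFor(step);
  }
  step(undefined);
}

const results = [];
runAsSync(function* () {
  // Reads top to bottom like blocking code, but control is handed back
  // at each yield while the "request" is outstanding.
  const visits = yield (cb) => fetchReport("visits", cb);
  const bounces = yield (cb) => fetchReport("bounces", cb);
  results.push(visits, bounces);
});

// Drain the queue, as the reactor would.
while (pending.length > 0) pending.shift()();

console.log(results);
```

The body of the generator has no nested callbacks at all; the callback plumbing lives once, in runAsSync.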

So this is great: async code without the nested callback headache! I’m a fan of the lightweight Ruby web framework Sinatra. This helpful and clever person has put Sinatra and EventMachine together (with a few other useful pieces) in the form of Sinatra::Synchrony.

Node Fibers

Ruby not your bag? Prefer JavaScript/CoffeeScript and node.js? I’m finding myself writing more and more JavaScript/CoffeeScript code in the browser – CoffeeScript + backbone.js + jasmine (for testing) is pretty good. I can certainly see the attraction of node.js, using the same language on the server as the client, particularly if it comes with the high-concurrency goodness of async events. Well, there may be some good news in the form of Node-fibers – in particular take a look at the Fiber/future API.

Coffeescript and Jasmine for testing

Recently I’ve been writing a fair bit of browser-side CoffeeScript + Backbone.js. Nice, but as the code base grows and gets more complex, the rubyist in me feels increasingly uncomfortable and less productive with the lack of TDD/BDD. What to do… Jasmine BDD for JavaScript seems to be a good answer.

Jasmine follows the same spec/BDD style as RSpec for Ruby and Kiwi for Objective-C (Kiwi is very good BTW). That is, set up your context using ‘describe’ + ‘beforeEach’, then make assertions on the post-context state using ‘it’ blocks with ‘expect’. Jasmine runs your test suite in the browser and seems to be blindingly fast.

For me, Jasmine tests also highlight one of the reasons I like CoffeeScript more than plain old JS – fewer lines of code. BDD specs tend to have a lot of nesting as you build up contexts, so Jasmine specs written in JS tend to have a lot of nested functions. As these test functions tend to be pretty straightforward assertions, it seems reasonably fair to compare the CoffeeScript with the generated JS. Here’s one of my tests. Notice that the CoffeeScript version is just 49 lines and pretty easy to read; the JS version is 78 lines and, to my eye, it’s not terrible but it is harder to read.

Coffeescript

Javascript

Javascript dropdown menu of recently viewed pages

Lately I’ve been sharpening my JavaScript skills and have read Douglas Crockford’s excellent book JavaScript: The Good Parts. Last week I needed to create a recently viewed cottages drop down menu for our holiday cottages website, so I thought it would be interesting to use JavaScript together with the robust Prototype library.

One advantage of using javascript is that the personalisation is client-side. This technique works with static pages and indeed, any pages rendered by the server can be the same for all users – potentially simplifying caching.

For this blog post, I’ve modified and generalised the script to show recently viewed pages instead. You can see it working by clicking on ‘Recently Viewed’ in the navigation bar at the top of this page. The script should work for many websites and blogs without much modification. Here are the details.

The script works by creating a cookie whose value is JSON representing an array of ‘recently viewed’ objects. Each ‘recently viewed’ object contains information about a recently viewed page. The array is ordered so that the most recently viewed page’s entry is the first element. No pages are duplicated: if you view a page again it simply moves to the beginning of the array. In order to keep the cookie value reasonably small, by default I’ve limited the array of recently viewed pages to the last 5 unique pages.

I was surprised to discover that Javascript doesn’t have the common block level scope found in C, Java, Ruby etc. Javascript has function level scope. We can use this idiosyncrasy together with Javascript’s support for closures to achieve something close to a module, with private variables and functions as well as public functions. Here’s the template for a ‘module’:
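
A minimal sketch of such a template, in the style of Crockford’s book, might look like the following. The names MODULE, count, increment and next are placeholders chosen for the example; only the ‘that’ object is returned, so everything else stays private to the constructor function.

```javascript
var MODULE = (function () {
  // The object we hand back: anything attached to it is public.
  var that = {};

  // Private variable and function: visible only inside this constructor
  // function, but kept alive by the closures below.
  var count = 0;
  var increment = function () {
    count += 1;
    return count;
  };

  // Public function: can reach the private members via closure.
  that.next = function () {
    return increment();
  };

  return that;
}());
```

Calling MODULE.next() works from anywhere, whereas MODULE.count and MODULE.increment are undefined outside the module.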


We need a few functions for reading and writing our cookie, and for setting the expiry time of the cookie to some future date. For brevity, I won’t list those here but they’re included in the complete source at the end of this post.

We need a function for remembering a page view in our cookie. This function needs to maintain the ordered array containing the objects representing page views, with the most recently viewed page first. It also needs to ensure there are no duplicate page views in the array. I’ve written this as two functions, a private remember function and a public rememberPage function. The public rememberPage function is only responsible for obtaining the page url and title. The private remember function is responsible for creating and maintaining the cookie of ordered unique recent objects. Following the format of our module template from above, notice how the private remember function is stored in a variable of the constructor function, whereas the public rememberPage function is stored as a property of the ‘that’ object.
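
The following is a sketch of that split rather than the original listing. So that it runs outside a browser, the cookie is replaced by an in-memory jar object, and rememberPage takes the url and title as arguments (in a page they would come from window.location.href and document.title). The cookie name, jar, readCookie, writeCookie and load are all assumptions made for the example.

```javascript
// In-memory stand-in for document.cookie so the sketch runs outside a browser.
var jar = {};
var readCookie = function (name) {
  return jar.hasOwnProperty(name) ? jar[name] : null;
};
var writeCookie = function (name, value) {
  jar[name] = value;
};

var RECENTLY_VIEWED = (function () {
  var that = {};
  var cookieName = "recently_viewed"; // assumed cookie name
  var maxPages = 5;                   // default limit mentioned in the post

  // Private: read the current array (empty if there is no cookie yet).
  var load = function () {
    var raw = readCookie(cookieName);
    return raw ? JSON.parse(raw) : [];
  };

  // Private: drop any older entry for the same url, put the page first, trim.
  var remember = function (page) {
    var pages = load().filter(function (p) {
      return p.url !== page.url;
    });
    pages.unshift(page);
    writeCookie(cookieName, JSON.stringify(pages.slice(0, maxPages)));
  };

  // Public: only gathers the url and title, then delegates.
  that.rememberPage = function (url, title) {
    remember({ url: url, title: title });
  };

  return that;
}());

RECENTLY_VIEWED.rememberPage("/a", "Page A");
RECENTLY_VIEWED.rememberPage("/b", "Page B");
RECENTLY_VIEWED.rememberPage("/a", "Page A"); // viewed again: moves to the front
console.log(jar.recently_viewed);
```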

Here’s the corresponding recent function for obtaining the recently viewed page objects from the cookie:
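
As a sketch of the reading side, the helper below turns a raw cookie value into the array of page objects. The name parseRecent and the defensive handling of a missing or corrupt cookie are assumptions for illustration; in the module it would be fed by the cookie-reading function mentioned earlier.

```javascript
// Hypothetical helper: turn the raw cookie value into an array of page objects.
var parseRecent = function (cookieValue) {
  if (!cookieValue) {
    return []; // no cookie yet: nothing has been viewed
  }
  try {
    var pages = JSON.parse(cookieValue);
    return Array.isArray(pages) ? pages : [];
  } catch (e) {
    return []; // a corrupt cookie is treated as "nothing viewed yet"
  }
};

console.log(parseRecent('[{"url":"/a","title":"Page A"}]'));
```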

Now on every page we want to remember we include a call to the rememberPage function. For example, my blog’s header HTML file has the following body tag which invokes our onload function:

<body onload="RECENTLY_VIEWED.onload();">

Here’s our onload function (we’ll see the function renderPopup later):

At this point, we’ve got enough code to populate and maintain our recently viewed cookie. Next up is the drop down menu displaying links to the most recently viewed.

To create a drop down menu we use the CSS position property. First, let’s create our menu heading – a link that will cause the popup menu to display. To position the popup menu relative to the menu header link, we wrap the menu header link in a span#recently-viewed-popup-container which has position: relative. As well as the menu header link, this span also nests a div which contains our popup menu.

The popup menu div has position: absolute together with left: Xpx; top: Ypx; where X and Y give the top left position of the popup menu relative to the top left of the container span. The popup also has a z-index: 100 to push it on top of everything else. Notice the style="display: none;", which hides our popup until we’re ready to show it.

Notice the onclick="RECENTLY_VIEWED.toggleDisplayPopup();return false;" on the links: the return false prevents the browser following the href and changing the current url.

This brings us to the javascript function which shows and hides the popup menu – toggleDisplayPopup. Here we make use of Prototype’s $(id) to set the display of the recently-viewed-popup div, inline to show the popup and none to hide it. The variable popupId specifies the css id of our popup menu div. It is a private member variable declared and initialized in our constructor function. The popupId is available to our public function via closure.
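
A sketch of the toggle, written so that it can run without a browser: the element lookup is passed in as a function (in the page this would be Prototype’s $), and a plain object with a style property stands in for the popup div. makeToggle and fakePopup are names invented for the example.

```javascript
// Build the toggle function around a private popupId.
var makeToggle = function (lookup) {
  var popupId = "recently-viewed-popup"; // private; visible below via closure
  return function toggleDisplayPopup() {
    var popup = lookup(popupId);
    // "inline" shows the popup, "none" hides it.
    popup.style.display = (popup.style.display === "none") ? "inline" : "none";
  };
};

// Fake element standing in for the popup div, initially hidden.
var fakePopup = { style: { display: "none" } };
var toggleDisplayPopup = makeToggle(function (id) {
  return id === "recently-viewed-popup" ? fakePopup : null;
});

toggleDisplayPopup(); // first click: the popup is shown
console.log(fakePopup.style.display);
```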

Next we need a function to populate our popup menu from the contents of the recently viewed cookie.
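
One way to sketch the rendering step is as a function from the array of page objects to an HTML string, which the real renderPopup could then place into the popup div (for example with Prototype’s update method). The list markup, the empty-state message and the names escapeHtml and popupHtml are assumptions for this example, not the original code.

```javascript
// Escape text before placing it in HTML, since titles and urls come from a cookie.
var escapeHtml = function (text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
};

// Build the popup's inner HTML from the array of recently viewed pages.
var popupHtml = function (pages) {
  if (pages.length === 0) {
    return "<p>Nothing viewed yet</p>";
  }
  var items = pages.map(function (page) {
    return '<li><a href="' + escapeHtml(page.url) + '">' +
      escapeHtml(page.title) + "</a></li>";
  });
  return "<ul>" + items.join("") + "</ul>";
};

console.log(popupHtml([{ url: "/fibers", title: "Fibers & Node" }]));
```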

Like the rememberPage function, this renderPopup function also needs to be called on every page where the recently viewed popup menu is available. Remember our body tag contains:

<body onload="RECENTLY_VIEWED.onload();">

And our onload function (from earlier):

That’s it. No doubt there are numerous improvements that could be made and adaptations to other JS libraries instead of Prototype. Please do comment if you’ve a question, suggestion, have done something similar or just want to say hi!

Bye, Paul

P.S. Here’s the complete code:

HTML snippet:

CSS snippet: