
A Weekend With Go

This weekend I decided to learn the basics of Go and try the language out on a quick project, preferably a web app.

I started by reading the Getting Started guide on the golang.org website and then worked through A Tour of Go. About halfway into it, I felt like I had absorbed all that I could without reading some more complex examples and experimenting.

Installing Go on my laptop was pretty simple, although I did have a few issues getting the environment variables right so that goinstall would work.

A few Google searches led me to the projects section of the Go Dashboard. From there I started looking at the web frameworks and toolkits. I’ve got a good amount of experience with some of the more do-it-yourself web frameworks like Tornado, so I was looking for the closest thing to that I could find. The Twister project looked pretty promising.

At this point, I decided to port the logbook project to Go as a first project. I picked this one because it is a fairly simple CRUD web application that I wanted to move from SQLite to MySQL and integrate Bootstrap into.

This meant finding solutions to two problems: How do I talk to MySQL? How should template handling be done? There were a few MySQL client libraries listed on the projects page and a few more mentioned on the mailing list, but the majority were abandoned, incomplete or unstable. I ended up using GoMySQL, although it hasn’t been updated since 5/2011. I’m really hoping that I stumble on a better alternative or that someone recommends one.

For templating, I saw that there was a mustache library, mustache.go, and went with that. It supports the basics that I’m looking for and I’ve had an itch to use it for a little while now. Although it hasn’t been updated since 4/2011, it seems pretty stable.

Porting the code over was pretty fun. The Go standard library feels a bit bare and lacks some of the polish that Python’s has, but all of the basic functionality ported over without a fuss. There were rough spots, though: to name a few, I had to write a string scanner to parse space/quote-separated tags out of a list, and I had to deal with MySQL in an ugly way.
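The gobook source isn’t reproduced here, but the tag-scanning idea is easy to sketch. The C++ below is an illustration under stated assumptions, not the project’s Go code: tags are separated by whitespace, double quotes group a multi-word tag, the quote characters are dropped, and `scan_tags` is a hypothetical name.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split a tag list into tags. Whitespace separates tags unless it appears
// inside double quotes; the quote characters themselves are dropped.
// Assumption: an unterminated quote simply runs to the end of the input.
std::vector<std::string> scan_tags(const std::string& input) {
    std::vector<std::string> tags;
    std::string current;
    bool in_quotes = false;

    for (char c : input) {
        if (c == '"') {
            in_quotes = !in_quotes;  // toggle quoting, don't keep the quote
        } else if (!in_quotes && std::isspace(static_cast<unsigned char>(c))) {
            if (!current.empty()) {  // end of an unquoted token
                tags.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        tags.push_back(current);  // flush the last token
    }
    return tags;
}
```

For example, the input `go web "mustache templates" mysql` yields four tags, the third being `mustache templates`.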

With all of that said, the project is “done” and all of the functionality is ported. Take a look at the gobook project on GitHub and tell me what you think. I’d love a quick code review from people more skilled in Go so I can learn how to do some things better.

First impressions: I love it. At my current job I do about 70% Java and 30% C++, mostly building concurrent and distributed backend/platform systems. I usually work on multi-threaded apps and use several different libraries to get (or fake) async functionality: nio, akka, boost thread, libevent, zmq, etc. My last job was 100% Erlang, which has an awesome reputation for massively concurrent development.

What I love about Go is that it feels like it takes the best of what I’ve worked with and distills it into a small set of primitives that covers most of those cases. I’m not sure if I’ll get to work on Go projects much, but I’d really love to.

Boost serialize and vectors

This is more of a note to future me: When using boost serialize and archive with an object that contains a vector of objects, make sure you include this:

#include <boost/serialization/vector.hpp>

Making a better torrent tracker

A few weeks ago I decided to write a BitTorrent tracker service from scratch in C. There are a lot of languages that may be better suited for this sort of work, but I chose C for its speed and stability. I’ve written a few services and daemons in C before, and I think there is a lot that can be done with the language in this problem domain.

The first reason is to build smaller services that can easily be dropped onto a host and start running. All of the torrent trackers I’ve encountered have rather large dependency chains. Needing to install Apache, MySQL, PHP, a set of PHP modules and a 5000+ line PHP application just to start sharing some videos of my daughter at the park is not what I think of when I think of portable software. This brings us to goal #1: Running the service should be as simple as updating a configuration file and starting a single daemon.

The next reason is related to the first and has to do with the service footprint. Normally, an application’s footprint is defined as how much memory it consumes and how much disk space it eats up. I’d like to expand that definition to include how much time is spent serving requests and how many different resources are impacted. Once you include the relative footprint of Apache, MySQL, the JVM, etc., the cost of the server required to run all of this starts to rise. You go from being able to serve many thousands of users on a single 256 MB Slicehost slice to serving several thousand or fewer. Now we have goal #2: Running the service should consume as few resources as possible and should not, under regular circumstances, interfere with other applications and services on the host.

I think the way trackers manage the swarm and the peers connected to them can be dramatically improved as well. The trackers whose code I’ve read (granted, I haven’t been able to peer into the ones used by the very large sites out there) use very simple algorithms to decide which peer list to give a peer. I use the term algorithm pretty loosely here: most of them do a random select on a database or pull the N most recent peers. I think there are plenty of ways that torrent trackers can play a more proactive role in shaping the swarm and enforcing different behaviors. This reveals goal #3: The service should be extensible to support different types of processes and algorithms to craft and shape the swarm it serves.
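To make the two simple “algorithms” above concrete, here is a minimal C++ sketch of both. The `Peer` struct and the function names are hypothetical and not taken from JohnLocke or any other tracker; a real tracker would also exclude the requesting peer and read from its own storage.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>
#include <string>
#include <vector>

// Hypothetical peer record; a real tracker also keeps peer_id, bytes left, etc.
struct Peer {
    std::string ip;
    std::uint16_t port;
    std::int64_t last_seen;  // Unix time of the peer's most recent announce
};

// Strategy 1: return up to n peers, most recently seen first.
std::vector<Peer> most_recent_peers(std::vector<Peer> swarm, std::size_t n) {
    n = std::min(n, swarm.size());
    auto middle = swarm.begin() + static_cast<std::ptrdiff_t>(n);
    // Only the first n positions need to end up sorted.
    std::partial_sort(swarm.begin(), middle, swarm.end(),
                      [](const Peer& a, const Peer& b) { return a.last_seen > b.last_seen; });
    swarm.erase(middle, swarm.end());
    return swarm;
}

// Strategy 2: return up to n peers chosen uniformly at random (C++17 std::sample).
std::vector<Peer> random_peers(const std::vector<Peer>& swarm, std::size_t n,
                               std::mt19937& rng) {
    std::vector<Peer> picked;
    std::sample(swarm.begin(), swarm.end(), std::back_inserter(picked), n, rng);
    return picked;
}
```

Keeping peer selection behind a small function like this is one way goal #3 could be approached: a different strategy can be swapped in without touching the announce handling.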

That last goal is pretty flexible, and ultimately, I’d love to see trackers be able to play different roles. They could range from private, well-guarded trackers that a family might use to share photos or documents, to public, widespread services used by companies looking to take advantage of the swarm to share information and content.

To see how these goals are being applied, check out JohnLocke, an open source BitTorrent tracker written in C using libevent 2.0. It is functional but immature, and I’ve been running it for a few days in a test environment with a few different torrent clients (uTorrent, CTorrent and BitTorrent) connecting to it.

I had an epiphany about my build system when I looked up the project and saw that GitHub had classified it as a shell project instead of a C one. I’m really digging CMake these days; it is so much easier to use.

Creating priority-influenced jobs with Barbershop and Redis

There are lots of projects and systems that use Redis, a superb key/value store, as a job management and information system. Out of the box, Redis supports things like lists, hashes, sets and all sorts of other fancy things that make it what it is.

Recently I needed to add the concept of priority to a job distribution and management system, and I wasn’t pleased with what is out there. I’ve used Gearman before and like the idea that workers subscribe to a job distribution system and get notified of work. I also like that with Redis, thanks to Resque, you get lots of flexibility in what task/job metadata is stored and how. And if the number of jobs grows dramatically, you can scale to support more tasks/jobs by adding more Redis servers.

So a Resque-like system is ideal for my needs. But then came the requirement of being able to adjust the priority of tasks. The current Resque implementation doesn’t really support the concept of some work needing to be done before other work, which got me thinking about how Barbershop could fill in the missing pieces.

Barbershop is a simple priority queue daemon written in C using libevent and some well-crafted indexes and reverse indexes. With Barbershop, the IDs of the jobs you create are injected into Barbershop, and clients then query Barbershop for the next task/job to perform. You get the power of Redis to scale tasks/jobs horizontally, plus the ability to increment and peek into a priority queue to adjust your application as needed.

Because Barbershop uses the same line-wire protocol as Redis, integration is pretty straightforward.
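For reference, a Redis client sends each command as a multi-bulk request: the argument count, then each argument as a length-prefixed bulk string, every line ending in CRLF. The sketch below encodes one in C++, assuming Barbershop accepts requests in this form; `encode_command` is a hypothetical helper, and the `UPDATE` and `NEXT` names used in the example are placeholders rather than Barbershop’s documented commands.

```cpp
#include <string>
#include <vector>

// Encode a command as a Redis-protocol multi-bulk request:
//   *<argc>\r\n  then  $<len>\r\n<arg>\r\n  for each argument.
std::string encode_command(const std::vector<std::string>& args) {
    std::string out = "*" + std::to_string(args.size()) + "\r\n";
    for (const auto& arg : args) {
        out += "$" + std::to_string(arg.size()) + "\r\n";
        out += arg;  // bulk strings are length-prefixed, so no escaping is needed
        out += "\r\n";
    }
    return out;
}
```

For example, `encode_command({"NEXT"})` produces `*1\r\n$4\r\nNEXT\r\n`, the same bytes a Redis client library would write for a one-word command.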

Introducing Barbershop, A Priority Queue Daemon

I created a new open source project a few weeks ago called Barbershop. Barbershop is a priority queue daemon in C using libevent. The goal is to create a fast service that allows a queue consumer to get the next highest priority item out of a queue.

In its current form things are dead simple. There is only one queue per daemon, and the items and priorities it supports are 32-bit integers. The queue is only kept in memory, but there is a component that writes the queue to a snapshot and lets you bootstrap a queue from a snapshot when starting the daemon.
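The sketch below shows one way the pieces described above could fit together; it is a C++ illustration, not Barbershop’s source. A sorted set ordered by (priority, item) acts as the index and a hash map from item to its current priority acts as the reverse index, so a priority change can find and replace the old entry in O(log n). The method names, the tie-breaking rule (the larger item id wins on equal priority) and the one-`item priority`-pair-per-line snapshot format are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <set>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

// Single in-memory priority queue of 32-bit item ids with 32-bit priorities.
class PriorityQueue {
public:
    // Set an item's priority outright, inserting the item if it is new.
    void set(std::int32_t item, std::int32_t priority) {
        auto it = priority_of_.find(item);
        if (it != priority_of_.end()) {
            by_priority_.erase(Entry{it->second, item});  // drop the stale index entry
        }
        priority_of_[item] = priority;
        by_priority_.insert(Entry{priority, item});
    }

    // Add delta to an item's priority (new items start at 0). Overflow is not checked.
    void update(std::int32_t item, std::int32_t delta) {
        auto it = priority_of_.find(item);
        std::int32_t current = (it == priority_of_.end()) ? 0 : it->second;
        set(item, current + delta);
    }

    // Highest-priority item, without removing it.
    std::optional<std::int32_t> peek() const {
        if (by_priority_.empty()) return std::nullopt;
        return by_priority_.begin()->second;
    }

    // Remove and return the highest-priority item, as a worker's "next" would.
    std::optional<std::int32_t> next() {
        if (by_priority_.empty()) return std::nullopt;
        std::int32_t item = by_priority_.begin()->second;
        by_priority_.erase(by_priority_.begin());
        priority_of_.erase(item);
        return item;
    }

    std::size_t size() const { return priority_of_.size(); }

    // Serialize as one "item priority" line per entry, highest priority first.
    std::string snapshot() const {
        std::ostringstream out;
        for (const auto& [priority, item] : by_priority_) {
            out << item << ' ' << priority << '\n';
        }
        return out.str();
    }

    // Load entries from a snapshot string (a daemon would read a file instead).
    void restore(const std::string& data) {
        std::istringstream in(data);
        std::int32_t item = 0;
        std::int32_t priority = 0;
        while (in >> item >> priority) {
            set(item, priority);
        }
    }

private:
    using Entry = std::pair<std::int32_t, std::int32_t>;  // (priority, item)
    // Index: highest priority first; ties go to the larger item id.
    std::set<Entry, std::greater<Entry>> by_priority_;
    // Reverse index: item -> current priority.
    std::unordered_map<std::int32_t, std::int32_t> priority_of_;
};
```

Producers and adjusters would call `update`, while workers call `next`; `peek` lets an application inspect the head of the queue without consuming it.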

What is this useful for? Well, a lot of things really. The primary use case is when you’ve got a worker process that maintains and uses a priority queue, either in memory or in a database like MySQL, that needs to scale to 2 or more workers. You could go the route of using MySQL, but that can add costly and unnecessary load to your database. RabbitMQ doesn’t support changing the priorities of items or peeking into the queue, which crosses it off the list. This is where Barbershop comes into the picture.

Your queue loaders and adjusters talk to Barbershop just like your queue consumers do, feeding it items and priority changes. The producers of queue items don’t really care which worker actually gets a queue item to process. When a worker needs a new item to process, it makes a next request.

It’s open source under the MIT license and on GitHub. I don’t claim to write beautiful C, so if you find something that needs to be improved, either let me know or fork it and send me a pull request. There is plenty of documentation, plus a small PHP extension showing how to use it.

Globally Shared Queues

In the Erlang apps that I’ve written, there seems to be a recurring issue. I’ll build an application made to run across several nodes, but there is some shared process or functionality that is only meant to run on a single node. This could be something like a queue, an application-specific cache or a connection to an external service.

The most recent case is an application called I Play Warhammer. When processing Warhammer Online characters, the application encounters hundreds to thousands of them a minute, and I want to keep a queue of the incoming data processing actions so the application isn’t overloaded during peak game usage. If the queue gets out of control, I want to be able to quickly destroy it. If the node that it’s running on goes down, it needs to be started on another node. The data in the queue doesn’t have to be durable, but the queue service itself has to survive within the grid.

With that, I wrote a small proof-of-concept behavior called ets_queue. The first use of this module is to allow developers to create workers based on the behavior it defines. The idea is that the worker module exports the functions init/1 and process/1, which define which queue it works against and how to process data.

The second use of this module is to create a durable process that manages the actual queue, based on an ordered_set ets table. Each time queue/2 or dequeue/1 is called, it calls bootstrap_queue/1, which either creates a queue server process or returns the globally registered pid for that process.
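As a rough, single-process analogue in C++ (it does not model Erlang nodes, global registration or failover, and it is not the ets_queue code), the two halves look something like this: a queue keyed by an increasing sequence number, where dequeue takes the smallest key much as `ets:first/1` does on an ordered_set table, and a get-or-create lookup standing in for bootstrap_queue/1. `OrderedQueue` and the process-wide registry are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// FIFO backed by an ordered map keyed on an increasing sequence number,
// mirroring an ordered_set table where dequeue removes the first key.
class OrderedQueue {
public:
    void enqueue(std::string item) {
        std::lock_guard<std::mutex> lock(mu_);
        items_.emplace(next_seq_++, std::move(item));
    }

    std::optional<std::string> dequeue() {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return std::nullopt;
        auto first = items_.begin();  // smallest key = oldest entry
        std::string item = std::move(first->second);
        items_.erase(first);
        return item;
    }

private:
    std::mutex mu_;
    std::uint64_t next_seq_ = 0;
    std::map<std::uint64_t, std::string> items_;
};

// Analogue of bootstrap_queue/1: return the queue registered under `name`,
// creating it on first use. The registry is process-wide, not cluster-wide.
std::shared_ptr<OrderedQueue> bootstrap_queue(const std::string& name) {
    static std::mutex registry_mu;
    static std::unordered_map<std::string, std::shared_ptr<OrderedQueue>> registry;
    std::lock_guard<std::mutex> lock(registry_mu);
    auto& slot = registry[name];
    if (!slot) slot = std::make_shared<OrderedQueue>();
    return slot;
}
```

Every caller goes through `bootstrap_queue`, so whichever call comes first creates the queue and later calls get the same instance, which is the property the Erlang version gets from global registration across nodes.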

This is the best way I’ve found to tackle this sort of problem. I’m actively looking for better solutions and alternatives, so please let me know if you have one.

Learning Python

I wrote my first Python/Django app a few days ago, “Top Rupture Games”. It was a fun little project for a few reasons. First of all, I’ve been wanting to play with Rupture data for a little while now and had some time over the weekend to see what I could do with it. Second, I’ve been itching to play with Python, and subsequently Django, ever since being introduced to it over at Mythic a few months ago. Lastly, I wanted to create a Rupture client library in Python, even if it’s just a proof of concept.

Out of this venture came two new projects on GitHub. The first is pyRupture, a native Python library/module for interfacing with Rupture. It uses the Python Protocol Buffers library/modules to make API calls against Rupture. It only supports the API calls that I needed for the rupture-stats project, but it’s really easy to extend.

The second is rupture-stats, a Django project that uses pyRupture to compute and display a list of the top games on Rupture and render a graph of recent gaming activity. You can see a demo of it in action at http://67.207.133.142/.

Without going into too many details, the flow of collecting and sorting games by popularity isn’t that complex. Every n minutes the app pulls the recent game activity feed and builds an internal list of recently played games. Then it fetches the feed for each of those games and aggregates the number of game sessions by day. Once it has that list, it removes any games that have no recent activity and stores the computed information. The games are ranked by a “score”, which is the total number of game sessions over a 21-day period, and are displayed in that order.
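The ranking step could be sketched in C++ like this (rupture-stats itself is Python/Django, and this is not its code). Assumptions: days are plain integers (for example, days since the epoch), the 21-day window ends at `today`, and “no recent activity” is taken to mean zero sessions in that window. `GameStats`, `RankedGame` and `rank_games` are hypothetical names.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct GameStats {
    std::string name;
    std::map<int, int> sessions_by_day;  // day number -> session count
};

struct RankedGame {
    std::string name;
    int score;  // total sessions inside the window
};

// Score each game by its sessions in the `window` days ending at `today`,
// drop games with no sessions in that window, and sort by score (highest first).
std::vector<RankedGame> rank_games(const std::vector<GameStats>& games, int today,
                                   int window = 21) {
    std::vector<RankedGame> ranked;
    for (const auto& g : games) {
        int score = 0;
        // The map is ordered by day, so walk just the days inside the window.
        for (auto it = g.sessions_by_day.lower_bound(today - window + 1);
             it != g.sessions_by_day.end() && it->first <= today; ++it) {
            score += it->second;
        }
        if (score > 0) ranked.push_back({g.name, score});
    }
    // stable_sort keeps the input order for games with equal scores.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const RankedGame& a, const RankedGame& b) { return a.score > b.score; });
    return ranked;
}
```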

To generate the graphs, I used the Google Charts API with a list of the number of game sessions over the past n days, filling in any days that didn’t have any sessions with 0s.
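Filling the gaps matters because a chart series is positional: if days without sessions were simply skipped, every later value would shift onto the wrong day. A C++ sketch, again assuming integer day numbers; `daily_series` and `join_series` are hypothetical helpers, and the comma-joined string is just a convenient form to drop into a chart URL.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Build a fixed-length series of daily session counts for the last `n` days
// ending at `today` (oldest first), using 0 for days with no sessions.
std::vector<int> daily_series(const std::map<int, int>& sessions_by_day, int today, int n) {
    std::vector<int> series;
    if (n <= 0) return series;
    series.reserve(static_cast<std::size_t>(n));
    for (int day = today - n + 1; day <= today; ++day) {
        auto it = sessions_by_day.find(day);
        series.push_back(it == sessions_by_day.end() ? 0 : it->second);
    }
    return series;
}

// Join the series into a comma-separated string of values.
std::string join_series(const std::vector<int>& series) {
    std::string out;
    for (std::size_t i = 0; i < series.size(); ++i) {
        if (i > 0) out += ',';
        out += std::to_string(series[i]);
    }
    return out;
}
```

With sessions recorded only on days 98 and 100, a five-day series ending at day 100 comes out as `0,0,4,0,2` rather than just `4,2`.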

Both projects are open source under the MIT license.

Mochevent

This past weekend I started a small work-related project called mochevent. Its purpose is to offload the work of HTTP request setup, teardown and socket handling onto a C application. As requests come in, the request method, URI, headers and body are dispatched to an Erlang process on a remote node to do whatever heavy lifting is required to fulfill the request. The Erlang node then sends the results back to the C node, where they are massaged into an HTTP response that gets sent to the client.
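The last step, turning the Erlang node’s reply into bytes on the wire, might look like the C++ sketch below. It is not mochevent’s code. It assumes the reply carries a status code, reason phrase, headers and body (the `ErlangReply` struct is hypothetical), that the reply headers don’t already include Content-Length, and that the response is written by hand rather than built by a library such as libevent’s HTTP layer.

```cpp
#include <map>
#include <string>

// Hypothetical shape of what the Erlang node sends back for one request.
struct ErlangReply {
    int status;
    std::string reason;
    std::map<std::string, std::string> headers;
    std::string body;
};

// Turn a reply into raw HTTP/1.1 response bytes, adding Content-Length.
std::string build_http_response(const ErlangReply& reply) {
    std::string out = "HTTP/1.1 " + std::to_string(reply.status) + " " + reply.reason + "\r\n";
    for (const auto& [name, value] : reply.headers) {
        out += name + ": " + value + "\r\n";
    }
    // Length of the body in bytes, then the blank line that ends the headers.
    out += "Content-Length: " + std::to_string(reply.body.size()) + "\r\n\r\n";
    out += reply.body;
    return out;
}
```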

The application is based on the proof-of-concept application developed by Richard Jones and posted to http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3/. The goal is to make it as small and tight as possible, so we are really focused on performance and memory footprint when looking to improve the application.

The project can be found on GitHub as mochevent, and contributions are welcome and encouraged. It looks like we will be using it in an official capacity at work, so keep watching for feature additions, improvements and bug fixes.