Are you mission driven?

What is your startup about? What are *you* about? In “Mission Driven Companies Do It Better” a case is made for setting down a mission statement for your startup to make sure that you and your team know what you are ultimately trying to achieve.

As one of the comments at the end of the article says, though, writing a mission statement is one thing; getting buy-in for that mission is something else.

In “The Seven Habits of Highly Effective People”, Stephen Covey talks in depth about writing a mission statement for the business. To get deep buy-in, your team has to be completely involved in crafting the mission statement. It isn’t something that can just be imposed from the outside.

In many ways the same is true for us as individuals. Unless we take time out to focus on what is really important to us, how do we know how to prioritise? How do we make plans for the future? Perhaps writing a personal mission statement would be a useful exercise for all of us.

(A version of this article originally appeared in Liverpool Startup Digest on 18/11/2014)

Posted in Uncategorized

Customer Development for Enterprise Solutions

I’m a big fan of Eric Ries’ The Lean Startup and, by extension, Steve Blank and Customer Development, as described in Four Steps to the Epiphany. What I find most compelling is the focus on minimising waste by making sure that we’re solving a real customer problem. That means spending time with prototype customers before anything is even built.

In the Business-to-Consumer (B2C) world that is most often described in “lean” books, such as Ash Maurya’s excellent Running Lean, meeting proto-customers is a relatively easy task. That doesn’t mean it’s easy to go and talk to them, but almost by definition there are many of them to be found. And if there are many customers, making a few mistakes when talking to a few of them will still leave you plenty to sell to.

Depending on what your product is, the process of Business-to-Business (B2B) customer discovery may be quite similar. When developing enterprise solutions though, the problem of meeting your potential customers (and not accidentally wrecking a future sale) suddenly becomes rather more significant.

In this post, I’m going to describe how we’re applying the lessons of Lean Startups to enterprise software development at Sea Level Research. In particular, I’m going to try and provide some insight into how we focus on learning what our customers’ problems really are, and how we work with our customers to develop a scalable, repeatable business model, in the sense of a tech startup.

Getting up close to customers

Sea Level Research enables ports and shipping companies to improve efficiency by optimising the flow of vessels into, and out of, ports. Many people aren’t that familiar with ports and shipping, so I’m going to back up a little and explain some of the issues our clients face.

1. Sprint and loiter

Ship fuel is incredibly expensive – around $600 per ton – and a large container ship may well burn 150 tons per day (yep, that’s $90 000 a day just in fuel!). Reducing costs is a massive priority. As fuel usage is roughly proportional to the cube of ship speed, “slow steaming” is the new normal.
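To get a feel for the cube law, here’s a back-of-envelope calculation. The speeds, burn rate and price below are illustrative assumptions, not figures for any particular vessel:

```shell
# Rough cube-law estimate of slow-steaming savings.
# full/slow speeds (knots), burn (tons/day at full speed) and price ($/ton)
# are all illustrative assumptions.
awk -v full=24 -v slow=20 -v burn=150 -v price=600 'BEGIN {
  ratio = (slow / full) ^ 3                  # fuel use ~ speed cubed
  printf "Relative fuel use at %d kn: %.2f\n", slow, ratio
  printf "Daily fuel saving: $%.0f\n", (1 - ratio) * burn * price
}'
```

Slowing from 24 to 20 knots cuts fuel use to about 58% – a saving of roughly $38 000 a day at these prices – which is why ships steam slowly, and why sprinting to arrive early and then loitering is so wasteful.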

The problem is that being late on scheduled services also costs a lot of money. Missing a booked berth costs around $15 000, so ships tend to arrive early – which means burning more fuel than necessary.

For tidally limited ports where large vessels can only enter during certain time periods, missing the tide means a wait of up to 12 hours and the costs mount further.

2. Rate of unloading

The enormous container ships now being built (the latest carry 18 000 “twenty-foot equivalent unit” [TEU] containers) take several days to load and unload. The critical factor here is the number of crane moves per hour. For in-river berths where the water height varies, this means that the rate of unloading changes with the tide – and surge level. In a just-in-time environment for the delivery and removal of containers, understanding this timing is critical for avoiding huge traffic jams.

3. Optimal loading level for export

Part-empty ships are expensive to move around, so maximising the amount of cargo loaded onto a ship for departure during that limited tidal window is essential. Knowing exactly how deep the ship can sit in the water and still depart on time is therefore a priority for many exporters.

Great! But how did we find all this out? We certainly didn’t start here. We learned by going and talking to our potential customers.

The Customer Discovery model

There are four essential steps in the Customer Discovery stage of a startup (summarised from Four Steps):

1. Stating the Hypotheses

  • Product hypotheses
  • Customer hypotheses
  • Channel and pricing hypotheses
  • Demand creation hypotheses
  • Market type hypotheses
  • Competitive hypotheses

2. Test and Qualify Hypotheses

  • First customer contacts
  • Problem presentation
  • In-depth customer understanding
  • Market knowledge

3. Test and Qualify Product Concept

  • First reality check
  • Product presentation
  • More customer visits
  • Second reality check
  • First advisory board members

4. Verify

  • Problem verification
  • Product verification
  • Business model verification
  • Iterate or exit

This is WAY too much to cover in one post so I’m going to concentrate on the second stage here, which is essentially where we’re at.

How we’re doing Customer Discovery at Sea Level Research

Suffice it to say that our (or rather my) original hypotheses were (almost) totally wrong.

  • We thought that we would be selling innovative, machine learned quality control software for processing real-time sea level data.
  • We thought our customers would be national organisations who run large scale sea level monitoring infrastructure
  • Our channel was supposed to be primarily through academic links and the sea level community
  • Pricing was a real unknown, but we imagined it would be in the region of $100 000 per customer per year
  • We had no idea how to generate demand – we intended to tender for contracts and exhibit at conferences
  • We had no real idea of the competitive landscape or the market

This last point was the real killer – we couldn’t really find any customers and the ones we could talk to already had home-brewed solutions – and no money to invest in anything new.

What changed?

What changed all this was a chance conversation with a tide gauge supplier (OTT Hydrometry) who had a client (the Port of Liverpool) who had some “noisy” sea level data. Amazingly we’d never even considered ports as a potential customer!

In tagging along with the supplier to a meeting with the Assistant Harbour Master, we learned that this real-time data was being used to bring ships in safely over the shallowest part of the Mersey River – with a clearance of just 60cm! The problem was that, under certain wind conditions, the sea level reported by the tide gauge varied by 1m or more from minute to minute (due to the positioning of the gauge). This made the data next to useless for the Port Pilots to navigate with.


That one conversation completely changed our view of potential customers. We took two significant steps:

  • We joined the industry group where our potential customers met, and started going to meetings
  • We went to talk to the actual end users, the Port Pilots

What we learned from informally meeting the decision makers at industry events and from meeting the pilots was that we had to work simultaneously at different levels of our customers’ organisations:

  • the end-users
  • the budget holders
  • the decision makers

The end-users (in this case, the pilots) hold the key knowledge about how the job is done and what the real problems are. This is absolutely essential for understanding what the product has to do.

The budget holders hold the purse strings and are crucial for learning about the value of the proposition – and how to sell the idea to senior management.

The senior management are, for significant enterprise purchases, the decision makers. Getting buy-in from the people setting strategy is extremely important.

Creating value

Our meetings with the pilots and the Assistant Harbour Master have allowed us to develop a much broader view of the value we can deliver with the technology we’ve developed. Instead of relatively “simple” data-processing, we’ve learned what jobs our users are trying to get done. That in turn has allowed us to use the data, not just as information, but to provide insight and solutions. In that way we create and deliver value.


I’m not going to pretend that we’re experts in Customer Discovery for enterprise solutions. We’re far from it. I have absolutely no doubt that Steve Blank (and you for that matter!) would have done things differently. Even the process of doing Customer Discovery is a learning journey for all of us. But learning we are, and we’re making great progress.

Posted in Uncategorized

Running a Clojure REPL on OpenShift

Any Clojure programmer will tell you that you need a REPL (Read-Eval-Print-Loop) to develop and debug applications. Interactive development is one of the great benefits of the LISP world. The REPL gets used rather more than the interactive shells of other languages such as Ruby and Python.

OpenShift app development is generally done locally then pushed to the cloud. But what if the deployed app doesn’t run for some reason? What you want to do is fire up a REPL and debug the app. While that should be possible in OpenShift (since you can ssh directly into your application) it’s not quite as straightforward as you might hope.

Simply running lein repl in your project directory gives a stack trace:

Exception in thread "Thread-1" Permission denied
    at Method)
    at clojure.lang.RestFn.invoke(
    at leiningen.repl$fn__4138.invoke(repl.clj:90)
    at clojure.lang.Delay.deref(
    at clojure.core$deref.invoke(core.clj:2080)
    at leiningen.repl$repl$fn__4172.invoke(repl.clj:175)
    at clojure.lang.AFn.applyToHelper(
    at clojure.lang.AFn.applyTo(
    at clojure.core$apply.invoke(core.clj:601)
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1771)
    at clojure.lang.RestFn.invoke(
    at clojure.lang.AFn.applyToHelper(
    at clojure.lang.RestFn.applyTo(
    at clojure.core$apply.invoke(core.clj:605)
    at clojure.core$bound_fn_STAR_$fn__3984.doInvoke(core.clj:1793)
    at clojure.lang.RestFn.invoke(

There are two issues: The first is that, by default, leiningen tries to bind to localhost and find a random available port. The second is that OpenShift appears not to allow multiple Java threads in the same shell.

My workaround is to put a couple of bash scripts into the $OPENSHIFT_REPO_DIR/bin/ directory along with lein. The first (which I’ve named repl-server) sets up a headless server using the $OPENSHIFT_INTERNAL_IP for the host and a high open port number (35000 in this case):

# Script to start nrepl server with leiningen
export HTTP_CLIENT="wget --no-check-certificate -O"
export LEIN_REPL_HOST=$OPENSHIFT_INTERNAL_IP   # bind to the gear's internal IP, not localhost
export LEIN_REPL_PORT=35000
export LEIN_JVM_OPTS=-Duser.home=$HOME

$OPENSHIFT_REPO_DIR/bin/lein repl :headless >${OPENSHIFT_DIY_LOG_DIR}/repl.log 2>&1 &

Then in lein-connect I have the following:

# Script to connect to headless nrepl server with leiningen
export HTTP_CLIENT="wget --no-check-certificate -O"
export LEIN_REPL_PORT=35000
export LEIN_JVM_OPTS=-Duser.home=$HOME

# attach a REPL client to the running headless server
$OPENSHIFT_REPO_DIR/bin/lein repl :connect $OPENSHIFT_INTERNAL_IP:$LEIN_REPL_PORT
And now, after making the two scripts executable:

chmod +x repl-server lein-connect

we have the expected result:

REPL-y 0.1.9
Clojure 1.4.0
    Exit: Control+D or (exit) or (quit)
Commands: (user/help)
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
          (user/sourcery function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
Examples from [clojuredocs or cdoc]
          (user/clojuredocs name-here)
          (user/clojuredocs "ns-here" "name-here")

with any server errors being recorded in ${OPENSHIFT_DIY_LOG_DIR}/repl.log.

Happy debugging 🙂

Posted in Clojure, openshift

A first Clojure app on OpenShift

Steve Citron-Pousty came to introduce RedHat’s OpenShift to the Clojure dojo at Liverpool GeekUp recently. I decided to put a couple of demonstration apps together so people at the dojo could get started.

A really quick start is to create a Noir web app.  This is based on a great post by Siscia Tech, but updated because OpenShift has changed a little since that was written. The full code for my take on Siscia’s app is available on my GitHub account here.

First things first

If you don’t already have a free OpenShift account then you need to sign up. Once you’ve created an account you should then install the command line tools for your system. You will also need the git command, which is installed with some of the command line tools downloads from OpenShift; otherwise you may have to install it yourself.

Finally, we’re going to use leiningen to help build our Clojure applications. Download the script from the website as it describes and make sure it is executable. Once downloaded, type lein in a terminal window and leiningen will self-install, downloading all the necessary jar files.

Creating an app

Once you’ve installed the tools you’ll have a command named rhc. You get help by typing: rhc help in a terminal window.

Clojure isn’t a supported language in OpenShift so we’re going to use a DIY cartridge for our application. To create an app called ‘examplenoir’ we use the command:
rhc app create -a examplenoir -t diy
It will ask you for your password, then you should see something like:

Using diy-0.1 (Do-It-Yourself) instead of 'diy'
Creating application 'examplenoir'
Namespace: holgate
 Gear Size: default
 Cartridge: diy-0.1
 Scaling: no
Your application's domain name is being propagated worldwide (this might take a
Cloning into 'examplenoir'...
examplenoir @
 Application Info
 Gear Size = small
 UUID = 2773e93681c64b30807d869e4d1e2925
 Created = 12:11 PM
 Git URL =
Application examplenoir was created.
Disclaimer: This is an experimental cartridge that provides a way to try
unsupported languages, frameworks, and middleware on Openshift.

At this point you can clone the code into your local workspace (which we will refer to as ‘MY_WORKSPACE’) with git using the URL given in the output from your ‘create’ command. So for this case we have:
git clone ssh://

Cloning into 'examplenoir'...
remote: Counting objects: 25, done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 25 (delta 1), reused 25 (delta 1)
Receiving objects: 100% (25/25), 7.48 KiB, done.
Resolving deltas: 100% (1/1), done.

which will create a directory named ‘examplenoir’. Change directory into your app:
cd examplenoir
and list the contents:

README diy misc

Because we created a DIY app, our code is going to be created in the ‘diy’ directory. So change into that folder then we’ll use leiningen to create a new web app based on noir.

First though, you’ll see two files in the ‘diy’ directory that OpenShift provides by default. We don’t need them so you can delete them.
cd diy
ls

index.html testrubyserver.rb

rm -f testrubyserver.rb index.html

Now create our noir web app, which we will again call ‘examplenoir’
lein new noir examplenoir
This creates a working project template for us. If we change directory again we can see what has been created:
cd examplenoir/
ls

project.clj resources src test

For now, we don’t need to worry about what these files and directories are for, except that everything we need for our basic application is right here. We can run this with:
lein run
and you should see:

Starting server...
2013-01-31 17:38:23.403:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2013-01-31 17:38:23.404:INFO::jetty-6.1.25
Server started on port [8080].
You can view the site at http://localhost:8080
#<Server Server@75e13ce3>
2013-01-31 17:38:23.472:INFO::Started SocketConnector@

As it says, you can just open up your web browser to http://localhost:8080 to see the page.

Screen Shot of web browser showing page at localhost:8080

Modifying the app

Before we can deploy the application to OpenShift, we need to make a couple of minor changes. Firstly we need to tell OpenShift how to start and stop a Clojure application and set some environment variables. Then we need to add leiningen to our application so that it can run Clojure. Finally, we have to modify the code to pick up the environment variables that tell the built-in webserver (jetty) where to run.

The start and stop scripts are stored in ‘.openshift/action_hooks/’ which is under the top-level of the application (where we initially cloned it):
cd $MY_WORKSPACE/examplenoir/
ls -a

. .git .openshift misc
.. .gitignore README diy

Then edit ‘.openshift/action_hooks/start’ so that it reads as follows:
# The logic to start up your application should be put in this
# script. The application will work only if it binds to
# $OPENSHIFT_INTERNAL_IP and $OPENSHIFT_INTERNAL_PORT.
# save as .openshift/action_hooks/start
export HTTP_CLIENT="wget --no-check-certificate -O"
export LEIN_JVM_OPTS=-Duser.home=$HOME
export APPDIR="examplenoir"
# alias the OpenShift address so the Clojure code can pick it up
export HOST=$OPENSHIFT_INTERNAL_IP
export PORT=$OPENSHIFT_INTERNAL_PORT

cd $OPENSHIFT_REPO_DIR/diy/$APPDIR
$OPENSHIFT_REPO_DIR/bin/lein deps
$OPENSHIFT_REPO_DIR/bin/lein run >${OPENSHIFT_DIY_LOG_DIR}/lein.log 2>&1 &

Now edit ‘.openshift/action_hooks/stop’ so that it reads:
# The logic to stop your application should be put in this script.
# save as .openshift/action_hooks/stop
kill `ps -ef | grep 'clojure' | grep -v 'grep clojure' | awk '{ print $2 }'` >${OPENSHIFT_DIY_LOG_DIR}/stop.log 2>&1
exit 0
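As an aside, the grep pipeline above will match any process with ‘clojure’ in its command line, which can catch more than you intend. A slightly safer sketch – the PID-file location is just an assumption, and `sleep` stands in for the real server – records the process ID at startup and kills exactly that process:

```shell
# Sketch: PID-file based stop, instead of grepping the process table.
# The path and the 'sleep' stand-in are illustrative assumptions.
PIDFILE="${OPENSHIFT_DIY_LOG_DIR:-/tmp}/lein.pid"

# In the start hook, right after backgrounding the server process:
sleep 300 &                  # stand-in for the backgrounded 'lein run'
echo $! > "$PIDFILE"         # record its PID

# In the stop hook:
kill "$(cat "$PIDFILE")"     # stop exactly that process
rm -f "$PIDFILE"
```

This avoids accidentally killing an unrelated java process running in the same gear.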

Next we need to add leiningen. Again in the top-level directory of the application, make a sub-directory named ‘bin’ and add the leiningen script and make it executable:
cd $MY_WORKSPACE/examplenoir/
mkdir bin
cd bin
curl > lein
chmod +x lein

Finally we need to edit the Clojure code. Normally jetty runs on localhost ( on port 8080. On OpenShift, we have to use the $OPENSHIFT_INTERNAL_IP and $OPENSHIFT_INTERNAL_PORT instead. These are set as environment variables in the startup script and aliased to $HOST and $PORT. We just need to alter the server code:
cd $MY_WORKSPACE/examplenoir/diy/examplenoir/src/examplenoir
ls

models server.clj views

Now edit ‘server.clj’ with your favourite editor and add the ‘host’ line and ‘jetty-options’ so that it reads:
(ns examplenoir.server
  (:require [noir.server :as server]))

(server/load-views-ns 'examplenoir.views)

(defn -main [& m]
  (let [mode (keyword (or (first m) :dev))
        port (Integer. (get (System/getenv) "PORT" "8080"))
        host (get (System/getenv) "HOST" "")] ; default for local runs
    (server/start port {:mode mode
                        :ns 'examplenoir
                        :jetty-options {:host host}})))


Having made all these changes we now need to tell git to update the repository. Running:
git commit -a
should open up a text editor where you can add something like:

Initial commit

We are now ready to deploy the application to OpenShift:

git push

Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 548 bytes, done.
Total 4 (delta 1), reused 0 (delta 0)
remote: restart_on_add=false
remote: Done
remote: restart_on_add=false
remote: Running .openshift/action_hooks/pre_build
remote: Running .openshift/action_hooks/build
remote: Running .openshift/action_hooks/deploy
remote: hot_deploy_added=false
remote: Downloading Leiningen to /var/lib/openshift/2773e93681c64b30807d869e4d1e2925/app-root/data//home/.lein/self-installs/leiningen-2.0.0-standalone.jar now...
remote: --2013-01-31 13:09:34--
remote: Resolving
remote: Connecting to||:443... connected
remote: HTTP request sent, awaiting response... 200 OK
remote: Length: 13227743 (13M) [application/java-archive]
remote: Saving to: `/var/lib/openshift/2773e93681c64b30807d869e4d1e2925/app-root/data//home/.lein/self-installs/leiningen-2.0.0-standalone.jar.pending'
remote: 0K .......... .......... .......... .......... .......... 0% 7.19M 2s


remote: 12900K .......... ....... 100% 10.5M=3.0s
remote: 2013-01-31 13:09:37 (4.26 MB/s) - `/var/lib/openshift/2773e93681c64b30807d869e4d1e2925/app-root/data//home/.lein/self-installs/leiningen-2.0.0-standalone.jar.pending' saved [13227743/13227743]
remote: Retrieving org/clojure/clojure/1.4.0/clojure-1.4.0.pom from central


remote: Done
remote: Running .openshift/action_hooks/post_deploy
To ssh://
bd64a1d..2e1d6aa master -> master

All being well, after we’ve given jetty a minute or two to start running we should be able to open up our browser on our app’s page (accessible through the OpenShift web console) and see the same Noir page as before:

Screen Shot of the examplenoir app deployed on OpenShift

And there is our first cloud-deployed app!

Let me know if I’ve missed anything and I’ll do my best to help.

Posted in Clojure, openshift

The end of Votizen – or the beginning?

Many things have happened since my last post. However, the biggest news is the announcement that Votizen has been bought by Causes, a social fund-raising startup. Not only has Votizen been bought, its website has been shut down with immediate effect. The announcement also stated that all personal information from the site has been destroyed.

While the acquisition has been given a positive spin by TechCrunch, it has convinced me that there are significant problems with using social media to promote political agendas. The logic of the purchase for Causes is that Causes in itself is a little lightweight – ‘clicktivism’ rather than activism. Votizen users, on the other hand, are much more engaged, but their business is dominated by the election cycle and keeping people engaged outside those periods is tough. However, the fact that Votizen has gone and Causes gets the benefit of Votizen’s team makes me feel that this was more of a talent acquisition than a real merging of ideas. I could be wrong. We shall see.

With hindsight a few things now make a little more sense. I had a feeling that things might not have been going so well at Votizen at the end of last year. Right after the Presidential election I thought I noticed a low feeling in a couple of David Binetti’s tweets, which surprised me as I thought he would be on a high after the rush of election night. I didn’t dwell on it though. After all, many other things go on in people’s lives and it might have had nothing to do with Votizen at all.

Then in December I met Votizen engineer Jeremy Dunck for coffee in San Francisco, right next to the Votizen office. It was great that he offered to take time out to talk to me and we covered a lot of ground. One thing he said in particular stuck with me: it’s easier to get people to talk about politics with someone they don’t know than with their friends.

David Binetti came by while I was talking with Jeremy. Perhaps he was in a rush, but he didn’t seem the slightest bit interested in why I’d travelled to San Francisco to see them. I was just a little disappointed. Given that Votizen was in the process of being sold, though, he probably had more important things to think about.

Despite my enthusiasm for Votizen’s goal of promoting democracy, it doesn’t take many conversations to realise that many people are lukewarm at best on the idea of discussing their politics with friends. People who are already politically engaged are much keener. But for many people politics is boring at best and corrupt at worst. Politics isn’t cool and it isn’t sexy. And yet it is this disenchanted group of voters who are really disenfranchised and most need to be reached. Clearly it was hard for Votizen to reach them too.

These are things that I started to worry about when talking to people about SociaVote, which was going to be my attempt to build a Votizen-like startup in the UK. For now I’ve decided to put that aside and look at the problem the other way around: build something that is cool enough and useful enough that people want to use it often.

Taking the ‘inverting the problem’ thought a bit further: rather than being told what candidates and manifestos to vote for, maybe we can find a way to create and share ideas about what we want our world to be like? Then perhaps the important ideas can emerge from what people are really interested in.

The stuff of dreams? Maybe, but that’s the approach I’m working on now. There’s no point building something that nobody wants to use.


Posted in Democracy, Startups

Why I love Votizen

Startup ethos is all about finding big problems that technology can solve. There can be few bigger problems than reforming democracy itself. And yet that is exactly what Votizen is aiming to do. Its goal is nothing less than disrupting US politics and giving people a real voice. That’s a true ‘big data’ problem.

As I wrote in my last post, Votizen (@votizen) is using social media to bring people together and allow them to campaign for the issues that they care about most. They make money too which, of course, any sustainable business must.

If you doubt the power of social media to change politics, consider the so called “Arab Spring” uprisings, especially in Tunisia, Egypt and Libya. It’s not just in countries where dictators have held power either. Think of the self-organising and leaderless formation of “Occupy Wall St” or even the British summer riots of 2011. In each of these social media had a major part to play (though perhaps less in the case of the British riots than was suggested at the time).

The US political system is certainly in desperate need of reform. In Gary Younge’s recent article (The Guardian – “Americans deserve a better choice in this election than the one they’ve got”) the extent to which money dictates the outcome of elections is made clear. How many of us are even aware that there are three other candidates in the Presidential race? The only time the Greens ever got a mention was when (perhaps) Ralph Nader contributed to Al Gore’s loss in 2000.

As David Binetti (Votizen’s CEO – @dbinetti) also pointed out in his presentation at TEDx San Francisco, political information is completely asymmetric. In the UK we are fortunate to be spared the negative political advertising and the robo-dialling callers. Social media has the potential to make the communication into a conversation. Relationships can be built between candidates and electors. Candidates can build dedicated support – and the electorate can hold them to account.

British politics needs disrupting too. Beginning with the Magna Carta, the entire history of British politics has been about the slow and hard fought decentralisation of power. David Binetti demonstrated the way in which population growth has reduced the representation of the people in Congress. However, the story in the UK is complicated by a massive increase in the franchise. Between 1912 and 1918 the electorate tripled from 7 million to 21 million with the advent of universal suffrage. Since 1918 the electorate has doubled in size again, to 42 million in 2010. The point remains though that a similar number of MPs has to represent a much larger number of people, with the reduction of access that implies.

We are now at a true inflexion point in history. There have been unparalleled changes in technology. As Facebook celebrated its 1 billionth user this month, it is worth remembering that in less than a decade, social media has developed from nothing to be used by something like a sixth of the planet. For the first time we have the opportunity for genuine dialogue with our political representatives. Even in ancient Athens, democracy was restricted to the free men of the city while women and slaves were excluded. Votizen allows all Americans to have a voice. Isn’t it time British voters found their voice too?

Posted in Startups

Social voting?

How much would your voting choice be swayed by your friends’ views? Votizen, a startup in California, thinks that your friends can have a big impact on your vote. Their aim is to disrupt the money-driven, negative-ad-campaigning US electoral system and reinvent democracy. By no means a small goal.

Votizen’s idea is fairly straightforward: use your existing social networks (Facebook, Twitter, LinkedIn) to ask your friends/connections to commit to voting for a particular candidate. Getting the vote out on election day, especially in tight contests, is the difference between winning and losing. The smart part is that Votizen links to publicly available US electoral information to show which of your contacts are registered voters in influential voting districts and even which of them has voted in previous elections. That allows an individual to target their efforts towards the contacts who are most likely to make a difference.

Can something like this work in the UK? With voter turnouts in the past three General Elections well below the post-1945 average (Local Elections [pdf] and European Elections are even worse) voter alienation from politics is a big issue.

UK politics is very different from the US. We don’t have all the negative attack ads or the robo-dialling telephone calls. Even though we don’t have the same levels of funding that the US has, concerns certainly still exist (pdf). 

Yet, even though people are willing to disclose a great deal about themselves on-line, are we really willing to declare our political views? I get the feeling that despite the huge impact that political decisions have on our lives, we’re not really willing to talk about them.

For me, it’s time that we change this and fix democracy from the bottom up. Votizen for the UK anyone?

Posted in Democracy

New look

It’s time to freshen things up so I’ve changed the theme. Something a bit lighter. The picture is of St Kilda beach in Melbourne. In case you were wondering….

Posted in Uncategorized

Importing data to HBase

I’ve begun to experiment with Hadoop (with the aim of eventually running jobs on EC2) for a project with a startup.

Henry Garner (the startup’s CTO) provided exported HBase tables containing tweets as well as content and urls extracted from the tweets. So the first job was to import that data back into HBase.

I’m working on Ubuntu and initially installed the Cloudera Hadoop distribution. However, I found that the way the configuration files and jars were distributed made it harder for me to understand what was going on. Coupled with the fact that I’m running Natty Narwhal while the distribution is based on Lucid Lynx, I decided to uninstall it and use a fresh (and up-to-date) version (1.0.3) from the Hadoop website.

The instructions from Michael Noll on running a single-node Hadoop cluster on Ubuntu were clear and easy to follow. Hadoop was therefore installed in /usr/local/hadoop and I run jobs as hduser. I also installed the latest HBase (0.92.1 at the time of writing) in /usr/local/hbase.

After exporting the class paths and related variables:

export HBASE_HOME=/usr/local/hbase/
export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.92.1.jar:$HBASE_HOME:\

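Listing jars by hand is tedious, and the dependency jars under $HBASE_HOME/lib (zookeeper and friends) are generally needed on the classpath too. One sketch – assuming the stock HBase 0.92 tarball layout – builds the classpath with a glob instead:

```shell
# Build HADOOP_CLASSPATH from the HBase jar, its conf directory and
# everything under lib/. The directory layout assumed here is the
# stock HBase 0.92 tarball; adjust paths for your install.
HBASE_HOME=/usr/local/hbase
CP="$HBASE_HOME/hbase-0.92.1.jar:$HBASE_HOME/conf"
for jar in "$HBASE_HOME"/lib/*.jar; do
  CP="$CP:$jar"
done
export HADOOP_CLASSPATH="$CP"
```

This way the classpath tracks whatever jars ship with the HBase release instead of a hand-maintained list.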
Hadoop and HBase are then started (as hduser):

hduser:~$ /usr/local/hadoop/bin/
hduser:~$ /usr/local/hbase/bin/

I then created the schemas for the tables in the HBase shell (as not doing so leads to an exception):

hduser:~$ hbase shell
hbase(main):001:0> create 'twitter_accounts', 'raw', 'base', 'extra'
hbase(main):002:0> create 'content', 'raw', 'base', 'extra'
hbase(main):003:0> create 'tweets', 'raw', 'base', 'extra'
hbase(main):004:0> create 'short_urls', 'rel'

Here the table name is followed by the names of the column families to be created. For example, the ‘tweets’ table is created with three column families: ‘raw’, ‘base’ and ‘extra’, which matches the schema of the exported data.

The data were provided as gzipped tar files so they were uncompressed into a local directory (hbase-likely). To import the files into HBase the next step is to copy from the local file system into the Hadoop Distributed File System (HDFS).

hduser:~$ mkdir localtable
hduser:~$ hadoop fs -copyFromLocal hbase-likely/short_urls\
hduser:~$ hadoop fs -copyFromLocal hbase-likely/content\
hduser:~$ hadoop fs -copyFromLocal hbase-likely/tweets\
hduser:~$ hadoop fs -copyFromLocal hbase-likely/twitter_accounts\

Now I could finally import the data (which takes quite a while for big tables):

hduser:~$ hadoop jar $HBASE_HOME/hbase-0.92.1.jar import twitter_accounts\
hduser:~$ hadoop jar $HBASE_HOME/hbase-0.92.1.jar import tweets\
hduser:~$ hadoop jar $HBASE_HOME/hbase-0.92.1.jar import content\
hduser:~$ hadoop jar $HBASE_HOME/hbase-0.92.1.jar import short_urls\

Ta da! Now we can look at the data. Back in the HBase shell:

scan 'tweets', {COLUMNS => 'base', LIMIT=>1}

returns just the first row from the ‘base’ column family of the ‘tweets’ table. From this we can see what the data looks like and, importantly, what qualifiers are applied to the column data.

We can count the number of rows (which may take a long time):
hbase(main):008:0> count 'short_urls'

In another post I’ll look at how to do more complex queries using Clojure and Cascalog.

Posted in Clojure

Reflections on Euroclojure

I spent Thursday and Friday last week attending the first Clojure conference outside of the US – Euroclojure – in London. I’ve got to say that I’ve come back feeling really excited by what I saw there.

I’ve been to many conferences before and this one was quite small in comparison with those. However, with 200 enthusiastic coders from the UK, Europe, the US and Canada, it was a really great and friendly place to be. More importantly there were plenty of great ideas talked about as well.

Clojure is a really exciting young language. Its embrace of Java and the JVM has allowed many at the conference to bring elegant solutions to existing legacy (Java) software without wholesale rewrites. In every case this has been achieved with less code and faster results. That has to be great news!

Better still was hearing about the things that are really fresh and new such as experimenting with dynamic programming of music in Overtone (like modifying waveforms in real time to see how the sound is affected) or demonstrating JS Bach’s Canons in Overtone or dynamic algorithmic art displays using Clojure. Not that programming is just about art! There were also great talks on logic programming, solving concurrency issues in databases with Datomic, automating deployments in the Cloud with Pallet, and data processing in Hadoop with Cascalog to name a few.

So many talks in such a small amount of time was pretty mind blowing, especially given the heat, but meeting so many great people was fantastic. Roll on Euroclojure 2013!

Posted in Clojure