In the past few years a number of governments have launched open data portals. These sites, such as www.data.gov or data.vancouver.ca, share data that government agencies collect, in machine-readable formats (i.e. formats you can play with on your computer).
Increasingly, people approach me and ask: what makes for a good open
data portal? Great question. And now that we have a number of sites out
there we are starting to learn what makes a site more or less
effective. A good starting point for any of this is the 8 Open Government principles, and for those newer to this discussion, there are the 3 laws of open data (also available in German, Japanese, Chinese, Spanish, Dutch and Russian).
But beyond that, I think there are some pretty tactical things data
portal owners should be thinking about. So here are some issues I've
noticed and thought might be helpful.
1. It's all about automating the back end
Probably the single greatest mistake I've seen governments make is,
in the rush to get some PR or meet an artificial deadline, they create
a data portal in which the data must be updated manually. This means
that a public servant must run around copying the data out of one
system, converting it (and possibly scrubbing it of personal and security
information) and then posting it to the data portal.
There are a few interrelated problems with this approach. Yes, it
allows you to get a site up quickly but... it isn't sustainable. Most
government IT departments don't have a spare body that can do this work
part time, even less so if the data site were to grow to include 100s
or 1000s of data sets.
Consequently, this approach is likely to generate ill-will towards
the government, especially from the very community of people who could
and should be your largest supporters: local tech advocates and developers.
Consider New York: here is a site where, from what I can tell, the data is not regularly updated and grumblings
are getting louder. I've heard similar grumblings out of some
developers and citizens in Canadian cities where open data portals get
trumpeted despite infrequent updates and having few data sets available.
If you are going to launch an open data portal, make sure you've figured out how to automate the data updates first.
It is harder to do, but essential. In the early days open data sites
often live and die based on the engagement of a relatively small
community of early adopters - the people who will initially make the
data come alive and build broader awareness. Frustrate the community
and the initiative will have a harder time gaining traction.
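To make the automation point a bit more concrete, here is a minimal sketch in Python of what a scheduled export job might look like: pull rows from the source system, strip out personal fields, and write a machine-readable file for the portal. Everything specific in it is an assumption for illustration only: the column names, the SENSITIVE_FIELDS list, the scrub and export_dataset helpers, and the sample rows are all made up, and a real job would read from your actual internal system and publish to your actual portal.

```python
import csv
import io

# Hypothetical column names; a real source system will have its own schema.
SENSITIVE_FIELDS = {"owner_name", "phone"}


def scrub(row):
    """Return a copy of the row without personal or security-sensitive fields."""
    return {key: value for key, value in row.items() if key not in SENSITIVE_FIELDS}


def export_dataset(rows, out_file):
    """Scrub each source row and write the result as CSV; return rows written."""
    cleaned = [scrub(row) for row in rows]
    if not cleaned:
        return 0
    writer = csv.DictWriter(out_file, fieldnames=list(cleaned[0].keys()))
    writer.writeheader()
    writer.writerows(cleaned)
    return len(cleaned)


# Stand-in for rows pulled from an internal system (made-up sample data).
source_rows = [
    {"permit_id": "1001", "owner_name": "Jane Doe", "phone": "555-0100", "ward": "3"},
    {"permit_id": "1002", "owner_name": "John Roe", "phone": "555-0101", "ward": "7"},
]

# An in-memory buffer stands in for the file the portal would serve.
buffer = io.StringIO()
count = export_dataset(source_rows, buffer)
print(buffer.getvalue())
```

Run on a schedule (nightly, say), something along these lines replaces the public servant copying data by hand. The scrubbing rule lives in one place, so it is applied the same way on every update.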
2. Keep the barriers low
Both the 8 principles and 3 laws talk a lot about licensing.
Obviously there are those who would like the licenses on many existing
portals to be more open, but in most cases the licenses are pretty good.
What you shouldn't do is require users to register. If the data is
open, you don't care who is using it and indeed, as a government, you
don't want the hassle of tracking them. Also, don't call your data open
if users must belong to an educational institution or a non-profit.
That is by definition not data that is open. (I'm looking at you, StatsCan:
it's not liberated data if only a handful of people can look at it.
Sadly, you're not the only site to do this.) Worst is one website where,
in order to access the online catalogue, you have to fax in a form outlining who you are.
This is the antithesis of how an open data portal should work.
3. Think like (or get help from) good librarians and designers
The real problem is when sites demand too much of users to even gain
access to the data. Readers of this blog know about my feelings
regarding Statistics Canada's website: the data always seems to be one more click away. Of course, that's if you are even able to locate the data you are interested in, which usually seems impossible.
And yes, I know that Statistics Canada's phone operators are
very helpful and can help you locate datasets quickly - but I submit to
you that this is a symptom of a problem. If every time I went to
Amazon.com I had to call a help desk to find the book I was interested
in, I don't think we'd be talking about how great Amazon's help desk
was. We'd be talking about how crappy their website is.
The point here is that an open data site is likely to grow. Indeed, data.gov and data.gov.uk now have thousands
of data sets on them. In order to be navigable they need to have
excellent design. More importantly, you need to have a new breed of
librarian - one capable of thinking in the online space - to help
create a system where data sets can be easily and quickly located.
This is rarely a problem early on (Vancouver has 140 data sets up and
Washington DC around 250; these can still be trawled through without a
sophisticated system). But you may want to sit down with a designer and
a librarian during these early stages to think about how the site might
evolve so that you don't create problems in the future.
Finally, I think good open data portals want, and even encourage,
feedback. I like that data.vancouver.ca has a survey on the site which
asks people what data sets they would be interested in seeing made open.
But more importantly, this is an area where governments can benefit.
No data set is perfect. Most have a typo here or there. Once people
start using your data they are going to find mistakes.
The best approach is not to pretend like the information is perfect
(it isn't, and the public will have less confidence in you if you
pretend this is true). Instead, ask to be notified about errors.
Remember, you are using this data internally, so any errors are
negatively impacting your own planning and analysis. By harnessing the
eyes of the public you will be able to identify and fix problems more quickly.
And, while I'm sure we all agree this is probably not the case,
maybe the fact that the data is public will create a small added
incentive to fix it quickly. Maybe.