Recently, I had the opportunity to work with the amazing team at School of Data. Meeting interesting people and doing cool things with data thrills me! The World Bank and Open Knowledge are working with the Tanzanian government on its open data initiative, and we spent a week in Dar es Salaam, the largest and richest city in Tanzania.
Last week I worked with the Ministry of Water and the National Bureau of Statistics of Tanzania. The goal? Teaching data skills and kicking off an Open Data Initiative for the government of the United Republic of Tanzania. We were a three-person team: Michael from School of Data, David from Code for Africa, and me. Cool stuff is going on in Tanzania! The most interesting part of our work there was not knowing what to expect: what data we would work with, who does what in the ministry, how the departments are interlinked and what skill level to expect. Finding answers to these questions became part of the one-week training.
And so we took a deep dive into the unknown waters of Tanzania's water data. First we tried to figure out who was in the room: which department each person was from, what their role was in the data workflow and what data they usually work with. We also discussed the data workflow in each of the departments represented in the room. One exercise turned out to be very telling. Working in small groups, the participants listed, in no particular order, the types of data they usually work with. Next, the lists were pasted on a wall, giving us a jumble of stickies covering different categories of data. The last step was to sort the stickies into unique groups, or clusters, of data types. Super cool way to get people talking, become ‘data conscious’ and learn about each other’s work!
Tools, tools, tools
The next two days were about exploring tools for working with data. There’s so much you can do with data, but there are a few things you usually do first with most datasets. Find out whether it’s fit for use, say for analysis, visualization, building models or merging with other datasets. If it does not fit the bill, then it probably needs some data cleaning love. This was the case with some of the Tanzania datasets: missing values, duplicates and no metadata to give context. There was also a need for data quality consciousness: what passes as good quality data, what bad experiences people have had with data quality, and some best practices for data quality management.
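Before any cleaning, it helps to measure the two problems mentioned above: missing values and duplicates. Here is a minimal sketch in plain Python, assuming a CSV file; the sample table, its column names and the `profile` helper are hypothetical and made up for illustration, not taken from the Tanzania datasets.

```python
import csv
import io
from collections import Counter

# Hypothetical sample table; columns and values are invented for illustration.
SAMPLE_CSV = """id,region,status
1,Dar es Salaam,functional
2,Dar es Salam,
3,Arusha,non functional
3,Arusha,non functional
"""

def profile(rows, columns):
    """Count empty cells per column and exact duplicate rows."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            # Treat absent (None) and blank strings as missing.
            if not (row.get(col) or "").strip():
                missing[col] += 1
    # Every extra copy of an identical row counts as one duplicate.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return missing, duplicates

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
missing, duplicates = profile(rows, reader.fieldnames)
print(missing)     # {'id': 0, 'region': 0, 'status': 1}
print(duplicates)  # 1
```

Note that this only catches exact duplicates. "Dar es Salaam" and "Dar es Salam" still count as different values, which is where a tool like OpenRefine comes in.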
We worked with OpenRefine, an open source tool for cleaning, analyzing and plotting data. OpenRefine will unearth so many ghosts in your data. For example, names may refer to the same thing but be spelled slightly differently, say Dar es Salaam and Dar es Salam. These are the same place, so if you want quality data, think consistency. We used spreadsheets as a complement.
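To show the idea behind catching near-duplicate spellings, here is a rough sketch in the spirit of OpenRefine's nearest-neighbor clustering with Levenshtein (edit) distance. It is not OpenRefine's actual implementation. The greedy grouping, the `radius` of 1 edit and the helper names are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cluster(values, radius=1):
    """Greedily group distinct values within `radius` edits of a group's first member.

    Comparison is case-insensitive. Only groups with more than one spelling are
    returned, since those are the candidates a person would review and merge.
    """
    groups = []
    for value in sorted(set(values)):
        for group in groups:
            if levenshtein(value.lower(), group[0].lower()) <= radius:
                group.append(value)
                break
        else:
            groups.append([value])
    return [g for g in groups if len(g) > 1]

names = ["Dar es Salaam", "Dar es Salam", "Arusha", "Dodoma", "dar es salaam"]
print(levenshtein("Dar es Salaam", "Dar es Salam"))  # 1
print(cluster(names))  # [['Dar es Salaam', 'Dar es Salam', 'dar es salaam']]
```

As in OpenRefine, the clusters are only suggestions. A person still decides which spelling becomes the consistent one.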
We also looked at the water geodata, creating maps and running queries to answer questions like: Where are the functional and non-functional water points? What is the ratio of functional water points to the population served? Which of the water points in the Dar es Salaam region are handpumps? We also covered other GIS topics such as styling, coordinate reference systems, data conversions, and why you might use .txt files instead of .csv files when making a map. After giving the data all that cleaning and mapping love, we thought it a good idea to introduce data visualisation with Google Fusion Tables. David from Code for Africa did an amazing job with this.
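The three questions above boil down to filtering and counting on a water point table. In a GIS they would typically be attribute queries on a layer. The sketch below does the same filtering in plain Python and ignores geometry entirely. The records, field names (`region`, `status`, `extraction`, `population`) and helper functions are all hypothetical, and "functional points per 1,000 people served" is just one assumed way to express the ratio.

```python
from collections import defaultdict

# Hypothetical records: field names and values are invented for illustration,
# not taken from the Tanzania water point data.
WATER_POINTS = [
    {"id": "WP1", "region": "Dar es Salaam", "status": "functional", "extraction": "handpump", "population": 250},
    {"id": "WP2", "region": "Dar es Salaam", "status": "non functional", "extraction": "motorpump", "population": 400},
    {"id": "WP3", "region": "Dar es Salaam", "status": "functional", "extraction": "gravity", "population": 350},
    {"id": "WP4", "region": "Arusha", "status": "functional", "extraction": "handpump", "population": 500},
]

def status_counts(points):
    """Count water points by (region, status)."""
    counts = defaultdict(int)
    for p in points:
        counts[(p["region"], p["status"])] += 1
    return dict(counts)

def functional_per_1000(points, region):
    """Functional water points per 1,000 people served by all points in a region."""
    in_region = [p for p in points if p["region"] == region]
    functional = sum(1 for p in in_region if p["status"] == "functional")
    population = sum(p["population"] for p in in_region)
    return 1000 * functional / population if population else 0.0

def handpumps(points, region):
    """IDs of water points in a region whose extraction type is a handpump."""
    return [p["id"] for p in points if p["region"] == region and p["extraction"] == "handpump"]

print(status_counts(WATER_POINTS))
print(functional_per_1000(WATER_POINTS, "Dar es Salaam"))  # 2.0
print(handpumps(WATER_POINTS, "Dar es Salaam"))            # ['WP1']
```

On a map, the same filters drive the styling, for example colouring points by `status`.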
And then we hosted a water data expedition on the last day.
Smelling the flowers?
Dar es Salaam is a beautiful city. This was the view at breakfast: a harbor! The abode of peace indeed! Oh, did you spot the national flag of the United Republic, nicely hoisted on the pylon in the center of this picture?