A week in Tanzania

Recently, I had an opportunity to work with the amazing team at School of Data. Meeting interesting people and doing cool things around data thrills me. The World Bank and Open Knowledge are working with the Tanzanian government on its open data initiative. We spent a week in Dar es Salaam, the largest and richest city in Tanzania.

Last week I worked with the Ministry of Water and the National Bureau of Statistics of Tanzania. The goal? Teaching data skills and kicking off an Open Data Initiative for the government of the United Republic of Tanzania. We made a three-person team: Michael from School of Data, David from Code for Africa and me. Cool stuff going on in Tanzania! The most interesting thing about our work there was not knowing what to expect: what data we would work with, who does what in the ministry, how all the departments are interlinked and what skill level to expect. Finding answers to these questions became part of the one-week training.

Deep dive

And so we dived into the unknown waters of Tanzania's water data. First we tried to figure out who was in the room: which department they were from, what their role was in the data workflow and what data they usually work with. We also had a discussion on the data workflow in each of the departments represented in the room. One exercise turned out to be especially telling: working in small groups, the participants were asked to list, in no particular order, the types of data they usually work with. Next, the lists were pasted on a wall, giving a mixed-up collection of stickies with different categories of data. The last step was to sort the stickies so that we had distinct groups/clusters of data types. Super cool way to get people talking, becoming ‘data conscious’ and learning about each other’s work!

Tools, tools, tools

The next two days were about exploring tools that could be used to work with data. There’s so much you could do with data, but there are a few things you usually do first with most datasets. Find out if it’s fit for use: say analysis, visualization, building models or merging with other datasets. If it does not fit the bill, then it probably needs some data cleaning love. This was the case with some of the Tanzania datasets: missing values, duplicates and no metadata to give context. There was most likely also a need for data quality consciousness: what passes as good quality data, what the not so good experiences with data quality have been, and some best practices for data quality management.
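Those first checks can be sketched in a few lines of Python. This is only a minimal illustration, not what we ran in the training: the sample rows and column names below are invented, and the sketch simply counts empty cells per column and exact duplicate rows.

```python
import csv
import io

# Hypothetical sample rows; the column names are invented for illustration
# and do not come from the actual Tanzania water datasets.
SAMPLE = """id,region,status,population_served
1,Dar es Salaam,functional,250
2,Dar es Salaam,,300
3,Dodoma,non functional,
1,Dar es Salaam,functional,250
"""


def quality_report(text):
    """Count rows, empty cells per column and exact duplicate rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = {}
    seen = set()
    duplicates = 0
    for row in rows:
        for column, value in row.items():
            # A cell counts as missing when it is absent or blank.
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
        key = tuple(row.values())
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


report = quality_report(SAMPLE)
print(report)
```

A report like this only tells you where to look; deciding what to do with a missing value or a duplicate still needs someone who knows the data.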

We worked with OpenRefine, which is an open source tool for cleaning, analyzing and plotting data. There are so many ghosts that OpenRefine will unearth in your data. For example, names that refer to the same thing but are spelled slightly differently, say Dar es Salaam and Dar es Salam. This is the same place, but you want quality data? Then think consistency. We used spreadsheets as a complement.
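To give a feel for the idea of grouping near-identical spellings, here is a rough sketch in Python. It is not OpenRefine's clustering algorithm; it uses the standard library's difflib similarity ratio instead, and the 0.9 threshold and the list of names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.9):
    """Return True when two names are near-identical, ignoring case."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


def cluster_names(names, threshold=0.9):
    """Group each name with the first cluster whose first member it resembles."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similar(name, cluster[0], threshold):
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([name])
    return clusters


# Hypothetical place names, including the misspelling mentioned above.
names = ["Dar es Salaam", "Dar es Salam", "Dodoma", "dar es salaam", "Arusha"]
clusters = cluster_names(names)
for cluster in clusters:
    print(cluster)
```

Each cluster is only a suggestion. A person still has to confirm that the variants really are the same place before merging them.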

We also took a look at the water geodata, creating maps and running queries to answer questions like: where are the functional and non-functional water points? What is the ratio of functional water points to the population served? Which of the water points in the Dar es Salaam region are handpumps? Other GIS exercises covered styling, coordinate reference systems, data conversions and why to use .txt files instead of .csv files when making a map. After giving the data all that cleaning/mapping love, we thought it a good idea to introduce data visualisation with Google Fusion Tables. David from Code for Africa did an amazing job with this.
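The kind of questions we asked of the water point data can be sketched without any GIS software. The records, field names and numbers below are made up for illustration, and "functional points per 1,000 people served" is just one assumed way of expressing the ratio.

```python
from collections import defaultdict

# Hypothetical water point records; field names and values are invented.
WATER_POINTS = [
    {"region": "Dar es Salaam", "type": "handpump", "functional": True, "population_served": 300},
    {"region": "Dar es Salaam", "type": "handpump", "functional": False, "population_served": 200},
    {"region": "Dar es Salaam", "type": "standpipe", "functional": True, "population_served": 500},
    {"region": "Dodoma", "type": "handpump", "functional": True, "population_served": 250},
]


def summarize(points):
    """Count functional points, all points and people served per region."""
    summary = defaultdict(lambda: {"functional": 0, "total": 0, "population": 0})
    for point in points:
        stats = summary[point["region"]]
        stats["total"] += 1
        stats["population"] += point["population_served"]
        if point["functional"]:
            stats["functional"] += 1
    return dict(summary)


def functional_per_1000(stats):
    """Functional water points per 1,000 people served."""
    return 1000 * stats["functional"] / stats["population"]


summary = summarize(WATER_POINTS)
dar_handpumps = [
    p for p in WATER_POINTS
    if p["region"] == "Dar es Salaam" and p["type"] == "handpump"
]

for region, stats in summary.items():
    print(region, stats, round(functional_per_1000(stats), 2))
print("Handpumps in Dar es Salaam:", len(dar_handpumps))
```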

And then we hosted a water data expedition on the last day.

Smelling the flowers?

Dar es Salaam is a beautiful city. This was the view at breakfast: a harbor! The abode of peace indeed! Oh, did you spot the national flag of this United Republic, nicely hoisted on the pylon in the center of this picture?

Image

Data beautification, some ‘prototypes’

Recently, Outbox Hub hosted us, as it has been doing lately, to talk about data visualization and also get our hands dirty creating data visualizations with the most unlikely tools: paper, straws, pens, bottle-tops, the floor and the wall! The design-inclined refer to it as designing data or making it beautiful. So we dived in.

We organized this mainly to discuss best practices and highlight worst practices, from a design standpoint, when creating data visualisations. The main goal is to have your data communicate clearly to your audience, whether that is children, teenagers or adults.

Take a look at some of the varying interpretations of the data:

Raw dataviz from the meet-up!

From the data design experts:

We had a great mix of design, communications and programming experts to share their insight, with 25 people in the room. Joe Ssekkono, a front end developer and visual artist, gave a presentation explaining ‘what data visualisation is’, ‘how to plan your data visualisation project’ and ‘evaluating and iterating your data visualisation project’. He went on to show us a step-by-step process for getting started with data visualization, some tips & tricks, websites and handy software tools. You can find the complete presentation here.

Emmy Van Kleef, a Creative Communications and Concepts expert, talked about the ‘latest trends and available tools’ and the ‘best practices and worst practices of data visualisation’.

Image adapted from Emmy Van Kleef’s presentation

Eve Ndagire, a graphics designer, talked about what ‘categories of data you could visualize’ (is it linear, discrete, continuous or categorized/numerical?). There are several ways of visualizing data, for example using bar charts, histograms, graphs, maps, pie charts, scatter plots and sometimes very abstract methods. We decided to go traditional and practical, using non-digital tools like paper, pen, straws, crayons, bottle-tops, the wall, table and floor.

The workspace was very busy as everybody worked away, trying to visualise crime and number of convicts in the major cities in Uganda. A break did not seem welcome to the participants. Take a look at the dataset we worked with.

Next-up

This turned out to be an interesting topic, and there was demand for a similar meet-up with more emphasis on using digital tools/code/software to make data beautiful. We hope to explore some of the trending data visualization libraries in the future. This meet-up allowed us to create ‘prototypes’ which we’ll build on, the next time!

A talk about data journalism in Uganda.

Since last year, when I started working on an Open Data Project at Fruits of Thought and hanging out a lot with the School of Data folk, I have felt strongly about building a data community in Uganda. Data community? Yes, this would be a group of people that meet regularly to work with open data: clean, analyze, explore different data tools, visualize the data, draw insight from it, get nerdy with programming around data, build applications or just talk. Sometimes talking brings out the most amazing ideas. Some of these ideas have been inspired by my friends at School of Data. Thank you so much for showing me that this is do-able!

So, about two weeks ago, I saw the first of what we’ve termed “local data days” happen. We want these to be monthly events that happen every last Friday of the month. We’ll be meeting to discuss a range of topics like data journalism, basic design principles, data visualization, knowledge creation and sharing, mapping, programming, basic statistics and data analysis, and to explore data tools like ScraperWiki, getting volunteers to scrape PDFs for all the data that’s ‘locked’ in there. The community (Ugandans) should be at the core of all this, so we’ve invited the community to tell us what they’d want to share on. I am keeping my hopes up that it sparks some interest in the possible (open) data lovers out there.

We kicked off the local data days with a focus on “Data Journalism”. We had a mix of journalists and programmers plus a few curious people in the house. We learned basic R for data analysis and visualization, but it turned out that the R learning curve was a bit steep. Perhaps we should have chosen a simpler tool: positive criticism right there! Our focus was on the education sector, and unfortunately the visualized data painted a grim picture. It was amazing what questions journalists and programmers can come up with when in the same room, for example “how do you make the map interactive?” and “why use a map instead of a graph, pie-chart or word cloud?”

At a glance- what stood out from the discussion. Via jasondavies.com

We also had some time to discuss the status of data journalism in Uganda, the challenges and the workarounds. Data journalism has not really taken off in Uganda. Some of the proof is in one of the local daily newspapers, which publishes tabular data that sometimes spans several pages (resource appropriation gone bad?). This data is numerical, focusing on administrative levels and themed to sectors like education, law & order, culture, health, employment, consumer goods and prices. It is totally possible to constructively compress this data and give the huge tables some context. This is just one example of a data journalism time ‘bomb’: where is the good switch? One journalist impressed me when she said that, in the news room, she had to learn statistics and maths in order to work on her stories. There was no one else to do it, and it all depends on how passionate and self-driven one is as a journalist. She was up for the programming meet-up in this series of local data days. When I asked about the challenges of writing data-backed stories, it all boiled down to a data skills deficiency.

My take-away from this meet-up was that our journalists know what needs to be done in the data journalism space in Uganda but need some support in the area of training. I hope that these regular meet-ups will work toward closing some of the skills gap for the local journalists. Looking forward to ‘hotter’ stories from Uganda’s news rooms. :-) I am glad that some of them are already working with the datasets from data.ug for their projects.

A week later, Europe was abuzz with the biggest data journalism event in the region. It started with a few interesting tweets on my timeline, and I ended up at #ijf14.

Information access, the Ugandan way

During my interaction with different people, I gathered that there are two ways of accessing data in Uganda- note that there could be/will be more ways.

1. The ATI act:

The Access To Information (ATI) act of 2005 declares that “Every citizen has a right of access to information and records in the possession of the State or any public body, except where the release of the information is likely to prejudice the security or sovereignty of the State or interfere with the right to the privacy of any other person”. Recently, on Open Data Day, I had a conversation with a Ugandan journalist who is already using the ATI act as a tool to get his government to account to him in terms of good governance and transparency. He is exercising his right, as a citizen, of access to information. So what happens in the government bureau corridors when the ATI act gets into action?

  • For starters, you need information request forms (a template for information access) from the government ministry that houses the information you need. These are hard copy forms and are 3 leaflets long.

  • Next step is to fill the paper form specifying your address, specifics of the information you need and when you need it.

How does one know whether government has this information or not? (This will be a discussion for another day, I guess.)

  • Drop the request form at the information unit of the respective government department then wait for 21 days within which your request is either granted or rejected.
Image Credit: HIM

If your information request is rejected, it may just end up in litigation, so your dance with government has just started! However, the state has every right to reject a request for information that might jeopardize its security. In that case, you have no option but to rest your case.

My journalist friend just sued government because his rights as a citizen of this country were denied: his request for information was not granted within 21 days, despite part of the act reading thus: “ATI gives any citizen the right of access to information and records in the possession of the state or any public body, except information that may prejudice the security of the state.” Probably, in the eyes of government, his request was not in favor of national security.

“ATI act came into place in 2005…”

On the flip side, my journalist friend has had several of his access to information requests granted- thumbs up to my government.

I was curious to find out what his motivation for information request was, so I ventured with this question: What will you do with the information now that you have it? His response, “the power of the media cannot be ignored; we provide news for the public and we’ll use this information to follow how governance issues are being handled in this country”. Fair point!

I am not a journalist, but as someone who identifies more with the technical data issues, I could request information for the sole purpose of doing data quality checks on the many datasets hosted on data.ug.

It’s also nice to note that access to information need not be limited to journalists or technical data people like I mention above. There are just so many ways this information could be used.

Assuming Uganda had a robust digital information infrastructure where, for example, information requests could be made online and the response received online (all this is on my wish-list), government would probably have a clearer picture of the country’s economic status and so would the citizens. Also on the wish-list would be extra infrastructure that generates aggregates of information from all the different government ministries, for decision support for national policy experts and decision makers. Besides, according to NITA, Uganda’s vision for ICT development is to see “A Uganda where national development, especially human development and good governance, are sustainably enhanced, promoted and accelerated by efficient application and use of ICT, including timely access to information.”

Well, the latter would definitely call for deeper systems analysis and requirements techniques. It’s not as easy as it seems.

It’s nice to meet enthusiastic people! My journalist friend is planning to run a project in the rural communities of Uganda to share how powerful a legal tool like the ATI act could turn out to be. I presume only 1 in 10 people in the rural communities of Uganda know how much power they could wield with this act!

2. Data with a license?

In Uganda, it’s rare that you find data carrying a license. What this means is that a data user has to bear with the ambiguity surrounding the particular dataset. Sadly, this seems to be the universal notion surrounding the terms of use for data in Uganda: it’s not clear how far you can go with a dataset once you get hold of it. Most times, folk will use data however they want, share it and remix it. It almost feels like the data carries a Creative Commons share-alike license, until someone starts asking questions about licensing, and that’s when we start to see stars on the wall. Bittersweet! I daresay 98% of Ugandan data does not carry a license.

Creative Commons, as one way of licensing

Uganda has a Creative Commons (CC) chapter right here in Kampala. Salons are held occasionally and, ideally, after you leave a CC salon you may strut your Creative Commons knowledge with more confidence. One really impressive thing going on here is the little booklets, printed both in English and the local language, Luganda (possibly to reach a wider audience), that explain how this license works. In Uganda, since there are many chances that you run into case 2 above, the nice thing to do is suggest the use of a license, such as a Creative Commons license, to make things easier and more streamlined. It’s not a guarantee that the suggestion will be received with open arms, so just brace yourself for that but suggest anyway. So far it’s not stopped me from getting my hands on data that does not carry a license. Ironic? Yeah, there is lots of licensed data that gets into the wrong hands, but I would still rally behind licensed work!

Open data day 2014: Kampala.

Open Data Day is on Feb. 22, 2014 and people around the world are organizing events to mark the day. For an overview of events happening elsewhere internationally, take a look at this map.

In a few days, Open Data enthusiasts will gather in over 100 cities across the world to write applications, liberate data, create visualizations and publish analyses using open public data, to show support for and encourage the adoption of Open Data policies by the world’s local, regional and national governments. Mountbatten Ltd, an IT & Websites company, Africa Center for Media Excellence and Fruits of Thought, a local NGO, are partnering to organize Open Data Day in Kampala. Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.

Getting together

From Kampala, we are very excited because this is the first time a diverse group of people in the Open Data space will come together to share how Open Data can be used to make ordinary lives better. It’s not as easy as it sounds, but there has to be a starting point. This can be achieved using different approaches: we are bringing policy and government people, computer application developers, legal people, potential Open Data users (that could be you and me, the ordinary citizen), potential Open Data providers and journalists together. We are optimistic that the thoughts generated from the discussions will be pivotal in shaping the Open Data landscape in Uganda, precisely because the unique categories of people who contribute to the Open Data cycle will be present. Some of the unique groups of people we’ll have are the data providers like the Uganda Bureau of Statistics (UBOS), who are the principal data collecting, processing, analyzing and disseminating agency; the legal people, who will talk about the Creative Commons license; the Africa Center for Media Excellence, who’ll field a journalist to talk about storytelling; and we’ll definitely have technology people to hack the available Open Datasets, which is the lifeblood of the day!

The day will focus on the Open Data cycle, as each role is equally important. If one fails the entire cycle stops. Starting with those who have data, these parties have to be willing to open their data under an open license. Then someone should publish the data in a way that others can use it. Those who have used it should then give constructive feedback on the data that they have used.

Open data should experience a cycle. Image Credit: Evelien Christianse.

I would also like to highlight the fact that it’s not about a one-day event but also the process that goes on behind the scenes and the need to understand why data is locked away in the first place. In Uganda’s case, it is the issue of how the Uganda Bureau of Statistics makes datasets that do not carry a license publicly available. To address this, on Open Data Day, we will push for the Bureau to propose a solid licensing process.

The issue of licensing is just one of a few live discussions that the open data community in Uganda will continue to engage in. Suffice it to say, some discussions with government and various players in the open data space are technical and take numerous iterations to arrive at refined, working solutions.

Moving on

When relationships are created, the ideal plan is for them to be organic, in a positive sense. We are happy to bring like-minded people together to foster future relationships. We hope the people of Uganda will have a better understanding of why it’s important for them to have access to data about their country, but also know where to find, how to access and how to use this data in more valuable and practical ways. We’ll be hosting all subsequently opened datasets on data.ug, the one-stop portal for Open Data in Uganda.

Join in

If you want to be a part of this dynamic group of people, come join us! We hope to see you at Open Data Day in Kampala on 22nd February 2014! If you cannot attend in person, follow the event remotely on Twitter with the hashtag #oddkampala. Be sure to check our website for links to the presentations and other resources. Questions about the event can be directed to info@opendatadaykampala.org.

This post also appeared on the Sunlight Foundation’s blog.

Data, Maps and Tech-tools – ICCM 2013

#ICCM trended last week on Twitter! The International Conference of Crisis Mappers 2013 took place in Nairobi over a span of five days. “Technology and Innovation for crisis mapping, in and outside Africa” was the theme. It was a rich convention of industry practitioners in the crisis mapping space.

I could write a ‘book’, I think, though for the purposes of enjoying my reminiscing I will stick to what I like talking about: DATA, MAPS & TECH TOOLS! Data means different things to different people, as a panelist in the “What’s so Big about Big Data?” session opined.

Data discussions

Big data

For the ICCM context, data may narrow down to the kind that allows for coordination during a crisis, early warning preparations and post-crisis management. According to ICCM discussions, data from social media has already gone a long way in shaping how humanitarian activations are carried out. The Typhoon Haiyan crisis got several humanitarian groups working to provide relief to affected people, for example DHN and MicroMappers. One way of looking at it is appreciating how diverse one’s intentions might be when they post a tweet. So it turns out that even though it’s all technology aiding this kind of communication, one cannot underestimate the power of integrating human psychology in synthesizing all this information and designing algorithms that incorporate the very subtle bits of what might not be expressed in a tweet. Hemant Purohit, a PhD candidate with the Kno.e.sis team, delved deeper into this when he gave an Ignite Talk.

Some of the challenges highlighted in synthesizing social media data were:

  • authenticity/ validity: where do we draw the line?
  • temporal relevance: a tweet that is even one day old may no longer be relevant.
  • scaling machine learning tools that synthesize the data is not easy: the data is not static because humans generate new data quickly.
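The temporal relevance point can be illustrated with a tiny filter. This is a sketch under stated assumptions, not something presented at ICCM: the one-day cut-off comes from the remark above, while the messages, timestamps and function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed cut-off, taken from the one-day remark in the list above.
MAX_AGE = timedelta(days=1)


def is_fresh(posted_at, now, max_age=MAX_AGE):
    """Return True when a message is no older than max_age (and not from the future)."""
    return timedelta(0) <= now - posted_at <= max_age


# Hypothetical messages; these are not real tweets.
now = datetime(2013, 11, 20, 12, 0, tzinfo=timezone.utc)
messages = [
    {"text": "Bridge is out near the market", "posted_at": now - timedelta(hours=3)},
    {"text": "Need water at the school", "posted_at": now - timedelta(days=2)},
]

fresh = [m for m in messages if is_fresh(m["posted_at"], now)]
for message in fresh:
    print(message["text"])
```

A fixed cut-off is the crudest possible rule. In practice the right window probably depends on the kind of message, which is part of why this was listed as a challenge.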

Aditya Vashistha of IVR Junction presents on “Generating, Analyzing & Using Information from Hard-to-Access Areas”

So you might want to take all the flurry on social media with a pinch of salt or better still make supported assumptions for the machine learning algorithms that would allow for graceful use of social media information, in crisis management, for that matter.

This big data talk tickled my mind… I ended up thinking that the millions of volunteers making maps to ease coordination after the devastation caused by Typhoon Yolanda were actually doing a big data gig: 2 million plus digital hands editing many tiles of satellite imagery, plus other crowdsourced geospatial information, to make life better for the people of the Philippines!

Open data commons– another dimension of data.

The School of Data and UNOCHA had a self-organised session on how crisis mappers can leverage open data to quicken humanitarian response. The discussion delved into data issues like curation, informatics/open data working groups for crisis mappers, cleaning and quality, granularization of big datasets for easier use by smaller groups of people and the big need for training and mentoring in a space where data needs seem to change by the hour. OKFN’s Rufus Pollock shares his thoughts in an article titled Forget big data, small data is the real revolution.

Then we veered off a little, discussing “How much personal information should be out there for the world to see?” in the open data context. This turned out to be an uncomfortable topic, so we quickly got back to discussing what role open data plays in crisis management and how the open data commons tool can be harnessed to make this possible. More of what came out of this discussion here.

Teaching and learning

Data Track

The tech and training session ran for a day and had four tracks, Mobile/Security, Maps, Data & Knowledge. These sessions were meant to allow people to learn and teach in small groups. School of Data ran the Data & Knowledge session. As a volunteer mentor at School of Data, I was invited to give an introductory training on geocoding and mapping using Google Fusion Tables. I thought it very basic in the beginning, but it turned out helpful for the folk who attended the Data & Knowledge session, with mini sessions on data cleaning (Michael Bauer from OKFN), spreadsheet basics (Steve Kemei, Development Initiatives) and basic dataviz (Agnes Rube, Internews) to go with it. Google Fusion Tables is an experimental product, which means it could change at any time. Try out options like CartoDB, QGIS and other services for your geocoding and mapping needs.

Photo Credit: iHub

Maps

What is a crisis mappers convention without a party? I mean a mapping one. Heather Leson and Severin Menard of the Humanitarian OpenStreetMap Team put together a map-up where once again I joined as a co-trainer, mapping an area that needs urgent humanitarian and volunteer contribution: the Philippines. If you can contribute your OSM mapping skills, please do. Heather gives a detailed account of the map-up here.

One last thing about using the HOT tasking manager: JOSM’s performance can improve considerably if you install the list of plugins captured in the screen dump below. This works for tasking manager jobs. Thanks to HOT’s Severin Menard for this priceless tip!

Last but not least, I gave a talk on how the team at Fruits of Thought is “Building a robust mapping community in Uganda” and making it even easier by developing very simple mapping tools to complement the community effort.

We’d appreciate feedback on these tools so that we can improve them: use them to map and let us know how it all goes.

Putting ‘a little Open data’ onto OSM

OpenStreetMap (OSM) has been used to map a host of features in Uganda. Web mapping tools such as Google Map Maker have been used too; the Google people have been here for quite a while now! However, the folk at Map Uganda are giving Google Maps a run for their money. The difference with OSM is that you get open data at your disposal! During the last couple of years, the team at Map Uganda has worked towards translating physical features on the ground in our communities to digital maps. We have added points of interest, hospitals, schools, roads (highways, tracks, ways in OSM talk), forested areas and hydrological features, to mention a few. The plan is to Map Uganda, which is a work in progress, though I cannot help wondering how long it’ll take to map a whopping 236,040 km² expanse of land. This is where we draw the inspiration to get Uganda beautifully mapped one day! So we’ll continue mapping in groups (which is much better) but also individually, for those who have a knack for mapping.

Under 4 Hours

Last month had us map using GPS Units, field papers and satellite imagery. We used JOSM as digitizing software. This was an exceptional mapping day and tagged successful in my OSM mapping world. Key success points that stood out for me were:

-collecting ≈ 200 data points

-configuring brand new GPS Units (Garmin Etrex 30), that worked without a flaw

-installing JOSM software for Windows machines

*you need to run the java runtime environment file for windows first and then the JOSM set-up file above

-working with a sunny and focused team

-fitting a quality* mapping day into under four hours

A graphed-up mapping day

During the planning of the event above, I worked remotely with a colleague, who was putting together a similar event at Gulu University, almost simultaneously. I thought it a good idea to have a ‘graphed-up mapping day’, that doubles as a four hour mapping program. It might just become easier to plan one with a reference like the one below.

prog

Carbon data wrangling

*what is special about the data?

*what stories can be drawn from the data?

*correlation between the attributes?

On the table

One month down, and 5 people with unique backgrounds have worked a raw spreadsheet of carbon emission data. I am truly proud to have been part of a talented team! We cleaned, analyzed and visualized spreadsheet data. I am passionate about the environment, and the opportunity to work closely with carbon emission data could not have come at a better time. Working with my team through this exercise allowed me to learn one or two things about data wrangling. It was a nice stock-taking tool (it’s good to check yourself once in a while), and I was a little curious about how working together remotely would turn out. Thanks to School of Data.

Tools

E-mail, Google’s products (G+ hangout for a real feel of who your team-mates are, Docs for collaboration) and Etherpad aided communication. Meeting times were to be scheduled using a Doodle poll, but that was not a popular choice for the team. The second option, email, turned out to be a tall order because we had to manually sync time zones between Uganda, South Africa, Russia, Romania and the United Arab Emirates. I actually realized that you have to work almost twice as much with remote team work, especially if it’s a sizeable team (>3 is big). Smaller is better: it allows you to concentrate and keep tabs on each team member’s contribution. It also makes for easier peer reviews.

Team

Anna and Sipos made great facilitators (journalist background at work?), while Jakes carried the day when it came to research on coal’s contribution to carbon emissions, especially in South Africa (he is a professor of research methods, so we understood why he had a strong affinity for more and more data). Irina made great strides with documenting all that we did (even in pictures). I liked moving the data around, splitting it and seeing how visually appealing this carbon data would turn out. By this time we had our data cleaned and ready to merge with the plethora of datasets from the team.

Data served

We came out very clean after the mission with carbon emission data that was easier to reuse (for example merge with other datasets) and comprehend. A detailed write up on what the dataset contains was mentioned earlier in the Guardian.

We actually went ahead to combine the original carbon emission dataset with others that were differently themed: for example, GDP and forest cover data.
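Combining differently themed datasets usually comes down to joining them on a shared key such as the country name. The sketch below is only an illustration of that idea, not the workflow our team used: the function name is hypothetical and the numbers are made-up placeholders, not real emission, GDP or forest cover figures.

```python
def merge_by_country(base, *others):
    """Left-join other datasets onto base, matching on the country name."""
    merged = {country: dict(values) for country, values in base.items()}
    for other in others:
        for country, values in other.items():
            # Only countries present in the base dataset are kept.
            if country in merged:
                merged[country].update(values)
    return merged


# Placeholder values for illustration only; these are not real statistics.
emissions = {"Uganda": {"co2": 1.0}, "South Africa": {"co2": 2.0}, "Romania": {"co2": 3.0}}
gdp = {"Uganda": {"gdp": 10.0}, "South Africa": {"gdp": 20.0}}
forest = {"Uganda": {"forest_cover": 0.5}, "Romania": {"forest_cover": 0.25}}

merged = merge_by_country(emissions, gdp, forest)
for country, values in merged.items():
    print(country, values)
```

Country names need to be spelled the same way in every dataset before a join like this works, which is one reason the cleaning step had to come first.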

This was not only about data. I got another version of teamwork/playing (purely remote this time): communicating technical terms in simple ways, keeping communication relevant, through email for example (I did not have the luxury of going over to someone’s desk and pointing out a few things, so I had to be spot-on), facilitating a meeting online (G+ hangout) and having the entire team on the same page. Lastly, teamwork might not only be about getting things done but also about the process of getting that work done. It’s therefore not a bad idea to drop a line or two to let your team-mates know you’re checked in: this will keep the team spirit up.

During this exercise, we got to know at a glance which folks make our air dirty. We should be doing something about it in due course! And my experience with Mission control gets to grace Dataspa as she debuts working with data.

Dataspa will share how she harnesses technology for data – this is multi faceted I must say.

Web links that gave perspective

Constituted the tools to achieve this collaborative effort.

Fruity overtones

http://datadrivenjournalism.ru/ #Irina and Anna

http://info.p2pu.org/2013/06/18/data-mooc-results-findings-and-recommendations/?utm_source=rss&utm_medium=rss&utm_campaign=data-mooc-results-findings-and-recommendations  #Our team (10) rocked!

http://info.p2pu.org/2013/06/17/data-explorer-mission-from-the-inside-an-agents-story/ #Anna’s story

http://thematicmapping.org/downloads/world_borders.php #Source of world boundaries shapefile