A talk about data journalism in Uganda.

Since last year, when I started working on an Open Data Project at Fruits of Thought and hanging out a lot with the School of Data folk, I have felt strongly about building a data community in Uganda. Data community? Yes, this would be a group of people that meet regularly to work with open data: clean it, analyze it, explore different data tools, visualize the data, draw insight from it, get nerdy with programming around data, build applications, or just talk. Sometimes talking brings out the most amazing ideas. Some of these ideas have been inspired by my friends at School of Data. Thank you so much for showing me that this is doable!

So, about two weeks ago, I saw the first of what we’ve termed “local data days” happen. We want these to be monthly events held on the last Friday of every month. We’ll be meeting to discuss a range of topics like data journalism, basic design principles, data visualization, knowledge creation and sharing, mapping, programming, basic statistics and data analysis, and to explore data tools like ScraperWiki so that volunteers can scrape all the data that’s ‘locked’ in PDFs. The community (Ugandans) should be at the core of all this, so we’ve invited the community to tell us what they’d want to share. I am keeping my hopes up that it sparks some interest among the possible (open) data lovers out there.

We kicked off the local data days with a focus on “Data Journalism”. We had a mix of journalists and programmers plus a few curious people in the house. We learned basic R for data analysis and visualization, but it turned out that the R learning curve was a bit steep. Perhaps we should have chosen a simpler tool; positive criticism right there! We had our focus on the education sector, and unfortunately the visualized data painted a grim picture. It was amazing what questions journalists and programmers can come up with when in the same room, for example “how do you make the map interactive?” and “why use a map instead of a graph, pie chart or word cloud?”

At a glance: what stood out from the discussion. Via jasondavies.com

We also had some time to discuss the status of data journalism in Uganda, the challenges and the workarounds. Data journalism has not really taken off in Uganda. Some of the proof is in one of the local daily newspapers, which publishes tabular data that sometimes spans several pages (resource appropriation gone bad?). This data is numerical, organised by administrative level and themed by sector: education, law & order, culture, health, employment, consumer goods and prices. It is entirely possible to constructively compress this data and give the huge tables some context. This is just one example of a data journalism time ‘bomb’. Where is the good switch? One journalist impressed me when she said that, in the newsroom, she had to learn statistics and maths in order to work on her stories; there was no one else to do it, and it all depends on how passionate and self-driven one is as a journalist. She was up for the programming meet-up in this series of local data days. When I asked about the challenges of writing data-backed stories, it all boiled down to a data skills deficiency.

My take-away from this meet-up was that our journalists know what needs to be done in the data journalism space in Uganda but need some support in the area of training. I hope that these regular meet-ups will work toward closing some of the skills gap for the local journalists. I am looking forward to ‘hotter’ stories from Uganda’s newsrooms. :-) I am glad that some of them are already working with the datasets from data.ug for their projects.

A week later, Europe was abuzz with the biggest data journalism event in the region. It started with a few interesting tweets on my timeline, and I ended up at #ijf14.

Data, Maps and Tech-tools – ICCM 2013

#ICCM trended last week on Twitter! The International Conference of Crisis Mappers 2013 took place in Nairobi over a span of five days. “Technology and Innovation for crisis mapping, in and outside Africa” was the theme. It was a rich convention of industry practitioners in the crisis mapping space.

I could write a ‘book’, I think, though for the purposes of enjoying my reminiscence I will stick to what I like talking about: DATA, MAPS & TECH TOOLS! Data means different things to different people, a panelist in the “What’s so Big about Big Data?” session opined.

Data discussions

Big data

In the ICCM context, data boils down to the kind that allows for coordination during a crisis, early warning preparations and post-crisis management. According to the ICCM discussions, data from social media has already gone a long way in shaping how humanitarian activations are carried out. The Typhoon Haiyan crisis got several humanitarian groups working to provide relief to affected people, for example DHN and MicroMappers. One way of looking at it is appreciating how diverse people’s intentions might be when they post a tweet. So even though it is technology aiding this kind of communication, one cannot underestimate the power of integrating human psychology into synthesizing all this information and designing algorithms that pick up the subtle things that might not be expressed in a tweet. Hemant Purohit, a PhD candidate with the Knoesis team, delved deeper into this when he gave an Ignite Talk.

Some of the challenges highlighted in synthesizing social media data were:

  • authenticity/validity: where do we draw the line?
  • temporal relevance: a tweet that is even one day old may no longer be relevant.
  • scaling machine learning tools that synthesize the data is not easy: the data is not static because humans generate new data quickly.

Aditya Vashistha of IVR Junction presents on “Generating, Analyzing & Using Information from Hard-to-Access Areas”

So you might want to take all the flurry on social media with a pinch of salt or, better still, make well-supported assumptions for the machine learning algorithms that would allow for graceful use of social media information in crisis management.
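The temporal relevance challenge listed above can be sketched in a few lines of Python. This is only an illustration and not something shown at ICCM: the `recent_posts` helper, the one-day cut-off, the reference time and the two sample posts are all made up for the example.

```python
from datetime import datetime, timedelta, timezone


def recent_posts(posts, now, max_age=timedelta(days=1)):
    """Keep only posts created within max_age before now (hypothetical helper)."""
    return [
        post for post in posts
        if timedelta(0) <= now - post["created_at"] <= max_age
    ]


# Made-up reference time and sample posts, for illustration only.
now = datetime(2013, 11, 20, 12, 0, tzinfo=timezone.utc)
posts = [
    {"text": "Road to the clinic is flooded", "created_at": now - timedelta(hours=3)},
    {"text": "Bridge is out", "created_at": now - timedelta(days=2)},
]

fresh = recent_posts(posts, now)
print([post["text"] for post in fresh])
```

A real pipeline would still have to deal with the other two challenges, authenticity and scale, which a simple age filter does nothing about.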

This big data talk tickled my mind. I ended up thinking that the millions of volunteers making maps to ease coordination after the devastation caused by Typhoon Yolanda were actually doing a big data gig: 2 million plus digital hands editing many tiles of satellite imagery, plus other crowdsourced geospatial information, to make life better for the people of the Philippines!

Open data commons: another dimension of data.

The School of Data and UNOCHA had a self-organised session on how crisis mappers can leverage open data to quicken humanitarian response. The discussion delved into data issues like curation, informatics/open data working groups for crisis mappers, cleaning and quality, granularization of big datasets for easier use by smaller groups of people, and the pressing need for training and mentoring in a space where data needs seem to change by the hour. OKFN’s Rufus Pollock shares his thoughts in an article titled Forget big data, small data is the real revolution.

Then we veered off a little, discussing “How much personal information should be out there for the world to see?” in the open data context. This turned out to be an uncomfortable topic, so we quickly got back to discussing what role open data plays in crisis management and how the open data commons tool can be harnessed to make this possible. More of what came out of this discussion here.

Teaching and learning

Data Track

The tech and training session ran for a day and had four tracks: Mobile/Security, Maps, Data & Knowledge. These sessions were meant to allow people to learn and teach in small groups. School of Data ran the Data & Knowledge session. As a volunteer mentor at School of Data, I was invited to give an introductory training on geocoding and mapping using Google Fusion Tables. I thought it very basic in the beginning, but it turned out to be helpful for the folk who attended the Data & Knowledge session, with mini sessions on data cleaning (Michael Bauer from OKFN), spreadsheet basics (Steve Kemei, Development Initiatives) and basic dataviz (Agnes Rube, Internews) to go with it. Google Fusion Tables is an experimental product, which means it could change at any time. Try out open source options like CartoDB and QGIS, and other services, for your geocoding and mapping needs.
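If you do move away from a hosted tool, one portable route is to keep your geocoded rows as GeoJSON, a format desktop tools like QGIS can open. The sketch below is not what we used in the session; the `rows_to_geojson` helper and the column names are hypothetical, and the two coordinates are approximate values included purely for illustration.

```python
import json


def rows_to_geojson(rows):
    """Turn rows with name/lat/lon keys into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Approximate coordinates, for illustration only.
rows = [
    {"name": "Kampala", "lat": 0.3476, "lon": 32.5825},
    {"name": "Nairobi", "lat": -1.2921, "lon": 36.8219},
]

geojson = rows_to_geojson(rows)
print(json.dumps(geojson, indent=2))
```

The longitude-first ordering is the detail that most often trips people up when points land in the wrong place on the map.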

Photo Credit: iHub

Maps

What is a crisis mappers convention without a party? I mean a mapping one. Heather Leson and Severin Menard of the Humanitarian OpenStreetMap Team put together a map-up where once again I joined as a co-trainer, mapping an area that needs urgent humanitarian and volunteer contribution: the Philippines. If you can contribute your OSM mapping skills, please do. Heather gives a detailed account of the map-up here.

One last thing about using the HOT tasking manager: JOSM’s performance can improve considerably if you install the list of plugins captured in the screen dump below. This works for tasking manager jobs. Thanks to HOT’s Severin Menard for this priceless tip!

Last but not least, I gave a talk on how the team at Fruits of Thought is “Building a robust mapping community in Uganda” and making it even easier by developing very simple mapping tools to complement the community effort.

We’d appreciate feedback on these tools so that we can improve them. Use them to map and let us know how it all goes.

Carbon data wrangling

  • What is special about the data?
  • What stories can be drawn from the data?
  • Is there a correlation between the attributes?

On the table

One month down, and 5 people with unique backgrounds have worked on a raw spreadsheet of carbon emission data. I am truly proud to have been part of a talented team! We cleaned, analyzed, and visualized spreadsheet data. I am passionate about the environment, and the opportunity to work closely with carbon emission data could not have come at a better time. Working with my team through this exercise allowed me to learn a thing or two about data wrangling; it was a nice stock-taking tool (it’s good to check yourself once in a while), and I was a little curious about how working together remotely would turn out. Thanks to School of Data.

Tools

E-mail, Google’s products (G+ hangout for a real feel of who your team-mates are, Docs for collaboration) and Etherpad aided communication. Meeting times were to be scheduled using a Doodle poll, but that was not a popular choice with the team. The second option, email, turned out to be a tall order because we had to manually sync time zones between Uganda, South Africa, Russia, Romania and the United Arab Emirates. I realized that you have to work almost twice as much with remote teamwork, especially if it’s a sizeable team (>3 is big). Smaller is better: it allows you to concentrate and keep tabs on each team member’s contribution, and it also makes for easier peer reviews.
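In hindsight, the manual time zone syncing could have been scripted. Here is a minimal sketch, assuming Python 3.9+ with its standard `zoneinfo` module and time zone data available on the machine. The meeting date and time are made up, the `local_times` helper is hypothetical, and picking Moscow for Russia is my assumption, since Russia spans several zones.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# IANA zone names for the team's countries.
TEAM_ZONES = {
    "Uganda": "Africa/Kampala",
    "South Africa": "Africa/Johannesburg",
    "Russia": "Europe/Moscow",  # assumption: Russia has several zones
    "Romania": "Europe/Bucharest",
    "United Arab Emirates": "Asia/Dubai",
}


def local_times(meeting_utc, zones=TEAM_ZONES):
    """Convert one UTC meeting time into each team member's local time."""
    return {
        place: meeting_utc.astimezone(ZoneInfo(name))
        for place, name in zones.items()
    }


# Made-up meeting slot, for illustration only.
meeting = datetime(2013, 5, 10, 12, 0, tzinfo=timezone.utc)
schedule = local_times(meeting)

for place, when in schedule.items():
    print(f"{place:22} {when:%Y-%m-%d %H:%M %Z}")
```

Working from a single UTC time and converting outward means daylight saving changes are handled by the time zone database instead of by whoever is writing the email.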

Team

Anna and Sipos made great facilitators (journalism background at work?), while Jakes carried the day when it came to research on coal’s contribution to carbon emissions, especially in South Africa (he is a professor of research methods, so we understood why he had a strong affinity for more and more data). Irina made great strides with documenting all that we did (even in pictures). I liked moving the data around, splitting it and seeing how visually appealing this carbon data would turn out. By this time we had our data cleaned and ready to merge with the plethora of datasets from the team.

Data served

We came out of the mission with clean carbon emission data that was easier to reuse (for example, to merge with other datasets) and to comprehend. A detailed write-up on what the dataset contains appeared earlier in the Guardian.

We actually went ahead to combine the original carbon emission dataset with others that were differently themed: for example, GDP and forest cover data.
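Combining differently themed datasets like this usually comes down to joining them on a shared key such as a country code. The sketch below is not the team's actual workflow, which was spreadsheet-based; the `merge_by_country` helper, the field names and every value are placeholders, not real figures.

```python
def merge_by_country(*datasets):
    """Inner-join {country_code: {field: value}} tables on country code."""
    shared = set(datasets[0])
    for table in datasets[1:]:
        shared &= set(table)  # keep only codes present in every table

    merged = {}
    for code in sorted(shared):
        row = {}
        for table in datasets:
            row.update(table[code])
        merged[code] = row
    return merged


# Placeholder country codes and values, NOT real data.
emissions = {"AAA": {"co2_kt": 100.0}, "BBB": {"co2_kt": 250.0}, "CCC": {"co2_kt": 40.0}}
gdp = {"AAA": {"gdp_usd": 1.0e9}, "BBB": {"gdp_usd": 5.0e9}}
forest = {"AAA": {"forest_pct": 30.0}, "BBB": {"forest_pct": 12.5}, "DDD": {"forest_pct": 60.0}}

merged = merge_by_country(emissions, gdp, forest)
for code, row in merged.items():
    print(code, row)
```

An inner join quietly drops countries that are missing from any one table, so it is worth checking which codes fell out before drawing conclusions from the merged data.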

This was not only about data. I got another version of teamwork (purely remote this time): communicating technical terms in simple ways; keeping communication relevant, through email for example (I did not have the luxury of going over to someone’s desk and pointing out a few things, so I had to be spot-on); facilitating a meeting online (G+ hangout); and keeping the entire team on the same page. Lastly, teamwork is not only about getting things done but also about the process of getting that work done. It’s therefore not a bad idea to drop a line or two to let your team-mates know you’re checked in: this will keep the team spirit up.

During this exercise, we got to know which folks make our air dirty, at a glance that is. We should be doing something about it in due course! And my experience with Mission Control gets to grace Dataspa as she debuts working with data.

Dataspa will share how she harnesses technology for data. This is multifaceted, I must say.

Web links that gave perspective

Constituted the tools to achieve this collaborative effort.

Fruity overtones

http://datadrivenjournalism.ru/ #Irina and Anna

http://info.p2pu.org/2013/06/18/data-mooc-results-findings-and-recommendations/?utm_source=rss&utm_medium=rss&utm_campaign=data-mooc-results-findings-and-recommendations  #Our team (10) rocked!

http://info.p2pu.org/2013/06/17/data-explorer-mission-from-the-inside-an-agents-story/ #Anna’s story

http://thematicmapping.org/downloads/world_borders.php #Source of world boundaries shapefile