Leaving for Malawi

Aug. 21, 2013
by
Benjamin Balder Bach

Over the coming month, we will switch to more personal blogging. This post will hopefully be the first of many about the next month of practical work to set up IT Centres around the Mzuzu area of northern Malawi.

Lately, we have been preparing the project in close contact with Seyani Nayeja, head of the ICT department at Mzuzu University, who has also blogged about the project [1].

The headline is: we are leaving for Malawi with suitcases full of data, and everything has been updated since the last project. We are now running Ubuntu 12.04 (Edubuntu) and a snapshot of Wikipedia that is just a month old.

First item on the packing list: Wikipedia

The last couple of weeks have been hectic and are now culminating in sleepless nights and lots of coffee: Downloading and remixing Wikipedia did not go smoothly this time. Dead hard drives, overheating CPUs, wrong advice from peers. The list of excuses is long, but the point is that we have almost arrived at a full copy that will bring vast amounts of information where the internet does not quite reach yet.

Wikipedia's growth is not only a problem for us: the Wikimedia Foundation has also become slower at providing up-to-date snapshots. This is understandable. The number of articles has grown by 50% since 2010, reaching almost 4.5 million on the English Wikipedia. Meanwhile, article edits have stabilized at around 10 million every 55 days. [2]

...other items

New additions to the information collection will be Project Gutenberg, KA Lite [3] (a 60 GB offline version of Khan Academy with thousands of videos and exercises), Why Poverty (8 hours of documentary movies and 30 shorts, thanks to Steps International), and Gapminder [4] (an exploration tool for open access data).

On top of that comes the 80 GB collection of open source software from the Ubuntu repositories, the Why Democracy documentaries, and a collection of various free ebooks.

It's hard to keep up with the development, but once we get back, we should take thorough steps to streamline the collection and maintenance of what we've nicknamed the FAIR Intranet.

Thoughts on data media

Everything is made to fit 2 TB media, a hard drive size that's affordable to us (one of the few hardware items we buy new). This time, however, we have switched to internal hard drives in response to the rapid decay of external hard drive quality. This will also make media access times much faster.

It should be easy: make a prototype that works and replicate it. However, the recurring blackouts in Malawi have made it quite tedious to move large data collections, and 2 TB of data takes roughly 8 hours to copy on an internal disk drive.
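
To put the 8-hour figure in perspective, here is a rough back-of-the-envelope calculation of the average copy speed it implies. This is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and a steady, uninterrupted copy, and the function names are made up for illustration.

```python
def copy_throughput_mb_per_s(size_tb, hours):
    """Average throughput in MB/s, assuming 1 TB = 1,000,000 MB."""
    return size_tb * 1_000_000 / (hours * 3600)


def copy_hours(size_tb, throughput_mb_per_s):
    """Hours needed to copy size_tb at a steady throughput (MB/s)."""
    return size_tb * 1_000_000 / throughput_mb_per_s / 3600


rate = copy_throughput_mb_per_s(2, 8)
print(f"{rate:.1f} MB/s")  # prints "69.4 MB/s"
```

In other words, a blackout halfway through a copy can cost around four hours of work unless the copy can be resumed.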

So little time

The good news is that we have finished downloading and scaling the Wikipedia images: 3.2 million files have been processed, shrinking the collection from 2.3 TB to 900 GB, just the result we were hoping for.
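
For the curious, the numbers above work out roughly as follows. This is a quick sketch that assumes decimal units (1 TB = 1000 GB, 1 GB = 1,000,000 KB); the variable names are only for illustration.

```python
FILES = 3_200_000   # number of processed image files
BEFORE_GB = 2300    # 2.3 TB before scaling
AFTER_GB = 900      # size after scaling

# Share of the original size that was saved by scaling
reduction = 1 - AFTER_GB / BEFORE_GB

# Average size per file in KB, before and after
avg_before_kb = BEFORE_GB * 1_000_000 / FILES
avg_after_kb = AFTER_GB * 1_000_000 / FILES

print(f"Saved {reduction:.0%} of the original size")
print(f"Average file: {avg_before_kb:.0f} KB -> {avg_after_kb:.0f} KB")
```

That is a saving of about 61%, with the average image going from roughly 719 KB to roughly 281 KB.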

However, time is scarce... we still need to scan through all the files and recreate a MySQL database of all the images... and the plane leaves Saturday evening.
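
As a rough idea of what such a scan could look like, here is a minimal sketch that walks a directory tree and records every file in a database table. It is not our actual script: it uses Python's built-in SQLite as a stand-in for MySQL so that it runs anywhere, and the table name `image_files`, its columns, and the function `index_images` are hypothetical.

```python
import os
import sqlite3
import tempfile


def index_images(root, db_path=":memory:"):
    """Walk root and record each file's name, relative path and size."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_files ("
        "name TEXT, rel_path TEXT PRIMARY KEY, size_bytes INTEGER)"
    )
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rows.append(
                (filename, os.path.relpath(full, root), os.path.getsize(full))
            )
    # Insert everything in one transaction; re-running replaces old rows
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO image_files VALUES (?, ?, ?)", rows
        )
    return conn


# Small demo on a throwaway directory with two dummy files
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a"))
    for parts in (("a", "one.jpg"), ("two.png",)):
        with open(os.path.join(root, *parts), "wb") as fh:
            fh.write(b"x" * 10)
    conn = index_images(root)
    count, total = conn.execute(
        "SELECT COUNT(*), SUM(size_bytes) FROM image_files"
    ).fetchone()
    print(count, total)
```

With millions of files, inserting in batches rather than building one big list would probably be kinder on memory.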

[1]: http://nayeja.blogspot.com/2013/08/fair-danmark-mzuni-project.html
[2]: http://en.wikipedia.org/wiki/Wikipedia:Statistics
[3]: http://kalite.learningequality.org/
[4]: http://www.gapminder.org/
