Projects (Fall 2011)

NOTE: These are the Fall 2011 projects, not the current project list.

Encyclopedia of Life

The Encyclopedia of Life (EOL) is a free, online collaborative encyclopedia intended to document all of the 2 million living species known to science. It is compiled from existing databases and from contributions by experts and non-experts throughout the world.

Smartphones have revolutionized how we access information, and support for mobile devices is becoming crucial for internet projects such as the EOL. This summer, the EOL for mobile devices started as a Google Summer of Code project; it is now in active development, with the aim of including the code in a new major release of the EOL website. The initial mobile interface only supports searching the EOL website and accessing basic information about species. Desired enhancements include the ability for EOL users to sign in and see the data they collected via the mobile interface.

Knowing how to program for mobile devices is becoming an important skill for finding interesting and rewarding jobs. By participating in this project you will learn the mobile features of HTML5 and how to work with Ruby on Rails in a mobile environment. You will gain experience in test-driven development, writing accessible code, using jQuery Mobile, and the Git version control system. The completed work is intended for release in the production version of the Encyclopedia of Life.

Freeseer

Ever want to make a good quality video of a demo or presentation? Looking to record class lectures or labs? Need to record video from a webcam or camcorder? Looking for something that will record VGA output? Freeseer is the project that does this and more – with a click to start, and a click to stop recording.

The Freeseer project is a powerful software suite for capturing video. It enables you to capture great presentations, demos, training material, and other videos. It handles simple desktop screen-casting with ease. It is particularly good at handling large conferences with hundreds of talks.

Freeseer is implemented using the Python programming language, the Qt framework, and the GStreamer multimedia framework. It is licensed under the GPL (v3) and is therefore free to download, use, modify, or build upon. It has been used to record dozens of large conferences to make great talks available online.


MarkUs

MarkUs is a grading and code review tool that brings the flexibility of pen-on-paper marking to the web. It is built with Ruby on Rails. MarkUs is currently used by three universities, serving several thousand students.

Current efforts include making the automated testing infrastructure production-ready, adding reporting and summarization features, and, as always, fixing some interesting bugs.


POSIT

POSIT aims to create a portable, open-source tool for the Android platform to aid search and rescue efforts by allowing the transmission of data between users and to central servers.

Imagine you are a rescue worker searching for victims and survivors in the aftermath of a hurricane or other natural disaster. Or imagine you are a botanist mapping a geographical area for an invasive species, or an environmental scientist searching for hazardous waste deposits.

What’s needed is a portable tool that is able to record information about Finds and transmit it to a central server or control center. As mobile phone technology becomes ubiquitous and more powerful, such a tool is now feasible. Building such a tool on the FOSS Android platform would make it widely and freely available to rescue workers, environmental scientists, and other field workers.
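To make the idea of recording a Find and preparing it for transmission concrete, here is a minimal sketch in Python. It is not POSIT's code (POSIT targets Android); the `Find` fields, the `make_find` and `to_payload` helpers, and the JSON encoding are all assumptions chosen for illustration, and the actual network transfer is left out.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class Find:
    """One field observation. All field names here are hypothetical."""
    name: str
    description: str
    latitude: float
    longitude: float
    recorded_at: str  # ISO 8601 timestamp in UTC


def make_find(name, description, latitude, longitude, now=None):
    """Validate the coordinates and build a Find stamped with the current time."""
    if not -90 <= latitude <= 90 or not -180 <= longitude <= 180:
        raise ValueError("coordinates out of range")
    now = now or datetime.now(timezone.utc)
    return Find(name, description, latitude, longitude, now.isoformat())


def to_payload(find):
    """Serialize a Find to JSON bytes that a client could send to a server."""
    return json.dumps(asdict(find), sort_keys=True).encode("utf-8")
```

A client would call `to_payload(make_find(...))` and hand the resulting bytes to whatever transport is available; validating before serializing keeps malformed records from reaching the server.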


Review Board

Review Board is a powerful web-based code review tool that offers developers an easy way to handle code reviews. It scales well from small projects to large companies and offers a variety of tools to take much of the stress and time out of the code review process. Review Board is written in Python using the Django web framework.


The Sentinel Project for Genocide Prevention

The Sentinel Project for Genocide Prevention (SPGP) is a young, innovative non-profit based in Toronto, Canada. Its core mandate is to develop a framework to assess the risk of genocide affecting a community that will allow analysts to systematically monitor conditions and provide targeted early warnings to strategic partners and threatened groups. SPGP has developed a methodology synthesized from existing academic research that leverages modern information and communication technology (ICT) tools for data collection, reporting, and analysis.

This approach is embodied by Threatwiki, an ICT platform being designed by the Sentinel Project. Threatwiki has two functions: threat assessment decision support and the online dissemination of threat information. It aggregates event information sourced from news media, blogs, SMS messages, and other communication media, producing a coherent narrative. It also features a visual interface to illustrate the emergence of key operation processes and their systemic causes.


Umple

Umple is an open-source toolkit whose objective is to merge UML modeling and programming into a single activity. Umple can be used in several ways. It can be used as a textual language for UML. It can also be used as a programming-language pre-processor, allowing UML concepts like associations and state machines to be added directly to Java and PHP. In addition, Umple allows drawing UML diagrams online and generating code directly from those diagrams. The goal of the Umple team is to have large numbers of programmers and modelers adopt Umple incrementally. The barriers to entry are low, since Umple can be adopted in a minimal way, without disrupting the existing model or code. The project is hosted on Google Code.


Data Generator

The Data Generator (DG) is a cross-platform, cross-database command-line tool for generating data sets for Relational Database Management Systems (RDBMSs) based on an XML description of these data sets.

The user creating the description of a data set defines the structure of the tables and the relationships between them, and specifies the types of random and pseudo-random data that each table’s fields must contain (e.g. dates, text, strings based on regular expressions). A typical configuration file is a few hundred lines long and allows DG to generate millions of rows of data for each of the tables in just a few minutes.
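The idea of driving data generation from a declarative description can be sketched in a few lines of Python. This is only an illustration of the approach, not DG itself: DG is written in Java and reads XML, whereas the `SPEC` dictionary, its column types, and the `generate` function below are hypothetical stand-ins.

```python
import random
from datetime import date, timedelta

# Hypothetical, simplified description of two related tables.
# DG's real input is an XML file whose schema is not reproduced here.
SPEC = {
    "customers": {
        "rows": 5,
        "columns": {
            "id": {"type": "sequence"},
            "signup": {"type": "date", "start": date(2011, 1, 1), "days": 365},
            "tier": {"type": "choice", "values": ["free", "pro"]},
        },
    },
    "orders": {
        "rows": 20,
        "columns": {
            "id": {"type": "sequence"},
            "customer_id": {"type": "foreign_key", "table": "customers", "column": "id"},
            "quantity": {"type": "int", "min": 1, "max": 10},
        },
    },
}


def generate(spec, seed=0):
    """Generate rows for every table; parent tables must be listed first."""
    rng = random.Random(seed)  # seeded, so the same spec yields the same data
    data = {}
    for table, tdef in spec.items():
        rows = []
        for i in range(tdef["rows"]):
            row = {}
            for name, col in tdef["columns"].items():
                kind = col["type"]
                if kind == "sequence":
                    row[name] = i + 1
                elif kind == "int":
                    row[name] = rng.randint(col["min"], col["max"])
                elif kind == "date":
                    row[name] = col["start"] + timedelta(days=rng.randrange(col["days"]))
                elif kind == "choice":
                    row[name] = rng.choice(col["values"])
                elif kind == "foreign_key":
                    # Pick an existing parent row so the reference is always valid.
                    row[name] = rng.choice(data[col["table"]])[col["column"]]
                else:
                    raise ValueError(f"unknown column type: {kind}")
            rows.append(row)
        data[table] = rows
    return data
```

Calling `generate(SPEC, seed=42)` returns a dictionary mapping each table name to its list of rows. Seeding the generator makes runs repeatable, which matters when the data is used for testing or performance comparisons.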

DG is an open-source project distributed under the Apache 2.0 license and written almost exclusively in Java. DG has been developed as part of the Technology Explorer project and is maintained by the open-source community, which includes a few contributing developers from IBM. The most common use of DG is building large pseudo-random data sets for applications that need them for testing or performance comparison.

At just under 5,000 lines of code, the project has its core functionality in place, but a lot remains to be done: the project offers participants everything from minor bug fixes and unit tests to new feature development and (better) documentation. Whatever your interests and strengths are, you will likely find something interesting to work on if you choose to join this project.

More information: Data Generator Wiki