Projects (January 2016)

Overview

This term UCOSP is happy to be partnering with six projects.  Some recent projects are no longer on the list because of travel costs or other commitments, but we are pleased to welcome two projects from Mozilla.  The project descriptions will be updated shortly.

  • Review Board: Code review tool.
  • MarkUs: Web-based grading platform.
  • Umple: UML modeling and programming tool.
  • Formulize: Database, reporting and workflow management system.
  • Jupyter Notebook: Documents with embedded live code.
  • Code Coverage: Tools to analyze test coverage.

Review Board

Review Board is a powerful web-based code review tool that helps developers do peer review as they write code. Review Board is used by thousands of software companies including Twitter, Yahoo, and VMware, as well as many open-source projects like Apache and KDE.
Students working on Review Board will have the opportunity to learn about back-end web development using Python and Django, as well as front-end development using HTML, CSS, JavaScript, jQuery, and Backbone.js. Source control is managed via Git on GitHub. All patches are reviewed using Review Board, and students are expected to review each other's patches, as well as those of other members of the development community.
Some possible projects include:
  • Making improvements to Review Board’s extensions infrastructure, which allows third-party developers to build features that aren’t generic enough to be part of the product.
  • New kinds of integration with other services, such as deeper bug tracker integration, an adapter for GitHub pull requests, or allowing users to log in using Mozilla Persona.
  • Reworking parts of the UI to work better on touch devices like tablets and smartphones.
For a full list of project suggestions, check out our wiki: Student Project Ideas
 
Some experience using Python and/or Django, JavaScript and jQuery would definitely be a plus. In our experience, Git is usually the largest stumbling block, so students who are comfortable with Git (or able to get comfortable with it quickly) will likely have an easier time developing.
For more information, see the project web page at http://reviewboard.org, or our student blog at http://reviewboardstudents.wordpress.com/

MarkUs

MarkUs is a web-based grading tool built with Ruby on Rails. The primary goal of MarkUs is to make it easy for graders to read and annotate students’ code. Graders also fill in a marking scheme or rubric created by the instructor.  Annotations may be saved for later reuse.  Students submit their code using either the web interface or standard Subversion tools, and can form their own groups when allowed by the instructor.
As MarkUs grows, we continue to add more useful features, including a REST API that allows some operations on MarkUs to be scripted, a remark request system, more reporting, and improved support for PDF annotations.  We are also working towards integrating a testing infrastructure that would allow students to run instructor-created tests on their submissions and get real-time feedback.
Students working on MarkUs will learn basic web application development technologies using Ruby and Rails.  MarkUs is hosted on GitHub, so students will become familiar with Git and the process we use when working on the code. Because MarkUs is used by several thousand students in more than 4 universities (on 3 continents!), we take code quality seriously.  All code submissions go through a code review, so the first task that students are asked to complete is fixing a trivial bug so that they become familiar with the code review process.  Students working on MarkUs need to be able to work in Linux, either natively or in a virtual machine.
As the fall term comes to a close, we are putting together a list of the next projects.
More information: http://markusproject.org/ and the MarkUs blog, http://blog.markusproject.org/

Umple

Umple is an open-source toolkit whose objective is to merge UML modeling and programming into a single activity. Umple can be used in several ways. It can be used as a textual language for UML. It can also be used as a programming-language pre-processor, allowing UML concepts like associations and state machines to be added directly to Java, C++, and PHP. In addition, Umple allows drawing UML diagrams online and generating code directly from those diagrams.
It is the goal of the Umple team to have large numbers of programmers and modelers incrementally adopt Umple. The barriers to entry are low, since Umple can be adopted in a minimal way, without disrupting the existing model or code.
Umple is an open-source project hosted on GitHub. You will have the opportunity to learn some or all of the following:

  • Model-driven design using UML
  • Test-driven development using JUnit
  • Programming in Java, Umple, PHP, C++, and/or JavaScript
  • Compiler design including parsing and code generation
  • Web site design (of the UmpleOnline tool)
  • Eclipse plugin development (of the Umple plugins)
  • A variety of other libraries and tools
  • Agile open source development with continuous integration

The exact set of skills you will employ depends on the task(s) you choose to work on.
More information: http://www.umple.org
Suitable student projects: http://projects.umple.org

Formulize

Formulize is a tool for making data management systems on the web. It has extensive support for modelling workflows, so that organizations can customize how users interact with the data that Formulize is managing. It is aimed at “power users” in not-for-profits and other organizations without large IT departments and resources, empowering them to create systems that would otherwise require custom programming to deploy.

The most basic operation in Formulize is the creation of forms. Administrators can specify what elements should appear on the form, and also how different groups of end users should be able to interact with the form. From there, administrators can make custom screens that control how lists of entries in each form are shown to end users. Administrators can also control how different forms relate to each other, similar to describing table relationships in an ERD. These relationships then govern how data is queried from the database, enabling screens to display complex sets of information to users, rather than just entries from a single form.

Formulize can work as a standalone application, installed on a web server. It can also be embedded within any PHP-driven web application on the same server where it is installed. A Drupal module has been created that supports extensive integration with the Drupal content management platform, including single sign-on for users. Integration plugins for WordPress and Joomla have also been created (by previous UCOSP students!).

Who uses Formulize? Formulize is used by organizations around the world, for a variety of purposes, from tracking the status of housing renovations, to recording the activities of wilderness rescue teams. The lead developer of Formulize is Freeform Solutions, a Canadian not-for-profit organization that helps other not-for-profits with IT. Freeform has used Formulize with several past and current clients, including: Oxfam Canada, the Boys and Girls Clubs of Canada, the Ontario Association of Children’s Aid Societies, the Australasian College of Sports Physicians, and various social science research projects at the University of Toronto, McMaster University and the University of Western Ontario.

How is Formulize built? Formulize is built primarily with PHP, and of course makes use of HTML and JavaScript as well. jQuery is also used extensively in the more recent parts of the code base (the project started in 2004). Formulize relies on MySQL for database operations. Because it is a tool that you use to create other systems, rather than a tool that does something for end users by itself, there is a high degree of abstraction throughout the codebase, especially in the parts that interact with the database. The code has to read configuration information specified by the administrators, and use that to dynamically generate all operations, including database queries. The newer parts of the codebase employ some object orientation. Older parts remain largely procedural. The codebase is maintained on GitHub.
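
To illustrate what generating database queries from administrator-supplied configuration can look like, here is a minimal sketch in Python (not PHP, and not Formulize's actual code). The `FORMS` and `RELATIONSHIPS` structures, the table and element names, and the `build_query` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical configuration, standing in for what an administrator might define.
# None of these names come from Formulize itself.
FORMS = {
    "clients": {"table": "clients", "elements": ["id", "name"]},
    "visits": {"table": "visits", "elements": ["id", "client_id", "visit_date"]},
}

# A relationship links an element of one form to an element of another,
# much like an edge in an ERD.
RELATIONSHIPS = [
    {"from": ("visits", "client_id"), "to": ("clients", "id")},
]


def build_query(primary, related, columns):
    """Build a SELECT that joins two forms through their configured relationship."""
    # Only identifiers that appear in the configuration are allowed into the SQL.
    for form, element in columns:
        if element not in FORMS[form]["elements"]:
            raise ValueError(f"unknown element {form}.{element}")

    link = next(
        (r for r in RELATIONSHIPS
         if r["from"][0] == primary and r["to"][0] == related),
        None,
    )
    if link is None:
        raise ValueError(f"no relationship from {primary} to {related}")

    (from_form, from_el), (to_form, to_el) = link["from"], link["to"]
    from_table = FORMS[from_form]["table"]
    to_table = FORMS[to_form]["table"]
    select = ", ".join(f"{FORMS[f]['table']}.{e}" for f, e in columns)
    return (
        f"SELECT {select} FROM {from_table} "
        f"JOIN {to_table} ON {from_table}.{from_el} = {to_table}.{to_el}"
    )


print(build_query("visits", "clients",
                  [("clients", "name"), ("visits", "visit_date")]))
```

The point of the sketch is that no query is written by hand: changing the configuration changes the SQL that is produced, which is why this style of code tends to be highly abstract.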

What will students learn? Students will, of course, have extensive exposure to PHP and related web technologies. Students will be tasked with fixing problems and adding features to Formulize. We follow a specific process in our GitHub repository to record code changes, documentation and Selenium tests all together. The tests are run automatically by our continuous integration system, based on Travis-CI and Sauce Labs. Students will have to develop documentation for their features, as well as verification tests in Selenium, before their code will be merged with the master branch. This should give students a deeper understanding of the role of software engineering in the larger process of maintenance and deployment of software.
Learn more about Formulize:

Jupyter Notebook

(Mozilla)

The Jupyter Notebook[1] is a web application that allows you to create, share, and run documents that contain live code, equations, visualizations and explanatory text. It is used by Mozilla for Self-Serve Data Analysis[2] with Apache Spark[3]. This infrastructure allows people to conveniently process terabytes of data using powerful analysis tools, all inside the browser.
The code for the Self-Serve Data Analysis system is managed on GitHub[4].
Students will become familiar with:
  • Python and Jupyter
  • Shell scripting
  • Spark
  • Git and GitHub
  • Code reviews
  • Linux
  • Amazon Web Services (Elastic MapReduce, S3, etc.)
Some potential projects include:
  • Automatically save Jupyter notebook to Amazon S3
  • Integration with GitHub to easily publish a notebook as a gist or load from a gist
  • Add a progress bar to give feedback on the currently executing cell

Code Coverage collector and explorer

At Mozilla we have a lot of code and thousands of tests in various languages.  Our goal is to use code coverage to be smarter about how we measure quality, determine what we test, and look for overall trends as time passes.  Today, code coverage is straightforward for C++, but when it comes to JavaScript, we run into a variety of issues.  While we have success in many cases, there is a lot of work left to ensure we have reliable data for all cases.

The real question comes up once we have data: what can we do with it?  This is where we are looking for a team of creative hackers to help us out.  We have some basic questions we want to answer, such as:

  • Given a test, which methods and files do we touch?
  • Given a source file and related method, which tests access it, and which lines have no coverage?
  • Given a patch (a list of adjusted source files and methods), which tests should I run?

To answer those questions, we have some prerequisites to solve:

  1. Update the test harnesses to collect code coverage while running tests.  This needs to be done without crashing, timing out, or breaking our infrastructure.
  2. Store code coverage data in ActiveData.  ActiveData is an Elasticsearch (ES) cluster which has information about all our tests and history.
  3. Write an interface library to ActiveData which will run queries to answer common questions about code coverage.
  4. Build a basic web UI that runs a set of pre-canned queries with related input data and displays the results in a format that is easy to read and understand.  Ideally this will be a web service which other tools can query.
  5. Provide the facility to run these in our automation infrastructure “on demand”.  This allows us to run jobs in automation to collect a new batch of code coverage.  These jobs should run without exceeding our 2-hour time limit on jobs (we would have to run in parallel, in small chunks per machine).  To do this we will use TaskCluster, which is a new, easily hackable CI system.
  6. As code coverage is new at Mozilla, there will be a need either to accept the data as inaccurate or to help make the data more accurate by ensuring we find the proper files and offsets into files.  This is difficult when files are packaged during the build, as the packaged files don’t map directly to the source tree.  In addition, we have many types of files (js, xbl, svg, xul, c++, etc.) which will eventually need proper ways to collect coverage and coalesce the coverage data with the related data from other source code types.
  7. Mozilla has dozens of build types, platforms, and test configurations.  For the purpose of this project we will limit this to “linux64 debug tests in e10s mode”.  Ideally, upon completion, we can scale to other types of tests and builds without much work.
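
Item 5 mentions splitting work into small chunks per machine so that no job exceeds the 2-hour limit. One simple way to think about that is a greedy packing of tests by estimated runtime, sketched below. The runtimes, the test names, and the `chunk_tests` function are hypothetical, and the sketch says nothing about how TaskCluster actually schedules jobs.

```python
LIMIT_MINUTES = 120  # the 2-hour job limit from item 5


def chunk_tests(runtimes, limit=LIMIT_MINUTES):
    """Pack tests into chunks whose estimated total runtime stays within the limit.

    Uses a first-fit-decreasing heuristic: take the longest tests first and put
    each one into the first chunk that still has room for it.
    """
    chunks = []  # each entry is [total_minutes, [test names]]
    ordered = sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, minutes in ordered:
        if minutes > limit:
            raise ValueError(f"{name} alone exceeds the {limit}-minute limit")
        for chunk in chunks:
            if chunk[0] + minutes <= limit:
                chunk[0] += minutes
                chunk[1].append(name)
                break
        else:
            # No existing chunk has room, so start a new one.
            chunks.append([minutes, [name]])
    return [names for _, names in chunks]


# Hypothetical estimated runtimes in minutes.
estimates = {"a": 70, "b": 60, "c": 50, "d": 40}
print(chunk_tests(estimates))
```

Each resulting chunk could then be run as a separate job on its own machine. The heuristic does not guarantee the minimum number of chunks, and real runtimes with coverage enabled would need to be measured rather than assumed.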

The above steps don’t necessarily need to be done in order, but there is some sanity in knowing that we have parameters and milestones to achieve throughout the project.  To get started on this project, it will be good to have a Firefox development environment set up: http://areweeveryoneyet.org/onramp/desktop.html.  Learning how other code coverage tools work and store data will also be useful background.