This term UCOSP is happy to be partnering with six projects. Some recent projects are no longer on the list because of travel costs or other commitments, but we are pleased to welcome two projects from Mozilla. The project descriptions will be updated shortly.
- Review Board: Code review tool.
- MarkUs: Web-based grading platform.
- Umple: UML modeling and programming tool.
- Formulize: Database, reporting and workflow management system.
- Jupyter Notebook: Documents with embedded live code.
- Code Coverage: Tools to analyze test coverage.
MarkUs is a web-based grading tool built with Ruby on Rails. The primary goal of MarkUs is to make it easy for graders to read and annotate students’ code. Graders also fill in a marking scheme or rubric created by the instructor. Annotations may be saved for later reuse. Students submit their code using either the web interface or standard Subversion tools, and can form their own groups when allowed by the instructor. As MarkUs grows, we continue to add more useful features, including a REST API that allows some operations on MarkUs to be scripted, a remark request system, more reporting, and improved support for PDF annotations. We are also working towards integrating a testing infrastructure that would allow students to run instructor-created tests on their submissions and get real-time feedback.
Students working on MarkUs will learn basic web application development technologies using Ruby and Rails. MarkUs is hosted on GitHub, so students will become familiar with Git and the process we use when working on the code. Because MarkUs is used by several thousand students in more than four universities (on three continents!), we take code quality seriously. All code submissions go through a code review, so the first task that students are asked to complete is fixing a trivial bug so that they become familiar with the code review process. Students working on MarkUs need to be able to work in Linux, either natively or in a virtual machine.
As the fall term comes to a close, we are putting together a list of the next projects. More information: http://markusproject.org/ and the project blog, http://blog.markusproject.org/
Umple is an open-source toolkit whose objective is to merge UML modeling and programming into a single activity. Umple can be used in several ways. It can be used as a textual language for UML. It can also be used as a programming-language pre-processor, allowing UML concepts like associations and state machines to be added directly to Java, C++, and PHP. In addition, Umple allows drawing UML diagrams online and generating code directly from those diagrams. The goal of the Umple team is to have large numbers of programmers and modelers incrementally adopt Umple. The barriers to entry are low, since Umple can be used in a minimal way, without disrupting the existing model or code. Umple is hosted on GitHub. You will have the opportunity to learn some or all of the following:
- Model-driven design using UML
- Test-driven development using JUnit
- Compiler design including parsing and code generation
- Web site design (of the UmpleOnline tool)
- Eclipse plugin development (of the Umple plugins)
- A variety of other libraries and tools
- Agile open source development with continuous integration
Formulize is a tool for making data management systems on the web. It has extensive support for modelling workflows, so that organizations can customize how users interact with the data that Formulize is managing. It is aimed at “power users” in not-for-profits and other organizations without large IT departments and resources, empowering them to create systems that would otherwise require custom programming to deploy.
The most basic operation in Formulize is the creation of forms. Administrators can specify what elements should appear on the form, and also how different groups of end users should be able to interact with the form. From there, administrators can make custom screens that control how lists of entries in each form are shown to end users. Administrators can also control how different forms relate to each other, similar to describing table relationships in an ERD. These relationships then govern how data is queried from the database, enabling screens to display complex sets of information to users, rather than just entries from a single form.
Formulize can work as a standalone application, installed on a web server. It can also be embedded within any PHP-driven web application on the same server where it is installed. A Drupal module has been created that supports extensive integration with the Drupal content management platform, including single sign-on for users. Integration plugins for WordPress and Joomla have also been created (by previous UCOSP students!).
Who uses Formulize?
Formulize is used by organizations around the world, for a variety of purposes, from tracking the status of housing renovations to recording the activities of wilderness rescue teams. The lead developer of Formulize is Freeform Solutions, a Canadian not-for-profit organization that helps other not-for-profits with IT. Freeform has used Formulize with several past and current clients, including Oxfam Canada, the Boys and Girls Clubs of Canada, the Ontario Association of Children’s Aid Societies, the Australasian College of Sports Physicians, and various social science research projects at the University of Toronto, McMaster University and the University of Western Ontario.
What will students learn?
Students will, of course, have extensive exposure to PHP and related web technologies. Students will be tasked with fixing problems and adding features to Formulize. We follow a specific process in our GitHub repository to record code changes, documentation and Selenium tests all together. The tests are run automatically by our continuous integration system, based on Travis-CI and Sauce Labs. Students will have to develop documentation for their features, as well as verification tests in Selenium, before their code is merged into the master branch. This should give students a deeper understanding of the role of software engineering in the larger process of maintaining and deploying software.
Learn more about Formulize:
- Download Formulize and docs.
- Read the history and roadmap for Formulize.
- Browse the GitHub repository.
- Video tutorials for using Formulize (the full series is about three hours, but you can skip around between various videos at your leisure).
- Learn about our version control and continuous integration process.
- Visit the Formulize support forums.
- Python and Jupyter
- Shell scripting
- Git and GitHub
- Code reviews
- Amazon Web Services (Elastic MapReduce, S3, etc)
- Automatically save Jupyter notebooks to Amazon S3
- Integration with GitHub to easily publish a notebook as a gist or load one from a gist
- Add a progress bar to give feedback on the currently executing cell
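As a rough illustration of the gist idea above, the sketch below builds the JSON body that GitHub's "create a gist" REST endpoint (`POST /gists`) accepts. It is only a sketch under stated assumptions: the function name `gist_payload`, the file name `demo.ipynb`, and the empty notebook are hypothetical, and no network request or authentication is performed.

```python
import json


def gist_payload(notebook, filename, description="", public=False):
    """Build the request body for GitHub's "create a gist" endpoint.

    `notebook` is the notebook's JSON content as a Python dict.
    Sending the request (and authenticating) is left to the caller.
    """
    if not filename.endswith(".ipynb"):
        filename += ".ipynb"
    return {
        "description": description,
        "public": public,
        # A gist maps each file name to an object holding its text content.
        "files": {filename: {"content": json.dumps(notebook, indent=1)}},
    }


if __name__ == "__main__":
    # Hypothetical, empty notebook in the version 4 notebook format.
    empty_notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
    body = gist_payload(empty_notebook, "demo", description="Example notebook")
    print(json.dumps(body, indent=2))
```

Loading from a gist would be the reverse: fetch the gist, read the file's `content` string, and parse it back into a notebook with `json.loads`.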
Code Coverage collector and explorer
The real question is: once we have data, what can we do with it? This is where we are looking for a team of creative hackers to help us out. We have some basic questions we want to answer, such as:
- Given a test, which methods and files do we touch?
- Given a source file and related method, give me a list of all tests which access it and tell me which lines have no coverage.
- Given a patch (a list of the source files and methods adjusted), which tests should I run?
In order to answer those questions, we have some prerequisites to address:
- Update test harnesses to collect code coverage while running tests. This needs to be done without crashing, timing out, or breaking our infrastructure.
- Store code coverage data in ActiveData. ActiveData is an Elasticsearch (ES) cluster which has information about all our tests and history.
- Write an interface library to ActiveData which will run queries to answer common questions about code coverage.
- Have a basic web UI to run a set of pre-canned queries along with related input data and display results in an easy to read and understand format. Ideally this will be a web service which other tools can query.
- Have the facility to run these in our automation infrastructure “on demand”. This allows us to run jobs in automation to collect a new batch of code coverage data. These jobs should run without exceeding our two-hour time limit on jobs (we would have to run in parallel, in small chunks per machine). To do this we will use TaskCluster, which is a new, easily hackable CI system.
- As code coverage is new at Mozilla, we will need to either accept the data as inaccurate or help make it more accurate by ensuring we find the proper files and offsets into files. This is difficult when files are packaged during the build, as the packaged files don’t map directly to the source tree. In addition, we have many types of files (js, xbl, svg, xul, c++, etc.) which will eventually need proper ways to collect coverage and coalesce the coverage data with the related data from other source code types.
- Mozilla has dozens of build types, platforms, and test configurations; for the purpose of this project we will limit this to “linux64 debug tests in e10s mode”. Ideally, upon completion we can scale to other types of tests/builds without much work.
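The two-hour job limit mentioned in the list above implies splitting the tests into chunks that each fit within the limit. The sketch below shows one simple way to do that, a greedy longest-first packing. It is an illustration only: the function name `chunk_tests` and the test durations are hypothetical, and it does not use any TaskCluster API.

```python
import heapq


def chunk_tests(durations, limit_minutes):
    """Greedily pack tests into chunks whose total runtime stays within the limit.

    `durations` maps a test name to its estimated runtime in minutes.
    Tests are placed longest first into the currently lightest chunk, and a
    new chunk is opened when the test would not fit there.
    """
    chunks = []  # chunks[i] is the list of test names in chunk i
    heap = []    # (total minutes, chunk index), lightest chunk first
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, minutes in ordered:
        if minutes > limit_minutes:
            raise ValueError(f"{name} alone exceeds the {limit_minutes} minute limit")
        if heap and heap[0][0] + minutes <= limit_minutes:
            total, index = heapq.heappop(heap)
        else:
            total, index = 0, len(chunks)
            chunks.append([])
        chunks[index].append(name)
        heapq.heappush(heap, (total + minutes, index))
    return chunks


if __name__ == "__main__":
    # Hypothetical runtime estimates, in minutes.
    estimates = {"a": 70, "b": 60, "c": 50, "d": 40, "e": 20}
    for number, chunk in enumerate(chunk_tests(estimates, limit_minutes=120), start=1):
        print(number, chunk, sum(estimates[name] for name in chunk))
```

Greedy packing does not guarantee the smallest possible number of chunks, but it is simple and guarantees that no chunk exceeds the limit.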
The above steps don’t necessarily need to be done in order, but there is some sanity in knowing that we have parameters and milestones to achieve throughout the project. To get started on this project, it will be good to have a Firefox development environment set up: http://areweeveryoneyet.org/onramp/desktop.html. Learning how other code coverage tools work and store data will also be useful background.