On Friday, the Elmcity team held its weekly meeting; the minutes can be found here.
A summary of the things we covered:
We went over and finalized our grading scheme, including exactly what functionality we hope to have achieved by the 29th of November.
In that vein, we’ve decided that our functionality will primarily be dictated by the ability to parse the pages brought up by our users here, and secondarily by this list. There’s a lot of overlap between the two, but our focus is on the sites the current curators are interested in.
We’ve all been looking over the patterns that should be recognized by our general parser, and will report on them this week.
Finally, we discussed which approach to take for parsing, based on two libraries available for Python: parsedatetime and dateutil. dateutil seems to err on the side of caution, choking on any text that isn't purely a date or time, whereas parsedatetime can handle a much wider range of input but will also return false positives. We'll likely have to take a middle-ground approach to our parsing, since human-generated text is so unpredictable.
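To make the "middle ground" idea a little more concrete, here is a minimal sketch of one way it could work: a loose pattern picks out anything that looks like a date, and a strict parser then accepts or rejects each candidate. This is only an illustration, not our actual parser. It uses Python's standard library instead of parsedatetime or dateutil, and the names CANDIDATE, FORMATS, strict_parse and find_dates, along with the two date styles they cover, are assumptions made up for the example.

```python
import re
from datetime import datetime

# Hypothetical loose pattern (an assumption for this sketch): it grabs anything
# shaped like "2010-11-29" or "Nov 29, 2010", including things that are not dates.
CANDIDATE = re.compile(
    r"\b(?:\d{4}-\d{1,2}-\d{1,2}|[A-Za-z]{3,9}\.? \d{1,2},? \d{4})\b"
)

# Strict formats tried in order on each candidate (also an assumption).
FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%b %d, %Y", "%B %d %Y", "%b %d %Y")


def strict_parse(text):
    """Return a datetime if text matches one of FORMATS exactly, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def find_dates(text):
    """Loose pass finds candidates; strict pass filters out false positives."""
    results = []
    for match in CANDIDATE.finditer(text):
        # Drop the optional period after an abbreviated month ("Nov." -> "Nov").
        candidate = match.group(0).replace(".", "")
        parsed = strict_parse(candidate)
        if parsed is not None:
            results.append(parsed)
    return results


if __name__ == "__main__":
    # "Room 12, 2010" looks date-like to the loose pass but fails the strict one.
    print(find_dates("Concert on Nov 29, 2010 in Room 12, 2010"))
```

The loose pass plays the role of the permissive parser and the strict pass plays the role of the cautious one, so free text around a date no longer causes a failure, while look-alikes such as "Room 12, 2010" or "2010-13-40" are discarded.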