Creating the Future of CSS Testing (2009 Edition)
This article has been updated for 2011
In the previous article I explained the role of CSS2.1 and how powerful and useful a CSS2.1 test suite could be for the Web.
In this article I'm going to explain the current situation in CSS testing at W3C, what my goals are for each aspect of the testing process, and what's missing to get us there.
Submitting Tests
The goal here is to allow multiple people to submit and update tests in ways that are straightforward and easy for them, and to store the tests in a way that both works for W3C and is compatible with what our contributors want to do with the tests elsewhere.
We currently have:
- A standard format for tests that includes the metadata we need to index the tests and generate useful reports.
- A Subversion server to collect tests and track their development.
- Submissions from HP, Microsoft, some Mozilla contributors, and individual volunteers.
- A volunteer (Gérard Talbot) who is dedicated to collecting tests from the web authoring community. (Please support Gérard's efforts!)
What we're missing:
- A friendlier way to submit tests. Subversion has a bit of a learning curve; a WebDAV+Subversion setup might be easier for contributors if it is capable of transparently handling file moves and renames.
- Documentation and best practices for submitting tests in reftest format, which allows script-driven testing of many (though not all) CSS features and would make test result reporting much faster. (There's a sketch of the idea after this list.)
- Participation from Opera's and WebKit's quality assurance teams: they are already writing tests for internal use, but their efforts are totally disconnected from W3C, which means they don't benefit the Web community as a whole. Establishing collaboration at the CSS2.1 level would also carry over into CSS3 modules, where the availability of their tests would allow faster adoption of new CSS3 technologies.
- Awareness of W3C test formats at Mozilla and sync-up between W3C and Mozilla's test repositories. (Mozilla contributors already write tests in reftest format, many of which could be imported into the CSS test suite if they included the necessary metadata.)
- Buy-in for collaboration from Microsoft's management levels. Currently they tend to withhold tests for undetermined reasons, and it seems management is getting in the way of their developers updating tests in response to review comments. They have contributed a lot of tests to the test suite, but collaboration has been missing.
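For anyone who hasn't encountered reftests before: a reftest is an ordinary test page paired with a reference page that must render identically, so a harness can compare the two renderings automatically instead of asking a human to judge the result. Below is a minimal sketch of how such pairs could be collected into a manifest. It assumes each test declares its reference with a link rel="match" element and emits the "== test reference" lines that Mozilla's reftest runner consumes; the directory name and the (deliberately simplistic) regex are just for illustration.

```
# Sketch: generate a reftest manifest by pairing each test with the reference
# it declares via <link rel="match">. The rel="match" convention, the regex,
# and the directory name are assumptions made for this example.
import os
import re

MATCH_LINK = re.compile(
    r"<link[^>]*rel=['\"]match['\"][^>]*href=['\"]([^'\"]+)['\"]", re.I)

def build_manifest(test_dir):
    lines = []
    for name in sorted(os.listdir(test_dir)):
        if not name.endswith((".html", ".xht")):
            continue
        with open(os.path.join(test_dir, name), encoding="utf-8") as f:
            source = f.read()
        for ref in MATCH_LINK.findall(source):
            # "== test reference" means the two files must render identically.
            lines.append("== %s %s" % (name, ref))
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_manifest("reftests"))
```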
Publishing Tests
The goal here is to present the tests for each module on an official W3C server in the formats needed to run on all CSS implementations; to have the tests indexed so that it is easy for a developer to find the tests he is interested in running; and to have a coverage report that shows which parts of the spec are tested and which tests are missing.
What we have:
- A place to host tests on W3C's main server.
- An official test format that requires several bits of information:
- Links to the specific sections of the spec(s) being tested.
- The title of the test, intended for a table of contents.
- Optionally, an assertion explaining what, specifically, a passing test demonstrates.
- The original CSS2.1 build scripts can convert from the source format to the various destination formats. They can also build a table of contents that mirrors the spec and lists all the tests for a given section (there's a sketch of this kind of processing after this list). However, they require all the tests to be in a single directory, which, with the number of tests we have now, won't work on Windows. Also, a lot of the CSS2.1 tests apply to various CSS3 modules, and maintaining separate copies of a test for each spec that needs it is a recipe for letting things get out of sync. (The same problem applies to the build scripts: they would need to be copied and modified for each test suite.)
- A new set of build scripts, written by an HP contractor, that is intended to replace the originals and was designed around the idea of using a single set of scripts to build all the test suites. Unfortunately, while the scripts are complete in the sense that they run, they're unfinished in that they're missing key elements and have a number of (mostly unidentified) errors.
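To make the table-of-contents part concrete, here's a rough sketch of that kind of processing: pull the title and the spec links out of each test, then group the tests by the spec section they cite. The metadata element names follow the test format's conventions (the spec links are link rel="help" elements); the directory layout is invented for the example, and the real build scripts do much more, format conversion in particular.

```
# Sketch: build a table of contents that mirrors the spec by grouping tests
# under the spec sections they link to. Paths are hypothetical; the real
# scripts also convert each test into the various destination formats.
import os
from collections import defaultdict
from html.parser import HTMLParser

class MetadataFinder(HTMLParser):
    """Pulls the title and the rel="help" spec links out of a test file."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.help_links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and a.get("rel", "").lower() == "help" and a.get("href"):
            self.help_links.append(a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def build_toc(test_dir):
    """Map each spec section URL to the (filename, title) pairs that cite it."""
    toc = defaultdict(list)
    for name in sorted(os.listdir(test_dir)):
        if not name.endswith((".html", ".xht")):
            continue
        meta = MetadataFinder()
        with open(os.path.join(test_dir, name), encoding="utf-8") as f:
            meta.feed(f.read())
        for section in meta.help_links:
            toc[section].append((name, meta.title.strip()))
    return toc

if __name__ == "__main__":
    for section, tests in sorted(build_toc("css2.1/src").items()):
        print(section)
        for filename, title in tests:
            print("  %s  %s" % (filename, title))
```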
What we're missing:
- Build scripts that work with our current directory structure.
- Build scripts that can build the test suites incrementally, rather than rebuilding everything each time, so that we can run them whenever someone adds or modifies a test and thereby keep a live copy of the test suite available in its output format. (A sketch of the incremental approach follows this list.)
- A way to publish reftests that makes them understandable to someone who has never encountered them before.
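Here's a minimal sketch of what incremental building could look like: regenerate an output file only when its source has changed since the last build. The directory names and the convert step are placeholders; the real scripts would also need to handle support files and multiple output formats.

```
# Sketch of an incremental build step: only regenerate an output file when the
# source is newer than the existing output. Paths and convert() are placeholders.
import os

def needs_rebuild(source, output):
    """True if the output is missing or older than its source."""
    return (not os.path.exists(output)
            or os.path.getmtime(output) < os.path.getmtime(source))

def build_incrementally(src_dir, out_dir, convert):
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        source = os.path.join(src_dir, name)
        output = os.path.join(out_dir, name)
        if os.path.isfile(source) and needs_rebuild(source, output):
            convert(source, output)

if __name__ == "__main__":
    # A trivial "conversion" for demonstration: copy the file unchanged.
    import shutil
    build_incrementally("src", "dist", shutil.copyfile)
```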
Reviewing Tests
The goal here is to make it straightforward and easy for reviewers to review and comment on the tests they're interested in, and to keep track of the required next steps for each test.
What we have:
- A review process and checklist.
- Not much else, other than some experience showing that mailing lists and wikis are not efficient ways to track test reviews.
What we're missing:
- A way to track and search by the status of submitted tests: whether they have been reviewed, need to be reviewed, have errors, need further work, are waiting for spec clarifications, etc. Right now I get questions like "I want to help review some tests, which tests should I review?" and I can't come up with an answer. It would also allow people to see which tests are unreliable, or wrong, and either improve them or skip them when evaluating an implementation.
- An efficient way to submit and track review comments associated with a test.
- Mailing lists make submission easy, but provide no tracking. If the test author is not subscribed and engaged when the comment comes in, it just gets lost in the archives and may or may not be found the next time someone works with that test.
- Bug trackers allow easy tracking, but are difficult for commenters. For one thing, it's not clear how to map problems with a test onto the bug report model. Some of that could perhaps be fixed by heavily customizing the bug tracker's templates, but they're just not designed for a test-reviewing workflow. For example, we've noticed that the same comment often applies to an entire series of tests (or even to all tests submitted by a particular contributor), and a system with one comment form per test is very inefficient for dealing with that. On the flip side, filing a bug per issue rather than a bug per test makes it hard for a contributor to sit down with a particular test, integrate all the feedback, and fix the test.
- An automated way to catch machine-checkable errors. Many format violations are machine-checkable, and having those checked by machine would free up reviewers' time to focus on the aspects of the test that must be human-verified.
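As an illustration of that last point, here's a small sketch of a lint script that checks a few of the format requirements mentioned earlier: a title, a link to the spec, and an assertion. The element names follow the test format's metadata conventions; the exact set of rules a real checker enforces would come from the format documentation, not from this example.

```
# Sketch of a machine check for some basic format requirements. The rules here
# are examples only; a real checker would implement the documented test format.
import re
import sys

def lint(source, filename):
    problems = []
    if not re.search(r"<title[^>]*>\s*\S", source, re.I | re.S):
        problems.append("missing or empty <title>")
    if not re.search(r"<link[^>]*rel=['\"]help['\"]", source, re.I):
        problems.append('no <link rel="help"> pointing at the spec')
    if not re.search(r"<meta[^>]*name=['\"]assert['\"]", source, re.I):
        problems.append('no <meta name="assert"> (optional, but recommended)')
    return ["%s: %s" % (filename, p) for p in problems]

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for problem in lint(f.read(), path):
                print(problem)
```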
Ideally we want it to be:
- Easy for anyone to submit comments on errors they find in the test suite.
- Efficient for a dedicated reviewer to work straight through a list of related tests.
- Straightforward for a test contributor to find all the comments that pertain to his tests, improve them as necessary, and report back the changes to his reviewers for confirmation.
What we're thinking:
Melinda Grant (HP), Eira Monstad, Peter Linss (HP / CSSWG Chair) and I spent some time last year working out the requirements and design for a system. One of the key ideas was to integrate with the Subversion repository so that changes there automatically show up in the test's record in the system. Another idea was to extract and cache the metadata in the tests to make it possible to search for tests on a particular topic.
I started prototyping a system in Python+Django to address these concerns during the W3C test hackathon last September, but did not manage to complete the Subversion integration in that timeframe. If working on the design and implementation (front-end or back-end) of a system like this interests you, drop me a line. :)
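To give a rough idea of the direction (this is illustrative, not the actual prototype), the data model might look something like the sketch below: each test record caches the metadata extracted from the source file along with the Subversion revision it was last seen at, and a review comment can attach to any number of tests, since the same remark so often applies to a whole series of them.

```
# Rough sketch of a Django data model for tracking tests and reviews. The
# model and field names are invented for illustration, not the prototype's.
from django.db import models

class TestCase(models.Model):
    REVIEW_STATUSES = [
        ("unreviewed", "Needs review"),
        ("issues", "Has open issues"),
        ("blocked", "Waiting on spec clarification"),
        ("approved", "Approved"),
    ]
    filename = models.CharField(max_length=255, unique=True)
    # Metadata cached from the test source so tests can be searched by topic.
    title = models.CharField(max_length=255, blank=True)
    assertion = models.TextField(blank=True)
    spec_links = models.TextField(blank=True, help_text="One spec URL per line")
    # Subversion integration: the revision the cached metadata was read from.
    last_seen_revision = models.PositiveIntegerField(default=0)
    review_status = models.CharField(
        max_length=20, choices=REVIEW_STATUSES, default="unreviewed")

class ReviewComment(models.Model):
    # One comment can apply to many tests, because reviewers often have the
    # same remark about an entire series of tests.
    tests = models.ManyToManyField(TestCase, related_name="comments")
    reviewer = models.CharField(max_length=100)
    text = models.TextField()
    created = models.DateTimeField(auto_now_add=True)
    resolved = models.BooleanField(default=False)
```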
Collecting and Reporting Test Results
The goal for collecting is to make it easy and efficient to collect accurate test results for as many implementations as we can, both for the human-verified tests and the reftests.
The goal for reporting is to generate up-to-date reports from the test results that are useful and insightful for implementors, web designers, and W3C.
What we have:
- Templates for some of our smaller test suites, where the tester can hand-code their implementation's pass/fail result into an HTML table.
- A prototype test harness that allows submission of test results for the human-verified tests by simply clicking the Pass/Fail/Skip buttons associated with a test shown on the screen. (It records the UA string and the result; there's a sketch of the data it captures after this list.) A subset of tests can be run and you can stop at any time, so you're not required to run thousands of tests before submitting any results.
- Some rudimentary output capabilities in the prototype test harness for reporting those results.
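For a sense of how little data is actually involved per result, here's a sketch of recording one: the test, the UA string, and the verdict, written to a local SQLite database. The schema and function names are invented for the example.

```
# Minimal sketch of recording a harness result: which test, which UA, and the
# Pass/Fail/Skip verdict. The schema and names are illustrative only.
import sqlite3
from datetime import datetime, timezone

def open_results_db(path="results.db"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS results (
                      test_id    TEXT NOT NULL,
                      user_agent TEXT NOT NULL,
                      verdict    TEXT NOT NULL CHECK (verdict IN
                                  ('pass', 'fail', 'skip')),
                      submitted  TEXT NOT NULL)""")
    return db

def record_result(db, test_id, user_agent, verdict):
    db.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
               (test_id, user_agent, verdict,
                datetime.now(timezone.utc).isoformat()))
    db.commit()

if __name__ == "__main__":
    db = open_results_db()
    record_result(db, "margin-collapse-001", "ExampleBrowser/1.0", "pass")
```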
What we're missing:
- A production-ready system set up to collect test results for the latest version of the CSS2.1 test suite, one that can easily be updated to handle future revisions of the test suite and any upcoming CSS3 modules.
- Designs for useful reports that we could create from the test results, ones that would scale well to thousands of tests and myriad implementations.
- Code that automatically creates those reports from the collected test results.
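As one example of a report we could generate, here's a sketch that computes per-implementation pass rates from collected results, using the most recent verdict recorded for each test and UA. It assumes the SQLite layout from the earlier sketch; a real report would of course need much finer detail, such as per-section breakdowns across implementations.

```
# Sketch of a simple report over collected results: per-UA pass rates, based on
# the most recent verdict for each (test, UA) pair. Assumes the results table
# from the harness sketch above.
import sqlite3
from collections import defaultdict

def pass_rates(db_path="results.db"):
    db = sqlite3.connect(db_path)
    latest = {}  # (test_id, user_agent) -> verdict; newest row wins
    for test_id, ua, verdict, submitted in db.execute(
            "SELECT test_id, user_agent, verdict, submitted FROM results"
            " ORDER BY submitted"):
        latest[(test_id, ua)] = verdict
    totals = defaultdict(lambda: [0, 0])  # ua -> [passed, run]
    for (test_id, ua), verdict in latest.items():
        if verdict == "skip":
            continue
        totals[ua][1] += 1
        if verdict == "pass":
            totals[ua][0] += 1
    return totals

if __name__ == "__main__":
    for ua, (passed, run) in sorted(pass_rates().items()):
        print("%-40s %d / %d passed" % (ua, passed, run))
```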
Get Involved
The CSS Working Group's wiki has more information on CSS testing and how you can get involved.