Creating the Future of CSS Testing (2011 Edition)
In an earlier article I explained the role of CSS2.1 and how powerful and useful a CSS2.1 test suite could be for the Web.
In this article I'm going to explain the current situation in CSS testing at W3C, what my goals are for each aspect of the testing process, and what's missing to get us there. It's an update of my 2009 edition.
Submitting Tests
The goal here is to allow multiple people to submit and update tests in ways that are straightforward and easy for them, and to store the tests in a way that both works for W3C and is compatible with what our contributors want to do with the tests elsewhere.
We currently have:
- A standard format for tests that includes the metadata we need to index the tests and generate useful reports. (A sketch of a test in this format follows this list.)
- A Subversion server to collect tests and track their development.
- Submissions from HP, Microsoft, Mozilla, and individual volunteers.
- A volunteer (Gérard Talbot) who is dedicated to collecting tests from the web authoring community.
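For anyone who hasn't seen it, here is roughly what a test in that format looks like. The property, spec link, and assertion below are invented for illustration; the authoritative requirements live in the CSSWG's own format documentation, not here.

```html
<!DOCTYPE html>
<html>
 <head>
  <!-- The title appears in the generated table of contents -->
  <title>CSS Test: margins specified as percentages</title>
  <link rel="author" title="Example Author" href="mailto:author@example.com">
  <!-- "help" links identify the spec section(s) under test -->
  <link rel="help" href="http://www.w3.org/TR/CSS21/box.html#margin-properties">
  <!-- "flags" note special requirements (special fonts, user interaction, ...) -->
  <meta name="flags" content="">
  <!-- The assertion: the statement that must hold for the test to pass -->
  <meta name="assert" content="Percentage margins are calculated with respect to the width of the containing block.">
  <style type="text/css">
   /* styles exercising the tested behavior go here */
  </style>
 </head>
 <body>
  <p>Test passes if ... (self-describing pass condition goes here).</p>
  <!-- test content goes here -->
 </body>
</html>
```

The build scripts read the title, help links, flags, and assertion to index each test and slot it into the right place in the published suite.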
What we're missing:
- A friendlier way to submit tests. Subversion has a bit of a learning curve; a WebDAV+Subversion setup might be easier for contributors if it is capable of transparently handling file moves and renames. A customized Web interface to SVN might be another option.
- Increased participation from Opera's and WebKit's quality assurance teams: they are already writing tests for internal use, but their efforts are totally disconnected from W3C, which means they don't benefit the Web community as a whole. Establishing collaboration at the CSS2.1 level would also carry over into CSS3 modules, where the availability of their tests would allow faster adoption of new CSS3 technologies.
- Awareness of W3C test formats at Mozilla and sync-up between W3C and Mozilla's test repositories. (Mozilla contributors already write tests in reftest format.)
- Best practices and examples for converting tests to the self-describing reftest format. The reftest format allows automated, script-driven testing of many (though not all) CSS features, which makes test result reporting much faster. Self-describing tests are easy for a human to analyze and can catch reftest false positives when both the test and its reference fail in the same way. (A minimal reftest pair is sketched after this list.)
- Reftestification: For historical reasons, most of the tests in the CSS2.1 test suite are self-describing tests that are not reftests. Converting these tests to the reftest format would make it easy for browser teams to run them as part of their usual QA process, and to keep their results in the W3C database up to date. (Self-describing reftests are the official format for CSS3 tests.)
- Automatic builds of the test suite, so that contributors can see their work in between snapshot publications.
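To make the reftest idea concrete, here is a rough sketch of a self-describing reftest pair; the filenames, spec link, and styling are invented for illustration. The test points at its reference with a rel="match" link (a manifest-based setup like Mozilla's reftest.list expresses the same pairing), and both pages state the expected rendering in prose so a human can still judge them.

```html
<!-- background-color-001.htm: the test -->
<!DOCTYPE html>
<html>
 <head>
  <title>CSS Test: background-color with a named color</title>
  <link rel="author" title="Example Author" href="mailto:author@example.com">
  <link rel="help" href="http://www.w3.org/TR/CSS21/colors.html#background-properties">
  <!-- The reference page that must render identically to this one -->
  <link rel="match" href="background-color-001-ref.htm">
  <meta name="assert" content="The background-color property paints the background of the element's box.">
  <style type="text/css">
   div { background-color: green; height: 100px; width: 100px; }
  </style>
 </head>
 <body>
  <p>Test passes if there is a green square below.</p>
  <div></div>
 </body>
</html>

<!-- background-color-001-ref.htm: the reference, built without the tested property -->
<!DOCTYPE html>
<html>
 <head>
  <title>CSS Reference File</title>
  <link rel="author" title="Example Author" href="mailto:author@example.com">
 </head>
 <body>
  <p>Test passes if there is a green square below.</p>
  <img src="green-square-100x100.png" alt="green square">
 </body>
</html>
```

An automated harness renders both pages and compares the output, while a human running the test can still just read the pass condition. Because the reference deliberately avoids the property under test, a bug is unlikely to break both pages identically, and the self-describing prose catches it when that does happen.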
Publishing Tests
The goal here is to present the tests for each module on an official W3C server in the formats needed to run on all CSS implementations; to have the tests indexed so that it is easy for a developer to find the tests he is interested in running; and to have a coverage report that shows which parts of the spec are tested and which tests are missing.
What we have:
- A place to host tests.
- An official test format that requires several bits of information:
  - Links to the specific sections of the spec(s) being tested.
  - The title of the test, intended for a table of contents.
  - Optionally, an assertion stating what, specifically, must be true for the test to pass.
- A set of build scripts that can pull both reftests and self-describing tests from multiple directories, index their metadata, convert them to XHTML and HTML output, and merge them into a single test suite publication.
What we're missing:
- A more understandable human index, since the design of the current one is a little obscure.
- Database-driven test searches.
- An easy way to find the source file of a given generated test.
- A publication system that can gracefully handle intersecting test suites. (For example, many, but not all, CSS2.1 tests are also CSS3 Backgrounds tests, and many, but not all, CSS3 Backgrounds tests are also CSS2.1 tests; a snippet showing how a single test ends up in both suites follows below.)
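To illustrate that overlap: the head of such a test simply carries help links into both specs, and the build system then has to place the one source file into each suite it belongs to without forking it. The anchors below are illustrative.

```html
<!-- One test, claimed by both the CSS2.1 suite and the CSS3 Backgrounds and Borders suite -->
<link rel="help" href="http://www.w3.org/TR/CSS21/colors.html#propdef-background-color">
<link rel="help" href="http://www.w3.org/TR/css3-background/#the-background-color">
```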
Reviewing Tests
The goal here is to make it straightforward and easy for reviewers to review and comment on the tests they're interested in reviewing and to keep track of the required next steps for each test.
What we have:
- A review process and checklist.
- Not much else, other than some experience showing that mailing lists and wikis are not efficient ways to track test reviews.
What we're missing:
- A way to track and search by the status of submitted tests: whether they have been reviewed, need to be reviewed, have errors, need further work, are waiting for spec clarifications, etc. Right now I get questions like "I want to help review some tests; which tests should I review?" and I can't come up with an answer. It would also let people see which tests are unreliable or wrong, and either improve them or skip them when evaluating an implementation.
- An efficient way to submit and track review comments associated with a test.
  - Mailing lists allow easy submission, but can't track comments. If the test author is not subscribed and engaged when the comment comes in, it just gets lost in the archives and may or may not get found the next time someone works with that test.
  - Bug trackers allow easy tracking, but are difficult for commenters. For one thing, it's not clear how to map problems with a test into the bug report model. Some of that can maybe be fixed by heavily customizing the bug tracker's templates, but they're just not designed for a test-reviewing workflow. For example, we've noticed that the same comment often applies to an entire series of tests (or even to all tests submitted by a particular contributor), and a system with one comment form per test is very inefficient for dealing with that. On the flip side, filing a bug per issue rather than a bug per test makes it hard for a contributor to sit down with a particular test, integrate all the feedback, and fix the test.
- An automated way to catch machine-checkable errors. Many format violations are machine-checkable, and having those checked by machine would free up reviewers' time to focus on the aspects of the test that must be human-verified.
Ideally we want it to be:
- Easy for anyone to submit comments on errors they find in the test suite.
- Efficient for a dedicated reviewer to work straight through a list of related tests.
- Straightforward for a test contributor to find all the comments that pertain to his tests, improve the tests as necessary, and report the changes back to his reviewers for confirmation.
What we're thinking:
Melinda Grant (HP), Eira Monstad, Peter Linss (HP / CSSWG Chair), and I spent some time working out the requirements and design for such a system. One of the key ideas was to integrate it with the Subversion repository so that changes there automatically show up in the test's record in the system. Another was to extract and cache the metadata in the tests to make it possible to search for tests on a particular topic.
We have a half-finished prototype of such a system, but it hasn't been getting much priority.
Collecting and Reporting Test Results
The goal for collecting is to make it easy and efficient to collect accurate test results for as many implementations as we can, both for the human-verified tests and the reftests.
The goal for reporting is to generate up-to-date reports from the test results that are useful and insightful for implementors, web designers, and W3C.
What we have:
- A test harness that allows submission of test results for the human-verified tests by simply clicking the Pass/Fail/Skip buttons associated with a test shown on the screen. (It records the UA string and the result.) A subset of tests can be run and you can stop at any time, so it's not required to run thousands of tests before submitting any results.
- Some rudimentary output capabilities in the test harness for reporting those results.
What we're missing:
- Designs for useful reports that we could create from the test results and that would scale well to thousands of tests and myriad implementations. Peter Linss has created the bare minimum the CSSWG needs to prepare a spec for REC, but there is a lot of useful data in the databases that currently isn't being put to work for either browser developers or web designers.
- Code that automatically creates those reports from the collected test results.
- Easy and straightforward ways to generate and incorporate scripted test results into the harness result data.
Get Involved
The CSS Working Group's wiki has more information on CSS testing and how you can get involved.