A Touch of Class

f. rainne
2003-01-26

Why validate your pages?

"Valid" has a very specific meaning in SGML/XML, but I'm using it here to mean "syntactically correct" so I can group XML well-formedness with it.

In his latest blog entry, mpt links to an article where Mark Pilgrim argues that users will choose the most lenient RSS aggregator because to them that's the program that "works best". So, mpt argues, validation is a pointless exercise for the web and a waste of time for web page authors.

However, for a responsible web author, validating pages saves time. It cuts down on the testing you have to do in every significant browser: you don't have to worry about how different browsers interpret invalid markup because you won't have any. When your markup is valid, anything further you layer onto the page--DOM scripting, CSS--will also work more predictably. There are still browser bugs to deal with, of course, but you'll run into fewer discrepancies if your markup tree is correct to begin with. Then, too, the validator reports errors explicitly, making it easier to figure out what went wrong in a given page.

I think any software project manager would say it's a bad idea to rely on proprietary extensions unless you're absolutely sure you're sticking with that vendor for the life of the project. Otherwise, their slight convenience is not worth it; it just costs too much to retrofit the code if (when) you decide to switch.

Sloppy, invalid coding is another form of proprietary extension--one vendor might interpret your mistakes as you intended, but not all will. Some won't even accept your input. If all the files are internal, you can control what software is used to process them. Fine. Now open up those files to the world. You just lost all control over what software reads them.

Some people argue that the most lenient parser would set an industry standard, so coding to that software is good enough. But not all readers would behave the same even if all tried to imitate the quirks of a single leader (which won't necessarily be the case). Their code bases are different, so while the simplest cases may work the same, more complicated ones might not. You still have to test the page in multiple readers to make sure it works for your audience.

Even if you take a really simple page, with no CSS, no JS, nothing except pure HTML, there's no guarantee that it will turn out all right:

H1 header + paragraph text

Browser 1: large-font heading, normal-font paragraph
Browser 2: large-font heading, large-font paragraph

There's nothing maliciously wrong with the code. I just forgot an end tag. (By the way, the second image is IE5. The first one is Mozilla.)
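To make the forgotten end tag concrete, here is a minimal Python sketch of how a checker might catch it. The markup in the usage note is a hypothetical stand-in, not the actual test page, and the names UnclosedTagFinder and find_unclosed are made up for illustration. It uses the standard library's html.parser and treats every non-empty element as needing an end tag, the way XHTML does. That is stricter than HTML 4, where some end tags are optional, so take it as a rough illustration rather than a substitute for a real validator.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img",
                 "input", "link", "meta", "param"}


class UnclosedTagFinder(HTMLParser):
    """Tracks open elements and records any that are not closed in order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # human-readable problem reports

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. <br/>, which has nothing to close
        if tag not in self.open_tags:
            self.problems.append(f"stray </{tag}>")
            return
        # Pop back to the matching start tag; anything popped on the
        # way was left open when this end tag arrived.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"<{top}> not closed before </{tag}>")


def find_unclosed(markup):
    """Return a list of problem descriptions for the given markup."""
    finder = UnclosedTagFinder()
    finder.feed(markup)
    finder.close()
    # Whatever is still on the stack at the end was never closed.
    finder.problems.extend(
        f"<{tag}> never closed" for tag in reversed(finder.open_tags)
    )
    return finder.problems
```

Calling find_unclosed("<h1>Title<p>Text</p>") reports that the h1 was never closed, which is the kind of one-character slip the two browsers above disagree about.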

It's also not guaranteed that the market will squeeze out software with stricter parsers. If enough pages are well-formed, an RSS reader with a clever user interface, a clever marketing team, and a picky XML parser could very well gain significant market share.

Moreover, the current market leader isn't necessarily going to hang on to the lion's share of the market. Apple's new Safari browser uses KHTML (Konqueror's layout engine), which doesn't have the level of quirks handling that Mozilla added to stay compatible with the market leader. Judging by the Mac's browser stats, though, it probably won't have a difficult time capturing a significant share of that market.

Then there's the handheld market. The people working on Mozilla's layout engine have devoted lots of time and code to handling "quirks"--imitating how other browsers misinterpret correct code or handle incorrect code. Granted, some of them won't be applicable to handhelds anyway, but must such needless error handling be stuffed into every cell phone browser?

Validation really doesn't cost that much. I estimate less than five minutes per page, including code-fixing time, if one is doing it by hand. The LazyWeb principle mpt talks about at the beginning of his entry predicts that someone will write or has already written a "find all non-validating pages on my site" tool. If you already know the language you're coding in, validating your site with such a tool will probably take about as much time per page as thinking up a good title for it. It's even possible to schedule a computer to periodically check for and report any problems.
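As a rough idea of how small such a tool can be, here is a Python sketch that walks a directory and lists every page that is not well-formed XML. It is only a sketch under stated assumptions: the function names and the "*.xhtml" file pattern are hypothetical, and it checks well-formedness only (the loose sense of "valid" used in this article), not validity against a DTD. Because the standard library parser does not load the DTD, pages that use named entities such as &nbsp; would also be reported as errors.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_well_formed(text):
    """Return None if text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The message already includes the line and column of the error.
        return str(err)
    return None


def find_broken_pages(root, pattern="*.xhtml"):
    """Map each file under root that is not well-formed to its error.

    The pattern is an assumption; adjust it to however the site
    names its pages.
    """
    broken = {}
    for path in sorted(Path(root).rglob(pattern)):
        error = check_well_formed(path.read_bytes())
        if error is not None:
            broken[str(path)] = error
    return broken
```

Run from a nightly scheduled job, something like find_broken_pages("/path/to/site") could mail its result whenever the returned dictionary is not empty.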

As an author, one's main purpose is to write ideas so that others can read and understand. If ten years from now I'm sitting on a park bench with a pair of viewer glasses and a handheld mouse, I should be able to read an article you wrote today. If two days from now I find your pages with my favorite browser, a missing </table> tag shouldn't force me to choose between switching my browser or leaving your article unread. (It has happened before.) You just spent hours writing it. Will you spend two minutes making sure I can read it?

"A society grows great when old men plant trees whose shade they know they shall never sit in." -- Greek proverb