HTML 5 Comments

Posted by Jack Hsu Tue, 07 Jul 2009 15:51:00 GMT

One change that I think most people are overlooking is the change to HTML comments in HTML 5. If you read the working draft of the HTML 5 spec, you’ll notice that previously valid comment markup may no longer be valid in HTML 5.

Here is the definition from HTML 5:

Comments must start with the four character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS (<!--).

Compared to HTML 4:

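White space is not permitted between the markup declaration open delimiter("<!") and the comment open delimiter ("--"), but is permitted between the comment close delimiter ("--") and the markup declaration close delimiter (">"). A common error is to include a string of hyphens ("---") within a comment. Authors should avoid putting two or more adjacent hyphens inside comments.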

Information that appears between comments has no special meaning (e.g., character references are not interpreted as such).

Note that comments are markup.

The main difference is that having a dash-dash (--) within a comment is no longer acceptable. In HTML 4, a single comment declaration can include any number of opening and closing delimiters (--), which can cause unwanted behaviour for web authors who do not know the comment definition that well.

For example, take a look at this chunk of HTML:

<!-- bad comment -- -->
<p>Hello, World</p>
<!--p>Hide me!</p-->

It seems to show the paragraph "Hello, World" and hide the second "Hide me!" paragraph. However, because the first comment contains two consecutive dashes within its text, the --> no longer closes the entire comment. Rather, the parser treats the second paragraph as part of the comment text, and only when it reaches a later pair of dashes does the initial comment block finally close.
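To make the HTML 4 reading a bit more concrete, here is a minimal Python sketch of how a strict SGML-style parser might pair up the double dashes. The function name declaration_end is made up for illustration, and this is not the code any browser actually runs. It only follows the HTML 4 rule quoted above: a declaration is "<!", then any number of "--" ... "--" comments (with optional white space after each closing "--"), then ">".

```python
def declaration_end(markup):
    """Return the index of the '>' that ends the comment declaration at the
    start of `markup` under HTML 4 (SGML) rules, or None if the declaration
    never ends cleanly. Hypothetical helper, for illustration only."""
    if not markup.startswith("<!--"):
        return None
    i = 2  # index of the "--" that opens the first comment
    while i < len(markup):
        if markup.startswith("--", i):
            close = markup.find("--", i + 2)  # the matching closing "--"
            if close == -1:
                return None  # a comment was opened but never closed
            i = close + 2
            # white space is permitted after a closing "--"
            while i < len(markup) and markup[i].isspace():
                i += 1
        elif markup[i] == ">":
            return i  # the declaration ends here
        else:
            return None  # anything else at this point is malformed
    return None


print(declaration_end("<!-- fine -->"))            # 12
print(declaration_end("<!-- one -- -- two -->"))   # 21
print(declaration_end("<!-- bad comment -- -->"))  # None
```

The third call is the first comment from the example. Its second pair of dashes closes the comment and the third pair opens a new one, so the final ">" sits inside comment text and does not end the declaration.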

In HTML 5, the first comment would be invalid markup. I don’t think any browsers implement this yet, and I’m not sure how invalid comment markup would be handled.
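As a rough way to spot comments that would be invalid under the HTML 5 rule, a check along these lines could be used. This is only a sketch: is_valid_html5_comment is a hypothetical helper, and it covers just the rules discussed in this post (the "<!--" start, the "-->" end, and no "--" in the text). The draft spec has further restrictions that it ignores.

```python
def is_valid_html5_comment(markup):
    """Check one comment against the rules discussed in this post.
    Hypothetical helper; not a full implementation of the spec."""
    if not (markup.startswith("<!--") and markup.endswith("-->")):
        return False
    if len(markup) < 7:
        return False  # "<!--" and "-->" would overlap, e.g. "<!-->"
    text = markup[4:-3]  # everything between "<!--" and "-->"
    return "--" not in text


print(is_valid_html5_comment("<!-- bad comment -- -->"))  # False
print(is_valid_html5_comment("<!--p>Hide me!</p-->"))     # True
```

Run against the two comments from the example, it flags the first one and accepts the second.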

Note: I only saw this behaviour in Firefox, and not in IE or Safari. Going by the HTML 4 definition, it’s actually a bug in IE and Safari that they do not parse the comment properly.