Summary

XHTML should be delivered as application/xhtml+xml. Most modern browsers, with the exception of Internet Explorer 6, support the MIME type application/xhtml+xml. This article demonstrates how to use content negotiation to deliver application/xhtml+xml to user agents that support that MIME type, and text/html to the rest.

Author: Gez Lemon

The Internet Engineering Task Force (IETF)

The Internet Engineering Task Force (IETF) is a large, open international community responsible for the evolution of the Internet architecture, including protocols such as TCP/IP. The standards it produces are published as Requests for Comments (RFCs), which result from committee work and review by interested parties. RFC 1521 and RFC 1522 specified the original types and subtypes for Multipurpose Internet Mail Extensions (MIME); they have since been superseded by RFC 2045 to RFC 2049.

MIME Types

The original Internet e-mail protocol only supported ASCII text. MIME is an extension of the e-mail protocol that allows other types of data, such as video, images, and applications, to be exchanged over the Internet. When you access a document over the Internet, the HTML document, images, style sheets, and any objects all have an associated MIME type. Web servers insert the MIME information into the HTTP headers on each transmission. Web clients, such as browsers, use this information to determine how to handle the data. For example, a MIME type of image/gif informs the user agent to handle the data as an image.

The Internet Assigned Numbers Authority (IANA) maintains a list of registered MIME media types. MIME types are specified in two parts. The top-level media type declares the general type of media, and the subtype defines the specific format for that media. The two are separated by a forward slash: top-level/subtype. For example, image/gif has a top-level media type of image, and the specific format is GIF. There are five discrete top-level media types: text, image, audio, video, and application. There are also two composite media types: multipart and message. Experimental or unofficial MIME types are denoted with a subtype that starts with "x-". For example, application/x-shockwave-flash is an unofficial MIME type that instructs user agents that recognise it to use a Flash Player to handle the data.
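The two-part structure described above can be illustrated with a short Python sketch. The function name split_mime_type and the keys of the dictionary it returns are hypothetical, chosen only for this illustration; the sketch ignores parameters such as charset.

```python
def split_mime_type(mime_type):
    """Split 'top-level/subtype' into its two parts and flag 'x-' subtypes."""
    top_level, _, subtype = mime_type.strip().lower().partition("/")
    if not top_level or not subtype:
        raise ValueError("expected top-level/subtype, got %r" % mime_type)
    return {
        "top_level": top_level,
        "subtype": subtype,
        # Subtypes that start with "x-" are experimental or unofficial.
        "unofficial": subtype.startswith("x-"),
    }


print(split_mime_type("image/gif"))
print(split_mime_type("application/x-shockwave-flash"))
```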

Visual browsers can handle a range of standard MIME types, such as HTML (text/html), JPEG (image/jpeg), and GIF (image/gif). XHTML is the reformulation of HTML as an XML application. As such, XHTML documents must conform to the rules of XML and be well-formed. Well-formed means that elements cannot overlap, every element must be closed, attributes must be quoted and given a value, and element and attribute names are case-sensitive.

User agents that handle text/html do so in a forgiving manner. For example, every major user agent would display the following, even though it is malformed as XHTML: the checked attribute has no value, and the element is never closed.

<input type="checkbox" name="option1" checked>
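To see the difference an XML parser makes, the following Python sketch feeds the checkbox example above to the standard library's XML parser, alongside a well-formed XHTML equivalent. The helper name is_well_formed and the two variable names are hypothetical. The sketch only checks well-formedness and does not validate against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The HTML-style markup: the attribute has no value and the element is not closed.
html_style = '<input type="checkbox" name="option1" checked>'
# The XHTML-style equivalent: the attribute has a value and the element is closed.
xhtml_style = '<input type="checkbox" name="option1" checked="checked" />'

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```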

XHTML MIME Types

Two RFCs were published for handling XML documents: RFC 3023 (XML Media Types) and RFC 3236 (application/xhtml+xml). This resulted in four possibilities for specifying a MIME type for XHTML documents: application/xhtml+xml, application/xml, text/xml, and text/html. Care should be taken when serving as text/xml, as the character set rules for text/* are more complex than those for application/*, and you may get unexpected results. The MIME type application/xml is a generic media type for any XML document, so it is plausible to serve an XHTML document with this MIME type. However, generic XML processors may not recognise the document as XHTML, and may not render the content as you intended. The text/html MIME type (RFC 2854) is intended for HTML, and is not suitable for XHTML. When an XHTML document is served as text/html, the user agent will not process it as XML.

The preferred MIME type to use with XHTML documents is application/xhtml+xml. When a document is served with this MIME type, XHTML-compliant user agents must ensure that it is well-formed; that is, that it complies with the rules of XML. For example, if you serve the above code to Mozilla using application/xhtml+xml, the page will not display, as it isn't well-formed.

MIME Types and User Agents

So that's that. All XHTML files should be served with a MIME type of application/xhtml+xml, and everyone's happy. Well, that's not quite the whole story. Unfortunately, some browsers do not understand the application/xhtml+xml MIME type. Internet Explorer 6, the most widely used browser at the time of writing, falls into this category. If you serve an XHTML document with a MIME type of application/xhtml+xml, Internet Explorer will prompt you to download the file, because it doesn't know how to handle the file. That's quite a serious issue, and one that stops many developers using the correct MIME type.

However, other browsers such as Netscape, Mozilla, and Opera do understand the MIME type, and are able to handle the document correctly. Compatibility issues usually improve over time, but with the announcement that Microsoft no longer intends to provide free stand-alone versions of Internet Explorer, this particular compatibility issue may be with us for a long time yet.

Content Negotiation

A solution to the compatibility issue is to use content negotiation to serve application/xhtml+xml to user agents that understand that MIME type, and text/html to other user agents. When a user agent requests a document from the server, it sends an Accept HTTP header listing the MIME types it supports, each with an optional quality parameter indicating how well it understands that MIME type. The server may be configured to reply with the version of the resource that is most suitable for the particular user agent. Whilst XHTML 1.0 may be served as text/html, it should be served as application/xhtml+xml to user agents that understand it.

The Apache documentation explains how to configure content negotiation on an Apache HTTP server. Some user agents send incomplete Accept headers, making it difficult to determine which version to serve. To cater for this, it's sensible to lower the quality of source parameter (qs) a little for application/xhtml+xml when using the AddType directive with Apache, so that text/html remains the preferred MIME type.

# AddType requires at least one file extension; .xhtml is an assumed example.
AddType application/xhtml+xml;qs=0.8 .xhtml

Setting the MIME Type with Code

It is not sufficient to set the content type through the meta element in the head of the document. User agents receive the MIME type from HTTP headers set on the server. If for whatever reason you're unable to configure the server for content negotiation, you will have to resort to scripting to determine which MIME type to serve the document with. The principle is the same as above: you read the HTTP Accept header, and set the MIME type depending on the capabilities of the user agent.

The following is typical of what may be specified in the HTTP Accept header.

text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, video/x-mng, image/png, image/jpeg, image/gif;q=0.2, text/css, */*;q=0.1

The quality parameter (q) indicates how well the user agent handles the MIME type. A value of 1 indicates the MIME type is understood perfectly, and a value of 0 indicates the MIME type isn't understood at all. A MIME type listed without a quality parameter defaults to 1. The image/gif MIME type has a quality parameter of 0.2 to indicate that PNG is preferred over GIF if the server is using content negotiation to deliver either a PNG or a GIF to user agents. Similarly, the text/html quality parameter has been lowered a little, to ensure that the XML MIME types are given preference if content negotiation is being used to serve an XHTML document.
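As a rough illustration of how the quality parameter can be read on the server, here is a minimal Python sketch that parses an Accept header into MIME types and q values. The function names parse_accept and accepts_xhtml are hypothetical. The sketch assumes that a MIME type without a q parameter has a quality of 1, and it only honours explicit entries for application/xhtml+xml, ignoring wildcards such as */*.

```python
def parse_accept(header):
    """Return a dict mapping each media range in an Accept header to its q value."""
    preferences = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue  # skip empty items, e.g. from an empty header
        quality = 1.0  # a media range without a q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as not acceptable
        preferences[media_range] = quality
    return preferences


def accepts_xhtml(header):
    """True if application/xhtml+xml is listed explicitly with a q value above 0."""
    return parse_accept(header).get("application/xhtml+xml", 0.0) > 0.0


example = ("text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, "
           "text/plain;q=0.8, video/x-mng, image/png, image/jpeg, image/gif;q=0.2, "
           "text/css, */*;q=0.1")
prefs = parse_accept(example)
print(prefs["application/xhtml+xml"], prefs["text/html"], prefs["image/gif"])  # 1.0 0.9 0.2
print(accepts_xhtml(example))                                  # True
print(accepts_xhtml("text/html, application/xhtml+xml;q=0"))  # False
```

The last call shows why reading the q value can matter: a plain substring search would find application/xhtml+xml in that header, even though a q value of 0 says the user agent does not accept it.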

Setting the MIME Type with PHP

In PHP, the MIME type is set through the header function. The $_SERVER array contains the server variables, allowing us to interrogate the Accept HTTP header.

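// Tell caches that the response depends on the Accept request header.
header("Vary: Accept");
// Some user agents send no Accept header at all, so default to an empty string.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
if (stripos($accept, "application/xhtml+xml") !== false) {
    header("Content-Type: application/xhtml+xml; charset=utf-8");
} else {
    header("Content-Type: text/html; charset=utf-8");
}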

Setting the MIME Type with ASP

In ASP, the content type and the charset are specified separately through the Response object. The ServerVariables collection allows us to interrogate the Accept HTTP header.

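' Tell caches that the response depends on the Accept request header.
Response.AddHeader "Vary", "Accept"

' vbTextCompare makes the search case-insensitive.
If InStr(1, Request.ServerVariables("HTTP_ACCEPT"), "application/xhtml+xml", vbTextCompare) > 0 Then
    Response.ContentType = "application/xhtml+xml"
Else
    Response.ContentType = "text/html"
End If

Response.Charset = "utf-8"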

Setting the MIME Type with Perl

In Perl, the MIME type is set by writing it out at the start of the page, before any other content. The %ENV hash contains the environment variables, allowing us to interrogate the Accept HTTP header.

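# Some user agents send no Accept header at all, so default to an empty string.
my $accept = defined $ENV{'HTTP_ACCEPT'} ? $ENV{'HTTP_ACCEPT'} : '';
# Tell caches that the response depends on the Accept request header.
print "Vary: Accept\n";
if ($accept =~ /application\/xhtml\+xml/i)
{
    print "Content-Type: application/xhtml+xml; charset=utf-8\n\n";
}
else
{
    print "Content-Type: text/html; charset=utf-8\n\n";
}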

Conclusion

According to W3C guidelines, XHTML 1.1 should not be served with a MIME type of text/html. "Should not" is not as serious as "must not", so for the time being, many content developers are overlooking this particular recommendation. It is still clear that XHTML 1.1 should be served with a MIME type of application/xhtml+xml. The techniques outlined above can easily be extended to serve text/html with a DOCTYPE of HTML 4.01 Strict to user agents that don't understand application/xhtml+xml, and application/xhtml+xml with a DOCTYPE of XHTML 1.0 Strict to those that do. This page is served as HTML to user agents that don't understand application/xhtml+xml, and XHTML to those that do. See Tommy Olsson's article on content negotiation for a detailed explanation of how to do this with PHP.
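The extension mentioned above, choosing a DOCTYPE along with the MIME type, might look something like the following Python sketch. It is not the method from Tommy Olsson's article. The function name choose_variant and the constant names are hypothetical, and the sketch uses the same simple substring test as the PHP, ASP, and Perl examples. The body markup would also have to match the chosen DOCTYPE, which is not shown here.

```python
XHTML_DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                 '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
HTML_DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                '"http://www.w3.org/TR/html4/strict.dtd">')


def choose_variant(accept_header):
    """Pick the response headers and the DOCTYPE for one request."""
    # A missing Accept header is treated as an empty one.
    wants_xhtml = "application/xhtml+xml" in (accept_header or "").lower()
    if wants_xhtml:
        content_type, doctype = "application/xhtml+xml; charset=utf-8", XHTML_DOCTYPE
    else:
        content_type, doctype = "text/html; charset=utf-8", HTML_DOCTYPE
    # Vary tells caches that the response depends on the Accept request header.
    headers = [("Vary", "Accept"), ("Content-Type", content_type)]
    return headers, doctype


headers, doctype = choose_variant("text/html, application/xhtml+xml")
print(headers)
print(doctype)
```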

Category: Web Standards.

Comments

  1.

    Googling around, I came across your site.
    Impressive, as far as MIME history etc. is concerned. But I miss a discussion of the extension problem for helper applications with hardcoded extensions. It is easy with Opera, lynx, and Netscape up to Netscape 7, but I don't know of any solution with Mozilla and the like, including Netscape 8.

    In Opera etc., I get the extension that is in mime.types (or similar menus), also in the browser cache. In Mozilla etc., I get this extension on the desktop (save as...), where I don't need it, but not in the browser cache, where applications see it. For example, bla.php, which applications that come with hardcoded extensions would not understand.

    Not an XML problem, but a MIME one,

    anyway, best,

    H.

    Posted by Heiko Recktenwald on
