This is intended as a high-level overview of how you can
get MOSS to validate against the W3C XHTML 1.0 recommendation. The
aim of this article is not to explain every intricate
detail of getting a MOSS site to be XHTML compliant. However, it
should demonstrate some techniques to get people started and
help them eventually develop better methods of achieving compliance.
Firstly, it should be noted that this really only applies to
public-facing publishing sites. In other words, we're talking about
using just the WCM features of MOSS. You are not going to get your
fully featured intranet to conform in the current releases of
SharePoint, because SharePoint generates a lot of markup that does
not pass W3C validation. This is largely due to the richness
of many SharePoint features like web parts.
Some of the techniques mentioned in this article are equally
applicable to standard ASP.NET sites, so if you're totally
unfamiliar with XHTML validation it may be worth having a look at
the Microsoft technical article "Building
ASP.NET 2.0 Web Sites Using Web Standards". It is also likely
that some of these techniques will be unsupported by Microsoft, so
use them at your own risk.
Web.Config
The first step is to add the conformance configuration setting
to your web.config file. This will force certain ASP.NET controls
to only output attributes that comply with XHTML standards.
<xhtmlConformance mode="Strict" />
This is only really important if you are trying to conform to
the strict standard. If the setting is omitted, the default output
for ASP.NET controls is XHTML transitional. I prefer to add the tag
anyway to make it explicit. Unfortunately, many SharePoint
controls are still going to output non-compliant tags.
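
For orientation, here is a minimal sketch of where the setting sits. In
ASP.NET the xhtmlConformance element is a child of system.web. A
SharePoint web.config already has that section, so only the one line
would be added; the surrounding elements are shown for context only.

```xml
<configuration>
  <system.web>
    <!-- Accepted values are Legacy, Transitional (the default) and Strict -->
    <xhtmlConformance mode="Strict" />
  </system.web>
</configuration>
```
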
Master Page Basics
The master page is where you are going to do the majority of the
validation work. It really pays to start fresh with the Microsoft
minimal master page. You are going to cause yourself a serious
amount of pain if you try to tweak one of the out-of-the-box
master pages. It is also best to start off using a page that is
based on a blank page layout. This will help identify where any
validation errors are actually coming from. The first thing to do
with the master page is set the doctype to reflect the
xhtmlConformance setting. For example, if I'm using the strict
standard my doctype will be:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" >
You can also add a few XHTML attributes to the html tag:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
At this stage it is worth performing a W3C validation check to get a
feel for the types of errors that need to be fixed.
My initial check yielded a horrifying 232 validation errors, with
no content or navigation. Did I say this was going to be
easy?
If you scroll through some of the errors you should notice that
most of them are to do with invalid attributes. The majority of the
errors come from only a few controls; namely the site actions menu
and publishing console (referred to as Authoring Controls from here
on).
Authoring Controls
These controls aren't as big a problem as they first appear to
be. We only really care about W3C compliance for anonymous (public)
users. As the authoring console is security trimmed, its HTML won't
be rendered at all when a normal public user accesses the site.
You can prove this by
enabling anonymous access, logging out and then viewing
the page anonymously. Make sure you have checked in and published
the master page and page layout or you won't be able to see the
changes. The page will only display the welcome control showing
the 'Sign In' message. A quick validation check should show only a
dozen or so validation errors. About half of these errors will have
something to do with the site actions control. Although it has been
removed from the page visually, it is still rendering some HTML.
There is an easy solution to this problem: the
SPSecurityTrimmedControl. This control allows blocks of content to
render only when the user has a specified permission set.
By wrapping the site action control inside the security trimming
control and setting a required permission, the site action control
is prevented from rendering any HTML at all.
<SharePoint:SPSecurityTrimmedControl PermissionsString="AddAndCustomizePages" runat="server">
    <PublishingSiteAction:SiteActionMenu runat="server"/>
</SharePoint:SPSecurityTrimmedControl>
Remove/clean non-compliant HTML
On closer examination of the remaining validation
errors it becomes obvious that most of them are to do with poorly
declared script tags. Unfortunately these script tags are usually
generated by the HtmlForm control, so it's not easy to override the
output. One technique that can be applied is to override the
Render method and do a bit of tag cleaning. This effectively
lets us hijack the HTML rendering process and gives us a chance to
add, modify or remove parts. It does involve writing a bit of inline
code in the master page. Assuming that you have set your master
page to allow inline code, we can add some code similar to the
following in the master page.
<script type="text/c#" runat="server">
    protected override void Render(HtmlTextWriter writer)
    {
        // extract all html
        System.IO.StringWriter str = new System.IO.StringWriter();
        HtmlTextWriter wrt = new HtmlTextWriter(str);
        // render html
        base.Render(wrt);
        wrt.Close();
        string html = str.ToString();
        // find all script tags
        Regex scriptRegex = new Regex("<script[^>]*");
        MatchCollection scriptMatches = scriptRegex.Matches(html);
        // go through matches in reverse
        for (int i = scriptMatches.Count - 1; i >= 0; i--)
        {
            // identify script tags with no type attribute
            if (scriptMatches[i].ToString().IndexOf("type") < 0)
            {
                // add type attribute after script opening tag
                html = html.Insert(scriptMatches[i].Index + 7, " type=\"text/javascript\"");
            }
        }
        // write the 'clean' html to the page
        writer.Write(html);
    }
</script>
This code block uses some regex matching to find all script tags
and add a type attribute to any tags that don't already have one.
It is only a partial solution to the script tag problem; more code
will need to be written to completely clean the tags.
Note that this code is crude and untested, but it should give
you an idea of what can be done. It would be prudent to keep this
kind of code to a bare minimum, so that performance is not
affected. In other words you should only be using this method when
you have no other option.
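
As one possible refinement, the tag cleaning can be pulled out of the
master page into a small helper class, which makes it easier to test
outside SharePoint. The sketch below is an assumption-laden
illustration, not a finished cleaner: the ScriptTagCleaner class and
its AddMissingTypeAttributes method are hypothetical names. It looks
for an actual type attribute rather than any occurrence of the text
"type", so a tag such as script src="prototype.js" still gets its
attribute added. It can still be fooled by "type=" appearing after a
space or quote inside an attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of SharePoint or ASP.NET.
public static class ScriptTagCleaner
{
    // Opening script tag: group 1 is the tag name as written,
    // group 2 is the raw attribute text (possibly empty).
    private static readonly Regex ScriptOpenTag =
        new Regex(@"<(script)\b([^>]*)>", RegexOptions.IgnoreCase);

    // A type attribute that is not the tail of a longer name
    // such as data-type or mimetype.
    private static readonly Regex TypeAttribute =
        new Regex(@"(?<![\w-])type\s*=", RegexOptions.IgnoreCase);

    public static string AddMissingTypeAttributes(string html)
    {
        return ScriptOpenTag.Replace(html, delegate(Match m)
        {
            string tagName = m.Groups[1].Value;
            string attributes = m.Groups[2].Value;

            // Leave tags that already declare a type untouched.
            if (TypeAttribute.IsMatch(attributes))
            {
                return m.Value;
            }

            // Otherwise put the type attribute first, ahead of the rest.
            return "<" + tagName + " type=\"text/javascript\"" + attributes + ">";
        });
    }
}
```

Inside the Render override, the loop over the matches would then
reduce to writer.Write(ScriptTagCleaner.AddMissingTypeAttributes(html)).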
Meta Tags
You could extend the render code sample to remove all
of your non-compliant code if you want. I would suggest that this
is not the best way of doing things. Another technique is to
replace standard SharePoint controls with your own custom built
controls. A simple example of where this can be done is with the
RobotsMetaTag. Using Lutz Roeder's
Reflector I was able to extract the following (simplified) code
for the RobotsMetaTag control:
public class RobotsMetaTag : SPControl
{
    protected override void Render(HtmlTextWriter output)
    {
        if (!SPControl.GetContextWeb(this.Context).ASPXPageIndexed)
        {
            output.Write("<META NAME=\"ROBOTS\" CONTENT=\"NOHTMLINDEX\"/>");
        }
    }
}
We can see that the META element name and its NAME and CONTENT
attribute names all violate the XHTML rule that element and
attribute names must be lower case. The
offending line can be rewritten in a custom control as:
output.Write("<meta name=\"ROBOTS\" content=\"NOHTMLINDEX\"/>");
Now deploy the custom control to your SharePoint environment and
add it to the master page in place of the RobotsMetaTag. Two more
validation errors out of the way. I recommend using this approach
wherever possible.
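
Put together, the replacement control might look something like the
sketch below. The namespace and the XhtmlRobotsMetaTag class name are
hypothetical, and the code assumes a project that references the
SharePoint assembly containing SPControl. It has not been tested
against a live site.

```csharp
using System.Web.UI;
using Microsoft.SharePoint.WebControls;

namespace Example.WebControls // hypothetical namespace
{
    // Hypothetical stand-in for the built-in RobotsMetaTag control.
    // Same logic as the decompiled original, with lower case markup.
    public class XhtmlRobotsMetaTag : SPControl
    {
        protected override void Render(HtmlTextWriter output)
        {
            // Only emit the tag when the site is excluded from indexing.
            if (!SPControl.GetContextWeb(this.Context).ASPXPageIndexed)
            {
                output.Write("<meta name=\"ROBOTS\" content=\"NOHTMLINDEX\" />");
            }
        }
    }
}
```

The assembly would normally also need to be registered as a safe
control in web.config and referenced from the master page with a
Register directive before the control can be used there.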
Content
Between the security trimming control, render code and custom
control methods it is now possible to fix all validation errors and
eventually get a conforming web page. Great, we have just made a
virtually blank web page comply. The next step is to add all the
content and navigation back in. To achieve total compliance it will
be necessary to create a custom control for many of the components
on each page. The trick here is to make these controls as re-usable
as possible.
The majority of the content in your site will come from page
layouts and thankfully most of the field controls output XHTML
compliant code out-of-the-box. When you come across a control that
doesn't comply, again, you will need to make
your own.
When it comes to navigation you have a couple of options: either
create custom navigation controls from scratch, or extend the
SharePoint AspMenu control, as the source code has now been
released for this control.
Web parts
At first I thought web parts were going to be no trouble when it
came to W3C validation. Most of the web parts I use have an XSL
source editor, so all that is involved is to make sure that the XML
is transformed into valid HTML, right? Wrong. Unfortunately, web
parts such as the Data View Web Part use some surrounding tables
for layout. This is intrinsically bad practice for web development
and accessibility, but worse is that the tables use all kinds of
non-standard attributes. The only solution that I have found to
this is to override the render method as explained earlier. I
recommend keeping web part use to a minimum. Web part zones should
definitely not be used; they introduce a large number of layout
tables and a lot of non-compliant HTML.
Summary
Hopefully this gives you a bit of a starting point in getting a
SharePoint WCM site XHTML compliant. It is definitely not a simple
task; however, if you are doing a lot of SharePoint development you
are bound to come across a project that requires it (e.g.
government sector work). I believe that Microsoft is quite aware of
these compliance issues and they will hopefully be addressed in the
next release, if not in a service pack. I plan to maintain this
document over time as I discover better techniques for dealing with
the various validation issues that SharePoint presents. Good luck
in getting your site to validate. You can look forward to seeing
the validator's success message as the reward for your hard work.
