Guide to making SharePoint XHTML Compliant

This is intended as a high-level overview of how you can get MOSS to validate against the W3C XHTML 1.0 recommendation. The aim of this article is not to explain every intricate detail of making a MOSS site XHTML compliant. However, it should demonstrate some techniques to get people started and, eventually, help them develop better methods of achieving compliance.

Firstly, it should be noted that this is really just for public-facing publishing sites. In other words, we're talking about using just the WCM (web content management) features of MOSS. You are not going to get your fully featured intranet to conform in the current releases of SharePoint, because SharePoint generates a lot of markup that does not pass W3C validation. This is largely due to the richness of many SharePoint features, such as web parts.
Some of the techniques mentioned in this article are equally applicable to standard ASP.NET sites. So if you're totally unfamiliar with XHTML validation, it may be worth having a look at the Microsoft technical article Building ASP.NET 2.0 Web Sites Using Web Standards. It is also likely that some of these techniques will be unsupported by Microsoft, so use them at your own risk.

Web.Config

The first step is to add the conformance configuration setting to your web.config file. This forces certain ASP.NET controls to output only attributes that comply with XHTML standards.

<xhtmlConformance mode="Strict" />

This is only really important if you are trying to conform to the strict standard. If the setting is omitted, the default output for ASP.NET controls is XHTML transitional. I prefer to add the tag anyway to make it explicit. Unfortunately, many SharePoint controls are still going to output non-compliant tags.
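For orientation, here is a minimal sketch of where the element sits in web.config. It belongs inside the system.web section; the comment stands in for whatever settings your file already contains, which are assumed to stay unchanged.

```xml
<configuration>
  <system.web>
    <!-- existing system.web settings stay as they are -->
    <!-- accepted modes are Legacy, Transitional and Strict -->
    <xhtmlConformance mode="Strict" />
  </system.web>
</configuration>
```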

Master Page Basics

The master page is where you are going to do the majority of the validation work. It really pays to start fresh with the Microsoft minimal master page; you are going to cause yourself a serious amount of pain if you try to tweak one of the out-of-the-box master pages. It is also best to start off using a page that is based on a blank page layout, as this will help identify where any validation errors are actually coming from. The first thing to do with the master page is set the doctype to reflect the xhtmlConformance setting. For example, if I'm using the strict standard my doctype will be:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" >

You can also add a few XHTML attributes to the html tag:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

At this stage it is worth performing a W3C validation check to get a feel for the types of errors that need to be fixed.
My initial check yielded a horrifying 232 validation errors, with no content or navigation. Did I say this was going to be easy?
If you scroll through the errors you should notice that most of them concern invalid attributes. The majority come from only a few controls, namely the site actions menu and the publishing console (referred to as Authoring Controls from here on).

Authoring Controls

These controls aren't as big a problem as they first appear to be. We only really care about W3C compliance for anonymous (public) users. Because the authoring console is security trimmed, the authoring HTML won't be rendered at all when a normal public user accesses the site. You can prove this by enabling anonymous access, logging out and then viewing the page anonymously. Make sure you have checked in and published the master page and page layout, or you won't be able to see the changes. The page will display only the welcome control with its 'Sign In' message. A quick validation check should show only a dozen or so validation errors. About half of these errors will have something to do with the site actions control: although it has been removed from the page visually, it is still rendering some HTML. There is an easy solution to this problem: the SPSecurityTrimmedControl. This control allows blocks of content to render only when the user has a specified permission set. By wrapping the site actions control inside the security trimming control and setting a required permission, the site actions control is prevented from rendering any HTML at all.

<SharePoint:SPSecurityTrimmedControl PermissionsString="AddAndCustomizePages" runat="server">
     <PublishingSiteAction:SiteActionMenu runat="server"/>
</SharePoint:SPSecurityTrimmedControl>

Remove/clean non-compliant HTML

Upon closer examination of the remaining validation errors it becomes obvious that most of them are caused by poorly declared script tags. Unfortunately these script tags are usually generated by the HtmlForm control, so it's not easy to override the output. One technique that can be applied is to override the Render method and do a bit of tag cleaning. This effectively lets us hijack the HTML rendering process and have a chance to add, modify or remove parts. It does involve writing a bit of inline code in the master page. Assuming that you have set your master page to allow inline code, we can add some code similar to the following in the master page.

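<script language="C#" runat="server">
// Regex lives in System.Text.RegularExpressions; add an Import directive
// for that namespace if your configuration does not already import it.
protected override void Render(HtmlTextWriter writer)
{
  // render the page into a string buffer instead of the response
  System.IO.StringWriter str = new System.IO.StringWriter();
  HtmlTextWriter wrt = new HtmlTextWriter(str);
  base.Render(wrt);
  wrt.Close();
  string html = str.ToString();

  // find all opening script tags (up to, but not including, the closing '>')
  Regex scriptRegex = new Regex(@"<script\b[^>]*", RegexOptions.IgnoreCase);
  MatchCollection scriptMatches = scriptRegex.Matches(html);

  // go through matches in reverse so earlier match indexes stay valid after each insert
  for (int i = scriptMatches.Count - 1; i >= 0; i--)
  {
    // identify script tags with no type attribute
    // (matching on " type=" avoids false hits such as src="prototype.js")
    if (!Regex.IsMatch(scriptMatches[i].Value, @"\stype\s*=", RegexOptions.IgnoreCase))
    {
      // add type attribute directly after "<script" (7 characters)
      html = html.Insert(scriptMatches[i].Index + 7, " type=\"text/javascript\"");
    }
  }

  // write the 'clean' html to the page
  writer.Write(html);
}
</script>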

This code block uses regex matching to find all script tags and add a type attribute to any tag that doesn't already have one. It is only a partial solution to the script tag problem; more code will need to be written to completely clean the tags.
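As one illustration of what that extra cleaning might involve, the sketch below also strips the language attribute from script tags, since XHTML 1.0 Strict does not allow it. The ScriptTagCleaner class and its Clean method are hypothetical names made up for this example, and the code has only been reasoned through, not run against real SharePoint output.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the class and method names are illustrative only.
public static class ScriptTagCleaner
{
    // Matches an opening <script ...> tag and captures its attribute list.
    private static readonly Regex OpenTag =
        new Regex(@"<script\b(?<attrs>[^>]*)>", RegexOptions.IgnoreCase);

    // Matches a quoted language="..." attribute (not valid in XHTML 1.0 Strict).
    private static readonly Regex LanguageAttr =
        new Regex(@"\s+language\s*=\s*(""[^""]*""|'[^']*')", RegexOptions.IgnoreCase);

    // Matches an existing type attribute.
    private static readonly Regex TypeAttr =
        new Regex(@"\stype\s*=", RegexOptions.IgnoreCase);

    public static string Clean(string html)
    {
        return OpenTag.Replace(html, delegate(Match m)
        {
            // drop the language attribute, then add a type attribute if missing
            string attrs = LanguageAttr.Replace(m.Groups["attrs"].Value, "");
            if (!TypeAttr.IsMatch(attrs))
            {
                attrs = " type=\"text/javascript\"" + attrs;
            }
            return "<script" + attrs + ">";
        });
    }
}
```

Inside the Render override, the loop over matches could then be replaced with a single call such as html = ScriptTagCleaner.Clean(html);. Note that, like the original, this will also touch script tags that appear inside inline JavaScript strings.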

Note that this code is crude and untested, but it should give you an idea of what can be done. It would be prudent to keep this kind of code to a bare minimum, so that performance is not affected. In other words you should only be using this method when you have no other option.
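The inline code above only compiles if the master page is allowed to contain server-side script. In SharePoint this is normally controlled by a PageParserPath entry in web.config, roughly as sketched here. The virtual path shown is an assumption that covers every file in the master page gallery; narrowing it to the single master page you are editing is safer.

```xml
<SharePoint>
  <SafeMode>
    <!-- keep the attributes your existing SafeMode element already has -->
    <PageParserPaths>
      <!-- assumed path: all pages in the master page gallery -->
      <PageParserPath VirtualPath="/_catalogs/masterpage/*"
                      CompilationMode="Always"
                      AllowServerSideScript="true"
                      IncludeSubFolders="true" />
    </PageParserPaths>
  </SafeMode>
</SharePoint>
```

Allowing server-side script is a security trade-off, which is another reason to keep this technique as a last resort.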

Meta Tags

You could extend the render code sample to remove all of your non-compliant code if you want. I would suggest that this is not the best way of doing things. Another technique is to replace standard SharePoint controls with your own custom built controls. A simple example of where this can be done is with the RobotsMetaTag. Using Lutz Roeder's Reflector I was able to extract the following (simplified) code for the RobotsMetaTag control:

public class RobotsMetaTag : SPControl
{
  protected override void Render(HtmlTextWriter output)
  {
    if (!SPControl.GetContextWeb(this.Context).ASPXPageIndexed)
    {
      output.Write("<META NAME=\"ROBOTS\" CONTENT=\"NOHTMLINDEX\"/>");
    }
  }
}

We can see that the META element name and the NAME and CONTENT attribute names all violate the XHTML rule that element and attribute names must be lower case. The offending line can be rewritten in a custom control as:

output.Write("<meta name=\"ROBOTS\" content=\"NOHTMLINDEX\"/>");

Now deploy the custom control to your SharePoint environment and add it to the master page in place of the RobotsMetaTag. Two more validation errors out of the way. I recommend using this approach wherever possible.
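As a rough sketch of that swap, the master page markup might look like the following. The tag prefix Custom, the namespace and assembly name Example.Controls, the class name XhtmlRobotsMetaTag, and the version and public key token are all hypothetical placeholders, not names that ship with SharePoint.

```aspx
<%-- Hypothetical registration; replace the assembly details with your own --%>
<%@ Register TagPrefix="Custom" Namespace="Example.Controls"
    Assembly="Example.Controls, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0000000000000000" %>

<head runat="server">
  <%-- used in place of the out-of-the-box RobotsMetaTag control --%>
  <Custom:XhtmlRobotsMetaTag runat="server" />
</head>
```

The assembly will generally also need to be deployed where SharePoint can load it (the web application's bin folder or the GAC) and listed in a SafeControl entry in web.config.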

Content

Between the security trimming control, render code and custom control methods it is now possible to fix all validation errors and eventually get a conforming web page. Great, we have just made a virtually blank web page comply. The next step is to add all the content and navigation back in. To achieve total compliance it will be necessary to create a custom control for many of the components on each page. The trick here is to make these controls as re-usable as possible.
The majority of the content in your site will come from page layouts and thankfully most of the field controls output XHTML compliant code out-of-the-box. When you come across a control that doesn't comply, again, you will need to make your own.
When it comes to navigation you have a couple of options: either create custom navigation controls from scratch, or extend the SharePoint AspMenu control, as the source code for this control has now been released.

Web parts

At first I thought web parts were going to be no trouble when it came to W3C validation. Most of the web parts I use have an XSL source editor, so all that is involved is making sure that the XML is transformed into valid HTML, right? Wrong. Unfortunately, web parts such as the Data View Web Part use surrounding tables for layout. This is intrinsically bad practice for web development and accessibility, but worse is that the tables use all kinds of non-standard attributes. The only solution that I have found is to override the render method as explained earlier. I recommend keeping web part use to a minimum. Web part zones should definitely not be used; they introduce a large amount of layout tables and non-compliant HTML.

Summary

Hopefully this gives you a bit of a starting point in getting a SharePoint WCM site XHTML compliant. It is definitely not a simple task; however, if you are doing a lot of SharePoint development you are bound to come across a project that requires it (e.g. government sector work). I believe that Microsoft is quite aware of these compliance issues, and they will hopefully be addressed in the next release, if not in a service pack. I plan to maintain this document over time as I discover better techniques for dealing with the various validation issues that SharePoint presents. Good luck in getting your site to validate. You can look forward to seeing the following message to reward your hard work.

[Image: xhtml-valid]

24 comments for “Guide to making SharePoint XHTML Compliant”

  1. RE: Use a product developed by people that care about standards  5/31/2007

    I *hate* SharePoint's output. But I have no choice but to deal with it because it's management's decision. What other .Net-based CMS's would you recommend?

  2. Why SharePoint  5/31/2007

    Perhaps I can offer another perspective. To give you my quick background, I am a long-time (10+ year) Web UI Designer and a huge advocate of web standards, accessibility, CSS design etc. That being said, I have spent the last several years almost 100% dedicated to SharePoint UI design. While I am now a SharePoint MVP, I am still a web designer and still very much understand the frustrations of the general web community. I mean, how can you build a web platform and not adhere to web design best practices, right? So, to summarize, I agree: SharePoint has a LOT of work to do under the hood to fix the codebase. I understand that the UI could be much more usable, and that the code base could adhere to web standards and best practices, making internet-facing sites more friendly to search engines and less bloated. I understand that the theme framework could be greatly improved and that the web part framework should output better code. With this I agree 100%. That being said, to offer some insight as to why people choose SharePoint, you have to think from an enterprise business user's perspective. The majority of the SharePoint user base is the Information (Business) Worker. What they can do out of the box and without code is absolutely astonishing. Enterprise customers with hundreds or thousands of employees get immediate productivity boosts and hugely reduced overhead in the administration department. The truth is that the majority of the customer base is being served what they need to do their jobs, and they really don't care what's under the hood. From Microsoft's perspective, the majority of their customer base is thrilled with the 2007 Office line. All of that being said, we as designers and user experience experts have to keep advocating better usability, better design practices, cleaner code and all the benefits that come along with it.
Microsoft does listen, but as designers we have to be patient and understand that, as the huge minority in this race, we're probably not going to see results as fast as we would like. I am confident, however, that over time we will see Microsoft making strides in the web standards direction.

  3. Sunil Upadhyay  5/31/2007

    Hi Zac, I totally agree with you regarding web parts, since they render out table layouts. Also, if someone tries to override the render method, the DOM structure will get disturbed, as detailed by Andrew Connell in one of his blog posts. I have understood everything up to creating the master page and blank page layout that you suggested. I want to know more about how to use a user control with code-behind in the SharePoint environment. I can create my user control out of the box and test it, but how do I integrate it with the SharePoint environment, i.e. where do I copy the code (assembly) and where do I copy the .ascx file? Also, I am supposed to add the safe control entry. It would be great if you could give me a detailed idea of using a user control directly in the page layout or master page, without the help of a web part. Thanks in advance!

  4. share point  5/31/2007

    Hi Great Article, I'm new to Sharepoint and this is a great reference to have for our implementation and my planning

  5. Qamar  5/31/2007

    When we override the render method of master pages, this interferes with post-cache substitution. SharePoint simply loses the ability to substitute intelligently and dumps all substituted text at the top of the page (under the body tag). So unless you have a solution, stay away from overriding the render method. This is a headache, as we want XHTML-compliant script tags. Does anyone know of a clean solution?

  6. Sharepoint and SEO  5/31/2007

    Thank you Zak for a very helpful article. One quick note on search engine optimization. Our firm performs both CMS integration and SEO services so we face this on a daily basis. Our whitepaper on SEO-CMS Best Practices isn't sharepoint specific, but it is applicable (http://www.nonlinear.ca/seo-cms). Finally - other .net CMS solutions we've worked with successfully include Ektron (www.ektron.com) and SiteCore (www.sitecore.com). SiteCore, in particular, plays nice with MOSS 2007. We use it as a front end to overcome MOSS limitations, including W3C accessibility and XHTML challenges.

  7. Sharepoint PublishingWebControls:RichHtmlField  5/31/2007

    The PublishingWebControls:RichHtmlField control does not generate XHTML-compliant markup. So even if the template is XHTML compliant, the content created through the authoring UI produces non-compliant markup. The funny thing is that if your content makers submit correct XHTML-compliant content, SharePoint will make the markup non-compliant (it removes the quotes around class attributes). Does anybody know of a third-party control that fixes this issue?

  8. Major obstacle with MOSS 2007  5/31/2007

    Just replace the PublishingWebControls:RichHTMLField control with one from Telerik that generates correct XHTML. The problem is that when saving the markup, SharePoint keeps messing with it (making the markup non-compliant). MOSS 2007 is just NOT ready for web content publishing.

  9. Great Post!  5/31/2007

    Great Post, Zack! I'm currently working on a WCM project for a Fortune 25 company and Sharepoint definitely isn't Xhtml friendly! We are an interactive firm and we live by Xhtml standards and design, and Sharepoint definitely breaks the mold!

  10. RE:Sharepoint PublishingWebControls:RichHtmlField  5/31/2007

    The Telerik controls are very good; you could also set about making your own field controls. There is an MSDN article that describes this process in quite a lot of detail: http://msdn2.microsoft.com/en-us/library/aa981226.aspx Zac

  11. XHTML compliance is a worthy goal, but lofty  5/31/2007

    I'm glad you attempted to tackle this, Zac. I too have spent long hours fighting with SharePoint's poor markup. Nevertheless, I keep at it. One thing you forgot to mention was the CAML code. This too needs to be torn apart in order for completely compliant code to be rendered. This can also be quite dangerous if you don't know what you are doing, considering CAML has inherent logic built in to how it gets rendered on the page. Also, working with Publishing Sites is one thing, but there is code generated in Wiki and Blog sites where it will be nearly impossible to overcome the "table" oriented methods of building HTML. You touched upon this when you mentioned the non-compliant tables generated in the XML, but this is further multiplied when working with Wikis and Blogs as well. I just thought I'd offer my $0.02. Chris Arella

  12. Does this work with WSS 3.0?  5/31/2007

    Hi Zac, is it possible to get valid output for WSS 3.0 sites and not full MOSS? Can you provide me with a consultancy service?

  13. DocType is causing problems on my site for menus!  5/31/2007

    I've followed some of the advice above and have started rebuilding my publishing site. In the master page I have added a doctype definition and that works great. However in non-IE browsers (FF, Mozilla etc) some of the Javascript menus stop working as expected. When a pop up menu appears (eg: the Site Actions menu) it is positioned at the top of the browser. Only the X position of the menu is correct, the Y position is no longer at the position of the Site Actions button. This affects other pop up menus, but not the AspMenu control. Take the doctype out and the menus work correctly! IE is not affected by this problem at all, just other browsers. Is this a known issue?

  14. RE:Does this work with WSS 3.0?  5/31/2007

    Many of the same principles should apply equally to WSS, in the end this is the foundation for MOSS. If you are after consultation please contact my company: http://www.provoke.co.nz/

  15. RE:DocType is causing problems on my site for menus!  5/31/2007

    I have seen problems with the SiteActions and Welcome controls. If you are overriding the render method, try removing that code to see if it makes a difference. It could just be some kind of CSS compatibility issue, which is something I can't help you with, sorry. I'm a developer, not really a designer.

  16. Sharepoint SEO friendly?  5/31/2007

    Hi Zac, do you know of any resources for learning how to optimize SharePoint public website pages for search engines? Can SharePoint even do it? Regards, Paul http://www.wsibiz.co.nz

  17. RE: The doctype issue  5/31/2007

    There must be some JavaScript or CSS links that you have not put on your master page. Look at the default.master to work out what CSS/JS links you might need for all the SharePoint controls to work correctly. I can confirm that it is possible to get these controls working in non-IE browsers. They are not going to be XHTML compliant, however; I'd expect you'd either replace or remove the controls for public-facing versions of the site.

  18. RE:Sharepoint SEO friendly?  5/31/2007

    I'm not really sure what you consider SEO to be, but I see no reason why most standard "SEO" techniques cannot be applied to a SharePoint (WCM) site. Making your site XHTML valid is a good first step. I have shown in my post how you can override the functionality of the RobotsMetaTag control, and this can be extended to add your own meta tags. It would be easy enough to create some kind of index page for crawling, and you can always include a robots.txt on your server.

  19. RE: Sharepoint SEO friendly?  5/31/2007

    On making SharePoint "SEO friendly": I personally don't think that making any web site SEO friendly has got anything to do with what platform it's running under. I have built many web sites based on various CMS systems. The idea is that when an author creates a page, the correct and applicable metadata is captured and then written back to the page source at render time. Ideally your semantic HTML markup should be clean and have the following from the start. Then the rest of the HTML stuff. What typically happens, in my experience, is totally irrelevant code being written at the top. SharePoint can and does output correct markup; that is why it is one of the best WCM systems in the market. The issue is lack of understanding by development shops who go in for the kill and do not do any proper investigation into how to do it properly. Chandima [ http://www.chandima.net/blog ]

  20. RE:Sharepoint SEO friendly?  5/31/2007

    Just on the metadata note, it is easy enough to make a web control that can pull through the fields of your content types/page layouts as metadata tags. The Page content type has things like title, description, author by default. Such a control would be put in your master page.

  21. Use a product developed by people that care about standards  5/31/2007

    Does it strike anyone as odd that you have to jump through all of these hoops and engage in some herculean struggle with SharePoint to make it output something sensible? Why on earth would I want to choose a product that has such a flagrant disregard for standards? Anyone? Anyone? Bueller?

  22. RE: Use a product developed by people that care about standards  5/31/2007

    I have to agree with the above comment. This sounds like a lot of work, extra time and in some cases extra cost to make a master page compliant. Yet this does not even solve the problems and rework you will face when you start adding content and controls to your once compliant and valid master page. I am very interested in hearing your reasons why you would choose SharePoint as your main CMS when building a standards-compliant website. There are many .NET-based CMSs out there that are compliant straight out of the box.

  23. Alan Coulter  11/11/2009

    It appears that even Telerik can't fix the XHTML problem. After injecting valid xhtml code into a placeholder (or referencing valid Reusable HTML), at render time the double quotes are stripped from the class statements. The link http://social.msdn.microsoft.com/Forums/en/sharepointecm/thread/b9ddac19-f4e2-422e-a8ca-7c1baab37f2f suggests that Telerik renders it correctly, but MOSS has the final say in the output and decides to strip out the double quotes. I even had the situation where MOSS had updated my Reusable content to use upper case tags (thus making it non-compliant again).

  24. SharePoint Development  7/29/2010

    Great article! I believe that Microsoft is quite aware of these compliance issues and they will hopefully be addressed in the next release if not a service pack...
