SharePoint Search: Excluding Headers, Footers, and Other Content from Search Results

I had a site about to go public that was built on SharePoint 2010 and used the built-in SharePoint search engine. I didn’t design it, but I was responsible for setting up the infrastructure and service applications. One piece of feedback was that there was a lot of “noise” in the search results, and in reviewing the site I could see what they meant. The designers had custom-built a mega-menu dropdown, and the text in its menus was being indexed by SharePoint search. In addition, the footer on each page carried the usual common links such as “contact us” and “locations”, so every page popped up in the result set for those terms. So what is one to do when we don’t want the chrome or branding of a site to interfere with content indexing?

One way to alleviate this problem is to use the “noindex” class in the rendered HTML of a page. We want SharePoint to ignore the branding and navigation that appear on almost every page and focus only on the content. When this class value is added to a tag, the crawler skips the content of that tag and indexes only the terms that appear in the content sections of the page.

SharePoint 2010’s iFilter excludes content inside a <div class="noindex"> tag, so adding this class to the divs we don’t want indexed keeps them out of the search index. To apply the noindex class, find the elements whose content shouldn’t be indexed. These are typically a header, a footer, an RSS or news webpart, or various navigation elements. There are two ways of doing this.

1. You can “bookend” the content that you don’t want indexed with a div. This is probably the easiest method, as we just put a <div class="noindex"> at the top and a </div> at the bottom. It is especially useful when dealing with classic ASP sites where headers and footers are #included in the page templates: just open header.asp and put them in. Note, however, that in certain cases nested divs cause problems.

<div class="noindex">
  <table>
    <tr><td>
      <a href="http://iedaddy.com">Home Page</a>
    </td></tr>
  </table>
</div>

2. As noted above, nested divs can cause problems, so in method 2 we add the noindex class directly to the existing divs, including the nested ones, alongside their existing classes:

<div class="Footer noindex"> 
     <div class="copyright noindex"> Copyright 2010 © Company </div> | 
     <div class="ContactUs noindex"><a href="/sitepages/ContactUs.aspx">Contact Us</a></div>
</div>

In this way we can tell the SharePoint search iFilter that the content contained in the divs can be safely ignored for the purposes of indexing the content and this will remove much of the noise caused by indexing the branding and navigation elements.

EDIT: It was brought to my attention that there is a third and a fourth way of hiding content from Search, for webparts and other content that we don’t want rendered during a crawl:

A: Create a special SharePoint control, defined in a code class, to wrap around whatever we don’t want rendered:

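using System.Web.UI;
using System.Web.UI.WebControls;

// Wrapper control that hides its child content when the request comes from the search crawler.
[ParseChildren(false), PersistChildren(true)]
public class SearchCrawlExclusionControl : WebControl
{
    private string userAgentToExclude;

    // Substring of the user agent to exclude; defaults to "ms search".
    public string UserAgentToExclude
    {
        get { return string.IsNullOrEmpty(userAgentToExclude) ? "ms search" : userAgentToExclude; }
        set { userAgentToExclude = value; }
    }

    protected override void CreateChildControls()
    {
        string userAgent = this.Context.Request.UserAgent;

        // Stay visible when there is no user agent; otherwise hide for the crawler.
        // Both sides are lower-cased so the match is case-insensitive.
        this.Visible = string.IsNullOrEmpty(userAgent)
            || !userAgent.ToLowerInvariant().Contains(UserAgentToExclude.ToLowerInvariant());

        base.CreateChildControls();
    }
}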

 

After adding the register tag to the page layout, we can wrap all the content we want to exclude with our control:

<SearchUtil:SearchCrawlExclusionControl ID="SearchCrawlExclusionControl1" runat="server">

<div>Some Content To Exclude</div>

</SearchUtil:SearchCrawlExclusionControl>
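
For reference, the register tag mentioned above might look something like the following at the top of the page layout. This is only a sketch: the namespace MyCompany.SharePoint.Search, the assembly name, the version, and the public key token are placeholders I made up for illustration, so substitute the values from your own compiled assembly. Only the SearchUtil tag prefix comes from the markup above.

```html
<%-- Hypothetical namespace and assembly details: replace with your own. --%>
<%@ Register TagPrefix="SearchUtil"
    Namespace="MyCompany.SharePoint.Search"
    Assembly="MyCompany.SharePoint.Search, Version=1.0.0.0, Culture=neutral, PublicKeyToken=YOUR_PUBLIC_KEY_TOKEN" %>
```

Depending on how the assembly is deployed, it will most likely also need a SafeControl entry in the web application’s web.config before SharePoint will render the control.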

B: Write code directly in a webpart that you don’t want to have indexed during a crawl:

protected override void CreateChildControls()
{
    string userAgent = this.Context.Request.UserAgent;

    // UserAgent can be null, so check it before looking for the crawler's signature.
    if (!string.IsNullOrEmpty(userAgent) && userAgent.ToLower().Contains("ms search"))
    {
        this.Controls.Add(new LiteralControl("This WebPart is not allowed to be crawled"));
        return;
    }

    // ... <normal web part code here>
}

Of course, with either A or B the content is never rendered to the page during a crawl, so any hyperlinks in the webpart or wrapped content will not be crawled, as Search will not know about them.

How To Hide Left Display Panel in SharePoint 2010 and 2007

The left navigation bar makes the page look very “SharePoint”, and some valuable space is wasted under it. There is no out-of-the-box way to hide the Quick Launch and the Tree View navigation in SharePoint 2010 or 2007. If you don’t want to customize the master page (or want to keep the left nav panel on the rest of the site), you can just hide it on the home page.

The easiest way is to use the old-school CEWP trick that worked even back in 2003: add the Web Part now named Content Editor to the page and enter the following in HTML Source mode:

<style type="text/css">
/* Hide Quick Launch for 2007 */
.ms-navframe, .leftNavSpacer { display: none; }

/* Hide Quick Launch for 2010 */
#s4-leftpanel { display: none; }
.s4-ca { margin-left: 0px; }
</style>

The SharePoint 2010 master page that includes this navigation control no longer has the table structure it had in SharePoint 2007, which gives cleaner, well-formed HTML markup. The code above includes the CSS rules for both versions, so the left nav is hidden no matter which master page you use.

In SharePoint 2010, the DIV elements in the master page are floated next to each other and positioned by margins or widths. This is why I had to set the left margin to zero for the s4-ca class, the wrapper for the page content; the left panel itself is simply set not to display via its ID.
