SharePoint 2010 | Crawl Not Starting Due To Missing Search Temp Directories

I was working on a search server recently and started getting errors saying the crawl component was failing to “CreateTempFolderForCacheFiles”.

As it turns out, the environment I was working on was an extremely secure farm where permissions were locked down, shares were often not permitted, and accounts were granted only the minimum permissions they needed in order to run.  In this case, the local temporary folder where the index files are created had been deleted, and the search service did not have permission to recreate it.  This blocked the crawls from proceeding; they were just sitting there.

In order to fix the issue, the folders need to be recreated.  The nice thing is that with PowerShell you can quickly recreate them in the correct location using the following script:

# Get the Search Service Application (replace "<my SSA>" with the name of yours)
$app = Get-SPEnterpriseSearchServiceApplication "<my SSA>"

# Find the crawl components in the active crawl topology that live on this server
$crawlComponents = Get-SPEnterpriseSearchCrawlComponent -CrawlTopology $app.CrawlTopologies.ActiveTopology |
    Where-Object { $_.ServerName -eq $Env:COMPUTERNAME }

foreach ($component in $crawlComponents)
{
    # Each crawl component expects a temp folder named after it under its index location
    $path = Join-Path $component.IndexLocation $component.Name
    if (Test-Path $path -PathType Container)
    {
        Write-Host "Directory $path already exists"
    }
    else
    {
        Write-Host "Creating directory: $path"
        New-Item $path -ItemType Directory | Out-Null
    }
}

FAST Search | Index Server Runs Out Of Disk Space During Crawl

I’m not an expert with FAST, I just have to deal with it.  This is a fun little thing that happened recently.  SharePoint adoption has been going really well: more people are using it, more people are adding content, more content is being indexed, and more space is being used.

The drive we installed FAST Search on is fairly small as drives go these days, with roughly 136GB of free disk space.  This particular company also has a policy that once a drive hits 80% utilization, an alert goes off telling someone to go look at the server’s disk usage and reduce it.  As I know from getting these alerts, when FAST is building an index there are times when the %FASTSEARCH%\tmp and %FASTSEARCH%\data\data_index directories get pretty full.  Like an extra 60GB worth of full.  That, along with the other items on the drive, is enough to tip past 80% utilization and trigger the email alert.  The reason is that FAST Search Server keeps a read-only binary index file set to serve queries while building the next index file set; the worst-case disk space usage for index data is approximately 2.5 times the size of a single index file set.
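
If you want a quick sense of how much those directories are eating at any given moment, here is a minimal sketch that totals them up.  It assumes the %FASTSEARCH% environment variable is set on the index server (the installer normally sets it) and that the two paths above are the ones ballooning in your farm:

# Rough sketch: report the current size of the FAST directories that grow
# during an index build. Paths are relative to %FASTSEARCH%.
$fast = $env:FASTSEARCH
foreach ($dir in "tmp", "data\data_index")
{
    $bytes = (Get-ChildItem (Join-Path $fast $dir) -Recurse -ErrorAction SilentlyContinue |
              Where-Object { -not $_.PSIsContainer } |
              Measure-Object -Property Length -Sum).Sum
    "{0,-16} {1,8:N1} GB" -f $dir, ($bytes / 1GB)
}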

This generally happens at night, and by morning all the indexing is done and the drives have plenty of free space again.  It’s not really worth ordering another drive at this point since it is a temporary condition and doesn’t significantly affect performance, but it would be nice to minimize the number of alerts I get by reconfiguring FAST to use a different drive for temporary files, and perhaps even for storing index data.

As it turns out, FAST is not as easily configurable as, say, Windows when you want to move the temp directory, or even SharePoint when you want to move the ULS and usage logs.  A quick Bing search did not turn up any useful articles about how to reconfigure the directories FAST uses, except for one example which suggested editing some of the XML, which is a bad idea because it leaves you in an unsupported configuration with possible upgrade issues in the future.  The resolution turned out to be relatively simple, though, thanks to junction points.

In my case, we had just purchased a rather large 300GB HDD to house the ULS and usage logs, because Microsoft best practices for SharePoint say to keep the log files and the SharePoint binaries on separate drives if possible, and 300GB was the smallest standard size the company supported when we ordered it.  This meant I had a lot of free space out there on the new drive, if only I could get FAST to use it.

In my case, I was able to use KB2506015 to reconfigure the directories with junction points, stay in a supported mode, and utilize the extra space.

If additional storage can be added to the server, the entire %FASTSEARCH%\data directory can be moved to a new location with the same permissions ("Full Control", granted to the FASTSearchAdministrators local group), and connected back to the installation via a junction point. To do so, follow the steps below on each FAST index server:

  1. Stop the FAST Search for SharePoint service.
  2. Stop the FAST Search for SharePoint Monitoring service.
  3. Move %FASTSEARCH%\data to the larger storage you have added.
  4. Run the following in a command prompt: mklink /j %FASTSEARCH%\data %NEW_LOCATION%\data
  5. Start the FAST Search for SharePoint Service.

Please note that while other methods of relocating the index outside of the %FASTSEARCH% parent directory are not supported, the entire parent directory can also be moved to a new physical location without using a junction point (skipping step #4 above) if the drive letter, path, and permissions remain identical.
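
For reference, here is a minimal sketch of those steps as they might be run from an elevated PowerShell prompt on the index server.  The service display names come from the steps above, the E:\FASTData target path is a made-up example, and I use a copy-then-delete in place of a straight move since Move-Item can balk at moving a directory between volumes; adjust all of it to your environment.

# Sketch of the junction-point relocation; run on each FAST index server.
$fastDir = $env:FASTSEARCH                # FAST install directory
$oldData = Join-Path $fastDir "data"
$newData = "E:\FASTData\data"             # hypothetical location on the larger drive

# 1 & 2. Stop the FAST Search and FAST Search Monitoring services
Stop-Service -DisplayName "FAST Search for SharePoint"
Stop-Service -DisplayName "FAST Search for SharePoint Monitoring"

# 3. Move the data directory to the new drive, then clear the original so the
#    junction can take its place. Re-check that the FASTSearchAdministrators
#    local group still has Full Control on the new location.
Copy-Item -Path $oldData -Destination $newData -Recurse
Remove-Item -Path $oldData -Recurse -Force

# 4. Recreate %FASTSEARCH%\data as a junction point to the new location
cmd /c mklink /j $oldData $newData

# 5. Start the services back up
Start-Service -DisplayName "FAST Search for SharePoint"
Start-Service -DisplayName "FAST Search for SharePoint Monitoring"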

Once I started the FAST Search for SharePoint Service back up, everything came up perfectly and I could perform a crawl without issues.

Social Intelligence and Steubenville

So, just a quick time out from my usual techie posts to ramble for a bit. The Internet has gotten huge these days and once the information is out there it’s almost impossible to put the horses back in the barn. Case in point and what people are talking about right now is the verdict that just came down in the Steubenville, Ohio rape case.

Judge Thomas Lipps issued a cautionary note to children and parents, urging them to reconsider “how you record things on the social media so prevalent today.”

In today’s world, where twitter feeds are archived by third parties and people who know people post their own messages of support, no matter how well intentioned, nothing stays secret.  This girl now has to live with the repercussions of what happened to her, with no chance to put it behind her and little likelihood of keeping any sort of privacy.

In my own life, I really try to keep my Internet persona and my real name as separate as possible. People who know me, or whom I want to reach out to, know my alter ego, but hopefully a random Google search of my name or a couple of key facts about me doesn’t lead to a huge Internet trail of photos, tweets, and postings made two or three years in the past (personally, I’ve found a couple of my old posts from 15 years ago still out there).

So back to Steubenville. I read an article in the news and, quite frankly, was a bit skeptical. So as I sat there on my tablet, I did a quick Google search. That popped up a little more info, which then led to a little more. Long story short, within the space of about 10 minutes I knew real names and friends; random posts two or three years old showed up, even pictures of the victim from happier times. I was stunned at how much was out there. Even with all her online accounts closed, the wealth of information that is indelibly marked out there is heartbreaking. Bad things happen to good people, and sometimes being paranoid about protecting your name isn’t just being paranoid. Taking these lessons to heart, here are the rules for my kids using social media:

  1. Never use your real name as your twitter handle
  2. Never use your real name on Facebook
  3. If you post or blog about your friends or family, use first names or code names only (one friend posts about her daughter as PT…. Pink Tornado)
  4. No “Internet only” friends, you should have already met people that you friend online
  5. If people tag you in photos with your real name, ask them to retag you with your more anonymous Internet handle. People who know you will still enjoy the pic, but random strangers won’t
  6. Be paranoid, always.

In today’s connected world, where posts, pics, and even your location (thanks to apps like foursquare) can easily be tracked, I think it takes the whole “stranger-danger” lesson from when I was a kid to a whole new level for the next generation.

SharePoint 2010 | Best Practices for Alternate Access Mappings (AAM)

Have I done a soap-box post recently?  Yup, this one. 

There is an awful lot of misunderstanding about the concept of Alternate Access Mappings (AAM) in the SharePoint world.  It seems like every SharePoint consultant I talk to has a different opinion on why and how the AAM setting should be used, with a lot of it boiling down to the old stand-by of “it depends”.  So, this is how I’ve set up my standards so that I always have a certain level of consistency across the farm.

At its most basic level, an AAM is a configuration setting in SharePoint Central Admin that tells SharePoint how to map a web request to the correct web application so that SharePoint can respond with the correct content.  It also tells the SharePoint content engine what zone and URL to use in relative links as users continue to surf that SharePoint site.
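
A quick way to see what is already mapped for a given web application is straight from PowerShell; the URL below is just a placeholder:

# List the alternate access mappings (zones and URLs) for one web application.
Get-SPAlternateURL -WebApplication "http://WEBAPP.Company.com" | Format-Table -AutoSize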

The major reason we need this is that there are very common Internet and load balancing scenarios where the URL of a web request received by IIS is not the same URL that the end user entered in their web browser.  There are also various health monitoring tools that need to hit an individual WFE directly to verify that each server is up and responding correctly.  Additionally, we see situations where search-driven applications accessed both internally and externally need to work within the same zone in order to return result URLs that make sense in the context of where the site is being accessed from.

Because we can also host multiple web applications within our SharePoint farm, we need to make sure the nomenclature and zones are set up consistently, so that when a user who accessed the site from the “Intranet” zone performs a global enterprise search, all search result URLs come back on the other web applications’ “Intranet” zones as well.

When it comes to Public URLs and Internal URLs, what’s the difference?

Public URLs: what the user types into the browser address bar to access the site.

Internal URLs: what load balancers, reverse proxy, NAT and port forwarding devices use to forward a user request to the farm.  This is where you may see individual WFE IP addresses and non-standard ports.

If I change an AAM, why don’t the IIS bindings change to match them?

That would be a cool feature; I’m not really sure why something like that isn’t implemented yet.  For the time being, if you add additional AAMs you’ll need to remote into the WFEs and add the corresponding bindings to the IIS sites for the web applications of your choice.  Don’t forget that if you’re using FQDNs (and you should always have an FQDN for your farm) you’ll need the appropriate DNS entries as well.
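
As a rough end-to-end sketch (all URLs and the IIS site name here are placeholders, and New-WebBinding comes from the WebAdministration module that ships with IIS on the WFEs):

# On a farm server: add the new public URL to the web application's Internet zone.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
New-SPAlternateURL -WebApplication "http://WEBAPP.Company.com" `
    -Url "http://www.WEBAPP.Company.com" -Zone Internet

# Then, on each WFE: add a matching host header binding to the IIS web site
# that hosts the web application (site name is a placeholder).
Import-Module WebAdministration
New-WebBinding -Name "SharePoint - WEBAPP80" -Protocol http -Port 80 `
    -HostHeader "www.WEBAPP.Company.com"

# And don't forget the DNS record pointing www.WEBAPP.Company.com at the farm.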

Great, I totally understand AAMs.  Now what are the best-practice standards for how they should be set up?

While I can’t speak to every situation, this is generally how I define my zones and set up my Alternate Access Mappings (AAMs).  Default and Intranet should always be used; the rest are of course optional, but whatever naming convention you use, be consistent!

Default: always the FQDN in the form of “http://WEBAPP.Company.com” (or “http://WEBAPP.domain.local” – whatever your FQDN format happens to be )

Intranet: Just the name of the webapp itself, in the form of “http://WEBAPP”.  It is important to use a single-label name, without any “.”s in it, because some Windows applications will see a “.” in the address and assume the “Internet” zone, which can lead to some interesting authentication issues with Microsoft Office applications.  Resources accessed internally work best with a single-name address.

Internet: Almost always prefixed with “www”, in the form of “http://www.WEBAPP.Company.com”.  This identifies the access URL as a public-facing or externally accessible site.  Remember there is no magic to using www; it is just a naming convention, and you still need to do all the leg work to set up your firewall/NAT/port forwarding.  Seems obvious to some, but you’d be surprised what some non-technical people will assume.

Custom: In all honesty I have never used this field for anything more than specialized one-off URL testing and sometimes for SSL cert verification (which generally should go on the load balancer anyways) or when I’m looking to rename a site for business reasons and want to test out the site using the new URL.  I have never actually seen this field in use for normal production applications.

Extranet: Every company will be different, but I like to use the “PARTNERS” prefix, in the form of “http://partners.WEBAPP.Company.com”.  You can use CLIENTS or CUSTOMERS as well; just be consistent in your naming convention, and if you do have variations for business reasons, make sure the rules are clearly documented.
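
Pulling the convention together, here is a minimal sketch of what the mappings for one web application might look like when scripted.  Every URL is a placeholder that follows the naming pattern above:

# Illustrative AAM setup for a single web application using the convention above.
# The Default zone URL (the FQDN) is set when the web application is created;
# the remaining zones are added as additional public URLs.
$webApp = "http://WEBAPP.Company.com"     # Default zone

New-SPAlternateURL -WebApplication $webApp -Url "http://WEBAPP" -Zone Intranet
New-SPAlternateURL -WebApplication $webApp -Url "http://www.WEBAPP.Company.com" -Zone Internet
New-SPAlternateURL -WebApplication $webApp -Url "http://partners.WEBAPP.Company.com" -Zone Extranet

# Review the result
Get-SPAlternateURL -WebApplication $webApp | Format-Table -AutoSize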

Great, so I’m all set?

NO!  Whatever you decide to use as standards, you absolutely must document them somewhere accessible (like, say, SharePoint?) and then get agreement on those standards going forward.  You may have it all clear in your head and totally understand how things should be set up, but when you win the lottery or move on to your next gig, if the company doesn’t have a documented standard to refer to, all your hard work will go down the tubes.  Entropy sets in, and the standards will be diluted by those who come after you unless you leave behind some sort of guiding principles.