When setting up search content sources, you’ll sometimes see that the default content source (Local SharePoint) already has an sps3://servername entry in its list.
The SPS3 protocol handler uses a web services call (using HTTP/SOAP) to enumerate the content and then uses regular HTTP GET requests to get to the actual items. Crawling using the SPS3 protocol handler requires no RPC calls or direct database access to the target farm. That’s the main reason why this type of crawling is supported over WAN links and has a good tolerance to latency.
One of the first things the crawler will do is call the "Portal Crawl" web service at http://servername/_vti_bin/spscrawl.asmx. The methods in this web service are EnumerateBucket, EnumerateFolder, GetBucket, GetItem and GetSite. It is interesting to see that both "Enumerate" methods basically return just an "ID" and a "LastModified" datetime, hinting at how SharePoint can do incremental content crawls via this protocol handler… If you point your browser to that URL yourself, you can find additional information about the web service, including sample SOAP calls and the WSDL (as you get with any .NET web service). At this point, I could not find much detail on this web service beyond the actual class definition for Microsoft.Office.Server.Search.Internal.Protocols.SPSCrawl.
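To make the incremental crawl hint concrete, here is a minimal sketch in Python of how a crawler could use nothing more than an "ID" and a "LastModified" value per item to decide what to fetch again. This is my own illustration, not the actual SharePoint implementation: the function name, the dictionary layout and the handling of deleted items are all assumptions.

```python
from datetime import datetime


def plan_incremental_crawl(previous, enumerated):
    """Decide what to re-fetch from (ID, LastModified) pairs alone.

    Hypothetical helper, not part of any SharePoint API.
    previous   -- {item_id: last_modified} recorded during the last crawl
    enumerated -- {item_id: last_modified} returned by the current enumeration
    """
    # New items, or items whose LastModified moved forward, need a fresh GET.
    to_fetch = sorted(
        item_id
        for item_id, modified in enumerated.items()
        if item_id not in previous or modified > previous[item_id]
    )
    # Assumption: an item that is no longer enumerated was deleted at the source.
    to_delete = sorted(item_id for item_id in previous if item_id not in enumerated)
    return to_fetch, to_delete


if __name__ == "__main__":
    last_crawl = {
        "1": datetime(2008, 1, 1),
        "2": datetime(2008, 1, 1),
        "3": datetime(2008, 1, 1),
    }
    current = {
        "1": datetime(2008, 1, 1),  # unchanged
        "2": datetime(2008, 2, 1),  # modified since the last crawl
        "4": datetime(2008, 2, 5),  # new item
    }
    print(plan_incremental_crawl(last_crawl, current))
```

Only the items in the first list would need the regular HTTP GET request; everything else can be skipped, which fits with this type of crawl behaving well over WAN links.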
Some good articles for further reading:
- There is an overview of the way content sources and protocol handlers work at http://technet2.microsoft.com/Office/en-us/library/f32cb02e-e396-46c5-a65a-e1b045152b6b1033.mspx
- You can find some more detailed information and a nice diagram on what a protocol handler does at http://msdn2.microsoft.com/en-us/library/ms974315.aspx
- There is also the description of the web services call used at http://msdn2.microsoft.com/en-us/library/ms583576.aspx
And here is a list of all the protocols that you may find useful:
Protocol | Prefix | Description |
BDC protocol | BDC:// | Used for Business Data Catalog URLs (available only in the Enterprise edition of MOSS 2007) |
BDC2 protocol | BDC2:// | Used for Business Data Catalog URLs, as an internal protocol (available only in the Enterprise edition of MOSS 2007) |
File protocol | File:// | Used to index file shares |
RB protocol | RB:// | Used to index Exchange Server public folders |
RBS protocol | RBS:// | Used to index Exchange Server public folders over SSL |
SPS protocol | SPS:// | Used to index people profiles from WSS 2.0 server farms |
SPSS protocol | SPSS:// | Used to index people profiles from WSS 2.0 server farms over SSL |
STS2 protocol | STS2:// | Used to index SharePoint content from WSS 2.0 sites |
STS2S protocol | STS2S:// | Used to index SharePoint content from WSS 2.0 sites over SSL |
Notes protocol | NOTES:// | Used to index Lotus Notes databases and include their content in the MOSS Enterprise Search indexes |
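If you want to check quickly which of these protocols a given start address uses, the table can be turned into a small lookup. The following is only a sketch in Python: the dictionary simply restates the rows above, and the function name is my own invention, not something SharePoint provides.

```python
from urllib.parse import urlsplit

# Restates the table above; keys are the URL prefixes in lower case.
PROTOCOLS = {
    "bdc": "Business Data Catalog URLs",
    "bdc2": "Business Data Catalog URLs (internal protocol)",
    "file": "file shares",
    "rb": "Exchange Server public folders",
    "rbs": "Exchange Server public folders over SSL",
    "sps": "people profiles from WSS 2.0 server farms",
    "spss": "people profiles from WSS 2.0 server farms over SSL",
    "sts2": "SharePoint content from WSS 2.0 sites",
    "sts2s": "SharePoint content from WSS 2.0 sites over SSL",
    "notes": "Lotus Notes databases",
}


def describe_start_address(url):
    """Return what the prefix of a start address indexes, or None if unlisted.

    Hypothetical helper for illustration only.
    """
    # urlsplit already lower-cases the scheme, so "File://" and "file://" match.
    scheme = urlsplit(url).scheme
    return PROTOCOLS.get(scheme)


if __name__ == "__main__":
    for address in ("File://fileserver/share", "STS2S://teamsite", "sps3://servername"):
        print(address, "->", describe_start_address(address))
```

Note that sps3:// returns None here because it is not one of the rows in the table above.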