When setting up search content sources, you will sometimes see that the default content source (Local SharePoint) already has an sps3://servername entry in its list.

The SPS3 protocol handler uses web service calls (over HTTP/SOAP) to enumerate the content and then uses regular HTTP GET requests to retrieve the actual items. Crawling with the SPS3 protocol handler requires no RPC calls or direct database access to the target farm. That is the main reason this type of crawling is supported over WAN links and tolerates latency well.

One of the first things the crawler does is call the "Portal Crawl" web service at http://servername/_vti_bin/spscrawl.asmx. The methods in this web service are EnumerateBucket, EnumerateFolder, GetBucket, GetItem and GetSite. Interestingly, both "Enumerate" methods return essentially just an "ID" and a "LastModified" datetime, which hints at how SharePoint can do incremental content crawls via this protocol handler. If you point your browser to that URL yourself, you can find additional information about the web service, including sample SOAP calls and the WSDL (as you get with any .NET web service). So far, I have not found much detail on this web service beyond the actual class definition for Microsoft.Office.Server.Search.Internal.Protocols.SPSCrawl.
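To make the incremental-crawl hint concrete, here is a minimal sketch of how a crawler could use nothing but an "ID" and a "LastModified" value per item to decide what to fetch again. This is only an illustration of the idea, not SharePoint's actual implementation: the function name plan_incremental_crawl, the stored-state dictionary, and the sample IDs and dates are all made up for the example.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def plan_incremental_crawl(
    enumerated: Iterable[Tuple[str, datetime]],
    previous: Dict[str, datetime],
) -> Tuple[List[str], List[str]]:
    """Compare the current enumeration with the state of the last crawl.

    enumerated: (id, last_modified) pairs, as an Enumerate-style call might
                return them (hypothetical shape, assumed for this sketch).
    previous:   id -> last_modified as recorded during the previous crawl.
    Returns (ids_to_fetch, ids_to_delete).
    """
    seen = set()
    to_fetch = []
    for item_id, last_modified in enumerated:
        seen.add(item_id)
        known = previous.get(item_id)
        # New items and items modified since the last crawl need a fresh GET.
        if known is None or last_modified > known:
            to_fetch.append(item_id)
    # Items we knew about that no longer show up can be dropped from the index.
    to_delete = [item_id for item_id in previous if item_id not in seen]
    return to_fetch, to_delete


previous = {
    "1": datetime(2008, 1, 1),
    "2": datetime(2008, 1, 1),
    "3": datetime(2008, 1, 1),
}
enumerated = [
    ("1", datetime(2008, 1, 1)),  # unchanged
    ("2", datetime(2008, 2, 1)),  # modified
    ("4", datetime(2008, 2, 1)),  # new
]
to_fetch, to_delete = plan_incremental_crawl(enumerated, previous)
print(to_fetch)   # ['2', '4']
print(to_delete)  # ['3']
```

Only the items in to_fetch would need a full HTTP GET, which is consistent with this kind of crawl staying cheap over a slow link.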

And here is a list of all the protocols that you may find useful:

BDC protocol (BDC://): Used for Business Data Catalog URLs (available only in the Enterprise edition of MOSS 2007)

BDC2 protocol (BDC2://): Used for Business Data Catalog URLs, an internal protocol (available only in the Enterprise edition of MOSS 2007)

File protocol (File://): Used to index file shares

RB protocol (RB://): Used to index Exchange Server public folders

RBS protocol (RBS://): Used to index Exchange Server public folders over SSL

SPS protocol (SPS://): Used to index people profiles from WSS 2.0 server farms

SPSS protocol (SPSS://): Used to index people profiles from WSS 2.0 server farms over SSL

STS2 protocol (STS2://): Used to index SharePoint content from WSS 2.0 sites

STS2S protocol (STS2S://): Used to index SharePoint content from WSS 2.0 sites over SSL

Notes protocol (NOTES://): Used to index Lotus Notes databases, and include this content in the MOSS Enterprise Search indexes
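If you want to check quickly which protocol handler a given start address belongs to, the list above can be turned into a small lookup. This is just a convenience sketch: the dictionary repeats the descriptions from the list, while the function name describe_start_address and the sample addresses are invented for the example.

```python
from urllib.parse import urlsplit

# Scheme (lowercase) -> what the protocol is used for, taken from the list above.
PROTOCOLS = {
    "bdc": "Business Data Catalog URLs",
    "bdc2": "Business Data Catalog URLs (internal protocol)",
    "file": "File shares",
    "rb": "Exchange Server public folders",
    "rbs": "Exchange Server public folders over SSL",
    "sps": "People profiles from WSS 2.0 server farms",
    "spss": "People profiles from WSS 2.0 server farms over SSL",
    "sts2": "SharePoint content from WSS 2.0 sites",
    "sts2s": "SharePoint content from WSS 2.0 sites over SSL",
    "notes": "Lotus Notes databases",
}


def describe_start_address(address: str) -> str:
    """Return the description for the protocol prefix of a start address."""
    # urlsplit lowercases the scheme, so "STS2S://" and "sts2s://" both match.
    scheme = urlsplit(address).scheme
    return PROTOCOLS.get(scheme, "unknown protocol")


# Hypothetical start addresses, for illustration only.
print(describe_start_address("STS2S://intranet/sites/hr"))
print(describe_start_address("File://fileserver/share"))
print(describe_start_address("xyz://somewhere"))
```

Anything not in the list, such as the made-up xyz:// prefix, simply comes back as "unknown protocol".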