- You can create content sources from the following locations:
  - SharePoint sites
  - Websites (non-SharePoint)
  - File shares
  - Exchange public folders
  - Lotus Notes
  - Applications via the Business Data Catalog
- An index is like a table of contents; a crawl is the action that builds the index by reading the content so that it can be searched later.
- SharePoint stores content crawled from SharePoint sites in a database. Content from non-SharePoint sites is stored in the search indexes folder.
- For SharePoint to search any content, the content must first be indexed.
- A content index combines details on all the information in the content sources. When users perform searches, the index is queried for content that matches the user-entered terms.
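
To make the idea of querying an index concrete, here is a minimal sketch of an inverted index in Python. It is not how SharePoint implements its index; the document ids, the sample text, and the rule that every query term must be present are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def query(index, terms):
    """Return the ids of documents that contain every term (assumed AND logic)."""
    postings = [index.get(term.lower(), set()) for term in terms.split()]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical crawled content from several content sources.
documents = {
    "doc1": "quarterly sales report",
    "doc2": "sales team contact list",
    "doc3": "holiday schedule",
}
index = build_index(documents)
matches = query(index, "sales report")
```

The point of the sketch is that a search never reads the documents themselves: the crawl builds the term-to-document map once, and each query only looks terms up in that map.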
Crawl
- A crawl is the process by which the SharePoint index is rebuilt or updated to include new information.
- A crawler starts off with the URL for an initial page. It retrieves the initial page, extracts any URLs in it, and adds them to a queue of URLs to be scanned. Then the crawler gets URLs from the queue and repeats the process.
- A full crawl is a complete recrawl of all SharePoint content to update the index.
- An incremental update only reviews items that have changed or been added since the last update.
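
The queue-based crawl and the full versus incremental distinction described above can be sketched as follows. This is an illustration only, not SharePoint's crawler: `get_links`, the in-memory `site` graph, and the integer timestamps are hypothetical, and the `seen` set (which stops a URL from being queued twice) is an addition the description does not mention.

```python
from collections import deque


def crawl(start_url, get_links):
    """Visit pages breadth-first, starting from start_url.

    get_links is a hypothetical callable returning the URLs found on a page.
    """
    queue = deque([start_url])
    seen = {start_url}  # assumption: never queue the same URL twice
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def select_for_crawl(items, last_crawl_time=None):
    """items maps URL -> last-modified time; None means a full crawl."""
    if last_crawl_time is None:
        return sorted(items)
    return sorted(url for url, modified in items.items() if modified > last_crawl_time)


# Hypothetical link graph standing in for real HTTP requests.
site = {
    "http://www.mycompany.com/": ["http://www.mycompany.com/a", "http://www.mycompany.com/b"],
    "http://www.mycompany.com/a": ["http://www.mycompany.com/b", "http://www.mycompany.com/"],
    "http://www.mycompany.com/b": [],
}
order = crawl("http://www.mycompany.com/", lambda url: site.get(url, []))
```

Calling `select_for_crawl(items)` with no timestamp returns every item, as a full crawl would, while passing the time of the last crawl returns only the items modified since then.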
Crawl Rules
- If a content source points to www.mycompany.com, you can use a crawl rule to exclude www.mycompany.com/backup so that the backup subdirectory is not crawled.
- To include a folder inside the excluded backup folder, add an include rule for its URL as well, such as www.mycompany.com/backup/folder.
- The above only applies to HTTP content.
- You can use the asterisk (*) as a wildcard character in crawl rules, for example: http://*.mycompany.com/*.html
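
The include and exclude rules above can be sketched with Python's `fnmatch` wildcards. This is a rough model, not SharePoint's rule engine: the `RULES` list, the assumption that rules are evaluated top to bottom with the first match winning, and the default of crawling unmatched URLs are all illustrative choices.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list of (pattern, include) pairs, evaluated top to bottom.
# The more specific include rule is placed before the broader exclude rule.
RULES = [
    ("http://www.mycompany.com/backup/folder/*", True),
    ("http://www.mycompany.com/backup/*", False),
    ("http://*.mycompany.com/*.html", True),
]


def should_crawl(url, rules=RULES, default=True):
    """Return the decision of the first rule whose pattern matches the URL."""
    for pattern, include in rules:
        # In fnmatch, * matches any run of characters, including "/".
        if fnmatchcase(url, pattern):
            return include
    return default  # assumption: URLs matching no rule are crawled
```

For example, `should_crawl("http://www.mycompany.com/backup/old.html")` is rejected by the exclude rule, while a URL under `/backup/folder/` is accepted because its include rule is checked first.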
Scopes
- Scopes tell SharePoint what sections of an index to search.
- A search scope is a subsection of the index based on some predetermined rules related to a specific content source, location, or property.
- Once you have defined scopes, users can search by scope, which narrows the search to the content that matches the rules you set up.
Scope Rules
- You define a scope by adding scope rules to the scope.
- Scope rules define what content to associate with the scope and what content to not associate with it.
- Each scope rule is based upon a particular scope rule type:
  - Web address (http://server/site)
  - Property Query (Author = John Doe)
  - Content Source
  - All Content
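
As a rough illustration of the four rule types, the sketch below models each rule as a predicate over an indexed item. The `Item` class, the helper names, and the choice that an exclude rule overrides any include rule are assumptions for this example, not SharePoint's object model.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical indexed item."""
    url: str
    content_source: str
    properties: dict = field(default_factory=dict)


def web_address(prefix):
    return lambda item: item.url.startswith(prefix)


def property_query(name, value):
    return lambda item: item.properties.get(name) == value


def content_source(name):
    return lambda item: item.content_source == name


def all_content():
    return lambda item: True


def in_scope(item, include_rules, exclude_rules=()):
    """Assumption: any matching exclude rule removes the item from the scope."""
    if any(rule(item) for rule in exclude_rules):
        return False
    return any(rule(item) for rule in include_rules)


items = [
    Item("http://server/site/plan.docx", "Local sites", {"Author": "John Doe"}),
    Item("http://server/site/archive/old.docx", "Local sites", {"Author": "John Doe"}),
    Item("http://server/hr/policy.docx", "Local sites", {"Author": "Jane Roe"}),
]
scope = [
    item.url
    for item in items
    if in_scope(
        item,
        include_rules=[property_query("Author", "John Doe")],
        exclude_rules=[web_address("http://server/site/archive")],
    )
]
```

Here the scope contains John Doe's documents except those under the archive address, which shows how one rule can associate content with a scope while another keeps content out of it.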
Managed Properties
- You can create site columns as global properties to define documents and list items. Then you can configure these columns as managed properties so that users can search for documents based on specific content in the Advanced Search interface.
Basic Search Interface
- If you type two or more keywords in the search box, the results list items in which the words appear together or separately, in no specific order.
- If you type two or more keywords with quotation marks around them, the results list only items in which the words appear together and in the order specified.
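
The difference between the two query styles can be sketched as follows. This is a simplified model rather than SharePoint's query parser: the `tokenize` and `matches` helpers are hypothetical, and the sketch assumes that an unquoted query requires every keyword to be present somewhere in the text.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def matches(query, text):
    """Quoted query: exact word sequence. Unquoted: all words, in any order."""
    words = tokenize(text)
    phrase = re.fullmatch(r'\s*"(.*)"\s*', query)
    if phrase:
        target = tokenize(phrase.group(1))
        n = len(target)
        return n > 0 and any(
            words[i:i + n] == target for i in range(len(words) - n + 1)
        )
    terms = tokenize(query)
    return bool(terms) and all(term in words for term in terms)
```

With this model, the query `project plan` matches the text "The plan for the new project", but the quoted query `"project plan"` does not, because the words are not adjacent in that order.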
Keyword
- A keyword is a word that summarizes the topic for which you are searching.
- You can define words that are strongly related to your business and identify these as keywords so that when a user types a keyword to perform a search, she receives best bets for the returned results.
Best Bet
- A best bet is a web address associated with a custom keyword that appears prominently when a user makes a search request using that keyword.
- Best bet results are returned in the right-hand column of the search results in an order that the administrator specifies.
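
Conceptually, best bets behave like a lookup table from keyword to an administrator-ordered list of addresses. The sketch below assumes a hypothetical `BEST_BETS` table with made-up URLs and assumes the whole query must equal the keyword, ignoring case; it does not reflect how SharePoint stores keywords.

```python
# Hypothetical keyword table; each list is in the order the administrator chose.
BEST_BETS = {
    "benefits": ["http://server/hr/benefits", "http://server/hr/enrollment"],
    "expenses": ["http://server/finance/expenses"],
}


def best_bets(query, table=BEST_BETS):
    """Return the best bets for a query, or an empty list if it is not a keyword."""
    return list(table.get(query.strip().lower(), []))
```

A search for `Benefits` would return both HR addresses in their configured order, while a query that is not a defined keyword returns no best bets and only the regular results would be shown.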
Reference:
Building Team Solutions with MOSS 2007
Efficient Crawling Through URL Ordering