Crawler Settings

The crawler can be configured individually for each project/website. All changes are saved only for the current project.

General Settings

General Crawler Settings

User-Agent – Choose the User-Agent the crawler will use during crawling. You can also specify your own User-Agent.

User-Agent
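For illustration only (this is not the program's internal code), sending a custom User-Agent amounts to setting the corresponding HTTP header on every request; the bot name below is an invented example.

    # Minimal sketch: fetch a page with a custom User-Agent header.
    import urllib.request

    CUSTOM_USER_AGENT = "MySitemapBot/1.0 (+https://example.com/bot)"  # example value

    request = urllib.request.Request(
        "https://example.com/",
        headers={"User-Agent": CUSTOM_USER_AGENT},
    )
    with urllib.request.urlopen(request) as response:
        html = response.read()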

Crawling Depth – Enter a depth level for the crawler if you want to limit the crawling depth.

Example values: 0 – no limit, 1 – only the root web page, 2 – all web pages linked from the root page, and so on.
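As a rough sketch of the idea (not the program's actual implementation), a depth limit can be pictured as a breadth-first crawl that stops expanding pages once the configured level is reached; extract_links is a hypothetical helper.

    # Sketch of a depth-limited breadth-first crawl (max_depth=0 means no limit).
    from collections import deque

    def crawl(start_url, max_depth, extract_links):
        # extract_links(url) is a hypothetical helper returning the links found on a page.
        seen = {start_url}
        queue = deque([(start_url, 1)])  # the root web page is level 1
        while queue:
            url, depth = queue.popleft()
            yield url
            if max_depth and depth >= max_depth:
                continue  # depth limit reached; do not follow links from this page
            for link in extract_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))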

Priorities by default – Here you can adjust the default priorities the crawler will assign to the pages it finds.

How it is used: 0 – the website's homepage, 1 – all pages linked from the homepage, 2 – all pages linked from level-1 pages, and so on; levels 3, 4, etc. can be added as well.
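As a sketch of the idea, and assuming the setting maps each page level to a sitemap priority value (the numbers below are arbitrary examples, not the program's defaults):

    # Hypothetical mapping from page level to a default sitemap priority.
    DEFAULT_PRIORITIES = {0: 1.0, 1: 0.8, 2: 0.6, 3: 0.4}  # example values only

    def priority_for_level(level):
        # Deeper pages fall back to the lowest configured priority.
        return DEFAULT_PRIORITIES.get(level, min(DEFAULT_PRIORITIES.values()))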

File Extension

The list of file extensions the spider will crawl. For instance, if your website's pages use a specific extension such as .file, add file to the list so the spider crawls the site. Enter the exact extension without dots or asterisks. You can add your own extensions or remove unnecessary ones.

Extension of crawler files
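For illustration, extension filtering boils down to comparing the extension of each URL path against the configured list (stored without dots or asterisks, as described above); the list below is an example.

    # Sketch of filtering URLs by a configured extension list.
    from urllib.parse import urlsplit

    ALLOWED_EXTENSIONS = {"html", "htm", "php", "file"}  # example list

    def has_allowed_extension(url):
        last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
        if "." not in last_segment:
            return True  # assumption: URLs without an extension (e.g. "/forum/") are crawled
        return last_segment.rsplit(".", 1)[-1].lower() in ALLOWED_EXTENSIONS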

Exceptions

The spider will skip any URL that contains the words or symbols you list here. You can see examples in the screenshot.

Crawler exclusion

Spider exceptions can also be set up from the site's robots.txt. To do this, press the Import from robots.txt button and specify the address of the robots.txt file.
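As a rough sketch, exclusion is a substring check against each URL, and importing from robots.txt can be pictured as collecting its Disallow paths into the same list (a simplified assumption, not the program's actual parser):

    # Sketch of substring-based URL exclusion with a simplified robots.txt import.
    EXCLUDED_FRAGMENTS = ["/admin/", "logout", "print="]  # example entries

    def is_excluded(url):
        return any(fragment in url for fragment in EXCLUDED_FRAGMENTS)

    def import_from_robots_txt(robots_txt_text):
        # Simplified: treat every Disallow path as an exclusion fragment.
        for line in robots_txt_text.splitlines():
            line = line.split("#", 1)[0].strip()
            if line.lower().startswith("disallow:"):
                path = line.split(":", 1)[1].strip()
                if path:
                    EXCLUDED_FRAGMENTS.append(path)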

Inclusions

The spider will index only those URLs whose addresses contain text from this list. See the usage example in the screenshot.

Inclusions
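Correspondingly, inclusion can be sketched as the mirror check: a URL is indexed only if its address contains at least one fragment from the list (treating an empty list as no restriction is an assumption here).

    # Sketch of substring-based URL inclusion.
    INCLUDED_FRAGMENTS = ["/forum/", "/blog/"]  # example entries

    def is_included(url):
        if not INCLUDED_FRAGMENTS:
            return True  # assumption: an empty list means no restriction
        return any(fragment in url for fragment in INCLUDED_FRAGMENTS)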

Remove Parameters

If any of the listed parameters are found in a URL, they are removed before the URL is added to the crawl queue. This can be used to discard a Session ID or similar one-time parameters.

Example:

If the spider encounters a link such as http://community.invisionpower.com/forum/297-ips-company-feedback/?session=02e0a436b7555ee760af1a1a70c266cb and you have added session to the list, the program removes ?session=02e0a436b7555ee760af1a1a70c266cb from the link and writes the clean URL http://community.invisionpower.com/forum/297-ips-company-feedback/ to the Sitemap file.

Remove parameters
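For illustration, stripping a listed parameter such as session from a URL can be sketched like this (the parameter list is an example):

    # Sketch of removing configured query parameters from a URL.
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    REMOVED_PARAMETERS = {"session", "sid", "PHPSESSID"}  # example list

    def strip_parameters(url):
        parts = urlsplit(url)
        kept = [(name, value)
                for name, value in parse_qsl(parts.query, keep_blank_values=True)
                if name not in REMOVED_PARAMETERS]
        return urlunsplit(parts._replace(query=urlencode(kept)))

For the link in the example above, strip_parameters returns http://community.invisionpower.com/forum/297-ips-company-feedback/.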

Content Types

Enter the content types of files the spider should index, for example text/html and text/plain.

Content Types
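As a sketch, this check compares the Content-Type response header (ignoring any charset suffix) against the configured list:

    # Sketch of filtering responses by Content-Type.
    ALLOWED_CONTENT_TYPES = {"text/html", "text/plain"}  # example list

    def is_allowed_content_type(content_type_header):
        # "text/html; charset=utf-8" -> "text/html"
        media_type = content_type_header.split(";", 1)[0].strip().lower()
        return media_type in ALLOWED_CONTENT_TYPES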

Ready-to-use Settings

We have prepared complete spider settings for popular CMS and forum engines. These settings keep the spider from indexing the junk URLs those engines typically generate. If you apply one of these presets, the program automatically adds all the necessary spider settings to the Remove Parameters and Exceptions sections. If you would like other popular engines included in the list, contact us and we will consider your suggestion.

Ready-to-Use Settings

Processing Attributes

Choose the attributes the spider should process, i.e. the attributes in which it looks for references to other pages of the site.

Processing Attributes
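For illustration, link discovery amounts to scanning the configured attributes (for example href and src) in the downloaded HTML; the sketch below uses Python's standard HTMLParser, and the attribute selection is an example.

    # Sketch of collecting links from configured HTML attributes.
    from html.parser import HTMLParser

    PROCESSED_ATTRIBUTES = {"href", "src"}  # example selection

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in PROCESSED_ATTRIBUTES and value:
                    self.links.append(value)

    # Usage: collector = LinkCollector(); collector.feed(html_text); collector.links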