The crawler can be configured separately for each project/website. All changes are saved for the current project only.
User-Agent - Choose the User-Agent the crawler will use during the crawling process. You can also enter your own User-Agent.
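As a rough illustration of what a custom User-Agent means in practice, the sketch below builds an HTTP request that identifies itself with a chosen string. The User-Agent value and the function name build_request are hypothetical and are not taken from the program.

```python
from urllib.request import Request

# Hypothetical User-Agent string, used only for illustration.
CUSTOM_USER_AGENT = "MySitemapBot/1.0 (+http://example.com/bot)"


def build_request(url, user_agent=CUSTOM_USER_AGENT):
    """Create a request that announces itself with the chosen User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("http://example.com/")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Servers may return different content, or block the request, depending on this header, which is why the setting is stored per project.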
Crawling Depth - Enter a depth level if you want to limit how deep the crawler goes.
Examples of values: 0 - no limit, 1 - only the root web page, 2 - the root page and all web pages linked from it, etc.
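The depth values above can be sketched as a breadth-first walk over a small in-memory link graph. This is only an assumption about how such a limit is typically applied, not the program's actual code; the function crawl and the sample graph are hypothetical.

```python
from collections import deque


def crawl(links, root, max_depth=0):
    """Breadth-first walk; max_depth 0 means no limit, 1 means root only."""
    seen = {root}
    order = [root]
    queue = deque([(root, 1)])  # the root page counts as level 1
    while queue:
        page, level = queue.popleft()
        if max_depth and level >= max_depth:
            continue  # pages at the limit are kept, but their links are not followed
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, level + 1))
    return order


# Hypothetical site structure: page -> pages it links to.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/c": ["/d"]}
print(crawl(site, "/", max_depth=2))
```

With max_depth=2 only the root page and the pages it links to are visited; with 0 the whole graph is walked.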
Priorities by default - Here you can adjust the default priorities that the crawler applies to the pages it finds.
How the levels work: 0 - the homepage of the website, 1 - all pages linked from the homepage, 2 - all pages linked from pages that are themselves linked from the homepage, and so on. You can add levels 3, 4, etc.
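The level scheme above can be sketched as follows: each page gets a level equal to its link distance from the homepage, and the level is looked up in a priority table. The priority values, the fallback value, and the function names are hypothetical; the program's real defaults are not stated here.

```python
from collections import deque

# Hypothetical priority table: level -> sitemap priority.
DEFAULT_PRIORITIES = {0: 1.0, 1: 0.8, 2: 0.6}
FALLBACK_PRIORITY = 0.5  # assumed value for levels not listed in the table


def link_levels(links, home):
    """Return the link distance of every reachable page from the homepage."""
    levels = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


def assign_priorities(links, home, table=DEFAULT_PRIORITIES,
                      fallback=FALLBACK_PRIORITY):
    """Map each page to the priority configured for its level."""
    return {page: table.get(level, fallback)
            for page, level in link_levels(links, home).items()}


site = {"/": ["/a"], "/a": ["/b"], "/b": ["/c"]}
print(assign_priorities(site, "/"))
```

Adding a level 3 entry to the table would give pages three links away from the homepage their own priority instead of the fallback.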
List of file extensions the spider has to crawl. For instance, if your website's pages have a specific extension, like .file, then you need to select file in the list of extensions for the spider to crawl the site. Enter the exact extension without dots or asterisks. You can add your own extensions or remove unnecessary ones.
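A minimal sketch of such an extension check is shown below. The list contents and the function name are hypothetical, and the rule that addresses without any extension (such as directory URLs) are always accepted is an assumption, not documented behavior.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical extension list, written without dots or asterisks.
ALLOWED_EXTENSIONS = {"html", "htm", "php", "file"}


def has_allowed_extension(url, allowed=ALLOWED_EXTENSIONS):
    """Check the extension of the URL path against the allowed list."""
    path = urlsplit(url).path  # ignores the query string and fragment
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    # Assumption: URLs without an extension are always crawled.
    return extension == "" or extension in allowed


print(has_allowed_extension("http://example.com/page.file"))
print(has_allowed_extension("http://example.com/photo.jpg"))
```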
The spider will skip all pages whose addresses contain the words or symbols you list here. See the examples in the screenshot.
Spider exceptions can also be set up from the site's robots.txt file. To do that, press the Import from robots.txt button and select the addresses from the robots.txt file.
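To give an idea of what such an import might collect, the sketch below reads the Disallow paths from robots.txt text. It is a simplified assumption: it ignores User-agent groups and Allow rules, and the function name disallowed_paths is hypothetical.

```python
def disallowed_paths(robots_txt):
    """Collect the non-empty Disallow values from robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample = """User-agent: *
Disallow: /admin/
Disallow:
Disallow: /search # internal search results
Allow: /public/
"""
print(disallowed_paths(sample))
```

Each collected path could then be offered as a candidate entry for the exceptions list.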
The spider will index only those pages whose addresses contain text from this list. See the usage example in the screenshot.
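The exception and inclusion lists can be thought of as two substring filters applied to each address. The sketch below assumes plain substring matching, that exceptions take precedence over inclusions, and that an empty inclusion list means no restriction; none of these details are confirmed for the program, and should_crawl is a hypothetical name.

```python
def should_crawl(url, exceptions=(), inclusions=()):
    """Decide whether a URL passes the exception and inclusion lists."""
    # Skip the URL if it contains any excluded word or symbol.
    if any(token in url for token in exceptions):
        return False
    # If an inclusion list is given, the URL must contain at least one entry.
    if inclusions and not any(token in url for token in inclusions):
        return False
    return True


print(should_crawl("http://example.com/forum/topic-1",
                   exceptions=["/admin/"], inclusions=["/forum/"]))
```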
If certain parameters are found in a URL, they are removed from it before the URL is processed further. This function can be used to discard Session-ID or similar one-time parameters.
If the spider finds a link such as http://community.invisionpower.com/forum/297-ips-company-feedback/?session=02e0a436b7555ee760af1a1a70c266cb and you have selected session in the list, the program removes ?session=02e0a436b7555ee760af1a1a70c266cb from the link and writes the clean link to the Sitemap file: http://community.invisionpower.com/forum/297-ips-company-feedback/.
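The parameter removal described above can be sketched with the standard URL parsing functions. This is an illustration of the idea rather than the program's own code; the function name strip_params is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, remove):
    """Return the URL without the query parameters named in `remove`."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in remove]
    # An empty query string is dropped together with the "?".
    return urlunsplit(parts._replace(query=urlencode(kept)))


link = ("http://community.invisionpower.com/forum/297-ips-company-feedback/"
        "?session=02e0a436b7555ee760af1a1a70c266cb")
print(strip_params(link, {"session"}))
```

Parameters that are not in the list are kept, so a link such as http://example.com/list?page=2&session=abc would become http://example.com/list?page=2.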
Enter the content types of the files that the spider has to index. Example: text/html, text/plain.
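A content type check of this kind usually compares the media type from the Content-Type response header with the configured list. The sketch below assumes that approach; the function name is_indexable is hypothetical.

```python
# Content types taken from the example above.
ALLOWED_CONTENT_TYPES = {"text/html", "text/plain"}


def is_indexable(content_type_header, allowed=ALLOWED_CONTENT_TYPES):
    """Compare the media type of a Content-Type header with the allowed list."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in allowed


print(is_indexable("text/html; charset=utf-8"))
print(is_indexable("image/png"))
```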
We have prepared complete spider settings for popular CMS and forum engines. These settings help you avoid indexing the spam that those engines usually contain. If you apply one of these settings, the program automatically adds all the necessary spider settings to the Remove Parameters and Exceptions sections. If you want other popular engines included in the list, contact us and we will consider your request.
Choose the attributes that the spider needs to process when it looks for references to other pages of the site.
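As an illustration of attribute-based link discovery, the sketch below collects the values of the selected attributes from an HTML page using the standard library parser. The attribute names href and src are only examples, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the values of the selected attributes from all tags."""

    def __init__(self, attributes=("href",)):
        super().__init__()
        self.attributes = set(attributes)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser passes attribute names in lower case.
        for name, value in attrs:
            if name in self.attributes and value:
                self.links.append(value)


def extract_links(html, attributes=("href",)):
    collector = LinkCollector(attributes)
    collector.feed(html)
    return collector.links


page = '<a href="/a">A</a><img src="/i.png"><frame src="/f.html">'
print(extract_links(page, ("href", "src")))
```

Selecting fewer attributes narrows where the spider looks: with only href chosen, the image and frame addresses in this sample are ignored.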