PRO Sitemaps Crawler Bot

The PRO Sitemaps service uses crawler bots (also known as "robots" or "spiders") to discover the structure of a website and build a list of its pages in different formats, known as "sitemaps".

Our crawler bot sends individual requests to load website pages, analyzes the responses, extracts internal links from those pages, and continues the process until no more new pages can be discovered, unless it is restricted in some way by the website owner.
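
Conceptually, the discovery process works like the simplified sketch below. This is only an illustration of the general idea, not our actual crawler code; the helper names and the single-threaded structure are assumptions made for the example.

# Simplified sketch of the page-discovery loop (illustration only).
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start_url):
    site = urlparse(start_url).netloc
    queue, seen = [start_url], {start_url}
    while queue:                                   # continue until no new pages are discovered
        url = queue.pop(0)
        html = urlopen(url).read().decode("utf-8", errors="replace")
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            parts = urlparse(link)
            # follow only internal http(s) links that have not been seen yet
            if parts.scheme in ("http", "https") and parts.netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)                            # the discovered URLs become the sitemap entries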

Crawl Rates and Rules

The crawling process is initiated at the request of the website entry owner, according to the settings in their PRO Sitemaps Configuration, which allow restricting the crawler's schedule and limiting the crawling rate (the number of requests per time interval).
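
For illustration, limiting the number of requests per time interval amounts to enforcing a minimum delay between consecutive requests, as in this minimal sketch (the 60-requests-per-minute figure is only an example value, not a default of our service):

import time
from urllib.request import urlopen

def rate_limited_fetch(urls, max_requests_per_minute=60):
    # Illustration only: never send requests faster than the configured rate.
    min_interval = 60.0 / max_requests_per_minute  # seconds between consecutive requests
    for url in urls:
        started = time.monotonic()
        urlopen(url).read()
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)     # wait out the remainder of the interval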

The crawler bot respects the directives found in the website's robots.txt file and ignores all disallowed URLs. It also respects "robots" meta tags and rel="nofollow" attributes found in the page source code.
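
As an example of how robots.txt directives affect the crawl, the standard check looks like the sketch below, shown here with Python's urllib.robotparser. The "Pro-Sitemaps" token, the example domain and the example rules are assumptions for illustration, not a description of our internal implementation.

from urllib.robotparser import RobotFileParser

BOT_TOKEN = "Pro-Sitemaps"   # assumed token for the example; "User-agent: *" rules apply as well

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

# Returns False if robots.txt contains, for example:
#   User-agent: *
#   Disallow: /private/
print(rp.can_fetch(BOT_TOKEN, "https://example.com/private/page.html"))

# Pages marked with <meta name="robots" content="noindex"> and links marked
# rel="nofollow" are likewise excluded when the page source is parsed.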

The crawling rate is automatically reduced when the crawler receives an HTTP 429 Too Many Requests response.
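
The effect of a 429 response can be pictured as a backoff loop like the one below (a simplified sketch; the Retry-After handling and delay values are illustrative assumptions, and Retry-After is treated here as a number of seconds only):

import time
from urllib.error import HTTPError
from urllib.request import urlopen

def fetch_with_backoff(url, delay=1.0, max_delay=300.0):
    # Illustration only: slow down whenever the server answers 429 Too Many Requests.
    while True:
        try:
            return urlopen(url).read()
        except HTTPError as err:
            if err.code != 429:
                raise
            retry_after = err.headers.get("Retry-After")
            wait = float(retry_after) if retry_after else delay
            time.sleep(min(wait, max_delay))
            delay = min(delay * 2, max_delay)      # back off further on repeated 429 responses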

Technical Data

To keep the crawling process efficient, our crawler bot uses multiple servers to send requests. Our main crawler servers are located in the UK; the current list of server IP addresses can be found in this file and is listed here:

85.92.66.149
81.19.188.235
81.19.188.236
85.92.66.150
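
If you want to verify that a request identifying itself as our bot actually originates from our servers, one simple check is to compare the remote address against the list above, for example:

# One way for a site operator to check a request's source address against our published IPs.
PRO_SITEMAPS_IPS = {
    "85.92.66.149",
    "81.19.188.235",
    "81.19.188.236",
    "85.92.66.150",
}

def is_pro_sitemaps_ip(remote_addr):
    return remote_addr in PRO_SITEMAPS_IPS

print(is_pro_sitemaps_ip("85.92.66.149"))    # True
print(is_pro_sitemaps_ip("203.0.113.10"))    # False (documentation example address)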

Our bot uses the following "User-Agent" identification header:

Mozilla/5.0 (compatible; Pro Sitemaps Generator; pro-sitemaps.com) Gecko Pro-Sitemaps/1.0
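
To spot our bot in your access logs you can match this header, for example by looking for the "Pro-Sitemaps" token it contains. This is a minimal sketch; keep in mind that User-Agent headers can be spoofed by third parties, so the IP list above is the more reliable check.

UA_STRING = "Mozilla/5.0 (compatible; Pro Sitemaps Generator; pro-sitemaps.com) Gecko Pro-Sitemaps/1.0"

def is_pro_sitemaps_bot(user_agent_header):
    # "pro-sitemaps" appears in both the domain and the product token of the header above.
    return "pro-sitemaps" in user_agent_header.lower()

print(is_pro_sitemaps_bot(UA_STRING))                          # True
print(is_pro_sitemaps_bot("Mozilla/5.0 (X11; Linux x86_64)"))  # False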

Please contact us if you need any more details or have a request regarding the functioning of our crawler bot.