Scrape a Website


BotDojo provides an easy way to scrape and index data from the web. When creating an index, select the "Scrape a Website" option.

Important: By default, BotDojo scrapes at most 50 pages. Contact us if you need this limit increased.
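To make the page cap concrete, here is a minimal sketch of a page-limited crawl. It assumes a breadth-first crawl order, which this page does not document. The hypothetical `crawl` function starts from one page and follows only links that begin with a URL prefix, a simplified stand-in for the Url setting described below. It stops once `max_pages` pages have been visited, with the default mirroring the 50-page limit. `get_links` is a hypothetical placeholder for fetching a page and extracting its links, so the example runs offline against a fake site. None of these names are BotDojo APIs.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    scope_prefix: str,
    max_pages: int = 50,
) -> list[str]:
    """Hypothetical breadth-first crawl that visits at most max_pages pages.

    get_links stands in for "fetch the page and extract its links";
    scope_prefix is a plain string-prefix filter (an assumption, not
    BotDojo's documented matching rule).
    """
    visited: list[str] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # every fetched page counts toward the cap
        for link in get_links(url):
            # Follow only unseen links that fall under the configured prefix.
            if link.startswith(scope_prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A fake 120-page site where each page links to the next (no network needed).
site = {f"https://example.com/p{i}": [f"https://example.com/p{i + 1}"] for i in range(120)}
pages = crawl("https://example.com/p0", lambda u: site.get(u, []), "https://example.com/")
print(len(pages))  # 50
```

Under this breadth-first assumption, the pages farthest from the start page are the ones left out when a site is larger than the cap.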

Name: The name of the loader.
Document Folder: The name of the folder in which to store the scraped data.
Url: The URL of the website to scrape. To scrape only one section of a website, include the full path to that section.

For example, https://docs.botdojo.com indexes the entire website, while https://docs.botdojo.com/docs/learn/concepts/vector includes only pages under the docs/learn/concepts/vector path.
Starting Url: When scraping a website, BotDojo follows the links it finds on each page. Optionally, provide the page from which to start the crawl.
Parser: BotDojo supports two parsers for scraping a website:
- Cheerio: A fast parser, but it can have trouble with pages that rely heavily on JavaScript.
- Puppeteer: Opens the website in Chrome, so it can handle complex websites, but it is slower than Cheerio.
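The trade-off comes from how each parser sees a page. Cheerio parses the HTML the server returns and does not run the page's scripts. Puppeteer drives a real Chrome browser, which does. The sketch below shows why this matters without using either library. It uses Python's built-in `html.parser` as a stand-in for a static parser like Cheerio (it is not Cheerio), and `LinkCollector` is a hypothetical helper. A link that exists only after JavaScript runs is invisible to it.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw HTML markup."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Raw HTML as a server might return it: one link is in the markup, the other
# is only created when a browser executes the script.
page = """
<html><body>
  <a href="/docs/static">Static link</a>
  <script>
    const a = document.createElement("a");
    a.href = "/docs/dynamic";
    document.body.appendChild(a);
  </script>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/docs/static']
```

A browser-based parser such as Puppeteer would also see `/docs/dynamic` after the script runs, at the cost of launching and rendering the page in Chrome.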