Scrape Web Site

BotDojo provides an easy way to scrape and index data from the web. When creating an index, select the Scrape a Website option.

important

By default, BotDojo scrapes at most 50 pages. Contact us if you need this limit increased.

  • Name: The name of the loader.

  • Document Folder: The name of the folder where the scraped data is stored.

  • Url: The URL of the website to scrape. To scrape only a section of a website, include the full path to that section.

    For example, https://docs.botdojo.com indexes the entire website, while https://docs.botdojo.com/docs/learn/concepts/vector includes only pages under the docs/learn/concepts/vector path.

  • Starting Url: Optional. The page where the crawl begins. When scraping a website, BotDojo follows any link it finds on a page.

  • Parser: BotDojo supports two parsers when scraping a website.

    • Cheerio: A fast parser, but it can have problems with pages that rely heavily on JavaScript.
    • Puppeteer: Opens the website in Chrome, so it can handle complicated websites, but it is slower than Cheerio.
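
To make the interaction between Url, Starting Url, and the default page limit concrete, here is a minimal Python sketch of a scoped crawl. It is an illustration only, not BotDojo's implementation: the function names `in_scope` and `crawl`, the in-memory `links` map standing in for real page fetches, and the choice to match the path on whole segments are all assumptions made for this example.

```python
from collections import deque
from urllib.parse import urlsplit


def in_scope(base_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url is on the same site and under base_url's path.

    Assumption: matching is done on whole path segments, so a base path of
    /docs/learn does not match /docs/learning. The real rule may differ.
    """
    base = urlsplit(base_url)
    cand = urlsplit(candidate_url)
    if (base.scheme, base.netloc) != (cand.scheme, cand.netloc):
        return False
    base_path = base.path.rstrip("/")
    if not base_path:
        # A bare site URL covers every page on that host.
        return True
    return cand.path == base_path or cand.path.startswith(base_path + "/")


def crawl(start_url: str, base_url: str, links: dict, max_pages: int = 50) -> list:
    """Breadth-first crawl from start_url, keeping only in-scope links.

    `links` maps a page URL to the URLs found on that page (a stand-in for
    fetching and parsing). Stops once max_pages pages have been visited.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen and in_scope(base_url, nxt):
                seen.add(nxt)
                queue.append(nxt)
    return visited


# Hypothetical link graph; the /a and /b pages are made up for illustration.
SECTION = "https://docs.botdojo.com/docs/learn/concepts/vector"
LINKS = {
    SECTION: [SECTION + "/a", "https://docs.botdojo.com/docs/learn", "https://example.com/"],
    SECTION + "/a": [SECTION, SECTION + "/b"],
}

pages = crawl(start_url=SECTION, base_url=SECTION, links=LINKS)
```

In this sketch, links that point outside the section path or to another host are skipped, and lowering `max_pages` cuts the crawl short in the same way the default limit would.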