Enhance the knowledge crawl mechanism to intelligently filter out irrelevant chunks (e.g., headers, footers, side content, discussion links) from crawled URLs. The current crawl grabs everything, leading to many useless chunks and requiring manual workarounds.
Even a level-one knowledge crawl on a URL produces many useless chunks: headers, footers, sidebar content, forum/discussion links, and so on. The crawl isn't selective; it grabs everything, sometimes yielding hundreds of useless chunks. My current workaround is to "print" the relevant pages to PDFs, combine them into a single PDF, and convert that with Docling into a .md file for use with Archon.

Below is an example chunk (its final link broken by chunking) that only lists links to Tauri releases. It provides no information that would be useful in a coding project:

`0-beta.2 ](https://tauri.app/release/tauri/v2.0.0-beta.2/) * [ 2.0.0-beta.1 ](https://tauri.app/release/tauri/v2.0.0-beta.1/) * [ 2.0.0-beta.0 ](https://tauri.app/release/tauri/v2.0.0-beta.0/) * [ 2.0.0-alpha.21 ](https://tauri.app/release/tauri/v2.0.0-alpha.21/) * [ 2.0.0-alpha.20 ](https://tauri.app/release/tauri/v2.0.0-alpha.20/) * [ 2.0.0-alpha.19 ](https://`
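One possible heuristic for catching chunks like the one above is link density: navigation menus, release lists, and footers are almost entirely markdown links, while real documentation prose is not. The sketch below is illustrative only, not Archon's actual API; the function names and the 0.6 threshold are assumptions.

```python
import re

# Matches a markdown link: [text](url)
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")

def link_density(chunk: str) -> float:
    """Fraction of the chunk's characters that belong to markdown links."""
    if not chunk:
        return 0.0
    link_chars = sum(len(m.group(0)) for m in LINK_RE.finditer(chunk))
    return link_chars / len(chunk)

def is_boilerplate(chunk: str, threshold: float = 0.6) -> bool:
    """Flag a chunk as navigation/footer noise if it is mostly links.

    The 0.6 threshold is a hypothetical starting point and would need
    tuning against real crawled pages.
    """
    return link_density(chunk) >= threshold

# A navigation-style chunk like the Tauri release list is nearly all links:
nav_chunk = (
    "* [ 2.0.0-beta.2 ](https://tauri.app/release/tauri/v2.0.0-beta.2/) "
    "* [ 2.0.0-beta.1 ](https://tauri.app/release/tauri/v2.0.0-beta.1/)"
)
prose_chunk = "Tauri lets you build smaller, faster desktop apps with a web frontend."

print(is_boilerplate(nav_chunk))    # mostly links -> True
print(is_boilerplate(prose_chunk))  # real prose -> False
```

A filter like this could run after chunking and before embedding, so the useless chunks never enter the knowledge base in the first place; link density alone won't catch every header or footer, but it would eliminate the worst offenders like the release list shown above.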