chore: increase the numbers to scrape; disable PDF check in scholar model a6fbfb6 eljanmahammadli committed on Sep 26, 2024
added pagination to Google search, now retrieving more sites 5650543 eljanmahammadli committed on Sep 26, 2024
#perf: quality improvements to website scraping + PDF detection logic d904dd4 eljanmahammadli committed on Sep 23, 2024
#bugfix: 301 responses are resolved by explicitly setting follow_redirects for httpx 6f4a113 eljanmahammadli committed on Sep 20, 2024
remove content_string (not used) + clean Unicode non-printable chars + add PyMuPDF reading for PDF URLs a62cc34 minko186 committed on Aug 23, 2024
changed split logic to fix short generated text; search more websites and add some logging 59fbf6a eljanmahammadli committed on Aug 13, 2024
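
The a62cc34 entry above mentions cleaning Unicode non-printable characters from scraped text. The commit's actual code is not shown here, so the following is only a minimal sketch of one way to do that with the Python standard library. The function name `strip_non_printable`, the `KEEP` set, and the choice to turn non-ASCII space separators into a plain space are all assumptions made for illustration.

```python
import unicodedata

# Whitespace assumed worth preserving in scraped text (illustrative choice).
KEEP = {"\n", "\t"}


def strip_non_printable(text: str) -> str:
    """Remove non-printable characters from text (hypothetical helper).

    str.isprintable() is False for control and format characters
    (e.g. NUL, zero-width space, BOM) and for separators other than
    the ASCII space.
    """
    out = []
    for ch in text:
        if ch in KEEP or ch.isprintable():
            out.append(ch)
        elif unicodedata.category(ch) == "Zs":
            # Space separators such as NBSP become a plain space so
            # adjacent words do not get glued together.
            out.append(" ")
        # Every other non-printable character is dropped.
    return "".join(out)


if __name__ == "__main__":
    print(strip_non_printable("scraped\u200b text\x00 with\u00a0debris"))
```

Keeping newlines and tabs is a judgment call: both are non-printable by Python's definition, but removing them would flatten paragraph structure that later splitting logic may rely on.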