(feat/pdf) Add Reducto as a fallback (for testing purposes) #2175
base: main
@cubic-dev-ai review
@nickscamara I've started the AI code review. It'll take a few minutes to complete.
3 issues found across 1 file
Prompt for AI agents (all 3 issues)
Understand the root cause of the following 3 issues and fix them.
<file name="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts">
<violation number="1" location="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts:303">
XSS risk: untrusted Markdown rendered to HTML with marked.parse without sanitization.</violation>
<violation number="2" location="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts:343">
Timer not cleared; Reducto may start after completion, causing background work and potential unhandled rejection.</violation>
<violation number="3" location="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts:376">
Racing logic rejects on first failure; should use first successful result (e.g., Promise.any).</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai
to give feedback, ask questions, or re-run the review.
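For violation 2 (the uncleared timer), one fix is to keep the handle returned by `setTimeout` and clear it once the primary parser settles, so the delayed task can never start after the work is done. A minimal sketch over a generic task; `delayedStart` and `DelayedTask` are hypothetical names, not from the PR:

```typescript
// Sketch: schedule a fallback task after `delayMs`, keeping the timer handle
// so it can be cleared. `delayedStart` is a hypothetical helper name.
interface DelayedTask<T> {
  // Settles with the task's result, or `undefined` if cancelled before start.
  result: Promise<T | undefined>;
  // Clears the timer; a no-op once the task has already started.
  cancel: () => void;
}

function delayedStart<T>(task: () => Promise<T>, delayMs: number): DelayedTask<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let settleEarly: (value: T | undefined) => void = () => {};

  const result = new Promise<T | undefined>((resolve, reject) => {
    settleEarly = resolve;
    timer = setTimeout(() => {
      timer = undefined;
      // Rejections are forwarded to `result` rather than dropped, so the
      // caller can observe them by awaiting or attaching a handler.
      task().then(resolve, reject);
    }, delayMs);
  });

  const cancel = () => {
    if (timer === undefined) return; // already started (or already cancelled)
    clearTimeout(timer);
    timer = undefined;
    settleEarly(undefined); // the task never runs
  };

  return { result, cancel };
}
```

In the PDF engine, the caller would presumably invoke `cancel()` as soon as the RunPod request settles, so Reducto is never started after the scrape has finished.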
3 issues found across 1 file
Prompt for AI agents (all 3 issues)
Understand the root cause of the following 3 issues and fix them.
<file name="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts">
<violation number="1" location="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts:301">
XSS risk: untrusted Markdown rendered to HTML with marked without sanitization.</violation>
<violation number="2" location="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts:374">
Promise.race will reject on the first failure, breaking the intended "first successful result" behavior; use Promise.any instead.</violation>
<violation number="3" location="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts:519">
Use decoded byte length instead of base64 string length when enforcing the 19MB limit.</violation>
</file>
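The `Promise.race` versus "first successful result" distinction matters when RunPod fails quickly: `Promise.race` settles with whichever promise settles first, so an early RunPod error rejects the whole race even if Reducto would have succeeded. `Promise.any` (ES2021) is the built-in with first-success semantics. The sketch below spells out the same behaviour by hand so the logic is visible; `firstSuccessful` is a hypothetical name, not from the PR:

```typescript
// Resolves with the first promise to fulfil and ignores earlier rejections;
// rejects only after every input has rejected. This mirrors the semantics of
// ES2021 `Promise.any`, written out by hand here for illustration.
function firstSuccessful<T>(promises: Promise<T>[]): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (promises.length === 0) {
      reject(new Error("firstSuccessful: no promises given"));
      return;
    }
    const errors: unknown[] = new Array(promises.length);
    let remaining = promises.length;
    promises.forEach((p, i) => {
      p.then(resolve, (err: unknown) => {
        errors[i] = err;
        remaining -= 1;
        if (remaining === 0) {
          // Promise.any would reject with an AggregateError here.
          reject(Object.assign(new Error("all attempts failed"), { errors }));
        }
      });
    });
  });
}
```

Where the project's TypeScript `lib` setting includes ES2021, calling `Promise.any([...])` directly is the simpler choice.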
Don't merge, not working as expected
@cubic-dev-ai re-run
Fixed!
@nickscamara I've started the AI code review. It'll take a few minutes to complete.
2 issues found across 1 file
Prompt for AI agents (all 2 issues)
Understand the root cause of the following 2 issues and fix them.
<file name="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts">
<violation number="1" location="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts:394">
Reducto race task isn’t cancelled if RunPod later succeeds, causing unnecessary background processing and cost; consider aborting Reducto when RunPod resolves.</violation>
<violation number="2" location="apps/api/src/scraper/scrapeURL/engines/pdf/index.ts:584">
Size check uses Base64 string length instead of decoded byte length, misclassifying PDFs near the 19MB limit.</violation>
</file>
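On the size check: base64 encodes every 3 bytes as 4 characters, so comparing the string length against the limit overstates the PDF's size by about a third. A 15 MiB PDF, for example, becomes a 20 MiB base64 string and would be wrongly treated as over a 19 MiB limit. A minimal sketch of the decoded-length calculation; `base64DecodedLength`, `fitsSizeLimit`, and `MAX_PDF_BYTES` are hypothetical names, and reading "19MB" as 19 MiB is an assumption:

```typescript
// Number of bytes a standard (RFC 4648) base64 string decodes to, computed
// from its length and trailing '=' padding without decoding it.
function base64DecodedLength(b64: string): number {
  const s = b64.replace(/\s+/g, ""); // tolerate line-wrapped input
  let padding = 0;
  if (s.endsWith("==")) padding = 2;
  else if (s.endsWith("=")) padding = 1;
  // Every 4 characters encode 3 bytes; floor() also covers unpadded input.
  // Math.max guards against malformed input such as a lone "=".
  return Math.max(0, Math.floor((s.length * 3) / 4) - padding);
}

// Assumption: "19MB" is read as 19 MiB; the PR may use a different constant.
const MAX_PDF_BYTES = 19 * 1024 * 1024;

function fitsSizeLimit(b64: string): boolean {
  return base64DecodedLength(b64) <= MAX_PDF_BYTES;
}
```

In Node, `Buffer.byteLength(b64, "base64")` computes the same decoded size and avoids a hand-written formula.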
@nickscamara I'd highly recommend giving a try to
Summary by cubic
Adds Reducto as a PDF parsing fallback and introduces a delayed racing strategy with RunPod MU to improve reliability and speed for PDFs under 19MB. If RunPod is slow (>120s) or fails, Reducto runs; pdf-parse remains the final fallback.
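The delayed racing strategy summarized above can be sketched as follows. This is a minimal illustration, not the PR's code: the `PdfParser` signature, `raceWithDelayedFallback`, and `parsePdf` are hypothetical names, the real RunPod MU and Reducto clients are replaced by injected functions, and the 120 s delay comes from the summary. It also folds in the review points: the timer is cleared once either side settles, an early RunPod failure does not reject the race, and the losing call is aborted through an `AbortController`.

```typescript
// Hypothetical parser signature: each remote parser receives an AbortSignal
// so the losing side of the race can be stopped. Not the PR's actual API.
type PdfParser = (signal: AbortSignal) => Promise<string>;

// RunPod starts at once; Reducto starts after `delayMs`, or immediately if
// RunPod fails first. The first success wins and the other call is aborted.
function raceWithDelayedFallback(
  runpod: PdfParser,
  reducto: PdfParser,
  delayMs: number,
): Promise<string> {
  const runpodCtl = new AbortController();
  const reductoCtl = new AbortController();

  return new Promise<string>((resolve, reject) => {
    let settled = false;
    let reductoStarted = false;
    let failures = 0;
    let timer: ReturnType<typeof setTimeout> | undefined;

    const win = (text: string, loser: AbortController) => {
      if (settled) return;
      settled = true;
      if (timer !== undefined) clearTimeout(timer); // Reducto must not start late
      loser.abort(); // stop the slower call's background work
      resolve(text);
    };

    const fail = () => {
      if (settled) return;
      failures += 1;
      // A RunPod failure always starts Reducto, so two failures means both failed.
      if (failures === 2) {
        settled = true;
        reject(new Error("RunPod and Reducto both failed"));
      }
    };

    const startReducto = () => {
      if (reductoStarted || settled) return;
      reductoStarted = true;
      if (timer !== undefined) clearTimeout(timer);
      reducto(reductoCtl.signal).then((text) => win(text, runpodCtl), fail);
    };

    timer = setTimeout(startReducto, delayMs);
    runpod(runpodCtl.signal).then(
      (text) => win(text, reductoCtl),
      () => {
        fail();
        startReducto();
      },
    );
  });
}

// pdf-parse stays the last resort when both remote parsers fail.
async function parsePdf(
  runpod: PdfParser,
  reducto: PdfParser,
  pdfParse: () => Promise<string>,
  delayMs = 120_000,
): Promise<string> {
  try {
    return await raceWithDelayedFallback(runpod, reducto, delayMs);
  } catch {
    return pdfParse();
  }
}
```

Counting failures works here only because a RunPod failure always triggers Reducto; with more than two parsers, tracking each parser's status explicitly would be clearer.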
New Features
Migration