We will provide a link to a shared folder containing a large number of files (60-70k) in various formats (PDF, PNG, MSG, etc.). Three example files are provided. The primary deliverable is a folder with all original files converted to PDFs with searchable text. About 10k of the files are already text-searchable PDFs; these do not need to be altered. The remaining files should be converted to PDF (if necessary) and then OCRed. High text-recognition accuracy is essential. File names should be identical to those originally provided, except for the file extension. If possible, we would also like a second deliverable: a spreadsheet with three columns: the original file name (with the original extension), the current file name (with the .pdf extension), and the full text extracted from the document. Thanks for considering us; we look forward to your application!
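To make the naming rule and the spreadsheet layout concrete, here is a minimal Python sketch of those two pieces only. It is an illustration under stated assumptions, not a required implementation: the function names and column headers are hypothetical, the extracted text is assumed to come from whatever OCR step is used (not shown), and a CSV file stands in for the spreadsheet. It also flags one case the naming rule leaves open: two originals that differ only by extension (for example scan.png and scan.msg) would map to the same PDF name.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePath


def target_pdf_name(original_name: str) -> str:
    """Return the file name with only its last extension replaced by .pdf.

    Files that are already PDFs keep their name unchanged, including the
    original letter case of the extension.
    """
    path = PurePath(original_name)
    if path.suffix.lower() == ".pdf":
        return original_name
    return str(path.with_suffix(".pdf"))


def find_name_collisions(original_names):
    """Group originals whose converted names would be identical.

    Names are compared case-insensitively, on the assumption that the
    deliverable may end up on a case-insensitive file system.
    """
    by_target = defaultdict(list)
    for name in original_names:
        by_target[target_pdf_name(name).casefold()].append(name)
    return {target: names for target, names in by_target.items() if len(names) > 1}


def write_manifest(rows, out):
    """Write the three-column manifest to an open text file object.

    `rows` is an iterable of (original_name, extracted_text) pairs; the
    header names below are hypothetical.
    """
    writer = csv.writer(out)
    writer.writerow(["original_file_name", "current_file_name", "extracted_text"])
    for original_name, text in rows:
        writer.writerow([original_name, target_pdf_name(original_name), text])


if __name__ == "__main__":
    # Made-up file names and text, for illustration only.
    names = ["scan.png", "scan.msg", "memo.pdf"]
    print(find_name_collisions(names))
    buffer = io.StringIO()
    write_manifest([("scan.png", "text from the OCR step")], buffer)
    print(buffer.getvalue())
```

When writing to a real file, opening it with newline="" and encoding="utf-8" keeps multi-line text fields intact. If the spreadsheet will be opened in Excel, note that a single cell holds at most 32,767 characters, so the full text of long documents may need a different container.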
Keyword: Data Analysis
Price: $500.00
Data Extraction, PDF Conversion, OCR Software, File Conversion, Python