We will provide a link to a shared folder containing a large number of files (60-70k) in various formats (PDF, PNG, MSG, etc.). Three example files are provided.

The primary deliverable is a folder in which every original file has been converted to a text-searchable PDF. About 10k of the files are already text-searchable PDFs and do not need to be altered. The remaining files should be converted to PDF where necessary and then OCRed. High text-recognition accuracy is essential. Each output file name should be identical to the original except for the file extension.

If possible, we would also like a second deliverable: a spreadsheet with three columns: the original file name (with its original extension), the current file name (with the .pdf extension), and the full text extracted from the document.

Thanks for considering us; we look forward to your application!
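For applicants, the workflow described above could be planned roughly as follows. This is only a sketch under stated assumptions, not a required approach: the function names, the action labels, and the `is_searchable_pdf` callback are hypothetical, and the conversion and OCR steps themselves are left out (a tool such as OCRmyPDF is one possible choice, which the posting does not specify). The sketch decides what each file needs, derives the output name, flags inputs that would end up with the same output name, and writes the three-column spreadsheet as CSV.

```python
import csv
from collections import defaultdict
from pathlib import Path


def plan_conversion(files, is_searchable_pdf):
    """Return (original_name, output_name, action) for each input file.

    `is_searchable_pdf` is a hypothetical callback that reports whether a
    PDF already has a text layer; how to detect that is left open here.
    """
    rows = []
    for f in files:
        path = Path(f)
        if path.suffix.lower() == ".pdf":
            # Existing PDFs keep their name; only non-searchable ones need OCR.
            output_name = path.name
            action = "keep" if is_searchable_pdf(path) else "ocr"
        else:
            # Same name, only the extension changes.
            output_name = path.with_suffix(".pdf").name
            action = "convert+ocr"
        rows.append((path.name, output_name, action))
    return rows


def find_name_collisions(rows):
    """Map each output name (lowercased) to the inputs that share it."""
    by_output = defaultdict(list)
    for original, output, _action in rows:
        by_output[output.lower()].append(original)
    return {out: names for out, names in by_output.items() if len(names) > 1}


def write_manifest(csv_path, rows, texts):
    """Write the three-column spreadsheet; `texts` maps output name to text."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["original_file_name", "current_file_name", "extracted_text"])
        for original, output, _action in rows:
            writer.writerow([original, output, texts.get(output, "")])
```

One point this surfaces: if the folder contains, say, both `scan.png` and `scan.PDF`, the "same name, new extension" rule maps them to the same output file on a case-insensitive file system, so it may be worth agreeing on a tie-breaking rule up front. Also note that Excel limits a cell to 32,767 characters, so very long documents may not fit in the extracted-text column if the CSV is opened there.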
Keyword: Python
Price: $500.0
Python PDF Conversion Data Extraction OCR Software File Conversion