Job Description

Overview:
We’re looking for a skilled back-end developer with experience in Google Document AI, OCR, and data classification to build a robust pipeline that automatically converts 12-month batches of PDF bank statements into structured Excel reports. A key part of this project involves an initial testing phase to compare Google Document AI's specialized bank statement processing against alternative OCR/AI methods, clearly determining which delivers the best speed and accuracy.

Project Scope:

1. OCR/AI Comparative Testing Phase:
   - Side-by-side testing of:
     - Google Document AI (bank statement processor)
     - Alternative custom OCR solutions (e.g., Amazon Textract, Tesseract, EasyOCR, Mistral-based approaches)
   - Evaluate accuracy (target: 95–99% extraction accuracy) and speed (under 3 minutes per 12-month batch).
   - Provide clear, data-backed recommendations on the optimal solution after testing.

2. Transaction Classification and Validation:
   - Classify deposits automatically as Income or Non-Income.
   - Identify and flag:
     - Missing or incomplete monthly statements
     - Large or suspicious deposits
     - Potential discrepancies or anomalies
   - Allow manual overrides on uncertain classifications.

3. Final Excel Output:
   Generate structured, polished Excel workbooks (template attached) with:
   - Separate tabs for personal and business deposits
   - Highlighted flags and summaries
   - User-editable fields to manually adjust calculations if needed

4. RESTful API & Documentation:
   - Build a clean RESTful API (Python/FastAPI or Node.js) to seamlessly handle PDF uploads and deliver completed Excel reports.
   - Provide clear, detailed documentation for easy integration.
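To make the validation rules in item 2 of the Project Scope concrete, here is a minimal Python sketch of the three checks: classifying a deposit, finding missing monthly statements, and flagging large deposits. It is an illustration under stated assumptions, not the required implementation. The keyword lists, the 10,000 threshold, the "Review" label for uncertain cases, and all function names are hypothetical and would need to be agreed with Falcon Mortgage.

```python
# Hypothetical keyword hints; a real rule set would come from the client.
INCOME_HINTS = ("payroll", "direct dep")
NON_INCOME_HINTS = ("transfer", "refund")


def classify_deposit(description):
    """Label a deposit as Income, Non-Income, or Review (needs manual override)."""
    text = description.lower()
    if any(hint in text for hint in INCOME_HINTS):
        return "Income"
    if any(hint in text for hint in NON_INCOME_HINTS):
        return "Non-Income"
    return "Review"  # uncertain: leave for a human to decide


def missing_months(covered, start_year, start_month, months=12):
    """Return the (year, month) pairs in the window with no statement.

    `covered` is an iterable of (year, month) pairs, one per statement found.
    """
    covered = set(covered)
    missing = []
    year, month = start_year, start_month
    for _ in range(months):
        if (year, month) not in covered:
            missing.append((year, month))
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return missing


def flag_large(amounts, threshold=10_000.0):
    """Return the deposit amounts at or above the assumed review threshold."""
    return [amount for amount in amounts if amount >= threshold]
```

Returning "Review" instead of forcing a binary label keeps uncertain deposits visible, so they can be surfaced in the Excel output as editable fields for manual override.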

Resources Provided:
- Extensive set of bank statements for thorough comparative testing
- Attached Excel template demonstrating the required output format

Skills & Requirements:
- Proven OCR expertise (Google Document AI, Amazon Textract, custom OCR approaches)
- Experience with structured financial data extraction and classification
- Excel file automation skills (OpenPyXL, Pandas, SheetJS)
- Ability to build and document robust RESTful APIs
- Cloud deployment knowledge (Google Cloud/AWS)

Timeline & Milestones:

Week 1:
- Conduct comparative testing between Google Document AI and custom OCR solutions using the provided statements
- Deliver clear evaluation results and a recommended approach

Week 2:
- Implement classification logic, transaction flagging, and Excel report generation based on the chosen OCR solution

Week 3:
- Complete full integration, manual override functionality, and API delivery
- Deliver final QA and the documented solution

How to Apply:
Please provide:
- Description of your direct experience with Google Document AI and/or custom OCR/AI solutions
- Past relevant OCR-to-Excel project examples
- Estimated timeline and budget to perform the outlined OCR comparative testing and subsequent development

Why Join Falcon Mortgage?
- Participate in cutting-edge automation in the mortgage industry
- Access large, detailed datasets and clear goals
- Opportunity for continued involvement and future innovation

We look forward to finding a developer who can carefully test, validate, and recommend the most accurate and efficient OCR solution before building our production-ready system.

Best regards,
Nick Sharp
Falcon Mortgage
Keyword: Data Cleaning
Price: $3.0
Data Entry, AI Builder, Google Sheets Automation, Microsoft Excel Automation, Python, OCR Software, Natural Language Processing, Full-Stack Development, Google APIs, Back-End Development