TIME SENSITIVE PROJECT
Description: I have approximately 200 reports in Excel and PDF format. These reports contain tables or structured/semi-structured data, but the formatting, field names, and file naming conventions vary significantly across files. I'm looking for a skilled data analyst or Python developer who can help me compare these reports and identify which ones are at least 60% similar in content. This will require fuzzy matching techniques and possibly data normalization.
Responsibilities:
Extract data from the PDF and Excel reports (some may require OCR or table parsing).
Clean and normalize the data across all files.
Compare the reports and determine which are ≥60% similar based on data content.
Deliver a summary of matched report pairs or groups with similarity scores....
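A minimal sketch of the normalize-then-compare step described above, assuming extraction (PDF table parsing, OCR, Excel reading) has already produced each report as a list of rows. All names here are hypothetical, and the scoring rule (each row's best fuzzy match in the other report via `difflib.SequenceMatcher`, averaged in both directions) is only one possible reading of "60% similar", not a method specified in the brief.

```python
import difflib
import itertools
import re

THRESHOLD = 0.60  # "at least 60% similar" cut-off from the brief


def normalize_cell(value) -> str:
    # Lower-case, collapse whitespace, and drop thousands separators in numbers.
    text = re.sub(r"\s+", " ", str(value).strip().lower())
    if re.fullmatch(r"-?\d[\d,]*(\.\d+)?", text):
        text = text.replace(",", "")
    return text


def normalize_report(rows):
    # One normalized string per non-empty row; empty cells are skipped.
    lines = []
    for row in rows:
        cells = [normalize_cell(c) for c in row if c is not None and str(c).strip()]
        if cells:
            lines.append(" | ".join(cells))
    return lines


def best_match_score(rows_a, rows_b) -> float:
    # Average, over rows of A, of the best fuzzy ratio against any row of B.
    if not rows_a or not rows_b:
        return 0.0
    total = sum(
        max(difflib.SequenceMatcher(None, ra, rb).ratio() for rb in rows_b)
        for ra in rows_a
    )
    return total / len(rows_a)


def similarity(rows_a, rows_b) -> float:
    # Symmetric score in [0, 1].
    return (best_match_score(rows_a, rows_b) + best_match_score(rows_b, rows_a)) / 2


def matching_pairs(reports, threshold=THRESHOLD):
    # reports: {report_name: list_of_rows}; returns pairs at or above threshold.
    normalized = {name: normalize_report(rows) for name, rows in reports.items()}
    results = []
    for (n1, r1), (n2, r2) in itertools.combinations(normalized.items(), 2):
        score = similarity(r1, r2)
        if score >= threshold:
            results.append((n1, n2, round(score, 3)))
    return sorted(results, key=lambda t: t[2], reverse=True)


reports = {
    "a": [["Item", "Qty"], ["Cement", "1,200"]],
    "b": [["item", "qty"], ["cement", "1200"]],
    "c": [["Totally", "different"], ["zzz", "999"]],
}
print(matching_pairs(reports))
```

With about 200 reports this is 200 × 199 / 2 = 19,900 pairwise comparisons, which is workable for a sketch; grouping matched pairs into clusters (for example with a union-find over the returned pairs) would be a separate step.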
Keyword: Data Processing
Delivery Time: 2 days left
Price: $481.0
Data Mining, Data Processing, Excel, Python, Software Architecture
I need software built for construction budgets by discipline: the operator enters the data and the costs update as they go, producing per-unit values with prices, supplier details, and a product image.
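The construction-budget request above (costs updating as the operator enters data, with per-unit prices, supplier details, and a product image) could start from a data model like the one below. This is a minimal in-memory sketch under stated assumptions: the class and field names are hypothetical, there is no UI or persistence, and totals are recomputed on demand so they always reflect the latest entries.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LineItem:
    discipline: str       # hypothetical grouping key, e.g. "electrical"
    description: str
    unit: str             # e.g. "m", "m2", "unit"
    quantity: float
    unit_price: float     # price per unit
    supplier: str = ""    # supplier details
    image_path: str = ""  # hypothetical: path or URL of the product image

    @property
    def total(self) -> float:
        # Cost of this line: quantity times price per unit.
        return self.quantity * self.unit_price


@dataclass
class Budget:
    items: List[LineItem] = field(default_factory=list)

    def add(self, item: LineItem) -> None:
        # No cached totals, so every query reflects the newest entries.
        self.items.append(item)

    def totals_by_discipline(self) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for item in self.items:
            totals[item.discipline] += item.total
        return dict(totals)

    def grand_total(self) -> float:
        return sum(item.total for item in self.items)


budget = Budget()
budget.add(LineItem("electrical", "Cable 2.5 mm2", "m", 10, 2.5, supplier="Supplier A"))
budget.add(LineItem("plumbing", "PVC pipe 50 mm", "unit", 4, 12.0))
print(budget.totals_by_discipline(), budget.grand_total())
```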