Help Needed: Design a Cost-Efficient Semantic Deduplication Pipeline Using OpenAI Embeddings


Hourly: $3.00 - $15.00

Description: We have millions of medically relevant sentences that need to be stored in a database and later sent automatically, one by one, to a language model for accuracy checking. Before they are sent to the LLM, every sentence in the database must go through semantic deduplication using OpenAI embeddings. We're not sure how best to implement this, especially in a way that keeps infrastructure and costs as low as possible.

We are looking for someone who can help us:
- Design or suggest a scalable, cost-friendly deduplication method.
- Recommend whether we should use in-memory comparison, local tools, or something external.
- Advise on how to structure storage so we don't overpay for performance.

We prefer simple and practical over complex and over-engineered.

IMPORTANT:
- This will be a fixed-budget project.
- In your job proposal, SHARE YOUR ESTIMATED BUDGET.
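To make the request concrete, here is a minimal sketch of the kind of two-stage deduplication being asked about. It is an illustration under stated assumptions, not a prescribed design. The function names, the 0.9 similarity threshold, and the `embed` callback are all hypothetical. `embed` stands in for whatever wrapper returns one embedding vector per sentence, for example a batched call to the OpenAI embeddings endpoint. The first stage removes exact duplicates by hashing, so no embedding cost is paid for them. The second stage greedily drops sentences whose cosine similarity to an already-kept sentence reaches the threshold.

```python
import hashlib
import math
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())


def drop_exact_duplicates(sentences):
    """Cheap first pass: remove exact duplicates before computing any embeddings."""
    seen, unique = set(), []
    for s in sentences:
        key = hashlib.sha1(normalize(s).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_dedupe(sentences, embed, threshold=0.9):
    """Greedy second pass: keep a sentence only if every already-kept
    sentence has cosine similarity below `threshold`.

    `embed` is a hypothetical caller-supplied function that maps a list of
    strings to a list of vectors; the threshold value is an assumption and
    would need tuning on real data.
    """
    vectors = embed(sentences)
    kept, kept_vecs = [], []
    for s, v in zip(sentences, vectors):
        if all(cosine(v, k) < threshold for k in kept_vecs):
            kept.append(s)
            kept_vecs.append(v)
    return kept
```

The greedy pass compares each sentence against every kept sentence, so its cost grows quadratically. At the scale of millions of sentences, the linear scan would likely need to be replaced by an approximate nearest-neighbor index or a database vector search, which is exactly the in-memory versus local versus external trade-off the posting asks about.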

Keyword: Data Processing

Price: $3.00

Data Science, MongoDB, Python, Machine Learning, OpenAI Embeddings, Vector Embedding, OpenAPI, Machine Learning Algorithm, Deep Learning

 
