Help Needed: Design a Cost-Efficient Semantic Deduplication Pipeline Using OpenAI Embeddings

Description: Millions of medically relevant sentences need to be saved in a database and later sent automatically to a language model for accuracy checking, one by one. Before they are sent to the LLM for processing, every sentence in the database must go through semantic deduplication using OpenAI embeddings. We're not sure how best to implement this, especially in a way that keeps infrastructure and costs as low as possible.

We are looking for someone who can help us:
- Design or suggest a scalable, cost-friendly deduplication method.
- Recommend whether we should use in-memory comparison, local tools, or something external.
- Advise on how to structure storage so we don’t overpay for performance.

We prefer simple and practical over complex and over-engineered.

IMPORTANT:
- This will be a fixed-budget project.
- In your job proposal, SHARE YOUR ESTIMATED BUDGET.
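To make the request concrete, here is a minimal sketch of the kind of deduplication step we have in mind, not a chosen design. It assumes an exact-match pass (hashing normalized text) runs first so that fewer sentences need embeddings, followed by a greedy cosine-similarity filter. The names `normalize_text`, `deduplicate`, the `embed` callable, and the 0.9 threshold are all hypothetical; in practice `embed` would wrap a call to the OpenAI embeddings API.

```python
import hashlib
import math
from typing import Callable, Iterable, List, Sequence

Vector = Sequence[float]


def normalize_text(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(sentence.lower().split())


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(
    sentences: Iterable[str],
    embed: Callable[[List[str]], List[Vector]],  # hypothetical embedding function
    threshold: float = 0.9,  # assumed cutoff; would need tuning on real data
) -> List[str]:
    """Return sentences with exact and near-duplicate entries removed."""
    # Pass 1: exact duplicates are dropped by hash, before any embedding cost.
    seen_hashes = set()
    candidates: List[str] = []
    for sentence in sentences:
        key = hashlib.sha256(normalize_text(sentence).encode("utf-8")).hexdigest()
        if key not in seen_hashes:
            seen_hashes.add(key)
            candidates.append(sentence)
    if not candidates:
        return []

    # Pass 2: embed the survivors once, then keep a sentence only if it is
    # not too similar to any sentence already kept.
    vectors = embed(candidates)
    kept: List[str] = []
    kept_vectors: List[Vector] = []
    for sentence, vector in zip(candidates, vectors):
        if all(cosine(vector, other) < threshold for other in kept_vectors):
            kept.append(sentence)
            kept_vectors.append(vector)
    return kept
```

The second pass compares each sentence against every kept sentence, which is fine for small batches but would not scale to millions of rows as written. Advice on what should replace that linear scan (for example an approximate nearest-neighbor index, local or hosted) is exactly the guidance we are asking for.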