Help Needed: Design a Cost-Efficient Semantic Deduplication Pipeline Using OpenAI Embeddings


$3.00
Hourly: $3.00 - $15.00

Description: Millions of medically relevant sentences need to be saved in a database and later sent automatically to a language model for accuracy checking, one by one. Before they are sent to LLMs for processing, each sentence on the Database must go through semantic deduplication using OpenAI embeddings. We're not sure how best to implement this — especially in a way that keeps infrastructure and costs as low as possible. We are looking for someone who can help us: - Design or suggest a scalable, cost-friendly deduplication method. - Recommend whether we should use in-memory comparison, local tools, or something external. - Advise on how to structure storage so we don’t overpay for performance. - We prefer simple and practical over complex and over-engineered. IMPORTANT: -This will be a fixed budget project. -on your job proposal, SHARE YOUR ESTIMATED BUDGET.

Keyword: Data Processing

Price: $3.0

Data Science MongoDB Python Machine Learning OpenAI Embeddings Vector Embedding OpenAPI Machine Learning Algorithm Deep Learning

 

Lead Data Entry Specialist

N/D

View Job
Assistente Virtuale per Vendite

N/D

View Job
AI-Capable Data Scientist

We're looking for a Data Scientist with NLP, LLM, and AI experience to help us develop AI fintech products. Key Responsibilities: - Design, develop, and deploy machine learning models for AI-driven fintech products. - Analyze large datasets to extract actionable insight...

View Job