Deploy Open-Source LLM (Mistral or LLaMA 3) with API Endpoint for Mobile App

We’re using GPT via API inside our mobile biofeedback app (Genius Insight) and want to switch to a cost-effective, self-hosted AI solution. We need a developer or DevOps expert to:

• Deploy an open-source model (Mistral 7B, OpenChat, or LLaMA 3)
• Host it on a cloud GPU instance (Runpod.io, Vast.ai, or similar)
• Serve it using Ollama, vLLM, or Text Generation Inference
• Provide a clean, simple OpenAI-compatible REST API endpoint we can call from our app
• Ensure it’s reasonably fast and stable for 50–500 daily users

You should have experience with:

• Docker and Linux server setup
• LLM model deployment
• Hosting models on GPUs (A100, RTX 4090, or T4)
• API setup and basic security

This is a one-time job, but future maintenance work may be available.

Deliverables:

• Deployed model + inference server
• API docs (how we send prompts and receive replies)
• Basic walkthrough so our team understands how to monitor it
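
To make the “OpenAI-style” requirement concrete: vLLM, Ollama, and Text Generation Inference can all expose an OpenAI-compatible `/v1/chat/completions` route, so the app’s existing GPT integration should need little more than a base-URL change. Below is a minimal sketch of the request/response shape we expect; the endpoint URL and model name are placeholders, and the response is mocked here since it would normally come from the deployed server.

```python
import json

# Hypothetical self-hosted endpoint (placeholder host and port).
API_URL = "http://your-gpu-host:8000/v1/chat/completions"

def build_chat_payload(prompt: str, model: str = "mistral-7b") -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.7,
    }

def extract_reply(response_body: str) -> str:
    """Pull the assistant's text out of an OpenAI-style response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

# Example round trip (response mocked; in production this JSON
# would be the HTTP response from the inference server).
payload = build_chat_payload("Summarize today's biofeedback session.")
sample_response = json.dumps({
    "choices": [
        {"message": {"role": "assistant", "content": "Session summary..."}}
    ]
})
print(extract_reply(sample_response))
```

If the deployed server keeps this schema, swapping it in for the current GPT calls is a configuration change rather than a rewrite.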