Do you want to apply for this freelance job vacancy?
Seeking an experienced AI/ML engineer for a one-time, on-site setup of a high-performance local LLM system. This project involves configuring and optimizing a large open-weight language model (LLaMA 4 – 70B) for use in a secure, offline private research environment.

Responsibilities will include:
• Installing and configuring LLaMA 4 (Maverick version) locally on a high-performance Ubuntu system with an RTX 6000 Ada GPU
• Setting up a token-streaming or prompt-response architecture using vLLM, Ollama, or a similar inference stack
• Building a lightweight FastAPI (or CLI) interface for model interaction
• Implementing logging of inputs/outputs to disk in JSON or plain text
• Assisting with setup of a local embedding model (e.g., MiniLM or BGE) for vector search/memory recall

Requirements:
• Prior experience running large models (13B–70B) locally
• Familiarity with GPU inference and memory optimization (without quantization)
• Strong Linux skills (Ubuntu CLI)
• Security-first mindset; must respect that the system is fully air-gapped
• Ability to communicate clearly and implement from spec

Nice to have (not required):
• Familiarity with LangChain, LangGraph, or agent orchestration frameworks
• Knowledge of inference schedulers, token streaming, or routing logic

Project Details:
• Estimated time: 1–1.5 working days total
• Compensation: Rate negotiable; please include your typical hourly or day rate when applying
• Location: Must be available to work on-site in South Bend, IN
• Security: NDA will be required
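The input/output logging requirement above could be sketched as a minimal JSON-lines logger. This is only an illustrative sketch: the file name, function name, and record fields are assumptions, not part of the posting's spec.

```python
import json
import time
from pathlib import Path

# Illustrative log location; the actual path would be agreed on-site.
LOG_PATH = Path("llm_interactions.jsonl")

def log_interaction(prompt: str, response: str, log_path: Path = LOG_PATH) -> dict:
    """Append one prompt/response pair to disk as a single JSON line."""
    record = {
        "timestamp": time.time(),  # Unix epoch seconds
        "prompt": prompt,
        "response": response,
    }
    # Append mode keeps a running, greppable history of all interactions.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record

record = log_interaction("What is vLLM?", "An open-source LLM inference engine.")
```

One JSON object per line (JSON Lines) is convenient here because an air-gapped system can still inspect the log with standard CLI tools, and each append is a single atomic-ish write.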
Keyword: Web Programmer
Price: $45.0