We are seeking an experienced Senior Site Reliability Engineer (SRE) to join our team and help build scalable, reliable, and secure infrastructure for our applications. The ideal candidate will have a deep understanding of cloud infrastructure, automation, observability, and incident management, ensuring high availability and optimal performance.

Responsibilities:
- Design, implement, and maintain scalable, resilient infrastructure on cloud platforms (AWS, Azure, or GCP).
- Develop and manage CI/CD pipelines to streamline deployments and improve system reliability.
- Automate infrastructure provisioning, monitoring, and incident response using Terraform, Ansible, or similar tools.
- Monitor system performance and troubleshoot issues to improve uptime and response times.
- Implement observability solutions, including logging, monitoring, and alerting, using tools like Prometheus, Grafana, Datadog, or ELK Stack.
- Establish best practices for incident response and ensure post-mortem analysis is conducted for critical incidents.
- Collaborate with development and operations teams to enhance system reliability and ensure security compliance.
- Optimize cloud costs while maintaining system performance and availability.

Requirements:
- 5+ years of experience in Site Reliability Engineering, DevOps, or a related field.
- Strong expertise in cloud platforms (AWS, Azure, or GCP) and container orchestration (Kubernetes, Docker).
- Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible.
- Hands-on experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI/CD, or ArgoCD).
- Deep understanding of Linux systems, networking, and security best practices.
- Experience with observability tools (Prometheus, Grafana, Datadog, New Relic, or ELK).
- Proficiency in scripting and automation using Python, Bash, or Go.
- Strong knowledge of database administration (SQL and NoSQL databases).
- Familiarity with incident response, root cause analysis, and post-mortem processes.
- Excellent problem-solving skills and ability to work in a fast-paced environment.

Preferred Qualifications:
- Experience with distributed systems, microservices architecture, and event-driven systems.
- Knowledge of security best practices, including IAM, encryption, and compliance standards.
- Understanding of FinOps for cloud cost optimization.
- Prior experience in a high-traffic production environment.
Keyword: Python Development
Contractor Tier: Hourly: $30.00 - $100.00
Price: $30.0
Amazon Web Services, Docker, DevOps, Amazon EC2, Ansible, Kubernetes, CI/CD, System Administration