We have developed a Python-based data pipeline for scraping and processing audio files. The pipeline downloads .wav files via multiple API calls, but due to API rate limits and long processing times, we need to scale it efficiently without managing servers.

Project Goal: Deploy the Python scraping pipeline on AWS Fargate to parallelize execution across multiple serverless containers, process data efficiently, and upload results to Amazon S3, all while eliminating direct EC2 instance management.

Key Responsibilities:
- AWS Fargate Setup & Scaling: Deploy containerized scraping tasks on Fargate with dynamic scaling.
- Containerization (Docker): Package the Python data pipeline into a lightweight Docker container for deployment.
- Task Orchestration (ECS or Batch): Configure AWS ECS (Elastic Container Service) or AWS Batch to distribute and manage scraping jobs efficiently.
- Storage & Data Management: Optimize .wav file uploads to Amazon S3 and manage task execution logs.
- Security & Networking: Ensure containers have the proper IAM roles, security groups, and VPC configuration for API access.
- Queue-Based Task Distribution (optional): Integrate AWS SQS or EventBridge to queue and trigger scraping tasks efficiently.
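The responsibilities above can be sketched as a single Fargate worker script: each container long-polls an SQS queue for download jobs, fetches the .wav file, and uploads it to S3 using the task's IAM role for credentials. This is a minimal illustration, not the actual pipeline; the queue URL, bucket name, message format (a plain URL in the body), and the key-naming helper are all assumptions to be adapted.

```python
# Hypothetical Fargate worker sketch: poll SQS for .wav download jobs and
# upload the results to S3. QUEUE_URL and BUCKET are placeholders expected
# to be set via the ECS task definition's environment variables.
import os
import urllib.parse
import urllib.request

QUEUE_URL = os.environ.get("QUEUE_URL", "")           # placeholder queue URL
BUCKET = os.environ.get("BUCKET", "my-audio-bucket")  # placeholder bucket name


def s3_key_for(source_url: str) -> str:
    """Derive a deterministic S3 key from the source URL's path."""
    path = urllib.parse.urlparse(source_url).path
    return "audio/" + path.lstrip("/").replace("/", "_")


def process_message(body: str) -> str:
    """Download the .wav file named in the message body and stream it to S3."""
    import boto3  # imported lazily; provided by the container image
    s3 = boto3.client("s3")  # credentials come from the Fargate task IAM role
    key = s3_key_for(body)
    with urllib.request.urlopen(body) as resp:
        s3.upload_fileobj(resp, BUCKET, key)
    return key


def run_worker() -> None:
    """Long-poll SQS; exit when the queue is drained so the task can scale in."""
    import boto3
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained; let the Fargate task exit
        for msg in messages:
            process_message(msg["Body"])
            # delete only after a successful upload, so failures are retried
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]
            )


if __name__ == "__main__":
    run_worker()
```

Scaling out then reduces to raising the ECS service's desired task count (or letting AWS Batch fan out jobs): each container works the same queue independently, and SQS's visibility timeout handles retries for tasks that die mid-download.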
Keyword: Python
Price: $60.00
Tags: Python, Docker, Amazon Web Services, Amazon EC2, Amazon S3, AWS Fargate, AWS Lambda, DevOps