We have developed a Python-based data pipeline for scraping and processing audio files. The pipeline downloads .wav files via multiple API calls, but due to API rate limits and long processing times, we need to scale it efficiently without managing servers.

Project Goal:
Deploy the Python scraping pipeline on AWS Fargate to parallelize execution across multiple serverless containers, process data efficiently, and upload results to Amazon S3, all while eliminating direct EC2 instance management.

Key Responsibilities:
- AWS Fargate Setup & Scaling: Deploy containerized scraping tasks with Fargate, allowing for dynamic scaling.
- Containerization (Docker): Package the Python data pipeline into a lightweight Docker container for deployment.
- Task Orchestration (ECS or Batch): Configure AWS ECS (Elastic Container Service) or AWS Batch to distribute and manage scraping jobs efficiently (see the first sketch after this list).
- Storage & Data Management: Optimize .wav file uploads to Amazon S3 and manage task execution logs (second sketch below).
- Security & Networking: Ensure containers have proper IAM roles, security groups, and VPC configuration for API access (third sketch below).
- Queue-Based Task Distribution (optional): Integrate AWS SQS or EventBridge to queue and trigger scraping tasks efficiently (fourth sketch below).
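To make the fan-out concrete, here is a minimal sketch of launching parallel Fargate tasks through boto3's ECS client. The cluster name, task definition, container name, region, subnet and security-group IDs are all placeholders, not values from this brief:

```python
import boto3

# All names/IDs below are placeholders, not values from this brief.
ecs = boto3.client("ecs", region_name="us-east-1")

def launch_scrape_task(shard_id: int) -> str:
    """Start one Fargate task that owns a shard of the scraping workload."""
    response = ecs.run_task(
        cluster="scraper-cluster",
        taskDefinition="audio-scraper",
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],
                "securityGroups": ["sg-0123456789abcdef0"],
                # A public IP lets the task reach external APIs without a NAT gateway.
                "assignPublicIp": "ENABLED",
            }
        },
        overrides={
            "containerOverrides": [{
                "name": "scraper",  # container name from the task definition
                # Tell the container which slice of the work it owns.
                "environment": [{"name": "SHARD_ID", "value": str(shard_id)}],
            }]
        },
    )
    return response["tasks"][0]["taskArn"]

# Fan out: one task per shard; the shard count is the tuning knob
# for staying under the upstream API rate limits.
task_arns = [launch_scrape_task(shard) for shard in range(8)]
```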
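For the S3 upload step, a sketch using boto3's managed transfer with multipart settings; the bucket name and key layout are assumptions, and the thresholds should be tuned once typical .wav sizes are known:

```python
import os
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multipart settings: split files over 16 MB and upload parts concurrently.
transfer_config = TransferConfig(
    multipart_threshold=16 * 1024 * 1024,
    multipart_chunksize=16 * 1024 * 1024,
    max_concurrency=4,
)

def upload_wav(local_path: str, shard_id: int) -> None:
    """Upload one processed .wav file under a per-shard prefix."""
    key = f"audio/shard-{shard_id}/{os.path.basename(local_path)}"
    s3.upload_file(
        local_path,
        "my-audio-pipeline-bucket",  # placeholder bucket name
        key,
        ExtraArgs={"ContentType": "audio/wav"},
        Config=transfer_config,
    )
```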
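On the security side, one possible shape is a dedicated task role that only ECS tasks may assume, restricted to writing into the output bucket. The role, policy, and bucket names here are hypothetical:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy: only ECS tasks may assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(
    RoleName="scraper-task-role",  # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Least privilege: the task may only write objects into the output bucket.
s3_write_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:PutObject"],
        "Resource": "arn:aws:s3:::my-audio-pipeline-bucket/*",  # placeholder bucket
    }],
}
iam.put_role_policy(
    RoleName="scraper-task-role",
    PolicyName="s3-write-only",
    PolicyDocument=json.dumps(s3_write_policy),
)
```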
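If the optional SQS route is taken, each container could long-poll a job queue along these lines. The queue URL is a placeholder and process_job is a stand-in for the real download/process/upload step:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/scrape-jobs"  # placeholder

def process_job(job: dict) -> None:
    """Stand-in for the real download/process/upload step."""
    print(f"processing {job}")

def worker_loop() -> None:
    while True:
        # Long poll (20 s) to cut down on empty receives.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,
            VisibilityTimeout=900,  # must exceed the longest expected job
        )
        for msg in resp.get("Messages", []):
            process_job(json.loads(msg["Body"]))
            # Delete only after success: if a task crashes, the message
            # reappears on the queue once the visibility timeout expires.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    worker_loop()
```

A queue like this also gives retries and back-pressure for free, which fits the rate-limit constraint better than launching a fixed batch of tasks.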