We have developed a Python-based data pipeline for scraping and processing audio files. The pipeline downloads .wav files via multiple API calls, but due to API rate limits and long processing times, we need to scale it efficiently without managing servers.

Project Goal: Deploy the Python scraping pipeline on AWS Fargate to parallelize execution across multiple serverless containers, process data efficiently, and upload results to Amazon S3, all while eliminating the need for direct EC2 instance management.

Key Responsibilities:
- AWS Fargate Setup & Scaling: Deploy containerized scraping tasks with Fargate, allowing for dynamic scaling.
- Containerization (Docker): Package the Python data pipeline into a lightweight Docker container for deployment.
- Task Orchestration (ECS or Batch): Configure AWS ECS (Elastic Container Service) or AWS Batch to distribute and manage scraping jobs efficiently.
- Storage & Data Management: Optimize .wav file uploads to Amazon S3 and manage task execution logs.
- Security & Networking: Ensure containers have proper IAM roles, security groups, and VPC configurations for API access.
- Queue-Based Task Distribution (Optional): Integrate AWS SQS or EventBridge to queue and trigger scraping tasks efficiently.
Keyword: Python
Price: $60.00
Skills: Python, Docker, Amazon Web Services, Amazon EC2, Amazon S3, AWS Fargate, AWS Lambda, DevOps