Experienced Data Engineer: Build API to BigQuery Pipeline (GCP, Python) - Segment 1


Project Overview:
We are looking for a skilled Data Engineer to develop the first phase (Segment 1) of a data pipeline. This involves extracting data from a third-party cloud application's v1 REST API (used in the healthcare industry) and loading it into Google BigQuery for future analytics and reporting. Crucially, this project involves handling sensitive Protected Health Information (PHI). Adherence to strict security protocols is paramount, and signing a HIPAA Business Associate Agreement (BAA) is a non-negotiable requirement before project commencement. We will provide detailed API documentation (OpenAPI YAML spec for ~230+ endpoints) and access to a sandbox environment for development. This contract is specifically for Segment 1. Successful completion may lead to engagement for Segment 2 (more advanced data work) under a separate agreement.

Responsibilities (Segment 1):
-API Integration & Authentication:
--Develop secure Python code for OAuth2 Client Credentials authentication (including token refresh).
--Extract data from all necessary v1 API endpoints as defined in the documentation.
--Implement robust handling for API parameters (filter, responseFields) and pagination (lastId mechanism) to ensure complete data retrieval.
--Manage API technical rate limits gracefully (delays, backoff); be mindful of contractual volume limits (Client accepts potential overage fees). An illustrative extraction sketch follows the requirements list below.
-Sandbox & Live Access:
--Conduct all initial development and testing in the sandbox.
--Support the process of gaining vendor approval for live API access based on successful sandbox work.
-BigQuery Loading & Data Segregation:
--Design appropriate BigQuery table schemas for the extracted API data.
--Output 1 (Primary Load): Set up a primary BigQuery dataset and load the extracted data into corresponding tables (loading sketch below).
--Output 2 (Analytics Subset): Create a second, separate BigQuery dataset containing read-only views based on a subset of tables from the primary dataset (specific tables TBD by Client).
--Output 3 (Anonymized Subset): Create a third, separate BigQuery dataset containing read-only views based on the analytics subset views. These views must be anonymized by removing specific PHI fields (e.g., names, DoB, contact info, addresses) while retaining necessary identifiers (e.g., patient ID, chart number) for analysis (view and access-control sketch below).
-Automation:
--Automate the extraction and primary BigQuery loading process to run reliably nightly using GCP tools (e.g., Cloud Functions, Cloud Scheduler).
-Access Control Design:
--Design and document a GCP IAM strategy ensuring read-only access can be granted exclusively to the anonymized dataset (Output 3), preventing access to the datasets containing raw PHI.
-Documentation & Code Quality:
--Deliver clean, well-commented, maintainable Python code.
--Provide clear documentation (setup, configuration, schemas, IAM design).

Required Skills & Experience:
-Proven experience integrating with complex REST APIs (OAuth2, pagination, rate limits).
-Strong Python skills for data extraction/processing.
-Solid experience with Google Cloud Platform (GCP):
--BigQuery: schema design, SQL (views), data loading.
--Cloud Functions & Cloud Scheduler (or similar GCP automation tools).
--IAM: understanding of roles/permissions for data security.
-Experience building ETL/ELT pipelines.
-Data warehousing and modeling concepts.
-Excellent communication and the ability to work independently.
-Essential: experience handling sensitive data (e.g., PHI) and an understanding of data privacy/security best practices.
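To illustrate the expected style of work for the API integration items above, here is a minimal extraction sketch: OAuth2 Client Credentials with token refresh, lastId pagination, and simple backoff on rate limits. The token URL, base URL, page-size parameter name, and the "id" field that drives lastId are assumptions for illustration only and must be confirmed against the OpenAPI spec and sandbox.

```python
import os
import time
import requests

TOKEN_URL = "https://api.example-vendor.com/oauth/token"   # placeholder; real URL comes from the API docs
BASE_URL = "https://sandbox.example-vendor.com/v1"          # placeholder sandbox base URL
CLIENT_ID = os.environ["API_CLIENT_ID"]                     # credentials held outside the code (e.g., Secret Manager)
CLIENT_SECRET = os.environ["API_CLIENT_SECRET"]

_token = {"value": None, "expires_at": 0.0}

def get_token() -> str:
    """Fetch (or refresh) an OAuth2 Client Credentials access token."""
    if _token["value"] is None or time.time() >= _token["expires_at"] - 60:
        resp = requests.post(TOKEN_URL, data={
            "grant_type": "client_credentials",
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
        }, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        _token["value"] = body["access_token"]
        _token["expires_at"] = time.time() + body.get("expires_in", 3600)
    return _token["value"]

def fetch_all(endpoint: str, filter_expr: str | None = None, page_size: int = 500) -> list[dict]:
    """Pull every record from one endpoint using lastId pagination and exponential backoff on 429s."""
    rows, last_id, delay = [], None, 1.0
    while True:
        params = {"pageSize": page_size}                    # parameter names are assumptions; confirm against the spec
        if filter_expr:
            params["filter"] = filter_expr
        if last_id is not None:
            params["lastId"] = last_id
        resp = requests.get(f"{BASE_URL}/{endpoint}",
                            headers={"Authorization": f"Bearer {get_token()}"},
                            params=params, timeout=60)
        if resp.status_code == 429:                         # technical rate limit hit: back off and retry
            time.sleep(delay)
            delay = min(delay * 2, 60)
            continue
        resp.raise_for_status()
        batch = resp.json()                                 # assumed to be a JSON list of records
        if not batch:
            return rows
        rows.extend(batch)
        last_id = batch[-1]["id"]                           # assumed id field driving the lastId mechanism
        delay = 1.0
        time.sleep(0.25)                                    # courtesy delay between pages
```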
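For Output 1, a minimal loading sketch using the google-cloud-bigquery client is shown below. The dataset/table naming and the full nightly refresh (WRITE_TRUNCATE) are assumptions; explicit per-endpoint schemas may replace autodetection once the schema design is agreed.

```python
from google.cloud import bigquery

def load_rows(rows: list[dict], table_id: str) -> None:
    """Load one endpoint's records into the primary (PHI) dataset.

    table_id like "my-project.phi_primary.patients" -- project/dataset/table names are placeholders.
    """
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        autodetect=True,  # or an explicit schema designed per endpoint
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # full nightly refresh (assumption)
    )
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    job.result()  # raises if the load job fails
```

For the nightly automation, one common pattern is wrapping the extraction and load in an HTTP-triggered Cloud Function (or Cloud Run job) invoked by a Cloud Scheduler cron such as `0 2 * * *`; the exact scheduling design is left to the proposal.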
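For Output 3 and the access-control design, the sketch below shows one way the anonymized views and the read-only grant could look. All dataset, table, and column names are hypothetical, and the final list of PHI columns to exclude is defined with the Client.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Anonymized view (Output 3): exposes only non-PHI columns from the analytics-subset view.
client.query("""
    CREATE OR REPLACE VIEW `my-project.analytics_anon.patients` AS
    SELECT
      patient_id,     -- retained identifier
      chart_number,   -- retained identifier
      visit_date,
      diagnosis_code
      -- name, date_of_birth, phone, email, address are deliberately omitted
    FROM `my-project.analytics_subset.patients`
""").result()

# Grant read-only access on the anonymized dataset only; the PHI datasets receive no such grant.
dataset = client.get_dataset("my-project.analytics_anon")
entries = list(dataset.access_entries)
entries.append(bigquery.AccessEntry(role="READER",
                                    entity_type="groupByEmail",
                                    entity_id="analysts@example.com"))
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```

Note that for readers of the anonymized dataset to query these views without any access to the underlying datasets, the views would also need to be registered as authorized views on the source datasets (or an equivalent IAM arrangement); documenting that choice is part of the access-control deliverable.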
Important Notes:
-HIPAA BAA Required: You must sign a HIPAA Business Associate Agreement. Please confirm your understanding and acceptance in your proposal.
-Phased Project: This posting is for Segment 1 only.

To Apply: Please submit your proposal detailing:
-Your relevant experience (API integration, Python, GCP, BigQuery, automation, sensitive data).
-Confirmation that you understand and agree to sign a HIPAA BAA.
-Your proposed approach for Segment 1.
-Your estimated timeline for Segment 1.
-Your rate or fixed-price bid for Segment 1.

We look forward to your application!

Keyword: Software Development

Data Warehousing & ETL Software, BigQuery, Data Analysis, Google Sheets, Looker Studio, SQL, REST API, RESTful API, ETL Pipeline, Python, Google Sheets Automation, Data Modeling, Automation

 
