Experienced Data Engineer: Build API to BigQuery Pipeline (GCP, Python) - Segment 1


Project Overview:

We are looking for a skilled Data Engineer to develop the first phase (Segment 1) of a data pipeline. This involves extracting data from a third-party cloud application's v1 REST API (used in the healthcare industry) and loading it into Google BigQuery for future analytics and reporting.

Crucially, this project involves handling sensitive Protected Health Information (PHI). Adherence to strict security protocols is paramount, and signing a HIPAA Business Associate Agreement (BAA) is a non-negotiable requirement before project commencement.

We will provide detailed API documentation (OpenAPI YAML spec for ~230+ endpoints) and access to a sandbox environment for development.

This contract is specifically for Segment 1. Successful completion may lead to engagement for Segment 2 (more advanced data work) under a separate agreement.

Responsibilities (Segment 1):

-API Integration & Authentication:
--Develop secure Python code for OAuth2 Client Credentials authentication (including token refresh).
--Extract data from all necessary v1 API endpoints as defined in the documentation.
--Implement robust handling for API parameters (filter, responseFields) and pagination (lastId mechanism) to ensure complete data retrieval.
--Manage API technical rate limits gracefully (delays, backoff); be mindful of contractual volume limits (Client accepts potential overage fees).
-Sandbox & Live Access:
--Conduct all initial development and testing in the sandbox.
--Support the process of gaining vendor approval for live API access based on successful sandbox work.
-BigQuery Loading & Data Segregation:
--Design appropriate BigQuery table schemas for the extracted API data.
--Output 1 (Primary Load): Set up a primary BigQuery dataset and load the extracted data into corresponding tables.
--Output 2 (Analytics Subset): Create a second, separate BigQuery dataset containing read-only views based on a subset of tables from the primary dataset (specific tables TBD by Client).
--Output 3 (Anonymized Subset): Create a third, separate BigQuery dataset containing read-only views based on the analytics subset views. These views must be anonymized by removing specific PHI fields (e.g., names, DoB, contact info, addresses) while retaining necessary identifiers (e.g., patient ID, chart number) for analysis.
-Automation:
--Automate the extraction and primary BigQuery loading process to run reliably nightly using GCP tools (e.g., Cloud Functions, Cloud Scheduler).
-Access Control Design:
--Design and document a GCP IAM strategy ensuring read-only access can be granted exclusively to the anonymized dataset (Output 3), preventing access to the datasets containing raw PHI.
-Documentation & Code Quality:
--Deliver clean, well-commented, maintainable Python code.
--Provide clear documentation (setup, configuration, schemas, IAM design).

Required Skills & Experience:

-Proven experience integrating with complex REST APIs (OAuth2, pagination, rate limits).
-Strong Python skills for data extraction/processing.
-Solid experience with Google Cloud Platform (GCP):
--BigQuery: Schema design, SQL (views), data loading.
--Cloud Functions & Cloud Scheduler (or similar GCP automation tools).
--IAM: Understanding roles/permissions for data security.
-Experience building ETL/ELT pipelines.
-Data warehousing and modeling concepts.
-Excellent communication and ability to work independently.

Essential: Experience handling sensitive data (e.g., PHI) and understanding data privacy/security best practices.

Important Notes:

-HIPAA BAA Required: You must sign a HIPAA Business Associate Agreement. Please confirm your understanding and acceptance in your proposal.
-Phased Project: This posting is for Segment 1 only.

To Apply:

Please submit your proposal detailing:
-Your relevant experience (API integration, Python, GCP, BigQuery, automation, sensitive data).
-Confirmation you understand and agree to sign a HIPAA BAA.
-Your proposed approach for Segment 1.
-Your estimated timeline for Segment 1.
-Your rate or fixed price bid for Segment 1.

We look forward to your application!
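
Illustrative sketch (not part of the requirements): for applicants outlining a proposed approach, the lastId pagination and rate-limit backoff described under Responsibilities might look roughly like the Python below. Everything here is an assumption made for illustration: the `fetch_all` and `RateLimited` names are hypothetical, the cursor is assumed to be the `id` of the last record on a page, and an empty page is assumed to mean the end of the data. The vendor's OpenAPI spec defines the real behavior.

```python
import time
from typing import Any, Callable, Dict, Iterator, List, Optional


class RateLimited(Exception):
    """Hypothetical signal that the API rejected a call due to rate limiting."""


def fetch_all(
    fetch_page: Callable[[Optional[Any]], List[Dict]],
    id_field: str = "id",          # assumed name of the cursor field
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield every record by following a lastId-style cursor.

    fetch_page(last_id) is supplied by the caller and should return the
    next page of records after last_id (None for the first page), or
    raise RateLimited when the API asks the client to slow down.
    """
    last_id = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(last_id)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the final retry
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
        if not page:
            return  # assumed end-of-data condition
        yield from page
        last_id = page[-1][id_field]
```

Passing the page fetcher and the `sleep` function in as arguments keeps the loop testable in the sandbox without real network calls or real delays.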

Keyword: Software Development

Data Warehousing & ETL Software, BigQuery, Data Analysis, Google Sheets, Looker Studio, SQL, REST API, RESTful API, ETL Pipeline, Python, Google Sheets Automation, Data Modeling, Automation

 
