Friday, May 2, 2025

Dataflow + Terraform: Secure Cloud SQL to BigQuery via PSA/PSC

 

Introduction

Two fully automated, production-grade pipelines demonstrate secure data ingestion from Cloud SQL PostgreSQL into BigQuery, using Apache Beam (Python) and Dataflow Flex Templates — all without public IP exposure.





✅ 1. Private Service Access (PSA) – Internal IP Connectivity

A Cloud SQL instance is provisioned with a private IP via PSA, and Dataflow connects over the internal VPC network.

📌 Key Highlights:

  • Cloud SQL + read replica
  • Private IP via PSA
  • VPC subnet + firewall on port 5432
  • Beam pipeline reads from PostgreSQL → BigQuery
  • Fully managed with Terraform
  • Optional GitHub Actions for CI/CD

🔌 JDBC Format: jdbc:postgresql://<private-ip>:5432/my-database2

🔗 Project Repo: 👉 github.com/.../cloud-sql-psa-with-bigquery


🔒 2. Private Service Connect (PSC) – DNS + SSL Certificate Access

PSC provides hostname-based, IP-less access to Cloud SQL, secured via SSL certificates stored in GCS.

📌 Key Highlights:

  • PSC forwarding rule + private DNS
  • SSL certs: server-ca.pem, client-cert.pem, client-key.pem
  • Enforced permissions (chmod 0600 on private key)
  • Beam pipeline loads certs at runtime from GCS
  • Flex Template accepts PSC host, cert path, and DB credentials
  • psycopg2 uses sslmode=verify-ca
  • Logs: success + errors written to BigQuery

🔗 Project Repo: 👉 github.com/.../cloud-sql-psc-with-bigquery


🛠️ Shared Architecture & Automation (Both Pipelines)

  • terraform/: VPC, Cloud SQL, BigQuery, IAM, DNS
  • sql/: PostgreSQL table schema + init script
  • postgresql_to_bq_flex_template/: Beam logic, Dockerfile, metadata
  • GCS: stores SSL certs + Dataflow staging artifacts
  • Flex Templates fully parameterized via metadata.json
  • Error handling + retry logic + scalable Dataflow workers


📊 Use Case Comparison


Both implementations align with cloud security and automation best practices:

✅ No public IPs

✅ IAM least-privilege access

✅ SSL/TLS enforcement (PSC)

✅ Infrastructure as Code with Terraform

✅ CI/CD extensibility


These blueprints are ideal for teams building secure, private, and scalable data pipelines on Google Cloud Platform using Dataflow, Cloud SQL, and BigQuery.
























No comments:

Post a Comment

Dataflow + Terraform: Secure Cloud SQL to BigQuery via PSA/PSC

  Introduction Two fully automated, production-grade pipelines demonstrate secure data ingestion from Cloud SQL PostgreSQL into BigQuery, us...