Introduction
✅ 1. Private Service Access (PSA) – Internal IP Connectivity
A Cloud SQL instance is provisioned with a private IP via PSA, and Dataflow connects over the internal VPC network.
📌 Key Highlights:
- Cloud SQL + read replica
- Private IP via PSA
- VPC subnet + firewall on port 5432
- Beam pipeline reads from PostgreSQL → BigQuery
- Fully managed with Terraform
- Optional GitHub Actions for CI/CD
🔌 JDBC Format: jdbc:postgresql://<private-ip>:5432/my-database2
🔗 Project Repo: 👉 github.com/.../cloud-sql-psa-with-bigquery
🔒 2. Private Service Connect (PSC) – DNS + SSL Certificate Access
PSC provides hostname-based, IP-less access to Cloud SQL, secured via SSL certificates stored in GCS.
📌 Key Highlights:
- PSC forwarding rule + private DNS
- SSL certs: server-ca.pem, client-cert.pem, client-key.pem
- Enforced permissions (chmod 0600 on private key)
- Beam pipeline loads certs at runtime from GCS
- Flex Template accepts PSC host, cert path, and DB credentials
- psycopg2 uses sslmode=verify-ca
- Logs: success + errors written to BigQuery
🔗 Project Repo: 👉 github.com/.../cloud-sql-psc-with-bigquery
🛠️ Shared Architecture & Automation (Both Pipelines)
- terraform/: VPC, Cloud SQL, BigQuery, IAM, DNS
- sql/: PostgreSQL table schema + init script
- postgresql_to_bq_flex_template/: Beam logic, Dockerfile, metadata
- GCS: stores SSL certs + Dataflow staging artifacts
- Flex Templates fully parameterized via metadata.json
- Error handling + retry logic + scalable Dataflow workers
📊 Use Case Comparison
Both implementations align with cloud security and automation best practices:
✅ No public IPs
✅ IAM least-privilege access
✅ SSL/TLS enforcement (PSC)
✅ Infrastructure as Code with Terraform
✅ CI/CD extensibility
These blueprints are ideal for teams building secure, private, and scalable data pipelines on Google Cloud Platform using Dataflow, Cloud SQL, and BigQuery.
No comments:
Post a Comment