Sunday, May 18, 2025

🚀 Streaming PostgreSQL Changes to BigQuery using Cloud Run Jobs + Cloud Scheduler 🔄

This lightweight Change Data Capture (CDC) pipeline streams PostgreSQL logical replication events to BigQuery — no Debezium, no Kafka, just native GCP tools.



🧩 Architecture Overview

[Cloud SQL (PostgreSQL)]
   └─ Logical replication via wal2json plugin
        ↓
[Cloud Scheduler] (runs every 12 minutes)
   └─ Triggers →
[Cloud Run Job (custom CDC listener)]
   └─ Pulls logical changes from slot
   └─ Publishes change events (JSON) →
[Cloud Pub/Sub Topic]
   ↓
[Dataflow (Apache Beam Flex Template)]
   ↓
[BigQuery] 
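To make the "pulls logical changes, publishes change events (JSON)" step concrete, here is a minimal sketch of turning one wal2json payload into per-row events. It assumes wal2json's format version 1, where each transaction arrives as a JSON object with a `change` array. The function names and the output event shape (`op`, `row`, `key`) are hypothetical and not taken from the repository.

```python
import json
from typing import Any


def wal2json_to_events(payload: str) -> list[dict[str, Any]]:
    """Split one wal2json (format version 1) transaction into per-row events."""
    txn = json.loads(payload)
    events = []
    for change in txn.get("change", []):
        event: dict[str, Any] = {
            "op": change["kind"],  # typically "insert", "update" or "delete"
            "schema": change.get("schema"),
            "table": change.get("table"),
        }
        # Inserts and updates carry the new row as parallel name/value lists.
        if "columnnames" in change:
            event["row"] = dict(zip(change["columnnames"], change["columnvalues"]))
        # Updates and deletes carry the replica-identity key of the old row.
        if "oldkeys" in change:
            keys = change["oldkeys"]
            event["key"] = dict(zip(keys["keynames"], keys["keyvalues"]))
        events.append(event)
    return events


def encode_for_pubsub(event: dict[str, Any]) -> bytes:
    """Pub/Sub message data is bytes, so serialize each event as UTF-8 JSON."""
    return json.dumps(event, sort_keys=True).encode("utf-8")
```

In a real listener, each encoded event would be handed to the Pub/Sub publisher client. Emitting one message per row (rather than one per transaction) keeps the downstream Beam step simple, at the cost of losing transaction boundaries.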


📌 GitHub: github.com/HenryXiloj/demos-gcp/tree/main/pg-cdc-to-bq-streaming

💡 Key Highlights

✅ CDC listener written in Python using psycopg2 with logical replication

✅ Deployed as a Cloud Run Job for short-lived, stateless execution

✅ Orchestrated by Cloud Scheduler (runs every 12 minutes)

✅ Publishes change events as JSON to Pub/Sub

✅ Streaming ingestion from Pub/Sub into BigQuery using Apache Beam (Dataflow Flex Template)

✅ Includes graceful shutdown with SIGTERM handling
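The short-lived execution and SIGTERM handling from the highlights above can be sketched roughly as follows. This is a stdlib-only illustration, not the repository's code: `read_message` and `handle` are hypothetical callables standing in for the psycopg2 replication cursor read and the Pub/Sub publish, and `max_runtime_s` is an assumed time budget for one job run.

```python
import signal
import threading
import time
from typing import Callable, Optional

stop_requested = threading.Event()


def _handle_sigterm(signum, frame):
    # Only set a flag here; the main loop finishes its current message and exits.
    stop_requested.set()


def install_sigterm_handler() -> None:
    """Call once at startup, from the main thread."""
    signal.signal(signal.SIGTERM, _handle_sigterm)


def run_listener(
    read_message: Callable[[], Optional[str]],
    handle: Callable[[str], None],
    max_runtime_s: float,
    idle_sleep_s: float = 0.5,
) -> int:
    """Consume messages until SIGTERM arrives or the time budget is spent."""
    deadline = time.monotonic() + max_runtime_s
    processed = 0
    while not stop_requested.is_set() and time.monotonic() < deadline:
        payload = read_message()  # hypothetical: returns None when nothing is pending
        if payload is None:
            # Event.wait doubles as a sleep that ends early on shutdown.
            stop_requested.wait(idle_sleep_s)
            continue
        handle(payload)
        processed += 1
    return processed
```

Checking the flag between messages, instead of exiting inside the signal handler, means a message is never abandoned halfway through publishing. A real listener would also acknowledge the processed position to PostgreSQL before returning so the next run resumes from there.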


🔄 Why Use Cloud Run Jobs?

  • Precise control over frequency & duration
  • No continuously running listener service
  • Fully stateless and pay-as-you-go
  • No external brokers or connectors required

⚖️ Compared to Traditional CDC (e.g., Debezium/Kafka)

🔹 No always-on infrastructure

🔹 No Zookeeper or Kafka to maintain

🔹 Native GCP integration for IAM, networking, and monitoring

🔹 Lower cost, easier to secure and deploy

⚡ Performance & Scalability

  • Handles thousands of changes per minute
  • Typical latency: 10–15 minutes
  • Scales with workload by adjusting Cloud Scheduler frequency
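As a rough illustration of why the schedule frequency drives latency, the sketch below bounds how long a committed change waits before a job run reads it. It rests on a simplifying assumption that is not from the original post: runs start every `interval_min` minutes and each run keeps reading the slot for `run_min` minutes before exiting. It ignores time spent in Pub/Sub, Dataflow, and BigQuery, so end-to-end latency will be higher.

```python
def capture_wait_bounds(interval_min: float, run_min: float) -> tuple[float, float]:
    """Return (best, worst) minutes a committed change waits before a run reads it.

    Assumes runs start every interval_min minutes and each run reads the
    slot for run_min minutes before exiting (both values are illustrative).
    """
    if not 0 < run_min <= interval_min:
        raise ValueError("run_min must be positive and no longer than interval_min")
    # A change committed while a run is active is read almost immediately;
    # one committed just after a run exits waits for the next trigger.
    return 0.0, interval_min - run_min
```

For example, with the 12-minute schedule and a hypothetical 2-minute run, the worst-case wait is 10 minutes; halving the interval to 6 minutes with the same run length brings it down to 4.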

This design offers full control, a stronger security posture, and serverless simplicity, making it a good fit for analytics pipelines that can tolerate the 10–15 minute latency.

👉 Full implementation:

🔗 github.com/HenryXiloj/demos-gcp/tree/main/pg-cdc-to-bq-streaming

















