Tuesday, August 5, 2025

🚀 Understanding O(1) vs O(n) – With Practical Code Examples

When writing efficient Java code, algorithmic complexity matters—a lot. In this post, I’ll walk you through two fundamental time complexities: O(1) and O(n), using clear Java 21 examples, and explain which one is more efficient for common search operations.



📌 What Do O(1) and O(n) Mean?

  • O(1) – Constant Time: The algorithm takes the same amount of time regardless of input size.
  • O(n) – Linear Time: The algorithm’s execution time increases linearly with input size.


🔍 Real Example: Name Search in a Collection

Let’s say you have a list of customer names, and you want to check if a name exists.


👎 O(n) - Linear Search (using List)

import java.util.List;

public class LinearSearchExample {
    public static boolean containsName(List<String> names, String target) {
        for (String name : names) {
            if (name.equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> customerNames = List.of("Alice", "Bob", "Henry", "Diana");
        System.out.println(containsName(customerNames, "Henry")); // true
    }
}


⏱️ Time Complexity: O(n)

🔄 The loop must check each name until it finds the target (or reaches the end).


✅ O(1) - Constant Time Lookup (using Set)

import java.util.Set;

public class ConstantTimeSearchExample {
    public static void main(String[] args) {
        Set<String> customerNames = Set.of("Alice", "Bob", "Henry", "Diana");

        boolean found = customerNames.contains("Henry"); // O(1) lookup
        System.out.println(found); // true
    }
}

⏱️ Time Complexity: O(1) on average

💡 Hash-based Set implementations (HashSet, or the immutable sets returned by Set.of) hash the element and jump straight to its bucket, which is what enables constant-time lookup on average.
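
A practical consequence: if your data arrives as a List but you need to do many membership checks, it can pay to build a Set from it first. The one-time conversion costs O(n), but every lookup afterwards is O(1) on average. A minimal sketch:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConvertForLookups {
    public static void main(String[] args) {
        List<String> customerNames = List.of("Alice", "Bob", "Henry", "Diana");

        // One-time O(n) conversion; worthwhile when many lookups follow.
        Set<String> lookupSet = new HashSet<>(customerNames);

        System.out.println(lookupSet.contains("Henry")); // true, O(1) on average
    }
}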

⚖️ Which Is More Efficient?

  • List.contains() – O(n): each lookup scans the elements one by one, so time grows with the size of the list.
  • Set.contains() – O(1) on average: a hash lookup goes straight to the right bucket, regardless of size.

✅ Verdict: Use Set.contains() if you care about lookup speed—it’s much faster for large collections.
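
To see the gap concretely, here is a rough timing sketch (plain System.nanoTime(), not a proper JMH benchmark, so treat the numbers as indicative only):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupBenchmark {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("user" + i);
        }
        Set<String> set = new HashSet<>(list); // one-time O(n) build

        String target = "user" + (n - 1); // worst case for the list scan

        long t0 = System.nanoTime();
        boolean inList = list.contains(target); // O(n) scan
        long t1 = System.nanoTime();
        boolean inSet = set.contains(target);   // O(1) hash lookup
        long t2 = System.nanoTime();

        System.out.printf("List: %b in %d µs%n", inList, (t1 - t0) / 1_000);
        System.out.printf("Set:  %b in %d µs%n", inSet, (t2 - t1) / 1_000);
    }
}

On a million elements the list scan typically lands in the milliseconds while the set lookup stays in the microseconds, and the gap keeps widening as n grows.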

✨ Final Thoughts

Understanding algorithm complexity helps you write scalable, high-performance Java applications.

Knowing when to switch from List to Set can make a massive difference in performance, especially when you're processing thousands (or millions) of items.






Sunday, May 18, 2025

🚀 Streaming PostgreSQL Changes to BigQuery using Cloud Run Jobs + Cloud Scheduler 🔄

This lightweight Change Data Capture (CDC) pipeline streams PostgreSQL logical replication events to BigQuery — no Debezium, no Kafka, just native GCP tools.



🧩 Architecture Overview

[Cloud SQL (PostgreSQL)]
   └─ Logical replication via wal2json plugin
        ↓
[Cloud Scheduler] (runs every 12 minutes)
   └─ Triggers →
[Cloud Run Job (custom CDC listener)]
   └─ Pulls logical changes from slot
   └─ Publishes change events (JSON) →
[Cloud Pub/Sub Topic]
   ↓
[Dataflow (Apache Beam Flex Template)]
   ↓
[BigQuery] 


📌 GitHub: github.com/HenryXiloj/demos-gcp/tree/main/pg-cdc-to-bq-streaming

💡 Key Highlights

✅ CDC listener written in Python using psycopg2 with logical replication (sketched below)

✅ Deployed as a Cloud Run Job for short-lived, stateless execution

✅ Orchestrated by Cloud Scheduler (runs every 12 minutes)

✅ Publishes change events as JSON to Pub/Sub

✅ Real-time ingestion into BigQuery using Apache Beam (Dataflow Flex Template)

✅ Includes graceful shutdown with SIGTERM handling
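
The full listener lives in the repo above; the following is a minimal sketch of the same pattern, not the exact code (the DSN, slot name, project, and topic are placeholders). It shows the three moving parts together: a logical replication cursor over the wal2json slot, a Pub/Sub publish per change, and a SIGTERM handler for graceful shutdown.

import signal
import time

import psycopg2
from psycopg2.extras import LogicalReplicationConnection
from google.cloud import pubsub_v1

DSN = "host=<private-ip> dbname=mydb user=cdc_user password=..."  # placeholder
REPLICATION_SLOT = "cdc_slot"  # assumes a slot created with the wal2json plugin
PROJECT_ID = "my-project"      # placeholder
TOPIC_ID = "cdc-changes"       # placeholder

shutting_down = False

def handle_sigterm(signum, frame):
    # Cloud Run sends SIGTERM before stopping the job; let the loop exit cleanly.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

conn = psycopg2.connect(DSN, connection_factory=LogicalReplicationConnection)
cur = conn.cursor()
cur.start_replication(slot_name=REPLICATION_SLOT, decode=True)

deadline = time.time() + 600  # stop well before the next scheduled run
while not shutting_down and time.time() < deadline:
    msg = cur.read_message()  # non-blocking read from the replication slot
    if msg is None:
        time.sleep(1)
        continue
    # wal2json emits JSON text; forward each change event to Pub/Sub as-is.
    publisher.publish(topic_path, data=msg.payload.encode("utf-8"))
    # Acknowledge the LSN so PostgreSQL can recycle WAL behind it.
    cur.send_feedback(flush_lsn=msg.data_start)

cur.close()
conn.close()

A production version would wait on the Pub/Sub publish futures before advancing the flush LSN, so a crash can't drop an unpublished change; the sketch keeps things linear for readability.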


🔄 Why Use Cloud Run Jobs?

  • Precise control over frequency & duration
  • No continuously running services
  • Fully stateless and pay-as-you-go
  • No external brokers or connectors required

⚖️ Compared to Traditional CDC (e.g., Debezium/Kafka)

🔹 No always-on infrastructure

🔹 No Zookeeper or Kafka to maintain

🔹 Native GCP integration for IAM, networking, and monitoring

🔹 Lower cost, easier to secure and deploy

⚡ Performance & Scalability

  • Handles thousands of changes per minute
  • Typical latency: 10–15 minutes
  • Scales with workload by adjusting Cloud Scheduler frequency

This design offers full control, better security posture, and serverless simplicity, making it ideal for event-driven analytics pipelines.

👉 Full implementation:

🔗 github.com/HenryXiloj/demos-gcp/tree/main/pg-cdc-to-bq-streaming


















Friday, May 2, 2025

Dataflow + Terraform: Secure Cloud SQL to BigQuery via PSA/PSC

 

Introduction

Two fully automated, production-grade pipelines demonstrate secure data ingestion from Cloud SQL PostgreSQL into BigQuery, using Apache Beam (Python) and Dataflow Flex Templates — all without public IP exposure.





✅ 1. Private Service Access (PSA) – Internal IP Connectivity

A Cloud SQL instance is provisioned with a private IP via PSA, and Dataflow connects over the internal VPC network.

📌 Key Highlights:

  • Cloud SQL + read replica
  • Private IP via PSA
  • VPC subnet + firewall on port 5432
  • Beam pipeline reads from PostgreSQL → BigQuery
  • Fully managed with Terraform
  • Optional GitHub Actions for CI/CD

🔌 JDBC Format: jdbc:postgresql://<private-ip>:5432/my-database2
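
One way to express that internal-IP read in Beam Python (a sketch under assumptions, not necessarily the repo's exact code; the table, dataset, and credentials below are placeholders) is the cross-language JDBC connector feeding WriteToBigQuery:

import apache_beam as beam
from apache_beam.io.jdbc import ReadFromJdbc
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    options = PipelineOptions()  # on Dataflow: project, region, network, etc.
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPostgres" >> ReadFromJdbc(
                table_name="customers",                      # placeholder table
                driver_class_name="org.postgresql.Driver",
                jdbc_url="jdbc:postgresql://<private-ip>:5432/my-database2",
                username="my-user",                          # placeholder
                password="my-password",                      # placeholder
            )
            # Rows arrive as named tuples; convert to dicts for BigQuery.
            | "ToDict" >> beam.Map(lambda row: row._asdict())
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                table="my-project:my_dataset.customers",     # placeholder
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )

if __name__ == "__main__":
    run()

ReadFromJdbc is a cross-language transform, so it runs a Java expansion service under the hood; on Dataflow that is handled for you, which is why the JDBC URL format above matters even in a Python pipeline.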

🔗 Project Repo: 👉 github.com/.../cloud-sql-psa-with-bigquery


🔒 2. Private Service Connect (PSC) – DNS + SSL Certificate Access

PSC provides hostname-based, IP-less access to Cloud SQL, secured via SSL certificates stored in GCS.

📌 Key Highlights:

  • PSC forwarding rule + private DNS
  • SSL certs: server-ca.pem, client-cert.pem, client-key.pem
  • Enforced permissions (chmod 0600 on private key)
  • Beam pipeline loads certs at runtime from GCS
  • Flex Template accepts PSC host, cert path, and DB credentials
  • psycopg2 uses sslmode=verify-ca (see the connection sketch after this list)
  • Logs: success + errors written to BigQuery
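
Concretely, the certificate handling described in the bullets above looks roughly like this (bucket name, PSC hostname, and credentials are placeholders):

import os

import psycopg2
from google.cloud import storage

BUCKET = "my-cert-bucket"                    # placeholder
PSC_HOST = "<instance-dns-name>.sql.goog"    # placeholder PSC hostname
CERT_DIR = "/tmp/certs"

os.makedirs(CERT_DIR, exist_ok=True)
client = storage.Client()
bucket = client.bucket(BUCKET)

# Pull the three SSL artifacts from GCS at runtime.
for name in ("server-ca.pem", "client-cert.pem", "client-key.pem"):
    bucket.blob(f"certs/{name}").download_to_filename(f"{CERT_DIR}/{name}")

# libpq refuses a world-readable private key, hence chmod 0600.
os.chmod(f"{CERT_DIR}/client-key.pem", 0o600)

conn = psycopg2.connect(
    host=PSC_HOST,
    port=5432,
    dbname="my-database2",        # placeholder
    user="my-user",               # placeholder
    password="my-password",       # placeholder
    sslmode="verify-ca",          # verify the server cert against the CA
    sslrootcert=f"{CERT_DIR}/server-ca.pem",
    sslcert=f"{CERT_DIR}/client-cert.pem",
    sslkey=f"{CERT_DIR}/client-key.pem",
)
print(conn.status)  # quick sanity check
conn.close()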

🔗 Project Repo: 👉 github.com/.../cloud-sql-psc-with-bigquery


🛠️ Shared Architecture & Automation (Both Pipelines)

  • terraform/: VPC, Cloud SQL, BigQuery, IAM, DNS
  • sql/: PostgreSQL table schema + init script
  • postgresql_to_bq_flex_template/: Beam logic, Dockerfile, metadata
  • GCS: stores SSL certs + Dataflow staging artifacts
  • Flex Templates fully parameterized via metadata.json
  • Error handling + retry logic + scalable Dataflow workers


📊 Use Case Comparison

  • PSA – connects over a private IP on the internal VPC network; access controlled by subnet and firewall rules on port 5432.
  • PSC – connects via a private DNS hostname with SSL certificates (sslmode=verify-ca); no internal IP is exposed.

Both implementations align with cloud security and automation best practices:

✅ No public IPs

✅ IAM least-privilege access

✅ SSL/TLS enforcement (PSC)

✅ Infrastructure as Code with Terraform

✅ CI/CD extensibility


These blueprints are ideal for teams building secure, private, and scalable data pipelines on Google Cloud Platform using Dataflow, Cloud SQL, and BigQuery.























