Why We Moved From CockroachDB to PostgreSQL

Introduction

ZITADEL Cloud recently made a notable change to its database: we moved from CockroachDB to PostgreSQL. Several factors drove this decision, and we explain them in this post. We informed our customers about the infrastructure change in advance, and the migration of our cloud service to PostgreSQL is now complete. We are currently revising our documentation and examples to reflect the change, that is, to prioritize PostgreSQL over CockroachDB.

ZITADEL Cloud runs exclusively on Google Cloud Platform (GCP), which played a significant role in our decision. Adopting PostgreSQL was made easier by GCP's managed database service, Cloud SQL, which integrates seamlessly with the rest of our GCP infrastructure. Its fully managed, highly available PostgreSQL environment let us draw on GCP's scalability, reliability, and security features while benefiting from PostgreSQL's strong performance and efficiency.

Our goal of minimizing the involvement of subprocessors and reducing reliance on third parties, the same goal that earlier led us to discontinue our use of Cloudflare, also played a significant role in this decision. Fewer dependencies on external partners not only simplify control but also decrease the risk of data leaks.

Let’s dive into some of the key factors that drove this shift.

Data Residency and Control Concerns

One of the primary drivers behind our switch was concern about how our data was managed and where it was stored. With CockroachDB Dedicated, we could choose the regions where data was stored, but controlling where log files went and preventing data from ending up in unintended places was a real challenge. This made it difficult to assure our customers that their data would reside exclusively in their designated regions, and it ultimately influenced our decision to move to PostgreSQL.

Shifting to PostgreSQL within GCP’s Cloud SQL environment gave us stronger tools for enforcing data residency within specific geographical boundaries. This was and still is vital for complying with regional data protection laws and building trust with our customers.

Technical Factors

Improved Latency and Efficiency

Our decision was also motivated by the need to reduce latency and streamline our infrastructure. CockroachDB's global distribution model, while advantageous for high availability, introduced higher latency, since writes must be replicated to a quorum of nodes before they are acknowledged, and it required more compute resources, which raised costs. PostgreSQL's simpler architecture, by contrast, let us handle larger workloads with lower latency and less compute. As a result, ZITADEL Cloud saw significant improvements in database query performance and overall system responsiveness, most noticeably in lower API latency.

Scalability and Infrastructure Optimization

Our decision to shift to PostgreSQL was also rooted in a thorough analysis of our infrastructure needs. Running a CockroachDB cluster, for example, required 3 nodes per region, each with 8 CPUs and 32 GB of RAM, operating at around 40-50% capacity. PostgreSQL achieved similar results with significantly fewer resources, 4 CPUs and 16 GB of RAM, making our infrastructure utilization more efficient.
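
To make the resource comparison concrete, here is a small Go sketch that totals the provisioned resources per region from the figures above. It assumes the PostgreSQL figures describe a single instance per region, since no node count is given; the `node` type and the `footprint` and `busyCPUs` functions are illustrative helpers, not part of any ZITADEL tooling.

```go
package main

import "fmt"

// node describes the resources of one database machine.
type node struct {
	CPUs  int
	RAMGB int
}

// footprint returns the total CPUs and RAM (GB) of n identical nodes.
func footprint(n int, spec node) (cpus, ramGB int) {
	return n * spec.CPUs, n * spec.RAMGB
}

// busyCPUs estimates how many CPUs' worth of work a cluster is doing
// at the given utilization (0.0 to 1.0).
func busyCPUs(cpus int, utilization float64) float64 {
	return float64(cpus) * utilization
}

func main() {
	// CockroachDB: 3 nodes per region, each with 8 CPUs and 32 GB RAM.
	crdbCPUs, crdbRAM := footprint(3, node{CPUs: 8, RAMGB: 32})
	// PostgreSQL: ASSUMED to be one 4 CPU / 16 GB instance per region.
	pgCPUs, pgRAM := footprint(1, node{CPUs: 4, RAMGB: 16})

	fmt.Printf("CockroachDB: %d CPUs, %d GB RAM\n", crdbCPUs, crdbRAM)
	fmt.Printf("PostgreSQL:  %d CPUs, %d GB RAM\n", pgCPUs, pgRAM)
	fmt.Printf("Provisioned ratio: %dx CPUs, %dx RAM\n", crdbCPUs/pgCPUs, crdbRAM/pgRAM)
	fmt.Printf("CockroachDB busy CPUs at 40-50%%: %.1f to %.1f\n",
		busyCPUs(crdbCPUs, 0.40), busyCPUs(crdbCPUs, 0.50))
}
```

Under that assumption, the CockroachDB cluster provisions 24 CPUs and 96 GB of RAM per region versus 4 CPUs and 16 GB for PostgreSQL, a sixfold difference, and at 40-50% utilization it was doing roughly 9.6 to 12 CPUs' worth of work.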

Scalability was another factor in our choice. GCP's wide range of machine configurations, combined with PostgreSQL's efficient scaling, gives us confidence that we can handle growing workloads. Whether we need to scale vertically or horizontally, PostgreSQL's architecture on GCP's infrastructure provides the flexibility we need.

Recent advances in cloud infrastructure have made PostgreSQL an even more attractive option. We can now scale vertically to machines with substantial resources, such as 128 CPUs and nearly a terabyte of memory, which addresses the scalability concerns we had in the past. PostgreSQL's operational flexibility, including the ability to adjust database costs for different operations and to use read replicas, also gives us more options for scaling and performance optimization.
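
As a rough illustration of how read replicas add horizontal read capacity, the Go sketch below routes writes to the primary and spreads reads across replicas in round-robin order. The `router` type, its methods, and the endpoint names are hypothetical; they are not part of ZITADEL, Cloud SQL, or any driver, and a real service would hold one connection pool per endpoint rather than plain strings.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// router is a HYPOTHETICAL helper: writes go to the primary, reads are
// spread over read replicas in round-robin order.
type router struct {
	next     uint64 // round-robin counter; first field keeps it 64-bit aligned for atomics
	primary  string
	replicas []string
}

// forWrite returns the primary, the only instance that accepts writes.
func (r *router) forWrite() string {
	return r.primary
}

// forRead returns the next replica in turn, or the primary if no
// replicas are configured.
func (r *router) forRead() string {
	if len(r.replicas) == 0 {
		return r.primary
	}
	i := atomic.AddUint64(&r.next, 1) - 1
	return r.replicas[i%uint64(len(r.replicas))]
}

func main() {
	r := &router{
		primary:  "pg-primary",
		replicas: []string{"pg-replica-1", "pg-replica-2"},
	}
	fmt.Println("write ->", r.forWrite())
	for n := 0; n < 3; n++ {
		fmt.Println("read  ->", r.forRead())
	}
}
```

Cloud SQL read replicas replicate asynchronously, so a read sent to a replica may briefly miss a write that just committed on the primary; reads that must see their own writes belong on the primary.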

Operational Flexibility and Ease of Use

Moving to PostgreSQL also made our operations more straightforward. Running a dedicated PostgreSQL cluster in each region gives us more flexibility in handling tasks such as isolated schema changes and rollouts, making data management smoother and less expensive. Optimizing for PostgreSQL was easier, too: because it is not distributed, we could use transaction IDs directly instead of dealing with the complexities of CockroachDB's logical timestamps. The result is a more deterministic and less resource-intensive approach.
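
The Go sketch below contrasts the shape of the two ordering keys. PostgreSQL 13 and later return the current 64-bit transaction ID from `pg_current_xact_id()` (type `xid8`), so comparing two of them is a single integer comparison. A hybrid logical clock (HLC) timestamp, the model behind CockroachDB's `cluster_logical_timestamp()`, pairs a wall-clock reading with a logical counter, so comparing two of them takes two steps. The types and field names here are simplified and hypothetical, not ZITADEL's code or CockroachDB's internal API.

```go
package main

import "fmt"

// xid8 models PostgreSQL's 64-bit transaction ID as a plain integer.
type xid8 uint64

// compareXID orders two transaction IDs with one comparison.
func compareXID(a, b xid8) int {
	switch {
	case a < b:
		return -1
	case a > b:
		return 1
	}
	return 0
}

// hlcTimestamp is a SIMPLIFIED hybrid logical clock timestamp: wall time
// in nanoseconds plus a logical counter that breaks ties between events
// sharing the same wall time. Field names are illustrative.
type hlcTimestamp struct {
	WallTime int64
	Logical  int32
}

// compareHLC orders by wall time first, then by the logical counter.
func compareHLC(a, b hlcTimestamp) int {
	switch {
	case a.WallTime < b.WallTime:
		return -1
	case a.WallTime > b.WallTime:
		return 1
	case a.Logical < b.Logical:
		return -1
	case a.Logical > b.Logical:
		return 1
	}
	return 0
}

func main() {
	fmt.Println(compareXID(741, 742))
	same := int64(1_700_000_000_000_000_000)
	fmt.Println(compareHLC(
		hlcTimestamp{WallTime: same, Logical: 2},
		hlcTimestamp{WallTime: same, Logical: 1},
	))
}
```

Running it prints `-1` and then `1`: the transaction IDs order directly, while the two HLC timestamps share a wall time and are ordered only by their logical counters.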

Cost Effectiveness

Financial factors also weighed into our decision. Operating a CockroachDB cluster cost considerably more than a similar configuration of PostgreSQL on Cloud SQL. For instance, a CockroachDB cluster with 3 nodes (8 CPUs and 32 GB RAM each) was almost three times as expensive as a PostgreSQL setup (4 CPUs, 16 GB RAM), even though both delivered comparable performance per region. This cost difference, together with a more straightforward scaling model, made PostgreSQL the more economical option.

Conclusion

The decision to embrace PostgreSQL wasn't straightforward. It was shaped by a blend of considerations: data residency assurance, lower latency, cost-efficiency, operational flexibility, simpler optimization, and scalability potential.

It's worth noting that CockroachDB remains extremely valuable, particularly where high availability is a non-negotiable requirement, even at the cost of some speed. Essentially, it comes down to the classic trade-off between latency and availability found throughout distributed systems. If your goal is the lowest possible latency and you can accept minor availability trade-offs, PostgreSQL is the better choice. If rock-solid availability takes precedence and you can tolerate slightly higher latency, a CockroachDB cluster is the recommended route. Naturally, cost also factors into this evaluation.
