
DB Sharding & Replication

This document outlines best practices for implementing database sharding and replication in a scalable Spring Boot architecture.


1. Overview

Sharding and replication allow the system to scale horizontally and improve availability. Sharding splits data across nodes so that each node stores only part of the dataset; replication keeps copies of the same data on several nodes for fault tolerance and read scalability.


2. Sharding

What is Sharding?

  • Horizontally partitioning data into smaller, more manageable parts that live on separate database nodes.
  • Each shard handles a subset of data (e.g., by tenant ID or user ID).

Strategies

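  • Hash-based sharding: Distributes data evenly, but range queries must fan out to every shard.
  • Range-based sharding: Efficient for queries on time or numeric ranges, but can create hotspots when a few ranges receive most of the traffic.
  • Directory-based sharding: A central lookup map routes each key to its shard; flexible, but the map is an extra hop and must itself be highly available.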
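The three strategies can be compared side by side in a few lines of plain Java. This is a minimal sketch, not a production router: the class name `ShardStrategies`, its method names, and the shard names used in the tests are all hypothetical, and it assumes string or numeric keys.

```java
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical helper illustrating the three strategies; all names are assumptions. */
final class ShardStrategies {

    private ShardStrategies() {}

    /** Hash-based: spreads keys evenly, but adjacent keys land on unrelated shards. */
    static int hashShard(String key, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes.
        return Math.floorMod(key.hashCode(), shardCount);
    }

    /** Range-based: each map key is the inclusive lower bound of one shard's range. */
    static String rangeShard(NavigableMap<Long, String> lowerBounds, long key) {
        Map.Entry<Long, String> entry = lowerBounds.floorEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("No shard covers key " + key);
        }
        return entry.getValue();
    }

    /** Directory-based: an explicit lookup table from key (here a tenant ID) to shard. */
    static String directoryShard(Map<String, String> directory, String tenantId) {
        String shard = directory.get(tenantId);
        if (shard == null) {
            throw new IllegalArgumentException("Unknown tenant " + tenantId);
        }
        return shard;
    }
}
```

Note that the hash variant ties placement to the shard count: changing the count moves most keys, which is one reason a directory or consistent hashing is often preferred when rebalancing is expected.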

Implementation

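  • Use sharding middleware (e.g., ShardingSphere, Vitess) to keep routing out of application code.
  • Alternatively, route in the application: define a separate DataSource bean per shard in Spring Boot.
  • Select the shard for each request from a routing key using AbstractRoutingDataSource.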
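Spring's AbstractRoutingDataSource picks its target by calling `determineCurrentLookupKey()`, so the application needs somewhere to keep the shard key for the current request. One common approach is a thread-local holder. The sketch below shows only that holder in plain Java, so it runs without Spring on the classpath; `ShardContext` and its methods are hypothetical names, and it assumes one request per thread.

```java
import java.util.function.Supplier;

/** Hypothetical holder for the shard key of the current request thread. */
final class ShardContext {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private ShardContext() {}

    /** The value a routing DataSource could return from determineCurrentLookupKey(). */
    static String currentShard() {
        return CURRENT.get();
    }

    /** Runs the work with the given shard selected, then restores the previous key. */
    static <T> T withShard(String shardKey, Supplier<T> work) {
        String previous = CURRENT.get();
        CURRENT.set(shardKey);
        try {
            return work.get();
        } finally {
            if (previous == null) {
                CURRENT.remove(); // avoid leaking keys across pooled threads
            } else {
                CURRENT.set(previous);
            }
        }
    }
}
```

A subclass of AbstractRoutingDataSource would then override `determineCurrentLookupKey()` to return `ShardContext.currentShard()`, with the per-shard DataSource beans registered through `setTargetDataSources`. Restoring the key in a `finally` block matters because request threads are usually pooled and reused.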

3. Replication

Primary-Replica

  • Writes go to the primary DB.
  • Reads are routed to one of the replicas.
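The routing rule above can be sketched as a small class that sends writes to the primary and rotates reads across replicas. This is an illustration in plain Java under stated assumptions: `ReadWriteRouter` is a hypothetical name, nodes are identified by simple strings, and round-robin is just one possible way to choose a replica.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router: writes target the primary, reads rotate across replicas. */
final class ReadWriteRouter {

    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
    }

    /** Returns the name of the node that should serve the statement. */
    String route(boolean readOnly) {
        if (!readOnly || replicas.isEmpty()) {
            return primary; // writes, or reads when no replica is configured
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

In a Spring application the `readOnly` flag could plausibly come from `TransactionSynchronizationManager.isCurrentTransactionReadOnly()`, so that methods annotated with `@Transactional(readOnly = true)` are served by replicas.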

Spring Boot Setup

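  • Use AbstractRoutingDataSource for read/write separation, e.g., keyed on whether the current transaction is read-only.
  • Combine with tools like PgBouncer (connection pooling for PostgreSQL) or ProxySQL (connection pooling and query routing for MySQL).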

Failover

  • Use health checks to reroute traffic if the primary fails.
  • Automate promotion with Patroni (PostgreSQL) or Orchestrator (MySQL).
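On the application side, health-check-based rerouting can be reduced to "pick the first node that passes a probe". The sketch below assumes a priority-ordered list of nodes that are allowed to accept writes and a caller-supplied probe; `FailoverSelector` and `firstHealthy` are hypothetical names. Promoting a replica to primary is left to the failover tool and is not modeled here.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical selector that reroutes to the first node passing a health probe. */
final class FailoverSelector {

    private FailoverSelector() {}

    /**
     * Returns the first candidate, in priority order, that the probe reports healthy.
     * The probe is an assumption: it might run a trivial query with a short timeout.
     */
    static String firstHealthy(List<String> candidates, Predicate<String> isHealthy) {
        for (String node : candidates) {
            if (isHealthy.test(node)) {
                return node;
            }
        }
        throw new IllegalStateException("No healthy database node available");
    }
}
```

Failing fast with an exception when no node is healthy is a deliberate choice in this sketch: silently writing to a node that has not been promoted risks data divergence.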

4. Challenges

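  • Consistency: Replicas can lag behind the primary, so treat replica reads as eventually consistent; send reads that must see a recent write to the primary.
  • Transaction boundaries: Avoid cross-shard transactions; they require distributed coordination such as two-phase commit.
  • Join limitations: Denormalize data if joins cross shards.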

5. Monitoring & Maintenance

  • Track replication lag via Prometheus exporters.
  • Monitor shard growth and rebalancing needs.
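Replication lag is useful beyond dashboards: the same measurement can decide which replicas are fresh enough to serve reads. The following is a minimal sketch under stated assumptions: `LagAwareReplicas` is a hypothetical name, and the lag values are assumed to be collected elsewhere (for example from the metrics an exporter publishes) and passed in as a map.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical filter that drops replicas whose lag exceeds a staleness budget. */
final class LagAwareReplicas {

    private LagAwareReplicas() {}

    /** Returns the replicas whose reported lag is at most maxLag, sorted by name. */
    static List<String> freshEnough(Map<String, Duration> lagByReplica, Duration maxLag) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Duration> entry : lagByReplica.entrySet()) {
            if (entry.getValue().compareTo(maxLag) <= 0) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result); // deterministic order regardless of map type
        return result;
    }
}
```

If the returned list is empty, a caller would typically fall back to the primary rather than serve stale data.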

  • Document Version: 1.0
  • Date: 2025-06-24
  • Author: ArturChernets