Before Sharding - Steps and Alternatives

A Guide to When You Should Shard Your Database

By Abhishake Sen Gupta

Understand the Problem

  • Identify whether the issue is slow reads or slow writes.
  • Analyze the slowest queries and understand why they are underperforming: Analyze slow queries with this video.
  • Optimize your schema and indexes before considering sharding.

Horizontal Partitioning

  • Definition: Split the database into different ranges based on the partition key (often the primary key).
    • Example: Users A-M in one range, N-Z in another.
  • Benefits:
    • Smaller B-trees for indexes.
    • Improves query efficiency by reducing data scanned.

Vertical Partitioning

  • Definition: Separate infrequently accessed columns into a different table.
    • Example: Move metadata or logs into a separate table.
  • Benefits:
    • Faster reads for frequent queries (as they query fewer columns).
    • Smaller B-trees reduce memory usage.
  • Tradeoff: Slower access for queries involving the partitioned columns.

Master-Slave Replication (Read Optimization)

  • Setup:
    • Master handles all writes.
    • Slaves handle all reads.
    • Master syncs updates to slaves.
  • Benefits:
    • Reduces read load on the master.
    • Improves scalability for read-heavy workloads.

Master-Master-Slave Replication (Write Optimization)

  • Setup:
    • Multiple masters handle writes across regions (e.g., EAST and WEST).
    • Slaves handle reads.
    • Masters sync with each other and push updates to slaves.
  • Benefits:
    • Distributes write load across regions.
    • Enables regional failover and redundancy.
  • Tradeoff:
    • Synchronization delays between masters.

Shard Only as a Last Resort

  • When to Shard:
    • When you hit database capacity limits that partitioning and replication cannot solve.
  • Challenges with Sharding:
    • Adds complex client logic to manage shard queries.
    • Reduces ACID guarantees:
      • No transactions or isolation between shards.
    • Coupling between database and application logic.
  • Tools: Use Vitess to help manage sharding logic.

Additional Resources

Share: Twitter Facebook LinkedIn