• distributes data across different databases such that each database can only manage a subset of the data
    • use a hashing function to determine which shard each row belongs to
    • can also use initials, geographical location, etc
  • similar to the results of federation, sharding results in less read and write traffic, less replication, and more cache hits
  • index size is also reduced, which generally improves performance with faster queries
disadvantages
  • application logic needs to be updated to work with shards = complex sql queries
  • data distribution can become lopsided - need a good hashing function / rebalancing