Data / ML, Engineering

Herb: Multi-DC Replication Engine for Uber’s Schemaless Datastore

July 25, 2018 / Global
Figure 1: The replication system we designed needed to address five key concerns.
Figure 2: A mesh replication topology with ‘n’ data centers, i.e., Herb connects each Schemaless instance to all other instances in the other data centers.
Figure 3: In this example, there are five tasks, three of which are assigned to the Herb worker on Host1 while the two remaining are handled by the Herb worker on Host2.
Figure 4: Herb’s median replication performance is 550 to 900 cells received per second, with a latency of around 82 to 90 milliseconds. That latency includes network communication time between data centers, which comes to around 40 milliseconds.
Figure 5: Our log files as they are stored on disk. Each record is stored in a data file, and the corresponding offset is stored in an index file.
Figure 6: Each of Herb’s replication cohorts contains a few Schemaless instances, grouped according to their needs. For instance, a high-traffic instance may share resources with one low-traffic instance, so both can be in one replication cohort. We also have dedicated replication cohorts for testing, staging, and other purposes.
Figure 7: Herb prevents data centers from receiving out-of-order updates.
Figure 8: Herb’s continuous checking framework reads across Schemaless nodes and validates order. In this figure, the data blocks represent cluster1 in both data centers. Updates 101, 301, 401, and 501 originated in data center A, and both data centers have them in the same order. Similarly, updates 201 and 601 have the same order across data centers.
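The ordering check described in Figure 8 can be illustrated with a minimal sketch. This is not Herb’s actual checker: the function names, the `(origin, update_id)` tuple layout, and the interleaving of the sample logs are all assumptions made for illustration, and the sketch assumes both logs are fully replicated (an update still in flight would be reported as a mismatch).

```python
from collections import defaultdict


def order_by_origin(log):
    """Group update IDs by originating data center, preserving log order."""
    grouped = defaultdict(list)
    for origin, update_id in log:
        grouped[origin].append(update_id)
    return dict(grouped)


def find_order_mismatches(log_a, log_b):
    """Return the origins whose updates are ordered differently in the two logs."""
    a = order_by_origin(log_a)
    b = order_by_origin(log_b)
    return sorted(o for o in a.keys() | b.keys() if a.get(o, []) != b.get(o, []))


# Hypothetical contents of cluster1 in two data centers. Updates from
# different origins may interleave differently, but the updates from any
# single origin must appear in the same relative order everywhere.
dc_a_log = [("A", 101), ("B", 201), ("A", 301), ("A", 401), ("A", 501), ("B", 601)]
dc_b_log = [("A", 101), ("A", 301), ("B", 201), ("A", 401), ("B", 601), ("A", 501)]

mismatches = find_order_mismatches(dc_a_log, dc_b_log)
```

Here `mismatches` is empty because each origin’s updates keep the same relative order in both logs, even though the two logs interleave origins differently.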
Himank Chaudhary

Himank is the Tech Lead of Docstore at Uber. His primary focus area is building distributed databases that scale along with Uber's hyper-growth. Prior to Uber, he worked at Yahoo in the mail backend team to build a metadata store. Himank holds a master's degree in Computer Science from the State University of New York with a specialization in distributed systems.

Posted by Himank Chaudhary