In-Memory Data Management
Prof. Hasso Plattner

This video belongs to the openHPI course In-Memory Data Management.


Time effort: approx. 11 minutes

About this video

With the adoption of in-memory technologies for enterprise databases, new applications have been developed. These applications attract a growing number of users, who submit increasingly complex queries for interactive applications. The resulting workload growth requires scalability. Instead of scale-up, i.e., exploiting an increasingly large shared-memory server, scale-out, i.e., exploiting an increasing number of interconnected servers, is recognized as the cheaper, more flexible, more resilient, and more scalable approach. Database replication, i.e., the duplication of data to additional machines, is a means to implement scalability.

The analysis of enterprise workloads shows that the majority of queries are read-only. Replicas can execute these read queries on snapshots of the data without violating data consistency. A single machine, called master, is responsible for transaction handling. Data changes are propagated continuously to the replica machines to keep them synchronized. Replication approaches can be classified by when replicas are synchronized, which machines are allowed to receive write transactions, how replicas are updated with transactional changes, and which data is duplicated.

  • Eager and lazy replication specify when replicas are synchronized. Eager replication propagates updates to all replicas as part of the transaction. In contrast, lazy replication postpones the synchronization of replica nodes to optimize transaction latencies and communication. 
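The difference in synchronization timing can be sketched in a few lines. The following is a minimal illustration with hypothetical class and method names, not an actual replication protocol: the eager master propagates each write to every replica before the write call returns, while the lazy master commits immediately and defers propagation to a later synchronization step.

```python
class Replica:
    """Minimal replica holding a key-value snapshot (illustrative only)."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class EagerMaster:
    """Eager replication: updates reach all replicas as part of the transaction."""
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        # Synchronous propagation: the write returns only after
        # every replica has applied the change.
        for r in self.replicas:
            r.apply(key, value)


class LazyMaster:
    """Lazy replication: the commit returns immediately; replicas are
    synchronized later from a queue of pending changes."""
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = []

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # commit returns without waiting

    def sync(self):
        # Deferred propagation, e.g. performed periodically in the background.
        for key, value in self.pending:
            for r in self.replicas:
                r.apply(key, value)
        self.pending.clear()
```

After an eager write, replicas are immediately up to date; after a lazy write, they lag behind until `sync()` runs, which is the latency-versus-freshness trade-off the text describes.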

  • Master and group replication differ in which nodes are allowed to execute write operations. Master replication allows transactions that alter data only on a dedicated node, called master or primary, which is responsible for propagating changes to the replicas. In contrast, group replication is an update-everywhere strategy and allows executing write queries on every node. 
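The routing consequence of this distinction can be shown with a small sketch. The function below is a simplified, hypothetical query router, not part of any real system: under master replication all writes go to the single master, while under group replication any node may accept them; read-only queries can be served by replicas in either case.

```python
def route_query(query, master, replicas, strategy="master"):
    """Pick a target node for a query (illustrative sketch).

    strategy="master": writes only on the dedicated master node.
    strategy="group":  update everywhere; any node accepts writes,
                       at the cost of needing conflict resolution.
    """
    # Crude write detection for illustration; real routers parse the query.
    is_write = query.strip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    if not is_write:
        return replicas[0]   # reads scale out across replica nodes
    if strategy == "master":
        return master        # single writer, no write-write conflicts
    return replicas[0]       # group replication: writes allowed anywhere
```

Routing all writes through one node keeps conflict handling trivial, which is why master replication pairs naturally with the read-mostly enterprise workloads described above.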

  • The updates of replica instances can be performed either logically or physically. This choice is similar to the choice of how to persist database operations by logging. Logical replication describes updates on a higher level, such as SQL statements. Physical replication provides information on a lower level, with regard to the data structures used, for example specifying the old and new value of an attribute that is changed by a query. 
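The same change can be expressed at both levels, which the following sketch makes concrete. The record types and the example table are hypothetical: the logical record carries the SQL statement for the replica to re-execute, while the physical record carries the affected row, attribute, and old/new values, which a replica can apply directly without parsing SQL.

```python
from dataclasses import dataclass


@dataclass
class LogicalRecord:
    """Logical replication: the operation itself, e.g. an SQL statement,
    which the replica re-executes against its own data structures."""
    statement: str


@dataclass
class PhysicalRecord:
    """Physical replication: the effect on the stored data, here the old
    and new value of one attribute of one row."""
    table: str
    row_id: int
    attribute: str
    old_value: object
    new_value: object


# The same update expressed at both abstraction levels:
logical = LogicalRecord("UPDATE accounts SET balance = 90 WHERE id = 7")
physical = PhysicalRecord("accounts", 7, "balance", 100, 90)


def apply_physical(store, rec):
    """Replaying a physical record is a direct write; no SQL parsing needed."""
    store[(rec.table, rec.row_id)][rec.attribute] = rec.new_value
    return store
```

Keeping the old value in the physical record also allows the change to be undone, mirroring the role of before-images in physical logging.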

  • Homogeneous and heterogeneous replication differ in which data is replicated. Homogeneous replicas are exact mirrors of the master. Heterogeneous replicas store subsets of the data or optimize their data structures for specific portions of the workload. They can, for example, store data in different layouts as well as deploy specific indices or materialized views.
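The distinction can be illustrated with a small sketch. The table and column names below are hypothetical: the homogeneous replica mirrors the master's rows exactly, while the heterogeneous replica keeps only two workload-relevant attributes in a columnar layout and deploys an extra index to speed up analytical aggregation.

```python
# Hypothetical master table in a row-oriented layout.
master_rows = [
    {"id": 1, "region": "EU", "amount": 10},
    {"id": 2, "region": "US", "amount": 20},
    {"id": 3, "region": "EU", "amount": 30},
]

# Homogeneous replica: an exact mirror of the master's data.
homogeneous_replica = [dict(row) for row in master_rows]

# Heterogeneous replica: a columnar layout of a subset of attributes,
# optimized for analytical scans over "region" and "amount".
heterogeneous_replica = {
    "region": [r["region"] for r in master_rows],
    "amount": [r["amount"] for r in master_rows],
}

# Replica-specific index: region -> row positions, absent on the master.
region_index = {}
for pos, region in enumerate(heterogeneous_replica["region"]):
    region_index.setdefault(region, []).append(pos)


def sum_amount(region):
    """Aggregate over the columnar replica using its index."""
    return sum(heterogeneous_replica["amount"][p] for p in region_index[region])
```

Such a replica can answer its target queries faster than an exact mirror, at the price of serving only the fraction of the workload it was specialized for.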