Query Scheduling

This video belongs to the openHPI course In-Memory Data Management.

Time effort: approx. 9 minutes

About this video

The last part of the query processing topic deals with query scheduling, that is, determining the execution order of queries and operators.

To make this lecture video easier to follow, we explain some of the vocabulary used in this part:

Workers execute the tasks. Depending on the database’s architecture, a worker is an operating system process or thread. A process is a program in execution. It has an address space (for the data it operates on), kernel resources (for example, to access files), and at least one thread that executes code.

Creating processes and threads incurs overhead that does not contribute to query processing. Instead of spawning a new worker for every task, databases usually keep a fixed-size pool of workers and assign tasks to them.

Workers run on CPU cores. On NUMA systems, workers should primarily execute near the data they operate on. Therefore, we can create a separate worker pool per socket and bind its workers to that socket. To feed the socket-bound workers, the database has one or more local task queues per socket. Tasks are placed in the task queues so that workers primarily access socket-local data.

In real-world applications, workloads are often highly skewed, so some task queues run empty while others are still full. If their own task queue is empty, workers can steal tasks from other sockets’ task queues. How much work stealing is allowed depends on the distance between sockets, the CPU load, the saturation of the interconnect, and other factors.
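The interplay of socket-local task queues and work stealing can be illustrated with a small simulation. The following Python sketch is not taken from the lecture or from any particular database; the names SocketPool and run, the round-based execution model, and the policy of stealing from the longest queue are all assumptions made for illustration. It also ignores the factors that limit stealing in practice, such as socket distance and interconnect saturation.

```python
from collections import deque


class SocketPool:
    """Hypothetical model of one socket: a local task queue and a fixed number of workers."""

    def __init__(self, socket_id, num_workers):
        if num_workers < 1:
            raise ValueError("each socket needs at least one worker")
        self.socket_id = socket_id
        self.num_workers = num_workers
        self.queue = deque()


def run(pools, tasks, allow_stealing=True):
    """Simulate scheduling in rounds; each worker executes at most one task per round.

    tasks is a list of (name, home_socket) pairs, where home_socket is the index
    in pools of the socket that holds the task's data (assumed equal to socket_id).
    Returns (log, rounds); log holds (name, home_socket, executed_on_socket).
    """
    # Enqueue every task at the socket that holds its data.
    for name, home in tasks:
        pools[home].queue.append((name, home))

    log, rounds = [], 0
    while any(pool.queue for pool in pools):
        rounds += 1
        for pool in pools:
            for _ in range(pool.num_workers):
                if pool.queue:
                    # Preferred case: the worker takes a socket-local task.
                    task = pool.queue.popleft()
                elif allow_stealing:
                    # Local queue is empty: steal from the longest queue (assumed policy).
                    victim = max(pools, key=lambda p: len(p.queue))
                    if not victim.queue:
                        break  # nothing left to steal anywhere
                    task = victim.queue.pop()  # take from the tail of the victim's queue
                else:
                    break  # this socket's workers stay idle in this round
                name, home = task
                log.append((name, home, pool.socket_id))
    return log, rounds


# Skewed workload: the data for all six tasks lives on socket 0.
pools = [SocketPool(0, 2), SocketPool(1, 2)]
skewed = [(f"t{i}", 0) for i in range(6)]
log, rounds = run(pools, skewed, allow_stealing=True)
stolen = [name for name, home, ran_on in log if home != ran_on]
print(rounds, stolen)
```

In this toy setup, the skewed workload finishes in two rounds with stealing enabled, because the workers of socket 1 take over two of the six tasks. With allow_stealing=False it needs three rounds while socket 1 stays idle. The price of stealing is that the stolen tasks access remote data, which is why a real scheduler would weigh it against the factors listed above.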

If further questions remain, we are happy to address them in the forum!