Subordination
A framework for distributed programming
Every Subordination application is composed of computational kernels: self-contained objects that store data and provide routines to process it. In each routine a kernel may create any number of subordinates to decompose the application into smaller parts. Some kernels can be transferred to other cluster nodes to make the application distributed. An application exits when there are no kernels left to process.
Kernels are processed by pipelines: kernel queues with processing threads attached. Each device has its own pipeline (there is one for the CPU, one for each I/O device and one for the NIC), which allows the devices to work in parallel: the CPU pipeline processes one part of the data while the disk pipeline simultaneously writes another. Every pipeline works until the application exits.
Each programme begins by starting all necessary pipelines and sending the main kernel to one of them. After that, programme execution resembles that of a sequential programme, with each nested procedure call replaced by constructing a subordinate kernel and sending it to the appropriate pipeline. The difference is that pipelines process kernels asynchronously, so procedure code is decomposed into an act() routine, which constructs subordinates, and a react() routine, which processes the results they return.
The first step is to decide which pipelines your programme needs. Most probably these are the standard ones.
Standard pipelines for all devices except the NIC are initialised in the subordination/api.hh header. To initialise the NIC pipeline you need to tell it which pipeline is local and which one is remote. The following code snippet shows the usual way of doing this.
The second step is to subclass kernel and implement the act() and react() member functions for each sequential stage of your programme and for the parallel parts of each stage.
To make a programme (similar to the grep command in UNIX) parallel, it is sufficient to construct a kernel for each file and send all of them to the CPU pipeline. A better way is to construct a separate kernel that reads portions of the files via the I/O pipeline and, for each portion, constructs and sends a new kernel to the CPU pipeline to process it in parallel.

Finally, you need to start every pipeline and send the main kernel to the local one via the send function.
Use commit to return the kernel to its parent and reclaim system resources.
In general, there are two types of failures that occur in any hierarchical distributed system: failure of a subordinate node and failure of a principal node. In Subordination, "node" refers both to a cluster node and to a kernel, and failures of the two are handled differently.
Since any subordinate kernel is part of a hierarchy, the simplest method of handling its failure is to let its principal restart it on a healthy cluster node. Subordination does this automatically for any kernel that has a parent. This approach works well unless your hierarchy is deep and requires restarting a lot of kernels upon a failure; moreover, it does not work for the main kernel, the first kernel of an application, which has no parent.
In case of the main kernel's failure, the only option is to keep a copy of it on some other cluster node and restore from that copy when the original node fails. Subordination implements this for any kernel with the carries_parent flag set, but the approach works only for those principal kernels that have only one subordinate at a time (extending the algorithm to cover more cases is one of the goals of ongoing research).
At present, a kernel is considered failed when the node to which it was sent fails, and a node is considered failed when the corresponding connection closes unexpectedly. There is no mechanism that deals with unreliable connections other than the timeouts configured in the underlying operating system.
Cluster node failures are much simpler to mitigate: there is no state to be lost, and the only invariant that should be preserved in a cluster is connectivity of nodes. All nodes should "know" each other and be able to establish arbitrary connections between each other; in other words, nodes should be able to discover each other. Subordination implements this functionality without a distributed consensus algorithm: the framework builds a tree hierarchy of nodes, using IP addresses and a pre-set fan-out value to rank them. Using this algorithm, a node computes the IP address of its would-be principal and tries to connect to it; if the connection fails, it tries another node from the same or a higher level of the tree hierarchy. If it reaches the root of the tree and no node responds, it becomes the root node itself. This algorithm is used both during the node bootstrap phase and upon a failure of any principal.
At a high level, the Subordination framework is composed of multiple layers:
Load balancing is implemented by superimposing the hierarchy of kernels on the hierarchy of nodes: when a node's pipelines are overflown with kernels, some of them may be "spilled" to subordinate nodes (a planned feature), much like water flows from the top of a cocktail pyramid down to its bottom when the volume of the glasses in the current layer is too small to hold it.
The Subordination framework uses a bottom-up source code development approach, which means we create low-level abstractions to simplify high-level code and make it clean and readable. There are three layers of abstractions:
From the developer's perspective it is unclear which layer is the bottom one, but we would still like to separate them via policy-based programming and traits classes.