Two-phase initialization is an architectural pattern for artificially breaking and managing coupling between strongly coupled components. The motivation and implementation of this pattern are not always obvious, so I will give a couple of examples to demonstrate.
Let’s take an operating system as an example. Some of the components involved in the initialization of the operating system are the I/O manager, the memory manager, the object manager and many others. At runtime, the strong coupling between the various components is obvious and beneficial: they use each other all the time.
However, during system startup, these dependencies (especially if startup is performed synchronously) can lead to a dead end. For example:
- The memory manager initializes. It needs to create shared memory objects (section objects) to represent binary images being loaded during system startup. This requires a trip to the object manager.
- The object manager initializes. It needs to allocate memory for the system handle table and for the actual resources being created. This requires a trip to the memory manager.
Another example can be taken from an ESB infrastructure I have been implementing lately. The infrastructure services include a configuration service, a publish/subscribe service and a “DNS”-style service. These services are typically used by other system components, but they also need each other:
- The configuration service initializes. It needs to register itself in the “DNS” service to be accessible by other system components.
- The “DNS” service initializes. It needs to obtain its configuration and use the pub/sub service to register for configuration change notifications.
- The pub/sub service initializes. It needs to obtain its configuration and register itself in the “DNS” service to be accessible by other system components.
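To make the circularity concrete, here is a minimal sketch that feeds the dependencies listed above into Python's standard `graphlib` module. The service names and the `find_startup_order` helper are illustrative assumptions, not part of the actual infrastructure.

```python
from graphlib import CycleError, TopologicalSorter

# Each service maps to the set of services it needs while starting up,
# following the list above. The names are illustrative.
STARTUP_DEPENDENCIES = {
    "configuration": {"dns"},
    "dns": {"configuration", "pubsub"},
    "pubsub": {"configuration", "dns"},
}


def find_startup_order(dependencies):
    """Return a valid startup order, or None if the dependencies are circular."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# None: no service can be started strictly before the ones it needs.
startup_order = find_startup_order(STARTUP_DEPENDENCIES)
```

With these dependencies no single-pass ordering exists, which is exactly the dead end described above.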
Disentangling these dependencies can be done in various ways. For example, we could say that the infrastructure services are not allowed to use each other – the pub/sub service will use local configuration, the “DNS” service will have a predefined list of registered endpoints, etc.
However, in an operating system we can’t resort to a solution in which the object manager manages its own memory, and the memory manager manages its own objects.
The only feasible alternative is two-phase initialization.
When using two-phase initialization, infrastructure components initialize in two phases. In the first phase, they do not rely on any other components to reach a stable state in which they are able to provide basic services to the rest of the system. In the second phase, they transition to a fully-functional state in which they rely on other components (which have not necessarily reached the second phase yet).
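As a rough illustration of the two phases, the sketch below applies them to the memory manager and object manager from the earlier example. The classes, method names, and the trivial allocator are all hypothetical and are not modeled on any real kernel.

```python
from enum import IntEnum


class Phase(IntEnum):
    UNINITIALIZED = 0
    BASIC = 1  # first phase done: basic services available
    FULL = 2   # second phase done: fully functional


class MemoryManager:
    def __init__(self):
        self.phase = Phase.UNINITIALIZED
        self.allocated = 0
        self.image_section = None

    def init_phase_one(self):
        # Touches only local state, so it cannot wait on another component.
        self.allocated = 0
        self.phase = Phase.BASIC

    def init_phase_two(self, object_manager):
        # May call the object manager, which is at least in its first phase.
        self.image_section = object_manager.create_object("section:boot-image")
        self.phase = Phase.FULL

    def allocate(self, size):
        if self.phase < Phase.BASIC:
            raise RuntimeError("memory manager is not initialized")
        offset = self.allocated
        self.allocated += size
        return offset


class ObjectManager:
    def __init__(self):
        self.phase = Phase.UNINITIALIZED
        self.objects = {}
        self.handle_table_offset = None

    def init_phase_one(self):
        self.objects = {}
        self.phase = Phase.BASIC

    def init_phase_two(self, memory_manager):
        # May call the memory manager, which is at least in its first phase.
        self.handle_table_offset = memory_manager.allocate(1024)
        self.phase = Phase.FULL

    def create_object(self, name):
        if self.phase < Phase.BASIC:
            raise RuntimeError("object manager is not initialized")
        self.objects[name] = {"name": name}
        return self.objects[name]


memory = MemoryManager()
objects = ObjectManager()

# First phase: every component reaches a stable state on its own.
memory.init_phase_one()
objects.init_phase_one()

# Second phase: components may now use each other.
memory.init_phase_two(objects)
objects.init_phase_two(memory)
```

The important property is that nothing in `init_phase_one` calls another component, so the first phase can run in any order.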
Using this model in our example, the “DNS” service can start with a predefined list of endpoints that will be used to communicate with the infrastructure services while they are in the first phase. In the second phase, these predefined endpoints will be replaced by the actual endpoints for the actual services. The pub/sub service can start with a local configuration during the first phase, and retrieve its configuration when the configuration service becomes available (enters the first phase), and so on.
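The hand-over from predefined to actual values might look roughly like the following sketch. The class shapes, the endpoint strings, and the `max_subscribers` setting are assumptions made up for illustration only.

```python
class DnsService:
    """Name registry that starts from a predefined endpoint list."""

    def __init__(self, predefined_endpoints):
        # First phase: only the predefined list is available.
        self._endpoints = dict(predefined_endpoints)
        self._predefined_names = set(predefined_endpoints)

    def register(self, name, endpoint):
        # Second phase: an actual registration replaces the predefined entry.
        self._endpoints[name] = endpoint
        self._predefined_names.discard(name)

    def resolve(self, name):
        return self._endpoints[name]

    def uses_predefined(self, name):
        return name in self._predefined_names


class ConfigurationService:
    def __init__(self, settings):
        self._settings = settings

    def get(self, service_name):
        return self._settings[service_name]


class PubSubService:
    def __init__(self, local_config):
        # First phase: run on local configuration only.
        self.config = dict(local_config)
        self.config_source = "local"

    def enter_second_phase(self, configuration_service, dns, endpoint):
        # Second phase: fetch the real configuration, then register in "DNS".
        self.config = dict(configuration_service.get("pubsub"))
        self.config_source = "configuration-service"
        dns.register("pubsub", endpoint)


# Hypothetical endpoints and settings.
dns = DnsService({"pubsub": "tcp://bootstrap-host:7001"})
pubsub = PubSubService({"max_subscribers": 10})
configuration = ConfigurationService({"pubsub": {"max_subscribers": 500}})

pubsub.enter_second_phase(configuration, dns, "tcp://pubsub-host:7001")
```

Until `enter_second_phase` runs, callers that resolve `"pubsub"` still get the predefined endpoint, so the rest of the system keeps working during the transition.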
Providing a generic implementation of two-phase initialization for all infrastructure and non-infrastructure services is exceptionally difficult, but achievable if the proper metadata is in place. Components must provide metadata describing their explicit dependencies and how they can make forward progress while those dependencies are not yet available.
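One possible shape for such metadata is sketched below. `ComponentSpec`, `bootstrap`, and `make_spec` are hypothetical names, and a real implementation would need far more (error handling, asynchrony, restarts); this only shows the declared dependencies driving a generic two-pass startup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ComponentSpec:
    name: str
    depends_on: List[str]  # explicit dependencies
    start_basic: Callable[[], dict]  # first phase: needs nobody else
    start_full: Callable[[dict, Dict[str, dict]], None]  # second phase


def bootstrap(specs):
    """Run the first phase for every component, then the second phase."""
    known = {spec.name for spec in specs}
    for spec in specs:
        missing = set(spec.depends_on) - known
        if missing:
            raise ValueError(f"{spec.name} depends on unknown {sorted(missing)}")

    # First phase: order does not matter because nothing is shared yet.
    instances = {spec.name: spec.start_basic() for spec in specs}

    # Second phase: every dependency is at least in its basic state.
    for spec in specs:
        dependencies = {name: instances[name] for name in spec.depends_on}
        spec.start_full(instances[spec.name], dependencies)
    return instances


def make_spec(name, depends_on):
    """Build a toy component whose state is just a dictionary."""

    def start_basic():
        return {"name": name, "state": "basic", "uses": []}

    def start_full(instance, dependencies):
        instance["uses"] = sorted(dependencies)
        instance["state"] = "full"

    return ComponentSpec(name, depends_on, start_basic, start_full)


services = bootstrap([
    make_spec("configuration", ["dns"]),
    make_spec("dns", ["configuration", "pubsub"]),
    make_spec("pubsub", ["configuration", "dns"]),
])
```

Here the circular dependencies between the three services no longer prevent startup, because the second pass only requires that each dependency has finished its first phase.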
The idea sounds simple, but in practice it is not. Several issues plague the two-phase initialization pattern, although none of them undermines its fundamental validity:
- Transitioning between initialization phases might require a significant amount of work. For example, the pub/sub service might use a database to store the subscription information, and when it transitions to the second phase (by talking to the configuration service) the database connection string it retrieves might differ from the one it started with, forcing it to reconnect and migrate any subscriptions already stored.
- Deadlocks can be introduced into the startup sequence if initialization is not carefully asynchronous.
- Terrible race conditions can be introduced into the startup sequence if it is not carefully synchronized for multiple threads of execution.
- Lots of noise is generated in the system while it’s restarting or when some components are being reinitialized.
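For the deadlock and race concerns above, one option is to have each component signal the end of its first phase and let the others wait only on that signal, never on full readiness. The sketch below uses `threading.Event` for this; the `Component` class and `start_all` helper are hypothetical, and the timeout value is arbitrary.

```python
import threading


class Component:
    def __init__(self, name):
        self.name = name
        self.dependencies = []
        self.basic_ready = threading.Event()  # first phase finished
        self.full_ready = threading.Event()   # second phase finished

    def start(self, timeout):
        # First phase: local work only, then announce basic availability.
        self.basic_ready.set()

        # Second phase: wait for the FIRST phase of each dependency.
        # Waiting on full_ready instead would deadlock on a circular dependency.
        for dependency in self.dependencies:
            if not dependency.basic_ready.wait(timeout):
                return  # give up; full_ready stays unset
        self.full_ready.set()


def start_all(components, timeout=5.0):
    """Start every component on its own thread and report overall success."""
    threads = [
        threading.Thread(target=component.start, args=(timeout,))
        for component in components
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return all(component.full_ready.is_set() for component in components)


memory = Component("memory manager")
objects = Component("object manager")
memory.dependencies.append(objects)
objects.dependencies.append(memory)

everything_started = start_all([memory, objects])
```

Because setting `basic_ready` never depends on another component, every wait in the second phase is guaranteed to be satisfied eventually, regardless of thread scheduling.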
The two-phase initialization approach is used by Windows. In the first phase (called phase 0), initialization proceeds in a single thread and brings up only the minimal services required for the second phase. In the second phase (called phase 1), system components can rely on other components being present to start transitioning into their fully-functional state.
To summarize, two-phase initialization is difficult to manage and implement, but in the real world where components circularly depend on each other there is rarely a better alternative.