This spec describes the implementation of multi-machine arcologies.
Basic Concepts
In a single-machine arcology, there is a primary process, known as Central Module, which launches several child modules (each implemented as a separate process). Central Module manages the lifetime of the child modules and is responsible for keeping some data synchronized among all modules (using Mnemosynth).
In addition, every module (including Central Module) creates an interprocess message queue to receive messages from other modules. When a module needs to send a message to another module, it deposits a message in the target module's queue.
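The queue-per-module arrangement can be pictured with a small in-process sketch. It uses Python's `queue.Queue` as a stand-in for the interprocess queue. The `Module` class, the `send` helper, and the module names other than Central Module are hypothetical, for illustration only; they are not Hexarc APIs.

```python
import queue


class Module:
    """Hypothetical stand-in for an arcology module that owns one inbox."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()  # stands in for the interprocess queue


def send(modules, target, message):
    """Deposit a message directly in the target module's queue."""
    modules[target].inbox.put(message)


# Three modules, each with its own queue (names are illustrative).
modules = {name: Module(name) for name in ("CentralModule", "ModuleA", "ModuleB")}

# ModuleA sends to ModuleB by writing into ModuleB's queue.
send(modules, "ModuleB", {"from": "ModuleA", "msg": "hello"})
received = modules["ModuleB"].inbox.get_nowait()
```

The sender never touches its own queue; it only needs to know how to reach the target's queue.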
A multi-machine arcology extends this architecture as follows:
There is a primary machine, known as Arcology Prime. This machine listens on the hexarc port (7397) for connections from other machines.
In order to grow the arcology we first need to add a machine. The user instructs a machine to connect to Arcology Prime (on 7397) and request to be added. We may need to sign in as an Arc.admin in order to complete the request.
Arcology Prime will create an entry in a database for the new machine. It will also issue the machine a private key for it to identify itself later (when it needs to reconnect). [This private key is unique for the machine, so we can easily blacklist it if necessary.]
Now the machine can connect to the arcology. It connects to Arcology Prime on 7397 and synchronizes Mnemosynth. Now it can boot up its local modules, which will also synchronize via Mnemosynth. Once fully booted, the machine listens for connections on 7397 to receive messages from other machines.
Just like the single-machine message queue, machines send messages directly to each other.
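The per-machine key described above (issued when a machine is added, and revocable later) can be sketched as a small registry. This is only an illustration under stated assumptions: the `MachineRegistry` class, its method names, the in-memory storage, and the 256-bit random key are all hypothetical and are not taken from the Hexarc implementation.

```python
import secrets


class MachineRegistry:
    """Hypothetical in-memory stand-in for Arcology Prime's machine database."""

    def __init__(self):
        self._keys = {}          # machine address -> key (hex string)
        self._blacklist = set()  # addresses whose keys are no longer accepted

    def add_machine(self, address):
        """Create an entry for the machine and issue it a unique key."""
        key = secrets.token_hex(32)  # assumption: 256-bit random key
        self._keys[address] = key
        return key

    def blacklist(self, address):
        """Reject this machine's key from now on."""
        self._blacklist.add(address)

    def is_valid(self, address, key):
        """Check a presented key against the one issued to this machine."""
        if address in self._blacklist:
            return False
        expected = self._keys.get(address)
        return expected is not None and secrets.compare_digest(expected, key)


registry = MachineRegistry()
issued = registry.add_machine("localhost:7400")
ok_before = registry.is_valid("localhost:7400", issued)
registry.blacklist("localhost:7400")
ok_after = registry.is_valid("localhost:7400", issued)
```

Because each machine has its own key, blacklisting one machine does not affect any other.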
Walkthrough
Adding a New Machine
- We start by running arcology.exe with no parameters, which defaults it to Arcology Prime. We read Config.ars to load our storage and start all modules and engines.
- We run arcology.exe again, but this time we use the /arcology switch, passing it the address of Arcology Prime, which instructs it to run subservient to an existing arcology. [The switch also makes it default to use ConfigSecondary.ars.]
- NOTE: When installing Hexarc on the secondary machine, use the /arcology switch in the installation command. E.g., arcology /install /arcology:{name}.
- The secondary machine starts listening on 7397. [Note: the configuration file can choose a different port.]
- The secondary machine consults ConfigSecondary.ars to see if it has keys to Arcology Prime. If it does, it continues with normal startup. If not, it waits until Arcology Prime connects to it.
- To add the secondary machine we use AI2 on Arcology Prime and issue a command to add the machine, passing the address of the secondary. [We should optionally pass in a port number too.]
- Exarch (on Arcology Prime) generates a unique private key for the secondary machine and adds it to a database table (or perhaps to its own configuration file).
- Arcology Prime connects to the secondary machine and gives it its private key. The secondary stores it in its configuration file. It is now ready to communicate.
- Now that both sides have a private key, we communicate as follows: when A connects to B, B replies with a token that must be hashed with the private key. Once A provides the correct hash, B will mark its connection as authenticated and will accept messages from A.
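The token exchange in the last step can be sketched as a challenge-response check. The spec only says the token is "hashed with the private key"; the use of HMAC-SHA256, the 16-byte token, and the function names below are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """B: generate a random one-time token for the connecting machine."""
    return secrets.token_bytes(16)


def answer_challenge(key, token):
    """A: hash the token with the shared key (HMAC-SHA256 is an assumption)."""
    return hmac.new(key, token, hashlib.sha256).digest()


def verify_answer(key, token, answer):
    """B: recompute the expected hash and compare in constant time."""
    expected = hmac.new(key, token, hashlib.sha256).digest()
    return hmac.compare_digest(expected, answer)


shared_key = secrets.token_bytes(32)  # the key both sides already hold
token = make_challenge()              # B -> A
answer = answer_challenge(shared_key, token)  # A -> B

authenticated = verify_answer(shared_key, token, answer)
rejected = verify_answer(shared_key, token, answer_challenge(b"wrong key", token))
```

Since the key itself never crosses the wire in this scheme, an eavesdropper who sees one token and its hash cannot reuse them for a later connection that gets a fresh token.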
Connecting to Arcology Prime
When a secondary machine boots up, it uses the following procedure to connect to Arcology Prime.
- When launching Arcology.exe, we pass in the network address of Arcology Prime and (optionally) the port number on which to listen for AMP1 messages. These parameters are what trigger Arcology.exe to behave like a secondary machine (instead of Arcology Prime).
- In CExarchEngine::OnStartRunning we load our configuration file and get the secret key that we need to communicate with Arcology Prime.
- We load all modules for the machine, as normal.
- When Mnemosynth has been synchronized across all modules, we call CExarchEngine::OnMachineStart (as normal). At that point, we send a PING message to Arcology Prime. We do this through Esper, and include our secret key in the message to Esper.
- Esper keeps track of our connection to Arcology Prime. Since we're not yet connected, Esper first connects and sends an AUTH message to Arcology Prime, passing the secret key.
- Arcology Prime receives the AUTH message and validates the secret key. It tells Esper (its side) that the connection is valid, so future messages on that connection will be accepted.
- Once it gets an acknowledgement that the AUTH message was received, the secondary machine sends the original PING message.
- Arcology Prime receives the PING and makes sure that the connection is authenticated. If it is, it replies with WELCOME. (Otherwise, it gets no reply.)
- When the secondary machine receives the WELCOME message, it starts synchronizing its Mnemosynth with Arcology Prime.
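The AUTH, PING, WELCOME sequence above amounts to a small per-connection state machine on the Arcology Prime side. The sketch below is a simplified model, not the Esper or Exarch code: the `PrimeConnection` class is hypothetical, the `OK` acknowledgement keyword is an assumption (the spec does not name it), and staying silent on a failed AUTH is also an assumption.

```python
import hmac


class PrimeConnection:
    """Hypothetical per-connection state kept on the Arcology Prime side."""

    def __init__(self, known_keys):
        self.known_keys = known_keys  # secret keys issued to secondaries
        self.authenticated = False

    def handle(self, keyword, data=None):
        """Return the reply keyword, or None when no reply is sent."""
        if keyword == "AUTH":
            # Mark the connection valid only if the key is one we issued.
            self.authenticated = any(
                hmac.compare_digest(key, data or "") for key in self.known_keys
            )
            # "OK" is an assumed name for the acknowledgement.
            return "OK" if self.authenticated else None
        if keyword == "PING":
            # Unauthenticated connections get no reply at all.
            return "WELCOME" if self.authenticated else None
        return None


conn = PrimeConnection({"secret-123"})
early_ping = conn.handle("PING")            # before AUTH: no reply
ack = conn.handle("AUTH", "secret-123")     # valid key: acknowledged
welcome = conn.handle("PING")               # now authenticated
```

Authentication is a property of the connection, so the secondary only has to present its key once per connection rather than with every message.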
Communications
Arcology Prime creates a listener (on 7397) to process AMP1 messages. Esper is responsible for managing the protocol. Exarch processes the actual messages.
Exarch creates an AMP1 listener with Esper.startAMP1. When an AMP1 message comes in, Esper sends it to Exarch via Esper.onAMP1Message. Exarch can reply to the message depending on how it handled it.
Lesser machines create their own listener and receive messages from others in the same way.
AMP1
Arcology Message Protocol 1 (AMP1) is a session-based protocol designed for unidirectional message passing.
Command Format
A session works as follows:
- Sender machine connects to port 7397 on receiver machine.
- Sender sends an AMP1 command.
- Receiver replies with an AMP1 reply.
An AMP1 command has the following format:
AMP/1.00 keyword data-length CRLF
data CRLF
The elements of a command are separated by whitespace. Each element is described below:
- keyword: This is a token consisting of A-Z, a-z, 0-9, -, _, and $. It identifies the command being issued.
- data-length: This is the length of the data element, in bytes.
- data: The data element is a serialized Aeon datum. The contents of the data depend on the command.
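A minimal encoder and parser for the command format above might look like the following. The data element is treated as opaque bytes, since Aeon serialization is outside the scope of this sketch. The function names and error handling are assumptions, not part of the Hexarc code.

```python
import re

# Keyword characters allowed by the spec: A-Z, a-z, 0-9, -, _, and $.
KEYWORD_RE = re.compile(rb"[A-Za-z0-9_$-]+")


def encode_command(keyword, data):
    """Build an AMP1 command; data is an already-serialized datum (bytes)."""
    kw = keyword.encode("ascii")
    if not KEYWORD_RE.fullmatch(kw):
        raise ValueError("invalid keyword")
    length = str(len(data)).encode("ascii")
    return b"AMP/1.00 " + kw + b" " + length + b"\r\n" + data + b"\r\n"


def decode_command(raw):
    """Parse one AMP1 command and return (keyword, data)."""
    header, sep, rest = raw.partition(b"\r\n")
    if not sep:
        raise ValueError("missing CRLF after header")
    parts = header.split()
    if len(parts) != 3 or parts[0] != b"AMP/1.00":
        raise ValueError("malformed header")
    if not KEYWORD_RE.fullmatch(parts[1]):
        raise ValueError("invalid keyword")
    length = int(parts[2])
    data = rest[:length]
    # Read exactly data-length bytes, then expect the trailing CRLF.
    if len(data) != length or rest[length:length + 2] != b"\r\n":
        raise ValueError("data does not match data-length")
    return parts[1].decode("ascii"), data


raw = encode_command("PING", b"(opaque datum)")
keyword, data = decode_command(raw)
```

Because the parser reads exactly data-length bytes, the data element may itself contain CRLF sequences without confusing the framing.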
An AMP1 reply has a much simpler format:
AMP/1.00 keyword CRLF
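Since a reply carries no data element, handling it is correspondingly short. The following sketch uses hypothetical function names and assumes a reply is a single header line ending in CRLF, as the format above indicates.

```python
def encode_reply(keyword):
    """Build an AMP1 reply line for the given keyword."""
    return ("AMP/1.00 " + keyword + "\r\n").encode("ascii")


def decode_reply(raw):
    """Parse an AMP1 reply and return its keyword."""
    if not raw.endswith(b"\r\n"):
        raise ValueError("reply must end with CRLF")
    parts = raw[:-2].split()
    if len(parts) != 2 or parts[0] != b"AMP/1.00":
        raise ValueError("malformed reply")
    return parts[1].decode("ascii")


reply = encode_reply("WELCOME")
reply_keyword = decode_reply(reply)
```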
ArcMin
Ideally, the Arcology.exe program should be able to act either as Arcology Prime or as a lesser machine. But we also propose a minimal version called ArcMin.exe. ArcMin has everything needed to connect to an arcology, but omits other engines. The goal is to make it easier to port to other platforms.
Status
1/8/2017
- Fixed bug with reloading modules after restart. Now arcology remains stable after either primary or secondary is restarted. Initial operation goals reached!
1/3/2017
- Tested both Arcology Prime and the secondary restarting, and both reconnected with each other properly.
- Made sure Mnemosynth synchronized properly (including ports). Plus added code to invalidate the port cache after a module/machine restart.
9/20/2015
- Implemented a stub Luminous2 service. We can send the Luminous.status message to test.
- We can add modules to a (secondary) machine, but they do not get reloaded on a restart.
9/14/2015
- Worked on module.add command across machines. Command gets to the other machine, but haven't yet tested actually adding a module.
7/19/2015
- Intermachine ports are implemented.
- Mnemosynth synchronized to all machines.
- Need to remove mnemosynth endpoint when a machine is restarted (i.e., when a new machine connects with the same key).
- Still need to have Arcology Prime ping secondaries when it restarts.
- Need a way to install modules on remote machines.
5/10/2015
- arc.status machines implemented (to list status of machines).
- Secondary machines reconnect after restarting (but if Arcology Prime restarts, we need to restart everyone).
- We need to synchronize all machines via mnemosynth.
- Added debug options to list mnemosynth endpoints.
- We should add a way to send a message across machines. We can start with a message to Exarch to send an AMP1 message. Later we can optimize this by sending a message straight to Esper, but that requires a key (which only Exarch has right now).
2/15/2015
- arc.addMachine implemented.
- On startup, Arcology Prime should ping all its machines.
- On startup, secondary machine should ping Arcology Prime (if we have keys).
- We should keep track of the last time we heard from a given machine.
- Design a way for machines to advertise their ports (etc.).
- AI2 command to list machines (and current status).
- AI2 command to list ports (across all machines).
- AI2 command to list connections.
- Need design for managing modules (and engines?) across machines.
2/9/2015
- Run the normal arcology.
- Run a secondary machine process: /arcology:localhost /hexarcPort:7400
- In AI2, add a machine with: arc.addMachine localhost:7400
- The communications path between the two machines works, but the add machine command is not yet implemented.