Datamancer is the codename for a Hexarc service that queries and visualizes data sources. Example use cases include:

  • Querying the Ministry of Records: For example, we might want to know how many records were created in July 2017.
  • Querying the server logs: For example, which URLs most commonly returned a 404?
  • Querying the Transcendence game log: For example, what are the most common causes of death?

Core Concepts

The service is deployed to https://datamancer.kronosaur.com.

A data table is a well-defined data structure that consists of 1 or more columns and 0 or more rows. A data table has a globally unique, hierarchical ID. Each column in the data table has an ID (unique within the table) and a datatype.
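The definition above can be sketched as a small Python data structure. This is an illustration only, not Datamancer's implementation: the class names, the set of datatypes, and the use of "." to separate segments of the hierarchical ID are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical datatype set; the design does not enumerate the supported datatypes.
DATATYPES = {"int", "float", "string", "datetime"}


@dataclass
class Column:
    id: str        # unique within its table
    datatype: str  # one of DATATYPES


@dataclass
class DataTable:
    id: str                # globally unique, hierarchical, e.g. "Trans.stats"
    columns: list[Column]  # 1 or more columns
    rows: list[tuple] = field(default_factory=list)  # 0 or more rows

    def __post_init__(self) -> None:
        # Enforce the invariants from the definition.
        if not self.columns:
            raise ValueError("a data table needs at least one column")
        ids = [c.id for c in self.columns]
        if len(ids) != len(set(ids)):
            raise ValueError("column IDs must be unique within a table")
        for c in self.columns:
            if c.datatype not in DATATYPES:
                raise ValueError(f"unknown datatype: {c.datatype}")
        for row in self.rows:
            if len(row) != len(self.columns):
                raise ValueError("row width does not match column count")

    def id_path(self) -> list[str]:
        # Hierarchical ID as segments; "." as the separator is an assumption.
        return self.id.split(".")

    def column(self, column_id: str) -> list:
        # Values of one column, in row order.
        index = [c.id for c in self.columns].index(column_id)
        return [row[index] for row in self.rows]
```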

Some data tables are defined by the system (e.g., server logs), and users may access them according to their rights. Other data tables are uploaded or linked by users (and may be shared with other users).

A function takes 0 or more inputs and returns a value of one of the supported datatypes (a data table is a first-class datatype). Inputs can be data tables, row ranges, column names, etc. The results of many functions can be cached for as long as the underlying data tables do not change.
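One way to honor "cache for as long as the underlying data table does not change" is to give each table a version number and include it in the cache key. The Python sketch below assumes that approach; `VersionedTable` and `FunctionCache` are hypothetical names, and the design does not say how change detection actually works.

```python
from typing import Any, Callable, Hashable


class VersionedTable:
    """Hypothetical table whose version number changes whenever its data changes."""

    def __init__(self, table_id: str, rows: list) -> None:
        self.id = table_id
        self.rows = list(rows)
        self.version = 0

    def append(self, row: Any) -> None:
        self.rows.append(row)
        self.version += 1  # any change makes earlier cached results unreachable


class FunctionCache:
    """Memoizes results per (function, table, table version, args)."""

    def __init__(self) -> None:
        self._entries: dict = {}
        self.misses = 0

    def call(self, fn: Callable, table: VersionedTable, *args: Hashable) -> Any:
        # Including the version in the key means a changed table never hits a stale entry.
        # Old entries are never evicted in this sketch.
        key = (fn, table.id, table.version, args)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = fn(table, *args)
        return self._entries[key]
```

A real service would also need an eviction policy, since entries for old versions are simply left behind here.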

A program is a named text file (stored on the service) with 0 or more definitions. A definition is a named object that can be invoked either by the system or by another definition. A function is a type of definition.

A program can define pages and visualizations to display an interactive UI. If no visualizations are defined, the program runs as a console program with text input.

Each program has unique URLs for editing and for running. Thus a user can create a dashboard as a program and return to it repeatedly by URL.

A console is a visualization consisting of a text input box and a scrolling output pane. The text input accepts any function invocation. The output shows the return value of the function.
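The console's behavior (one invocation in, its return value out, older output scrolling away) is sketched below in Python. Datamancer's own invocation syntax is not specified here, so this sketch borrows Python's `ast` module to parse a single call with literal arguments; the `Console` class, the function registry, and the line limit are assumptions.

```python
import ast
from collections import deque
from typing import Callable


class Console:
    """Sketch of a console visualization: one text input, a scrolling output pane."""

    def __init__(self, functions: dict[str, Callable], max_lines: int = 1000) -> None:
        self.functions = functions               # hypothetical function registry
        self.output = deque(maxlen=max_lines)    # oldest lines scroll off the pane

    def submit(self, text: str) -> None:
        # Echo the input, then show the function's return value (or the error).
        self.output.append(f"> {text}")
        try:
            self.output.append(repr(self._invoke(text)))
        except Exception as err:
            self.output.append(f"error: {err}")

    def _invoke(self, text: str):
        # Accept only a single call such as Add(2, 3); arguments must be literals.
        node = ast.parse(text, mode="eval").body
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("expected a function invocation")
        fn = self.functions[node.func.id]
        args = [ast.literal_eval(arg) for arg in node.args]
        return fn(*args)
```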

Under the covers, the system is responsible for creating the indices it needs and caching them.

In the fullness of time we should support automation concepts such as auto-running programs, triggers for events (e.g., send an email if certain data shows up), etc.

Example: Transcendence Game Stats

For this example, imagine we want to list out the most common epitaphs.

  1. We start by creating a program called "Epitaphs by Frequency".
  2. We can list out all the tables that we have access to with the $User.tables variable. This returns a table of all tables accessible to the current signed-in user. We obtain this list by asking all Hexarc services for the set of tables that they have for the given user. [There is also $Dev.tables if we want to expose the set of tables accessible to the creator of the program.]
  3. The Trans service returns a table called Trans.stats.
  4. We use Table("Trans.stats") to get a reference to that table, and the column operator to get its epitaph column: Table("Trans.stats"):epitaph.
  5. To output the frequencies, we use print Distinct(Table("Trans.stats"):epitaph). Distinct returns a table of all unique epitaphs with their counts.
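The Distinct step can be approximated in Python as below. This is a sketch, not Datamancer's implementation: `column` and `distinct` are hypothetical stand-ins for the column operator and Distinct, rows are assumed to be mappings, and results are assumed to be sorted most frequent first (matching the program's name). Because `distinct` consumes its input as a stream, it also illustrates how a function like Distinct could read straight from the source instead of loading every row.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, Mapping


def column(rows: Iterable[Mapping], name: str) -> Iterator:
    # Rough analogue of the column operator, e.g. Table("Trans.stats"):epitaph.
    return (row[name] for row in rows)


def distinct(values: Iterable[Hashable]) -> list[tuple[Hashable, int]]:
    """Return (value, count) pairs, most frequent first.

    `values` is consumed as a stream, so only one counter per unique value
    is held in memory, not the source rows themselves. Ties keep the order
    in which values were first seen.
    """
    return Counter(values).most_common()
```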

Implementation Notes

  • We use Auton for most of the heavy lifting.
  • In general, we load everything in memory to process it. Some functions might support batch processing so that they don't have to load everything. For example, a function like Distinct could probably read straight from the source.
  • We ask all services in the arcology to provide data table definitions for the given user. For example, when asking Ministry we would get a table for each program that the user has access to. We would then call back to Ministry to get the actual data, so that access control is honored at the record level. Since we load everything in memory, we only have to do this once (at load time).
  • Each service is responsible for returning the actual data in a well-known format.
  • For server logs, we may need to expose the concept of a default range. Unless otherwise specified, we only operate on the default range. For example, server logs might have a default range of, say, 90 days. Queries can override the default range.
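A default range with a per-query override could look like the Python sketch below. The 90-day constant mirrors the example above but is an assumption, as are the function name `in_range`, the inclusive bounds, and the `timestamp` key on each row.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Mapping, Optional

# Assumed default; the design only suggests something like 90 days.
DEFAULT_RANGE = timedelta(days=90)


def in_range(
    rows: Iterable[Mapping],
    now: datetime,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    timestamp_key: str = "timestamp",
) -> Iterator[Mapping]:
    """Yield rows whose timestamp lies in [start, end].

    With no override, the window is the default range ending at `now`.
    """
    if start is None:
        start = now - DEFAULT_RANGE
    if end is None:
        end = now
    for row in rows:
        if start <= row[timestamp_key] <= end:
            yield row
```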

Service Messages

We implement a way to send messages to services (e.g., Ministry). Because Hyperion owns all services, messages are routed through it:

  • We add a new message: Hyperion.sendTo {serviceName} {message} {params}.
  • {serviceName} is the prefix, e.g., "Ministry".
  • {message} is defined by the service, and implemented as a public function with the arc.messageHandler attribute.
  • Hyperion will call the function from a session (to handle async calls) and return the result.
  • We assume that all messages finish within the time-out. Services with operations that may exceed it should expose a state-based interface instead.
  • We also add: Hyperion.broadcast {message} {params}. Hyperion will send to all services and aggregate into a single response (a struct keyed by service name).
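The dispatch logic of sendTo and broadcast is sketched below in Python. It is not Hyperion's code: the `message_handler` decorator stands in for the arc.messageHandler attribute, and it is an assumption that broadcast skips services that do not handle the message. Sessions, async calls, and time-outs are not modeled.

```python
from typing import Any, Callable


def message_handler(fn: Callable) -> Callable:
    """Stand-in for the arc.messageHandler attribute: marks a public handler."""
    fn.is_message_handler = True
    return fn


class Hyperion:
    def __init__(self) -> None:
        self.services: dict[str, object] = {}

    def register(self, name: str, service: object) -> None:
        self.services[name] = service

    def _handler(self, service: object, message: str):
        # Only functions explicitly marked as message handlers are callable.
        fn = getattr(service, message, None)
        return fn if getattr(fn, "is_message_handler", False) else None

    def send_to(self, service_name: str, message: str, params: Any = None) -> Any:
        handler = self._handler(self.services[service_name], message)
        if handler is None:
            raise LookupError(f"{service_name} does not handle {message}")
        return handler(params)

    def broadcast(self, message: str, params: Any = None) -> dict[str, Any]:
        # Aggregate into one struct keyed by service name.
        results = {}
        for name, service in self.services.items():
            handler = self._handler(service, message)
            if handler is not None:
                results[name] = handler(params)
        return results
```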

Loading Data

  • We implement loading with service messages (see above).
  • We define a data source ID syntax in which the first segment identifies a service.
  • We use a broadcast to request a list of data sources from every service.
  • We can then send a message to a specific service to load a specific data source.
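Routing a load request by the first segment of the data source ID might look like the Python sketch below. The "." separator, the function names, and the `loadDataSource` message name are assumptions; `send_to` stands for any wrapper around Hyperion.sendTo.

```python
from typing import Any, Callable

# send_to(service_name, message, params), e.g. a wrapper around Hyperion.sendTo.
SendTo = Callable[[str, str, Any], Any]


def split_source_id(source_id: str) -> tuple[str, str]:
    """Split a data source ID into (service, remainder).

    The first segment names the owning service; "." as the separator is an assumption.
    """
    service, sep, rest = source_id.partition(".")
    if not service or not sep or not rest:
        raise ValueError(f"malformed data source ID: {source_id!r}")
    return service, rest


def load_source(source_id: str, send_to: SendTo) -> Any:
    # Route the load request to the service named by the first segment.
    service, _ = split_source_id(source_id)
    # "loadDataSource" is a hypothetical message name.
    return send_to(service, "loadDataSource", {"id": source_id})
```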

Implementation Plan

Phase 1

  1. Create a Datamancer service with associated Archon.
  2. Add appropriate AI2 console commands to test basic functionality.
  3. Implement callback to other services to list available tables.
  4. Integrate Auton to read data from other services.

Phase 2

  1. Start with a shell for the service. The home page lists all programs accessible to the user (all programs they created plus any public programs they added). Clicking on a program runs it. [There could also be a button to edit a program.]
  2. There is a button to create a new program, which creates a new untitled program and navigates to it in edit mode.
  3. In edit mode you type a program and save it. Saving a program checks for syntax errors. [For now, errors appear in an error bar.]
  4. In edit mode, there is a button to save and a button to run (which automatically saves).
  5. Running navigates to the program in run mode. For now, run mode always uses a console visualization.