Welcome to Cross

Cross, the Common Runtime Object Support System, is a Java API for the definition, creation, and execution of sequential workflows.

Sequential Assembly

A workflow in Cross consists of a sequence of fragment command objects that use file fragments as their input and output type. The number of input and output file fragments processed by a fragment command may differ. This allows map-reduce-like processing schemes and, more generally, schemes in which a command produces the same number of output fragments as it receives or a different number. All workflow elements are configured through a Spring application context and Spring Beans-based XML configuration, supplemented by runtime properties.
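
The sequential assembly described above can be sketched as a chain of functions over lists of file fragments. This is a minimal illustration, not the Cross API: `FileFragment`, `FragmentCommand`, `NORMALIZE`, and `MERGE` are hypothetical names, and a fragment is reduced to a URI string. `NORMALIZE` is a 1:1 "map" step and `MERGE` an n:1 "reduce" step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch of a sequential workflow; all types here are hypothetical stand-ins for Cross types. */
public class SequentialWorkflowSketch {

    /** Hypothetical file fragment, reduced to its storage URI. */
    record FileFragment(String uri) {}

    /** A command maps a list of input fragments to a list of output fragments (sizes may differ). */
    interface FragmentCommand extends Function<List<FileFragment>, List<FileFragment>> {}

    /** 1:1 "map" step: one output fragment per input fragment. */
    static final FragmentCommand NORMALIZE = in -> {
        List<FileFragment> out = new ArrayList<>();
        for (FileFragment f : in) {
            out.add(new FileFragment(f.uri() + ".normalized"));
        }
        return out;
    };

    /** n:1 "reduce" step: all input fragments collapse into a single output fragment. */
    static final FragmentCommand MERGE = in -> List.of(new FileFragment("merged-" + in.size()));

    /** Runs each command on the output of its predecessor and returns the final output. */
    static List<FileFragment> run(List<FragmentCommand> commands, List<FileFragment> input) {
        List<FileFragment> current = input;
        for (FragmentCommand command : commands) {
            current = command.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<FileFragment> result = run(List.of(NORMALIZE, MERGE),
                List.of(new FileFragment("a.cdf"), new FileFragment("b.cdf"), new FileFragment("c.cdf")));
        System.out.println(result); // [FileFragment[uri=merged-3]]
    }
}
```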

Validation

Cross allows fragment commands to define their required variable fragments by adding class-level annotations. Additionally, fragment commands may define which variable fragments they provide. Thus, Cross can validate the accessibility of all variables required by a workflow before the workflow is actually executed. This helps avoid running computationally expensive workflows on invalid data.
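
The validation idea can be sketched with runtime-retained class annotations and a single pass over the command sequence that tracks which variables are available so far. The annotation names `RequiresVariables` and `ProvidesVariables`, the command classes, and the variable names below are assumptions made for illustration; they are not taken from the Cross source.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of pre-execution variable validation; annotation and class names are hypothetical. */
public class VariableValidationSketch {

    /** Hypothetical annotation listing the variables a command reads. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface RequiresVariables { String[] names(); }

    /** Hypothetical annotation listing the variables a command writes. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ProvidesVariables { String[] names(); }

    @RequiresVariables(names = {"scan_acquisition_time", "total_intensity"})
    @ProvidesVariables(names = {"tic_peaks"})
    static class PeakFinder {}

    @RequiresVariables(names = {"tic_peaks", "retention_index"})
    static class PeakAligner {}

    /** Returns one message per missing variable; an empty list means the workflow validates. */
    static List<String> validate(List<Class<?>> commands, Set<String> initialVariables) {
        Set<String> available = new HashSet<>(initialVariables);
        List<String> errors = new ArrayList<>();
        for (Class<?> command : commands) {
            RequiresVariables required = command.getAnnotation(RequiresVariables.class);
            if (required != null) {
                for (String name : required.names()) {
                    if (!available.contains(name)) {
                        errors.add(command.getSimpleName() + " requires missing variable " + name);
                    }
                }
            }
            // Variables provided by this command become visible to all later commands.
            ProvidesVariables provided = command.getAnnotation(ProvidesVariables.class);
            if (provided != null) {
                available.addAll(Arrays.asList(provided.names()));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> errors = validate(List.of(PeakFinder.class, PeakAligner.class),
                Set.of("scan_acquisition_time", "total_intensity"));
        System.out.println(errors); // [PeakAligner requires missing variable retention_index]
    }
}
```

Because the check only reads annotations, it runs without touching any data, which is what makes it cheap compared to a failed execution.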

Monitoring and Transformation

A workflow monitors the fragment commands it executes and notifies registered listeners of various workflow-related events. These include the creation of primary and secondary processing results as well as general progress information. A workflow logs all completed tasks and their results in a dedicated output directory that, depending on configuration, is unique to each run and is self-contained apart from the initial input data. This output directory contains all information necessary to re-run the workflow with exactly the same parameters and conditions. Workflows in Cross are therefore self-descriptive and repeatable.
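
The listener notification can be sketched as a plain observer pattern. The event types mirror the categories named above (primary results, secondary results, progress), but `WorkflowEvent`, `WorkflowListener`, and `Workflow` are hypothetical types, not the Cross classes, and "running" a task here only emits events.

```java
import java.util.ArrayList;
import java.util.List;

/** Observer-pattern sketch of workflow event notification; all types are hypothetical. */
public class WorkflowEventsSketch {

    enum EventType { PRIMARY_RESULT, SECONDARY_RESULT, PROGRESS }

    record WorkflowEvent(EventType type, String source, String detail) {}

    interface WorkflowListener { void onEvent(WorkflowEvent event); }

    static class Workflow {
        private final List<WorkflowListener> listeners = new ArrayList<>();

        void addListener(WorkflowListener listener) { listeners.add(listener); }

        private void fire(WorkflowEvent event) {
            for (WorkflowListener listener : listeners) {
                listener.onEvent(event);
            }
        }

        /** Simulates running named tasks, reporting a result and a progress event after each. */
        void run(List<String> tasks) {
            for (int i = 0; i < tasks.size(); i++) {
                String task = tasks.get(i);
                fire(new WorkflowEvent(EventType.PRIMARY_RESULT, task, task + ".out"));
                fire(new WorkflowEvent(EventType.PROGRESS, task, (i + 1) + "/" + tasks.size()));
            }
        }
    }

    public static void main(String[] args) {
        Workflow workflow = new Workflow();
        List<WorkflowEvent> received = new ArrayList<>();
        workflow.addListener(received::add); // collect every event in order
        workflow.run(List.of("import", "align"));
        received.forEach(System.out::println);
    }
}
```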

Efficient Data Structures

A file fragment is an aggregation of variable fragment objects and is identified by a storage location URI. File fragment objects may reference an arbitrary number of source files, which allows virtual aggregation of the result variables produced by previous fragment commands. Shadowing allows a file fragment to hide an upstream variable of the same name from downstream file fragments. DataSource implementations handle different URI extensions, so that file fragment objects can exist as simple files on disk or within a distributed database system.
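
Virtual aggregation and shadowing can be illustrated with a lookup that checks a fragment's own variables first and only then searches its source files. This is a sketch under assumptions: the `FileFragment` class, the `double[]` variable payload, the depth-first search order, and the example URIs and variable names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Sketch of source-file aggregation with shadowing; not the Cross FileFragment API. */
public class FileFragmentSketch {

    static class FileFragment {
        final String uri;
        private final Map<String, double[]> localVariables = new HashMap<>();
        private final List<FileFragment> sourceFiles = new ArrayList<>();

        FileFragment(String uri) { this.uri = uri; }

        void addSourceFile(FileFragment source) { sourceFiles.add(source); }

        void setVariable(String name, double[] data) { localVariables.put(name, data); }

        /**
         * Local variables shadow same-named upstream variables; otherwise the
         * source files are searched depth-first in the order they were added.
         */
        Optional<double[]> getVariable(String name) {
            double[] local = localVariables.get(name);
            if (local != null) {
                return Optional.of(local);
            }
            for (FileFragment source : sourceFiles) {
                Optional<double[]> found = source.getVariable(name);
                if (found.isPresent()) {
                    return found;
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        FileFragment raw = new FileFragment("file:/data/raw.cdf");
        raw.setVariable("intensity", new double[] {1, 2, 3});
        raw.setVariable("mass", new double[] {50, 51, 52});

        FileFragment filtered = new FileFragment("file:/work/filtered.cdf");
        filtered.addSourceFile(raw);
        filtered.setVariable("intensity", new double[] {1, 0, 3}); // shadows raw's "intensity"

        FileFragment aligned = new FileFragment("file:/work/aligned.cdf");
        aligned.addSourceFile(filtered);

        System.out.println(Arrays.toString(aligned.getVariable("intensity").orElseThrow())); // [1.0, 0.0, 3.0]
        System.out.println(Arrays.toString(aligned.getVariable("mass").orElseThrow()));      // [50.0, 51.0, 52.0]
    }
}
```

The downstream fragment stores no data of its own, yet sees `mass` from the raw file and the filtered `intensity`, never the raw one.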

Caching of Intermediate Results

File fragments have access to a user-definable caching implementation. Currently, Ehcache and db4o (in memory and on disk) are available, as well as a mock in-memory cache based on a hash map.
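
A pluggable cache can be sketched as a small interface with interchangeable implementations; the hash-map-backed one below plays the role of the mock in-memory cache. `VariableCache`, `HashMapCache`, and `getOrLoad` are hypothetical names; an Ehcache- or db4o-backed class would implement the same interface, but its code is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of a user-definable cache abstraction; interface and class names are hypothetical. */
public class CacheSketch {

    /** Minimal cache contract that a file fragment could be configured with. */
    interface VariableCache<K, V> {
        V get(K key);
        void put(K key, V value);
        boolean contains(K key);
    }

    /** Mock in-memory cache backed by a HashMap: no eviction, not thread-safe. */
    static class HashMapCache<K, V> implements VariableCache<K, V> {
        private final Map<K, V> store = new HashMap<>();

        @Override public V get(K key) { return store.get(key); }
        @Override public void put(K key, V value) { store.put(key, value); }
        @Override public boolean contains(K key) { return store.containsKey(key); }
    }

    /** Returns the cached value, invoking the (possibly expensive) loader only on a cache miss. */
    static <K, V> V getOrLoad(VariableCache<K, V> cache, K key, Function<K, V> loader) {
        if (cache.contains(key)) {
            return cache.get(key);
        }
        V value = loader.apply(key);
        cache.put(key, value);
        return value;
    }

    public static void main(String[] args) {
        VariableCache<String, double[]> cache = new HashMapCache<>();
        int[] loads = {0};
        Function<String, double[]> loader = name -> {
            loads[0]++; // count how often the underlying data is actually read
            return new double[] {1.0, 2.0};
        };
        getOrLoad(cache, "intensity", loader);
        getOrLoad(cache, "intensity", loader);
        System.out.println(loads[0]); // 1
    }
}
```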

Controlled Vocabulary

Cross variables have simple String-based names. However, the same variable name can have different meanings in different contexts. Cross therefore supports namespaced controlled vocabularies (CVs) for specific domains, which translate a variable placeholder name into the actual clear name defined by the CV.
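
The namespacing idea amounts to a two-level lookup: namespace first, then placeholder. The class below is a hypothetical sketch, not the Cross CV API, and the namespaces (`ms`, `nmr`) and names are invented examples showing one placeholder resolving differently per domain.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch of namespaced controlled vocabularies; names and namespaces are hypothetical. */
public class ControlledVocabularySketch {

    /** namespace -> (placeholder name -> clear name) */
    private final Map<String, Map<String, String>> vocabularies = new HashMap<>();

    void define(String namespace, String placeholder, String clearName) {
        vocabularies.computeIfAbsent(namespace, ns -> new HashMap<>()).put(placeholder, clearName);
    }

    /** Returns the clear name for a placeholder in the given namespace, if one is defined. */
    Optional<String> translate(String namespace, String placeholder) {
        return Optional.ofNullable(vocabularies.getOrDefault(namespace, Map.of()).get(placeholder));
    }

    public static void main(String[] args) {
        ControlledVocabularySketch cv = new ControlledVocabularySketch();
        cv.define("ms", "var.intensity", "intensity_values");
        cv.define("nmr", "var.intensity", "peak_height");
        System.out.println(cv.translate("ms", "var.intensity").orElse("?"));  // intensity_values
        System.out.println(cv.translate("nmr", "var.intensity").orElse("?")); // peak_height
    }
}
```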

Parallelization

Cross uses the Mpaxs API for transparent parallelization of Runnable and Callable tasks, either within the local virtual machine or on remote machines coordinated through remote method invocation (RMI). To this end, Mpaxs provides an implementation compatible with the standard Java Executor and Future interfaces, which makes it easy to scale out parallel jobs. Mpaxs can scale the degree of parallelization up and down automatically, for example with its OpenGridEngine (OracleGridEngine) compliant compute host launcher implementation. Mpaxs uses round-robin scheduling to utilize all available hosts as fairly as possible.
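
Because Mpaxs is described as Executor- and Future-compatible, client code can be written against the standard `java.util.concurrent` interfaces. The sketch below does not use Mpaxs itself: it stands in local single-thread executors for remote hosts and shows round-robin dispatch of Callable tasks across them. `submitRoundRobin` and `awaitAll` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Round-robin dispatch over ExecutorServices; local executors stand in for remote hosts. */
public class RoundRobinSketch {

    /** Task i goes to host i mod hostCount, spreading work evenly across hosts. */
    static <T> List<Future<T>> submitRoundRobin(List<ExecutorService> hosts, List<Callable<T>> tasks) {
        List<Future<T>> futures = new ArrayList<>();
        for (int i = 0; i < tasks.size(); i++) {
            futures.add(hosts.get(i % hosts.size()).submit(tasks.get(i)));
        }
        return futures;
    }

    /** Blocks until all futures complete, rethrowing failures as unchecked exceptions. */
    static <T> List<T> awaitAll(List<Future<T>> futures) {
        List<T> results = new ArrayList<>();
        for (Future<T> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for results", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Task failed", e.getCause());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<ExecutorService> hosts = List.of(
                Executors.newSingleThreadExecutor(), Executors.newSingleThreadExecutor());
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            final int n = i;
            tasks.add(() -> n * n);
        }
        int sum = 0;
        for (int value : awaitAll(submitRoundRobin(hosts, tasks))) {
            sum += value;
        }
        hosts.forEach(ExecutorService::shutdown);
        System.out.println(sum); // 30
    }
}
```

Round-robin ignores differences in task cost and host speed, so it is fair in the number of tasks per host rather than in load.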