Efficient Instance-level Model Validation by Incremental Query Techniques - preliminary

Model Driven Development systems exploit the benefit of instance model validation and model transformation. Ever-growing model sizes used for example in critical embedded systems development require more and more efficient tools. The most time consuming step during model validation or model transformation is the model query step. This benchmark aims to measure batch style query and incremental style query of existing OWL (or RDF) and EMF based tools. 

Authors: Benedek Izsó, Zoltán Szatmári, István RáthÁkos HorváthGábor Bergmann, Balázs Polgár, Gergely Varró and Dániel Varró

Case Study Overview

Our experimental validation uses a domain-specific model of a railway system to demonstrate that instance-level ontology validation can be carried out orders of magnitude faster along this mapping compared to ontology reasoner tools. We demonstrate our approach using an example from the railway domain. The domain metamodel originates from the MOGENTES EU FP7 project, and the requirements were defined by railway domain experts.
 

Metamodel

The metamodel is defined in EMF language:

A train route can be defined by a set of sensors between two neighboring signals. The state of a signal can be stop (when it is red), go (when it is green) or failure. Sensors are associated with track elements, which can be a track segment or a switch. The status of a switch can be left, right, straight or failure. A route can have associated switch positions, which describe the required state of a switch belonging to the route. Different route definitions can specify different states for a specific switch.

 

Constraints

Several high-level requirements can be specified that must hold for any valid instance models of a train system. The queries of the following constraints must return constraint violating elements.

Requirement 1

Name SwitchSensor
Textual description Every switch must have at least one sensor connected to it.
Graphical graph pattern
Graph pattern

pattern switchSensor(Individual) = {
        Switch(Individual);
        neg find hasSensor(Individual);
}

pattern hasSensor(Trackelement) = {
        Trackelement(Trackelement);
        Sensor(Target);
        Trackelement.TrackElement_sensor(Trackelement, Target);
}

OCL query

context Switch:
oclIsKindOf(Switch) and self.TrackElement_sensor->isEmpty()

Requirement 2

Name PosLength
Textual description A segment must have positive length.
Graphical graph pattern
Graph Pattern pattern posLength(Source, Target) = {
        Segment(Source);
        Segment.Segment_length(Source, Target);
        check((Target as Integer) <= 0);
}
OCL query

context Segment:
oclIsKindOf(Segment) and Segment_length <= 0

Requirement 3

Name RouteSensor
Textual description All sensors that are associated with a switch that belongs to a route must also be associated directly with the same route.
Graphical graph pattern
Graph pattern pattern routeSensor(Sen, Sw, Sp, R) = {
        Route(R);
        SwitchPosition(Sp);
        Switch(Sw);
        Sensor(Sen);
        
        Route.Route_switchPosition(R, Sp);
        SwitchPosition.SwitchPosition_switch(Sp, Sw);
        Trackelement.TrackElement_sensor(Sw, Sen);
        
        neg find head(Sen, R);  
}

pattern head(Sen, R) = {
        Route.Route_routeDefinition(R, Sen);
}

OCL query

context Sensor:
oclIsKindOf(Sensor) and not self.Sensor_trackElement->forAll(te|
   te.oclIsKindOf(Switch) implies te.oclAsType(Switch).Switch_switchPosition->forAll(sp|
      sp.SwitchPosition_route.Route_routeDefinition->includes(self)))

Requirement 4

Name SignalNeighbor
Textual description A route is incorrect, if it has a signal, and a sensor connected to another sensor by two track element, but there is no other route that connects the same signal and the other sensor. This pattern checks for the absence of cycles, so the efficiency of join is tested. One-way navigable references are also present in the constraint, so the efficient evaluation of these are also tested.
Graphical graph pattern
Graph pattern pattern signalNeighbor(R1) =
{
    find exitSignalSensor(SigA, R1, Sen1A);
    find connectingSensors(Sen1A, Sen2A);
    find rDefinition(R3A, Sen2A);
    R3A != R1;
    neg find entrySignalSensor(SigA, _R2A, Sen2A);
}
    
pattern exitSignalSensor(Sig, R1, Sen1) =
{
    find exitSignal(R1, Sig);
    find rDefinition(R1, Sen1);
}
    
pattern entrySignalSensor(Sig, R2, Sen2) =
{
    find entrySignal(R2, Sig);
    find rDefinition(R2, Sen2);
}
    
pattern entrySignal(R, Sig) =
{
    Route(R);
    Signal(Sig);
    Route.Route_entry(R, Sig);
}

pattern exitSignal(R, Sig) =
{
    Route(R);
    Signal(Sig);
    Route.Route_exit(R, Sig);
}
    
pattern rDefinition(R, Sen) =
{
    Route(R);
    Sensor(Sen);
    Route.Route_routeDefinition(R, Sen);
}
    
pattern connectingSensors(Sen1, Sen2) =
{
    find sensorTrackelement(Sen1, Te1);
    find sensorTrackelement(Sen2, Te2);
    find trackelementConnected(Te1, Te2);
}
    
pattern trackelementConnected(Te1, Te2) =
{
    Trackelement(Te1);
    Trackelement(Te2);
    Trackelement.TrackElement_connectsTo(Te1, Te2);
}
    
    
pattern sensorTrackelement(Sen, Te) =
{
    Sensor(Sen);
    Trackelement(Te);
    Sensor.Sensor_trackElement(Sen, Te);
}

OCL query context Route:
oclIsKindOf(Route) and self.Route_exit->exists(signal |
       self.Route_routeDefinition.Sensor_trackElement.TrackElement_connectsTo.TrackElement_sensor->exists(sen2 |
                Route.allInstances()->exists(route2X |
                        (route2X <> self) and route2X.Route_routeDefinition->includes(sen2)
                )
                and not (
                        Route.allInstances()->exists(route2 |
                                route2.Route_routeDefinition->includes(sen2) and route2.Route_entry = signal
                        )
                )
        )
)

Instance model generation

The model generator instantiates the metamodel and generates EMF instance models, with size ranging from a few thousand elements (nodes and edges) up to approx. 14*109. The generator randomly introduced faulty element configurations into the model (with low percentage). The distribution of nodes belonging to a type and outgoing degree of typed edges are depicted in the following diagram:

Benchmark description

The benchmark simulates a typical model validation scenario in a model-driven system development process where small and local modifications are applied to large models (i.e. the knowledge base changes), and well-formedness constraints are continuously re-evaluated to check the correctness of the result model and highlight design flaws for engineers. A benchmark scenario is built up from well defined phases.

Phases:

  • Read: In the first phase, the previously generated instance model is loaded from hard drive to memory. This includes parsing of the input, as well as initializing data structures of the tool. The latter can consume minimal time for a tool that performs only local search, but for incremental tools indexes or in-memory caches are initialized.
  • Check, re-check: In the check phases the instance model is queried for invalid elements. This can be as simple as reading the results from cache, or the model can be traversed based on some index. Theoretically, cache or index building can be deferred from the read phase to the check phase, but it depends on the actual tool implementation. To the end of this phase, erroneous objects must be available in a list.
  • Modify: In the first part of this phase, elements to be modified are selected. This is not measured, since it involves tool-dependent queries, and query performance is measured in the check phase in a more systematic way. The time of the second part is recorded, which consists only of instance model edit operations, like modifying objects, or deleting relations.

During the benchmark, the time of each phase is measured using nanotime precision (which does not mean nanotime accuracy). This is why the highest resolution is 1 ms in the presented diagrams.

Technical details

Measurement environment

The exact versions of the used software tools and parameters of the measurement hardware are shown in the following list.

Sources:

Metamodels, tool specific benchmark codes and helping Linux scripts are available for download from SVN.

Measurement results

Benchmark diagrams can be downloaded from here. The following four diagram types are defined:

  • Batch validation: is the sum of the read and first check phase
  • Incremental validation: is the median of 100*(1*edit+1*re-check)
  • Batch Transformation: is the total time of all phases (read+first check+10*edit+re-check)
  • Memory usage: heap memory usage measured at the end of the execution

Batch validation

SwitchSensor query

PosLength query

RouteSensor query

SignalNeighbor query

Incremental validation

SwitchSensor query

PosLength query

RouteSensor query

SignalNeighbor query

Batch transformation

SwitchSensor query

PosLength query

RouteSensor query

SignalNeighbor query

Memory usage

SwitchSensor query

PosLength query

RouteSensor query

SignalNeighbor query