Efficient Instance-level Model Validation by Incremental Query Techniques - preliminary
Model-driven development tools rely heavily on instance model validation and model transformation. Ever-growing model sizes, used for example in critical embedded systems development, require more and more efficient tools. The most time-consuming step of model validation and model transformation is the model query step. This benchmark measures the performance of batch-style and incremental-style queries in existing OWL (or RDF) and EMF based tools.
Authors: Benedek Izsó, Zoltán Szatmári, István Ráth, Ákos Horváth, Gábor Bergmann, Balázs Polgár, Gergely Varró and Dániel Varró
Case Study Overview
Our experimental validation uses a domain-specific model of a railway system to demonstrate that instance-level model validation can be carried out orders of magnitude faster than with ontology reasoner tools. The domain metamodel originates from the MOGENTES EU FP7 project, and the requirements were defined by railway domain experts.
Metamodel
The metamodel is defined in EMF (Ecore):
A train route is defined by a set of sensors between two neighboring signals. The state of a signal can be stop (red), go (green), or failure. Sensors are associated with track elements, which are either track segments or switches. The state of a switch can be left, right, straight, or failure. A route may have associated switch positions, which describe the required state of a switch belonging to the route. Different route definitions may specify different states for the same switch.
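To make this structure easier to follow, the sketch below lists the main classes and references as plain Java interfaces. This is only an illustrative approximation, not the actual Ecore metamodel or the EMF-generated code; the interface and method names (and multiplicities) are assumptions derived from the feature names used in the queries below.

import java.util.List;

// Illustrative sketch of the railway metamodel; names mirror the feature names
// used in the graph patterns and OCL queries below (e.g. TrackElement_sensor).
interface TrackElement {
    List<Sensor> getSensor();               // TrackElement_sensor
    List<TrackElement> getConnectsTo();     // TrackElement_connectsTo
}

interface Segment extends TrackElement {
    int getLength();                        // Segment_length
}

enum SwitchState { LEFT, RIGHT, STRAIGHT, FAILURE }

interface Switch extends TrackElement {
    SwitchState getState();
}

enum SignalState { STOP, GO, FAILURE }

interface Signal {
    SignalState getState();
}

interface SwitchPosition {
    Switch getSwitch();                     // SwitchPosition_switch
    SwitchState getSwitchState();           // required state of the switch in this route
}

interface Sensor {
    List<TrackElement> getTrackElement();   // Sensor_trackElement
}

interface Route {
    Signal getEntry();                      // Route_entry
    List<Signal> getExit();                 // Route_exit
    List<Sensor> getRouteDefinition();      // Route_routeDefinition
    List<SwitchPosition> getSwitchPosition(); // Route_switchPosition
}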
Constraints
Several high-level requirements are specified that must hold for every valid instance model of the train system. The queries implementing the following constraints must return the constraint-violating elements.
Requirement 1
Name | SwitchSensor |
Textual description | Every switch must have at least one sensor connected to it. |
Graphical graph pattern | |
Graph pattern |
pattern switchSensor(Individual) = {
    Switch(Individual);
    neg find hasSensor(Individual);
}
pattern hasSensor(Trackelement) = {
    Trackelement.TrackElement_sensor(Trackelement, _Sensor);
}
OCL query |
context Switch:
self.TrackElement_sensor->isEmpty()
Requirement 2
Name | PosLength |
Textual description | A segment must have positive length. |
Graphical graph pattern | |
Graph pattern |
pattern posLength(Source, Target) = {
    Segment(Source);
    Segment.Segment_length(Source, Target);
    check((Target as Integer) <= 0);
}
OCL query |
context Segment:
self.Segment_length <= 0
Requirement 3
Name | RouteSensor |
Textual description | All sensors that are associated with a switch that belongs to a route must also be associated directly with the same route. |
Graphical graph pattern | |
Graph pattern |
pattern routeSensor(Sen, Sw, Sp, R) = {
    Route(R);
    SwitchPosition(Sp);
    Switch(Sw);
    Sensor(Sen);
    Route.Route_switchPosition(R, Sp);
    SwitchPosition.SwitchPosition_switch(Sp, Sw);
    Trackelement.TrackElement_sensor(Sw, Sen);
    neg find head(Sen, R);
}
pattern head(Sen, R) = {
    Route(R);
    Sensor(Sen);
    Route.Route_routeDefinition(R, Sen);
}
OCL query |
context Sensor:
self.Sensor_trackElement->exists(sw |
    sw.oclIsKindOf(Switch) and
    Route.allInstances()->exists(r |
        r.Route_switchPosition.SwitchPosition_switch->includes(sw) and
        not r.Route_routeDefinition->includes(self)))
Requirement 4
Name | SignalNeighbor |
Textual description | A route is incorrect if it has a signal and a sensor that is connected to another sensor via two track elements, but there is no other route that connects the same signal and the other sensor. This pattern checks for the absence of a circle, so the efficiency of join operations is tested. The constraint also involves references that are navigable in only one direction, so their efficient evaluation is tested as well. |
Graphical graph pattern | |
Graph pattern |
pattern signalNeighbor(R1) = {
    find exitSignalSensor(SigA, R1, Sen1A);
    find connectingSensors(Sen1A, Sen2A);
    find rDefinition(R3A, Sen2A);
    R3A != R1;
    neg find entrySignalSensor(SigA, _R2A, Sen2A);
}
pattern exitSignalSensor(Sig, R1, Sen1) = {
    find exitSignal(R1, Sig);
    find rDefinition(R1, Sen1);
}
pattern entrySignalSensor(Sig, R2, Sen2) = {
    find entrySignal(R2, Sig);
    find rDefinition(R2, Sen2);
}
pattern entrySignal(R, Sig) = {
    Route(R);
    Signal(Sig);
    Route.Route_entry(R, Sig);
}
pattern exitSignal(R, Sig) = {
    Route(R);
    Signal(Sig);
    Route.Route_exit(R, Sig);
}
pattern connectingSensors(Sen1, Sen2) = {
    Sensor(Sen1);
    Sensor(Sen2);
    Sensor.Sensor_trackElement(Sen1, Te1);
    Trackelement.TrackElement_connectsTo(Te1, Te2);
    Trackelement.TrackElement_sensor(Te2, Sen2);
}
pattern rDefinition(R, Sen) = {
    Route(R);
    Sensor(Sen);
    Route.Route_routeDefinition(R, Sen);
}
OCL query |
context Route:
oclIsKindOf(Route) and self.Route_exit->exists(signal |
    self.Route_routeDefinition.Sensor_trackElement.TrackElement_connectsTo.TrackElement_sensor->exists(sen2 |
        Route.allInstances()->exists(route2X |
            (route2X <> self) and route2X.Route_routeDefinition->includes(sen2)
        ) and not (
            Route.allInstances()->exists(route2 |
                route2.Route_routeDefinition->includes(sen2) and route2.Route_entry = signal
            )
        )
    )
)
Instance model generation
The model generator instantiates the metamodel and generates EMF instance models, with sizes ranging from a few thousand elements (nodes and edges) up to approximately 14×10^9. The generator randomly introduces faulty element configurations into the model (at a low percentage). The distribution of nodes per type and the out-degree distribution of typed edges are depicted in the following diagram:
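As an illustration of how such faults can be injected, the hypothetical sketch below disconnects a small, random fraction of switches from their sensors, producing SwitchSensor violations. It reuses the Switch interface from the metamodel sketch above; the class and method names here are assumptions and do not reflect the real generator code.

import java.util.List;
import java.util.Random;

// Hypothetical fault injection sketch: with a small probability a switch loses
// all of its sensors, which later violates the SwitchSensor constraint.
class FaultInjector {
    private final Random random = new Random(42); // fixed seed for reproducible models
    private final double faultRatio;              // "low percentage", e.g. 0.02

    FaultInjector(double faultRatio) {
        this.faultRatio = faultRatio;
    }

    void injectSwitchSensorFaults(List<Switch> switches) {
        for (Switch sw : switches) {
            if (random.nextDouble() < faultRatio) {
                sw.getSensor().clear(); // remove every sensor association -> violation
            }
        }
    }
}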
Benchmark description
The benchmark simulates a typical model validation scenario of a model-driven development process in which small, local modifications are applied to large models (i.e. the knowledge base changes), and well-formedness constraints are continuously re-evaluated to check the correctness of the resulting model and to highlight design flaws for the engineers. A benchmark scenario is built up from well-defined phases.
Phases:
- Read: In the first phase, the previously generated instance model is loaded from the hard drive into memory. This includes parsing the input as well as initializing the data structures of the tool. The latter takes minimal time for tools that perform only local search, while incremental tools initialize their indexes or in-memory caches here.
- Check, re-check: In the check phases, the instance model is queried for invalid elements. This can be as simple as reading the results from a cache, or the model may be traversed based on some index. In theory, cache or index building can be deferred from the read phase to the first check phase; this depends on the actual tool implementation. By the end of this phase, the erroneous objects must be available in a list.
- Modify: In the first part of this phase, the elements to be modified are selected. This part is not measured, since it involves tool-dependent queries, and query performance is already measured in the check phase in a more systematic way. Only the time of the second part is recorded, which consists solely of instance model edit operations, such as modifying objects or deleting relations.
During the benchmark, the time of each phase is measured with nanosecond precision (which does not imply nanosecond accuracy); this is why the finest resolution in the presented diagrams is 1 ms.
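The phases above can be pictured with the following minimal driver loop. The ToolAdapter interface, its methods, and the overall structure are assumptions made purely for illustration; the actual tool-specific benchmark code differs, but the placement of the System.nanoTime() calls reflects which parts are measured.

import java.util.Collection;
import java.util.List;

// Hypothetical adapter for a measured tool.
interface ToolAdapter {
    void read(String modelPath);              // load model, build caches/indexes if any
    Collection<Object> check();                // return the constraint-violating elements
    List<Object> selectElementsToModify();     // tool-dependent query, NOT timed
    void modify(List<Object> elements);        // instance model edit operations only (timed)
}

class BenchmarkDriver {
    static void run(ToolAdapter tool, String modelPath, int iterations) {
        long t0 = System.nanoTime();
        tool.read(modelPath);                          // Read phase
        long readNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        Collection<Object> invalid = tool.check();     // first Check phase
        long checkNs = System.nanoTime() - t0;

        long[] editNs = new long[iterations];
        long[] recheckNs = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            List<Object> selected = tool.selectElementsToModify(); // not measured
            t0 = System.nanoTime();
            tool.modify(selected);                     // Modify phase (measured part)
            editNs[i] = System.nanoTime() - t0;

            t0 = System.nanoTime();
            invalid = tool.check();                    // Re-check phase
            recheckNs[i] = System.nanoTime() - t0;
        }
        System.out.printf("read=%d ms, first check=%d ms, violations=%d%n",
                readNs / 1000000, checkNs / 1000000, invalid.size());
    }
}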
Technical details
Measurement environment
The exact versions of the software tools used and the parameters of the measurement hardware are listed below.
- Measured tools
- Java
- EMF-IncQuery 0.7.1.20130704131
- Eclipse OCL 3.3.0.v20130610-1317
- Eclipse OCL CG (Code Generation)
- OCL Impact Analysis 3.2.100.v20130520-1527 (combined with Eclipse OCL for the batch mode)
- 4store
- Drools 5.2.0
- Praxis
- Pellet 2.1.1
- Sesame 2.5.0
- Jena 2.6.4
- OpenVirtuoso 6.1.3
- Stardog 0.9.2.1
- Allegro Graph 4.5
- ModelAnalyzer of Egyed
- Neo4j
- MySQL
- (RacerPro Version 2.0 2011-07-11)
- Environment
- Eclipse: Kepler
- Java version 1.6
- Ubuntu 12.04 (Precise), 64-bit
- Hardware
- CPU: Intel(R) Xeon(R) L5420 @ 2.50 GHz
- 32 GB RAM
Sources:
Metamodels, tool-specific benchmark code, and helper Linux scripts are available for download from SVN.
Measurement results
Benchmark diagrams can be downloaded from here. The following four diagram types are defined:
- Batch validation: the sum of the read and the first check phases
- Incremental validation: the median over 100 iterations of (1×edit + 1×re-check)
- Batch transformation: the total time of all phases (read + first check + 10×edit + re-check)
- Memory usage: heap memory usage measured at the end of the execution
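Assuming the per-phase times are collected as in the driver sketch above (read and first check times plus arrays of edit and re-check times, all in nanoseconds), the derived values could be computed as sketched below. This only illustrates the definitions; it is not the benchmark's actual evaluation script.

import java.util.Arrays;

// Illustration of the derived benchmark metrics from per-phase timings.
class Metrics {
    // Batch validation: read + first check.
    static long batchValidation(long readNs, long firstCheckNs) {
        return readNs + firstCheckNs;
    }

    // Incremental validation: median of the (edit + re-check) pairs
    // over the repeated modification iterations.
    static long incrementalValidation(long[] editNs, long[] recheckNs) {
        long[] pairs = new long[editNs.length];
        for (int i = 0; i < pairs.length; i++) {
            pairs[i] = editNs[i] + recheckNs[i];
        }
        Arrays.sort(pairs);
        return pairs[pairs.length / 2]; // (upper) median
    }

    // Batch transformation: total time of all measured phases; with arrays holding
    // the edit and re-check times of the read + first check + 10×edit + re-check scenario.
    static long batchTransformation(long readNs, long firstCheckNs,
                                    long[] editNs, long[] recheckNs) {
        long total = readNs + firstCheckNs;
        for (long e : editNs) {
            total += e;
        }
        for (long r : recheckNs) {
            total += r;
        }
        return total;
    }
}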
Batch validation
SwitchSensor query
PosLength query
RouteSensor query