Providing incremental updates for queries, part II: expression evaluation

After the overview presented in my previous blog post, I am now going to discuss incremental expression evaluation, a.k.a. the eval() language element, which is my most recent contribution to EMF-IncQuery. The examples continue to rely upon the School metamodel introduced here.

Computed values

Queries very often refer to attribute values stored at objects. Sometimes this is not enough, and additional values must be computed based on the attributes in order to express queries. This computation is defined by a deterministic function expressed by an appropriate sublanguage of the query formalism.

Let us see, for example, an OCL expression that computes a tuition fee for each course based on the course weight:

context Course def: tuitionFee() = (100.0 + self.weight * 40.0)->max(450.0)

The previous blog post contained an equivalent EMF-InQuery graph pattern:

pattern courseTuitionFee(c: Course, fee) = {
    Course.weight(c, weight);
    fee == eval(Math.max(450.0, 100.0 + weight * 40.0));
}

Computed values vs. simple checks

Very often, the computed value is merely a true/false logical distinction whether the given elements should be included in the query results. For example, the following EMF-IncQuery graph pattern selects only those courses that have a weight exceeding 8 points:

pattern importantCourse(c: Course) = {
    Course.weight(c, weight);
    true == eval(weight > 8);
}

For this common use case, EMF-IncQuery provides a more convenient syntax as well, which has already been discussed before:

pattern importantCourse(c: Course) = {
    Course.weight(c, weight);
    check(weight > 8);
}

Computed values vs. EMF-IncQuery

Having seen the motivation behind function evaluation in queries, let us focus now on the current capabilities of EMF-IncQuery.

Since the inception of the earliest prototypes and experiments in 2009, EMF-IncQuery has always supported the check() construct. Up to now, however, the more general case of eval(), where the result of the computation is captured by a variable and can be referred to, was not supported.

Thanks to my latest development sprint, nightly builds of EMF-IncQuery now provide experimental support for eval() constructs with the Xbase expression language. While the new feature is not yet well-tested, simple constructs such as the pattern courseTuitionFee above work fine. Future iterations will improve internal indexing. Another remaining goal is to integrate the type inference mechanism of the Xbase language - at the moment, the result of the evaluation is treated as a generic Object by the type inference of EMF-IncQuery.

Anatomy of a language feature

Let us now see what steps were required to implement such a new feature.

Rete-level. The incremental evaluator engine powering EMF-IncQuery is based on the Rete algorithm. Pattern matching is reduced into a Rete network, which is a system of incrementally computable elementary operations, each expressed as a Rete node.

The feature eval() required the addition of a new type of Rete node, that is capable of evaluating an Xbase expression based on the values of "input" pattern variables, and bind the non-null result to a new pattern variable. In case the expression evaluates to null (or throws an Exception), the combination of input variables is discarded, as it is not considered a match in this case. In case of a non-null result, the tuple of input variables (as substitued by the parent Rete node) is extended by the value of the result variable, and this "output" tuple is emitted by the Rete node (to be consumed by child Rete nodes). At the same time, the internal cache of the node saves the mapping of the input tuple to the output tuple. This helps to avoid recalculation: when the input tuple is later deleted (due to changes in the model), the corresponding output tuple can be looked up (and removed from) this cache, and then the retraction of the obsolete output tuple can be emitted by the Rete node; if the cache lookup returns empty, then there was no output tuple propagated before, so no action needs to be taken.

The previous blog post exemplified the basic behaviour of this new Rete node in case of the pattern courseTuitionFee: when the value of the weight attribute is assigned for a new course, the tuition fee will be computed using a complex formula, and the match (the pair formed of the course and its newly computed fee) will be stored in the incremental cache. When the weight of the course is changed, the course-weight pair will be removed from the cache of the constituent elementary query; the fee that was once computed for course can be looked up, and the result removed from the match set cache.

Query planning. When the user requests the results of a previously unseen graph pattern, the EMF-IncQuery engine performs a chain of steps to design a construct a Rete network that incrementally maintains the result set of the pattern. This process involves reasoning about the variables and constraints of the graph pattern, for instance determining which variables need to be known before a certain constraint can be satisfied, and what additional variables will be substituted each way the constraint is satisfied.

As eval() expressions can be embedded in pattern constraints (e.g. as one of the arguments of a find constraint, or an equality constraint as shown in the above examples), the structure of the pattern must be simplified (canonicalized) so that query planner can process it. The logic in case of eval() expressions (similarly e.g. to count aggregations) is that the evaluation result is treated as a new anonymous variable in the pattern constraint where it is used, and eval() itself is represented by a separate pattern constraint that binds the result to this new variable.

There is no way to evaluate the expressions until the referenced variables are substituted, so the eval() Rete node can only be attached to a parent node where all of these variables already have a value. Moreover, different constraints may impose different types of variables; for type safety, evaluation is performed only when the input variables are guaranteed to be substituted for elements of the strictest type. Finally, the evaluator Rete node substitutes a value for the new result variable, so in some cases there is another constraint that the query planner can only schedule after the eval() computation. Alltogether, these are restrictions that the query planner needs to observe when building the Rete net.

Language extension. Finally, eval() was added as a new language element. While this in itself is a very easy step (thanks to the power of the Xtext grammar), it gives rise to a large amount of supplementary tasks. They range from simply providing a new entry in the Outline view of the Eclipse-based EMF-IncQuery pattern editor; through extending the code generator and expression compiler; to fairly complex adjustments in the extensive set of static validators that support the query engineer by pointing out potential mistakes in the query code.

The most involving part of the incurrend tasks would be the integration of Xbase type inference to learn the types of eval() results. This is important for type checking (which is another kind of static validations), but also because the inferred types can be used by the code generator of EMF-IncQuery to synthetize more user-friendly patter-specific query APIs. Due to the complexity involved, this integration effort is left as future work, and the current version approximates the type by the universal supertype Object.

Acknowledgement: this research was supported by the European Union and the State of Hungary, co-financed by the European Social Fund in the framework of TÁMOP 4.2.4. A/-11-1-2012-0001 ‘National Excellence Program’.