EMF-IncQuery advanced issues (v. 0.4)
This page is relevant for EMF-IncQuery version 0.4 and before! For the current version, go here!
FAQ (v. 0.4)
Q: What does EMF-IncQuery call the 'name' of a graph pattern?
It is actually the fully qualified name: the fully qualified name of the containing machine, a separator dot, and the simple name of the pattern (the one you define it with after the pattern keyword). The fully qualified name of the machine, in turn, is the simple name of the machine, prefixed by the machine namespace (if any).
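The naming rule can be sketched in a few lines of Java. This is only an illustration, not part of the EMF-IncQuery API: the class and method names are made up, and the sketch assumes that the namespace is joined to the machine name with a dot as well.

```java
// Hypothetical helper, for illustration only; not an EMF-IncQuery class.
final class PatternNames {
    private PatternNames() {}

    /** Machine FQN: the simple machine name, prefixed by the namespace if there is one.
     *  Assumption: namespace and machine name are separated by a dot. */
    static String machineFqn(String namespace, String machineName) {
        return (namespace == null || namespace.isEmpty())
                ? machineName
                : namespace + "." + machineName;
    }

    /** Pattern FQN: machine FQN, a separator dot, and the simple pattern name. */
    static String patternFqn(String namespace, String machineName, String patternName) {
        return machineFqn(namespace, machineName) + "." + patternName;
    }
}
```

With these assumptions, a pattern superClass in machine queries with namespace org.example would be named org.example.queries.superClass; without a namespace it would be queries.superClass.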
Q: What is the complexity of pattern matching?
Well, in the worst case the complexity is of course very bad, as the match set itself on the output can be as large as n^k, where n is the size of the model and k is the size of the pattern. In our typical use cases, however, the situation is much better: patterns are relatively small and their constraints are pretty restrictive, so the size of the match set rarely explodes combinatorially in practice.
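To get a feeling for the n^k bound, here is a tiny worked calculation in Java. The class name is made up for illustration, and the numbers are arbitrary; the bound only says how large the match set could get if the pattern imposed no constraints at all.

```java
import java.math.BigInteger;

// Illustration only: evaluates the worst-case bound n^k on the match set size.
final class MatchSetBound {
    private MatchSetBound() {}

    /** modelSize is n (number of model elements), patternSize is k (size of the pattern). */
    static BigInteger worstCase(long modelSize, int patternSize) {
        return BigInteger.valueOf(modelSize).pow(patternSize);
    }
}
```

For example, a model of 1000 elements and a pattern of size 3 give an upper bound of 1000^3, i.e. one billion tuples. Real patterns are constrained by types and references, so actual match sets are normally far below this.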
The memory usage will be at least the size of the model + the size of the match set, and actually a bit more than that because some intermediate results are also cached and incrementally maintained. As of now, we do not carry out any really sophisticated (RDBMS-grade) query optimization, so in some unfortunate cases these "intermediate results" can grow much bigger than the match set - although usually they don't. A good piece of advice here is to look for small, connected parts of patterns that occur more than once, and refactor them into a separate helper pattern that will be called from all the original occurrences using the 'find' keyword.
Query evaluation time is pretty much instantaneous. Actually, at a low level the result set has to be copied, but you will usually want to iterate over it anyway, so this won't be the dominant cost. Note that I said result set, not match set, so getOneMatch() can be much faster than getAllMatches(). If you have some bound input parameters (as opposed to retrieving all matches globally), then the restricted result set is even smaller, and is accessed with a cheap hash-based lookup.
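The idea behind the hash-based lookup can be sketched as follows. This is not the actual matcher implementation: the class is hypothetical, matches are simplified to lists of strings, and only the first parameter is indexed. It merely shows why a query with a bound parameter touches one bucket instead of the whole match set, and why fetching one match is cheaper than copying all of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a match set indexed by the value of its first parameter.
final class IndexedMatchSet {
    private final Map<String, List<List<String>>> byFirstParam = new HashMap<>();

    /** Stores a match (a tuple of parameter values) in the bucket of its first parameter. */
    void add(List<String> match) {
        byFirstParam.computeIfAbsent(match.get(0), k -> new ArrayList<>()).add(match);
    }

    /** Copies only the bucket selected by the bound parameter, not the whole match set. */
    List<List<String>> allMatches(String firstParam) {
        return new ArrayList<>(byFirstParam.getOrDefault(firstParam, Collections.emptyList()));
    }

    /** Returns an arbitrary match from the bucket without copying anything. */
    Optional<List<String>> oneMatch(String firstParam) {
        List<List<String>> bucket = byFirstParam.get(firstParam);
        return (bucket == null || bucket.isEmpty())
                ? Optional.empty()
                : Optional.of(bucket.get(0));
    }
}
```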
Update time (i.e. model manipulation overhead) can be quite unpredictable, but it is certainly related to how many new matches appear or old ones disappear due to the modification, and also the amount of change in the internal caches and indices. Very often, a single change in the model only makes a bounded amount of change in the match set and the internal caches, and is therefore cheap.
Initialization time is composed of "model reading" and then filling up the caches and indices accordingly. The latter one is basically the update overhead on model element creation times the size of the model, as the mechanism is almost the same as the one that will maintain the cache afterwards upon model updates. As for model reading, the current version (see below) traverses the entire model once, when the first pattern matcher is constructed.
Q: How and when are the match set caches initialized?
You can attach a pattern matcher engine on an EMF root (preferably an EMF ResourceSet or Resource, but potentially any containment subtree). In the current version, at most one pattern matcher engine is built for each of these EMF roots. It is constructed when you first initialize a pattern matcher on that root, and the next time you instantiate a pattern matcher (for the same or a different pattern) on the same root, it will reuse the underlying engine. This also means that as soon as you instantiate the first pattern matcher, all your patterns are already loaded and cached, and you already have to pay the memory and update overhead. The alternative would have been to initialize patterns in the engine on demand; this is possible, but then there would have been many repeated "model read" traversals over the entire ResourceSet, which would increase initialization costs significantly. We are thinking about finding a meaningful balance.
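The "one engine per EMF root" behaviour resembles a lazily filled cache keyed by the root object. The following Java sketch only models that idea with made-up classes; it is not the EMF-IncQuery engine, and the counter merely stands in for the expensive one-time model traversal.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch: at most one engine per root, built on first use and reused afterwards.
final class EngineRegistry {
    /** Stand-in for the real engine; it only remembers the root it was built for. */
    static final class Engine {
        final Object root;
        Engine(Object root) { this.root = root; }
    }

    // Roots are compared by identity, since two distinct models are never "equal".
    private final Map<Object, Engine> engines = new IdentityHashMap<>();
    private int traversals = 0;

    /** Returns the engine for the root, building it (and "reading the model") only once. */
    Engine engineFor(Object root) {
        return engines.computeIfAbsent(root, r -> {
            traversals++; // stands in for the single full traversal of the model
            return new Engine(r);
        });
    }

    int traversalCount() { return traversals; }
}
```

In this sketch, asking for a matcher on the same root twice triggers a single traversal, while a second root pays the cost again.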
Q: How does one use attributes?
Use the EAttributes as relation types in the pattern definition to navigate from the EObject to its attribute value; let's say the variable AttrVariable is now bound to an attribute value. Afterwards, the raw value can be used in a check() condition by unwrapping the variable representing the value; e.g. toInteger(value(AttrVariable)), toDouble(value(AttrVariable)), etc. For EEnums, use toString(value(AttrVariable)) and compare it against the string representation of the enum literal.
Q: How are null-valued features represented in the query language?
Unset or null-valued attributes (or references) simply won't match, as there is no referenced EObject or attribute value to substitute in the target pattern variable. If you are specifically looking for these, use a negative application condition.
Q: How does the generic pattern matcher work?
There is a "generic" matcher with a corresponding generic signature class; they are not as easy to use as the pattern-specific generated classes, but they conform to the same reflective interfaces. You can get an instance merely by providing the (fully qualified) name of the pattern. (At some point in the future, you will be able to obtain a similar generic matcher by constructing a pattern description at run-time.)
Q: What are MatcherFactory classes for?
Normally, you do not call the constructor of a generated Matcher, but use its static FACTORY field instead to attach a matcher to an EMF model. There is also a GenericMatcherFactory for the generic matcher.
MatcherFactory classes can create the appropriate (generated / generic) Matchers with some type-safe Java generics magic. This might be useful in a trigger engine or other very generic system, as you can collect a large number of matcher factories that contain all knowledge about the pattern, and parametrize them later to specify the actual EMF model to match against. Using Java generics, you can have a method that handles signature objects in match sets in a type-safe way for each matcher created from the collection of factories; e.g. see the custom prettyPrint override for InheritanceDiamondConstraint in the Papyrus example.
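The type-safe handling of a mixed collection of factories can be sketched with plain Java generics. The interfaces below are hypothetical stand-ins invented for this illustration (the real EMF-IncQuery types and method names differ); the point is the generic helper method that captures the signature type of each factory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for a matcher and its factory; S is the signature (match) type.
interface SketchMatcher<S> {
    List<S> getAllMatches();
}

interface SketchMatcherFactory<S> {
    String patternName();
    SketchMatcher<S> getMatcher(Object emfRoot); // the model is supplied late
    String prettyPrint(S signature);
}

final class FactoryCollectionSketch {
    private FactoryCollectionSketch() {}

    /** Attaches every factory to the same model and collects the printed matches. */
    static List<String> describeAll(List<SketchMatcherFactory<?>> factories, Object emfRoot) {
        List<String> lines = new ArrayList<>();
        for (SketchMatcherFactory<?> factory : factories) {
            lines.addAll(describeOne(factory, emfRoot)); // wildcard is captured as S
        }
        return lines;
    }

    /** Inside this method the matcher, its matches and the printer agree on the type S. */
    private static <S> List<String> describeOne(SketchMatcherFactory<S> factory, Object emfRoot) {
        List<String> lines = new ArrayList<>();
        for (S signature : factory.getMatcher(emfRoot).getAllMatches()) {
            lines.add(factory.patternName() + ": " + factory.prettyPrint(signature));
        }
        return lines;
    }
}
```

The caller never needs a cast: each factory carries its own signature type, and the compiler checks that matches from one factory are only passed to that factory's own printer.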
Q: What is included in the query results if the matcher is attached to an EMF root that is not the entire ResourceSet, just some Resource or containment subtree?
Every EObject in the containment subtree below the selected EMF root will be considered for pattern matching, as well as their attributes and the EReferences connecting them. EReferences pointing outward from the subtree, as well as the elements they are directly pointing to, currently may or may not be considered (depending on complicated things), so do not assume either case. Nothing else will be considered.
Q: What is the delta monitor, and how to use it?
It is a device that can be attached to the pattern matcher. From that time on, it will keep track of newly appeared matches, as well as previously existing but disappeared ones. So by default it monitors the difference between the current state and the time it was created. You can remove these matches from the delta monitor yourself to 'acknowledge' them; for that particular match, the difference (appearance / disappearance) from that time on will be displayed instead. See the Papyrus example for actual usage (now superseded by an implementation using the EMF-IncQuery validation framework).
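The bookkeeping described above can be modelled in a few lines of Java. This is a conceptual sketch with made-up class and method names, not the actual delta monitor class; in particular, it assumes that an appearance followed by a disappearance of the same match cancels out, since the net difference is then empty.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of delta-monitor bookkeeping; M is the match (signature) type.
final class DeltaMonitorSketch<M> {
    private final Set<M> appeared = new HashSet<>();
    private final Set<M> disappeared = new HashSet<>();

    /** Called when a match appears; cancels an unacknowledged disappearance, if any. */
    void matchAppeared(M match) {
        if (!disappeared.remove(match)) {
            appeared.add(match);
        }
    }

    /** Called when a match disappears; cancels an unacknowledged appearance, if any. */
    void matchDisappeared(M match) {
        if (!appeared.remove(match)) {
            disappeared.add(match);
        }
    }

    /** Copies of the current difference relative to the last acknowledged state. */
    Set<M> newMatches() { return new HashSet<>(appeared); }
    Set<M> lostMatches() { return new HashSet<>(disappeared); }

    /** Acknowledging removes the entry, so later changes are tracked from now on. */
    void acknowledgeNew(M match) { appeared.remove(match); }
    void acknowledgeLost(M match) { disappeared.remove(match); }
}
```

A typical consumer would periodically read the new and lost matches, react to each one, and acknowledge it, so that the monitor only ever holds changes that have not been processed yet.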
Q: I want to depend on a metamodel that is already part of my Eclipse installation, i.e. I do not have the genmodel and source packages. How do I reference it from the incquery.generatormodel of my EMF-IncQuery plug-in?
Often the .genmodel file is exposed by the binary build of the plug-in; you can verify this using Eclipse PDE ("Import as source project" in the Plug-ins View). So when setting up the incquery.generatormodel file, issue "Load Resource..." and, instead of clicking on a Browse button, just specify the resource URI (from the platform: namespace) manually in the text box. An example of such a URI would be:
platform:/plugin/com.example.plugin/model/exampleStuff.genmodel
Afterwards, you can just add a reference to the EMF GenModel as usual.