More About I/O Keys

Introduction

We have introduced hierarchical data structures and talked a little about how managed typedef instances are returned using Inq's read() statement. Part of application design is to define the groupings of instances the application will manipulate. Inq formalises these groupings by expressing them as keys, which are then referenced in the read() and aggregate() statements that are used to build instance structures.

In this section we discuss in more detail the types of key supported by Inq, whether Inq caches the values yielded by a key and the relevant caching options when the typedef is bound to an SQL server.

The Structure of a Key Definition

A key definition comprises at most the following elements:

Key Fields

In the simplest case, a key is made up only of a subset of the fields of its enclosing typedef. It is possible, however, to create additional fields by referencing other typedef entities or aliases.

Eligibility Expression

If a key is cached, Inq tries to maintain the instance(s) held against key values as instance fields are mutated. As we discuss further below, this is simple when the key yields instances only by equality. If, however, the underlying i/o configuration applies the key value in some other way, say with an inequality or a complex logical expression, a cached key requires an expression that mirrors this logic. When instances are mutated, constructed or destroyed Inq uses the eligibility expression to ensure that any key value currently in the cache maps to the correct (set of) instances.

If a key is not cached then its eligibility expression, if any, is never used.

Key Volatility

As an alternative to an eligibility expression, a key can be defined as volatile. In this case the cache is discarded whenever any instance fields that comprise the key are modified.

Auxiliary Configuration

The so-called auxcfg data is information that is opaque to the key itself but applies to the i/o mechanism the typedef is bound to. Inq currently supports in-memory operation and relational database systems, so for the latter the auxcfg data contains SQL statements and data for argument marshalling.

Explicit Foreign Key Fields

A key's auxcfg data may use foreign keys without these fields being part of the key definition itself. For Inq to maintain a key cache under these circumstances it must be told which typedef fields are being used as foreign keys by the i/o mechanism. As we will see in the examples below, the cache is discarded when an instance's explicit foreign key field is modified.

Primary, Unique and Non-Unique Keys

The Primary Key

Every typedef must have a primary key that is composed only of fields of the typedef itself, no matter whether the fields themselves were native declarations or defined using references to entities or aliases. Inq uses the primary key in the following ways:

  • The primary key is unique and its value is what defines the identity of a typedef instance to Inq.
  • The primary key is always cached, allowing Inq to manage a single internal object reference for a given instance. Inq can then propagate events to any processes holding such a reference.

Once in the managed state, the primary key field(s) of an instance cannot be modified. Attempting to do so causes a run-time error.

The primary key's name is always "pkey" and cannot be specified otherwise. These further restrictions apply to the primary key:

  • the key must have equality semantics (that is it must only retrieve instances whose field(s) exactly match the key value);
  • an eligibility expression is not permitted.

Unique Keys

Any number of additional unique keys can be defined as part of a typedef's configuration. A unique key value can return zero or one instance when applied to its typedef. Not surprisingly, an error occurs if a transaction tries to create a new instance, or modify an existing one, in a way that would violate this condition.

If there is a misconfiguration between Inq and the underlying i/o binding (defined by the key's auxcfg) such that this rule is violated, Inq reports an error.

Non-Unique Keys

Any number of non-unique keys can be defined as part of a typedef's configuration. There are no restrictions on how many instances a non-unique key value can return or, put another way, on how many instances can have the same value for a non-unique key.

Inq assumes a key is non-unique unless specified otherwise.

Example Key Definitions

To put all this together we will discuss examples showing all types of key definition and how these operate when the typedef is bound to an SQL server.

We will see how extending the simple cases introduced in the previous section covers the use of table joins, keys that include inequalities and the effect these issues have on whether the key should be cached.

To discuss these issues consider the following one-to-many relationship:

One To Many

A top-level legal entity has many regional sub-divisions, or counterparties, so that Cpty.Entity is a foreign key from Entity.Entity.

The attribute Entity.Type is an enumeration. Entity.GlobalLimit is a numeric that can take a positive value or null, meaning there is no limit. When reviewing the examples, sample SQL to create these types as tables is available: Entity and Cpty.

Cpty Primary Key

Primary keys are the easiest to define because they always take the same form and are subject to these restrictions:

  • equality semantics, so no eligibility expression;
  • caching cannot be disabled;
  • only typedef fields can be used.

When defining the key field(s), it doesn't matter if the fields themselves were defined by referencing other entity or alias definitions, just that the primary key's field list only specifies names from the typedef.

Here is the beginning of the Cpty typedef. As we are concentrating on key definitions, qualifiers to specify widths, labels and so forth have been omitted for clarity.

typedef Cpty
{
  fields
  (
    string        Cpty;
    LongName;
    Entity;
    Currency      BaseCurrency;
    Country       DomicileCountry;
    Active;
    LastUpdated;
    User;
  )
   .
   .
   .

The primary key must be the first key to follow the definition of the fields. Here is what it looks like:

  pkey
  (
    fields (Cpty)

    #include <{db}/xyCpty.pkey.sql>
  )

The primary key is introduced with the keyword pkey. The fields construct is a comma separated list of identifiers that must be typedef field names.

The most complex part of the primary key is the auxcfg expression. As we saw in the discussion of typedefs, it can be useful to place this in a separate file.

Inq uses the primary key not only to read instances from i/o but also to create, modify and delete them. When binding to a relational database the auxcfg data is a map of well-known (map) keys to SQL statements. Here are some examples.

MySql Using Plain SQL

This example uses plain SQL, that is (in JDBC terms) SQL in which all arguments are rendered as formatted strings rather than bound parameters. This method is not as manageable as using prepared statements, outlined in the next example, but can be used in the unlikely event that the JDBC driver for your database implementation does not support them.

The select-stmt key maps to a string that contains the select statement used in the primary and all other keys. It is common to use a table alias in preparation for more complex SQL in other keys that joins with other tables.

The read-sql element uses a substitution syntax applied once against the auxcfg map and on each use against the value supplied for the primary key. This substitution syntax can use formatting strings in the same way as MessageFormat does for positional arguments.

The write-sql element contains the SQL to update or create a single typedef instance. As we see when processing the LastUpdated column, a format string to render a date appropriate for the SQL implementation and required granularity for the application must be included.

Lastly, the delete-sql element contains the SQL to delete a single typedef instance using the same where clause as read-sql.
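Putting these elements together, a plain SQL auxcfg for the Cpty primary key might look something like the sketch below. The table name XYCpty, the column list and the exact substitution and quoting syntax are assumptions for illustration only; consult the full example files for the precise forms.

```
  pkey
  (
    fields (Cpty)

    auxcfg( map(
    "select-stmt",                  // reused by the other keys
      "select C.Cpty,
              C.LongName,
              C.Entity,
              C.BaseCurrency,
              C.DomicileCountry,
              C.Active,
              C.LastUpdated,
              C.User
       from XYCpty C",              // table alias C, ready for later joins
    "read-sql",
      "
        {select-stmt}
        where C.Cpty = {Cpty}       // key value substituted on each use;
      ",                            // quoting/format details omitted here
    "delete-sql",
      "delete from XYCpty where Cpty = {Cpty}"
    ))
  )
```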

MySql Using Prepared SQL

Here is another example, this time implemented using the JDBC prepared statement syntax. Inq will use a prepared statement when the element prepared is included with the boolean value true. This method has the advantage of not requiring conversion of the arguments into strings, so no formatting is required. Everything is expressed in terms of the underlying Inq data types and their mapping to those of the table columns, meaning there is no inference of precision in the format strings. The small disadvantage is that prepared statements require ordering of the arguments, whereas the symbolic references of Inq's plain SQL parameterised syntax do not. Depending on your chosen database this may require more work, as shown in the Oracle example below. For MySql, though, we are in luck.

Inq marshals the pkey (read) and typedef (write) fields in the order they appear in their respective fields clauses. The prepared version is therefore not only better but also easier than the plain version.
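A prepared-statement sketch of the same primary key follows. The table name XYCpty is an assumption, as is the use of MySql's replace so that write-sql can both create and update an instance:

```
  pkey
  (
    fields (Cpty)

    auxcfg( map(
    "prepared", true,
    "select-stmt",
      "select C.Cpty, C.LongName, C.Entity, C.BaseCurrency,
              C.DomicileCountry, C.Active, C.LastUpdated, C.User
       from XYCpty C",
    "read-sql",
      "
        {select-stmt}
        where C.Cpty = ?            // bound from the pkey fields in order
      ",
    "write-sql",
      // MySql replace acts as insert-or-update; arguments are
      // marshalled in typedef fields order
      "replace XYCpty values (?, ?, ?, ?, ?, ?, ?, ?)",
    "delete-sql",
      "delete from XYCpty where Cpty = ?"
    ))
  )
```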

MySql Using A Stored Procedure

If we specify the well-known key stproc then Inq will invoke the SQL as a stored procedure call. In terms of SQL maintenance this method has the disadvantage that the select statement cannot be reused in this or any other Inq key defined in the typedef, because it is in the database engine.

This approach uses the same parameter marshalling as prepared statements. Note that Inq sets all arguments as input-only and expects the stored procedure to return a result set (in this case, containing a single row).
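As an indication only, the configuration might look like the sketch below. The procedure name xy_read_cpty and the exact placement of the stproc element are assumptions; the JDBC escape syntax for a procedure call is shown:

```
  auxcfg( map(
  "prepared", true,
  "stproc",   true,                      // assumption: flag form of stproc
  "read-sql", "{call xy_read_cpty(?)}"   // returns a single-row result set
  ))
```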

Oracle Equivalents

The Oracle plain SQL version illustrates how the necessary string formatting can become more complex. The prepared statement version is easier, however we do need to explicitly marshal the arguments. The write-order element is used by Inq to reference the typedef fields as required by the SQL syntax. Inq also supports a read-order, which is used in the read-sql and delete-sql statements. In this case the SQL statements are not complex and the default ordering of the key fields is appropriate. We will see how read-order is used when discussing keys used for filtering.
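To illustrate explicit marshalling, a write-order might look like the sketch below: the field names are listed in the order the SQL statement's ? placeholders expect them, with the key field last because the where clause binds it last. The table name XYCpty is an assumption and only the update form of write-sql is sketched:

```
  "write-sql",
    "update XYCpty
     set LongName        = ?,
         Entity          = ?,
         BaseCurrency    = ?,
         DomicileCountry = ?,
         Active          = ?,
         LastUpdated     = ?,
         User            = ?
     where Cpty = ?",
  "write-order",
    array a = (
      "LongName",
      "Entity",
      "BaseCurrency",
      "DomicileCountry",
      "Active",
      "LastUpdated",
      "User",
      "Cpty"              // key field bound last by the where clause
    )
```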

Cpty Key By DomicileCountry

Suppose we had determined while analysing the application that we needed a list of Cpty instances selected by the DomicileCountry field. Such a key could be defined like this:

key ByDomicileCountry
(
  fields(DomicileCountry)
  auxcfg( map(
  "prepared", true,
  "read-sql",
    "
      {select-stmt}
      where DomicileCountry  = ?
    "
  ))
)

All keys other than the primary must be named. As a convention, these names are of the form By<KeyFields>. Using only typedef fields, this key does not add much to our discussion other than to demonstrate:

  • the select-stmt defined in the primary key is available by symbolic reference;
  • there is no need to use an include file when the syntax of what is specific to the key is trivial (that is, invariant amongst the SQL engines the application will run on).

This is a non-unique key, that being the default in all cases except the primary. To define a unique key, specify the unique keyword after the name:

key BySomeUniqueFieldSet unique
(
   .
   .

Joining Entity and Cpty

In data designs with normalised relationships it is common to define a set of instances of one typedef that are determined by a foreign key join to another. In our Entity and Cpty example, we would like the set of Cpty instances for a given value of the related Entity.Type field. We define a key to do that like this:

key ByEntityType cached=true
(
  fields(Entity.Type EntityType)

  foreign(Entity)

  auxcfg( map(
  "prepared", true,
  "read-sql",
    "
      {select-stmt},
      XYEntity E
      where  C.Entity = E.Entity     // Table Join
      and    E.Type   = ?            // Foreign field
    "
  ))
)

The first thing to note is that the key field is not part of the Cpty typedef, so instead it is defined by making a reference to Entity.Type. Had we left it there, the key field would have the name Type, however we provide the localised name of EntityType to emphasise the fact that we are importing the field. Providing an alias like this does not have any effect other than to rename the field.

By default, Inq does not cache keys that include references to other typedef entities or aliases, that is when one or more key fields is not part of the enclosing typedef. There are two reasons why the cache of EntityType values could become invalid:

  1. the value of the foreign key Type in a related instance of Entity changes;
  2. the foreign key field Entity in an instance of Cpty changes.

Respectively, these events take place 1) in a different typedef and 2) on a typedef field that is not part of the key. Inq automatically handles the first case. When a key field is defined using a typedef field reference then (applying this example) Inq will flush that key's cache under the following circumstances:

  • the value of Entity.Type is changed in any instance of Entity;
  • an instance of Entity is created or destroyed.

In the second case, the join of the Entity and Cpty database tables is actually implemented in the SQL statement. This is opaque to Inq and, if the key were not cached, would be of no consequence. If, as here, we elect to cache the key by specifying cached=true then we must also inform Inq of foreign key usage by the i/o configuration using a foreign clause. This is a comma separated list of any typedef fields that are not themselves already key fields.

Again, Inq flushes the cache when Cpty.Entity changes or an instance of Cpty is created or destroyed.

Keys With Inequalities

In the examples presented so far, the i/o configuration has used the key value in simple equality expressions, that is, the where clause is of the form <table_column> = <key_value> and there has been a one-to-one correspondence between a key value and the set of instances it yields. If an application design requires more than this then there are further considerations when configuring cached keys.

Here is an example for the Entity typedef that uses a range to define the set of qualifying instances:

key ByGlobalLimitRange cached=true
(
  fields(Entity.GlobalLimit GlobalLimitLower,
         Entity.GlobalLimit GlobalLimitUpper)

  eligible
  (
    $this.instance.GlobalLimit  >= $this.key.GlobalLimitLower &&
    $this.instance.GlobalLimit  <= $this.key.GlobalLimitUpper
  )

  auxcfg( map(
  "prepared", true,
  "read-sql",
    "
      {select-stmt}
      where  E.GlobalLimit >= ?
      and    E.GlobalLimit <= ?
    "
  ))
)

In this case the where clause tests that the table column GlobalLimit is within the range defined by the key fields GlobalLimitLower and GlobalLimitUpper. Inq supports this kind of key with the following features:

Key Fields

The key fields are not part of the typedef and are defined using references. In this example we have referenced a field of the enclosing typedef, although this is no different from any other entity or alias reference.

Caching

As we saw above, defining fields using references turns off caching. We would like to enable it so we explicitly turn it back on.

Eligibility Expression

For Inq to correctly maintain the cache an eligibility expression must be defined that duplicates the logic of the SQL where clause. This takes the form of eligible(<statement>) and is called when Inq commits a transaction.

The eligible expression returns true or false and is passed two arguments via $this. The path $this.instance refers to the instance being committed (whether created, mutated or destroyed) while $this.key refers to a key value held in the cache.

Inq may invoke the eligibility expression on a given instance more than once, each time with a different key value, depending on the current state of the cache.
Note
If caching is disabled then the eligibility expression is not used; however, omitting one from a cached key that applies an inequality produces undefined results.

Although we have defined a non-unique key in this example, the use of an eligibility expression also applies to unique keys. Such a situation would apply, for example, with a data type whose instances include a from and to date field where no two instances can have overlapping date ranges.
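A hypothetical sketch of that situation follows. The typedef name SomeType and the key field names QueryFrom and QueryTo are invented for illustration; the eligibility expression mirrors the standard overlap test that the key's SQL where clause would also have to implement:

```
key ByDateRange unique cached=true
(
  fields(SomeType.FromDate QueryFrom,
         SomeType.ToDate   QueryTo)

  eligible
  (
    // Two ranges overlap when each starts before the other ends
    $this.instance.FromDate <= $this.key.QueryTo &&
    $this.instance.ToDate   >= $this.key.QueryFrom
  )

  // auxcfg with a matching SQL where clause omitted for brevity
)
```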

Keys as Filters

Lastly, we cover the implementation of arbitrary filters. It is common for applications to have "browser" style front end GUIs that define a set of input parameters. Users can set specific values or wild-card a parameter so that a narrow or wide set of instances can be returned. Suppose we would like to define a filter to yield the set of Cpty instances according to their

  1. DomicileCountry
  2. BaseCurrency
  3. Active
  4. Entity
  5. related Entity.Type
  6. an upper and lower bound of the related Entity.GlobalLimit

Here is a key definition for Cpty to accomplish that:

key Filter volatile=true
(
  fields(DomicileCountry,
         BaseCurrency,
         Active,
         Entity,
         Entity.Type EntityType,
         Entity.GlobalLimit GlobalLimitLower,
         Entity.GlobalLimit GlobalLimitUpper)

  #include <{db}/xyCpty.Filter.sql>
)

The volatile option implicitly enables caching and causes the cache to be flushed whenever any typedef field that is part of the key is mutated, or when an instance is created or destroyed. This is an extension of the behaviour described above for key fields defined by references and explicit foreign keys. Had the volatile qualifier been omitted the Filter key would default to non-cached.

The foreign key field Entity, although used in the SQL join, is itself an Inq key field; because the key is defined as volatile, an explicit foreign clause is not necessary.

If we look at the SQL statement we can see that it has quite a complex structure including tests for NULL, which is the technique for wildcarding that filter parameter. This is typical of filtering keys. No eligibility expression is required because Inq does not maintain the cache as the native key fields are changed. The use instead of volatile is based on the assumptions that:

  1. Use of the key is driven by the users themselves, so is perhaps relatively infrequent in comparison to the activity generated by a batch process.
  2. The types involved are part of the application's static data, that is, they are not updated often, so the cache remains valid most of the time.
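The SQL held in the include file might look something like the sketch below, in prepared form. Each filter parameter is doubled to implement the NULL wild-card test; the table names XYCpty (behind select-stmt) and XYEntity are assumptions:

```
auxcfg( map(
"prepared", true,
"read-sql",
  "
    {select-stmt},
    XYEntity E
    where C.Entity = E.Entity               // table join
    and  (C.DomicileCountry = ? or ? is null)
    and  (C.BaseCurrency    = ? or ? is null)
    and  (C.Active          = ? or ? is null)
    and  (C.Entity          = ? or ? is null)
    and  (E.Type            = ? or ? is null)
    and  (E.GlobalLimit    >= ? or ? is null)
    and  (E.GlobalLimit    <= ? or ? is null)
  ",
"read-order",
  array a = (
    "DomicileCountry",  "DomicileCountry",  // each name twice - one per ?
    "BaseCurrency",     "BaseCurrency",
    "Active",           "Active",
    "Entity",           "Entity",
    "EntityType",       "EntityType",
    "GlobalLimitLower", "GlobalLimitLower",
    "GlobalLimitUpper", "GlobalLimitUpper"
  )
))
```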

Even though this example uses a join we could still have declared it as a cached key and supplied an eligibility expression. The expression would be this:

eligible
  (
    ( $this.instance.DomicileCountry == $this.key.DomicileCountry ||
      isnull($this.key.DomicileCountry) )   &&

    ( $this.instance.BaseCurrency == $this.key.BaseCurrency ||
      isnull($this.key.BaseCurrency) )      &&

    ( $this.instance.Entity == $this.key.Entity ||
      isnull($this.key.Entity) )            &&

    ( $this.instance.Active == $this.key.Active ||
      isnull($this.key.Active) )            &&

    ( $this.instance.GlobalLimit >= $this.key.GlobalLimitLower ||
      isnull($this.key.GlobalLimitLower) )  &&

    ( $this.instance.GlobalLimit <= $this.key.GlobalLimitUpper ||
      isnull($this.key.GlobalLimitUpper) )
  )

The Entity field must again be declared in a foreign clause and, because of this, need not take part in the test for eligibility.

Capped Keys

Where a key could return a very large number of items a cap can be specified. This is an integer value one greater than the maximum number of instances the key will return: Inq must read more instances than the key will yield in order to determine that further instances would have been returned for a given application of the key.

Here is an example:

  .
  .
typedef date SysDate format="dd MMM yyyy";
  .
  .

typedef Price
{
  fields
  (
    Instrument;
    SysDate RowDate;

    Price   BidPrice;
    Price   MidPrice;
    Price   AskPrice;
  )

  .
  .

  // Filter used by GUI frontend.
  key Filter cached=false max=1000
  (
    fields(Instrument,
           typedef SysDate FromDate,
           typedef SysDate ToDate)

    auxcfg(map(
        "prepared", true,
        "read-sql",
            "
            {select-stmt}
            WHERE (XY.Instrument = ? OR ? IS NULL)
            AND (XY.RowDate >= ? OR ? IS NULL)
            AND (XY.RowDate <= ? OR ? IS NULL)
            ",
        "read-order",
            array a = (
                "Instrument",         "Instrument",
                "FromDate",           "FromDate",
                "ToDate",             "ToDate"
            )
    ))
  )

  .
  .

}
Note
The typedef alias SysDate is being used to define two key fields, FromDate and ToDate. When using typedef aliases to define key fields the keyword typedef is required to resolve the ambiguity of referring to the enclosing typedef field SysDate.

The example represents a Price type where an instance is held per Instrument per RowDate. The key Filter would return all available instances if a key value whose fields were all null were applied. While it might be expected that validation of input values would trap this situation, it is safest to cap such a key using the max = <integer-literal> qualification.

Capped keys cannot be cached. If a key would be cached by default then caching must be explicitly disabled. Usage of capped keys is covered when discussing building node structures.

Choosing Caching Options

If a key is not cached then Inq will always apply the key value to the underlying i/o mechanism. The preceding examples have shown various configuration options, all of which relate to maintaining the integrity of a key's cache as the loaded instances move through their life-cycle. Why does Inq place such an emphasis on caching?

A design goal in this regard has been to minimise the use of SQL server resources. The SQL server may be distant from the Inq server or perhaps is heavily loaded with a number of different applications. Set against this, as may have become clear by now, is the fact that Inq writes out modified instances one by one. In common with many persistence mapping mechanisms, the ability to perform a set-based update is compromised by the desire to use middleware for the application logic, although Inq does support synchronising with the external repository (see below) to permit a mixed implementation.

The following summarises when Inq will default to use caching, and when the cache is discarded, according to how a key field is defined:

Cache Behaviour

  Native field
    Cached by default: YES
    Cache discarded when non-volatile: never

  Entity reference
    Cached by default: NO
    Cache discarded when non-volatile: on update of the referred field in any instance of the referred typedef

  Alias reference
    Cached by default: NO
    Cache discarded when non-volatile: never (the key field does not exist anywhere else)

  Explicit foreign
    Cached by default: NO
    Cache discarded when non-volatile: on update of the referring field in any instance of the referring typedef

When a key is volatile its cache is discarded on update of any key or explicit foreign field.

Maintaining a cache has overheads so when it is not enabled by default there is a trade-off to be made, taking into account the following:

  • The volatility of the data, that is how frequently instances would be modified such that the key would be affected and how often instances are created and destroyed.
  • The following describes the events on which an eligibility expression is run and, for each event, which cached key values it executes against:

    Eligibility Expression Execution

      Mutation of a key field (old value): the key value(s) the instance is cached against
      Mutation of a key field (new value): every cached key value
      Create:                              every cached key value
      Destroy:                             the key value(s) the instance is cached against

    Consideration therefore needs to be given to how many instances there could be loaded overall and how broad a range of key values an instance is likely to be cached against.
  • As an alternative to defining eligibility, consider using volatile in cases where a high cache hit rate is less of a concern.

Of course, it may not be clear which route to take until the system actually runs for some period, but it is straightforward enough to modify the configuration and use experimentation instead.

Resynchronising Caches

A common problem with systems that cache data is how to handle the case when the data is changed in the external source. Any managed instances currently loaded into one or more caches potentially become stale and may corrupt their external storage if subsequently written back. Inq offers a solution at two levels:

  • If the set of instances is known and their primary keys can be determined, the resync function synchronises an instance with its current state in external storage. This may result in its creation, mutation or destruction. The resync function uses the current transaction to place the instance in its appropriate life-cycle state, raising any necessary events as if the instance had been manipulated inside the Inq environment.
  • If the instance set cannot be determined or when it is known that most or all instances will have changed (for example when a SQL table has been reloaded) the expire function can be used to clear all caches and place all loaded instances of a typedef into the unmanaged state. The typedef and transaction sections cover instance states further. When unmanaged, an instance no longer participates in a transaction and cannot be written to external storage. The expire function raises an expire event on the specified typedef.

The resync and expire functions are covered further in the section on events.
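As an indication only, usage might look like the sketch below. The argument forms are assumptions here; the exact signatures belong to the sections on events and transactions:

```
// Hypothetical sketch - argument forms are assumptions
resync(Cpty, k);   // synchronise the instance identified by primary key value k
expire(Cpty);      // discard all caches; loaded Cpty instances become unmanaged
```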

Review Your Keys

It is very important to review key configuration for consistency and completeness. A key that lacks a required explicit foreign key declaration or eligibility expression, or whose eligibility expression does not match the SQL where clause, will produce inconsistent and undefined results.