Tuesday, February 12, 2013

Stateful object history vs change journal DB patterns

While building a retail sales module I came across two different patterns for dealing with an object's history.
On one side, each atomic operation needs to be preserved for audit reasons, which leads to a pattern of journalling change records. The current object state in such a pattern is a sequential combination of journal records. In the best case it is just the last record. In the worst case it is a set of business rules applied over the sequence. To keep track of the current state, all records have to be preserved.
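
To make the "sequential combination" concrete, here is a minimal sketch of the journal pattern. The `Change` record shape and the "later value wins" rule are assumptions for illustration only; a real module would put its own business rules into `apply_change`.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Change:
    """One journal record (hypothetical shape): only the fields that changed."""
    seq: int
    fields: dict


def apply_change(state: dict, change: Change) -> dict:
    # Assumed business rule: a later value simply overwrites an earlier one.
    return {**state, **change.fields}


def current_state(journal: list) -> dict:
    # The current state is the journal folded in sequence order.
    ordered = sorted(journal, key=lambda c: c.seq)
    return reduce(apply_change, ordered, {})


journal = [
    Change(1, {"status": "open", "total": 0}),
    Change(2, {"total": 25}),
    Change(3, {"status": "paid"}),
]
state = current_state(journal)
```

Dropping any record from `journal` can change `state`, which is why the whole sequence has to be kept.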

As an alternative, instead of keeping change records as the primary source for the current state, the object's state itself could serve as the historical record. That would eliminate the need for an extra table per entity (the change table).

As history is still a requirement, each change should have a matching record in a HISTORY namespace. This namespace could reside either in another DB or in the same one but under a different high-availability profile (partition, etc.).
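
A rough sketch of the stateful variant, using SQLite from the Python standard library. The table names `cart_current` and `cart_history`, the columns, and the `version` counter are all made up for the example, and both tables live in one in-memory database here purely to keep it short.

```python
import sqlite3

# Hypothetical columns; the point is that both tables share them.
COLUMNS = "cart_id INTEGER, version INTEGER, status TEXT, total REAL"

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE cart_current ({COLUMNS}, PRIMARY KEY (cart_id))")
conn.execute(f"CREATE TABLE cart_history ({COLUMNS}, PRIMARY KEY (cart_id, version))")


def save_cart(conn, cart_id, status, total):
    """Overwrite the current row and append the same full row to history."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT version FROM cart_current WHERE cart_id = ?", (cart_id,)
        ).fetchone()
        version = row[0] + 1 if row else 1
        values = (cart_id, version, status, total)
        conn.execute("INSERT OR REPLACE INTO cart_current VALUES (?, ?, ?, ?)", values)
        conn.execute("INSERT INTO cart_history VALUES (?, ?, ?, ?)", values)
    return version


save_cart(conn, 1, "open", 0.0)
save_cart(conn, 1, "paid", 25.0)
current_rows = conn.execute("SELECT version, status, total FROM cart_current").fetchall()
history_count = conn.execute("SELECT COUNT(*) FROM cart_history").fetchone()[0]
```

The current table holds one row per cart, while the history table holds one full row per change, with no separate change table in between.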

There is another frequently used pattern that goes along with history tracking: comments. A comment often accompanies a change, but on the other hand a comment could be just a verbal instruction. While the first fits into the change content, a comment without a change looks like overkill, as it copies the object state for "no reason". But if we look from the data integrity perspective, the object state is part of the comment. I.e. without the object state as a whole, the comment has no tracking value.

An interesting side effect of the history pattern is that the schema is the same in the current objects snapshot and in the history namespace. I.e. there is no need to increase DB schema complexity to add history tracking.

Having the state embedded into the object rather than using a journal assumes the data from each change is embedded into the object. A normalized schema allows a minimal data footprint (the change holds only the changed fields and a reference to the object). Fusing the change with the other object fields will preserve the whole object on every change. Could the increase in DB table size be justifiable? And what are the criteria?

1. DB schema simplicity. As the stateful object itself comprises all potential changesets, there is no need to keep relations between change and object. During the prototyping phase that is a strong argument, as it is when development resources are stretched.

2. DB performance effect.
Journalling allows change operations to be isolated so that they do not affect other object properties. But the current object state is then not a simple extraction: a query or a complex procedure produces the cumulative result. Optimizing such cases leads to a cached "current" state (either within the object or aside), which is in fact the same pattern as the stateful object, only with some complexity on top of it. As the stateful object reflects all kinds of changes, tuning for each use case is still needed (indexing, partitioning, etc.). But it gives the ability to move statistics and troubleshooting (history review) optimizations into a dedicated environment (the HISTORY namespace). The most real-time namespace, the "current" state, is extracted from a much larger volume of historical data and as a result can be held under a high-availability profile.

The sync with history for a stateful object could be done over generic queueing, which allows decoupling the RT and HISTORY namespaces. Obviously the sync needs to have completed when looking at RT data within HISTORY. But that is a rare case, as most HISTORY operations (troubleshooting or reports) happen well after the queue flush time (days vs minutes).

3. DB integrity. While journalling permits a strictly normalized schema, it is also subject to DB corruption or the cost of transactional locks. By contrast, operations over a stateful object are atomic by definition (no object references) and do not require any locks.

4. State flow. With the stateful pattern the state flow gets the simplest implementation. Either the service that makes the change initiates the next step in the flow, or it could be done natively by DB triggers (not my case, but DB-centric apps will value it a lot).

5. Security. The stateful approach also gives the ability to create extra DB user profiles, which in the case of ShoppingCart could be a business/audit requirement. More granular access to historical vs current state data, with any changes in the HISTORY namespace disallowed, shapes true multi-tiered security.
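
Going back to the queue-based sync from point 2, here is a minimal in-process sketch of how the RT and HISTORY namespaces could be decoupled. The `RealTimeStore` and `HistoryStore` classes are hypothetical stand-ins for the two namespaces, and a standard-library `queue.Queue` stands in for whatever generic queue the project actually uses.

```python
import queue


class RealTimeStore:
    """Hypothetical RT namespace: keeps only the latest state per object."""

    def __init__(self, outbox):
        self.current = {}
        self.outbox = outbox

    def save(self, obj_id, state):
        snapshot = dict(state)               # copy so later edits do not leak in
        self.current[obj_id] = snapshot      # overwrite the current state
        self.outbox.put((obj_id, snapshot))  # hand the same state to the queue


class HistoryStore:
    """Hypothetical HISTORY namespace: append-only list of snapshots."""

    def __init__(self):
        self.records = []

    def flush(self, outbox):
        # Drain whatever is queued right now; returns how many records moved.
        moved = 0
        while True:
            try:
                item = outbox.get_nowait()
            except queue.Empty:
                return moved
            self.records.append(item)
            moved += 1


outbox = queue.Queue()
rt = RealTimeStore(outbox)
history = HistoryStore()

rt.save("cart-1", {"status": "open"})
rt.save("cart-1", {"status": "paid"})
pending_before_flush = len(history.records)  # history lags until the flush
moved = history.flush(outbox)
```

Until `flush` runs, HISTORY simply lags behind RT, which matches the "days vs minutes" argument: a reader of HISTORY only needs the flush to have happened when looking at very recent data.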

Conclusion. If the project is stable and size and performance outweigh the development cost, the journal records pattern could win. In other cases (including mine) the stateful pattern is the way to go. Hooray for conscious simplicity!
