DRRE
The Data Rules and Routing Engine was first developed for a major university in the UK to fix problems with an Identity Management system I had designed, installed and was supporting. That system used entirely proprietary technologies and excelled at some things, but struggled with others. One very important area in which it struggled was applying complicated business rules to the data being processed: it used a sequential if-else approach which coped poorly with large rulesets (such as those mapping naming conventions between different systems). In addition, rules ended up distributed among multiple interdependent parts of a distributed system, making changes very hard to test.
The DRRE was first conceived as a piece of software with three main functions:
Accept data from event-driven processes, or retrieve it where no such process exists (e.g. standard SQL databases), across a wide range of protocols
Process the data through a rules engine capable of dealing with very large rulesets quickly and efficiently
Send the processed data to other applications and databases through a large range of protocols
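Conceptually, these three functions form a simple pipeline. The sketch below only pictures that flow; the interface and class names are hypothetical and do not appear in the DRRE source, where Mule and Drools (described under Components) do the actual work.

    // Illustrative sketch only: hypothetical names, not taken from the DRRE source.
    public interface DataSource {
        java.util.List fetch();                       // 1. accept or retrieve data
    }

    public interface RulesProcessor {
        java.util.Map apply(java.util.Map record);    // 2. run the record through the rules
    }

    public interface DataSink {
        void send(java.util.Map record);              // 3. deliver the processed record
    }

    public class Pipeline {
        private final DataSource source;
        private final RulesProcessor rules;
        private final DataSink sink;

        public Pipeline(DataSource source, RulesProcessor rules, DataSink sink) {
            this.source = source;
            this.rules = rules;
            this.sink = sink;
        }

        public void run() {
            java.util.Iterator records = source.fetch().iterator();
            while (records.hasNext()) {
                java.util.Map record = (java.util.Map) records.next();
                sink.send(rules.apply(record));
            }
        }
    }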
Budget was an issue, and so open source projects were an obvious place to look for this functionality. It soon became clear that an application uniting the functionality of the Mule ESB (for receiving and sending data) with the Drools rules engine would fill the requirements. Both of these are written in Java, and so Java would be used to write the application. The Data Rules and Routing Engine was born.
The DRRE was successful in complementing the proprietary technology in use by taking over responsibility for business rules. The new system performed 16 times faster than the old, and made it easier to test and make changes to business rules.
Components
The DRRE uses the functionality offered by two fantastic open source projects (which do all the really clever stuff):
The Mule Enterprise Service Bus http://www.mulesource.org is great for providing interfaces into and out of Java classes across a wide variety of protocols. It can do much more than this but, apart from simple transformations on incoming or outgoing data, that's all the DRRE uses it for
The Drools Rules Engine http://www.jboss.org/drools/ processes rulesets quickly. Even better: rules can be expressed in spreadsheets, making them far easier to maintain (a minimal sketch of compiling and firing a spreadsheet ruleset follows below)
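As a rough illustration of how spreadsheet rules become executable, the sketch below uses the Drools 4.x decision-table compiler to turn an XLS ruleset into DRL and then fires it against a fact object. The spreadsheet name and the Person fact class are invented for the example and are not taken from the DRRE source.

    import java.io.FileInputStream;
    import java.io.StringReader;
    import org.drools.RuleBase;
    import org.drools.RuleBaseFactory;
    import org.drools.StatefulSession;
    import org.drools.compiler.PackageBuilder;
    import org.drools.decisiontable.InputType;
    import org.drools.decisiontable.SpreadsheetCompiler;

    public class SpreadsheetRulesExample {

        // Hypothetical fact class, standing in for whatever data the rules act on
        public static class Person {
            private String surname;
            public Person(String surname) { this.surname = surname; }
            public String getSurname() { return surname; }
        }

        public static void main(String[] args) throws Exception {
            // Convert the decision table (XLS) into plain DRL rule text
            SpreadsheetCompiler compiler = new SpreadsheetCompiler();
            String drl = compiler.compile(new FileInputStream("naming-rules.xls"), InputType.XLS);

            // Compile the DRL and build a rule base
            PackageBuilder builder = new PackageBuilder();
            builder.addPackageFromDrl(new StringReader(drl));
            RuleBase ruleBase = RuleBaseFactory.newRuleBase();
            ruleBase.addPackage(builder.getPackage());

            // Insert a fact and fire the rules against it
            StatefulSession session = ruleBase.newStatefulSession();
            session.insert(new Person("Smith"));
            session.fireAllRules();
            session.dispose();
        }
    }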
Drools operates on data that is stored in Java classes. A class is normally defined at design time and immutable at runtime. However, not all users of the DRRE would be Java developers, and it would be a pain to have to recompile and redeploy the application whenever the nature of the data changed. It is far better to define the classes which will hold the data in a configuration file, and only build them when the application starts. CGLIB made it easy to use dynamically created Java classes and, since it creates proper JavaBeans, these are compatible with Drools and Mule with no additional trickery needed.
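The sketch below shows the kind of runtime bean creation described above, using CGLIB's BeanGenerator. The property names here are invented; in the DRRE they would come from a configuration file read at start-up.

    import net.sf.cglib.beans.BeanGenerator;
    import net.sf.cglib.beans.BeanMap;

    public class DynamicBeanExample {
        public static void main(String[] args) {
            // Define the properties of the class at runtime rather than design time
            BeanGenerator generator = new BeanGenerator();
            generator.addProperty("surname", String.class);
            generator.addProperty("dateOfBirth", java.util.Date.class);

            // create() builds and instantiates a proper JavaBean class
            Object person = generator.create();

            // BeanMap gives map-style access to the generated getters and setters
            BeanMap bean = BeanMap.create(person);
            bean.put("surname", "Smith");
            System.out.println(bean.get("surname"));   // prints "Smith"
        }
    }

Because the generated object is an ordinary JavaBean, it can be inserted into a Drools session or passed through Mule without special handling.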
Data matching
The DRRE is a tool that allows users to combine data stored in multiple data stores into a single store, and to manipulate that data. In Identity Management such a store must be available and reliable before anything more useful (such as provisioning of access rights) can be attempted; attempts to do such things without one lead to duplication, omissions and confusion.
A critical part of combining data from multiple data sources is matching it, to ensure that the data for Person A in System 1 is combined with the data for the same Person A in System 2. Ideally this is done by comparing shared unique keys, but these are not normally available. This means that the data must be matched using whatever attributes do exist in both systems (name, date of birth etc). This is prone to error, since names and dates of birth are not unique to a single person! Possible matches must therefore be dealt with and, while we are about it, typos should be allowed for as well. Also, the linked systems are autonomous and data will continue to be entered into them (complete with mistakes). Matching must therefore be ongoing, rather than a one-off or calendar-driven process, and (since we link to event-driven systems) work in real time.
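The sketch below illustrates the kind of attribute-based scoring described above. It is a simplified illustration, not the DRRE's actual matching algorithm: the weights and thresholds are invented, and it assumes Apache Commons Lang is available for the edit-distance check that tolerates typos.

    import org.apache.commons.lang.StringUtils;

    public class MatchScorer {

        // Returns a score: higher means a more likely match
        public int score(String name1, String dob1, String name2, String dob2) {
            int score = 0;
            if (name1 != null && name2 != null) {
                // Allow for small typos by using edit distance rather than equality
                int distance = StringUtils.getLevenshteinDistance(
                        name1.toLowerCase(), name2.toLowerCase());
                if (distance == 0) {
                    score += 3;
                } else if (distance <= 2) {
                    score += 2;
                }
            }
            if (dob1 != null && dob1.equals(dob2)) {
                score += 2;
            }
            return score;
        }

        // A high score is treated as a definite match, a middling one as a
        // possible match needing review, and anything lower as no match.
        public String classify(int score) {
            if (score >= 5) return "match";
            if (score >= 3) return "possible";
            return "no-match";
        }
    }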
The DRRE provides for all of these when teamed with a Postgres database to store matching data. It can even be used to match data from different sources without an Identity store.
Data matching is important enough to make it a fourth function of the DRRE.
Oh, and there's Spring
We needed a way of storing dynamic class definitions, reusing beans and generally ensuring that everything was managed sensibly in terms of threading and object re-use. Spring was the obvious choice for this (after all, Spring promotes interfaces, and the DRRE is really just an interface between systems).
Current project state
A beta version of the source code (much of it has been tested extensively in production and so is production-ready, but not all of it yet) and some example Spring configuration files have been uploaded to SourceForge. These provide example deployments such as syncing from a text file to a database, a text file to a directory, and a database to a directory. They are sparsely commented, and documentation is a priority at the moment. A first version of this should be available early in 2009, so check back then. Meanwhile, download the code and the dependencies listed below, load it all up in the development environment of your choice, and see if it looks useful.
Download
The DRRE can be downloaded from http://sourceforge.net/project/showfiles.php?group_id=246899
Dependencies
Mule 1.4.4 (currently testing with 2.x)
JDK 1.4 or greater (but some dependencies may require a later version). This will change to Java 5 soon, so you are recommended to use Java 5.