Project Description

The View-Plug-Ins Programming on database development is an interface-oriented programming. This methodology has been practiced widely in many fields (e.g. OOP programming) for a long time. Here just focus on database development and introduce a few meta-tables and helper packages (for Oracle, or set of stored procedures for SQL Server). The purpose is to simplify the design of many complex batches processing, useful for ETL, reporting and DWH solutions.

The central motive behind View-Plug-Ins is also a Branches-Leaves model of database development. Branches are stored procedures of flowchart while leaves are views of particular implementation. It helps to remind people pruning fruit trees during the growing season (in early stages of software lifecycle, from the system requirement analysis to design stage). Then in later maintenance stage, it would be more clear to organize/see the whole picture from the root to branches if the program structure of SQL scripts like a deciduous tree in winter.

Interface Design

The process of interface designing prefers more converse thinking. Especially in realization phase, the whole programming steps is from Target (output) Model to Source (input) Models, as shown following:
Figure1

Once we get a clear and clean data requirement, then to abstract, induce and simplify into a unified target model is a prerequisite of interface-oriented designing.

  • The Target Model (above figure) defines the data model;
  • A Interface defines a cluster node to approach one kind of required data;
  • A view-plug-in defines a method model of data extraction.

Figure2

Two basic aims of bringing in the Interface and the Plug-ins:

Isolation

To isolate the differences of data model, transform different model from many sources to a common target model, follow by classification, labeling/tagging etc. It helps the code to be loose-coupled while the data keep tightly-coupled. To test each independent view can be much clearer than to test a tangled of steps in sp, make the test-driven database development simple.

Unification

A complex system can always be divided into a few clear-cut sub-systems with logical loose coupling integration. For database system design, the principle of division depends on the induction of data model. Then each target sub-system can be treated as one data model, the differences have been transformed into just some attributes in the same model.

Meta Tables

There are only 4 tables about meta data need to be maintained for above designed model:

  1. EXTRACT_SERVICE
  2. EXTRACT_INTERFACE
  3. EXTRACT_PLUGIN
  4. EXTRACT_RULE
db_diagram

1. EXTRACT_SERVICE

An Extraction Service encapsulates a set of related Interfaces in the form of a service layer:
  • In design-time, the Service is used to organize interfaces into application domains (projects). A Service is a management unit to build code generation and for Pre-deployment (see the later section [Pre-deployment] for detail).
  • In run-time, all Interfaces under a Service share the same session context for each Batch.

2. EXTRACT_INTERFACE

Sample:
INTERFACE_ID SERVICE_ID UNION_VIEW SELECT_LIST DESCRIPTION_
PRD_CLS APP_DOMAIN XYZ.VIEW_ALL_PROD BATCH_ID, SEC_ID, PROD_NAME Prod Identification
... ... ... ... ...
Columns
- INTERFACE_ID: The unique identifier for the interface, consider a naming convention within the enterprise.
- SERVICE_ID: The extract service (application) of this interface.
- UNION_VIEW: (Also called Hub-View). The name of view which will union all plug-ins under the same interface.
- SELECT_LIST: The select list in the select statement, every plug-in view of the same interface will follow this signature.
- DESCRIPTION_:
 

3. EXTRACT_PLUGIN

Sample:
PLUGIN_ID INTERFACE_ID PLUGIN_VIEW PLUGIN_ORDER DESCRIPTION_
PRD_CLS_SRM PRD_CLS XYZ.VIEW_PROD_CLS_SRM 1 Identify prods from Srm
PRD_CLS_SCM PRD_CLS XYZ.VIEW_PROD_CLS_SCM 2 Identify prods from Scm
PRD_CLS_DEA PRD_CLS XYZ.VIEW_PROD_CLS_DEA 3 Identify prods from Dea
PRD_CLS_BSK PRD_CLS XYZ.VIEW_PROD_CLS_BSK 4 Identify prods from Bsk
... ... ... ... ...
Columns
- PLUGIN_ID: The unique identifier for the plug-in, consider a naming convention within the enterprise.
- INTERFACE_ID: The interface of the plug-in belongs to.
- PLUGIN_VIEW: The view which will implement the plug-in.
- PLUGIN_ORDER: The order where the plug-in will be assembled in the interface union view.
- DESCRIPTION_:
 

4. EXTRACT_RULE

Sample:
RULE_ID PLUGIN_ID TAG$01 TAG$02 TAG$03 TAG$04 ...
1 PRD_CLS_BSK PC Flat Code A SWAP BASKET  
2 PRD_CLS_SCM PC Flat Code B LST_OPT INDEX  
3 PRD_CLS_SCM PC Flat Code C OTC_OPT INDEX  
4 PRD_CLS_SCM PC Flat Code D FUTURE DEBT  
5 PRD_CLS_DEA PC Flat Code E SWAP INDEX  
6 PRD_CLS_DEA PC Flat Code F FORWARD INDEX  
7 PRD_CLS_DEA PC Flat Code G FUTURE INDEX  
8 PRD_CLS_SRM PC Flat Code H E ADR  
9 PRD_CLS_SRM PC Flat Code I E COM  
10 PRD_CLS_SRM PC Flat Code J E GDR  
... ... ... ... ... ... ...
Columns
- RULE_ID: The primary key of this table is pointless, it just means one unique rule. The business key should be in one or some of TAG$... columns. The integrity of rules configuration must be checked manually.
- PLUGIN_ID: This rule will be applied to which plug-in.
- TAG$01: A constant parameter to be used by the extraction plug-in view.
- TAG$02: ..
- TAG$03: ..
- TAG$04: ..
- TAG$05: ..
- TAG$06: ..
- TAG$07: ..
- TAG$08: ..
- TAG$09: ..
- TAG$10: ..
- TAG$11: ..
- TAG$12: ..
- TAG$13: ..
- TAG$14: ..
- TAG$15: ..
- TAG$16: ..
 

These TAG$## columns look a bit ugly in the practice. From the viewpoint of applied business, all rules under a interface are in the same category. So create a rule-view for each interface specially can solve the ugliness.
A trigger will create a new declaration of columns alias in EXTRACT_RULE_TAG_ALIAS table while a new interface is being added into EXTRACT_INTERFACE table, please assign intelligible alias to corresponding appliable columns.

EXTRACT_RULE_TAG_ALIAS Sample:
INTERFACE_ID RULE_VIEW TAG$01 TAG$02 TAG$03 TAG$04 ...
PRD_CLS XYZ.VIEW_CLASS_RULE PROD_NAME PLATFORM CLS_TYPE SEC_TYPE  
... ... ... ... ... ... ...
Columns
- INTERFACE_ID: The interface of the rule-view.
- RULE_VIEW: The updatable view of EXTRACT_RULE which will assign intelligible alias to appliable TAG$## columns.
- TAG$01: Alias of EXTRACT_RULE.TAG$01 column (if appliable).
- TAG$02: Alias of EXTRACT_RULE.TAG$02 column (if appliable).
- TAG$03: Alias of EXTRACT_RULE.TAG$03 column (if appliable).
- TAG$04: Alias of EXTRACT_RULE.TAG$04 column (if appliable).
- TAG$05: Alias of EXTRACT_RULE.TAG$05 column (if appliable).
- TAG$06: Alias of EXTRACT_RULE.TAG$06 column (if appliable).
- TAG$07: Alias of EXTRACT_RULE.TAG$07 column (if appliable).
- TAG$08: Alias of EXTRACT_RULE.TAG$08 column (if appliable).
- TAG$09: Alias of EXTRACT_RULE.TAG$09 column (if appliable).
- TAG$10: Alias of EXTRACT_RULE.TAG$10 column (if appliable).
- TAG$11: Alias of EXTRACT_RULE.TAG$11 column (if appliable).
- TAG$12: Alias of EXTRACT_RULE.TAG$12 column (if appliable).
- TAG$13: Alias of EXTRACT_RULE.TAG$13 column (if appliable).
- TAG$14: Alias of EXTRACT_RULE.TAG$14 column (if appliable).
- TAG$15: Alias of EXTRACT_RULE.TAG$15 column (if appliable).
- TAG$16: Alias of EXTRACT_RULE.TAG$16 column (if appliable).
 

Then a rule-view named XYZ.VIEW_CLASS_RULE (above sample) will be generated, it's a updatable view and can be used as a substitute of EXTRACT_RULE table for a specific area (interface).

Above example is extracting and classifying hundreds of different product type from various source systems.

  • The table EXTRACT_RULE induces them into a few extraction models, and centralizes all constant parameters (tags) to minimize hard-code queries.
  • Each extraction model expresss as a plug-in view. The table EXTRACT_PLUGIN describes which view presents the realization of each plug-in and which interface need to follow.
  • The table EXTRACT_INTERFACE declares the view signature of each interface (like delegate in C#), and the union view which assembles all its plug-in views. The code of union views and rule views can be generated by following helper view:
SELECT * FROM VPI.VIEW_EXTRACT_CODE
view_code_generation
Click the <CLOB>..., the generated code will be show in a pop-up window of PL/SQL.
 

Session Context

In order to pass some parameters to a batch processing of views (like a OOP class with a parameterless constructor only), the session context is introduced in each batch of extraction processing. The package VPI.EXTRACT_UTILITY provides some functions/procedures to operate the session context.

  1. CREATE_BATCH: The constructor of batch session, this function return a new BATCH_ID.
  2. SET_PARAMS: Set properties in a batch session for passing variables cross views. These variables are stored in table EXTRACT_BATCH.

For most batch processings, people would like to trace the progress updates during the processing, such as elapsed time, current status, % in progress bar, etc.

  1. PROGRESS_START: This procedure initializes how many steps will the batch process.
  2. PROGRESS_UPDATE: This procedure updates current status.
For example (Oracle version):
PROCEDURE ETL_MAIN
(
    inDate  DATE
) IS
tBatch_ID   PLS_INTEGER := VPI.EXTRACT_UTILITY.CREATE_BATCH('CO.GRP.PRD.ETL');
BEGIN
    VPI.EXTRACT_UTILITY.SET_PARAMS(tBatch_ID, '2012-07-31');

    VPI.EXTRACT_UTILITY.PROGRESS_START(12, 1, 'Preloading(cleaning) some crucial slow sources ...');
    PRELOAD_CACHE_SCM(tBatch_ID);

    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description =>
        'Loading positions ...');
    LOAD_POSITIONS(tBatch_ID);

    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description =>
        'Loading top level securities and classifying product types ...');
    LOAD_TOP_LEVEL_SECURITIES(tBatch_ID);

    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description =>
        'Loading middle level securities ...');
    LOAD_MID_LEVEL_SECURITIES(tBatch_ID);

    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description =>
        'Loading bottom level securities ...');
    LOAD_BTM_LEVEL_SECURITIES(tBatch_ID);

    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading issuers ...');
    LOAD_ISSUERS(tBatch_ID);

    ...

    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Done.');
END MAIN;

Then the view VIEW_EXTRACT_BATCH_LOG or VIEW_LAST_BATCH_LOG can be used to mornitor the whole batch progress in background.
Or, UI can use the procedure EXTRACT_UTILITY.POLLING_PROGRESS to display a progress bar.

Query Optimization

Avoid/minimize using dynamic SQL and temp tables for extraction, SQL expression errors should be discovered at compile-time. Database view provides a facility for this, the execution plan is prepared when compiling the query, every query can be clearly optimized in advance.

Data Quality

The quality of source data always affects queries' performance disastrously.
Bad data: missing, duplicate, impossible values ...
For instance,
- Data missing requires you have to use OUTER JOIN instead of INNER JOIN;
- Data duplication requires you to have a extra MIN/MAX...GROUP BY...;
...
You are not allowed to use straightforward SQL in these situations, but to complicate and slow down queries everywhere painfully! Some romantic business requirements even would ask you to split a regular JOIN SQL into several broken steps and check potential data error at every step, mark error flags, try this side, try that side and continue... Imagine there are some land mines were laid along the highway, then every vehicles are required to be mine-clearing vehicles!
It is utterly opposed to the principle of using Structured Query Language efficiently.
Everyone known in theory, "there is no bad data, but only bad applications(producers)". But in some companies for some reasons, once a new case of data exception was exposed, the requirements always ask downstream applications/reports to handle new error logic, instead of putting constraints in database and fixing upstream applications of data producer to prevent such case happend again. - Isn't it the Emperor's New Clothes, every eyewitness have to draw a new clothes on their own glasses!
As a downstream application/report, normally it's also difficult to ask those tables in source systems to add proper indexes for optimizing high frequency queries or bottleneck queries of data consumers.
... All thus limitation of fact prompted us to own a isolated clean data environment, we should no longer to drive mine-clearing vehicles on the highway. So a preprocess of source data clean-up becames very helpful for the performance of all follow-up batch extractions. Base on the thinking of view-plug-ins, a Relationship Clean-up Engine has been introduced in http://datawashroom.codeplex.com/, since in most cases of bad data, the bad relationship is the most headache of headaches.

Pre-deployment

According to above metadata, all hub-views (union of plug-ins views) and rule-views need to be generated and pre-deployed in early stages of development cycle.

The package VPI.PRE_DEPLOY provides following utilities:

  • BUILD_SERVICE
    Generate source code of all rule-views and hub-views of all interfaces for a service.
    Each time a service is built, a new version will be associated with the service. Behind the method BUILD_SERVICE, two tables PRE_DEPLOY_VERSION and PRE_DEPLOY_SCRIPT keep track of all historical versions.
  • PUBLISH_SERVICE
    Deploy the latest version which generated by BUILD_SERVICE to the database. The publishing is actually the database engine compiles the generated code. Compilation errors will be recorded in PRE_DEPLOY_SCRIPT table and also be displayed on DBMS Output window of PL/SQL.
    As an option, all old objects (views) deployed by previous version can be dropped before the new deployment.
  • BUILD_AND_PUBLISH_SERVICE
    During the development cycle, BUILD_SERVICE and PUBLISH_SERVICE usually would be run several rounds (once the metadata has a complete change). For the sake of convenience, BUILD_AND_PUBLISH_SERVICE simply combines BUILD_SERVICE and PUBLISH_SERVICE into one step.

Metadata Deployment

In essence, the production deployment is to deploy metadata. Please see also the [Build and Deploy] section in http://datawashroom.codeplex.com/ for detail deployment mechanism.

The view VIEW_EXTRACT_METADATA presents the Metadata Manifest for each service.

Supported Databases

  • Oracle
    - Currently support.
  • SQL Server
    - In the plan ...

Last edited Nov 22, 2012 at 5:49 AM by AbelCheng, version 59