View-Plug-Ins programming for database development is interface-oriented programming. This methodology has been practiced widely in many fields (e.g. OOP) for a long time. Here we focus on database development and introduce a few meta-tables
and helper packages (for Oracle; a set of stored procedures for SQL Server). The purpose is to simplify the design of complex batch processing, which is useful for ETL, reporting and data-warehouse (DWH) solutions.
The central motive behind View-Plug-Ins is a branches-and-leaves model of database development. Branches are the stored procedures that form the flowchart, while leaves are the views that hold each particular implementation. It is reminiscent of
pruning fruit trees during the growing season (the early stages of the software lifecycle, from system requirement analysis to the design stage). In the later maintenance stage, it is much clearer to organize and see the whole picture from
root to branches if the program structure of the SQL scripts looks like a deciduous tree in winter.
The process of interface design favors thinking in reverse. Especially in the realization phase, the programming steps run from the Target (output) model back to the Source (input) models, as shown below:
Once we have a clear and clean data requirement, abstracting, inducing and simplifying it into a unified target model is the prerequisite of interface-oriented design.
- The Target Model (figure above) defines the data model;
- An Interface defines a cluster node through which one kind of required data is approached;
- A View-Plug-In defines a method model of data extraction.
There are two basic aims in introducing the Interface and the Plug-Ins:
To isolate the differences between data models: transform the different models from many sources into a common target model, followed by classification, labeling/tagging, etc. This keeps the code loosely coupled while the data stay tightly coupled. Testing each independent
view is much clearer than testing a tangle of steps inside a stored procedure, which makes test-driven database development simple.
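As an illustration of this test-driven point, a unit check can target a single plug-in view in isolation; the view and column names below are hypothetical, not part of the actual package:

```sql
-- Hypothetical sanity check against one independent plug-in view:
-- verify the extraction produces no rows with a missing product name.
SELECT COUNT(*) AS missing_prod_names
  FROM XYZ.VIEW_CLASS_PROD_SRM   -- one plug-in view, tested on its own
 WHERE PROD_NAME IS NULL;
```

Because each plug-in is a plain view, such checks can run long before the surrounding stored procedures exist.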
A complex system can always be divided into a few clear-cut subsystems with loosely coupled logical integration. For database system design, the principle of division depends on the induction of the data model. Each target subsystem can then be treated as one
data model, with the differences transformed into just a few attributes of the same model.
Only four metadata tables need to be maintained for the model designed above:
An Extraction Service encapsulates a set of related Interfaces in the form of a service layer:
- At design time, the Service is used to organize interfaces into application domains (projects). A Service is the management unit for code generation and pre-deployment (see the later section [Pre-deployment] for details).
- At run time, all Interfaces under a Service share the same session context within each Batch.
(Sample EXTRACT_INTERFACE row: SELECT_LIST = BATCH_ID, SEC_ID, PROD_NAME)
- INTERFACE_ID: The unique identifier of the interface; consider a naming convention within the enterprise.
- SERVICE_ID: The extraction service (application) this interface belongs to.
- UNION_VIEW: (Also called the Hub-View.) The name of the view that unions all plug-ins under the same interface.
- SELECT_LIST: The select list of the SELECT statement; every plug-in view of the same interface must follow this signature.
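As a sketch of how these columns fit together (all object names here are illustrative assumptions, not the actual generated code), a hub-view for an interface whose SELECT_LIST is BATCH_ID, SEC_ID, PROD_NAME would simply UNION ALL its plug-in views in PLUGIN_ORDER:

```sql
-- Hypothetical generated hub-view (UNION_VIEW) of an interface.
CREATE OR REPLACE VIEW XYZ.VIEW_CLASS_PRODUCTS
AS
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_CLASS_PROD_SRM   -- plug-in order 1
UNION ALL
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_CLASS_PROD_SCM;  -- plug-in order 2
```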
(Sample EXTRACT_PLUGIN rows: "Identify prods from Srm", "Identify prods from Scm", "Identify prods from Dea", "Identify prods from Bsk")
- PLUGIN_ID: The unique identifier of the plug-in; consider a naming convention within the enterprise.
- INTERFACE_ID: The interface the plug-in belongs to.
- PLUGIN_VIEW: The view that implements the plug-in.
- PLUGIN_ORDER: The position at which the plug-in is assembled into the interface union view.
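For illustration (the source table and batch-context view are assumptions), a plug-in is just a view that reproduces the interface's SELECT_LIST signature, plus one registration row in EXTRACT_PLUGIN:

```sql
-- Hypothetical plug-in view: it must expose exactly the interface's
-- SELECT_LIST signature (here BATCH_ID, SEC_ID, PROD_NAME).
CREATE OR REPLACE VIEW XYZ.VIEW_CLASS_PROD_SRM
AS
SELECT b.BATCH_ID, s.SEC_ID, s.PROD_NAME
  FROM SRM.SECURITIES s                 -- hypothetical source table
 CROSS JOIN VPI.VIEW_CURRENT_BATCH b;   -- hypothetical batch-context view

-- Register the view as a plug-in of a hypothetical interface.
INSERT INTO EXTRACT_PLUGIN (PLUGIN_ID, INTERFACE_ID, PLUGIN_VIEW, PLUGIN_ORDER)
VALUES ('Identify prods from Srm', 'CLASS_PRODUCTS', 'XYZ.VIEW_CLASS_PROD_SRM', 1);
```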
(Sample EXTRACT_RULE rows: Flat Code A through Flat Code J)
- RULE_ID: The surrogate primary key of this table; it simply identifies one unique rule. The business key should live in one or more of the TAG$## columns. The integrity of the rule configuration must be checked manually.
- PLUGIN_ID: The plug-in this rule applies to.
- TAG$01 ... TAG$16: Constant parameters to be used by the extraction plug-in view.
These TAG$## columns look a bit ugly in practice. From the viewpoint of the applied business, all rules under one interface fall into the same category, so creating a dedicated rule-view for each interface resolves the ugliness.
A trigger creates a new declaration of column aliases in the EXTRACT_RULE_TAG_ALIAS table whenever a new interface is added to the EXTRACT_INTERFACE table; please assign intelligible aliases to the corresponding applicable columns.
- INTERFACE_ID: The interface of the rule-view.
- RULE_VIEW: The updatable view over EXTRACT_RULE that assigns intelligible aliases to the applicable TAG$## columns.
- TAG$01 ... TAG$16: Alias of the corresponding EXTRACT_RULE.TAG$## column (if applicable).
Then a rule-view named XYZ.VIEW_CLASS_RULE (in the sample above) will be generated; it is an updatable view and can be used as a substitute for the EXTRACT_RULE table within a specific area (interface).
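For illustration only, assuming TAG$01 and TAG$02 were aliased to PROD_NAME_PATTERN and PROD_TYPE for this interface, a plug-in view could read its constants through the generated rule-view instead of hard-coding them:

```sql
-- Hypothetical: classify products by matching configured patterns;
-- all constants live in EXTRACT_RULE and are read via the rule-view.
SELECT s.SEC_ID, r.PROD_TYPE                  -- assumed alias of TAG$02
  FROM SRM.SECURITIES s                       -- hypothetical source table
  JOIN XYZ.VIEW_CLASS_RULE r
    ON s.PROD_NAME LIKE r.PROD_NAME_PATTERN;  -- assumed alias of TAG$01
```

With this shape, adding a new product classification is a metadata edit (one more rule row), not a code change.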
The example above extracts and classifies hundreds of different product types from various source systems.
- The table EXTRACT_RULE induces them into a few extraction models and centralizes all constant parameters (tags) to minimize hard-coded queries.
- Each extraction model is expressed as a plug-in view. The table EXTRACT_PLUGIN describes which view realizes each plug-in and which interface it must follow.
- The table EXTRACT_INTERFACE declares the view signature of each interface (like a delegate in C#) and the union view that assembles all its plug-in views. The code of the union views and rule-views can be generated by the following helper view:
SELECT * FROM VPI.VIEW_EXTRACT_CODE
Click the <CLOB>... cell, and the generated code will be shown in a pop-up window (e.g. in PL/SQL Developer).
In order to pass parameters into a batch process made of views (a view is like an OOP class with only a parameterless constructor), a session context is introduced for each batch of extraction processing. The package VPI.EXTRACT_UTILITY provides functions and procedures
to operate on the session context.
- CREATE_BATCH: The constructor of the batch session; this function returns a new BATCH_ID.
- SET_PARAMS: Sets properties in a batch session for passing variables across views. These variables are stored in the table EXTRACT_BATCH.
For most batch processes, people would like to trace progress updates during processing, such as elapsed time, current status, the percentage for a progress bar, etc.
- PROGRESS_START: This procedure initializes how many steps the batch will process.
- PROGRESS_UPDATE: This procedure updates the current status.
For example (Oracle version):
DECLARE
  tBatch_ID PLS_INTEGER := VPI.EXTRACT_UTILITY.CREATE_BATCH('CO.GRP.PRD.ETL');
BEGIN
  VPI.EXTRACT_UTILITY.PROGRESS_START(12, 1, 'Preloading(cleaning) some crucial slow sources ...');
  VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading positions ...');
  VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading top level securities and classifying product types ...');
  VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading middle level securities ...');
  VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading bottom level securities ...');
  VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading issuers ...');
  VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Done.');
END;
Then the view VIEW_EXTRACT_BATCH_LOG or VIEW_LAST_BATCH_LOG can be used to monitor the whole batch progress in the background.
Alternatively, a UI can use the procedure EXTRACT_UTILITY.POLLING_PROGRESS to display a progress bar.
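For example, a monitoring session can simply poll the log views from the background (the exact columns depend on the actual view definitions; the BATCH_ID filter is an assumption):

```sql
-- Follow one specific batch (BATCH_ID obtained from CREATE_BATCH).
SELECT * FROM VPI.VIEW_EXTRACT_BATCH_LOG WHERE BATCH_ID = :tBatch_ID;

-- Or just watch the most recent batch.
SELECT * FROM VPI.VIEW_LAST_BATCH_LOG;
```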
Avoid or minimize the use of dynamic SQL and temp tables for extraction; SQL expression errors should be discovered at compile time. The database view provides a facility for this: the execution plan is prepared when the query is compiled, and every query can be clearly optimized.
The quality of source data always affects query performance disastrously.
Bad data: missing, duplicate, or impossible values ...
- Missing data requires you to use OUTER JOIN instead of INNER JOIN;
- Duplicate data requires you to add an extra MIN/MAX ... GROUP BY ...;
In these situations you are not allowed to use straightforward SQL; instead you must painfully complicate and slow down queries everywhere. Some romantic business requirements would even ask you to split a regular JOIN into several broken steps, check for potential
data errors at every step, mark error flags, try this side, try that side, and continue... Imagine land mines laid along the highway, so that every vehicle is required to be a mine-clearing vehicle!
It is utterly opposed to the principle of using the Structured Query Language efficiently.
Everyone knows the theory that "there is no bad data, only bad applications (producers)". But in some companies, for various reasons, once a new case of data exception is exposed, the requirements always ask the downstream applications/reports to handle the new
error logic, instead of putting constraints in the database and fixing the upstream data-producing applications to prevent such a case from happening again. Isn't it the Emperor's New Clothes, where every eyewitness has to draw new clothes on his own glasses?
As a downstream application/report, it is normally also difficult to ask the source systems to add proper indexes to their tables for optimizing the data consumers' high-frequency or bottleneck queries.
All these practical limitations prompted us to own an isolated, clean data environment: we should no longer have to drive mine-clearing vehicles on the highway. A preprocessing step that cleans up the source data therefore becomes very helpful for the performance of all follow-up batch
extractions. Based on the View-Plug-Ins thinking, a Relationship Clean-up Engine has been introduced at
http://datawashroom.codeplex.com/, since in most cases of bad data, the bad relationship is the worst headache of all.
According to the metadata above, all hub-views (unions of plug-in views) and rule-views need to be generated and pre-deployed in the early stages of the development cycle.
The package VPI.PRE_DEPLOY provides following utilities:
- BUILD_SERVICE: Generates the source code of all rule-views and hub-views of all interfaces for a service. Each time a service is built, a new version is associated with the service. Behind BUILD_SERVICE, two tables, PRE_DEPLOY_VERSION and PRE_DEPLOY_SCRIPT, keep track of all historical versions.
- PUBLISH_SERVICE: Deploys the latest version generated by BUILD_SERVICE to the database. Publishing actually means the database engine compiles the generated code. Compilation errors are recorded in the PRE_DEPLOY_SCRIPT table and also displayed in the DBMS Output window. As an option, all old objects (views) deployed by the previous version can be dropped before the new deployment.
- BUILD_AND_PUBLISH_SERVICE: During the development cycle, BUILD_SERVICE and PUBLISH_SERVICE are usually run for several rounds (whenever the metadata has changed). For the sake of convenience, BUILD_AND_PUBLISH_SERVICE simply combines BUILD_SERVICE and PUBLISH_SERVICE into one call.
In essence, production deployment means deploying the metadata. Please see also the [Build and Deploy] section at
http://datawashroom.codeplex.com/ for the detailed deployment mechanism.
The view VIEW_EXTRACT_METADATA presents the Metadata Manifest for each service.
- Currently supported:
  - Oracle
  - SQL Server
- In the plan ...