Oracle   Warehouse   Builder   Introduction 
       
          A Developer’s Overview  

 

 

 

OWB support for data warehousing

 

Data warehouse functionality can be divided into three distinct areas:

 

*  Data warehouse object design

*  ETL process design

*  BI integration

 

The first is concerned with the structure of the data warehouse; the second with how to get data into the data warehouse; and the third with how to get data out of the data warehouse.  If a product can do all three, then it provides a complete, “end-to-end” data warehousing solution.

 

When it comes to specific areas of data warehouse functionality, OWB – like the proverbial curate’s egg – is good in parts, and, by implication, not so good in others.  However, the whole can be greater than the sum of its parts, and what makes OWB a good data warehousing product is that it provides “end-to-end” functionality from a single user interface: it can extract data from a wide variety of relational and non-relational sources in different formats, transform that data into a canonical format, validate it, summarize it in various ways for efficient access, and then deliver it to BI tools, such as Oracle Discoverer, in an end-user friendly format.

 

 

OWB interfaces

 

OWB has both a graphical interface, the Design Center, and a command line scripting interface, OMB Plus.  As you can see from the FAQs on this site both interfaces have their share of annoying foibles: inexplicable behaviour, impenetrable error messages, and missing documentation.  However, once you become familiar with the product – and the workarounds – both interfaces are relatively easy to use.

 

 

Data warehouse object design

 

When it comes to OLTP databases either you directly code the scripts needed to create objects using SQL and PL/SQL, or you use a design and code generation tool, like Oracle Designer, to achieve the same objective.

 

You can regard OWB as a design and code generation tool for data warehouses.  You can design and generate code for the familiar OLTP objects: tables, external tables, views, and database sequences; for PL/SQL functions, procedures, and packages; and for user defined types, such as objects, varrays, and nested tables (the main limitation is that you can’t design database triggers).

 

A data warehouse also has certain objects that you don’t usually find in OLTP databases: dimensions, cubes, and materialized views.  These objects can also be designed and the corresponding code generated within OWB.

 

 

ETL process design

 

In the case OLTP databases, data is usually entered using forms, and, as an OLTP database developer, you may have spent a considerable amount of time developing forms using tools like Oracle Forms.  A data warehouse also needs to be fed data, but in this case data entry is by means of batch jobs.  In particular, as one of the main purposes of a data warehouse is enterprise-wide data consolidation, data warehouse batch jobs typically load data from many different sources, tens, hundreds, thousands.  And the sources are usually heterogeneous in nature, varying from disk files to relational and non-relational databases from a disparate collection of software vendors.  The process of batch loading data is called ETL, short for Extract-Transform-Load.  Data is extracted from a source and loaded into a target.  Usually, the same target receives data from many different sources, with the same information being represented in many different formats at source.  That’s where the “transform” aspect comes in: data from each source must be transformed into a common canonical format before it can be loaded into the target.

 

ETL is organized into three layers:

 

*  Mappings

*  Process flows

*  Schedules

 

and OWB provides support for all three.

 

At the lowest level we have a “mapping”, which is responsible for moving data from a particular source to a particular target – think of a PL/SQL procedure or package containing an “insert into ... select from ...” or a “merge” statement.  The mapping would, of course, need some additional infrastructure, such as somewhere to store rows that don’t load into the target table.

 

Mappings can’t be run in just any order: what if a mapping that loaded a detail table ran before one that populated its foreign-key master?  So the next ETL layer involves coordinating the order in which mapping are run.  Mappings are organized using process flows to run in an appropriate order.  As with processes in an operating system, flows can be run in a specified sequence, flows can fork so that multiple mappings can run in parallel, or one flow can wait until another flow has finished its work.  And, of course, errors will occur from time to time, so that some error notification mechanism needs to be built into the set of process flows so that “wetware” can be alerted to the problem.

 

A data warehouse needs to be fed data at regular intervals – once a day, once a week, once a month – so someone or, preferably, something is going to have to schedule the process flows so that they run at the appropriate time.  Operating systems come with built-in or third party scheduling software, of which “cron” is perhaps the most famous example.  But OWB has its own in-built scheduling facility so that all aspects of ETL can be managed from the same software interface.

 

 

BI integration

 

Of course, there’s no point creating all these data warehouse objects and then populating them by running all those ETL processes, unless you intend to do something useful with the data.  The raison d’etre for a data warehouse is business intelligence – the only purpose served by those terabytes of data sitting there in a standardized canonical form is to feed the queries issued by BI tools.

 

BI tools have a user layer that provides a conceptual model that business users can understand and can use to formulate ad hoc queries.  But developers need to map the users’ conceptual model of data onto the database objects that actually store the data.  BI developer productivity can be increased if the BI tool that uses the data and the data warehouse that serves the data “understand” one another.  In particular, clever data warehouse design software should be able to produce some, or all, of the BI interface automatically.

 

One tool in Oracle’s BI stable is “Discoverer”, and OWB has the facility to automatically generate business definitions (corresponding to the Discoverer EUL layer) directly from warehouse metadata, and then export these definitions to Discoverer.