| Everything Oracle | Home | Everything Oracle |
![]() |
Maximizing | OBIEE | Performance | ||||
| Query Rewrite versus OBIEE Aggregation | |||||||
| Introduction |
When it comes to retrieving the data needed to satisfy a user query, OBIEE will usually have four possible sources at its disposal. In order of decreasing efficiency, these sources are as follows:
BI Presentation Services Instance Local Cache
BI Server Instance Local Cache
Data Source Summary Tables
Data Source Fact Tables
We’ll examine how OBIEE makes its choice between these different data sources with the aid of the following diagram:
br> Multi-Level Caching in OBIEE and a Back-End Data Source
Imagine that one of your OBIEE end users is viewing the results of a query in BI Answers using a table layout. Then he decides that it might be more informative to view the data in terms of a chart instead. So he selects the chart view and the BI Presentation Services instance is instructed to query the relevant data set. Now as the query result set for the chart is the same as that for the table, it’s very likely that the relevant result set will still be found in the local cache used by the instance of the BI Presentation Services allocated to the user’s session. Since BI Answers is an ad hoc query tool, users may spend a great deal of time viewing the same data in different ways, so the BI Presentation Services local caches offer a very efficient mechanism for data retrieval – first-level caches.
Now, imagine that your OBIEE end user decides to extend the scope of the query by adding another column. This time the result set may not be in the local cache of the BI Presentations Services instance. So the BI Presentation Services instance will pass the query on to the BI Server instance allocated to the user’s session, and ask it to retrieve the data. If it’s a common query, or a subset of a common query, then some other OBIEE end user who has been allocated the same BI Server instance may have issued it already. If so, then the BI Server instance may well be able to retrieve the query result set from its own local cache. So we now have another efficient mechanism for data retrieval – second-level caches.
But sometimes both OBIEE caches will return a miss. If this is the case, then the BI Server instance will have to decompose the query from the simple ODBC SQL it was sent by the BI Presentation Services instance into sub-queries native to the relevant back-end data sources to which OBIEE is connected (typically, data warehouses and OLTP databases).
Now inserts into, updates of, and deletes from the fact tables in a data source, for example “sales” and “customer enquiries” tables, will occur due to DML arriving from client applications. In the case of a data warehouse, the DML might be generated by nightly ETL processes. In the case of an OLTP database, the DML might be generated by user forms in client applications (the fact tables would be updated in real-time by database triggers).
But as fact tables may contain tens of millions of rows and the queries generated by OBIEE end users will typically involve aggregate functions, such as “sum” and “average”, a much more efficient solution to satisfying OBIEE end user queries is to add some summary tables to the data source. These summary tables will contain the pre-computed aggregate values that are needed to satisfy all the commonly occurring OBIEE queries. So when OBIEE can’t satisfy a query from its own caches, it will be almost certain that the query can be answered efficiently by using the summary tables, rather than the fact tables. So, in effect, the summary tables act as external, third-level caches for OBIEE.
Of course, on occasion, the summary tables won’t contain the answer to an OBIEE end user query, and the query will have to be run against the underlying fact tables – an operation that can prove very expensive in terms of computer resources.
While the OBIEE caches have an important role to play in ensuring good OBIEE performance, the key factor – the one you need to address first – is to ensure the presence of a good set of summary tables in each data source that is used by OBIEE. In the case of a data warehouse, the absence of a good set of summary tables will lead to OBIEE end user frustration as queries time out. In the case of an OLTP database, this frustration will extend to the OLTP end users, as the OLTP machine’s disk I/O bandwidth and CPU are hogged by expensive OBIEE queries.
If yours is a large organization with different divisions or subsidiaries and your data sources involve OLTP databases, then you can expect a “tug of war” with those divisions or subsidiaries that are hosting the back-end OLTP databases. The IT managers responsible for these databases won’t thank you for the poor performance attributable to your OBIEE queries. In the absence of summary tables, this could mean that they divert incoming OBIEE queries to a low priority queue, causing OBIEE to grind to a halt. And if summary tables are to be put in place to resolve the performance problem, the issues of who is to pay for creating them and who will maintain them will very soon appear on your “radar”. So, it’s best to address these issues well before OBIEE goes into production.
Having identified summary tables as the key factor in OBIEE performance, for the remainder of this article we’ll focus on the different options for creating summary tables and for making sure that the data they contain is refreshed in an efficient manner.
| Data Source Query Rewrite |
By far the best solution for implementing summary tables is to use query rewrite. Most modern databases have some form of query rewrite. Oracle has had a very elegant solution, using materialized views and materialized view logs, since version 9i of the RDBMS. If you’re using OBIEE then it’s very likely that at least some of your back-end data sources are Oracle data warehouses or OLTP databases, so we’ll look at how Oracle’s implementation of query rewrite works, and at the advantages it has to offer.
br> Using Oracle's Query Rewrite to implement Summary Tables
In the case of Oracle, summary tables are called materialized views. In the diagram above the focus of the incoming queries is a fact table, called “sales”, and an associated materialized view, called “total_sales”.
Before we look at query rewrite, let’s examine another relevant issue: with any kind of summary table it is important to minimize the amount of time and resources that are expended in keeping the contents of the summary table up to date. Suppose that your “sales” fact table contains 10 million rows and the corresponding materialized view, “total_sales”, contains 10 rows. Assume that the nightly ETL process modifies 1000 rows in the fact table. If you have to, for example, sum over selected column values from all 10 million rows in order to re-compute the column values for rows in the materialized view, then this process will be expensive, especially when you have many materialized views to update. An elegant solution to this problem is to keep a log of the changes made to the fact table during the ETL process updates – these changes are stored in separate tables, the materialized view logs. To re-compute the column values for rows in the materialized view will now only take the manipulation of 1000 rows. For example, if a materialized view column contains a sum, then its new value can be computed by adding inserts, adding the difference between new and old values in the case of updates, and subtracting deletes. This “fast refresh” mechanism will usually be possible for all the most commonly used aggregates that are stored in materialized views, and will be of considerable assistance in minimizing scheduled downtime.
Now, let’s examine how query rewrite works. Let’s take a simple query issued by one of the BI Server instances:
select sum( amount ) from sales;
The query rewrite mechanism that is build into the Oracle RDBMS will examine the query and determine if it can be satisfied by transforming the query into one that can be run against a materialized view instead of the fact table, “sales”, to which the query is addressed. In this case, the query can be transformed into:
select total_amount from total_sales;
so instead of the original query being sent to the fact table (on the left in the diagram) it is sent to the materialized view (on the right in the diagram).
Let’s look at the advantages that Oracle’s implementation of query rewrite brings. Firstly, OBIEE doesn’t need to know about the existence of table “total_sales”. OBIEE will always issues selects against the table “sales”, and query rewrite will use the fact table, “sales”, or the materialized view, “total_sales”, as appropriate. Because the presence of query rewrite in the database is entirely opaque to OBIEE, the Repository used by OBIEE (which maps the physical schemas of back-end data sources on to the common presentation schema seen by the BI presentation Services) is simplified.
br> Data Source Query Rewrite is available to all Client Applications
Secondly, as illustrated in the diagram above, the benefits of query rewrite extend not just to OBIEE, but to any client application that wishes to access the Oracle database – and in a large organization there may be many such client applications.
Thirdly, note that access to the Oracle database from its clients is on a read-only basis – maintenance of the materialized views and materialized view logs is carried out as part of maintaining the back-end Oracle database. This read-only access requirement brings with it fewer security concerns – an issue that will be particularly important if your business intelligence headquarters is based in, say, New York, and your back-end databases are based in London, Singapore, and Melbourne.
And finally, as we’ve discussed above, the “fast refresh” facility ensures that the materialized views can be kept up to date with a minimum amount of scheduled downtime.
So, the arguments in favour of data source query rewrite are as follows:
Simple interface that serves multiple clients,
Read-only access needed to data source,
Fast refresh minimizes scheduled downtime, and
Simplified OBIEE repository.
| OBIEE BI Server Aggregation |
There is, however, another method of implementing summary tables in a data source. Effectively, this method involves moving the query rewrite functionality from the client data source into OBIEE itself. The following diagram illustrates the principles involved:
br> Using OBIEE Aggregation to implement Summary Tables
This time the summary tables are called “aggregate tables”, and as you can see from the diagram we still have the same fact tables and summary tables as before.
Now, however, OBIEE needs to be aware of the existence of the aggregate tables as well as of the underlying fact tables. The repository used by the BI Server instances is more complex since references to the aggregate tables are maintained in both the physical and logical layers within the repository. When a BI Server instance receives a query from the BI Presentation Services it decides whether the query can be satisfied from the aggregate tables or from the fact tables. So sometimes the BI server instance will issue SQL that is directed towards the fact tables, and sometimes SQL that is directed towards the aggregate tables.
Another weakness of moving the query rewrite functionality from the database into OBIEE is that no other client application of the database can make use of query rewrite, which is now embedded within the OBIEE application. Other client applications would also have to know about the existence of the aggregate tables and would have to have an inbuilt mechanism to decide when to use them if they are to achieve the same performance gains available to OBIEE.
Now we need to look at how the aggregate tables are refreshed. A batch job written in the logical SQL understood by OBIEE is run periodically by the active BI Scheduler. This job creates native physical SQL that is dispatched to the back-end database. This SQL does three things; (1) it deletes the aggregate tables; (2) it recreates the aggregate tables; and (3) it re-computes the contents of the aggregate tables. Further disadvantages now become clear. Whereas Oracle’s query rewrite offers a “fast, incremental refresh”, OBIEE aggregation offers a “slow, full refresh”. OBIEE has no alternative to a slow refresh since it doesn’t know what rows have, or have not, been changed in the underlying fact tables. This “slow refresh” can lead to a substantial scheduled downtime.
We also have a security concern that was not present in the case of database query rewrite: OBIEE now needs to have read/write access to the back-end database. Unless the database is collocated with OBIEE in the same machine room, you’ll need to ensure that a hacker does not impersonate OBIEE and trash your back-end database. For remote back-end databases you will, of course, have a PKI infrastructure in place to verify the credentials of incoming requests before establishing an SSL connection. However, security is always best implemented by relying on independent mechanisms, so that if one is breached, another may still keep an adversary at bay.
The next disadvantage of OBIEE aggregation is that whereas Oracle’s fast refresh can take place in real time, OBIEE’s slow refresh can only be run at much longer intervals – typically once a day, given its “resource hungry” nature.
Yet another disadvantage relates to timing. If, say, your daily ETL process is due to finish at 4:00 in the morning you could schedule the OBIEE batch job to start at 4:15, but what if the ETL update is running late for some reason?
So, all the advantages that are present in Oracle’s query rewrite solution are lost if we adopt OBIEE aggregation. Instead, we have:
A complex interface that only serves OBIEE,
Read-write access needed to the data source,
A slow refresh leading to substantial scheduled downtime,
A complex OBIEE repository,
Real-time updates to the aggregate tables are not possible, and
Timing issues between the BI Scheduler and the ETL processes.
| OBIEE Aggregation Variants |
There is a workaround for the timing issues between the BI Scheduler and the ETL Processes. Once the ETL processes running in the data source have completed, the initiating batch job can trigger the BI Scheduler programmatically within OBIEE. The BI Scheduler can then read its aggregation batch job, can generate the physical SQL, and can then pass the SQL back to be run inside the data source.
An alternative to running the batch update from OBIEE is to capture the physical SQL generated by the BI Scheduler and to run it directly from within the data source. The disadvantage of this approach is that it couples OBIEE and the data source even more tightly together, in that the physical SQL may have to be updated whenever the OBIEE repository is updated.
If summary tables and a refresh mechanism already exist in the data source, then it’s possible to manually create the corresponding metadata in the OBIEE repository, though this is a rather time consuming task, and again strongly couples the two systems together, making maintenance more complex.
In general, if you decide to use OBIEE Aggregation, it’s best to automate the aggregation process by using the “Aggregate Persistence Wizard” within the OBIEE Administration Tool that is used to maintain the repository.
| When OBIEE Aggregation is Warranted |
While OBIEE Aggregation is an ugly solution to improving OBIEE performance, there are certain circumstances in which it may be justified. Its one saving grace – and this is far from being insignificant – is that by using the “Aggregate Persistence Wizard” OBIEE aggregation can be set up relatively quickly:
If a back end data source supports basic SQL, but does not support query rewrite, then any summary table solution that you implement would
still be difficult for clients other than OBIEE to use, so there are fewer disadvantages to OBIEE aggregation.
If a back end data source supports query rewrite, then OBIEE aggregation will probably be faster to set up than implementing query rewrite
within the back-end database (though your database may well have an automated mechanism for creating summary tables; for example, the SQL
Access Advisor in Oracle 10g/11g can automatically create the materialized views and materialized view logs associated with a set of dimension
tables).
So, if you’re in a “tearing hurry” to get that new OBIEE application off the ground – a circumstance not unknown in the annals of IT projects – then it might just be worth your while to use OBIEE aggregation as a “cheap-n-cheerful” method to get the system ready by the “go live” date. Then you could add the more sophisticated query rewrite functionality native to the data sources during the maintenance phase of the project.
| Everything Oracle | Home | Everything Oracle |
Copyright © 2009 PWG Consulting, All Rights Reserved
