Although I've written many articles on business intelligence (BI) specific to the System i community, it seems that the topic constantly evolves over time, with a never-ending stream of new things popping up. So once again, I'd like to share some of the happenings around BI for i! But first, let's get the new naming conventions out of the way early: i5/OS is now IBM i, and DB2 for i5/OS, DB2/400, and DB2 UDB for iSeries can all be referred to as DB2 for i.
One thing that's clear about BI implementations in the System i community is that there are no one-size-fits-all solutions. Many of you probably struggle with simply getting data to end users in a format they are familiar with. Query/400 was a great tool for its day, but using it only as a means to an end (e.g., building extracts to then import the data into other tools), or maintaining tens or even hundreds of Query/400 reports, are productivity killers.
For some, expedient data delivery and operational reporting through dashboards or spreadsheet views of data are paramount to improving productivity. For many, quenching end users' thirst for data while removing any impact on production systems is critical. You might want to consider operational data stores (mirrored images of production databases, but with more of a historical view), data marts (subject-oriented, often summarized, reporting repositories), and enterprise data warehouses (the holy grail of the BI infrastructure) to meet your goals. Wherever you fit into the BI architectural spectrum, the DB2 for i 6.1 and DB2 Web Query enhancements can help simplify your life.
DB2 for i 6.1 delivers a number of enhancements to the BI world. Over the past eight years, IBM Rochester has invested significantly in a total redesign of its query optimization technology. A key component of this investment is the SQL Query Engine (SQE). This more efficient optimization component of DB2 provides benefits such as improved performance and better monitoring and analysis tools. The earlier optimization technology, called the Classic Query Engine (CQE), is still in use today because it is required for non-SQL-based tools like Query/400 and others that use proprietary database APIs.
In 6.1, the SQE optimizer adds more self-learning and self-adapting capabilities. Two major aspects of these capabilities are the ability to adjust for poor user settings on the query, and the ability to adjust the query on the fly based on changing I/O characteristics.
For example, in 6.1 DB2 can re-optimize an access plan on the fly by recognizing that the actual query usage is different from the specified request to DB2. For instance, ODBC queries will by default specify a "first I/O" setting that suggests that the application wants to see only a few rows at a time. But if the nature of the query is to really return all the records, DB2 recognizes what is going on and re-optimizes the plan to assume all records are retrieved, improving the efficiency of the query execution.
Additionally, the optimizer can make changes that affect the performance of queries based on monitoring paging characteristics during query execution. For example, the first time a query is run, it is assumed that data and indexes are not in memory and subsequently have to be read from the disk before processing can occur. But if the optimizer finds that this query is run frequently and the data/indexes are usually resident in memory, the access plan can be rebuilt to optimize for this fact. Once again, this can result in significantly better performance.
Derived Key Index support in 6.1 adds performance benefits by letting you create an SQL index based on an expression rather than being limited to a simple column. With a derived key, you can create an index to match more complex expressions in queries, thereby simplifying query development without sacrificing performance. For example, if the query is attempting a case-insensitive search such as WHERE UPPER(NAME) = 'MY FAVORITE CUSTOMER', you can create an SQL index with a key of UPPER(NAME). This lets the optimizer match the index to the query and use the index in a fast search. The use of the index can significantly reduce the number of rows read from the queried tables, resulting in much better query processing time.
A watershed enhancement in SQE is the removal of limitations that prevented it from being used globally. Now, SQE 6.1 isn't confined to the limits of the national language sort sequence or prior translation restriction, so more IBM i clients can take advantage of SQL and SQE. Additionally, SQE now supports user-defined functions, and the default settings ignore the select/omit logic. In short, all "pure" SQL environments now completely utilize SQE. SQE has been the strategic query optimizer for several releases now, and it continues to be enhanced so more and more applications can reap its benefits.
When you upgrade from Query/400 to DB2 Web Query, you are entitled to the BASE components of the product, which consist of all the IBM i server components and a certain number of user licenses. Several web-based authoring tools and the ability to import Query/400 definitions let you modernize your Query/400 environment with new functions such as traffic lighting, HTML output, direct-to-spreadsheet output, charting and graphing, and boardroom-quality PDF output.
But DB2 Web Query also offers the ability to leap into true BI. In other words, although the BASE product is a web-based (but much more robust) version of Query/400, the additional features you can add to DB2 Web Query let you take huge steps toward simplifying the reporting environment for both IT and end users. For example, a common practice in BI is to leverage a metadata layer to describe the database to report authors and end users, so they don't have to be database experts to build or execute a report.
A great example is date decomposition. Date and time elements are a vital element to most reports. However, date formats are often stored differently in different source systems. Some dates are stored as eight characters (e.g., MMDDYYYY), some are stored in Julian format, and some might be stored in decomposed formats in the source system (separate fields for year, month, and date) and must be pulled together for the reporting tool. There is certainly a lot of work involved if every report author has to interpret these different data formats and put them into commonly referenced buckets (such as DAY/WEEK/MONTH/QTR).
Enter the metadata layer! The metadata layer lets you define date formats and create virtual fields that report authors can leverage to simplify the inclusion of date elements in reports. In addition, you get reporting features such as advanced date/time manipulation and calculations, calendar widgets for a more user-friendly date-selection interface, and date buckets that you can roll up at weekly, yearly, and quarterly levels. Metadata has several benefits, including
These simple examples only scratch the surface of the benefits of the metadata layer; it is an extremely important aspect in moving you from Query/400 into true BI.
Recently, IBM introduced the DB2 Web Query Report Broker to let customers and software developers extend their DB2 reporting environments. With this product, automated report distribution simplifies scheduled reporting with intelligent delivery of reports to end users through email. The Report Broker lets you
The Software Development Kit (SDK) is a set of web services that helps applications integrate DB2 Web Query functions more seamlessly. With the SDK, software developers can integrate DB2 Web Query reporting into their web applications or customize their own web interface into the reporting environment. With the SDK, applications can
For more information about these new products, be sure to keep an eye on ibm.com/systems/i/db2/webquery.
One exciting happening with BI on i is the number of third-party tool providers that are extending their products to provide further productivity or management capabilities around DB2 and DB2 Web Query.
Earlier I mentioned the benefits of a metadata layer; to reach that nirvana, some work has to be done to create the metadata. Many customers I've talked to haven't implemented referential integrity in DB2 the relationships between tables and files aren't defined anywhere other than within RPG code. Plowing through RPG code to define and document those relationships won't happen overnight. Enter Databorough's x-WebQuery tool (databorough.com/products/xquery.html). This product, based on the company's x-Analysis application modernization solution, can automate the process of reverse-engineering a data model from RPG/DDS source, then auto-populate DB2 Web Query metadata. This feature can significantly reduce the time it takes to define metadata so that you can start seeing the benefits of a "real" BI solution quickly. Instant OLAP is one way to describe the benefits of using x-Query with DB2 Web Query.
Also, many BI implementers move data to another partition or server to minimize (or eliminate) the impact of long-running queries against the production servers. Extract/transform/load (ETL) tools are available to manage this process. Information Builder's Data Migrator (informationbuilders.com) and Coglin Mill's Rodin edition for DB2 Web Query (coglinmill.com) provide GUI interfaces to schedule the movement and transformation of data into the reporting repository, while also automatically creating and updating DB2 Web Query metadata.
I expect additional adapters to appear over time to provide metadata for a variety of data sources (e.g., for a specific ERP system or Domino/Notes databases).
One of the challenges of selecting a data warehouse or data mart platform has been the complexity of putting together the right server with the operating system, database management system, and end-user and ETL tools. Historically, these were all potentially separate decisions, with the different components often coming from multiple vendors performance assurances were guesses at best. The notion of an "appliance" has evolved to simplify this selection process, integrate the various parts to lower the total cost of ownership and management complexities, and provide performance assurances through fixed configurations and a balancing of processor, memory, and I/O resources to meet the demands of a heavy query environment.
Interestingly, the AS/400 was founded on the idea of an application appliance, as everything was built in and simple to manage. With the additional enhancements to DB2 and DB2 Web Query, a Power Server running IBM i has effectively morphed into a BI appliance as well, boasting a tightly integrated set of hardware, OS, database, and BI tooling.
Furthermore, solutions have popped up in the industry that address sluggish query performance in a data warehouse environment. Cubing technology, columnar databases, or other highly indexed/memory resident proprietary database/hardware combinations are being touted to address high performance analytics outside traditional SQL/relational database performance. Although the performance benefits definitely exist, the implementation of an accelerator usually means moving data out of the relational database and into a specialized database with special properties specifically for high performance in a query environment. But it doesn't have to be that way.
IBM introduced encoded vector indexes (EVIs) back in OS/400 V4. This technology is effectively a BI accelerator that doesn't require you to move any data into a specialized database. If you are not familiar with EVIs, you should be; queries that used to take hours can be reduced to minutes with this technology! You can find guidance on the use of EVIs (Click Here).
Recently I had the opportunity to put myself in the business analyst role. As the product marketing manager for BI on Power Systems, I was very interested in understanding some of the metrics (key performance indicators) around DB2 Web Query orders. How many orders have we had to date? What is the breakdown by country? What features are being ordered? How many customers are ordering the Run Time User Enablement feature? The need for a BI solution became very clear, very fast.
The questions seemed simple. But to get answers, we (I enlisted one of my colleagues from our Lab Services group to assist) first had to understand where the data to answer these questions would come from. Unfortunately, we didn't have a single ERP system on one server with consistent data standards, clean data, and historical detail-level data (those of you who do stop complaining now!). To collect data about orders, we had to pull data from five different systems. Four of those systems had pretty consistent data elements (e.g., ORDER ID across all four were unique), yet transactional data was persistent in those source systems for only three months, so we had to find another way to get detail-level data for the several months since we started shipping the product. Additionally, the fifth data source was not consistent with data elements, so we had duplicate order IDs that could be differentiated only by customer ID.
We overcame this challenge by using surrogate keys. The surrogate key is a generated unique key that is an added column to your data warehouse table(s) that ensures you can view the original order ID.
There were many more challenges with this small BI project. We had a sixth data source from which we had to obtain data on IBM support contracts then correlate them with order IDs through customer IDs. Realtime order analysis was out of the question. Lastly, dates were not in consistent forms in the different source databases, so determining things such as how many orders of DB2 Web Query occurred per week over the last eight weeks required a transformation or metadata layer to deconstruct/reconstruct dates in forms that I could use to easily answer a time-related query.
I am happy to report that we have put our money where our mouths are and built a data mart using DB2, DB2 Web Query, and Report Broker for automated distribution, all running on a Power Systems server with IBM i 6.1! Since then, I have time to write articles, as my life has gotten extremely less strenuous. I now spend less than 5 percent of my time worrying about gathering data!
Doug Mack is an IBM Power Systems marketing manager focusing on BI. Doug's 28-year career with IBM has spanned many roles, including S/38 and AS/400 developer, IBM AS/400 systems engineer, and business intelligence technical sales consultant.