Information, Intelligence, Intellectual

Sunday, 26 August 2012

A chart is worth a thousand words - only when it's used appropriately (Part 2)

Charts are informative and widely adopted in Business Intelligence. However, sometime they are misused and generates no business insights but confusion. In my last post, we've discussed about Line Chart and Pie Chart. Today let's continue the same topic by examining map, gauge and Trellis.

Map Chart

Maps have been used to visualize information since the early phase of human civilization. Now in the recent years, thanks to the digitalization of maps more and more quantitative analysis of location related information can be performed on top of map. Furthermore, with powerful spatial systems such as Oracle Spatial, the business intelligence users can bring the spatial analysis to another higher level by invoking advance spatial functions such as topological operators or distance operators. In one of my previous posts, I briefly went through how spatial analysis can be achieved with OBIEE's integration with Oracle MapViewer, Oracle Spatial and digital map services such as Navteq or Google.

However, just because you can now display quantitative data on a map doesn't mean you should in every case. the location of something or its proximity to something else is not always useful information for analysis. But in some cases, it is critical to display data on a map.

In the example below, we color coat sales volume for individual stores on a floor-plan map. Intuitively, the correlations between store value and environment factors are revealed. How significantly is a food court attracting customers for stores nearby? Is it true that shops next to lifts will always have more customer visits than those far from lifts? Is that necessary to have baby feeding room on each floor of the shopping mall? With conventional views such as pivot table or bar chart, none of these can be easily presented.

Business Intelligence on Floorplan

In contrast, placing pie chart to indicate the state of sales for different products in regions on a map of the United States as shown in the next example adds no value. Firstly it's nearly impossible to tell the differences between products on a tiny pie chart, secondly a simple table with sales group by product and region would have provided the information more directly.

Pie chart is not helpful on Map

Gauge Chart

Gauge is normally used to show a single data value of key performance indicators (KPIs). Due to its compact size, a gauge is often more effective than a graph for displaying a single data value. Gauges identify problems in data. A gauge usually plots one data point with an indication of whether that point falls in an acceptable or unacceptable range. Thus, gauges are useful for showing performance against goals.

Depending on the data in the analysis, a gauge view might consist of multiple gauges in a gauge set. For example, if you create a gauge view to show the sales data for the last twelve months, the gauge view consists of twelve gauges, one for each month. If you create one to show the total sales in the US, then the gauge view consists of one gauge.

Gauge is the best candidate for displaying KPIs

However, if the objective is not for comparing against goal that's broken down into ranges, but for seeing the precise differences between values, Bar Chart is still a better choice.

Trellis Chart

It is often helpful to divide the data set we wish to examine into multiple graphs, either because we can't display everything in a single graph without restoring to a 3D display, which would be difficult to decipher, or because placing all the information in a single graph would make it too cluttered to read. By splitting the data into multiple graphs that appears on the screen at the same time in close proximity to one another, we can examine the data in any one graph more easily, and we can compare values and patterns among graphs with relative ease. This is referred by people as Trellis Chart.

The trellis chart, which was introduced in recent OBIEE 11.1.1.6.2 BP1 release, is the same as a pivot table—with one major exception: the data cells within the trellis contain graphs. Whereas a stand-alone graph type such as a single bar graph or a single scatter graph works on its own, the trellis graph works only by displaying a grid of nested graphs, known as inner graphs. So a bar-graph trellis view is actually comprised of multiple bar graphs.

Let's have a look at an example. We need to analyze the trend of monthly net cost of sale by product by region. Giving only 8 different offices and 4 products, we will will have the combination 32 lines stretching along the X-axis of time dimension. Obviously the chart will be too dense to read if everything is placed on one graph. But even if we break it into sections by products, we will end up with 4 clumsy line charts which won't be fit on one web page and not very readable individually.

Information overflowing on line chart

On the other hand, if we use Trellis Chart to present the same amount of information, by separating each region and product, we can now compare the correlation pattern formed by each group and more easily spot the differences. Trend lines for net sale costs for 18 months, 8 offices and 4 products are clearly visualized on one pivot table which can be examined by users with no scrolling through multiple pages.

Trellis Chart

In summary, the ultimate objective of BI visualization is for helping people to see, understand and digest information. As data visualization tool, graphical charts will only fulfill their functionalists if you can choose them wisely for different underlying data and business purposes.

Sunday, 6 May 2012

A chart is worth a thousand words - only when it's used appropriately (Part 1)

People love charts and graphs. They are intuitive, less intimidating (compare with big fat spreadsheet) and more informative. However, every now and then, I've seen people force data into unsuitable charts, not for revealing business insight purpose but for just having some color and graph on their BI reports. In those cases charts provide no value add but duplication or confusion, in terms of analysis. In this post, I will examine a few OBIEE chart types which I believe are important but sometime not used in the most appropriate way by end users.

Line Chart

Lines work better than any other means to make visible the sequential flow of values as they have changed with the passage of time. So, Line Chart will be the best choice if your objective is to see how quantitative values have changed during continuous period of time (in another words time series analysis). Time series analysis majorly reveals patterns such as Trend, Variability, Rate of Change, Co-variation, Cycles etc. Today I will do a bit drilling on Trend and Rate of Change.

A trend is the overall tendency of a series of values to increase, decrease or remain relatively stable during a particular period of time. For example, it is common to refer to sales revenue during a 12 months period as trending upward, downward or remaining flat. Trends are often obvious from the general slope of a line, but when the line moves both up and down throughout the period, the overall trend might be difficult to determine based on the appearance of the line alone. At such times, you should consider to use moving average function on the measure for Y axis so that a more obvious trend line can be visualized on the chart.

On the example chart below, the weekly numbers of orders for year 2008 are fluctuating therefore it is not easy to tell the trend from blue spiky line. But when I add Moving Average result of order number as derived measure and display it as red line on the chart, a much more obvious Bell Curve reveals the trend for my customer's order pattern.

Trend Line

Moving Average function in OBIEE Answers

The Rate of Change from one value to the next can be directly expressed as the percentage difference between the two. It is often enlightening to view change in this manner, especially when comparing multiple series of values such as sales per region. However, when the measure values being compared are in different order of magnitude, the slope of lines will become misleading and may make people draw wrong conclusion for the comparison between rates of change. For an instance, in a graph with a standard linear scale, the slope of a line that increased from $1000 to $1100 is less steep than on that increase from $10000 to $11000. Although the rate of change for both are 10%, our eyes will make us believe the growth rate for the second measure($10000) is quick than that for the first one($1000). In this case, we can create the line chart with a logarithmic scale. Using a lot scale, the rate of change will appear as accurate slope, no matter how much the actual values are or how great the difference between them.

Look at the example below, this line chart shows the comparison of Rate of Change between revenue and billed quantity of product for the first half of year 2008. When the line char is drawn with default scale, people will easily jump into the conclusion that the growth rate of revenue is much faster than that of product billed quantity.

Rate of Change Default Scale

However, when I use Logarithmic scale to display same underlying data, the graph contains two lines that exhibit precisely the same visual patterns and slopes, which reveals that the rate of change for revenue and billed quantity are the same.

Logarithmic Scale in OBIEE Answers

Rate of Change Log Scale

Pie Chart

Pie chart is also wildly adopted in the reporting world. Every year when I receive the government spending report for my local city council, pie chart is everywhere. Pie char is commonly used to display part-to-whole relationships. Without need to move eye sight around, the report reader can quickly identify which group contributes/consumes the largest/smallest slice of the pie. Look at the example below, this chart shows the operational costs associated with customers from different industries. Just at a glance, I can reach the conclusion that among all the customers, Government generates more than a quarter of the total cost while Distribution only contributes less than 5% of it.

Pie Chart

Pie chart is very intuitive in terms of revealing the top and bottom player in a whole. However, Pie Chart still has a deadly defect which make it very confusing under some circumstances. Let's look at the same example chart and ask this question: what is the difference between categories "Commercial", "Industrial" and "High Tech"? It's not easy for an ordinary users isn't it? The user needs to compare the 2D areas or the angles formed by the slices, but human being's visual perception doesn't accurately support either of these tasks very well. On the other hand, the difference between those three "close competitors" will be much more obvious if the same information was put on a Bar Chart, as we are normally good at comparing 1D lengths.

What if I want to not only leverage the advantage of Pie Chart (quickly reveals the part-to-whole relationship for "winner" and "loser") but also minimize the defect of it (challenges human being's capability of identify insignificant differences on 2D areas or angles)? A proper setting of data label on Pie Chart can probably help achieve what we want.

Data Label setting in OBIEE Answers

Pie Chart with percentage data label

In the next coming one, I will discuss about the features, tips and best practice of another three OBIEE chart types. Please stay tuned.

Sunday, 29 April 2012

Endeca - upgrade release and learning material from Oracle

It has been about 3 months since the finish of Oracle's Endeca acquisition. Last week, a new version was released: Endeca Information Discovery (EID) 2.3. This new release comes with enhancements for three major parts of EID - Latitude Data Integrator, Endeca Server and Endeca Studio.

For Endeca Latitude Data Integrator, the enhancements are focus on the ease of use for developer, and text enrichment with Sentiment.

Endeca Server, a new Java Server are introduced to simplify process management. As a single access point for query, Endeca Server manages named data stores (previously known as "DGraph"). Endeca Server also provides management web services to create, attach/detach, start/stop and list data stores.

For Endeca Studio, a new feature called EID View Model has been added. EID View Model defines a data set which include a set of attributes such as FullName(Dimension), Gender(Dimension), Sales(fact), etc. Then based on created EID View Model, business users can easily create variant types of chart without writing a single line of LQL (Latitude Query Language - for query data in MDEX engine). In addition, after been taken off the product after the acquisition, the Map functionality is back with EID 2.3 now. It's fully integrated with Oracle MapViewer and GeoSpatial text search is supported.

EID Map

At last but not least, the product team of Endeca has established a Youtube Channel for sharing all the exciting things about Endeca, from product overview to hands-on demonstration. If you want to learn Endeca Information Discovery, the game changing Agile BI product from Oracle, please don't hesitate to subscribe to this Youtube Channel.

Monday, 27 February 2012

Discovery channel - provided by Endeca Studio

Besides a unique way of organizing data, Endeca also provides an innovative interface for exploring and discovering information. Today let's have a closer look at the end user layer of EID - Endeca Studio.

Endeca Studio is a interactive, component-based environment for building analytic applications powered by MDEX Engine. It is built on web-based Liferay infrastructure that enable building analytic applications delivered through the use of Web browsers.

In Edeca Studio, a login user can have access to a number of Endeca analytic applications. Each analytic application may have multiple tabs and read data from more than one DGraphs. Each application tab contains variant components such as search box, chart, table, guided navigation, tag cloud, etc. The example here is an Endeca Studio application built for a consulting company. The initial view of this application shows overview information about staffs from different perspective - number break down for different categories, average billing rate, detail information for each individual and so on.

Suppose I am a manager who needs to find consultants for an urgent Siebel CRM project. I can start my journey of information discovery with "navigation". After clicking on "Now" and "Siebel -CRM" on the navigation panel, the system immediately responds by returning the smaller list of consultants who match those criteria, and by dynamically summarizing this smaller list of records and updating all metrics and analytics as well.

I can further refine my navigation result by choosing more navigation categories.

French language is also one of the requirements for this project. From the short-listed 21 staffs, I need to find people who can use French. However, the language competency is not available as a navigation category or table column so that I can filter on. It could be because the company has never captured that information in the HR system or other structured database systems. Individuals normally keep language competency in their resume files. Remember that Endeca MDEX engine is capable of consolidating both structured data and unstructured data. I can perform a search in this Endeca application against my MDEX engine. After correcting the spelling mistake automatically, Endeca finds me two consultants who can use French language and also meet all the other criteria.

To verify the validity of search result, I can choose to view the detail information for one of my candidates

If I click on the PDF icon, I can open the PDF file for that person's resume. As we can see, "Fluent in French" is one of the personal competencies that he included in his resume.

So, unlike those traditional BI operations such as choosing columns, defining filters, combining subject areas, it only took me a few navigation clicks and search to quickly find information that I need in Endeca Studio. More importantly, data stored in unstructured format was also not left out in the decision making process.

Sunday, 12 February 2012

Under the bonnet of Endeca

Endeca Information Discovery (EID) grants BI users agility of querying, navigating and searching across structured, semi-structured and unstructured data. The backbone of all EID applications is MDEX Engine. It stores data and receive requests via Endeca Web Services. After the execution of query request, MDEX Engine will return result to Endeca Web Services in XML format. Then front-end application in Endeca Studio performs formatting of the query result and return them to the client browser.

Today let's open the bonnet of EID and have a look at the nuts and bolts inside MDEX Engine.

Firstly of all, everything is running in "Dgraph" which is the term for the process of MDEX Engine. Data in MDEX Engine won't be accessible without a related running Dgraph. Relevant Data from variant source will be extracted, transformed and loaded into MDEX Engine. Compare to traditional RDBMS or OLAP Cubes, MDEX Engine structure its data in a different way.

The data model in the MDEX Engine consists of records and attributes.

Records are the fundamental units of data.
Attributes are the fundamental units of a record schema which describes the data model of Records.

For a data record, an assignment on an attribute (also known as key value pairs) provides information about that record. For example, for a list of bike records, an assignment on the "Category" attribute contains the category description (e.g. mountain) of the bike record. Each attribute is identified by a unique name.

Each attribute on a data record is itself represented by a record that describes this attribute. Following the bike records example, there is a record that describes the "Category" attribute. A collection of these records that describe attributes forms a schema for your records. The aspects of the attribute on a data record are configured in the schema. For example, an attribute on any data record can be searchable or not.

Let's have a look at an example which may help you to digest these concepts.

In an MDEX Engine which stores bike information, a typical Data Record will be like below:

TxnID = 12324
ProductID = 506
Category = Mountain Bike
Amount = $499.99
Suspension = Fox 32 F-Series
FrameType = Aluminium
Saddle = Bontrager SSR
Mountain Accessories = Fork and shock sag meter
Mountain Accessories = Water Bottle
Review = A great bike for off road. Smooth ride over the bumps
ReviewSentiment = Positive
ReviewTerm = Great
ReviewTerm = Off Road
ReviewTerm = Smooth
ReviewTerm = Bumps

In each line of this data record, Attribute is the part that is on the left-hand side of equation symbol. Attribute may be single-assign or multi-assign. In this example, attributes such as "TxnID" and "ProductID" are single-assign while attribute "Mountain Accessories" is multi-assign. In the MDEX Engine data model, Primary Keys (also known as Record Specs) are used to uniquely identify records.

The System Record that describes the Attribute “Category” may look like:

Name = Category
Type = String
Display Name = Category
Searchable = Yes
Sort = Ascendant

The collection of system records is called Schema.

In MDEX Engine, data records are not necessary to be stored in a conformed container. Null value key pair such as "AttributeName = Null" are not allowed. For example, as source data, if a relational database record has NULL value for column "Suspension", when it's loaded into MDEX Engine, a new MDEX data record will be inserted but no Attribute "Suspension" will be created for that record. So, it's not unusual have "Jagged records" like below exist in MDEX Engine, though they are describing the same business entity.

Data Records in MDEX Engine may be loaded from structured, semi-structured or unstructured data sources.

For structured data, each Tuple becomes a Data Record and each column (except for the columns with NULL value) becomes an Attribute.

Semi-Structured data is normally from enterprise applications, HTTP feeds, XML sources, etc. It will also be loaded as attribute/value pairs.This is a common cause of "jagged" record structure.

As the key differentiator, EID extends BI analysis to unstructured data such as text documents or social data. In MDEX Engine, unstructured data can be stored as their own records for "side-by-side" analysis. Or, they can be linked to existing data records by any available key.

Any unstructured attribute can be enriched using text analytics to expand the structure of its containing record. Common techniques include but are not limited to Automatic tagging, Named entity extraction ,Sentiment analysis ,Term extraction.

Beyond all these data records which consolidate information from database, XML document, Facebook, etc, MDEX Engine also creates hierarchy/relationship graphs, indexes for the attributes and attribute values. Those graphs and indexes are so important that information discovery can not be performed effectively and efficiently on MDEX data records without them.

In summary, MDEX Engine of Endeca stores information in data records as series of Attribute/Value pairs. Data Records can be structured differently with each other. With patented mechanisms of managing navigation graph for attribute relationships and hierarchies, users can quickly navigate through different attributes, search for keywords, or create queries as a more conventional approach. With MDEX Engine, no data is left behind.

Until next time, stay intelligent, stay agile.

Sunday, 5 February 2012

Agility Acquired

Together with well-designed ODS (operational data storage) or Data Warehouse, OBIEE is a comprehensive, reliable and scalable BI solution. Users get information in variant ways, operational reporting, dashboard, scorecard, ad-hoc analysis, what-if analysis, proactive alerting, mobile, etc. It can grow quickly and smoothly . From Gigabyte to Terabyte, from single server to cluster, from disk to In-Memory, hundreds or thousands users can access business information concurrently.

But what about Agility? Can I quickly perform data navigation without going through the modeling practice in DW or BI Server? What if my data is volatile and includes unstructured or semi-structured information? Is that possible to have analysis via search? All these request for quick and "good enough" analysis from business have been giving OBIEE hard time. Now, these challenges can gracefully addressed by a new member of Oracle BI : Endeca Information Discovery. It will complement Oracle's BI solution by providing agile data discovery on structured and unstructed information.

Endeca Information Discovery(EID) helps organization quickly explore all relevant data. You may have sales transactions from OLTP database, departmental forecast data from Excel files, customer survey result in word documents and product review articles on public websites/forums. Traditionally you have to model those data into relational star schema or multi-dimensional cubes before start to create reports and dashboards to answer provided questions from business. Some valuable information may not be included in analysis because the underlying data is unstructured and too hard to be modeled.

With EID, user can quickly consolidate data and perform data discovery in the style of navigating and searching. With no need to carefully create the logical and physical model, the MDEX engine (data storage of EID) enable users to centralize different information together, structured and unstructured, while keeping association between them. Then from Endeca Studio (browser based end-user layer), users can simply explore the data by searching key words or clicking through different attributes (think them as columns in dimensions), or create reports and charts in old ways. EID helps business to reveal answers to questions like "What is the sales revenue of my Top 5 products that my customers describe online with certain key words such as green, economic, etc?" "What are the other most contributing attributes such as "product color", "customer demography" for those Top 5 products?"

Below is the architecture of Endeca Information Discovery:

At a glance, it looks similar to the structure of traditional OBIEE. However, there are major differences from end-user's perspective. Unlike creating reports via choosing columns from tables in subject areas, EID users are able to quickly explore the data with the combination of traditional and agile approaches:

Endeca Information Discovery is an exciting complement of Oracle's current BI solution. The agility acquired enables the business to analyze information with much wider spectrum and faster speed . Finally the "invisible world" in business can be possibly seen and contribution to daily business decision making.

In the coming blog posts, I will gradually scratch the surface of EID and show you the details behind the scene of Oracle's new Agile BI.

Monday, 23 January 2012

Minority Report

Ranking is a very usual requirement of BI reporting. People tends to sort things in order and focus on the top performers. Which sales region is generating the most of revenue? What are my most popular products? Who are my best performing employees? All these type of queries can be easily answered via invoking the Rank() function on relevant fact measure.
What about "the other 80%" (Pareto Principle)? We normally are not interested in the details of "insignificant" individual contributors. But the overall collective contribution of "the others" can be very valuable business information. One example can be risk control. From the management's perspective, the bigger contribution "the others" generate, the lower risks the business will have.

Here I would like to share with you two different apporaches for composing this kind of minority report in OBIEE 11g.

I will be using the data and subject area "Sample Sales" from OBIEE SampelApp.

The first approach leverage CASE function and attribute column.
1, Create a new analysis with subject area "Sample Sales', drag in the column "P1 Product" and "1 - Revenue" twice.

2, Put the following code in the column formular of the second "P1 Product". Change the column title to "Top Individuals".

case when rank("Base Facts"."1- Revenue")<=5 then "Products"."P1 Product" else 'All other Non Toppers' end

3, Put the following code in the column formular of the second "1 - Revenue". Change the column title to "Rank". Add "sort acending" on this column.

case when rank("Base Facts"."1- Revenue") <= 5 then rank("Base Facts"."1- Revenue") else 6 end

4, Open the column formula for "Rank", tick the box for "treat as attribute column". This setting specify the column to be handled as an attribute column, which prevents the aggregation of the values. For the detail explanation of "attribute column", you can click on here and go one page up.

5, Go to the result tab, and add a new pivot table view. Click on the pencil button to modify the view and make it's defined like below.

6, Then you get your "minority report". You can also create a chart to get some intuitive views.

7, The only flaw of this apporach is Table view wouldn't give us the result we want. Once column "P1 Product" is exclude from the layout, the table aggregate everything into one record. This is because of the limit of "attribute column" function. More explanation can also be find from the document link mentioned in step 4.

The second approach is more straightforward in my point view. Instead of retrieving everything from database and then rely on the CASE function to categorize records on the presentation layer, This approach will retrieve "Top 5" and "Non Top 5" as two separate data set from database directly. Then uses the "combine function" in Answers to combine the data sets together as report result.

1, Create a new analysis with subject area "Sample Sales', drag in the column "P1 Product" and "1 - Revenue" twice.

2, Put the code below as the column formular of the second " 1 - Revenue" column. Change the column title to "Rank".

Rank("Base Facts"."1- Revenue")

3, Add a filter on column "Rank".

4, Below the save button, click on "combine results based on ..." button to add another set of criteria. Chose the same "Sample Sales' subject and add same columns in criteria.

5, Put the code below as the column formular of the second "P1 Product". Change the column tilte to "All other Non Toppers".

'All other Non Toppers'

6, Put the code below as the column formular of the second " 1 - Revenue" column. Change the column title to "Rank".

Rank("Base Facts"."1- Revenue")

7, Add a filter on column "Rank".

8, Click on "Result Column" and add sort ascending on column "Rank".

9, Click on the result tab and click on pencil button to modify the table view. Exclude column "P1 Product" from the table layout.

This approach works well with both table view and pivot table view.

In summary, I think the key learning from this practices is:
1, Attribute Column is your friend when you want to avoid the aggregation on certain column;
2, Don't forget about the "combine result" button on top right of your Answers interface. It helps a lot in certain circumstances.