Information, Intelligence, Intellectual

Sunday, 29 April 2012

Endeca - upgrade release and learning material from Oracle

It has been about three months since Oracle completed its acquisition of Endeca. Last week, a new version was released: Endeca Information Discovery (EID) 2.3. This release brings enhancements to the three major parts of EID - Latitude Data Integrator, Endeca Server and Endeca Studio.

For Endeca Latitude Data Integrator, the enhancements focus on ease of use for developers and on text enrichment with sentiment analysis.

Endeca Server, a new Java server, is introduced to simplify process management. As the single access point for queries, Endeca Server manages named data stores (previously known as "DGraph"). It also provides management web services to create, attach/detach, start/stop and list data stores.

For Endeca Studio, a new feature called EID View Model has been added. An EID View Model defines a data set that includes a set of attributes such as FullName (dimension), Gender (dimension), Sales (fact), etc. Based on an EID View Model, business users can easily create various types of charts without writing a single line of LQL (Latitude Query Language, used for querying data in the MDEX Engine). In addition, after having been taken out of the product following the acquisition, the Map functionality is back in EID 2.3. It is fully integrated with Oracle MapViewer, and geospatial text search is supported.

EID Map

Last but not least, the Endeca product team has set up a YouTube channel for sharing all the exciting things about Endeca, from product overviews to hands-on demonstrations. If you want to learn Endeca Information Discovery, the game-changing Agile BI product from Oracle, please don't hesitate to subscribe to this YouTube channel.

Monday, 27 February 2012

Discovery channel - provided by Endeca Studio

Besides a unique way of organizing data, Endeca also provides an innovative interface for exploring and discovering information. Today let's have a closer look at the end user layer of EID - Endeca Studio.

Endeca Studio is an interactive, component-based environment for building analytic applications powered by the MDEX Engine. It is built on the web-based Liferay infrastructure, which enables analytic applications to be delivered through web browsers.

In Endeca Studio, a logged-in user can have access to a number of Endeca analytic applications. Each analytic application may have multiple tabs and read data from more than one Dgraph. Each application tab contains various components such as a search box, chart, table, guided navigation, tag cloud, etc. The example here is an Endeca Studio application built for a consulting company. The initial view of this application shows overview information about staff from different perspectives - a breakdown of numbers by category, average billing rate, detailed information for each individual and so on.

Suppose I am a manager who needs to find consultants for an urgent Siebel CRM project. I can start my journey of information discovery with "navigation". After clicking on "Now" and "Siebel -CRM" on the navigation panel, the system immediately responds by returning the smaller list of consultants who match those criteria, and by dynamically summarizing this smaller list of records and updating all metrics and analytics as well.

I can further refine my navigation result by choosing more navigation categories.

French is also one of the requirements for this project. From the 21 short-listed staff, I need to find people who can speak French. However, language competency is not available as a navigation category or table column that I can filter on. It could be because the company has never captured that information in the HR system or other structured database systems. Individuals normally keep language competency in their resume files. Remember that the Endeca MDEX Engine is capable of consolidating both structured and unstructured data. I can perform a search in this Endeca application against my MDEX Engine. After automatically correcting the spelling mistake, Endeca finds me two consultants who speak French and also meet all the other criteria.

To verify the search result, I can choose to view the detailed information for one of my candidates.

If I click on the PDF icon, I can open the PDF file for that person's resume. As we can see, "Fluent in French" is one of the personal competencies that he included in his resume.

So, unlike traditional BI operations such as choosing columns, defining filters and combining subject areas, it only took me a few navigation clicks and a search to quickly find the information that I need in Endeca Studio. More importantly, data stored in unstructured format was not left out of the decision making process.

Sunday, 12 February 2012

Under the bonnet of Endeca

Endeca Information Discovery (EID) gives BI users the agility of querying, navigating and searching across structured, semi-structured and unstructured data. The backbone of all EID applications is the MDEX Engine. It stores data and receives requests via Endeca Web Services. After executing a query request, the MDEX Engine returns the result to Endeca Web Services in XML format. The front-end application in Endeca Studio then formats the query result and returns it to the client browser.

Today let's open the bonnet of EID and have a look at the nuts and bolts inside MDEX Engine.

First of all, everything runs in the "Dgraph", which is the term for the MDEX Engine process. Data in the MDEX Engine is not accessible without a related running Dgraph. Relevant data from various sources is extracted, transformed and loaded into the MDEX Engine. Compared to a traditional RDBMS or OLAP cubes, the MDEX Engine structures its data in a different way.

The data model in the MDEX Engine consists of records and attributes.

Records are the fundamental units of data.
Attributes are the fundamental units of a record schema which describes the data model of Records.

For a data record, an assignment on an attribute (also known as a key/value pair) provides information about that record. For example, for a list of bike records, an assignment on the "Category" attribute contains the category description (e.g. mountain) of the bike record. Each attribute is identified by a unique name.

Each attribute on a data record is itself represented by a record that describes this attribute. Following the bike records example, there is a record that describes the "Category" attribute. A collection of these records that describe attributes forms a schema for your records. The aspects of the attribute on a data record are configured in the schema. For example, an attribute on any data record can be searchable or not.

Let's have a look at an example which may help you to digest these concepts.

In an MDEX Engine that stores bike information, a typical data record looks like this:

TxnID = 12324
ProductID = 506
Category = Mountain Bike
Amount = $499.99
Suspension = Fox 32 F-Series
FrameType = Aluminium
Saddle = Bontrager SSR
Mountain Accessories = Fork and shock sag meter
Mountain Accessories = Water Bottle
Review = A great bike for off road. Smooth ride over the bumps
ReviewSentiment = Positive
ReviewTerm = Great
ReviewTerm = Off Road
ReviewTerm = Smooth
ReviewTerm = Bumps

In each line of this data record, the attribute is the part on the left-hand side of the equals sign. An attribute may be single-assign or multi-assign. In this example, attributes such as "TxnID" and "ProductID" are single-assign, while the attribute "Mountain Accessories" is multi-assign. In the MDEX Engine data model, primary keys (also known as record specs) are used to uniquely identify records.
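As a rough illustration only (this is not Endeca's API), the record above can be modelled in Python as a list of attribute/value assignments. The names `record` and `assignments_by_attribute` are hypothetical, and the sketch simply counts assignments on one record; in the MDEX Engine, whether an attribute is single-assign or multi-assign would be part of its configuration rather than something inferred from the data.

```python
from collections import defaultdict

# Hypothetical in-memory model: a record is a list of (attribute, value) assignments.
record = [
    ("TxnID", "12324"),
    ("ProductID", "506"),
    ("Category", "Mountain Bike"),
    ("Mountain Accessories", "Fork and shock sag meter"),
    ("Mountain Accessories", "Water Bottle"),
    ("ReviewTerm", "Great"),
    ("ReviewTerm", "Off Road"),
]


def assignments_by_attribute(rec):
    """Group the values assigned to each attribute name."""
    grouped = defaultdict(list)
    for name, value in rec:
        grouped[name].append(value)
    return dict(grouped)


grouped = assignments_by_attribute(record)
# Attributes that carry more than one assignment on this record.
multi_assign = sorted(name for name, values in grouped.items() if len(values) > 1)
print(multi_assign)
```

Storing assignments as pairs rather than as one fixed column per attribute is what lets the same attribute appear more than once on a record.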

The System Record that describes the Attribute “Category” may look like:

Name = Category
Type = String
Display Name = Category
Searchable = Yes
Sort = Ascendant

The collection of system records is called the schema.

In the MDEX Engine, data records do not have to be stored in a conformed container. Null key/value pairs such as "AttributeName = Null" are not allowed. For example, if a source relational database record has a NULL value in the column "Suspension", a new MDEX data record will be inserted when it is loaded into the MDEX Engine, but no "Suspension" attribute will be created for that record. So it is not unusual for "jagged records" to exist in the MDEX Engine, even though they describe the same business entity.

Data Records in MDEX Engine may be loaded from structured, semi-structured or unstructured data sources.

For structured data, each Tuple becomes a Data Record and each column (except for the columns with NULL value) becomes an Attribute.
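The NULL handling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Latitude Data Integrator is implemented; the rows, the second product and the helper `to_record` are made up, and `None` stands in for SQL NULL.

```python
# Hypothetical relational rows; None stands in for SQL NULL.
rows = [
    {"ProductID": 506, "Category": "Mountain Bike", "Suspension": "Fox 32 F-Series"},
    {"ProductID": 507, "Category": "Road Bike", "Suspension": None},
]


def to_record(row):
    """Keep only non-NULL columns, so no 'Attribute = Null' pair is ever produced."""
    return {name: value for name, value in row.items() if value is not None}


records = [to_record(row) for row in rows]
for rec in records:
    print(rec)
```

The two resulting records describe the same kind of entity but carry different sets of attributes, which is exactly the "jagged" shape.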

Semi-structured data normally comes from enterprise applications, HTTP feeds, XML sources, etc. It is also loaded as attribute/value pairs. This is a common cause of "jagged" record structures.

As the key differentiator, EID extends BI analysis to unstructured data such as text documents or social data. In MDEX Engine, unstructured data can be stored as their own records for "side-by-side" analysis. Or, they can be linked to existing data records by any available key.

Any unstructured attribute can be enriched using text analytics to expand the structure of its containing record. Common techniques include, but are not limited to, automatic tagging, named entity extraction, sentiment analysis and term extraction.

Beyond all these data records, which consolidate information from databases, XML documents, Facebook, etc., the MDEX Engine also creates hierarchy/relationship graphs and indexes for the attributes and attribute values. Those graphs and indexes are so important that information discovery cannot be performed effectively and efficiently on MDEX data records without them.

In summary, the MDEX Engine of Endeca stores information in data records as a series of attribute/value pairs. Data records can be structured differently from each other. With patented mechanisms for managing the navigation graph of attribute relationships and hierarchies, users can quickly navigate through different attributes, search for keywords, or create queries as a more conventional approach. With the MDEX Engine, no data is left behind.

Until next time, stay intelligent, stay agile.

Sunday, 5 February 2012

Agility Acquired

Together with a well-designed ODS (operational data store) or data warehouse, OBIEE is a comprehensive, reliable and scalable BI solution. Users get information in various ways: operational reporting, dashboards, scorecards, ad-hoc analysis, what-if analysis, proactive alerting, mobile, etc. It can grow quickly and smoothly. From gigabytes to terabytes, from a single server to a cluster, from disk to in-memory, hundreds or thousands of users can access business information concurrently.

But what about agility? Can I quickly perform data navigation without going through the modeling practice in the DW or BI Server? What if my data is volatile and includes unstructured or semi-structured information? Is it possible to do analysis via search? All these requests from the business for quick and "good enough" analysis have been giving OBIEE a hard time. Now, these challenges can be gracefully addressed by a new member of Oracle BI: Endeca Information Discovery. It complements Oracle's BI solution by providing agile data discovery on structured and unstructured information.

Endeca Information Discovery (EID) helps organizations quickly explore all relevant data. You may have sales transactions from an OLTP database, departmental forecast data in Excel files, customer survey results in Word documents and product review articles on public websites/forums. Traditionally you have to model that data into a relational star schema or multi-dimensional cubes before you can start creating reports and dashboards to answer questions from the business. Some valuable information may not be included in the analysis because the underlying data is unstructured and too hard to model.

With EID, users can quickly consolidate data and perform data discovery by navigating and searching. With no need to carefully create logical and physical models, the MDEX Engine (the data storage of EID) enables users to bring different information together, structured and unstructured, while keeping the associations between them. Then from Endeca Studio (the browser-based end-user layer), users can simply explore the data by searching for keywords or clicking through different attributes (think of them as columns in dimensions), or create reports and charts in the old way. EID helps the business find answers to questions like "What is the sales revenue of my Top 5 products that my customers describe online with certain keywords such as green, economic, etc.?" or "What are the other most contributing attributes, such as product color or customer demography, for those Top 5 products?"

Below is the architecture of Endeca Information Discovery:

At a glance, it looks similar to the structure of traditional OBIEE. However, there are major differences from the end user's perspective. Instead of creating reports by choosing columns from tables in subject areas, EID users are able to quickly explore the data with a combination of traditional and agile approaches:

Endeca Information Discovery is an exciting complement to Oracle's current BI solution. The agility acquired enables the business to analyze information across a much wider spectrum and at a faster speed. Finally, the "invisible world" in business can be seen and can contribute to daily business decision making.

In the coming blog posts, I will gradually scratch the surface of EID and show you the details behind the scenes of Oracle's new Agile BI.

Monday, 23 January 2012

Minority Report

Ranking is a very common requirement in BI reporting. People tend to sort things in order and focus on the top performers. Which sales region is generating the most revenue? What are my most popular products? Who are my best performing employees? All these types of queries can easily be answered by invoking the Rank() function on the relevant fact measure.
What about "the other 80%" (Pareto Principle)? We are normally not interested in the details of "insignificant" individual contributors. But the overall collective contribution of "the others" can be very valuable business information. One example is risk control. From the management's perspective, the bigger the contribution "the others" generate, the lower the risk the business carries.

Here I would like to share with you two different approaches for composing this kind of minority report in OBIEE 11g.

I will be using the data and subject area "Sample Sales" from the OBIEE SampleApp.

The first approach leverages the CASE function and an attribute column.
1, Create a new analysis with the subject area "Sample Sales", and drag in the columns "P1 Product" and "1 - Revenue" twice.

2, Put the following code in the column formula of the second "P1 Product". Change the column title to "Top Individuals".

case when rank("Base Facts"."1- Revenue")<=5 then "Products"."P1 Product" else 'All other Non Toppers' end

3, Put the following code in the column formula of the second "1 - Revenue". Change the column title to "Rank". Add "sort ascending" on this column.

case when rank("Base Facts"."1- Revenue") <= 5 then rank("Base Facts"."1- Revenue") else 6 end

4, Open the column formula for "Rank" and tick the box for "treat as attribute column". This setting specifies that the column is handled as an attribute column, which prevents the aggregation of its values. For the detailed explanation of "attribute column", you can click here and go one page up.

5, Go to the result tab and add a new pivot table view. Click on the pencil button to modify the view and define it as shown below.

6, Then you get your "minority report". You can also create a chart to get a more intuitive view.

7, The only flaw of this approach is that the Table view won't give us the result we want. Once the column "P1 Product" is excluded from the layout, the table aggregates everything into one record. This is because of a limitation of the "attribute column" function. More explanation can be found in the document linked in step 4.
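To make the logic of the two CASE formulas easier to follow outside OBIEE, here is a minimal Python sketch of the same idea: rank the products by revenue, keep the top five as they are, and fold everything else into one bucket with rank 6. The revenue figures are made up and the function `minority_report` is a hypothetical helper, not something OBIEE provides. Note that `enumerate` does not give tied values the same rank the way RANK() does.

```python
from collections import defaultdict

# Made-up revenue figures, not taken from Sample Sales.
revenue = {"A": 900, "B": 700, "C": 650, "D": 400, "E": 300, "F": 120, "G": 80, "H": 50}


def minority_report(revenue_by_product, top_n=5, other_label="All other Non Toppers"):
    """Return (rank, label, revenue) rows: the top N products plus one bucket for the rest."""
    ranked = sorted(revenue_by_product.items(), key=lambda item: item[1], reverse=True)
    totals = defaultdict(int)
    for rank, (product, amount) in enumerate(ranked, start=1):
        # Mirrors the two CASE formulas: keep the top N as they are, bucket the rest.
        label = product if rank <= top_n else other_label
        bucket = rank if rank <= top_n else top_n + 1
        totals[(bucket, label)] += amount
    return sorted((bucket, label, total) for (bucket, label), total in totals.items())


report = minority_report(revenue)
for row in report:
    print(row)
```

The grouping by (rank bucket, label) plays the role of the pivot table here: the revenue of all products outside the top five is summed into a single row.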

The second approach is more straightforward in my view. Instead of retrieving everything from the database and then relying on the CASE function to categorize records on the presentation layer, this approach retrieves "Top 5" and "Non Top 5" as two separate data sets directly from the database. It then uses the "combine" function in Answers to combine the data sets into the report result.

1, Create a new analysis with the subject area "Sample Sales", and drag in the columns "P1 Product" and "1 - Revenue" twice.

2, Put the code below in the column formula of the second "1 - Revenue" column. Change the column title to "Rank".

Rank("Base Facts"."1- Revenue")

3, Add a filter on column "Rank".

4, Below the save button, click on the "combine results based on ..." button to add another set of criteria. Choose the same "Sample Sales" subject area and add the same columns to the criteria.

5, Put the code below in the column formula of the second "P1 Product". Change the column title to "All other Non Toppers".

'All other Non Toppers'

6, Put the code below in the column formula of the second "1 - Revenue" column. Change the column title to "Rank".

Rank("Base Facts"."1- Revenue")

7, Add a filter on column "Rank".

8, Click on "Result Column" and add sort ascending on column "Rank".

9, Click on the result tab and click on pencil button to modify the table view. Exclude column "P1 Product" from the table layout.

This approach works well with both table view and pivot table view.

In summary, I think the key learnings from this exercise are:
1, Attribute Column is your friend when you want to avoid the aggregation on certain column;
2, Don't forget about the "combine result" button on top right of your Answers interface. It helps a lot in certain circumstances.

Sunday, 15 January 2012

Miss is as good as a mile

A few days ago, I was reviewing my prior post and something unusual caught my attention.

How come those densely populated suburbs in Sydney are not covered by the thematic view for postal areas? To confirm the issue, I moved the map to another state and found that some thematic views were placed on the Tasman Sea (see below)! I've got an alignment issue.

This is a critical issue for a BI map view, as the business intelligence is not accurately aligned with the real location on the map. Shifting a map theme (the geometries that represent regions) by a few centimeters on Google Map makes users look at business information that actually belongs to an area/spot a few kilometers away. This makes the whole spatial analysis meaningless.

After spending some time searching and studying, I found the root cause of this problem - the inconsistency between the SRIDs of Google Map and of the spatial data in the database. SRID stands for "Spatial Reference system IDentifier", a unique value used to unambiguously identify projected, unprojected, and local spatial coordinate system definitions. These coordinate systems form the heart of all GIS applications. Older maps use older reference systems, while newer maps use more accurate reference systems established with satellites and sophisticated electronic equipment. The coordinates for a point on the earth can vary significantly depending on the SRID used. Consequently, if the spatial data is not all in the same spatial reference system, the geographic layers will not overlay and the analysis will become misleading.

In our case, the map service provider Google is using SRID 3785.
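To get a feel for how differently two reference systems can describe the same point, here is a small Python sketch that projects a longitude/latitude pair in degrees onto a spherical Mercator plane in metres, which is the kind of projected system web map services use. It is only the textbook spherical Mercator formula under the assumption of a sphere of radius 6378137 m; it does not reproduce Oracle's transformation between SRIDs, and the function name and the Sydney coordinates are illustrative.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius assumed by the spherical Mercator projection


def lonlat_to_spherical_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to spherical Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates for central Sydney (illustrative values).
x, y = lonlat_to_spherical_mercator(151.21, -33.87)
print(round(x), round(y))
```

The input is a pair of angles while the output is a pair of distances in the millions of metres, so data in one system cannot simply be drawn on a map that expects the other.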

However, when spatial data is imported from shapefiles into an Oracle database, the SRID is set to 8307 by default. It can be retrieved by running the SQL script below:

select table_name, column_name, srid from user_sdo_geom_metadata
where table_name = 'NSW_ELB_REGION';

Because it's not practical for us to change the SRID of Google Map, we need to do something on our spatial data in Oracle database so that it is compatible with SRID 3785.

In the Oracle Spatial Developer's Guide, three approaches are suggested for fixing this issue:

1, Use SRID 4055 data;

2, Declare an EPSG rule between SRID 8307 and 3785;

3, Call Oracle stored procedures to convert the spatial data that is physically stored in the database;

I've tested the first two approaches in my environment. Each of them fixed the alignment issue perfectly for me.

For the first approach, all you have to do is specify the SRID value as 4055 while importing the shapefile data with Oracle MapBuilder (a step in my last post). The data will then be loaded into the Oracle database as SRID 4055, which is compatible with Google's SRID 3785.

However, if reloading the data is not a feasible option for you, you can run the following SQL script to declare a transformation rule between SRID 8307 and 3785. As a result, when Oracle retrieves the spatial data that you loaded, it will apply the rule to transform it to SRID 3785 and send it to Oracle MapViewer for rendering.

CALL sdo_cs.create_pref_concatenated_op(830723785,
'CONCATENATED OPERATION',
TFM_PLAN(SDO_TFM_CHAIN(8307, 1000000000, 4055, 19847, 3785)),
NULL);

Now the SRIDs of the spatial data and Google Map are consistent, and my business intelligence overlays perfectly on Google Map. No more CBD regions displayed in the middle of nowhere or residential postal areas sitting on the surface of the sea.

Saturday, 17 December 2011

Put Intelligence on the map (part 2)

In my last post, we discussed how OBIEE enables geospatial analysis on digital maps. Now it’s time to get our hands dirty and learn more from practice.

Out of the box, the deployment of OBIEE includes Oracle Mapviewer. However, no geospatial data or map tiles are provided with the product installation. Those can be acquired separately from NAVTEQ. A package of free sample NAVTEQ data with relevant deployment instructions can be downloaded from the URL: http://www.oracle.com/technetwork/middleware/bi-foundation/obiee-samples-167534.html

The free sample NAVTEQ data contains two components: geospatial data (coordinates, boundaries, etc.) and map tiles (background images for different zoom levels), which are good enough for a preliminary taste of OBIEE geospatial analysis. But if you want to analyze your business intelligence on a street-level map or with the specific postcode/suburb boundaries of your country, you need to find a more comprehensive source for map tiles and geospatial data. For anything that you want to put into production, a paid data package with professional support services should always be the first choice. For learning, practicing or demo purposes, there is a free self-service approach to achieve your goal. In this blog post, you will see how to leverage government statistics websites and Google Map to put your business intelligence on a local map with local boundaries. Instead of having a lame geospatial analysis like this:

You will be able to achieve a much better one like this:

Step 1, Download Boundary Geospatial data

In order to draw boundaries on the map, we need geospatial data stored in an Oracle database (with the Spatial or Locator option, of course). The sample NAVTEQ data doesn’t contain detailed boundary information for countries other than the United States. We need to find it somewhere else.

As I live in Australia, I will use the Australian New South Wales electoral boundary data as the example here. The two government websites below provide very rich geospatial data for different categories. Make sure the data you choose is in ESRI Shapefile format and has “boundary” information, as we want to generate a thematic view on our BI Map report. You should be able to find similar free information on your local government websites.
http://data.gov.au/tag/geospatial-information/
http://www.aec.gov.au/Electorates/gis/gis_datadownload.htm

After extracting the downloaded zip file, you should have a set of files that looks similar to:

ESRI Shape Files

Step 2, Import ShapeFile data into your Oracle database

1, Make sure the following files are available in the file system of your Oracle database server:
$ORACLE_HOME/jdbc/lib/ojdbc5.jar
$ORACLE_HOME/md/jlib/sdoutl.jar
$ORACLE_HOME/md/jlib/sdoapi.jar

If you are missing sdoutl.jar and sdoapi.jar, they can be downloaded from http://www.oracle.com/technetwork/database/options/spatial/downloads/software/sp-download-distlic-522138.html

2, Copy the Shapefiles to the file system of your database server;

3, Run Oracle MapBuilder to import ShapeFile into Oracle database;

Oracle MapBuilder can be downloaded from: http://www.oracle.com/technetwork/middleware/mapviewer/downloads/index.html

Connect to your database first;

Choose the shapefile that you downloaded;

Keep the default value and click on “next”;

Choose NOT to create theme now and go next

Then you should be able to find a new table created in your database:

Step 3, Create Geometry Theme in Oracle database for the boundary data imported

1, right click on “Geometry Themes” and create a new theme;

3, in the wizard, make sure you choose the imported geospatial table for “Base Table”;

4, Tick the box label style and choose a column as attribute for this style. Later on you will use this column to link with the geo information column (e.g. city name, district name, etc) in your BI subject area. Specify the value for “Label Style” and change the value for “Label Function” from -1 to 1. In this example, I used the column “Electoral Division name”;

5, specify the query condition if you have any

6, Click on Finish to create the theme;

Step 4, Setup Google Map in Oracle MapViewer

Because the NAVTEQ sample data doesn’t provide map background details beyond country level for countries other than the United States, we want to leverage Google Map as a service for the background images. The instructions for configuring Google Map with Oracle MapViewer can be found in section 2.7.2 of the deployment guide for the OBIEE Sample App.

http://www.oracle.com/technetwork/middleware/bi-foundation/downloads/sampleapp107-deploymentguide-435786.pdf

Step 5, Setup layers and background map in OBIEE

1, Click on manage map data in the OBIEE administration page

2, Import a new layer. Choose to import layers from the Geometry Theme that you just created in step 3.3;

3, Click on the pencil icon to edit the newly created layer. Map the Layer Key with a BI Key from your BI subject area. Normally the BI Key will be a dimensional table column for end user facing geospatial information such as country name, city name, district name, etc. Click on OK to save the changes;

4, Go to “Background Maps” tab and click on “import background maps”. Choose the Google Map entry that you configured in step 4;

5, Add an interactive BI Layer to the background map. Choose the one that you just imported at step 5.3

6, Set up a proper zoom level for the layer. You can add multiple layers on one background map. That will give you drilling capability on the BI Map thematic view.

7, Click OK to save all the map data settings.

Step 6, Create your BI Map report with the imported boundary on map

1, Define Answers Report Criteria;
At a minimum, you need to include one fact measure and one dimensional column in your report criteria. The dimensional column must be the BI Key column that you defined at step 5.5.

2, Add a map view for the answers report. If all the settings in the previous steps are correct, you should be able to get the initial result as below automatically

3, Click on the pencil icon on top right to edit the map view;

If you get the error message like below, check two settings in your BI Admin Console:

Star schema join in business model;
Aggregation rule for the measure column;

If no error message, you can add/modify map format as you want

Now, enjoy the visualization of your business intelligence on interactive local map.

Until next time.