OEID 3.0 First Look — Text Enrichment & Whitespace

I recently spent some cycles building my first POC for a potential customer with OEID v3.0.  After running some of the unstructured data through the text enrichment component, I noticed something odd:

[Screenshot: whitespace problem]

The charts I had configured to group by the extracted salient terms were displaying a “null” bucket. This bucket was essentially collecting all records that had not been tagged with a term. After a bit of investigation, it seems this is expected behavior in v3.0: the Endeca Server now treats empty, yet non-null, attributes as valid and stores them on the Endeca record. Such attributes are common after using some of the out-of-the-box (OOTB) text enrichment capabilities in 3.0 (tagging, extraction, regex), so a standard treatment for this side effect is warranted.

The good news is that the workaround was very straightforward.

1) Add a “Reformatter” component to the .grf before the bulk loader, using the same metadata on its input and output edges. From the reformatter’s “Source” tab, select “Java Transform Wizard” and give your new transformation class a name like “removeWhitespaces”. This creates a .java source file and a compiled .class file in your Integrator project’s ./trans directory (where Integrator expects your Java source code to reside).

[Screenshot: removeWhitespace transform]

2) Provide the following Java logic in your new “removeWhitespaces” transformation class:
import org.jetel.component.DataRecordTransform;
import org.jetel.data.DataField;
import org.jetel.data.DataRecord;
import org.jetel.exception.TransformException;
import org.jetel.metadata.DataFieldType;

public class removeWhitespaces extends DataRecordTransform {

    @Override
    public int transform(DataRecord[] inputs, DataRecord[] outputs) throws TransformException {
        for (int i = 0; i < inputs.length; i++) {
            DataRecord rec = inputs[i];
            for (int j = 0; j < rec.getNumFields(); j++) {
                DataField field = rec.getField(j);
                // Only string fields can hold empty-but-non-null values.
                if (DataFieldType.STRING.equals(field.getMetadata().getDataType())) {
                    Object value = field.getValue();
                    // getValue() may return a CharSequence (e.g. a StringBuilder), so test its text.
                    // Zero-length and whitespace-only values both become a real null
                    // (the field must be nullable in the edge metadata).
                    if (value != null && value.toString().trim().isEmpty()) {
                        field.setValue(null);
                    }
                }
                // Copy the (possibly nulled) value into the matching output field.
                outputs[i].getField(j).setValue(field.getValue());
            }
        }
        return 0; // pass the record on to output port 0
    }
}

3) Make sure the name of this new class is specified in the “Transform class” input. Rerun the .grf that loads your data and… profit!

[Screenshot: whitespace fix]
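You may want to sanity-check the empty-value rule without an Integrator project. The core test can be pulled out into plain Java with no CloverETL dependency. This is a minimal sketch. “EmptyValueRule” and “normalize” are made-up names. The sketch treats whitespace-only values as empty in addition to zero-length ones; that is an assumption about what you want, not something the Endeca Server requires. It accepts a CharSequence because CloverETL string fields may hand back something like a StringBuilder rather than a String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyValueRule {

    // Returns null for values that carry no information (null, zero-length or
    // whitespace-only); otherwise returns the value's text unchanged.
    static String normalize(CharSequence value) {
        if (value == null || value.toString().trim().isEmpty()) {
            return null;
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // Made-up term values standing in for a tagged text field.
        List<String> raw = Arrays.asList("warranty", "", "   ", null, "recall");
        List<String> cleaned = new ArrayList<>();
        for (String v : raw) {
            cleaned.add(normalize(v));
        }
        System.out.println(cleaned); // [warranty, null, null, null, recall]
    }
}
```

Non-empty values are returned untrimmed, so the rule only decides between “no value” and “keep as-is”. It never rewrites real content.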

We look forward to sharing more emerging OEID v3.0 best practices here, and to hearing about your approaches as well.



OEID 3.0 First Look – Update/Delete Data Improvements

For almost a decade, the core Endeca MDEX engine that underpins Oracle Endeca Information Discovery (OEID) has supported one-time indexing (often referred to as a Baseline Update) as well as incremental updates (often referred to as partials). Through all of the incarnations of this functionality, from “partial update pipelines” to “continuous query”, there was one common limitation: update operations always acted on a per-record basis.

If you come from a SQL/RDBMS background, this was a huge limitation and forced a conceptual change in the way you think about data. Obviously, Endeca is not (and never was) a relational system. Even so, the lack of the freedom SQL provides to update data whenever and wherever you please was often a pretty big limitation, especially at scale. Building an index nightly for 100,000 e-commerce products is no big deal. Running a daily process to feed 1 million updated records into a 30-million-record Endeca Server instance, just so that a set of warranty claims can be “aged” from current month to prior month, is something completely different.

Thankfully, with the release of the latest set of components for the ETL layer of OEID (called OEID Integrator), huge changes have been made to the interactions available for modifying an Endeca Server instance (now called a “Data Domain”).  If you’ve longed for a “SQL-style experience” where records can be updated or deleted from a data store by almost any criteria imaginable, OEID Integrator v3.0 delivers.
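To make the difference concrete, here’s a toy sketch of the two models in plain Java. It is not the Integrator or Endeca Server API. “RecordStore”, “updateWhere”, “deleteWhere” and the “Claim” record are hypothetical, and the “store” is just an in-memory list. What it illustrates is that a criteria-based call touches every matching record in one request. The per-record model instead has you re-feed each changed record by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class CriteriaUpdateSketch {

    // Illustrative record: a warranty claim tagged with an aging bucket.
    static class Claim {
        final String id;
        String agingBucket;

        Claim(String id, String agingBucket) {
            this.id = id;
            this.agingBucket = agingBucket;
        }
    }

    // Hypothetical in-memory store; NOT the Endeca Server or Integrator API.
    static class RecordStore {
        final List<Claim> records = new ArrayList<>();

        // Criteria-based update: one call changes every record matching the predicate,
        // rather than the caller re-feeding each changed record by key.
        int updateWhere(Predicate<Claim> criteria, Consumer<Claim> change) {
            int touched = 0;
            for (Claim c : records) {
                if (criteria.test(c)) {
                    change.accept(c);
                    touched++;
                }
            }
            return touched;
        }

        // Criteria-based delete: removes every matching record and returns the count.
        int deleteWhere(Predicate<Claim> criteria) {
            int before = records.size();
            records.removeIf(criteria);
            return before - records.size();
        }
    }

    public static void main(String[] args) {
        RecordStore store = new RecordStore();
        store.records.add(new Claim("c1", "Current Month"));
        store.records.add(new Claim("c2", "Current Month"));
        store.records.add(new Claim("c3", "Prior Month"));

        // Illustrative two-month window: drop last month's claims, then age this month's.
        int deleted = store.deleteWhere(c -> c.agingBucket.equals("Prior Month"));
        int aged = store.updateWhere(c -> c.agingBucket.equals("Current Month"),
                                     c -> c.agingBucket = "Prior Month");
        System.out.println(deleted + " deleted, " + aged + " aged"); // 1 deleted, 2 aged
    }
}
```

In the per-record model, the aging step in “main” is exactly the “feed 1 million updated records” job described earlier. With criteria-based updates, it becomes a single request describing which records to change. The delete-then-age sequence is only an illustrative rolling window.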


OEID 3.0 First Look – Democratizing Data Discovery

Adjectives like “agile” and “self-service” have long been used to describe approaches to BI that enable organizations to ask their own questions and produce their own answers.  Applied to both processes and products, these labels are applicable any time an organization can relax the “IT bottleneck”.  Over the past decade, the core tenets of the Endeca vision (“no data left behind, ease of use, and agile delivery”) have shaped a product that has empowered organizations to unlock insights in their enterprise data in ways never before possible while simultaneously reducing their reliance on IT to do so.  Notice I said “reduce” their reliance, not “eliminate”.

Data discovery is a quest, not a destination. It is a never-ending initiative. As soon as new truths come to light from your discovery apps, new questions inevitably arise as well. Ideally, these new questions can be answered within the application at hand. Sometimes, however, finding answers to these new questions requires experimentation and alternative data “mash-ups”. Almost always in these cases, the time comes to pick up the phone, call IT… and wait.

All of the discovery tools on the market today that promise self-service and agility still require IT’s involvement when new data sources or new data models are required, OEID included. However, with some new features in the latest v3.0 release, it appears that Oracle is making strides to address this dependency.

Granted, this is just one man’s opinion and largely speculative, but a few of the new features in the product have me convinced that Oracle is pushing to democratize data discovery. Through subtle (and not so subtle) changes, they seem to be turning the product into a platform: one that empowers the business to broaden its own exploration and answer the next round of questions, further reducing your organization’s reliance on IT.


Here’s what got me thinking


A Collaboration Platform

The revamped “home page” experience surfaces new ways to provision and share your applications. Casual users can now create their own applications, associate them with a data domain, and start composing their apps. Initially, the applications are “private” and accessible only to a group of users you hand-pick. You can make your application “public” once you feel it is ready for prime time and mass consumption.

Self-Service Data Upload

Another nod to democratization in v3.0 is the introduction of self-service data upload. Not only will the upload move data into your data domain, but it will also profile your data and (usually) arrive at the proper attribute configuration (data types, etc.). Currently, this only supports Excel file formats, but if you’re like me, you can see where this is heading…

[Screenshot: Excel upload]
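The upload’s actual profiling logic isn’t described here, so treat the following as a naive sketch of the general idea only. It picks, for each column, the narrowest type that every non-blank value parses as. The class and enum names are hypothetical, and the sketch assumes cells have already been read as text. Real profiling would also have to cope with locale-specific number and date formats, which this ignores.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

public class ColumnProfiler {

    enum InferredType { LONG, DOUBLE, DATE, STRING }

    // Picks the narrowest type that every non-blank value in the column parses as.
    // Blank cells are ignored; a column with no non-blank values falls back to STRING.
    static InferredType infer(List<String> column) {
        boolean seen = false, allLong = true, allDouble = true, allDate = true;
        for (String raw : column) {
            if (raw == null || raw.trim().isEmpty()) {
                continue;
            }
            String v = raw.trim();
            seen = true;
            allLong &= parses(() -> Long.parseLong(v));
            allDouble &= parses(() -> Double.parseDouble(v));
            allDate &= parses(() -> LocalDate.parse(v)); // ISO-8601 (yyyy-MM-dd) only
        }
        if (!seen) return InferredType.STRING;
        if (allLong) return InferredType.LONG;
        if (allDouble) return InferredType.DOUBLE;
        if (allDate) return InferredType.DATE;
        return InferredType.STRING;
    }

    // True if the parse attempt completes without a format exception.
    private static boolean parses(Runnable attempt) {
        try {
            attempt.run();
            return true;
        } catch (NumberFormatException | DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(infer(Arrays.asList("12", "7", "")));    // LONG
        System.out.println(infer(Arrays.asList("12", "7.5")));      // DOUBLE
        System.out.println(infer(Arrays.asList("2013-01-15")));     // DATE
        System.out.println(infer(Arrays.asList("Toronto", "12")));  // STRING
    }
}
```

Falling back to STRING whenever the values disagree mirrors the “(usually)” above. When a column is ambiguous, the safe choice is text, which a user can re-type later.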

Better Cluster Management

At first I was a bit miffed by Endeca Server’s move from Jetty to WebLogic 11g (and even a little frustrated by the involved installation process). After reading the v3.0 literature on improved cluster management, though, I realized that more sophisticated cluster support might mean there is a future in the cloud for the product. Adding nodes to and removing nodes from your data domains will be necessary if end users are actively adding more data or opening up their data mashups to more users in their organization. A platform with such dynamic, unpredictable resource demands would have to be underpinned by elastic computing.

A Vision

Again, this is just one man’s hope for the product. These changes indicate a shift in the way “self-service” is approached. In future releases, “self-service” and “agile” BI may no longer mean simply asking your own unanticipated questions. They may mean introducing new data and new applications, and collaborating across the enterprise, to further fulfill the promise of data discovery without IT.

I hope Oracle continues down this path. I long for a future where data discovery happens in the cloud, so organizations do not have to fumble with infrastructure, scale and upgrades. I see a future with data uploads across a variety of formats, which can then be added to a data marketplace within the product for the whole organization to leverage. I hope for new capabilities in Studio so that the data configuration, joining, and cleansing done in Integrator today by ETL experts and data stewards can be accomplished intuitively by end users and analysts.

It is my hope that 3.0 is not the end game, but the first of many steps towards democratizing data discovery and offering a broader definition of “self-service” BI.



OEID 3.0 First Look – The Little Things

There’s so much new “goodness” in Oracle Endeca Information Discovery (OEID) 3.0, it’s been a little bit of a challenge to “spread the word” in small enough chunks.  We start writing these posts, get a little excited and pretty soon we’ve got Ranzal’s very own version of the Iliad.

In the coming weeks, there will be a few Iliads, and maybe an Odyssey as well. Before we get too deep into the platform, though, I wanted to illustrate and elaborate on a couple of “small changes” that should prove beneficial to people just coming up to speed and to OEID veterans alike.

The Guided Navigation Histogram

As one of my colleagues pointed out, I neglected to highlight a key enhancement to the Guided Navigation user experience when posting to the blog earlier this week. When doing data modeling for an OEID application, you’ll often be transforming, joining, denormalizing and performing all sorts of other operations on your data as it is brought into your Endeca Server. As a result, you often lose some of the original context that was present in the source system. For example, you may have a set of sales records that a user can refine by State, City and Product. If you wanted to give the user a sense of “how much data” was behind a given Guided Navigation option, the typical answer was to use Refinement Counts.

[Screenshot: old Guided Navigation]

As you can see above, this construct gives a numerical value to the frequency of a given attribute value in the current data set.  However, this number often causes confusion for users.  Is it the number of Invoices?  Is it the number of line items on my invoices?  Is it the number of Shipments?  Often, it’s none of these things and simply an artifact of how the data is being modeled.  With OEID 3.0, there is a new way to visually display this frequency data, without the messiness of (often) meaningless numbers.

As you can see above, both versions of the product let me show users that most sales are occurring in Toronto. However, OEID 3.0 provides immediate, visceral context that tells the user that Toronto transactions are nearly three times as numerous. In addition, the absence of the “strange numbers” mentioned earlier eliminates confusion and encourages users to explore rather than over-analyze.
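The histogram idea is simple enough to sketch. Scale each refinement’s record count against the largest one, then draw a bar instead of printing the number. This is only a text-mode illustration of the concept, not how Studio renders it. The class and method names, and the counts in “main”, are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RefinementHistogram {

    // Bar length is proportional to a refinement's count relative to the largest count.
    static int barWidth(int count, int maxCount, int maxWidth) {
        if (maxCount <= 0) {
            return 0;
        }
        return (int) Math.round((double) count * maxWidth / maxCount);
    }

    // Renders one text bar per refinement value, without printing the raw counts.
    static String render(Map<String, Integer> counts, int maxWidth) {
        int maxCount = 0;
        for (int c : counts.values()) {
            maxCount = Math.max(maxCount, c);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.append(String.format("%-10s ", e.getKey()));
            int width = barWidth(e.getValue(), maxCount, maxWidth);
            for (int i = 0; i < width; i++) {
                out.append('#');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        Map<String, Integer> cityCounts = new LinkedHashMap<>();
        cityCounts.put("Toronto", 290);
        cityCounts.put("Montreal", 100);
        cityCounts.put("Vancouver", 60);
        System.out.print(render(cityCounts, 30));
    }
}
```

With these numbers, Toronto’s bar is 30 characters and Montreal’s is 10. The user sees the roughly three-to-one ratio at a glance, and no count is shown that could be misread as invoices, line items or shipments.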

Multi-Lingual LQL Parsing And Validation

Continuing with the theme of Internationalization, the LQL Parsing Service now supports a language parameter when compiling and validating queries.  While English is still the lingua franca of the internals of the platform, having the ability to troubleshoot your queries in your native language is a huge plus.  Below, you can see the Metrics Bar Portlet returning my syntax error in Portuguese:

Note: For those of you following along, this is the “Unexpected Symbol” error, where the per-select Where clause expects the criteria to be in parentheses. At least I think it is; my Portuguese is a little rusty.

This concept is supported by the Parsing Service itself so any application making use of the Endeca Server web services can leverage this functionality as well.

Languages in Studio vs. Languages in the Engine

One additional note on support for multiple languages is that Endeca Server actually supports more languages than OEID Studio has been translated into so far.  Users in OEID Studio have ten locales to choose from in the application:

  • German
  • English (United States)
  • French
  • Portuguese (Portugal)
  • Italian
  • Japanese
  • Chinese (Traditional) zh_TW
  • Chinese (Simplified) zh_CN
  • Korean
  • Spanish (Spain)

However, Endeca Server supports the above 10 in addition to the following 12 (with their language codes, as Endeca Server expects them, in parentheses):

  • Catalan (ca)
  • Czech (cs)
  • Greek (el)
  • Hebrew (he)
  • Hungarian (hu)
  • Dutch (nl)
  • Polish (pl)
  • Romanian (ro)
  • Russian (ru)
  • Swedish (sv)
  • Thai (th)
  • Turkish (tr)

Note that Endeca Server expects RFC 3066 language codes, which differ slightly from the locales used in Studio. For example, en_US is a perfectly good locale in Studio, but setting the language of a given attribute to en_US would not work in Endeca Server. In this case, the Server expects the language “en”.
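If you drive attribute languages from the same locale settings your users pick in Studio, the conversion mostly means keeping the language part. Below is a minimal sketch with hypothetical class and method names. It only reshapes the string and does not check the result against the list of languages Endeca Server supports. The optional region-keeping mode (zh_TW to zh-TW) is an assumption. We haven’t confirmed which form, if any, the Server wants for the two Chinese variants.

```java
import java.util.Locale;

public class ServerLanguageCode {

    // Converts a Studio-style locale string (e.g. "en_US") into an RFC 3066 tag.
    // keepRegion = false drops the country (en_US -> en), as in the example above.
    // keepRegion = true keeps it as a hyphenated subtag (zh_TW -> zh-TW); whether
    // Endeca Server accepts that form is an assumption to verify.
    static String toRfc3066(String studioLocale, boolean keepRegion) {
        String[] parts = studioLocale.trim().split("[_-]");
        String language = parts[0].toLowerCase(Locale.ROOT);
        if (keepRegion && parts.length > 1 && !parts[1].isEmpty()) {
            return language + "-" + parts[1].toUpperCase(Locale.ROOT);
        }
        return language;
    }

    public static void main(String[] args) {
        System.out.println(toRfc3066("en_US", false)); // en
        System.out.println(toRfc3066("pt_PT", false)); // pt
        System.out.println(toRfc3066("zh_TW", true));  // zh-TW
    }
}
```

The sketch splits the string by hand instead of going through java.util.Locale’s getLanguage(). On JDKs older than 17, that method reports Hebrew as the legacy code “iw” rather than the “he” in the list above.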

That’s all for now.  More posts coming later today and tomorrow.  Happy Exploring!

OEID 3.0 First Look – The New Guided Navigation

Guided Navigation has been the foundation, the rock upon which Endeca (now Oracle Endeca Information Discovery) has based its entire value proposition for the past decade. As a result, it has changed less than any other area of the product. With the release of Oracle Endeca Information Discovery (OEID) v3.0, Guided Navigation has received, if not an overhaul, then certainly a facelift. It is the first one we can recall, making this an exciting time to be on the cutting edge with this software.

When we talk about the facelift, it helps to take a quick step back and understand what Guided Navigation is all about. If you needed the 140-character synopsis (i.e. the new “elevator speech”), you’d say Guided Navigation is “allowing the user to refine their working data set using its intrinsic attributes” (hope that’s less than 140 characters). Notice that we make no mention of typing or taxonomies; it’s about the data. However, if you’ve ever built an application in previous versions of OEID, you’ll know that you have both Range Filters AND Guided Navigation components available. Shouldn’t they be the same thing, since they serve the same purpose and behave in the exact same way? Well, now they are:

[Screenshot: Guided Navigation improvements]

What you see above is the new Guided Navigation configuration screen. Leaving aside the visual improvements, such as a cleaner design and rounded corners, you’ll notice the attribute-level configuration that is now available. Not only can a power user pick and choose the attributes to display (as previously discussed), they can also choose how an attribute is displayed. In the capture above, you can see that numeric/date attributes can be configured as either a “List of Values” or a “Range Filter”. Implicit in this capability is that the Range Filter component no longer exists and is now built into Guided Navigation! This really drives home the improvements the Oracle Endeca Engineering team and, especially, the User Experience guys have made around the concept of Guided Navigation. The raison d’être of Guided Navigation is to allow users to explore their data, regardless of format or source.
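Conceptually, the two display options are two different ways of refining on the same numeric attribute. One offers every distinct value as its own refinement, and the other keeps whatever falls between two bounds. Here is a toy sketch over an in-memory list. The class and method names and the order quantities are hypothetical. Studio of course does this against Endeca Server, not in Java.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class NumericRefinementModes {

    // "List of Values": each distinct value becomes its own refinement, with a count.
    static Map<Integer, Long> listOfValues(List<Integer> values) {
        return values.stream()
            .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
    }

    // "Range Filter": the user picks bounds and keeps every value inside them (inclusive).
    static List<Integer> rangeFilter(List<Integer> values, int low, int high) {
        return values.stream()
            .filter(v -> v >= low && v <= high)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> orderQuantities = Arrays.asList(1, 2, 2, 5, 12, 40);
        System.out.println(listOfValues(orderQuantities));       // {1=1, 2=2, 5=1, 12=1, 40=1}
        System.out.println(rangeFilter(orderQuantities, 2, 12)); // [2, 2, 5, 12]
    }
}
```

A list of values works well when there are only a handful of distinct values. Once every record has its own amount or date, a range is the only practical way to refine, which is why having both options in one component is handy.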

[Screenshot: Guided Navigation range filters]

Digging even deeper into the configuration (see above), you’ll find a wealth of additional configurability to help your users explore their data. It’s truly a huge leap forward, and I wouldn’t be surprised to see additional improvements (I’m thinking of a “Preview” function similar to what you get in the Chart) coming down the road in future releases.

The final result is an extremely compelling data discovery experience that understands the content and capabilities of the data and uses that knowledge to guide the user, rather than limit them.

And it looks really nice too.

OEID 3.0 First Look – Internationalization

Internationalization in Endeca Server

With the release of Endeca Server 7.5.1, the MDEX engine takes a huge leap forward in supporting a wide variety of languages for Data Discovery applications. The new version supports 22 languages, up from 7 in the previous version. Supporting these new languages is significant in itself, but the way languages are implemented and supported inside Endeca Server is what makes this such a big deal.

Going under the hood for a moment: a request to create an attribute in an Endeca instance is typically issued from Integrator. To get really deep, the XML that gets sent to Endeca Server in OEID 3.0 might look something like this:

<ns1:record>
  <mdex-property_DisplayName>Case Officers</mdex-property_DisplayName>
  <mdex-property_IsSingleAssign>false</mdex-property_IsSingleAssign>
  <mdex-property_IsTextSearchable>true</mdex-property_IsTextSearchable>
  <mdex-property_IsUnique>false</mdex-property_IsUnique>
  <mdex-property_IsPropertyValueSearchable>true</mdex-property_IsPropertyValueSearchable>
  <mdex-property_Key>CaseOfficersInThai</mdex-property_Key>
  <mdex-property_TextSearchAllowsWildcards>false</mdex-property_TextSearchAllowsWildcards>
  <mdex-property_Type>mdex:string</mdex-property_Type>
  <mdex-property_Language>th</mdex-property_Language>
  <system-navigation_Select>multi-or</system-navigation_Select>
  <system-navigation_ShowRecordCounts>true</system-navigation_ShowRecordCounts>
  <system-navigation_Sorting>lexical</system-navigation_Sorting>
</ns1:record>

If you take a careful look above, you’ll notice that I was able to specify a property called Language (well, mdex-property_Language). This lets you specify a language at the individual attribute level, allowing a single MDEX instance to serve content in multiple languages. That opens up all sorts of possibilities for supporting multiple languages in your application. The attribute I’ve created above will support the Thai language.
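If you generate these definitions yourself rather than letting Integrator do it, a small helper makes the per-attribute language explicit. This is a minimal sketch that assumes you only need to vary the key, display name and language. Every other property is copied from the XML above. The class and method names and the “CaseOfficers_” key naming are hypothetical. The “ns1” prefix still has to be declared on an enclosing element, and the envelope used to send these records to Endeca Server is not shown.

```java
public class AttributeDefinitionXml {

    // Minimal XML text escaping for element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String element(String name, String value) {
        return "  <" + name + ">" + escape(value) + "</" + name + ">\n";
    }

    // Builds one attribute-definition record in the shape of the example above,
    // varying only the key, display name and language; other properties are fixed.
    static String record(String key, String displayName, String language) {
        return "<ns1:record>\n"
            + element("mdex-property_DisplayName", displayName)
            + element("mdex-property_IsSingleAssign", "false")
            + element("mdex-property_IsTextSearchable", "true")
            + element("mdex-property_IsUnique", "false")
            + element("mdex-property_IsPropertyValueSearchable", "true")
            + element("mdex-property_Key", key)
            + element("mdex-property_TextSearchAllowsWildcards", "false")
            + element("mdex-property_Type", "mdex:string")
            + element("mdex-property_Language", language)
            + element("system-navigation_Select", "multi-or")
            + element("system-navigation_ShowRecordCounts", "true")
            + element("system-navigation_Sorting", "lexical")
            + "</ns1:record>\n";
    }

    public static void main(String[] args) {
        // Hypothetical: one "Case Officers" attribute per language, keyed by language code.
        for (String lang : new String[] {"th", "en"}) {
            System.out.print(record("CaseOfficers_" + lang, "Case Officers", lang));
        }
    }
}
```

Because the language lives on each attribute, the Thai and English versions above can sit side by side in the same data domain, each with its own language-specific behavior.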

Per our agreements with Oracle, we can’t yet discuss some of the things coming to further enable internationalization until OEID 3.0 Studio is fully released. But you can imagine the power of augmenting individual attributes and all of their capabilities (type-ahead search, to use one small example) with a language component specific to that attribute, rather than to an entire index. As more components of the solution are released, we’ll have a ton more to say, so stay tuned.

Note: Oracle asked us to remove some posts you may have previously seen on our blog, as they referred to information that had not yet been made public. They will be returning to the blog over the coming weeks as OEID 3.0 is released.

OEID 3.0 First Look – Attribute Based Navigation Config

The first in a series of “mini-posts” detailing new functionality in Oracle Endeca Information Discovery 3.0 – Today: Attribute Based Navigation Configuration

As mentioned last week, there are enhancements both great and small coming in OEID 3.0. None is smaller (or, proportionally, greater) than the ability to select individual attributes, rather than attribute groups, in your guided navigation configuration.

For those of us who have been working with the product since its inception, this was a feature that existed in the first three or four versions of OEID Studio (then called Discovery Framework) and was widely used. With the advent of Attribute Groups, first as part of Studio and then as a fully fledged part of the MDEX, this configuration capability was removed.


Ah, the “good old days” with DF 1.4

While this change enforced best practices for re-usability across pages and allowed for some performance gains inside the MDEX, it removed a lot of flexibility and forced applications into a fair amount of configuration duplication. For example, it’s standard practice for a given record type in Endeca Server to have at least two attribute groups: one for guided navigation and another for record list display in a visualization component like Results Grid.

With the OEID 3.0 release, the ability to include attributes in an ad hoc fashion will return. This gives power users greater flexibility when designing their navigation experience, not to mention while you’re still developing your application. You’ll still want Attribute Groups for performance and re-usability, but the return of this capability is most welcome. It sounds like such a small thing, but for anyone who has had to create an attribute group just to get two or three attributes to show up in a guided navigation portlet, it’s huge.

*Note: Don’t worry, in 3.0, the navigation configuration is way slicker than the 1.4 version pictured above.

Oracle Endeca Information Discovery 3.0 – First Look

A few days ago, Oracle invited us over to their offices for a sneak preview of the latest release of Oracle Endeca Information Discovery (OEID). The OEID 3.0 release will be made available to the public later this year and it’s jam-packed with exciting enhancements across all areas of the product.

The three primary components of the solution (Endeca Server, OEID Studio and OEID Integrator) all have major enhancements baked in. The result is an incredibly compelling solution that further enables truly agile Information Discovery.

We were taking notes furiously during our session, so this may not be an exhaustive list, but some of the “highlights” we took away were:

  • Foreign Key Updates are in!
  • Enhanced Clustering with Cluster Management occurring in Studio
  • Record Filters now support EQL-style syntax in Studio
  • Crosstab and Results Table Visualizations fully rebuilt with new features
  • Self Service BI in Studio: Upload an Excel file through Studio and get going
  • Attribute-based (rather than Attribute-group based) guided navigation configuration has returned
  • Refinement Aware State Manager (guess we can “retire” ours?) now a part of Studio
  • Multi-lingual Endeca Server support at the column level, not just the index level
  • Integrated multi-country currency format support
  • Full SSL support for secure reads and writes
  • Endeca CAS has been renamed IAS and will ship with a new version of Oracle Outside In

There’s a bunch more that is difficult to summarize in bullet points (e.g., Certified vs. Community-based applications), but suffice it to say, it’s a huge release and a continuation of the product evolution that began with OEID 2.3. The platform is moving quickly towards becoming a collaborative solution for building discovery applications that provide a view across an entire enterprise from one location, rather than a set of “siloed” applications that serve specific functions. There’s still a lot more to come, but you can start to see the direction taking shape.

Keep watching for the rest of this week as we “deep dive” into each of the above bullet points and provide further context around what they are and how they can be used in your solution.