Business Intelligence (BI) as a business practice strives to be fact based, and usually number based. Hence, it’s not enough to discover facts; you have to quantify them, too. This helps explain why BI—for most of its life—has focused almost exclusively on structured data, as expressed in reports and analytic data models. Furthermore, most BI tools for reporting and analysis operate exclusively on data in specific structures—like relational tables and multidimensional cubes—and most tools access these only through SQL.
While documents containing unstructured data can contribute to the decision making of BI, they cannot participate directly in its data-driven reports and analyses unless the facts discovered in them are extracted and transformed into structured data that's conducive to reporting and analysis. As precedent, we already assume that data extraction, transformation, and load (ETL) are part and parcel of integrating structured data into a data warehouse or similar BI data store.
We now need to extend that assumption to encompass unstructured and semi-structured data. They, too, require extraction to locate relevant entities and their facts, followed by transformation into appropriate data structures, before they can be loaded into a data warehouse and serve the traditional accoutrements of BI, such as standard reports, multidimensional analyses, and statistical analyses. The irony is that this data is unstructured or semi-structured in its source form, yet must be transformed into structured data, via some kind of text analytics, before it can participate fully in BI.
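To make the extract, transform, and load sequence for text concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a description of any particular product: the sample documents, the "Acme" product names, the tiny keyword lexicon standing in for real sentiment analysis, and the `product_mentions` table are all hypothetical. A production pipeline would use a proper natural language processing toolkit in place of the regular expression and word lists.

```python
import re
import sqlite3

# Hypothetical customer comments; in practice these would come from
# emails, surveys, forum posts, and similar unstructured sources.
DOCUMENTS = [
    (1, "The Acme Router keeps losing its connection. Terrible experience."),
    (2, "Love the Acme Router, setup was great."),
    (3, "Acme Modem works fine, but support was poor."),
]

# Assumed entity pattern and sentiment lexicon (toy stand-ins for NLP).
PRODUCT_PATTERN = re.compile(r"\bAcme (?:Router|Modem)\b")
POSITIVE = {"love", "great", "fine"}
NEGATIVE = {"terrible", "poor"}


def extract_facts(doc_id, text):
    """Extract: yield one (doc_id, product, sentiment) row per product mention."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Transform: reduce free text to a number that a report can aggregate.
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    for match in PRODUCT_PATTERN.finditer(text):
        yield (doc_id, match.group(0), score)


def build_warehouse():
    """Load: write the extracted rows into a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_mentions "
        "(doc_id INTEGER, product TEXT, sentiment INTEGER)"
    )
    rows = [
        fact
        for doc_id, text in DOCUMENTS
        for fact in extract_facts(doc_id, text)
    ]
    conn.executemany("INSERT INTO product_mentions VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def sentiment_by_product(conn):
    """Report: an ordinary SQL aggregate over the now-structured facts."""
    query = (
        "SELECT product, COUNT(*), SUM(sentiment) "
        "FROM product_mentions GROUP BY product ORDER BY product"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in sentiment_by_product(build_warehouse()):
        print(row)
```

Once the facts sit in a table, any SQL-based BI tool can count mentions or total sentiment per product, which is exactly the kind of quantified, structured input that standard reports and analyses expect.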
KTMG Consulting has been helping companies extract facts, relationships, and sentiment from unstructured data, which makes up approximately 85% of the information they store electronically. Its solutions use natural language processing technology to address collective intelligence in social media and forums; the voice of the customer in surveys and emails; Social Customer Relationship Management (Social CRM); e-services; research and e-discovery; risk and compliance; and intelligence analysis.