The Evolution of Textual Analytics

By W H Inmon

The world of computers is immature. Computers have been around since the 1950s. Prior to that there was no computing. Compare the computing profession to other professions and there is no contest. Go to Rome and see the works that are standing today that were wrought by engineers two thousand years ago. Go to Egypt and look at the pyramids and at the hieroglyphics on the walls. Much of the writing on the walls is from an accountant discussing how much grain is owed the Pharaoh. Go to the caves in Chile high in the Andes and find bones 10,000 years old that indicate that medicine, in a crude form, was practiced ten millennia ago.

When you compare the computer profession to other professions (engineering, accounting, medicine), it is simply inarguable that the computer profession is immature.

One sign of the immaturity of the computer profession is the rapid evolution of computer technology. Across the board, the computer profession and its underlying technology have evolved and are still evolving at an amazing rate.

To date, all aspects of computing have gone through or are still going through an evolutionary process. The first computers evolved from standalone batch computers to the mainframe to parallel processors. Applications evolved from batch processing to online processing to ERP processing. Analytical processing for structured data evolved from reports to Business Intelligence.

In terms of evolution, textual analytics is no different from any other form of computing.  Textual analytics is witnessing an evolution of technology today.


The Steps of Evolution

Fig 1 shows the steps up the ladder of evolution that textual analytics is witnessing.


Fig 1

There are essentially seven steps up the ladder of maturity for textual analytics. The first step is the awareness that unstructured data exists in the corporation and that it contains a lot of important information. Unstructured data accumulates in the corporation in a random, unorganized manner. There are emails. There are corporate contracts. There are medical reports. There are insurance policies. In a hundred ways and in a hundred forms, unstructured data runs through the lifeblood of the organization.

One day someone decides that unstructured data should not be stored in a random, unmanaged fashion. The next step up the ladder is the notion that unstructured data should be collected and managed in a methodical manner. Soon unstructured data is gathered into cabinets and archives.

And then the organization discovers that paper erodes quickly and that reading text off of paper is tedious and inexact. A better approach is to store text electronically. Soon unstructured data is lifted off of the paper on which it is stored and is placed in an electronic format. A repository is created that consists of textual data in electronic form.

Then one day someone decides that if the unstructured data is going to be stored electronically, the data ought to be useful. The next step of evolution occurs when search technology is used to examine the unstructured text that is stored electronically. Search satisfies the curiosity about the contents of unstructured data. But organizations soon discover the limitations of search. In many regards, a search of unstructured data is like a candy bar for dinner. It only serves to whet the appetite for something more substantial.

Organizations graduate from search to an understanding that textual data needs to be integrated before it is fit for textual analytics. This is the next step up the ladder of evolution of textual analytics. Once textual data is integrated, it is fit to be placed in a database. Once in a database, the integrated text can be analyzed by standard analytical processing. All sorts of analytical opportunities open up once the textual data is integrated.
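To make the idea of integrating text and placing it in a database concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any vendor's product: the taxonomy, the table name text_terms, and the sample documents are all hypothetical. Raw text is reduced to rows of recognized terms, each tagged with a class, and the rows are loaded into an in-memory SQLite database where an ordinary SQL query can count them.

```python
import re
import sqlite3

# Hypothetical taxonomy: raw word -> (canonical term, class of data).
TAXONOMY = {
    "honda": ("Honda", "car"),
    "porsche": ("Porsche", "car"),
    "oak": ("oak", "tree"),
    "elm": ("elm", "tree"),
}

# Hypothetical sample documents keyed by a document id.
DOCUMENTS = {
    "email-1": "The Honda hit an oak; the Porsche stopped by the elm.",
    "email-2": "A Porsche was parked outside.",
}


def integrate(doc_id, text):
    """Turn raw text into (doc_id, position, term, term_class) rows."""
    rows = []
    for position, word in enumerate(re.findall(r"[a-z]+", text.lower())):
        if word in TAXONOMY:
            term, term_class = TAXONOMY[word]
            rows.append((doc_id, position, term, term_class))
    return rows


def load(documents):
    """Load the integrated rows for every document into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE text_terms "
        "(doc_id TEXT, position INTEGER, term TEXT, term_class TEXT)"
    )
    for doc_id, text in documents.items():
        conn.executemany(
            "INSERT INTO text_terms VALUES (?, ?, ?, ?)", integrate(doc_id, text)
        )
    return conn


def count_by_class(conn):
    """Standard analytical processing: count recognized terms per class."""
    cursor = conn.execute(
        "SELECT term_class, COUNT(*) FROM text_terms "
        "GROUP BY term_class ORDER BY term_class"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    print(count_by_class(load(DOCUMENTS)))
```

Once the text is in rows and columns, counting by class is an ordinary GROUP BY, which is the kind of analysis that search alone does not offer.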

But seeing patterns in the textual data across broad vistas is difficult to do. The next step up the ladder of evolution for textual analytics is that of visualization. With visualization of text it is possible to see patterns that have not before been obvious.

As important as visualization is, there is yet one last step of evolution. That last step is the capability of merging structured and unstructured data. Once structured and unstructured data can be merged, all sorts of analytical possibilities open up.
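As a rough illustration of what merging structured and unstructured data can look like, the sketch below joins a structured table with a table of findings derived from text. Everything in it is assumed for the example: the customers and text_findings tables, the customer_id key that links them, and the sample rows. It is one simple way to do the merge, not the only one.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory warehouse holding both kinds of data."""
    conn = sqlite3.connect(":memory:")
    # Structured data: one row per customer (hypothetical schema).
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)"
    )
    # Integrated textual data: topics recognized in customer emails (hypothetical).
    conn.execute("CREATE TABLE text_findings (customer_id INTEGER, topic TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "East"), (2, "West"), (3, "East")],
    )
    conn.executemany(
        "INSERT INTO text_findings VALUES (?, ?)",
        [(1, "billing"), (1, "delivery"), (2, "billing"), (3, "billing")],
    )
    return conn


def topics_by_region(conn):
    """Join structured and text-derived rows on the shared customer key."""
    sql = (
        "SELECT c.region, t.topic, COUNT(*) "
        "FROM customers AS c "
        "JOIN text_findings AS t ON t.customer_id = c.customer_id "
        "GROUP BY c.region, t.topic "
        "ORDER BY c.region, t.topic"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for region, topic, mentions in topics_by_region(build_warehouse()):
        print(region, topic, mentions)
```

The join only works because the text has already been integrated into rows that carry a key shared with the structured data, which is why this step sits at the top of the ladder.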

There are, then, seven steps of evolution that textual analytic processing goes through. As in every evolution, the steps are inevitable. Trying to hold back any one step of evolution is like trying to hold back the tide with your bare hands. It is futile, dangerous, and a waste of time.

Each stage of evolution has its own prominent technologies. Fig 2 shows some of the technologies that are popular with organizations at different levels of maturity.


Fig 2


Fig 2 shows that email repositories, stores of medical notes, and stacks of insurance policies are all found at the earliest, most immature stage of evolution. The storage cabinet becomes the primary medium at this very early stage of managing textual data. At this point the organization is merely collecting and loosely organizing raw unstructured data. There is little organization and little uniformity to the textual data at this point.

A step up from the filing cabinet is microfiche. With microfiche, large amounts of data can be stored in a small physical location and the data is, more or less, captured permanently.

The next stage of evolution occurs where unstructured data is collected in an organized manner. Technology that is popular here includes enterprise content management (ECM) software such as Documentum, FileNet, and Stellent. These technologies are good for storing and finding textual data in an organized manner.

The next stage of evolution is search technology. Here Google, Fast, and Yahoo play a role. They are good for searching text.

Forest Rim Technology is found at the next three levels of evolution. Forest Rim Technology's patented technology allows text to be read and integrated, producing results that form the basis of textual analytics. In addition, Forest Rim Technology produces visualizations that are useful in looking at the entirety of a body of text. And Forest Rim Technology integrates data so that structured and unstructured analysis can be done in an integrated manner.

Reasons for Evolution

For every evolution that has ever occurred there are reasons for the evolution occurring. The evolution of textual analytics is no different.

At every step of the evolution of textual analytics, pain and dissatisfaction have caused the next step of evolution to occur.

In the beginning, textual data was collected almost as an afterthought, or as a by-product of conducting business. Intuitively someone knew that there was some intrinsic value in collecting a body of text. But there was no notion of accessing, analyzing, and using the text.

One day someone got tired of all of this unstructured data hanging around and decided to collect and organize it in a methodical manner. The dissatisfaction with randomly stored data led to the desire to collect and organize the data systematically.

Soon ECM technology was used to gather and organize textual data. The organization of unstructured data meant that at least the analyst stood a chance of finding a piece of unstructured data should the need arise. But after a few experiences of searching for data, the analyst decided that there had to be a better way. The pain of conducting a search led to the recognition that technology for search was needed.

The next step of evolution was that of search. Once the textual data was collected and a search engine was employed, search was easily accomplished. But the analyst quickly realized that search was full of limitations. Classes of data could not be recognized, text could not be counted and analyzed by classes of data, differences in terminology prevented efficient analysis of data, textual renderings of data prevented understanding and interpretation of data, and so forth. In short, the limitations of search became obvious to the business analyst.
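One of these limitations, differences in terminology, can be shown with a small sketch. The clinical notes and the synonym table below are invented for the illustration, and the matching logic is deliberately simple. A literal keyword search finds only the notes that use the exact phrase, while a lookup that first resolves each variant to a canonical term finds every note about the same concept.

```python
import re

# Hypothetical terminology table: variant wording -> canonical term.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}

# Hypothetical notes written by different people in different vocabularies.
NOTES = [
    "Patient admitted after heart attack.",
    "History of myocardial infarction in 2005.",
    "Prior MI noted by cardiologist.",
    "Patient reports mild headache.",
]


def keyword_search(notes, phrase):
    """Plain search: return notes containing the literal phrase."""
    return [note for note in notes if phrase.lower() in note.lower()]


def concept_search(notes, concept):
    """Return notes mentioning any variant that resolves to the concept."""
    hits = []
    for note in notes:
        text = note.lower()
        for variant, canonical in SYNONYMS.items():
            # Whole-word match so that "mi" does not match inside "mild".
            pattern = r"\b" + re.escape(variant) + r"\b"
            if canonical == concept and re.search(pattern, text):
                hits.append(note)
                break
    return hits


if __name__ == "__main__":
    print(len(keyword_search(NOTES, "heart attack")))
    print(len(concept_search(NOTES, "myocardial infarction")))
```

The keyword search misses two of the three relevant notes because their authors used different words for the same thing. Resolving terminology before analysis is part of what integration adds on top of search.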

Soon textual data was integrated. And with textual integration came the opportunity to do very sophisticated analysis of text. All sorts of analysis could be done that was before merely a wish in an analyst's mind. But looking across vast vistas of data was very difficult. The next level of evolution occurred when textual data could be analyzed visually.

And as valuable as visualization was, an even more sophisticated level of analysis occurred when unstructured data could at last be merged with structured data.

It is at this point that a data warehouse containing both structured and unstructured data could be realized, and worlds of analytical processing that were once only theory began to become reality.

 

COPYRIGHT® INTELLEGO 2009