
Text Analytics and Transformation Solutions
Task
Text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from unstructured machine-readable documents. (From Wikipedia, the free encyclopedia.) Business data does not always come prepackaged in spreadsheets, relational databases or XML. Developers in many industries frequently face challenging tasks that involve processing unstructured text. As the expectation that computers should be 'smarter' persists, text mining and text analytics are becoming increasingly important parts of business intelligence.
Solution
Our products "shred" content into a list of numbers, each corresponding to a particular concept behind a word or an idiom. The concepts are semantically interconnected. In addition to the usual phone numbers, emails, locations, addresses, and other kinds of recognizable patterns, the flexible architecture of our products offers a choice of over 100,000 entities (closer to 120,000). It is possible to find, for example, names of illicit drugs (including street names), locations in the US, components of explosive devices, terms related to nuclear physics, or words with British spelling. The same applies to the domains of discourse. As our software is dictionary-based, it does not require training sets.
Thorough text analysis requires much more than simple string matching. For example, the word "Nice" in a sentence like 'Nice to see you here' is not a city, while 'Nice is a great place to relax' does mention a city. Our products are able to distinguish between different meanings of the same word.
Sample customer scenarios
Scenario 1:
Situation: A travel advisory website aggregates news feeds. An alert needs to be sent to a supervising editor when a disease outbreak or armed conflict is reported in a particular region.
Solution: Carabao DeepAnalyzer processes the news feeds. The customer's source code searches the collection of ID numbers returned by Carabao for any kind of 'disease', or anything related to 'riots', 'conflicts' or 'terrorism'. If a match is found, the relevant news feed is emailed to the supervisor. For example, out of 10,000 news articles received daily, the supervisor receives one or two alerts. Note that there is no need to look explicitly for every disease or conflict; as the concepts are linked, the customer's source code only searches for a common parent, e.g. 'disease'.
Costs involved: 1 license of Carabao DeepAnalyzer + about 3 hours of development by an average software developer.
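The common-parent search in Scenario 1 can be pictured with a short Python sketch. It is an illustration only and does not use the Carabao API: the concept IDs, the PARENT table, and the names has_ancestor_in and needs_alert are all hypothetical, and the list of IDs is assumed to have been returned by the analyzer already.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical concept hierarchy: concept ID -> parent concept ID.
# Real IDs and links would come from the product's dictionary; these are made up.
PARENT: Dict[int, Optional[int]] = {
    1: None,   # 'disease'
    2: 1,      # 'infectious disease'
    3: 2,      # 'cholera'
    10: None,  # 'conflict'
    11: 10,    # 'riot'
    20: None,  # 'sport'
    21: 20,    # 'football'
}

# Hypothetical parent concepts that should trigger an alert.
ALERT_ROOTS: Set[int] = {1, 10}


def has_ancestor_in(concept_id: int, roots: Set[int],
                    parent: Dict[int, Optional[int]] = PARENT) -> bool:
    """Walk up the hierarchy and report whether the concept or any ancestor is in roots."""
    seen: Set[int] = set()
    current: Optional[int] = concept_id
    while current is not None and current not in seen:
        if current in roots:
            return True
        seen.add(current)  # guards against accidental cycles in the table
        current = parent.get(current)
    return False


def needs_alert(concept_ids: Iterable[int], roots: Set[int] = ALERT_ROOTS) -> bool:
    """True if any concept found in an article descends from an alert root."""
    return any(has_ancestor_in(c, roots) for c in concept_ids)
```

With this layout, an article whose concepts include 'cholera' (ID 3 in the made-up table) triggers an alert without 'cholera' ever being listed explicitly, because the walk reaches its ancestor 'disease'.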
Scenario 2:
Situation: A real estate agency holds free-flowing natural language descriptions of its real estate agents. Because their strengths and specializations differ, the profiles must be searchable by features rather than keywords, such as expertise in particular demographic groups (adults, young couples, immigrants), special linguistic skills (e.g., Spanish speaker), and personal characteristics (educated, patient, etc.).
Solution: Carabao DeepAnalyzer is used to index word senses in each profile. For each term entered in the search query, the user is presented with a dropdown list of its possible meanings, drawn from Carabao MorphoLogic's built-in thesaurus. The indexed profiles are then searched for the disambiguated meanings. At a later stage, it is possible to enhance the result page with a custom dropdown list presenting only features found in the profiles (e.g., sort by demographic group match).
Costs involved: 1 license of Carabao DeepAnalyzer + 1 license of Carabao MorphoLogic + 6 to 10 hours of development by an average software developer.
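One way to picture the sense index in Scenario 2 is an inverted index from word-sense ID to profile ID. The sketch below is a generic illustration, not the Carabao API: the sense IDs are invented, build_index and search are hypothetical helper names, and the sense IDs for each profile are assumed to have been produced by the analyzer beforehand.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(profiles: Dict[str, Iterable[int]]) -> Dict[int, Set[str]]:
    """Map each (hypothetical) word-sense ID to the set of profiles containing it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for profile_id, sense_ids in profiles.items():
        for sense_id in sense_ids:
            index[sense_id].add(profile_id)
    return index


def search(index: Dict[int, Set[str]], wanted: Iterable[int]) -> Set[str]:
    """Return the profiles that contain every sense the user selected."""
    wanted = list(wanted)
    if not wanted:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(wanted[0], set()))
    for sense_id in wanted[1:]:
        result &= index.get(sense_id, set())
    return result
```

Because the query holds the sense IDs chosen in the dropdown rather than raw keywords, a profile matches only when it contains the intended meaning of each term.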
Scenario 3:
Situation: A news agency aggregates news feeds from different sources and needs to assign labels according to geographical region(s) and relevant subject(s).
Solution: Carabao DeepAnalyzer is used to extract dominant domains of discourse and geographical information. The customer's program simply calls a method in the Carabao DeepAnalyzer class to process the data and extract the associated domains of discourse. Carabao's high accuracy and robust algorithms allow it to distinguish between identically named places, such as Paris in Texas and Paris in France.
Costs involved: Site license of Carabao DeepAnalyzer + 1 hour of development by an average software developer.
Scenario 4:
Situation: A customer service operator needs to search for loosely defined resellers of a particular product and contact them by all available means (emailing, faxing, calling) to communicate emergency announcements (such as a product recall) from the manufacturer. The phone and fax assignments are distributed on a geographical basis.
Solution: Carabao DeepAnalyzer is used to extract names, phone numbers, emails and locations. Names are associated with the phone numbers, emails and locations. If information is missing or seems to be incomplete, an alert is sent to a supervisor. The bulk of the faxing and emailing operations is automated, minimizing personnel costs.
Costs involved (mining phase only): 1 license of Carabao DeepAnalyzer + 8 hours of development by an average software developer.
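The completeness check in Scenario 4 might look roughly like the following Python sketch. It covers only the step after extraction and does not use the Carabao API: the Contact record, its field names, and split_for_review are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Contact:
    """Hypothetical record: one reseller name with the details associated with it."""
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None
    location: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of the contact details that were not found for this reseller."""
        return [f for f in ("phone", "email", "location") if not getattr(self, f)]


def split_for_review(contacts: Iterable[Contact]) -> Tuple[List[Contact], List[Contact]]:
    """Separate complete records from those a supervisor should be alerted about."""
    complete: List[Contact] = []
    incomplete: List[Contact] = []
    for contact in contacts:
        (incomplete if contact.missing_fields() else complete).append(contact)
    return complete, incomplete
```

Complete records can go straight to the automated emailing and faxing queue, while the incomplete list, together with the output of missing_fields, gives the supervisor alert something concrete to report.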
Scenario 5:
Situation: A word processor's grammar checker incorporates functionality to offer substitutes for certain words and phrases (obsolete terms, obscenities, slurs, regionalisms, etc.).
Solution: Carabao MorphoLogic is used to obtain stylistic information and to list possible alternatives to the specified words. The word processor's GUI lists the alternative words together with a thesaurus article.
Costs involved: Developer license of Carabao MorphoLogic + about 10 hours of development by an average software developer.
Conclusion
Our products are built on a powerful generic basis, and the possibilities are numerous. If you are not sure whether our software can handle your task, or want to estimate the costs involved, please feel free to request a free evaluation without any obligation.
Digital Sonata Pty Ltd © 2007-2008