Digital Sonata

 intelligent solutions for language processing

News & Press Releases

Monday, 8 September 2008

Free source code section

We added a small source code section on our Download page, where we will post freebies for developers.

Monday, 8 September 2008

Carabao Language Kit 1.2.0.0 released

The version 1.2.0.0 is now available for download.

Fixed:

  • Unknown patterns were translated as hypernyms

  • Regression: certain category-based sequences were omitted on second execution because of a malfunctioning guess scan caching mechanism

  • In analytical mode (Carabao DeepAnalyzer), there was a mismatch between the word index number and the idiom member index in sentences with attached tokens such as 'em, 'm

  • When copying a token with one rule unit or fewer, the text was always reset to the original

Added:

  • Capability to match numbers as patterns

  • When a translation is not found, the engine tries to fall back to a matching hypernym instead

  • New methods to Carabao DeepAnalyzer that enable accessing the members of the detected idioms

  • New methods to Carabao CDA that enable accessing the unknown heuristics table

  • New sequences

  • Russian morphological exceptions

Improved:

  • If an "unknown pattern" is forced to match a known word, it will not create a new guess if a guess with the same hypernym already exists. For example, if you force a check of whether a known word can be a city, no new record is created if a guess with a known city already exists
  • Automatic input language switching in locator fields
  • Locator fields are pre-filled with the list of all existing languages in the database, eliminating the need to jump to the next language
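
The hypernym fallback added in this release can be pictured with a minimal sketch. The toy dictionaries, the function name, and the step-by-step walk up the hypernym chain are illustrative assumptions only; they are not Carabao's actual data model or API.

```python
# Hypothetical toy data; the real lexicon lives in the Carabao database.
TRANSLATIONS = {"dog": "perro", "animal": "animal"}
HYPERNYMS = {"poodle": "dog", "dog": "animal"}


def translate(word, translations=TRANSLATIONS, hypernyms=HYPERNYMS):
    """Return a translation for `word`, falling back to its hypernyms.

    Walks up the hypernym chain until a translated term is found.
    Returns None when neither the word nor any hypernym is translatable.
    """
    seen = set()          # guards against accidental cycles in the data
    current = word
    while current is not None and current not in seen:
        if current in translations:
            return translations[current]
        seen.add(current)
        current = hypernyms.get(current)
    return None
```

In this sketch, a word with no direct translation ("poodle") resolves through its nearest translated hypernym ("dog"), while a word with no known hypernym yields no translation at all.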

Wednesday, 23 April 2008

We are published at ELRA

After a few months of evaluations, agreements, and inspections, our linguistic data is now published on the European Language Resources Association (ELRA) website. The Russian - English OLIF dictionary is sold at quite a price, while the Swahili, Czech, and Cebuano dictionaries are distributed free of charge (although ELRA charges for postage and media).

It is important to mention that all this data can be created from (usually free) ASCII dictionaries on the net using Carabao Linguist Edition.

Clarification: OLIF is the Open Lexicon Interchange Format, backed by SAP and created especially for NLP-oriented lexica. The official website is www.olif.net.

Tuesday, 1 April 2008

Server transition

We have just moved to a new server. Performance is much better, but there may be some minor technical glitches over the next few days. Thank you for your patience.

Tuesday, 11 March 2008

Carabao Language Kit 1.1.0.1 released

The version 1.1.0.1 is now available for download - mostly to fix the regressions reported in 1.1.0.0.

Fixed:



  • Crash when using sequence extraction option (regression from 1.1.0.0)


Added:



  • Capability to import sequences by data entry directly from the Sequence Sheet

  • Capability to manually set sequence descriptions
  • Some sequences for multi-word entity extraction
  • More morphological exceptions for Russian


Improved:



  • Processing speed and memory consumption - further boost

  • Token Sheet (words & sequences) GUI

Thursday, 28 February 2008

Carabao Language Kit 1.1.0.0 released

The version 1.1.0.0 is now available for download.

Fixed:

  • Volatility of newly assigned rule units in late sequences
  • Inconsistencies in the generation of inflected forms in design time

Added:

  • Friendly GUI of meta-rules such as lemmatized forms and generation of inflected forms
  • MorphoLogic now inspects the design time data generation meta-rules when generating inflected forms

Improved:

  • Processing speed and memory consumption
  • Increased maximum length of the meta-rule content field
  • Increased some fields to accommodate large sequences and a lot of grammatical data
  • Concurrency during long processing

NOTE: if you are upgrading from 1.0 and would like to keep your data, please run the convertTo11.exe executable on your data.

Sunday, 24 February 2008

Our products are now available at ComponentSource

ComponentSource, the largest online reseller of software components, is now selling Carabao DeepAnalyzer, with Carabao MorphoLogic and Carabao Translation Server on the way. Here is a direct link to our page:

http://www.componentsource.com/features/digital-sonata/index.html

It took us a while (over 2 months) to sign up, with all the checks, examinations, questions, and reviews.

ComponentSource gives corporate customers a more convenient way to purchase, compliant with their supply chain procedures, and gives our products higher visibility.

Wednesday, 23 January 2008

Carabao Language Kit 1.0.0.3 released

The version 1.0.0.3 is now available for download.

Fixed:

  • Various tagging problems
  • A bug with mid-sentence sequences priority setting
  • Generation of lemmas from the canonic form for tagging-only affixes

Added:

  • A button to tag new entries morphologically
  • A handful of commonly used business entities (e.g., address, phone, fax, business hours)

Improved:

  • Accuracy of some sequences
  • Domains

Thursday, 27 December 2007

Carabao Language Kit 1.0.0.2 released

The version 1.0.0.2 is now available for download.

Fixed:

  • Inflection generation problems of TagLemma results (words not in the dictionary) in Carabao MorphoLogic

Added:

  • Capability to inspect other guesses. For example, in a sequence like "adverb" + "adverb", it is possible to quickly scrap the entire sequence if the second adverb can be a preposition
  • Comprehensive morphology of the Russian language

Improved:

  • Removed descriptions of negative constraint elements (those that do not have an identity) in sequences in order to make the descriptions less cluttered
  • Performance of sequence processing
  • Accuracy of sequences
  • Domains

Thursday, 22 November 2007

Carabao Language Kit 1.0.0.1 released

Following the first feedback and testing results, we made certain changes to the English lexicon to increase its accuracy. The version 1.0.0.1 is now available for download.


Fixed:




  • Various validation problems with attached tokens

  • Lookup windows are no longer maximized on opening

  • Incorrect tooltips after deletion in the dictionary table


Added:



  • GUI support for negative constraints in sequences

  • Handling of irregular 'smart quotes' in Translation Console

  • Manual disambiguation table in Carabao Linguist Edition

  • Style tags to the tooltips in the dictionary table

Improved:



  • Supplied syntactic structures for English

  • In the translation console, the original thesaurus article is suppressed when the word is part of an idiom - to prevent confusion

Wednesday, 31 October 2007

Digital Sonata releases Carabao Linguist Edition

Digital Sonata announces the release of the Carabao Linguist Edition desktop suite.


Carabao Linguist Edition desktop suite allows users to import bulk data from Carabao Exchange XML files. The entries in the imported dictionaries are matched either by their ID number or using fuzzy comparison with existing entries of any language in the database. In addition, Carabao Linguist Edition contains a module which allows for the transformation of unstructured "paper" dictionaries into machine-readable OLIF XML format.
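
The two matching strategies described above (by ID number, then by fuzzy comparison) can be sketched roughly as follows. This is only an illustration: the entry layout (dicts with `id` and `text`), the function name, the similarity measure (Python's standard `difflib.SequenceMatcher`), and the 0.85 threshold are all assumptions, not the method Carabao Linguist Edition actually uses.

```python
from difflib import SequenceMatcher


def match_entry(imported, existing, threshold=0.85):
    """Find the existing entry that corresponds to an imported one.

    Entries are hypothetical dicts with an 'id' (may be None) and a 'text'.
    An exact ID match wins; otherwise the most similar text is returned,
    provided its similarity ratio reaches `threshold`. Returns None if
    nothing matches.
    """
    # 1. Match by ID number when the imported entry carries one.
    by_id = {entry["id"]: entry for entry in existing}
    if imported.get("id") is not None and imported["id"] in by_id:
        return by_id[imported["id"]]

    # 2. Otherwise fall back to fuzzy comparison of the entry text.
    best, best_score = None, 0.0
    for entry in existing:
        score = SequenceMatcher(
            None, imported["text"].lower(), entry["text"].lower()
        ).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None
```

A higher threshold makes the fuzzy step stricter and produces fewer false matches, at the cost of leaving more imported entries unmatched.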

Monday, 17 September 2007

Digital Sonata releases Carabao Language Kit 1.0.0.0

Digital Sonata is proud to announce the release of Carabao Language Kit 1.0.0.


Carabao is a family of linguistic tools providing the following capabilities:


  • Sense disambiguation & text understanding

  • Detailed, sentence by sentence domain extraction

  • Machine readability evaluation

  • Automatic translation between languages

  • Deep morphological analysis and synthesis

  • Transliteration between scripts

  • Named entity recognition and classification

  • Automatic linguistic profiling

  • Universal measure conversion

The most distinctive feature of Carabao is the engine's complete abstraction from the linguistic point of view. All the linguistic logic resides in a database complete with a powerful GUI data editor. This enables users to tweak, modify, or alter the engine in every possible way (to the point of adding a new language) without writing one line of source code.

In addition to more straightforward purposes, Carabao can be integrated with third-party machine translation and other NLP packages (such as OCR or even voice recognition) to improve their accuracy.