Digital Sonata

 intelligent solutions for language processing

News & Press Releases

Wednesday, 27 January 2010

English - Swedish OLIF dictionary released

An English - Swedish OLIF dictionary has been added to the list of OLIF lexicons distributed by Digital Sonata. The dictionary is available for download from http://www.digitalsonata.com/download.aspx?type=linguisticData.

Sunday, 10 January 2010

Bilingual OLIF dictionaries released

Digital Sonata released a set of low-cost, royalty-free bilingual dictionaries in OLIF format, optimized for use in NLP and content management applications. Translations, parts of speech, and thesaurus articles are included. The dictionaries are available at http://www.digitalsonata.com/download.aspx?type=linguisticData. Currently the following dictionaries are available:


  • English -> Finnish

  • English -> French

  • English -> German

  • English -> Japanese

  • English -> Korean

  • English -> Russian

  • English -> Spanish

Tuesday, 5 January 2010

Carabao Language Kit 1.6.2.1 released

Version 1.6.2.1 is now available for download.

Fixed:

  • Transliteration to empty string
  • Partial transliteration

Added:

  • Change log which allows distributed collaboration

Improved:

  • Processing speed
  • Entry matching accuracy

Monday, 1 June 2009

No more direct sales

Please be advised that we no longer license our products off the shelf. If you would like a quote for our services, leave us a message.

Wednesday, 4 March 2009

Guide to sequences uploaded

We uploaded a short guide to building and debugging sequences. It is available on our whitepaper download page.

Friday, 20 February 2009

Carabao Language Kit 1.5.0.1 released

Version 1.5.0.1 is now available for download. It brings many changes and enhancements, thanks to the ongoing development of Chinese (which is not included in the default database, though).

Fixed:

  • Regression: "phantom capitalization" of re-used words
  • Regression: sequence style forcing / avoiding
  • Repositioning errors in sentences with attached tokens
  • Sequence processing in languages not using white spaces
  • Regression: single member sequence processing

Added:

  • Lattice-based processing for use in speech recognition and OCR applications
  • Optional and unmapped members in sequences
  • Members in sequences which are validated but not mapped
  • Possibility to get a crosslingual representation (components only: DeepAnalyzer and Translation Server)
  • Possibility to load content from a disambiguated crosslingual representation
  • GUI in Translation Console to enable lattice-based processing
  • GUI in Translation Console to enable loading crosslingual representation
  • GUI in Translation Console to hint the system about the expected domains in the text
  • Analysis mode in Translation Console, when the source and target languages are the same and no styles are enforced / avoided
  • Capability of using the white space as a delimiter in languages that don't have white spaces
  • Smart quotes and other delimiters

Improved:

  • Dictionary GUI - presents thesaurus from another language, if missing in the current language
  • Sequence builder GUI - color coding of members which are not mapped, or contain conditions producing empty sets

Monday, 8 December 2008

Carabao Language Kit 1.2.3.0 released

Version 1.2.3.0 is now available for download.

Fixed:

  • Handling of single quotes as syntax delimiters in English

Added:

  • A segmentation mode that handles languages that don't use white spaces (e.g. Chinese, Japanese, Korean, Thai) more effectively. In this mode, runs of different character classes are broken into separate tokens (e.g. Chinese text immediately followed by English text). The remaining unidentified part is run through the unknown heuristic identifier.

  • Automatic conversion for Unicode clipboard data into the currently active encoding in tokens table

  • Better warning when attempting to overwrite the current token

  • A utility to rebuild semantic links cache

Improved:

  • On some systems, every update of the tokens table added a new set of system icons (minimize, restore, maximize) to the MDI frame window. The maximize option now sizes the window to roughly the full client area instead of switching it to maximized mode

Monday, 8 September 2008

Free source code section

We added a small source code section on our Download page, where we will post freebies for developers.

Monday, 8 September 2008

Carabao Language Kit 1.2.0.0 released

Version 1.2.0.0 is now available for download.

Fixed:

  • Unknown patterns were translated as hypernyms

  • Regression: certain category-based sequences were omitted on second execution because of a malfunctioning guess scan caching mechanism

  • In analytical mode (Carabao DeepAnalyzer), there was a mismatch between word index number and an idiom member index, in sentences with attached tokens such as 'em, 'm

  • When copying a token with one rule unit or fewer, the text was always reset to the original

Added:

  • Capability to match numbers as patterns

  • When a translation is not found, the engine tries to fall back to a matching hypernym instead

  • New methods to Carabao DeepAnalyzer that enable accessing the members of the detected idioms

  • New methods to Carabao CDA that enable accessing the unknown heuristics table

  • New sequences

  • Russian morphological exceptions
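
The hypernym fallback mentioned in the list above can be pictured with a small sketch. It is only an illustration under stated assumptions: the `translate` function, the two dictionaries, and the `max_depth` limit are hypothetical and do not reflect how the Carabao engine stores or looks up its data.

```python
from typing import Optional


def translate(
    word: str,
    translations: dict[str, str],
    hypernyms: dict[str, str],
    max_depth: int = 5,
) -> Optional[str]:
    """Look up a translation, climbing the hypernym chain when none is found.

    translations maps a source word to its target-language translation.
    hypernyms maps a word to its more general term (hypothetical structure).
    Returns None if no translation is found within max_depth steps.
    """
    current = word
    seen = set()
    for _ in range(max_depth + 1):
        if current in translations:
            return translations[current]
        # Stop on a cycle or when there is no more general term to try.
        if current in seen or current not in hypernyms:
            return None
        seen.add(current)
        current = hypernyms[current]
    return None


if __name__ == "__main__":
    translations = {"dog": "chien"}
    hypernyms = {"poodle": "dog", "dog": "animal"}
    # "poodle" has no translation of its own, so the hypernym "dog" is used.
    print(translate("poodle", translations, hypernyms))
```

The depth limit and the cycle check keep the lookup from looping forever on malformed data; a production engine would likely also weigh how much meaning is lost with each step up the chain.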

Improved:

  • If an "unknown pattern" is forced to match a known word, it will not create a new guess if a guess with the same hypernym already exists. For example, if you force a check of whether a known word can be a city, no new record is created if there is already a guess with a known city
  • Automatic input language switching in locator fields
  • Locator fields are pre-filled with the list of all existing languages in the database, eliminating the need to jump to the next language

Wednesday, 23 April 2008

We are published at ELRA

After a few months of evaluations, agreements, and inspections, our linguistic data is now published on the website of the European Language Resources Association (ELRA). The Russian - English OLIF dictionary is sold at quite a price, while the Swahili, Czech, and Cebuano dictionaries are distributed for free (although ELRA charges for postage and media).

It is important to mention that all this data can be created from (usually free) ASCII dictionaries on the net using Carabao Linguist Edition.

Clarification: OLIF is the Open Lexicon Interchange Format, backed by SAP and created specifically for NLP-oriented lexica. The official website is www.olif.net.

Tuesday, 1 April 2008

Server transition

We just moved to a new server. Much better performance, but there might be some minor technical glitches in the next few days. Thank you for your patience.

Tuesday, 11 March 2008

Carabao Language Kit 1.1.0.1 released

Version 1.1.0.1 is now available for download, mostly to fix the regressions reported in 1.1.0.0.

Fixed:

  • Crash when using sequence extraction option (regression from 1.1.0.0)

Added:

  • Capability to import sequences by data entry directly from the Sequence Sheet
  • Capability to manually set sequence descriptions
  • Some sequences for multi-word entity extraction
  • More morphological exceptions for Russian

Improved:

  • Processing speed and memory consumption - further boost
  • Token Sheet (words & sequences) GUI

Thursday, 28 February 2008

Carabao Language Kit 1.1.0.0 released

Version 1.1.0.0 is now available for download.

Fixed:

  • Volatility of newly assigned rule units in late sequences
  • Inconsistencies in the generation of inflected forms in design time

Added:

  • Friendly GUI of meta-rules such as lemmatized forms and generation of inflected forms
  • MorphoLogic now inspects the design time data generation meta-rules when generating inflected forms

Improved:

  • Processing speed and memory consumption
  • Increased maximum length of the meta-rule content field
  • Increased some fields to accommodate large sequences and a lot of grammatical data
  • Concurrency during long processing

NOTE: if you are upgrading from 1.0 and would like to keep your data, please run the convertTo11.exe executable on your data.

Sunday, 24 February 2008

Our products are now available at ComponentSource

ComponentSource, the largest online reseller of software components, is now selling Carabao DeepAnalyzer, with Carabao MorphoLogic and Carabao Translation Server on the way. Here is a direct link to our page:

http://www.componentsource.com/features/digital-sonata/index.html

It took us a while (over 2 months) to sign up, with all the checks, examinations, questions, and reviews.

ComponentSource gives corporate customers a more convenient way to purchase, compliant with their supply chain procedures, and gives our products higher visibility.