Probability-based translation

The more I translate, the more convinced I become that developing accurate translation software is a nearly impossible task, and one that certainly can’t be achieved with probability-based models alone, such as those used by Google Translate. Aside from the idiosyncratic and cultural properties of language (as previously discussed here), machine translation is complicated by the incompleteness of reference databases. Put simply, software cannot translate a term for which no dictionary entry or prior translation exists. This problem is far more pressing than one might initially suspect, given how often a translator encounters rarely used terms for which no translation is immediately forthcoming. While translating the other day, I kept a list of uncommon terms along with the number of Google hits each one yields. Here are a few: Patentfeld (7 hits); patentstark (3 hits); versatzfähig (1 hit); bestandeskundlich (2 hits); tiefenstufenabhängig (3 hits).

Without the ability to consider the larger context of a text and deconstruct its meaning (in short, without the ability to think), translation software cannot deal effectively with non-standard terms. Yet rare terminology is just one of the many situations in translation in which a 1:1 rendering is not possible. Clearly, the dynamic transformation of a source text’s signifiers that an accurate and legible translation requires is an act of creative interpretation far beyond the present capabilities of translation software, particularly software based on probability models.