Sharing CAT memories: numbers as words as songs.

Abstract

The incidence of computers in translation ranges from the rather unsuccessful attempts to attain a Fully Automatic High Quality Machine Translation to the current widespread usage of translation memories. Gone are the years of vast government research funding attempting to "help" computers do the translator's job. Spending comes now from professionals and companies that invest in expensive software in order to help computers help them in their translating. This article explores the current standards and future possibilities of file sharing as an efficient cost-sharing formula.

Glory and Shame of Machine Translation

The idea of trying to make numbers talk like words is an old one. While thinkers like Leibniz already devised a mathematical system of language representation and translation as early as the late 17th century, and even Descartes sketched out what he called a "universal language" in the form of mathematical expressions (Couturat 2002), we can go back as far as 1661 to trace one of the first fully developed attempts to work out a mathematical model
Athanasius Kircher in 1663, or an even earlier one by Cave Beck in 1657 (Hutchins 1986, 2:1). The idea of such "mechanical dictionaries" experienced a revival in the early 20th century, with the "Mechanical Brain" by the French engineer Georges Artsruni and the invention by the Russian Petr Trojanskij. Using rolling drums, belts, wheels, perforated tape and typewriter-like output methods, these were the first truly mechanical translation devices (Freigang 2001).

The heyday of many other subsequent attempts to develop an automated translation device started with a well-known memorandum addressed to the Rockefeller Foundation in 1949 by Warren Weaver. While rather questionable, his mathematical model of communication, developed together with Claude Shannon, would consolidate the idea of translation as a mere question of "breaking the code." It would also initiate two decades of frantic activity and huge investments of resources and budget, all embedded in the specific dynamics of the Cold War, to attain the so-called "Fully Automatic High Quality Machine Translation," mainly from Russian into English.
The final report in 1966 of the Automatic Language Processing Advisory Committee (ALPAC) is just as famous--or, as some would say, infamous [1]. After severely criticizing the actual outcomes of the above projects with a mixture of pertinent observations, questionable assumptions and some understatements, this report marked the end of massive government spending for research on machine translation and the establishment of a certainty that lasts today: machine translation is mostly useless without human intervention in the form of editing or rewriting (Hutchins 1996).

However, further attempts and approaches would provide new insights into the complexity of the machine translation question, like the initiative in the former European Community called Eurotra, which not only in a sense "retrieved" Descartes's original idea of developing what is called an "interlingua," or intermediate metalanguage, but also provided richer analytical developments while establishing the basis for current computer-assisted translation techniques.

The idea of developing an input-controlled translation method is very much associated with the Canadian system for bilingual weather reports, Meteo, which is still working today. This approach, which is also working effectively for many multinational companies in the production of their internal multilingual paperwork, memorandums and manuals, can be summarised by outlining the features of the project called KANT, for "Knowledge-based Accurate Natural-language Translation." KANT works by carefully controlling the input quality of the source text. Developed at Carnegie Mellon University, the system monitors ambiguities in the original and returns to the "writer" those segments considered incorrect by the machine's internal grammar; only when the text is considered "understandable" by the machine does automated translation take place (Nyberg and Mitamura 1992). However, what takes place first is a fully human intralingual translation, in the sense of R. Jakobson (Jakobson 2000 [1959]): the original is conventionally translated into another, simplified "original." Automated translation becomes more of a by-product than a real translation.
But with science-fiction, high-brow automated translation projects based on the qualitative structural processing of sentences more or less at a halt, down-to-earth translation professionals started to benefit from the advantages of computers in a fashion that tends instead towards the quantitative and statistical usage of information, words and sentences. Computer-Assisted Translation (CAT) is "the broadest term used to describe an area of computer technology applications that automates or assists the act of translating text from one language to another" (SDL International). The list of contributions of computer technologies that conform to this definition is not short: word processors, electronic dictionaries, terminological data banks, BBS and discussion groups, optical character recognition, spell and grammar check, e-mail, WWW documentation, desktop publishing, speech recognition, specific localization tools, translation memories, etc.

From MT to TM

I intend here to speculate about the pendulum-like movement that may articulate the relationship between translation memories and machine translation, which goes beyond a simple swap of capital initials--from MT to TM--although it may very well have to do with the swapping of full, translated sentences.

Translation memories (TM) may be defined as a set of software applications devised to help translators in their activity by retrieving already translated terms or segments and recycling them, or by building up tentative translations from previously translated segments that share common traits. Those perfectly duplicable segments are called "perfect matches." Those tentative translations generated from analogous segments are called "fuzzy matches." Leaving aside the particular mechanics of different software, there are more than a dozen different translation memories on the market, ranging in price from the twenty-dollar amateurish "Alair II" to the highly professional, corporate and expensive 5,000-dollar "Alchemy Catalyst," or other suites like "Trados," a de facto standard, or its competitors "Deja Vu," "SDLX" or "Transit."

Translation memories are optimal tools for highly repetitive texts that belong to a larger corpus of specialized texts to be translated, present a wide specialized terminology pool and belong to multilingual localization projects. They help to guarantee a high degree of terminological consistency, ease massive revision processes, speed up productivity in large localization projects and efficiently accumulate topic-related formulaic expressions. However, it is easy to anticipate that they do not deal well with "stylistically rich" originals and that they impose a segment-restricted view instead of a whole-text approach. The so-called "perfect matches" may induce disastrous context-related misinterpretations. Furthermore, there has been a traditional problem of low compatibility between different TM software and, in most cases, they involve an expensive investment for translators who may need to face overly diverse customer requirements.

Let us focus on the last two problems: how can the information contained in a translation memory be shared between users of different software? Why can that be useful and when could it be desirable? Large localization projects are often undertaken by teams of translators who are required to use the same software. Their already translated segments are uploaded into a common repository that subsequently provides possible perfect or fuzzy matches not only to the one translator who uploaded them, but also to the other members of the translation team.
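The distinction between perfect and fuzzy matches drawn above can be illustrated with a minimal Python sketch. It is not how any of the commercial products named here works internally; the function name `best_match`, the 0.75 threshold, the sample sentences and the use of the standard library's `difflib` similarity ratio are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.75):
    """Look up `segment` in a toy translation memory.

    `memory` maps already translated source segments to their translations.
    Returns (kind, source, target, score), where kind is "perfect" or
    "fuzzy", or None when nothing reaches the (assumed) threshold.
    """
    best = None
    for source, target in memory.items():
        # Character-level similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[3]:
            kind = "perfect" if segment == source else "fuzzy"
            best = (kind, source, target, score)
    if best is None or best[3] < threshold:
        return None
    return best


# Hypothetical memory with two aligned English-Spanish segments.
memory = {
    "Press the red key.": "Pulse la tecla roja.",
    "Close the file.": "Cierre el archivo.",
}

print(best_match("Press the red key.", memory))    # an exact, recyclable segment
print(best_match("Press the green key.", memory))  # a similar segment to be revised
```

A fuzzy match returns the translation of the nearest stored segment, which the translator must still adapt; here the retrieved Spanish sentence would still say "roja."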
The advantages of sharing one's work with other project partners are clear and appealing: the commonly developed repository of paired sentences grows and, with it, the overall amount of translated text that can be recycled. There are, however, a few serious drawbacks in the current practice of translation memory sharing. Translators may have to put up with solutions they have not agreed to, revisions and eventual changes affect other translators' work, the search for consensus tends to slow down the process, and there is a higher workload for early starters, while more recycled segments are available for late participants. Finally, all translators must use the same software and, to some extent, the same versions. Thus, a professional may end up being excluded from a project because it may not be worthwhile for him or her to invest in new software needed exclusively for one commission, simply because of the specific standard a translation agency works with. Even if it is considered a long-term investment, by the time he or she needs the same software for a new project, new incompatible versions of the program may already have been released.

The issue also affects pedagogy. While integrated translation curricula usually include at least either a core or an elective course on CAT, instructors often face the dilemma of either training students in the usage of one specific translation memory software, or introducing the widest possible variety of different programs at the expense of not being able to deepen the mastery of any one of these powerful tools.
The first option--focusing on one particular standard--risks overspecialization and usually involves the budgetary distress of acquiring educational licences, which are not inexpensive and are somewhat difficult to justify as a structural expense for courses that, on the other hand, tend to have rather modest enrollment numbers. The second possibility--using free demo or lite versions with limited performance--poses a danger of shallowness in the final degree of command attained for each tool, and is almost inevitably doomed to fall into the teaching of unexciting repetitive patterns of innumerable "steps"--procedurally diverging but conceptually identical--that must be taken for every different program in order to "just start" translating. For example, while every program allows manual text alignment at some stage, the mechanics provided by the diverse interfaces make such elementary operations rather disparate from case to case and thus potentially disconcerting for the student [2]. By sheer dispersion of methods and repetition of similar outcomes, the process of learning becomes dull and unappealing.

Among the above problems, some are strictly workflow related--which will not be discussed here--and some others are good old translation problems. Finally, for the problems related to software standards--those preventing users from cooperating in projects because of the particular program they are using, and those involving pedagogical decisions based more on budgetary than on truly didactic concerns--TMX provides a general solution that is becoming increasingly accepted and integrated by software makers.
TMX

Translation Memory eXchange language (TMX) is an SGML/XML-based markup language, which allows for a fairly easy and compatible Internet implementation. It is a standard established by LISA (Localization Industry Standards Association--www.lisa.org--) that is being increasingly integrated by translation memory makers within the export/import capabilities of their latest versions. There are several levels of compliance with the TMX norm, ranked 1 to 3, with 1 being the maximum possible level of compatibility. The level is defined by the amount of metadata, aside from purely textual information, which the system is able to convert into TMX--i.e., not just aligned translations and their language-pair identification but also information on format, topic, domain, etc. It becomes a powerful exchange tool when combined with TBX (TermBase eXchange Language), which is its counterpart for exchanging terminological database contents. Ultimately, by using TMX, translators would not have to use the same TM software in order to co-participate in the same localization project. Essentially, TMX works as a text-only mark-up language into which aligned text--the original and its translation(s)--is exported from a translation memory.
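As an illustration of what such an export might look like, the following Python sketch writes one aligned segment pair to a TMX-like document and reads it back using only the standard library. The element and attribute names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`, `xml:lang`) follow the TMX 1.4 format as commonly documented; the function names, the header values such as `sketch-exporter`, and the sample sentences are hypothetical, and the output has not been validated against the official DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the reserved xml:lang attribute with its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, srclang="en"):
    """Serialize a list of {language code: segment text} dicts as TMX text."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch-exporter",   # hypothetical tool name
        "creationtoolversion": "0.1",        # hypothetical version
        "segtype": "sentence",
        "o-tmf": "sketch",                   # placeholder for the original TM format
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu")       # one translation unit per aligned segment
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def read_tmx(document):
    """Parse TMX text back into a list of {language code: segment text} dicts."""
    root = ET.fromstring(document)
    return [
        {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        for tu in root.iter("tu")
    ]


units = [{"en": "Close the file.", "es": "Cierre el archivo."}]
document = build_tmx(units)
print(document)
print(read_tmx(document) == units)
```

Because the file is plain tagged text, any program that can parse these few elements can recover the aligned segments, whatever tool produced them.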
No matter which TM software is being used, as long as it furnishes TMX import/export capabilities, the resulting tagged text-only file can be "read" by any other TM software that offers the same capabilities, no matter what particular internal encoding system it uses to store the information.

New Paradigms in File Sharing

This is a--very--general picture of how far things have evolved. How much further they can go is still an open question, but here follows a speculation on the potential of TMX when combined with currently existing possibilities and software already running on the Internet. What will be said from now on, however speculative, is not simple science fiction and, should technical and human means be provided, an interesting field of theoretical research and practical application may unfold before us.

The new paradigms in Internet file sharing must be considered here. In the late 1990s a new way of sharing information and files shook the music industry and pushed it to the brink of bankruptcy in some cases. Programs like Napster, Gnutella, Kazaa and others allow users to share their files--including music--and to exchange them freely. Several national branches of large music companies were forced to close or to deeply restructure their business philosophies because of the economic damage inflicted by peer-to-peer Internet music sharing. As a result, a United States federal court ruling in 2001 shut down Napster's service and all its activities. This was one of the most widely echoed direct interventions of the authorities in the actual practices that take place on the Internet.
But it is not music, or even the major financial consequences, that is of interest with regard to translation memories: what matters here is the fact that a network of independent users may share their files so easily.

Basically, a program like Napster works as follows: a user places a series of music files on his or her computer within a special "share" folder. The program sends the list of filenames (song titles) to the server, which indexes it. Then the user sends a query about any song he or she may be interested in. Since many other users of the same software have sent their shareable filenames to the server, the server locates the requested song title in its indexed directory and tells the first user on which other computer the song is stored. Then both users' computers connect directly to one another and file transmission takes place on a one-to-one basis. The bulk of the data (the comparatively huge music file) is only transmitted in the final stage. All that happens before that is just listings of short textual units (song titles) going to and fro.

The Gnutella system works in a slightly different way: it is more of a "word-of-mouth" system--if such a bodily metaphor can be used when talking about computers--which is consequently slower but requires no central server. One user launches a request, which is directed to only two computers, the "closest ones" in the network of Gnutella users. The odds are that those particular computers are not able to satisfy the request for that particular filename, so the next thing they do is to re-launch the same query to the next two computers.
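The two lookup styles described above can be contrasted in a short Python sketch. It is a simplified model, not the actual Napster or Gnutella protocol: the class `CentralIndex`, the function `computers_reached` and the peer names are hypothetical, and the relay count assumes that every computer forwards the query to exactly two computers that have not received it before.

```python
from collections import defaultdict


class CentralIndex:
    """Napster-style server: it stores only titles and who hosts them."""

    def __init__(self):
        self._hosts_by_title = defaultdict(set)

    def register(self, host, titles):
        """A client announces the titles found in its share folder."""
        for title in titles:
            self._hosts_by_title[title.lower()].add(host)

    def locate(self, title):
        """Return the hosts holding `title`; the file itself is fetched peer to peer."""
        return sorted(self._hosts_by_title.get(title.lower(), ()))


def computers_reached(relays, fanout=2):
    """Gnutella-style flooding: total computers that have received a query
    after `relays` rounds, assuming no computer is ever reached twice."""
    return sum(fanout ** step for step in range(1, relays + 1))


index = CentralIndex()
index.register("peer-a", ["Song One"])
index.register("peer-b", ["song one", "Song Two"])
print(index.locate("SONG ONE"))   # hosts to contact directly
print(computers_reached(20))      # reach of a query without any central server
```

Under the stated assumption the reach doubles with every relay, so the total after twenty relays is 2 + 4 + ... + 2^20 = 2,097,150 computers. In a real network many relays would hit computers already reached, so this figure is an upper bound.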
After twenty such relays, more than one million computers will have received the request. Once the requested file is located, a response stating where the host is travels back along the chain and, finally, the requester and the provider get in touch directly, without a "middleman" this time, and the file is transmitted, again on a one-to-one basis.

Peer-to-peer TMX

The question arising from this seems both obvious and compelling: can a peer-to-peer exchange system be developed for translation memory sharing? Using TMX as a unifying standard would provide common ground for the exchange. Once a translation project is finished, translators usually return their final version to their client, while they usually keep the resulting translation memory as a by-product of their work. A program would convert the contents of those memories into TMX-tagged multilingual text and an "exchanger" would expose the memories to the World Wide Web by placing them into a share area open to public access. The repetition of this action by several users would create a dense and sprawling network of interconnected computers, as happens with Napster and Gnutella, which could potentially become the largest pool of aligned text (translations and originals) ever. Whenever a translation project starts, users would connect to the network and their "memory exchanger" would launch queries for similar segments to the bulk of participants. Slowly, in a way similar to that of the basic translation memories themselves, pre-translated replies would travel back to the requester, some in the form of perfect matches, most of them in the form of fuzzy matches.
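The "memory exchanger" imagined above does not exist as described, so the following Python sketch is purely speculative. It simulates a handful of peers in a single process, each exposing a small memory of aligned segments, and collects the best reply for every segment of a new text. The names `Peer` and `pretranslate`, the 0.7 threshold, the sample sentences and the `difflib` similarity measure are all assumptions; no networking, TMX parsing or trust handling is modelled.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Peer:
    """A participant sharing a memory of source -> target segments."""
    name: str
    memory: dict

    def answer(self, segment, threshold):
        """Return (score, peer name, source, target) for every stored
        segment whose similarity to `segment` reaches the threshold."""
        replies = []
        for source, target in self.memory.items():
            score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
            if score >= threshold:
                replies.append((score, self.name, source, target))
        return replies


def pretranslate(segments, peers, threshold=0.7):
    """Query every peer for every segment and keep the best-scoring reply.

    Returns {segment: reply or None}; a score of 1.0 is a perfect match,
    anything lower is a fuzzy match that still needs revision.
    """
    draft = {}
    for segment in segments:
        replies = [r for peer in peers for r in peer.answer(segment, threshold)]
        draft[segment] = max(replies, key=lambda r: r[0]) if replies else None
    return draft


peers = [
    Peer("peer-a", {"Close the file.": "Cierre el archivo."}),
    Peer("peer-b", {"Save the file.": "Guarde el archivo."}),
]
text = ["Close the file.", "Open the file.", "Unrelated sentence about weather"]
for segment, reply in pretranslate(text, peers).items():
    print(segment, "->", reply)
```

Each reply keeps the name of the peer that supplied it, which is the minimum a translator would need in order to judge how far an anonymous suggestion can be trusted.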
Such an exchange would result in a pre-translated draft, whose production could perhaps require the computer to be left working overnight (depending on factors such as length, the actual degree of matches found, the level of requirements set by the user, etc.).

There are of course many questions arising from this, most of them far beyond the scope of this paper, and not a few immediate drawbacks. To start with, all the previous drawbacks of conventional, non-peer-to-peer sharing would still be there, unsolved. There would also be higher risks of potentially wrong translations from anonymous partners: sharper scrutiny of received equivalences would be needed, thus making the revision process even more demanding. The obviously wider range of topics would add to the confusion, and metadata describing the thematic classification of segments would be indispensable in order for the machine to "trust" one potential translation or the other. Bandwidth requirements would be unknown. There would also be legal and copyright issues concerning translated text--as such--versus equivalent segments, whose ownership is determined differently depending on national legislation.

Final remarks

With technical, legal and translational problems ahead, the possibility of implementing some peer-to-peer device for translation memory sharing appears, however, both as a challenging enterprise and as a promising area of research, aimed at improving the actual performance of translation memories not only in small and large scale professional activities, but also in the classroom. Choosing a particular translation memory software will be conditioned by procedural preferences both in the professional and the educational fields.
Professional translators will thus be able to participate in multi-person translation projects regardless of the software used by the other project members. On the other hand, assured that no matter what program students are trained to use, data will be easily transferred to other systems, instructors would be free to choose one software or the other according to their instructional criteria, without the need to submit to the economic demands of expensive de facto standards.

By using an encoding standard--like TMX--that bridges the singularities of different software and eases the budgetary stress of acquiring expensive packages, in combination with the already demonstrated file exchange potential of peer-to-peer arrays--like Napster, Gnutella and others--it could be possible not only to greatly reduce the current barriers to the exchange of aligned segments of translated text--i.e., the exchange of already existing "human," accurate and functional translations--but also to open the door to a potentially infinite common repository of humanly validated translations to be scanned globally and proposed locally by the translation software as "first drafts."

Beyond its immediate technical feasibility, this line of work could help to further optimize the man-machine tandem towards which the practical application of computers to translation seems to be directed. As one good old friend of mine said, in reference to the criticism that "intuition" is the indispensable ingredient for translation that all computers lack, "machines don't have intuition, but they have memories" (Fustegueres 2001, my translation). I would add that maybe, by helping them share their memories, they can help us back in translating.

References

Abaitua, Joseba. "TMX format," 1998, http://paginaspersonales.deusto.es/abaitua/konzeptu/ta/tmx.htm

Brain, Marshall. "How Gnutella Works," How Stuff Works, http://computer.howstuffworks.com/file-sharing3.htm

Couturat, Louis. "The Universal Language," The Logic of Leibniz in Accordance with Unpublished Documents, in Rutherford, Donald and Monroe, Timothy (translators), 1997 [1901], http://philosophy2.ucsd.edu/~rutherford/Leibniz/ch3.htm

Davis, Paul C. Stone Soup

Freigang, Karl Heinz. "Automation of Translation: Past, Presence, and Future," Revista Tradumatica, No. 0 (2001), http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm

Fustegueres, Silvia. "Qui te por de les memories de traduccio?" Revista Tradumatica, No. 0 (2001).

Gow, Francie. Metrics for Evaluating Translation Memory Software. Unpublished thesis. University of Ottawa.

Hutchins, John. "The precursors and the pioneers," Machine Translation, past, present and future. New York: Halsted, 1986.

Hutchins, John. "ALPAC: the (in)famous report," MT News International, Vol. 14, June 1996, pp. 9-12. Reprinted in: Readings in Machine Translation, ed. Sergei Nirenburg, Harold Somers, and Yorick Wilks (Cambridge, Mass.: The MIT Press, 2003), pp. 131-135. Also available at: http://ourworld.compuserve.com/homepages/WJHutchins/Alpac.htm

Jakobson, Roman. "On Linguistic Aspects of Translation." In Baker, M., and Venuti, L., eds. The Translation Studies Reader, 113-118. London and New York: Routledge, 2000.

Nyberg, Eric; Mitamura, Teruko. "The Kant system: fast, accurate, high-quality translation in practical domains," Proceedings of Coling 92, Nantes, 1992.

Sanchez-Gijon, Pilar. "Cataleg de sistemes de memories de traduccio," Revista Tradumatica, No. 0 (2001).

SDL International. An Introduction to Computer Aided-Translation, http://www.sdl.com/products and http://tc.eserver.org/18490.html

Several authors. "CAT fight," Proz, The Translators Workplace, http://www.proz.com/?sp=cat/compare

Zerfass, Angelika. "Evaluating Translation Memory Systems," First International Workshop on Language Resources for Translation Work and Research, Gran Canaria, 2002.

End notes

[1] See Hutchins (Hutchins 1996) for an enlightening description of the most frequent misinterpretations and misleading circumstances related to the ALPAC report.

[2] Aligned texts are the main asset of a translation memory, and many resources are usually devoted by companies and institutions to aligning texts that had been translated before the implementation of TM software, in order to enhance the production of subsequent translation activity.

Jose Davila-Montes, University of Texas at Brownsville

Davila-Montes is Assistant Professor of Translation and Interpreting in the Department of Modern Languages.