San José State University
Strictly Computational Methods
Linguistically, the proper way to translate a sentence in language A into a sentence in language B would be to first parse the language A sentence, which in schools is called diagramming the sentence. The parsed sentence is then translated into a parsed sentence in language B. Converting the parsed version into a spoken or written sentence may involve rearranging the word order. For example, in English an adjective comes before the noun it modifies, but in Spanish it usually comes after.
There is a role for straight computation. The computer can look for idiomatic expressions that defy normal translation.
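As a rough illustration of this kind of lookup, here is a minimal Python sketch. The idiom table, the function name `find_idioms`, and the English-to-Spanish pairs are all assumptions chosen for the example; a real system would need a far larger table and more careful matching.

```python
import re

# Hypothetical idiom table (illustrative English -> Spanish pairs only).
IDIOMS = {
    "kick the bucket": "estirar la pata",
    "piece of cake": "pan comido",
}

def find_idioms(sentence, idioms=IDIOMS):
    """Return (idiom, translation) pairs for each known idiom in the sentence."""
    # Lowercase and drop punctuation so "cake." still matches "cake".
    words = re.findall(r"[a-z']+", sentence.lower())
    padded = " " + " ".join(words) + " "
    # Pad with spaces so only whole-word matches count.
    return [(idiom, target) for idiom, target in idioms.items()
            if f" {idiom} " in padded]

print(find_idioms("The exam was a piece of cake."))
```

Phrases flagged this way could be translated as a unit rather than word by word.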
The strictly computational approach to translation goes far beyond this. One ploy would be to take a large body of text containing the same statements expressed in the two languages of interest. Such text contains only a limited number of different structures. For example, a Japanese group did a study of the titles of novels and concluded that there are only 18 basic structures.
Since the advent of the internet there have been billions of words of text in computer-readable form. In the 1970s that was not the case, and it was especially rare to find such material with statements made in two or more languages. But researchers were diligent and resourceful.
IBM had one team pursuing computer translation from a solidly linguistic approach, but the company also had another team doing strictly statistical analyses of text. That team was computing the probability of each word occurring as a function of the sequence of words preceding it. They first analyzed the text of children's books. This gave them a body of text with a severely limited vocabulary. Then they got 60 million words of machine-readable text from the American Printing House for the Blind.
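The idea of estimating a word's probability from the words preceding it can be sketched with the simplest case, where only the single preceding word is used (a bigram model). This is a minimal Python illustration, not the IBM team's actual method; the function names and the tiny three-sentence corpus are assumptions made up for the example.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is an assumed start-of-sentence marker.
        words = ["<s>"] + sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def next_word_probability(counts, prev, word):
    """Estimate P(word | prev) as a relative frequency."""
    followers = counts.get(prev)
    if not followers:
        return 0.0  # the preceding word was never seen
    return followers[word] / sum(followers.values())

# Toy corpus with a severely limited vocabulary (made up for illustration).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]
model = train_bigram_model(corpus)
print(next_word_probability(model, "the", "cat"))  # 2 of 6 uses of "the"
```

Conditioning on longer sequences of preceding words works the same way, but the counts become sparse quickly, which is one reason large bodies of machine-readable text mattered.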
In the late 1980s the researchers got a great bonanza in the form of 100 million words of text from the Canadian Parliament. They had the text of millions of the same sentences in English and French. Google later obtained 200 billion words in machine-readable form from the United Nations. This had the same statements in a multitude of languages, and it brought Google actively into the field of computer translation. The company already had an interest in the topic because of its role in searching for and supplying information from the internet.
(To be continued.)
For the earlier history of computer translation see Computer Translation.
HOME PAGE OF Thayer Watkins