Common Vocabulary in the Urdu and Turkish Languages: A Case of Historical Onomasiology

Byline: Maria Isabel Maldonado Garcia and Mustafa Yapici

The current study falls within the disciplines of applied and historical linguistics and, more specifically, historical onomasiology. One hundred and fifty apparent cognates in Turkish and Urdu will be analyzed in order to confirm their origin and to assess their lexical similarity by measuring the distance between their forms. The purpose of this research is to discover whether the similar terms analyzed in this study are cognates or rather loanwords that Turkish and Urdu have borrowed from other languages. To the same end, their degree of similarity will be established through etymological, lexical, and semantic comparative analysis.

Turkish Language

Turkish is a member of the Altaic family of languages. According to Ethnologue, its classification is Altaic, Turkic, Southern, Turkish, and it is spoken by approximately 50 million people, not only in Turkey, where it is the national language, but also in Greece and the former Yugoslav Republic of Macedonia.1

During the Ottoman Empire, which lasted more than 600 years, Turkish flourished as the language of the administration and received loanwords from Arabic and Persian. The language was then called Ottoman Turkish, which was in fact a formal version of the everyday spoken Turkish of the time. Among Atatürk's reforms was a new Turkish alphabet. However, as Lewis notes, the aim of the Turkish language reform went further: it was "to eliminate the Arabic and Persian grammatical features and the many thousands of Arabic and Persian borrowings that had long been a part of the language" (1999: 2)3. The borrowing of Persian words was undoubtedly of considerable importance. Arabic, on the other hand, penetrated even more deeply, for two main reasons: first, a religious one, since Arabic is the language of the Quran; and second, because Persian had itself borrowed innumerable terms from Arabic, and when an Arabic term was borrowed, it brought with it the whole family of words related to it (1999: 6)4.

The new Turkish proposed by Atatürk aimed to eliminate Persian and Arabic borrowings from the language, but steps in this direction had been taken decades earlier. Suleyman Pasha was the first to publish a grammar of Turkish, titled Ilm-i Sarf-i Türki, in 1874. By virtue of the Constitution of 1876, the language also ceased to be called Ottoman and was given official status as Turkish. The first dictionary of the Turkish language, the two-volume Kamus-i Türki (1900), was written by Semseddin Sami as an attempt to rid Turkish of its Arabic and Persian elements.

Turkish had been written in a Perso-Arabic script for about a thousand years. Semseddin Sami and his brother Abdul Bey created an alphabet of 36 letters drawn from Latin and Greek. Later, an Alphabet Commission was created, which took several steps to rid the Turkish language of all remains of Persian and Arabic by changing the pronunciation of the Arabic sounds. Another step taken by Atatürk was to propose Turkish substitutes for Arabic and Persian terms in the newspapers and let the public comment on them. A Language Society was also created in 1931, in accordance with Atatürk's wishes. Its purpose, however, was not only to rid Turkish of the Persian and Arabic influence it had enjoyed until then; the linguistic reform was also part of a broader modernization effort.

Part of the reasoning behind this effort was that future generations, having learned only the new script, would be unable to read the old one, while the works published in the new script would be under government control. At the same time, since the old script would no longer be read, the old Islamic works in Arabic and Persian would progressively lose their influence, bringing about the modernization Atatürk so craved. The Language Society was meant to outlive Atatürk's government, and for this reason it was made a non-governmental entity.

Since the language reform achieved a substantial number of its objectives in a short span of time, it can fairly be called a success. However, a large number of Arabic and Persian terms remain an essential part of the Turkish language, even though a considerable part of the rich Ottoman linguistic heritage has been lost forever. It is perhaps because of this contradiction that Geoffrey Lewis termed the reform a "catastrophic success" (1999).

Urdu Language

Urdu is a South Asian language of the Indo-European family, belonging to the Indo-Aryan group of its Indo-Iranian branch. Ethnologue classifies it as Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Western Hindi, Hindustani.5 It has 193,238,868 speakers in Pakistan, where it is the official language along with English, although other languages and dialects, such as Punjabi, Sindhi, Saraiki, Pashto, and Balochi, are also spoken in the country.6 Because of geographical proximity, Turkish shares many similar words with Urdu, Chinese, Persian, Arabic, and other languages.

In fact, the evolution of Urdu began with its contact with Persian and Arabic as a result of the invasions of India by Persian and Turkic armies from the 11th century onwards. It continued under the Delhi Sultanate (1206 to 1526) and later under the Mughal Empire (1526 to 1858). As early as 1908, Dowson observed that "Urdu abounds with Arabic derivatives, which have brought with them the grammatical powers of their original language" (1908: 18)7.

Interestingly, the word Urdu itself derives from the Turkish ordu, which means "army". Urdu is written in the Persian alphabet in the Nastaliq style. Both languages have received Persian loanwords owing to the proximity of the countries: Iran shares borders with both Pakistan and Turkey, which facilitated the spread of Persian (Farsi) vocabulary.

The expansion of Islam, together with the proximity of the Arabian Peninsula to both countries, facilitated the spread of Arabic as the language of the Quran and of a new way of life, which brought innumerable words for new realities that could only be named with the original Arabic terms.

As in Turkish, there have been purist attempts to rid Urdu of Persian and Sanskrit loanwords. According to Maldonado (2013)9, the vocabulary of Urdu derives mainly from Persian, Arabic, and Sanskrit, as well as from Prakrit, with minimal influence from other languages.

Cognates or loanwords

The main purpose of this study is to determine whether the sets analyzed are cognates or loanwords. For Whitley, "a given word W from Language X and a word W from Language Y are termed cognates if and only if they have been inherited from the same ancestor language of X and Y", and if their similarity is a coincidence, they are not considered true cognates (2002: 305)10. Holmes and Ramos define cognates as "items of vocabulary in two languages which have the same roots and can be recognized as such" (1993: 88)11.

On the other hand, as Haspelmath and Tadmor put it, "the best-known generalization about lexical borrowing is the constraint that core vocabulary is very rarely (or never) borrowed" (2009: 36)12. Hock and Joseph (1996)13 and Thomason (2001)14 hold the same view. A word can be recognized as a loanword if its form and meaning bear considerable similarity to the form and meaning of a word in another language from which it could plausibly have been received through some historical contact scenario, if descent from a common ancestor can be excluded, and if there is a source word to which it can be traced or linked (Haspelmath and Tadmor 2009)15. Thurgood (1999)16 adds phonemic similarity as a further criterion for recognizing loanwords. Nor does language contact require complete bilingualism or multilingualism; it only requires the use or presence of two languages in the same place at the same time.

For Maldonado (2013)17, one of the relevant aspects of semantic relations or lexical solidarities, apart from etymology, is synonymy, and synonymy will play a vital role in the identification process. We therefore need to establish whether the sets have a common origin and, if they do, whether the words are native to the languages themselves or are in fact borrowings from the same source language. We now proceed to the analysis of the Turkish-Urdu sets of terms.

1. Methodology

The sample for our investigation consists of 150 sets of Turkish and Urdu terms, whose degree of similarity will be assessed with respect to several linguistic aspects. These aspects include:

1. Identification of the Turkish-Urdu Sets.

2. Etymological Aspects. The etymology of each Turkish word will be extracted and compared with that of its Urdu counterpart in order to contrast the origins of both terms and verify whether they are loanwords received directly from the same language, loanwords received through another language, or cognates.

3. Interlingual Synonymy Related Aspects.

3.1 Semantic analysis. Definitions will be compared in order to determine whether the paired terms are synonymous.

3.2 Phonetic analysis. The phonetics of the word pairs will be compared under the following parameters:

a. There is no phonetic difference.

b. The forms differ in one or two sounds, usually at the end.

c. The forms differ in two or more sounds, sometimes in initial position.

d. More than half of the sounds differ.

e. Most of the sounds differ and are arranged differently.

f. The Levenshtein distance will also be used to determine the degree of phonetic similarity between the members of each pair.

2. Results

1. Identification of the Turkish-Urdu Sets.

The 150 sets of terms were identified through our interaction with Pakistani individuals and with students of Turkish. The pairs were selected because they showed semantic and phonetic similarity across the two languages.

The list of terms is as follows:

2. Etymological Aspects:

The origin of the terms will be a determining factor in revealing whether they are cognates or loanwords. For this purpose, an etymological analysis18 has been performed. The results are as follows:

3. Interlingual Synonymy Related Aspects

3.1 Semantic analysis. At this point, the synonymy of the terms is verified:

3.2 Phonetic analysis.

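The phonetic analysis is performed with the Levenshtein algorithm, which measures the difference between two strings as the minimum number of single-character insertions, deletions, and substitutions required to turn one into the other. The following table illustrates the algorithm: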

Table 10 Levenshtein Algorithm19 Equation
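For reference, the standard recursive definition of the Levenshtein distance between two strings a and b is given below; it may differ in notation from Table 10. Here lev_{a,b}(i, j) denotes the distance between the first i characters of a and the first j characters of b:

```latex
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
  \max(i,j) & \text{if } \min(i,j) = 0,\\[4pt]
  \min\begin{cases}
    \operatorname{lev}_{a,b}(i-1,j) + 1 \\
    \operatorname{lev}_{a,b}(i,j-1) + 1 \\
    \operatorname{lev}_{a,b}(i-1,j-1) + \mathbf{1}_{(a_i \neq b_j)}
  \end{cases} & \text{otherwise.}
\end{cases}
```

The three terms correspond to a deletion, an insertion, and a substitution (or match); the indicator is 1 when the i-th character of a differs from the j-th character of b and 0 otherwise, and the distance between the complete strings is lev_{a,b}(|a|, |b|).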

4. Analysis of the Results

1. Etymological Aspects.

The etymological analysis of the 150 terms yielded the following results:

Table 17. Complete Etymology

| Etymology | Number of sets |
|---|---|
| Arabic | 106 |
| Persian | 37 |
| Turkish | 3 |
| Differing etymologies | 4 |

Of the 150 sets of terms, 106 sets have an Arabic etymology, 37 a Persian etymology, and 3 a Turkish etymology, while 4 sets present differing etymologies. One set of Arabic origin reached Turkish and Urdu through Persian (as discussed earlier, Persian had itself received numerous loanwords from Arabic).
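As a quick arithmetic check on Table 17, the short Python sketch below (counts copied from the table; variable names are illustrative) sums the four categories and converts them to percentages. The four categories already total 150, which suggests that the single Arabic-via-Persian set is counted within one of them rather than as a separate category.

```python
# Counts copied from Table 17 (etymological origin of the 150 sets).
etymology_counts = {
    "Arabic": 106,
    "Persian": 37,
    "Turkish": 3,
    "Differing etymologies": 4,
}

etymology_total = sum(etymology_counts.values())
# Share of each category as a percentage of all sets, rounded to one decimal.
etymology_shares = {
    origin: round(100 * count / etymology_total, 1)
    for origin, count in etymology_counts.items()
}

print(etymology_total)  # 150
for origin, share in etymology_shares.items():
    print(f"{origin}: {share}%")
```

On these counts, roughly 71% of the sets are of Arabic origin and about 25% of Persian origin.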

2. Interlingual Synonymy Related Aspects.

2.1 Semantic analysis. Definitions were compared in order to determine whether the members of each set are synonymous. The analysis found synonymy in all 150 sets.

2.2 Phonetic analysis. The writing systems of Turkish and Urdu are completely different: Turkish uses the Latin script, whereas Urdu uses the Perso-Arabic script in the Nastaliq style. For this reason, the pairs share no orthographic characteristics, and their forms were instead compared through the pronunciation of the words in both languages. The string comparison was performed with the Levenshtein algorithm in order to measure how different the members of each set are.

Table 18. Complete Levenshtein Distance (String Similarity) Analysis

| Levenshtein distance | Number of sets |
|---|---|
| 0 | 76 |
| 1 | 46 |
| 2 | 24 |
| 3 | 3 |
| 4 | 1 |
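The distances in Table 18 count single-character edits between the transcribed forms of each pair. A minimal Python sketch of the standard dynamic-programming computation is shown below; it assumes each word has already been transcribed into a comparable romanized phonetic string. The transcription scheme, the function name `levenshtein`, and the example pair are illustrative assumptions, not the authors' actual procedure.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions that turn string a into string b."""
    # previous[j] = distance between the first i-1 characters of a
    # and the first j characters of b (one row of the DP table).
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, free if the characters match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Illustrative romanized forms (an assumption): Turkish "ordu" vs. "urdu".
    print(levenshtein("ordu", "urdu"))  # 1: o -> u
```

Because the algorithm counts characters, the transcription convention (for long vowels, diacritics, or digraphs) directly affects the resulting distance and has to be fixed before the pairs are compared.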


Large proportions of the vocabularies of Urdu and Turkish derive from Arabic and Persian. A sample of shared vocabulary was extracted in order to reveal whether the similar sets were loanwords or cognates. The etymological analysis revealed a common Arabic etymology in 106 of the sets and a common Persian etymology in 37. These terms were borrowed by Turkish and Urdu because of the geopolitical situation of the time. Three of the analyzed sets came into Urdu directly from Turkish. One set of Arabic origin was taken into Turkish and Urdu through Persian. Four sets present differing etymologies in the two languages.

The study has identified the terms as loanwords, or borrowings, mainly because Turkish and Urdu do not belong to the same language family. This fact rules out the possibility of inheritance from a common ancestor language from which both Turkish and Urdu could have derived. The second factor supporting their identification as loanwords is the language contact situation that existed in both countries in the past.

Semantically, the members of every set share a common meaning, which suggests that the terms did not acquire other meanings when they were borrowed and are in fact synonymous.

The phonetic string analysis revealed 76 sets with identical pronunciation in Turkish and Urdu, while 46 sets present only one difference in the string analysis, that is, a minor phonetic difference. Twenty-four sets present two differences, 3 sets present three, and 1 set presents four. In other words, 81% of the sets (122 of 150) present either no difference or only a slight one, indicating little or no evolution in either language after the term was borrowed.
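The 81% figure can be reproduced from Table 18, assuming that "no difference or a slight difference" means a Levenshtein distance of 0 or 1. A short Python sketch of that arithmetic, with counts copied from the table and illustrative variable names:

```python
# Counts copied from Table 18: Levenshtein distance -> number of sets.
ld_counts = {0: 76, 1: 46, 2: 24, 3: 3, 4: 1}

ld_total = sum(ld_counts.values())          # all analyzed sets
close_sets = ld_counts[0] + ld_counts[1]    # identical or one-sound difference
close_share = 100 * close_sets / ld_total   # percentage of all sets

print(ld_total, close_sets, round(close_share, 1))  # 150 122 81.3
```

Including the sets at distance 2 as well would raise the share to 146 of 150, so the 81% figure corresponds to the stricter reading.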

The fact that the terms have survived in the recipient languages after so many years, despite planned attempts to rid the languages of these borrowings, indicates that they have become an integral part of those languages and are still in use today. The identified terms can also assist Turkish students of Urdu and Pakistani students of Turkish with vocabulary acquisition.

Notes and References

1 Lewis, M. Paul (2009). Ethnologue: Languages of the World. Seventeenth edition. Dallas, Tex.: SIL International.

2 Table created with information from: Dryer, Matthew S. and Haspelmath, Martin (eds.) (2013). The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. Available online; accessed on 2014-03-14.

3 Lewis, Geoffrey (1999). The Turkish Language Reform: A Catastrophic Success. New York: Oxford University Press.

4 Lewis, Geoffrey (1999). The Turkish Language Reform: A Catastrophic Success. New York: Oxford University Press.

5 Lewis, M. Paul (2009). Ethnologue: Languages of the World. Seventeenth edition. Dallas, Tex.: SIL International.

6 CIA World Factbook: the-worldfactbook/geos/pk.html

7 Dowson, J. (1908). A Grammar of the Urdu or Hindustani Language. London: Kegan Paul, Trench, Trübner and Co.

8 Maldonado Garcia, Maria Isabel (2013). Comparacion del Léxico Basico del Español, el Inglés y el Urdu. Doctoral thesis. Madrid: UNED.

9 Maldonado Garcia, Maria Isabel (2013). Comparacion del Léxico Basico del Español, el Inglés y el Urdu. Doctoral thesis. Madrid: UNED.

10 Whitley, M. S. (2002). Spanish/English Contrasts: A Course in Spanish Linguistics. Second edition. Washington, D.C.: Georgetown University Press.

11 Holmes, J. and Ramos, R. (1993). False Friends and Reckless Guessers: Observing Cognate Recognition Strategies. In T. Huckin, M. Haynes and J. Coady (eds.), Second Language Reading and Vocabulary Learning. Norwood, NJ: Ablex.

12 Haspelmath, Martin and Tadmor, Uri (2009). Loanwords in the World's Languages: A Comparative Handbook. Berlin: De Gruyter Mouton.

13 Hock, Hans Henrich and Joseph, Brian D. (1996). Language History, Language Change, and Language Relationship. Berlin: Mouton de Gruyter.

14 Thomason, Sarah Grey (2001). Language Contact. Washington, D.C.: Georgetown University Press.

15 Haspelmath, Martin and Tadmor, Uri (2009). Loanwords in the World's Languages: A Comparative Handbook. Berlin: De Gruyter Mouton.

16 Thurgood, Graham (1999). From Ancient Cham to Modern Dialects: Two Thousand Years of Language Contact and Change. Honolulu: University of Hawai'i Press.

17 Maldonado Garcia, Maria I. (2013). Estudio Etimologico de Cuatro Pares de Cognados en Español y Urdu. Revista Iberoamericana de Lingüística, Vol. 8. Valladolid.

18 Etymology of Turkish terms: Nisanyan, Sevan (1995). Türkce Etimolojik S.

Etymology of Urdu terms: urdudictionary/; Urdu Dictionary (2011). Urdu Encyclopedia. Islamabad: Ministry of Science and Technology.

Etymology of Persian terms: Nourai, Ali (2013). An Etymological Dictionary of Persian, English and Other Indo-European Languages. USA: Exlibris Corporation.

19 Levenshtein, V. I. (1966). Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady. Moscow: Akademii Nauk.
