Case frame generation for Sanskrit language: a knowledge representation tool
Introduction
Why consider Sanskrit for knowledge representation? The answer is straightforward. First, it is an order-free language: changing the order of the words causes no ambiguity in the meaning of a sentence. Second, case endings are very efficient for identifying nominals, since analysing the case ending of a word reveals its properties as well as its meaning and role in the sentence. The association of a suffix with a root word follows many rules, which makes semantic analysis specific and concrete. The overall architecture of the system is layered. The first layer is an exhaustive database of suffixes for all categories, i.e. nouns, pronouns, adjectives and verbs. Each suffix is stored with a unique identification number (ID) in which each digit identifies an attribute of the word, such as tense, number or classification. The second layer attaches a part-of-speech (POS) tag to each word, and the third layer analyses the IDs. Using these IDs, the vibhakti-karka mapping and a set of rules, case frames for the sentence are created. Most words in Sanskrit carry a suffix called vibhakti, which maps to karka roles (thematic roles). By identifying the vibhakti, the cases can be determined.
Developing the entire system requires the following components:
Databases: verb suffixes (Dhatupad), noun suffixes (Shabdroop), pronouns and adjectives, each entry having a unique ID number associated with it.
[FIGURE 1 OMITTED]
Algorithm: The algorithm is designed to perform two tasks: (a) to partially parse the sentence, assign a POS tag to each word and extract the ID of each successful match; (b) to map the endings against the vibhakti-karka table and apply rules to generate the case structure.
Database for Suffixes
While processing a Sanskrit sentence, each word is looked up in the database to identify it as a noun, pronoun, verb or adjective. A structured numbering scheme has been introduced for each type of word, as follows:
A. Database scheme for verb suffix:
A five-digit number scheme is used for storing verb suffixes. In Sanskrit grammar, verbs are classified into ten groups called gan, and the gan occupies the most significant digit (MSD) of the number scheme, as shown in Figure 2. When a root word joins with a suffix (pratyaya), some changes take place at the junction. Verbs are classified into gan according to these changes, and the digit takes values in the range 0-9. The second position from the left is occupied by the pad. There are three pad (Atmnepad, Parasmaipad and Ubhaypad), so the values range from 1 to 3. Verbs whose outcome is for another person fall under Parasmaipad, and verbs whose outcome is for oneself fall under Atmnepad. Verbs whose outcome is for both another person and oneself fall under Ubhaypad.
Figure 2: Five-digit number scheme for verb suffix.
Gan | Pad | Tense and mood | Person | Number
Example 2.1:
Sanskrit: Brahmin oodan pachate (Atmnepad)
English: [Brahmin rice cooks] Brahmin cooks rice.
Example 2.2:
Sanskrit: mAtA oodan pachati (Parasmaipad)
English: [Mother rice cooks] Mother cooks rice.
In Example 2.1, the act of cooking is done for oneself, which is indicated by the verb form pachate belonging to Atmnepad. In Example 2.2, the cooking done by the mother is also for other people in the house, which is indicated by the verb form pachati belonging to Parasmaipad.
The next position in the number scheme is taken by the tense. The tenses indicate the time and mood of an action. There are ten tenses (lakAr) in Sanskrit grammar: latlakAr (present tense), lotlakAr (imperative mood), laDn (imperfect), vidhilaDn (potential mood), lRt (first future tense), lut (periphrastic future), AShIliRDn (benedictive), lRDn (conditional, or second, future), lit (perfect) and luDn (aorist). This digit takes values in the range 0-9. The second-to-last position represents the person (first, second and third, i.e. uttam, madhyam and pratham puruSh), and the last digit represents the number (singular, dual and plural, i.e. ekvachan, dwivachan and bahuvachan). The person and number digits each take values in the range 1-3.
Example 3: suffix ti of bhavati is represented as 01011 which identifies it as
0-latlakAr Present tense
1-pratham puruSh Third Person
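The five-digit scheme can be illustrated with a small decoder. This is only a sketch under stated assumptions: the function name `decode_verb_id` is hypothetical, the tense digits are assumed to follow the order in which the ten lakAr are listed above (only 0 = latlakAr is confirmed by Example 3), and the person and number digits are assumed to run 1 = pratham, 2 = madhyam, 3 = uttam and 1 = ekvachan, 2 = dwivachan, 3 = bahuvachan. The gan and pad digits are returned unchanged because the text does not give their value-to-name mapping.

```python
# Sketch of a decoder for the five-digit verb-suffix ID:
# gan | pad | tense and mood | person | number.
# Digit-to-name orderings below are assumptions, except where noted.

LAKARS = [  # assumed: digit 0-9 follows the order in which the text lists them
    "latlakAr", "lotlakAr", "laDn", "vidhilaDn", "lRt",
    "lut", "AShIliRDn", "lRDn", "lit", "luDn",
]
PERSONS = {1: "pratham puruSh", 2: "madhyam puruSh", 3: "uttam puruSh"}  # 1 is from Example 3
NUMBERS = {1: "ekvachan", 2: "dwivachan", 3: "bahuvachan"}  # assumed order


def decode_verb_id(verb_id: str) -> dict:
    """Split a five-digit verb-suffix ID into its named attributes."""
    if len(verb_id) != 5 or not verb_id.isdigit():
        raise ValueError("verb ID must be exactly five digits")
    gan, pad, tense, person, number = (int(d) for d in verb_id)
    if person not in PERSONS or number not in NUMBERS:
        raise ValueError("person and number digits must be in the range 1-3")
    return {
        "gan": gan,              # raw digit; names not given in the text
        "pad": pad,              # raw digit; value order not given in the text
        "tense": LAKARS[tense],
        "person": PERSONS[person],
        "number": NUMBERS[number],
    }


print(decode_verb_id("01011"))
```

Keeping the ID as a string rather than an integer preserves a leading zero, which matters here because the gan digit of the example ID is 0.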
B. Database scheme for noun suffix
A four-digit number scheme is planned for nouns, whose declensions are called shabdroop in Sanskrit. All nouns are classified by the ending of the word, such as a, A, i, I or RRi (this takes the value of the variable x in Figure 3), and words with similar endings are declined in the same fashion. For each type of ending, a word may belong to one or more of the masculine, feminine or neuter genders. Each noun, pronoun and adjective has seven cases (vibhakti), and each vibhakti has three numbers. The number scheme is therefore as given in Figure 3.
Figure 3: Four-digit number scheme for noun suffix.
x ending | Gender | Vibhakti | Number
Each vibhakti is associated with a case, so identifying the vibhakti leads to identification of the case, thereby helping in the development of the case structure. The meaning of each vibhakti and its mapping to cases (karka) are given in Table 1.
This table is used to develop the case frames that identify the relation of each word in the sentence. If the suffix of a word maps it to prathma vibhakti, then that word plays the role of agent with respect to the action (verb) in the sentence.
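The lookup from a noun ID to its case can be sketched as follows. The table contents come from Table 1, while the function name `case_of_noun_id` and the sample ID `"1121"` are hypothetical and used only for illustration. The vibhakti is taken to be the second digit from the right, as in the four-digit noun scheme.

```python
# Sketch: map the vibhakti digit of a four-digit noun ID
# (x ending | gender | vibhakti | number) to its karka, following Table 1.

VIBHAKTI_TO_KARKA = {
    1: ("prathma", "kartA", "nominative"),
    2: ("dviteeyA", "karma", "accusative"),
    3: ("triteeyA", "karaNa", "instrumental"),
    4: ("chaturthi", "sampradAn", "dative"),
    5: ("panchami", "apAdAn", "ablative"),
    6: ("shashti", "sambandh", "genitive"),
    7: ("saptami", "adhikaran", "locative"),
    8: ("sambodhan", "sambodhan", "vocative"),
}


def case_of_noun_id(noun_id: str) -> tuple:
    """Return (vibhakti, karka, English case name) for a four-digit noun ID."""
    if len(noun_id) != 4 or not noun_id.isdigit():
        raise ValueError("noun ID must be exactly four digits")
    vibhakti_digit = int(noun_id[-2])  # second digit from the right
    if vibhakti_digit not in VIBHAKTI_TO_KARKA:
        raise ValueError("vibhakti digit must be in the range 1-8")
    return VIBHAKTI_TO_KARKA[vibhakti_digit]


# "1121" is a made-up ID whose vibhakti digit is 2.
print(case_of_noun_id("1121"))
```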
C. Database scheme for pronoun
As the pronouns are few in number (nearly 31), the complete pronoun words with their inflections are stored in the system instead of maintaining a suffix database. The number scheme for pronouns is as follows:
Figure 4: Four-digit number scheme for pronoun.
Pnumber | Gender | Vibhakti | Number
Here pnumber identifies the particular pronoun that is referred to. For example, 1 is given to tad, meaning 'that' in English, and its masculine, feminine and neuter genders are identified by the next position in the number scheme. All the nouns given in the book by Dr. Dwivedi have been considered, giving nearly 400 entries in the database.
Algorithm for Generating Case Frames
Knowledge representation tools help in identifying the semantics of a sentence. Semantic nets, frames and scripts are used to represent the meaning of a sentence in an input language. Of these, conceptual dependency performs case analysis by maintaining structures with respect to rules. This case grammar is used to detect the agent, object and recipient in the sentence, which are obtained by analysing the words with respect to the grammar of the language. In Sanskrit, Panini, an ancient grammarian, gave six cases that can be identified by the suffix attached to the root word. R. Briggs first stated that such identification is possible and can be achieved easily in Sanskrit. We exploit this feature to develop case frames. In knowledge representation, frames are structures of slots and their values that are later used for inferencing within the system. Here frames are used to store the cases of each sentence. These case frames can later be used for interrelating semantics across sentences. The algorithm for case frame generation is as follows:
Step 1: Input sentence S.
Step 2: For each word Wi ∈ S
(a) Match Wi in the pronoun database
If match found
(i) Set Wi.cat = pronoun
(ii) Get the pronoun ID number of Wi from the database
(iii) Extract the 2nd LSB (second digit from the right) from that number
(iv) Identify the vibhakti using that digit
(v) Set the case value in the frame
Else display "match for pronoun not found"
(b) Match Wi against the suffixes in the noun database
If match found
(i) Set Wi.cat = noun
(ii) Get the noun ID number from the database
(iii) Extract the 2nd LSB to identify the case
(iv) Identify the vibhakti using that digit
(v) Set the case value in the frame
Else display "Wi is not noun"
(c) Match Wi against the suffixes in the verb database
If match found, set Wi.cat = verb
Else display "Wi is not verb"
Step 3: Return the case frame.
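The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the MATLAB implementation: the three small dictionaries stand in for the real databases, all IDs in them except the verb ID 01011 from Example 3 are made up, and the names `longest_suffix_id` and `case_frame` are hypothetical. The check order (pronoun, noun, verb) follows the algorithm.

```python
# Sketch of the case frame algorithm with toy stand-ins for the databases.

PRONOUN_DB = {"saH": "1111"}                                 # full pronoun forms -> ID (made up)
NOUN_SUFFIX_DB = {"aH": "1111", "A": "2211", "Am": "2221"}   # noun suffix -> ID (made up)
VERB_SUFFIX_DB = {"ti": "01011"}                             # verb suffix -> ID (Example 3)

# Role names as listed for <case>; keyed by the vibhakti digit.
ROLE_BY_VIBHAKTI = {1: "agent", 2: "object", 3: "instrument",
                    4: "for", 5: "from", 6: "of", 7: "location"}


def longest_suffix_id(word, suffix_db):
    """Return the ID of the longest suffix in suffix_db that ends word, or None."""
    for suffix in sorted(suffix_db, key=len, reverse=True):
        if word.endswith(suffix):
            return suffix_db[suffix]
    return None


def case_frame(sentence):
    """Build a {role: word} frame, checking pronoun, then noun, then verb."""
    frame = {}
    for word in sentence.split():
        nominal_id = PRONOUN_DB.get(word) or longest_suffix_id(word, NOUN_SUFFIX_DB)
        if nominal_id is not None:
            vibhakti = int(nominal_id[-2])       # 2nd LSB of the four-digit ID
            frame[ROLE_BY_VIBHAKTI[vibhakti]] = word
        elif longest_suffix_id(word, VERB_SUFFIX_DB) is not None:
            frame["action"] = word
    return frame


print(case_frame("geetA pathshAlAm gachchati"))
```

The frame is kept as a plain dictionary of slots and values, which matches the description of frames as slot-value structures; a later word with the same role overwrites an earlier one, so a fuller version would need a list per slot.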
Here <case> takes values such as agent, object, of, from, for, location and instrument, as given by Panini. The system works well with simple, indirect sentences. A few examples of case frame generation, one for each type of case, are given below:
Example 3.1: Demonstrating nominative case (kartA kAraka, agent, and prathma vibhakti)
Sanskrit: ramaH gachchati
English: Ram goes.
Example 3.2: Demonstrating accusative case (karma kAraka, object, and dviteeyA vibhakti)
Sanskrit: geetA pathshAlAm gachchati
English: Geeta goes to school.
Example 3.3: Demonstrating instrumental case (karan kAraka, instrument, and tritIyA vibhakti)
Sanskrit: tvam kalamen likhasi
English: You write with a pen.
Example 3.4: Demonstrating dative case (sampradAn kAraka, for, and chaturthi vibhakti)
Sanskrit: saH rAmAya kathati
English: He tells Ram.
Example 3.5: Demonstrating ablative case (apAdAn kAraka, from, and panchami vibhakti)
Sanskrit: VrikshAt phalam patati.
English: Fruits fall from tree.
Example 3.6: Demonstrating genitive case (sambandh kAraka, of, and Shashti vibhakti)
Sanskrit: idam sudipasya pustakam asti.
English: This is Sudeep's book.
Example 3.7: Demonstrating locative case (adhikaran kAraka, location, and saptami vibhakti)
Sanskrit: taDAge kamalAni vikasanti
English: Lotuses bloom in the pond.
action : vikasanti
The algorithm is able to identify all types of cases in simple sentences. A number of sentences were tested, and it was found that out of every 10 sentences, 70% of the samples were analysed with 100% accuracy and 20% of the samples showed 60-80% accuracy in the identification of cases. However, a few problems were encountered, as follows:
A. Multiple matches in different databases
A word Wi in sentence S is matched against the pronoun database and then against the suffix databases of nouns (NDB) and verbs (VDB). There are cases where Wi maps to both groups. For example, in chAtrAH,
AH ∈ NDB and H ∈ VDB, so the following check is applied:
If suffix si ∈ VDB and si ∈ NDB then
If root ∈ verblist then Wi is verb
Else Wi is noun
Here a splitter function is used to split the suffix from the word Wi and obtain the root word. As the number of verbs in the language is limited, it is easy to maintain a verblist of all the verb roots. Most verbs occur as the last word of the sentence, so even if the last word maps to NDB, the final 'action' attribute can be associated with it. This does not work in all cases, however: because Sanskrit is an order-free language, verbs may also appear before nouns, and in those cases the splitter function resolves the ambiguity.
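A minimal sketch of this check is given below. The suffix sets and the verb list are toy data chosen for illustration (only AH ∈ NDB and H ∈ VDB come from the text), and the names `split_root` and `classify` are hypothetical. A word that matches both databases is called a verb only if stripping some matching verb suffix leaves a root found in the verb list.

```python
# Sketch: resolve a word that matches both the noun and verb suffix databases.

NOUN_SUFFIXES = {"AH", "aH"}          # toy NDB
VERB_SUFFIXES = {"H", "ataH", "ati"}  # toy VDB
VERB_LIST = {"paTh", "pach"}          # toy list of verb roots


def split_root(word, suffix):
    """Splitter: strip the given suffix and return the remaining root."""
    return word[:-len(suffix)]


def classify(word):
    """Return 'noun', 'verb' or 'unknown' for a word using the verb-list check."""
    noun_hit = any(word.endswith(s) for s in NOUN_SUFFIXES)
    verb_hits = [s for s in VERB_SUFFIXES if word.endswith(s)]
    if noun_hit and verb_hits:
        # Ambiguous: decide by looking up the split root in the verb list.
        if any(split_root(word, s) in VERB_LIST for s in verb_hits):
            return "verb"
        return "noun"
    if verb_hits:
        return "verb"
    if noun_hit:
        return "noun"
    return "unknown"


print(classify("chAtrAH"))   # chAtrA is not a verb root
print(classify("paThataH"))  # paTh is a verb root
```

Because the decision depends only on the word itself, it does not rely on the verb being the last word of the sentence.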
B. Multiple matches in noun group
The second problem arises within the noun group: a word Wi may match multiple suffixes in NDB. For example, in chAtrAH vidyAlaye paThanti, the suffix e of vidyAlaye maps it to the locative case, which is desired, but the suffix aye maps it to the ablative case. As the algorithm retains the longest matched suffix in NDB, aye is selected. The longest suffix therefore does not always give the best solution, so a list of all the matches is maintained and the word Wi is split with respect to each suffix. For example, splitting vidyAlaye with respect to e gives
vidyAlay + e, where the root word vidyAlay ∈ dictionary, whereas splitting vidyAlaye with respect to aye gives vidyAl ∉ dictionary.
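The dictionary check can be sketched as follows. The suffix-to-case entries repeat the two mappings named in the text, the one-word dictionary is toy data, and the function name `resolve_noun_suffix` is hypothetical. Every matching suffix is tried, and only splits whose root is a dictionary word are kept; the longest suffix is preferred among the valid ones, which is an assumption the text does not state.

```python
# Sketch: choose among multiple noun-suffix matches by validating the split root.

NOUN_SUFFIX_CASE = {"e": "locative", "aye": "ablative"}  # mappings as described in the text
DICTIONARY = {"vidyAlay"}                                # toy dictionary of root words


def resolve_noun_suffix(word):
    """Return (root, suffix, case) for a split whose root is in the dictionary, else None."""
    candidates = [s for s in NOUN_SUFFIX_CASE if word.endswith(s)]
    valid = [s for s in candidates if word[:-len(s)] in DICTIONARY]
    if not valid:
        return None
    best = max(valid, key=len)  # assumed tie-break: longest valid suffix
    return word[:-len(best)], best, NOUN_SUFFIX_CASE[best]


print(resolve_noun_suffix("vidyAlaye"))
```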
C. Identification of adjectives
The declension of an adjective is the same as that of a noun, and in a sentence an adjective takes the same suffix as the noun it describes. Adjectives are therefore handled by the following rule in the algorithm:
Let w_i and w_{i+1} be two consecutive words of the sentence S that have the same suffix and hence the same attributes. Then one of them is an adjective and the other is a noun. If the root of either one maps to the adjective list, that word is identified as the adjective describing the noun.
Sanskrit: Kamalam nilam asti.
English: Lotus is blue.
Here the suffix am in both kamalam and nilam indicates that one of them is an adjective. After splitting into kamal + am and nil + am, nil ∈ adjective list, hence nilam is the adjective.
The rules, stated in predicate logic, are as follows:
Rule 1: $\forall w_i\, \forall w_{i+1}\; [\mathrm{noun}(w_i) \land \mathrm{noun}(w_{i+1}) \land \mathrm{same\_suff}(w_i, w_{i+1}) \land \mathrm{belong}(\mathrm{root}(w_i), \mathrm{adj\_list})] \rightarrow \mathrm{adj}(w_i)$
Rule 2: $\forall w_i\, \forall w_{i+1}\; [\mathrm{noun}(w_i) \land \mathrm{noun}(w_{i+1}) \land \mathrm{same\_suff}(w_i, w_{i+1}) \land \mathrm{belong}(\mathrm{root}(w_{i+1}), \mathrm{adj\_list})] \rightarrow \mathrm{adj}(w_{i+1})$
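Rules 1 and 2 can be sketched in Python as a pass over consecutive word pairs. The suffix set and adjective list are toy data built from the kamalam/nilam example, and the names `suffix_of` and `tag_adjectives` are hypothetical. Words that are not part of a same-suffix pair are left untagged.

```python
# Sketch of the adjective rules: two consecutive words with the same suffix,
# where the one whose root is in the adjective list is tagged as the adjective.

NOUN_SUFFIXES = {"am"}  # toy suffix set
ADJ_LIST = {"nil"}      # toy adjective roots


def suffix_of(word):
    """Return the longest known noun suffix ending the word, or None."""
    matches = [s for s in NOUN_SUFFIXES if word.endswith(s)]
    return max(matches, key=len) if matches else None


def tag_adjectives(words):
    """Tag each word of a same-suffix consecutive pair as 'adjective' or 'noun'."""
    tags = {}
    for first, second in zip(words, words[1:]):
        suffix = suffix_of(first)
        if suffix is None or suffix != suffix_of(second):
            continue  # rule applies only when both words share a suffix
        for word in (first, second):
            root = word[:-len(suffix)]
            tags[word] = "adjective" if root in ADJ_LIST else "noun"
    return tags


print(tag_adjectives(["kamalam", "nilam", "asti"]))
```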
D. Case identification with active and passive voice
In Sanskrit, the rules for associating the verb with the agent or object are definite. In kartrivAchya (active voice), the verb is declined according to the kartA (agent): the subject is in the nominative case, the object is in the accusative case and the verb agrees with the subject. In karmavAchya (passive voice), the object is in the nominative case, the subject is in the instrumental case and the verb agrees with the object's person, number and gender. To implement these rules, the last two digits of the noun ID and the verb ID are matched.
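Rule 3: $\forall w_i\, \forall w_j\; [\mathrm{noun}(w_i) \land \mathrm{verb}(w_j) \land \mathrm{same\_no}(\mathrm{last\_two\_digit}(w_i), \mathrm{last\_two\_digit}(w_j)) \land \mathrm{equal}(\mathrm{second\_last}(\mathrm{digit}(w_i)), 1)] \rightarrow \mathrm{sentence}(S, \mathrm{active}) \land \mathrm{agent}(w_i)$
Rule 4: $\forall w_i\, \forall w_j\; [\mathrm{noun}(w_i) \land \mathrm{verb}(w_j) \land \mathrm{same\_no}(\mathrm{last\_two\_digit}(w_i), \mathrm{last\_two\_digit}(w_j)) \land \mathrm{equal}(\mathrm{second\_last}(\mathrm{digit}(w_i)), 2)] \rightarrow \mathrm{sentence}(S, \mathrm{passive}) \land \mathrm{object}(w_i)$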
Complete Flow of the System
The complete flow of the system is shown in Figure 2. The order of checks is pronoun, noun and then verb. Here N and V stand for noun and verb respectively, Wi is a word of sentence S, and POS is the part of speech of the word Wi. The system is implemented in MATLAB and the database in MS Access 2007. The system shows good efficiency for simple direct sentences. This paper shows the significance of case-based semantic analysis using the case endings, or suffixes, of words in Sanskrit. The language therefore has great potential in the field of artificial intelligence wherever extracting meaning is the key issue.
[FIGURE 2 OMITTED]
The system emphasises the identification of nouns, pronouns, verbs and adjectives, but there are more categories in Sanskrit still to be included, such as pratyaya and avyaya, which are currently not part of the system. A lot of work is under way on developing Sanskrit parsers for machine translation and on generalising the Panini grammar framework to other Indian languages. This is a partial parser with extraction of semantics as its key concern, and it has been found that Sanskrit uses rules for POS as well as for relations between words, which makes implementation efficient. Sanskrit can be used in any AI-based application where semantics is required, and hence it can be seen as a universal language for internal processing.
References
Akshar Bharati, Vineet Chaitanya and Rajeev Sangal, "Natural Language Processing: A Paninian Perspective", IIT Kanpur, PHI, 1995.
Gérard Huet, "Towards Computational Processing of Sanskrit", INRIA.
Dr. Kapildev Dwivedi, "RachanAnuvAdakaumudI", VishvavidyAlaya PrakAshan, Varanasi.
Nils J. Nilsson, "Principles of Artificial Intelligence".
Akshar Bharati and Amba Kulkarni, "Sanskrit and Computational Linguistics", Department of Sanskrit Studies, University of Hyderabad.
Subhash C. Kak, "The Paninian approach to natural language processing", International Journal of Approximate Reasoning, vol. 1, issue 1.
R. Briggs, "Knowledge representation in Sanskrit and Artificial Intelligence", AAAI, AI Magazine.
Ms. Smita Selot and Dr. Jyoti Singh, "Information retrieval in an Intelligent system using Panini grammar", Technovision-2207, a National Conference at SSCET, Bhilai.
Ms. Smita Selot and Dr. Jyoti Singh, "Knowledge representation and Information Retrieval in Panini Grammar Framework", International Conference ICSCIS-07, 2007, Jabalpur, MP.
Mrs Smita Selot (1) AS Zadgaonkar (2) and Neeta Tripathi (3)
(1) Reader Department of CA, SSCET, Bhilai E-mail: firstname.lastname@example.org
(2) Vice chancellor Info-cell, CV Raman University Bilaspur
(3) Principal, GD Rungta College, Bhilai E-mail: email@example.com
Table 1: Noun suffix with case mapping.
No | Case (in English) | Karka (in Sanskrit) | Vibhakti (in Sanskrit) | Meaning
1 | Nominative | kartA | Prathma | Agent or subject
2 | Accusative | Karma | dviteeyA | Object
3 | Instrumental | karaNa | triteeyA | With, by
4 | Dative | sampradAn | Chaturthi | For
5 | Ablative | apAdAn | Panchami | From
6 | Genitive | Sambandh | Shashti | Of
7 | Locative | Adhikaran | Saptami | In, on, into
8 | Vocative | sambodhan | sambodhan | Oh!