314 | IETE TECHNICAL REVIEW, Vol 23, No 5, 2006 |
Kazhugu, a multilingual Internet search engine claimed to be India’s first search engine in regional languages, has been developed by the Anna University-KB Chandrashekhar (AU-KBC) Research Centre, Chennai [2].
B. AgroExplorer Architecture
Figure 1 shows the block diagram of AgroExplorer
[1]. The Focused Crawler crawls the web and collects
pages relevant to the agricultural domain and creates
an HTML corpus. This corpus is then passed to an
HTML Parser, which separates the text from the design
part of the pages. The design part of the HTML pages
is saved for later use. The raw text in the form of
sentences is then passed on to the Enconverter, which
converts it into UNL form. The UNL corpus thus
created is then preprocessed and passed to the Indexer
module, which creates an inverted index on the UNL
expressions. This is an offline process, which takes
place in the background. Once the user enters a query,
we first obtain the UNL expression of the query through the Enconverter and pass it to the search module, which returns the matching documents in UNL format. Then, depending on the language selected by the user, the UNL documents are passed to the corresponding Deconverter, which converts them into the target language. The converted document is then merged with the HTML design templates saved earlier.
Fig 1 Block diagram of AgroExplorer
C. Universal Networking Language
The Universal Networking Language (UNL) is an electronic language for computers to express and exchange every kind of information [3]. It does so by capturing the meaning of each sentence. It has all the components of a natural language. It is composed of Universal Words (UWs), which represent concepts and are linked with other UWs to form the UNL expressions of sentences. These links, called relations, specify the role of each word in a sentence. The subjective meanings intended by the author are expressed through attributes. The UNL Knowledge Base (KB) defines the possible relationships between UWs. Thus, the UWs constitute the vocabulary of UNL, the relations and attributes constitute its syntax, and the UNL KB constitutes its semantics. Figure 2 shows the working of the UNL system. Enconversion is the process of converting natural language into UNL, whereas deconversion is the process of converting UNL back into natural language.
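To illustrate how the Indexer and search module can operate on UNL rather than on raw text, the following is a minimal sketch, assuming each enconverted document is stored as a set of directed binary relations written as (relation label, source UW, target UW) and that the inverted index is keyed on whole relations. The paper does not specify the index key or the ranking scheme; the functions `build_index` and `search`, the scoring by shared-relation count, and the sample UNL strings are all hypothetical.

```python
from collections import defaultdict

# Hypothetical model: a UNL expression is a set of directed binary relations,
# each written as (relation_label, source_UW, target_UW).
# Keying the index on whole relations is an illustrative assumption.
Relation = tuple[str, str, str]


def build_index(corpus: dict[str, set[Relation]]) -> dict[Relation, set[str]]:
    """Inverted index: map each relation to the IDs of documents containing it."""
    index: defaultdict[Relation, set[str]] = defaultdict(set)
    for doc_id, relations in corpus.items():
        for rel in relations:
            index[rel].add(doc_id)
    return dict(index)


def search(index: dict[Relation, set[str]], query: set[Relation]) -> list[tuple[str, int]]:
    """Rank documents by the number of query relations they share (assumed scoring)."""
    scores: defaultdict[str, int] = defaultdict(int)
    for rel in query:
        for doc_id in index.get(rel, set()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical enconverted corpus (UW strings are illustrative only).
corpus = {
    "doc1": {
        ("agt", "grow(icl>do)", "farmer(icl>person)"),
        ("obj", "grow(icl>do)", "rice(icl>food)"),
    },
    "doc2": {
        ("obj", "grow(icl>do)", "wheat(icl>food)"),
        ("plc", "grow(icl>do)", "field(icl>place)"),
    },
}
index = build_index(corpus)

# Hypothetical enconverted form of the query "farmers grow rice".
query = {
    ("agt", "grow(icl>do)", "farmer(icl>person)"),
    ("obj", "grow(icl>do)", "rice(icl>food)"),
}
print(search(index, query))  # [('doc1', 2)]
```

Matching whole relations rather than bare UWs lets the index distinguish a document in which rice is the object of growing from one in which rice merely appears; a practical system would also need partial matching and term weighting, which this sketch omits.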
1) UNL’s Representation of Sentences: The UNL represents information, i.e. meaning, sentence by sentence [3]. The information in a sentence is represented as a hypergraph with concepts as nodes and relations as arcs. This hypergraph can also be represented as a set of directed binary relations, each between two concepts present in the sentence. Concepts are represented by character strings called “Universal Words”. UWs can be annotated with attributes, which provide further information about how the concept is used in the specific sentence. Figure 3 shows the UNL graph of the sentence “John eats rice with a spoon.”
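Since Figure 3 is not reproduced here, the sketch below shows one plausible way to hold such a sentence as a set of directed binary relations between annotated UWs. The relation labels agt (agent), obj (object) and ins (instrument) are standard UNL relations, but the UW restrictions (e.g. `icl>do`) and the attributes chosen (`@entry`, `@present`, `@indef`) are illustrative assumptions and may differ from Figure 3. The classes `UW` and `Relation` and the helper `entry_node` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UW:
    """A Universal Word (a graph node): headword, optional restriction, attributes."""
    headword: str
    restriction: str = ""
    attributes: frozenset[str] = frozenset()

    def __str__(self) -> str:
        text = self.headword
        if self.restriction:
            text += f"({self.restriction})"
        # Attributes are printed in sorted order as .@attr suffixes.
        for attr in sorted(self.attributes):
            text += f".@{attr}"
        return text


@dataclass(frozen=True)
class Relation:
    """A directed binary relation (an arc of the UNL graph)."""
    label: str  # e.g. agt = agent, obj = object, ins = instrument
    source: UW
    target: UW

    def __str__(self) -> str:
        return f"{self.label}({self.source}, {self.target})"


def entry_node(relations: set[Relation]) -> Optional[UW]:
    """Return the UW marked @entry (the main concept of the sentence), if any."""
    for rel in relations:
        for uw in (rel.source, rel.target):
            if "entry" in uw.attributes:
                return uw
    return None


# Illustrative encoding of "John eats rice with a spoon."
eat = UW("eat", "icl>do", frozenset({"entry", "present"}))
john = UW("John", "iof>person")
rice = UW("rice", "icl>food")
spoon = UW("spoon", "icl>cutlery", frozenset({"indef"}))

sentence = {
    Relation("agt", eat, john),
    Relation("obj", eat, rice),
    Relation("ins", eat, spoon),
}

for rel in sorted(sentence, key=lambda r: r.label):
    print(rel)
# agt(eat(icl>do).@entry.@present, John(iof>person))
# ins(eat(icl>do).@entry.@present, spoon(icl>cutlery).@indef)
# obj(eat(icl>do).@entry.@present, rice(icl>food))
```

The four distinct UWs are the nodes of the graph and the three relations are its arcs, all sharing `eat` as their source, which is why it is marked as the entry node.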