IETE TECHNICAL REVIEW, Vol 23, No 5, 2006

 

Kazhugu, a multilingual Internet search engine claimed to be India’s first in regional languages, has been developed by the Anna University-KB Chandrashekhar (AU-KBC) Research Centre, Chennai [2].

B. AgroExplorer Architecture

Figure 1 shows the block diagram of AgroExplorer [1]. The Focused Crawler crawls the web, collects pages relevant to the agricultural domain, and creates an HTML corpus. This corpus is then passed to an HTML Parser, which separates the text of each page from its design part. The design part of the HTML pages is saved for later use. The raw text, in the form of sentences, is then passed on to the Enconverter, which converts it into UNL form. The UNL corpus thus created is then preprocessed and passed to the Indexer module, which creates an inverted index on the UNL expressions. This is an offline process that takes place in the background. Once the user enters a query, we first obtain the UNL expression of the query through the Enconverter. After preprocessing, this UNL expression is passed to the Search Module, which uses the inverted index created earlier to perform a graph-based search on the UNL expression of the query.
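The offline indexing and query-time search described above can be sketched as follows. This is a minimal illustration, not AgroExplorer's implementation: UNL expressions are assumed to be flat (label, source UW, target UW) triples, the relation-overlap score is only a crude stand-in for the graph-based search, and all function names and sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumption: one UNL relation is modeled as (label, source UW, target UW).
Relation = Tuple[str, str, str]


def build_index(corpus: Dict[str, List[Relation]]) -> Dict[str, Set[str]]:
    """Offline step: map each Universal Word to the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, relations in corpus.items():
        for _label, source, target in relations:
            index[source].add(doc_id)
            index[target].add(doc_id)
    return dict(index)


def search(query: List[Relation],
           index: Dict[str, Set[str]],
           corpus: Dict[str, List[Relation]]) -> List[str]:
    """Query step: shortlist documents via the index, then rank them."""
    # The inverted index narrows the search to documents sharing a UW with the query.
    candidates: Set[str] = set()
    for _label, source, target in query:
        candidates |= index.get(source, set()) | index.get(target, set())
    # Stand-in for graph matching: count relation triples shared with the query.
    scores = {d: len(set(query) & set(corpus[d])) for d in candidates}
    hits = [d for d in candidates if scores[d] > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))


# Hypothetical UNL corpus of three tiny documents.
corpus = {
    "doc1": [("agt", "grow", "farmer"), ("obj", "grow", "rice")],
    "doc2": [("obj", "eat", "rice")],
    "doc3": [("agt", "grow", "farmer"), ("obj", "grow", "wheat")],
}
index = build_index(corpus)
results = search([("obj", "grow", "rice")], index, corpus)
print(results)
```

All three documents share a UW with the query, so the index alone would return all of them; only the relation-level comparison separates the document about growing rice from the ones that merely mention "grow" or "rice".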

The Search Module returns documents that are in the UNL format. Then, depending on the language selected by the user, the UNL documents are passed to the corresponding Deconverter, which converts them into the target language. Each converted document is then merged with the HTML design template that was saved earlier.

Fig 1 Block diagram for AgroExplorer
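The output stage described above, choosing a Deconverter by language and merging its output with the saved design, might look roughly like the sketch below. Everything here is an assumption made for illustration: the registry, the `$content` placeholder convention, and the toy Deconverter, which only lists relations rather than generating real sentences.

```python
import html
from string import Template
from typing import Callable, Dict, List, Tuple

Relation = Tuple[str, str, str]  # assumed (label, source UW, target UW) model


def toy_english_deconverter(unl_doc: List[Relation]) -> str:
    """Placeholder only: a real Deconverter would generate natural-language text."""
    return "; ".join(f"{label}({src}, {dst})" for label, src, dst in unl_doc)


# Hypothetical registry: one Deconverter per supported language code.
DECONVERTERS: Dict[str, Callable[[List[Relation]], str]] = {
    "en": toy_english_deconverter,
}


def render(unl_doc: List[Relation], language: str, design_template: str) -> str:
    """Deconvert a UNL document and merge the text into its saved HTML design."""
    if language not in DECONVERTERS:
        raise ValueError(f"no Deconverter registered for language {language!r}")
    text = DECONVERTERS[language](unl_doc)
    # Escape the generated text so it cannot break the surrounding markup.
    return Template(design_template).substitute(content=html.escape(text))


# Assumed convention: the saved design marks the text slot with $content.
saved_design = "<html><body><p>$content</p></body></html>"
page = render([("obj", "grow", "rice")], "en", saved_design)
print(page)
```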

C. Universal Networking Language

The Universal Networking Language (UNL) is an electronic language for computers to express and exchange every kind of information [3]. It does so by capturing the meaning of every sentence. It has all the components corresponding to a natural language. It is composed of Universal Words (UWs), which represent concepts and are linked with other UWs to form the UNL expressions of sentences. These links, called relations, specify the role of each word in a sentence. The subjective meanings intended by the author are expressed through attributes. The UNL Knowledge Base (KB) defines the possible relationships between UWs. Thus the UWs constitute the vocabulary of UNL, relations and attributes constitute its syntax, and the UNL KB constitutes its semantics.
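To make the role of the KB concrete, the sketch below checks whether a proposed relation between two UWs is permitted. The categories and table entries are invented for illustration and are far simpler than the actual UNL KB; only the relation labels `agt` (agent), `obj` (object) and `ins` (instrument) are taken from UNL itself.

```python
# Toy vocabulary: each known UW is assigned a coarse category (an assumption).
UW_CATEGORY = {
    "eat": "action",
    "John": "person",
    "rice": "thing",
    "spoon": "thing",
}

# Toy KB: which relation labels may link which categories of concept.
ALLOWED = {
    ("agt", "action", "person"),  # agent: who performs the action
    ("obj", "action", "thing"),   # object: what the action affects
    ("ins", "action", "thing"),   # instrument: what the action is carried out with
}


def is_allowed(label: str, source: str, target: str) -> bool:
    """Return True if the toy KB permits label(source, target)."""
    if source not in UW_CATEGORY or target not in UW_CATEGORY:
        return False  # a UW outside the vocabulary cannot take part in a relation
    return (label, UW_CATEGORY[source], UW_CATEGORY[target]) in ALLOWED


print(is_allowed("agt", "eat", "John"))  # John can be the agent of eating
print(is_allowed("agt", "eat", "rice"))  # rice cannot, in this toy KB
```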

Fig 2 shows the working of the UNL system. Enconversion is the process of converting from natural language into UNL, whereas deconversion is the process of converting from UNL back to natural language.


Fig 2 UNL system

1) UNL’s Representation of Sentence:

The UNL represents information, i.e. meaning, sentence by sentence [3]. Sentence information is represented as a hypergraph with concepts as nodes and relations as arcs. This hypergraph is also represented as a set of directed binary relations, each between two concepts present in the sentence. Concepts are represented by character strings called “Universal Words”. UWs can be annotated with attributes, which provide further information about how the concept is being used in the specific sentence. Figure 3 shows the UNL graph of the sentence “John eats rice with a spoon.”
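As a rough illustration of the "set of directed binary relations" view, the example sentence can be written out as below. The relation labels (`agt`, `obj`, `ins`) and attributes (`@entry`, `@present`, `@indef`) follow common UNL usage but are an assumption about what Figure 3 contains. The class names are hypothetical, and the constraint lists that normally qualify a UW are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UW:
    """A Universal Word: a concept plus the attributes attached in this sentence."""
    headword: str
    attributes: Tuple[str, ...] = ()

    def __str__(self) -> str:
        return self.headword + "".join("." + a for a in self.attributes)


@dataclass(frozen=True)
class Relation:
    """A directed binary relation (an arc) from one UW to another."""
    label: str
    source: UW
    target: UW

    def __str__(self) -> str:
        return f"{self.label}({self.source}, {self.target})"


# Nodes of the graph for "John eats rice with a spoon."
eat = UW("eat", ("@entry", "@present"))  # main predicate, present tense
john = UW("John")
rice = UW("rice")
spoon = UW("spoon", ("@indef",))         # "a spoon": indefinite

# Arcs: every relation starts at the predicate "eat".
sentence: List[Relation] = [
    Relation("agt", eat, john),   # agent
    Relation("obj", eat, rice),   # object
    Relation("ins", eat, spoon),  # instrument
]

for relation in sentence:
    print(relation)
```

Collecting the distinct sources and targets of these three relations recovers the four nodes of the graph, which is why the relation list and the graph carry the same information.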