Jilani, Aisha (2013) Parallel corpus multi stream question answering with applications to the Qu'ran. Doctoral thesis, University of Huddersfield.

Question-Answering (QA) is an important research area, which is concerned with developing an automated process that answers questions posed by humans in a natural language. QA is a shared task for the Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing communities (NLP). A technical review of different QA system models and methodologies reveals that a typical QA system consists of different components to accept a natural language question from a user and deliver its answer(s) back to the user. Existing systems have been usually aimed at structured/ unstructured data collected from everyday English text, i.e. text collected from television programmes, news wires, conversations, novels and other similar genres. Despite all up-to-date research in the subject area, a notable fact is that none of the existing QA Systems has been tested on a Parallel Corpus of religious text with the aim of question answering. Religious text has peculiar characteristics and features which make it more challenging for traditional QA methods than other kinds of text.
This thesis proposes PARMS (Parallel Corpus Multi Stream) Methodology; a novel method applying existing advanced IR (Information Retrieval) techniques, and combining them with NLP (Natural Language Processing) methods and additional semantic knowledge to implement QA (Question Answering) for a parallel corpus. A parallel Corpus involves use of multiple forms of the same corpus where each form differs from others in a certain aspect, e.g. translations of a scripture from one language to another by different translators. Additional semantic knowledge can be referred as a stream of information related to a corpus. PARMS uses Multiple Streams of semantic knowledge including a general ontology (WordNet) and domain-specific ontologies (QurTerms, QurAna, QurSim). This additional knowledge has been used in embedded form for Query Expansion, Corpus Enrichment and Answer Ranking.
The PARMS Methodology has wider applications. This thesis applies it to the Quran – the core text of Islam; as a first case study. The PARMS Method uses parallel corpus comprising ten different English translations of the Quran. An individual Quranic verse is treated as an answer to questions asked in a natural language, English. This thesis also implements PARMS QA Application as a proof of concept for the PARMS methodology. The PARMS Methodology aims to evaluate the range of semantic knowledge streams separately and in combination; and also to evaluate alternative subsets of the DATA source: QA from one stream vs. parallel corpus. Results show that use of Parallel Corpus and Multiple Streams of semantic knowledge have obvious advantages. To the best of my knowledge, this method is developed for the first time and it is expected to be a benchmark for further research area.

ajilanifinalthesis.pdf - Submitted Version

Download (7MB) | Preview


Downloads per month over past year

Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email