This research offers a new method for knowledge representation and answer retrieval, targeting question-and-answer systems. It is grounded in a study of real-world natural language processing problems that arise when systems or algorithms are trained to answer questions correctly within a given subject domain: when the same algorithms face a new domain of questions, a new set of rules or a new round of training is required. Existing systems such as natural language personal assistants, chatbots, and query analysers serve as particularly useful testing tools for this research.
The problem with training systems to answer questions within a fixed set of domains is that, whenever the question domain changes, extra time must be spent retraining the system to adapt to the new domain's rules. This research offers a generic method that addresses this limitation without requiring algorithms to be retrained for every new set of rules.
An Information System (IS) methodology was adopted for the research. A conceptual framework for the problem was developed by combining the understanding from three viewpoints:
1. A study of, and preliminary experiments with, real-world language processing systems
2. A review of the relevant disciplines for methods and concepts leading to theories that support the problem requirements
3. An investigation of popular natural language classifiers, to understand the depth of passage information extraction they perform

Applying knowledge from Artificial Intelligence (AI) research in knowledge representation and natural language processing, a generic architecture that incorporates language learning for improved answer retrieval was proposed. The Stanford NLP Classifier was used to categorise and sort large passages of text given as input to the algorithm. A newly created technique using dependency structure, lemmas, parts of speech, and named entity recognition was used to filter passages and retrieve the correct answer to a question.
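In outline, a passage-filtering step of this kind can be sketched as below. This is an illustrative assumption, not the thesis's actual implementation: the `Token` fields, the weights, and the scoring rule are hypothetical, and the linguistic annotations (lemma, part of speech, named-entity label) are supplied as data rather than produced by the Stanford tools.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    lemma: str
    pos: str    # part-of-speech tag, e.g. "NOUN", "VERB"
    ner: str    # named-entity label, "O" if not an entity

def score_passage(question_tokens, passage_tokens):
    """Score a passage by lemma overlap with the question, weighting
    named entities and content words more heavily than function words
    (weights here are illustrative assumptions)."""
    q_lemmas = {t.lemma.lower() for t in question_tokens}
    score = 0.0
    for t in passage_tokens:
        if t.lemma.lower() in q_lemmas:
            if t.ner != "O":
                score += 2.0          # entity match: strongest signal
            elif t.pos in {"NOUN", "PROPN", "VERB"}:
                score += 1.0          # content-word match
            else:
                score += 0.25         # function-word match
    return score

def best_passage(question_tokens, passages):
    """Return the candidate passage with the highest overlap score."""
    return max(passages, key=lambda p: score_passage(question_tokens, p))
```

A dependency-structure check (e.g. requiring the question's main verb to govern a matching argument in the passage) would slot in as an additional filter before scoring.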
Based on the classification results obtained from the Stanford NLP Classifier, a generic knowledge base was created to represent this knowledge for easier retrieval of answers.
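A minimal picture of such a category-keyed knowledge base, assuming passages are simply filed under the label the classifier assigned so that retrieval only searches one bucket (class and method names here are hypothetical, not from the thesis):

```python
from collections import defaultdict

class KnowledgeBase:
    """Store passages under classifier-assigned category labels, so that
    answer retrieval searches only the relevant bucket."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, category, passage):
        """File a passage under the category the classifier assigned."""
        self._buckets[category].append(passage)

    def candidates(self, category):
        """Return all passages filed under a category (empty if none)."""
        return list(self._buckets.get(category, []))
```

The point of the structure is that adding a new domain only means adding a new bucket, not retraining the retrieval step.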
Stories on various topics were distributed to test users, who wrote down questions of their choice about them. These questions were used to test the algorithm against other language processing systems and chatbots.
Following this, a confusion matrix was used to analyse and evaluate the algorithm's functionality and output against other systems. The algorithm was found to produce more accurate answers for questions across different domains, with fewer errors than the compared systems.
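For reference, the standard metrics read off a confusion matrix in such an evaluation can be computed as follows; this is a generic sketch of the metrics themselves, not the thesis's evaluation code.

```python
def accuracy(matrix):
    """Overall accuracy from a square confusion matrix
    (rows = actual class, columns = predicted class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def precision(matrix, cls):
    """Precision for one class: correct predictions of that class
    divided by all predictions of that class (its column sum)."""
    predicted = sum(row[cls] for row in matrix)
    return matrix[cls][cls] / predicted if predicted else 0.0
```

For example, with two classes (correct answer / wrong answer), a matrix `[[8, 2], [1, 9]]` gives an overall accuracy of 17/20 = 0.85.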
Available under License Creative Commons Attribution Non-commercial No Derivatives.