Application of a Genetic Algorithm to the Production of Text Signatures

Wade, Steve (1995) Application of a Genetic Algorithm to the Production of Text Signatures. The New Review of Document and Text Management, 1. pp. 147-167. ISSN 1361-4584

Metadata only available from this repository.

Abstract

This paper describes the results of a preliminary attempt to use a genetic algorithm to
divide an inverted file into a specified number of partitions such that the total number of
documents indexed by a particular partition is approximately equal to the total number of
documents indexed by each of the other partitions. The purpose of identifying such
equifrequent partitions is to assist in the generation of text signature representations of
documents which are more discriminating than those created using more traditional
techniques.
The paper is divided into six sections. Following the introduction, the second of these
describes the main idea behind the signature approach. The third section introduces the
idea of genetic algorithms and briefly reviews earlier work on the application of this
technique to information retrieval problems. The fourth section describes how we have
used a genetic algorithm to partition a section of the inverted file used to index the LISA
document test collection and how we have then used the partitioned file in the production
of text signatures to represent documents in the collection. We might say that the
signature representations produced in this way have been customised to the vocabulary of
the LISA document collection and we would therefore expect them to be more
discriminating than text signatures produced using more traditional techniques. The fifth
section of the paper compares the results of searches conducted using signatures with
results obtained from searches of the full inverted file. It is concluded that the signatures
produced with the assistance of the genetic algorithm are more discriminating than those
produced using simpler techniques.
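
The equifrequent-partition idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's genetic algorithm: it swaps in a simple greedy heuristic (assign each term, most frequent first, to the currently lightest partition) purely to show what "approximately equal" partitions look like. The function names, the toy document frequencies, and the one-bit-per-partition signature scheme are all assumptions made for illustration; the record does not state how the paper builds signatures from the partitions.

```python
from typing import Dict, List, Set


def equifrequent_partitions(doc_freq: Dict[str, int], k: int) -> List[List[str]]:
    """Greedy stand-in for the genetic algorithm (an assumption, not the paper's method).

    doc_freq maps each index term to the number of documents it indexes.
    Terms are placed, most frequent first, into whichever of the k
    partitions currently indexes the fewest documents in total.
    """
    partitions: List[List[str]] = [[] for _ in range(k)]
    loads = [0] * k
    # Sort by descending frequency; break ties alphabetically for determinism.
    for term, freq in sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = loads.index(min(loads))
        partitions[lightest].append(term)
        loads[lightest] += freq
    return partitions


def partition_loads(partitions: List[List[str]], doc_freq: Dict[str, int]) -> List[int]:
    """Total number of postings (term-document pairs) per partition."""
    return [sum(doc_freq[t] for t in part) for part in partitions]


def signature(doc_terms: Set[str], partitions: List[List[str]]) -> int:
    """Hypothetical signature: bit i is set if the document contains
    at least one term from partition i."""
    sig = 0
    for i, part in enumerate(partitions):
        if any(term in doc_terms for term in part):
            sig |= 1 << i
    return sig


# Toy document frequencies (invented for the example, not LISA data).
toy_doc_freq = {
    "library": 8, "index": 5, "catalogue": 4,
    "retrieval": 3, "genetic": 2, "signature": 2,
}
toy_partitions = equifrequent_partitions(toy_doc_freq, k=3)
toy_loads = partition_loads(toy_partitions, toy_doc_freq)
```

Under this assumed scheme, balanced partitions mean that each signature bit is set for a roughly similar share of the collection, which is one plausible reading of why such signatures would be more discriminating.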

Item Type: Article
Subjects: Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Schools: School of Computing and Engineering
School of Computing and Engineering > Pedagogical Research Group
School of Computing and Engineering > Informatics Research Group
School of Computing and Engineering > Informatics Research Group > Software Engineering Research Group
School of Computing and Engineering > Informatics Research Group > XML, Database and Information Retrieval Research Group
School of Computing and Engineering > Serious Games Research Group
Depositing User: Stephen Wade
Date Deposited: 18 May 2010 12:47
Last Modified: 05 Jan 2011 12:07
URI: http://eprints.hud.ac.uk/id/eprint/7628
