This paper describes the results of a preliminary attempt to use a genetic algorithm to
divide an inverted file into a specified number of partitions such that each partition
indexes approximately the same total number of documents. The purpose of identifying such
equifrequent partitions is to assist in the generation of text signature representations of
documents which are more discriminating than those created using more traditional
techniques.
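To make the notion of an equifrequent partition concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that the number of documents indexed by a partition is measured as the sum of the document frequencies of the terms assigned to it, and that a signature carries one bit per partition. The function names, the toy vocabulary and the frequencies are all hypothetical.

```python
def partition_loads(doc_freq, assignment, n_partitions):
    """Sum the document frequencies of the terms assigned to each partition."""
    loads = [0] * n_partitions
    for term, freq in doc_freq.items():
        loads[assignment[term]] += freq
    return loads


def imbalance(loads):
    """Gap between the heaviest and lightest partition; 0 means perfectly equifrequent."""
    return max(loads) - min(loads)


def signature(doc_terms, assignment):
    """Assumed encoding: bit p is set if the document contains any term from partition p."""
    bits = 0
    for term in doc_terms:
        if term in assignment:  # terms outside the partitioned vocabulary are ignored
            bits |= 1 << assignment[term]
    return bits


# Toy data, invented purely for illustration.
doc_freq = {"library": 6, "index": 3, "catalogue": 3, "thesaurus": 2, "retrieval": 4}
assignment = {"library": 0, "index": 1, "catalogue": 1, "thesaurus": 2, "retrieval": 2}

loads = partition_loads(doc_freq, assignment, 3)
doc_signature = signature({"index", "retrieval"}, assignment)
```

In this toy case every partition carries the same load, so each signature bit is set by roughly the same share of term occurrences. Balancing the bits in this way is the intuition behind expecting such signatures to be more discriminating.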
The paper is divided into six sections. Following the introduction, the second of these
describes the main idea behind the signature approach. The third section introduces the
idea of genetic algorithms and briefly reviews earlier work on the application of this
technique to information retrieval problems. The fourth section describes how we have
used a genetic algorithm to partition a section of the inverted file used to index the LISA
document test collection and how we have then used the partitioned file in the production
of text signatures to represent documents in the collection. We might say that the
signature representations produced in this way have been customised to the vocabulary of
the LISA document collection and we would therefore expect them to be more
discriminating than text signatures produced using more traditional techniques. The fifth
section of the paper compares the results of searches conducted using signatures with
results obtained from searches of the full inverted file. It is concluded that the signatures
produced with the assistance of the genetic algorithm are more discriminating than those
produced using simpler techniques.
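As a rough illustration of how a genetic algorithm might search for an equifrequent partition, the sketch below evolves assignments of terms to partitions. It is not the algorithm described in the fourth section: the encoding (one partition index per term), the truncation selection, the one-point crossover, the mutation rate and the cost function are all assumptions chosen for brevity, and the frequencies are invented.

```python
import random


def loads(assignment, freqs, k):
    """Total document frequency carried by each of the k partitions."""
    totals = [0] * k
    for part, freq in zip(assignment, freqs):
        totals[part] += freq
    return totals


def cost(assignment, freqs, k):
    """Lower is better: gap between the heaviest and lightest partition."""
    totals = loads(assignment, freqs, k)
    return max(totals) - min(totals)


def evolve(freqs, k, pop_size=30, generations=200, mutation_rate=0.05, seed=0):
    """Evolve a term-to-partition assignment; assumes at least two terms."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n = len(freqs)
    # Each chromosome gives one partition index per term.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: cost(a, freqs, k))
        survivors = pop[: pop_size // 2]  # truncation selection; the best always survive
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: occasionally move a term to a random partition.
            child = [rng.randrange(k) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda a: cost(a, freqs, k))


# Hypothetical document frequencies for ten terms, split into four partitions.
freqs = [9, 7, 6, 5, 4, 3, 2, 2, 1, 1]
best = evolve(freqs, 4)
print(loads(best, freqs, 4), cost(best, freqs, 4))
```

Because the better half of the population is carried over unchanged, the best cost never worsens from one generation to the next. A realistic run over a full inverted file would need a far larger vocabulary and a more careful choice of operators.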