The MultiText Project

The MultiText Project is concerned with developing techniques for the indexing and retrieval of very large electronic collections of text. By "very large" we are not referring merely to collections such as the *Complete Works of William Shakespeare* or the *Encyclopedia Britannica* that might fit on one or more CD-ROM disks and be purchasable by the owner of a personal computer. Rather, we are concerned with techniques for collections many times larger: all issues of a large newspaper for several decades, all journals in a subject area, or, ultimately, a significant fraction of all text available electronically.

In developing these techniques we are considering the many unique requirements of very large text collections:

*Multiple Users* It is not possible for each user to have a copy of the text and indexing information on his or her own personal computer. Our techniques allow many thousands of users to simultaneously query a text collection across a computer network. Incoming requests are scheduled to minimize the impact users have on one another.

*Multiple Server Machines* Several computers must work in cooperation to provide storage and indexing for collections of this size. It is not feasible to store all information on a single computer or even at a single site. Our techniques allow effective and efficient communication of information between users' machines and the various machines indexing and storing the text.

*Continuous Availability* The text collection must be updated, reorganized and extended while remaining available to users. The individual computers storing and indexing the text must be maintained and repaired with only a minimal reduction in performance. An unexpected failure of one of the individual computers must have no effect on availability and only a minimal effect on performance.

*Multiple Query Languages* A variety of query languages and graphical user interfaces must be supported simultaneously, accommodating differences in users' tastes and abilities.

*Multiple Text Formats* Documents in different formats must be stored in the same collection. Despite differences in format, users may still formulate queries that refer to document structure, such as title or author.

Project Principals:
Gordon Cormack (cormack@plg.uwaterloo.ca)
Forbes Burkowski (fjburkow@plg.uwaterloo.ca)

Project Staff:
Charlie Clarke (claclark@plg.uwaterloo.ca)
Rob Good (rcgood@kiwi.uwaterloo.ca)