Friday, December 30, 2011


Classifying, categorizing and organizing information seems to be a basic human activity. Flowing from this is the need to retrieve information using those skills. Indexing is the basic mode of doing this. The core of an index is using a known set of symbols, such as the alphabet.

The book index is indeed the oldest among the figurative or applied senses of the word, and this specific usage (like the word itself) goes back to ancient Rome. There, when used in relation to literary works, the term index was used for the little slip attached to papyrus scrolls on which the title of the work (and sometimes also the name of the author) was written, so that each scroll on the shelves could be easily identified without having to pull it out for inspection. Copyists would take bits of parchment to make these title slips. From this developed the usage of index for the title of books.

Finding the actual first index seems to be a difficult exercise because of a lack of detailed book and manuscript catalogues. Cataloguers of Hebrew and Latin mss often refer to a table of contents as an index. Many mss are being digitized, but the cataloguing seems to be inadequate: the presence of indexes is not mentioned.

In the 10th century Hebrew scholars called Masoretes appear to have compiled lists of words from the Hebrew Bible arranged in alphabetical order, with the phrases from which they come aligned with them.

An early form of indexing can also be seen in the tables (canons) comparing the verses of the four Gospels found in mss from the Dark Ages. The first step in indexing is the breaking up of mss into chapters, as was done especially for the Bible and Justinian’s legal code. Chapters had been assigned in 2nd and 3rd century Greek New Testament mss. Stephen Langton (1150-1228) did this for the Bible, and he included verse numbers with his chapter divisions. This made the compilation of concordances to the Bible possible.

Medieval indexing reflected the interests of the institutions, libraries and monastic houses where the indexes were compiled. They were what we would call in-house products. Alphabetical lists of moral issues in a compendium of treatises might be produced, but neutral information, such as that on rivers, rocks and flowers, might be ignored. Their usefulness was often limited by copyists’ errors and mistakes in the numbering of leaves.

Knowledge of the alphabet could not be assumed. Giovanni di Genoa in his Catholicon (1286) thought it necessary to explain this:

“‘Amo’ comes before ‘bibo’ because ‘a’ is the first letter of the former and ‘b’ is the first letter of the latter and ‘a’ comes before ‘b’ … by the grace of God working in me, I have devised this order”. Eisenstein (1979)

As late as 1604 Robert Cawdrey in his Table Alphabeticall of hard English words noted that ‘the reader must learn the alphabet, to wit: the order of the letters as they stand’. Eisenstein (1979).


One of the consequences of the invention of printing after 1456 was the commercialization of publishing. Production was now from printers’ workshops in commercial centres rather than from stationers and scribes working in university towns or monastic scriptoria. The use of the alphabet in arranging the type was also important.

The oldest printed indexes are found in two editions of St Augustine's De arte praedicandi, published respectively by Fust and Schoeffer (the printers of Gutenberg's Bible) in Mainz, and by Mentelin in Strassburg, probably in the early 1460s. Previous research has established Fust's priority, while Mentelin probably copied Fust's edition, including the index. The book's preface specifically mentions the index and explains its use. The index, whose locators refer to paragraphs indicated by letters, contains 230 entries for only 29 pages of text; it has many cross-references and some rotated multi-word entries. In a later advertisement for his books, Schoeffer mentioned the index to Augustine's book as a useful feature. The first dated index appeared in 1468 in Speculum vitae, a moral treatise printed by Sweynheym and Pannartz in Rome. This index was also reprinted many times by other early printers.

Among all the books and other printed material that came from the joint press of Fust and Schoeffer before 1467, there is only one that has an index, namely St Augustine's De arte praedicandi (On the art of preaching), which is the fourth part of his larger work De doctrina Christiana.

Shortly after the Augustine index, whose exact dating is still unknown, the first dated index appeared in the editio princeps of Speculum vitae, a moral treatise discussing the advantages and merits as well as the disadvantages and perils of various professions from king to shepherd, written by the Spanish bishop Rodrigo de Zamora (Rodericus Sancius Zamorensis, 1404-1470), and printed by Sweynheym and Pannartz in Rome as their fifth publication in 1468. The book has 300 large pages (287 x 200 mm), 292 of which contain the preface, table of contents and text, and only six and a half pages (leaves 147a-150b) of index.

The printer Peter Schoeffer (ca. 1425-1503) of Mainz specifically mentions in his catalogues that his better arranged books have complete indexes, an index being an important sales point. The earliest dated (sort of) printed list representing an index appeared in Epistolae Hieronymi in 1470; according to Colin Clair (1969) it was a list of the first words of each of the gatherings. Later that year Ulrich Han printed his, with the catch words of each double page listed. He gives a specific date for Exposito Psalteri, 4 October 1470. Schoeffer did not date all his first books, so there might have been an earlier one. Hans H. Wellisch (1986) claims that it was Fust and Schoeffer’s edition of Saint Augustine’s De arte praedicandi which, as it can be dated to the early 1460s, had the first printed index.

In these early years of printing the distinction between the contents page, a register and an index was not very clear. In a 1554 commentary by St John Chrysostom on the Epistles of Paul, the ‘indice copiosisimo’ (most copious index) is placed at the beginning of the work and there is no contents page. The wording suggests that the publishers thought the index, mentioned on the title page, would be a draw card to encourage scholars to buy the book. The Latin title of the index itself also shows it to be an important feature. This translates as “Most copious index of all the matters which we have in this book”.

Toscanelli’s 1568 commentary on Virgil has an alphabetical table of contents, which is in fact as comprehensive as the previously mentioned ‘indice copiosisimo’. The table has a title in Italian that translates as ‘table taking up the notable content in this present work of observations on Virgil’.

Ortelius in his 1570 Theatrum Orbis Terrarum starts with an ‘Index tabularum’ which lists the maps alphabetically. For a more specific location you have to refer to the gazetteer at the end, which does not give page numbers, only the name of the area in which a place lies, and does not necessarily relate to the maps.

Indexes go way back beyond the 17th century. The Gerardes Herbal from the 1590s had several fascinating indexes according to Hilary Calvert. Barbara Cohen writes that the alphabetical listing in the earliest ones only went as far as the first letter of the entry; no one thought at first to index each entry in either letter-by-letter or word-by-word order. Maja-Lisa writes that Peter Heylyn's 1652 Cosmographie in Four Bookes includes a series of tables at the end. They are alphabetical indexes and he prefaces them with "Short Tables may not seeme proportionable to so long a Work, especially in an Age wherein there are so many that pretend to learning, who study more the Index then they do the Book."
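The three orderings mentioned in this paragraph (first letter only, letter-by-letter and word-by-word) can be compared in a short Python sketch. The function names and the sample headings are hypothetical and chosen only for illustration; real filing rules have many more special cases.

```python
def first_letter_order(entries):
    # Sort on the first letter only; headings sharing a first letter
    # stay in the order in which they were entered (the sort is stable).
    return sorted(entries, key=lambda e: e[0].lower())


def letter_by_letter(entries):
    # Ignore spaces and punctuation and compare the remaining characters.
    return sorted(
        entries,
        key=lambda e: "".join(ch for ch in e.lower() if ch.isalnum()),
    )


def word_by_word(entries):
    # Compare whole words in turn, so a short word files before
    # any longer word that begins with it.
    return sorted(entries, key=lambda e: e.lower().split())


headings = ["Nile", "Alps", "New York", "Newark", "Andes"]
print(first_letter_order(headings))
print(letter_by_letter(headings))
print(word_by_word(headings))
```

Note how "New York" and "Newark" change places between the last two orderings, while the first-letter ordering leaves all the N headings in their original sequence.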


The 18th century produced some interesting index entries. The Ladies Magazine, or Entertaining Companion for the Fair Sex, Vol. 7, 1776, has separate indexes for essays and prose, for poetry, and for births, marriages and deaths.

An important indexer from this period was Alexander Cruden (1699-1770), with his famous A Complete Concordance to the Holy Scriptures (1737). He is reputed to have been so enthusiastic about compiling it, working late into the night, that he failed to notice that the stock of his bookshop was being depleted and was surprised by the subsequent fall in sales.


Indexing became more professional in this period, but at first there was more of the same.
Another magazine, from the early 19th century, was the Ladies Monthly Museum. Vol. 6, 1801, has an index similar to that of the earlier magazine. Notes and Queries was a journal for artists, antiquaries, genealogists and literary people. The index of Vol. 1, 1849-50, included titles and headings exactly as they appeared in the text, with curious entries as a result.

John Fiske, sometime librarian at Harvard, wrote Darwinism and Other Essays (1893) and might have indexed it himself.

These people might have benefitted from the services of Mary Petherbridge (1870-1940). She set up a Secretarial Bureau in 1894, offering secretarial and library services, which included training in those fields as well as training in indexing. In 1904 she published a manual called “The technique of indexing”. She worked as a freelance indexer for various publishers and even government departments, especially the India Office. She wrote an article, ‘Indexing as a profession for women’, for the magazine Good Housekeeping in 1923. In it she explained the steps in indexing and the various types of indexes, for books and periodicals as well as for documents:
“There are four stages in indexing:
The writing of the slips;
The alphabetical listing of them;
The critical editing and its attendant research;
The proof reading, this must always be done by the indexer personally.”
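
The mechanical part of these stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a slip is modelled as a (heading, page) pair, the function name compile_index is hypothetical, and the critical editing and proof reading that Petherbridge insists on remain human work that the code does not attempt.

```python
from collections import defaultdict


def compile_index(slips):
    """Turn (heading, page) slips into printed index lines."""
    # Stage 2: put the slips in alphabetical order of heading.
    ordered = sorted(slips, key=lambda s: (s[0].lower(), s[1]))

    # Stage 3, mechanical part only: merge slips that share a heading
    # and drop duplicate page references.
    merged = defaultdict(set)
    for heading, page in ordered:
        merged[heading].add(page)

    return [
        f"{heading}, {', '.join(str(p) for p in sorted(pages))}"
        for heading, pages in merged.items()
    ]


# Stage 1: the slips as written while reading the text (sample data).
slips = [
    ("scrolls", 3),
    ("alphabet", 1),
    ("scrolls", 1),
    ("concordances", 7),
    ("alphabet", 1),
]
for line in compile_index(slips):
    print(line)
```

The duplicate slip for "alphabet" is merged away and the two "scrolls" slips end up as one entry with two page numbers.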

However, indexes in the modern sense, giving exact locations of names and subjects in a book, were not compiled in antiquity, and only very few seem to have been made before the age of printing. There are several reasons for this. First, as long as books were written in the form of scrolls, there were neither page nor leaf numbers nor line counts (as we have them now for classical texts). Also, even had there been such numerical indicators, it would have been impractical to append an index giving exact references, because in order for a reader to consult the index, the scroll would have to be unrolled to the very end and then rolled back to the relevant page. (Whoever has had to read a book available only on microfilm, the modern successor of the papyrus scroll, will have experienced how difficult and inconvenient it is to go from the index to the text.) Second, even though popular works were written in many copies (sometimes up to several hundreds), no two of them would be exactly the same, so that an index could at best have been made to chapters or paragraphs, but not to exact pages.


Rouse and Rouse report that subject indexing was invented in Paris in the thirteenth century. (This is ironic in light of the absence of indexes in many modern French works.) It is an interesting coincidence that the citation index described by Gaster was produced in the same century as the first subject indexes, and in the same country — in the city of Avignon.
Wellisch (1994) suggests that subject indexing began in the 4th century with the Apothegmata, a compilation of sayings of the Greek Church Fathers. Witty describes this as an alphabetically arranged tool rather than a subject index to a narrative text (1973, p. 196). Richardson (1939, p. 844) states that the earliest Biblical dictionary was the Onomasticon by Eusebius (264-340 C.E.), but it was not in alphabetical order.

Bacher (1912) notes that a Greek dictionary of Biblical proper names is ascribed to Philo Judaeus, who lived in Alexandria from 20 B.C.E. to 40 C.E.


The function of an index is to provide users with an effective and systematic means for locating documentary units (complete documents or parts of documents) that are relevant to information needs or requests. An index should therefore:

a. identify documentary units that treat particular topics or possess particular features.

b. indicate all important topics or features of documentary units in accordance with the level of exhaustivity appropriate for the index.

c. discriminate between major and minor treatments of particular topics or manifestations of particular features.

d. provide access to topics or features using the terminology of prospective users.

e. provide access to topics or features using the terminology of verbal texts being indexed whenever possible.

f. use terminology that is as specific as documentary units warrant and the indexing language permits.

g. provide access through synonymous and equivalent terms.

h. guide users to terms representing related concepts (narrower terms, other related terms, and if possible, broader terms).

i. provide for the combination of terms to facilitate the identification of particular types or aspects of topics or features and to eliminate unwanted types or aspects.

j. provide a means for searching for particular topics or features by means of a systematic arrangement of entries in displayed indexes or, for non-displayed indexes, by means of a clearly documented and displayed method for entering, combining, and modifying terms to create search statements and for reviewing retrieved items.
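
Points g and h in the list above (access through synonymous terms, and guidance to related terms) are what "see" and "see also" references do in a displayed index. The following Python sketch shows one possible way to model them; the function name look_up, the dictionaries and the sample headings are all hypothetical.

```python
def look_up(term, index, see, see_also):
    """Resolve a user's term to a preferred heading and its locators."""
    # "see" references map a synonym to the heading actually used.
    preferred = see.get(term, term)
    return {
        "heading": preferred,
        "locators": index.get(preferred, []),
        # "see also" references point to related headings.
        "see_also": see_also.get(preferred, []),
    }


index = {"concordances": [12, 44], "indexes": [1, 7]}
see = {"word indexes": "concordances"}
see_also = {"concordances": ["indexes"]}

print(look_up("word indexes", index, see, see_also))
```

A user who searches under "word indexes" is led to the heading "concordances", with a pointer to the related heading "indexes".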


Indexes may be categorized by type of object to which headings refer; by type of term used for index headings; by type or extent of indexable matter used to produce the index; by method of arranging entries; by method of term coordination; by type, format, genre, or medium of documents being indexed; by medium of the index; by mode of publication; by periodicity, that is, whether the index is a one-time (closed-end) index or a continuing (open-end) index; and by type of authorship. The following examples illustrate common types of indexes. They are by no means exhaustive.

Indexes by type of object referred to

A. authors: all types of document creators such as writers, composers, illustrators, translators, editors, choreographers, artists, sculptors, painters, inventors.

B. subjects (topics or features): topics treated in documents and/ or features of documentary units (for example, genre, format, methodological approach). Separate indexes are often devoted to special types of topics such as persons, places, or corporate bodies; features, such as genres (for example, poetry, drama); or notations, such as International Standard Book Numbers (ISBN).

Indexes by type of term used for headings

A. names: proper nouns, such as names of persons, places, corporate bodies.

B. numbers or notations: numerical or coded designations, such as classification notation, patent number, ISBN, date.

C. words and phrases: common words and phrases (as opposed to names or proper nouns).

Indexes by type or extent of indexable matter on which an index is based

A. full text of documents.

B. abstracts.

C. titles only.

D. first lines only (for example, first lines of poems).

E. citations (reference citations to other documents)

Indexes by arrangement of entries

A. alphabetical or alphanumeric.

B. classified: Headings arranged on the basis of relations among concepts represented by headings, for example, hierarchy, inclusion, chronology, or other association. Classified indexes are often based on existing classification schemes, such as the Dewey Decimal Classification.

C. alphabetico-classed: Broad headings arranged alphabetically. Narrower headings are grouped under broad headings and arranged alphanumerically or relationally on the basis of hierarchy, inclusion, chronology, or other association.

NOTE: Electronic indexes often have no arrangement that is apparent to the user. However, indexes designed for human scanning, browsing, and examination must have some arrangement, regardless of medium.

Indexes by method of document analysis

A. human intellectual analysis and identification of topics and concepts expressed and/ or features manifested.

B. computer algorithms designed to identify useful terms, phrases, or features.

C. combination of computer-based and human analysis.

Indexes by method of term selection

A. assignment of terms to represent topics and features (whether or not the term is in the documentary unit being indexed).

B. extraction of terms from the documentary unit.

C. a combination of assignment and extraction methods

Indexes by method of term coordination

A. pre-coordinate combination, such as subject heading indexes, string indexes, chain indexes, keyword indexes (including KWIC, KWOC, KWAC indexes), rotated, and permuted indexes.

B. post-coordinate combination. Includes the use of Boolean operators, proximity measures, and the combination of weighted terms.
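
As an illustration of post-coordinate combination, the sketch below stores, for each index term, the set of documents to which it was assigned, and combines terms only at search time with Boolean operators. The function name build_postings and the sample documents are hypothetical; proximity measures and term weighting are left out.

```python
def build_postings(documents):
    """Map each index term to the set of documents it was assigned to."""
    postings = {}
    for doc_id, terms in documents.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    return postings


documents = {
    1: {"indexing", "history"},
    2: {"indexing", "printing"},
    3: {"printing", "history"},
}
postings = build_postings(documents)

# Terms are coordinated by the searcher, not by the indexer.
print(postings["indexing"] & postings["printing"])  # indexing AND printing
print(postings["indexing"] | postings["history"])   # indexing OR history
print(postings["indexing"] - postings["history"])   # indexing NOT history
```

In a pre-coordinate index, by contrast, a combined heading such as "printing, history of" would have been fixed by the indexer in advance.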

Indexes by type, periodicity, format, genre, or medium of document(s) being indexed

Examples are: books, monographs, periodicals, serials, poetry, fiction, short stories, films, videos, illustrations, pictures, paintings, artifacts, software, computer-readable texts, maps, and sound recordings.

Indexes by medium of index

A. printed or written.

B. microform.

C. electronic media, including online, CD-ROM.

D. Braille

Indexes by proximity of documentary units

A. indexes published together with the documentary units to which they refer, including both back-of-the-book indexes and full-text databases.

B. indexes published separately from the documentary units to which they refer.

Indexes by periodicity of the index

A. one-time, closed-end indexes.

B. continuing, open-end indexes.

Indexes by authorship

A. authored: a separately authored document, distinct from the document(s) being indexed. It is created independently by one or more persons through intellectual analysis of text, as distinguished from indexes that are created solely through algorithmic analysis of text carried out electronically.

B. automatically generated

Genealogical Indexing

Genealogical indexes allow users to look up people’s names and find information about personal and family relationships. They often eliminate the need to access original source materials, for example cemetery inscriptions.

Book indexing

Book indexes provide access to detailed contents of books. Back of the book indexes are made for all types of non-fiction books, including textbooks, multi-volume works, technical reports and annual reports. Book indexes are lists of words, generally alphabetical, at the back of a book, giving a page location of the subject or name associated with each word.

Legal Indexing

Legal indexing involves indexing of legal materials by form and content. Legal indexers are familiar with legal concepts and classification and are able to translate the classification into accessible indexes.

Periodical and Newspaper Indexes

Periodical and newspaper indexes give access to the contents of individual articles and other items in serialized publications. Many periodical and newspaper indexes are based on a controlled vocabulary to ensure consistent use of terms from year to year.

Pictorial Indexes

Indexes to images help users identify relevant pictures in collections of photographs, art works, videos and films.

Word Indexes

Word and name indexes, which are sometimes called concordances, are indexes to the individual names and words that the author used, and in one sense most closely represent the information and ideas the author had in mind when creating the manuscript. These indexes give the exact term or word within the context of the document, pinpointing the subject discussed and its location.
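
A concordance of this kind is easy to sketch in Python. The version below is a minimal illustration, assuming that the text is given as a list of lines, that the locator is the line number, and that two words of context on either side are enough; the function name concordance and the width parameter are hypothetical.

```python
import re
from collections import defaultdict


def concordance(lines, width=2):
    """List every word with its line number and a few words of context."""
    entries = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        words = re.findall(r"[A-Za-z']+", line)
        for i, word in enumerate(words):
            # Keep up to `width` words on each side of the indexed word.
            context = " ".join(words[max(0, i - width): i + width + 1])
            entries[word.lower()].append((line_no, context))
    # Return the headings in alphabetical order.
    return dict(sorted(entries.items()))


text = ["In the beginning was the Word", "and the Word was with God"]
for heading, occurrences in concordance(text).items():
    print(heading, occurrences)
```

Unlike a subject index, nothing here is selected or interpreted: every word the author used becomes a heading.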

String Indexes

String indexing is a modern term. The idea is to display a series of rotated index entries generated from a basic list of index terms that make up the string. The objective is to give the user an entry point for all index terms and to display them in context with each other. A string index is usually, but not necessarily, output by a computer. Except for basic keyword in context (KWIC) indexing, string indexing is generally not a fully automated process.

Permuted Title Indexes

Permuted title indexes are created by systematically rotating information-conveying words in the title as subject entry points into the index. The premise of a permuted title index is that titles effectively indicate the content of documents. Because it reflects the content of a document, a permuted title index helps users decide if that document would satisfy their information needs.
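
The rotation can be sketched as follows. This is a simplified, hypothetical illustration rather than any particular system's method: the stop-word list is a small assumed sample, the function name permuted_title_index is invented, and a slash marks the point where the original title began.

```python
# Words assumed to carry no subject information (sample list only).
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "of", "on", "the", "to"}


def permuted_title_index(titles):
    """Make one entry per information-conveying word in each title."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            # Rotate the title so the keyword leads the entry.
            rotated = " ".join(words[i:] + ["/"] + words[:i]) if i else title
            entries.append((word.lower(), rotated, title))
    # File the entries alphabetically by keyword.
    return sorted(entries)


titles = ["The Technique of Indexing", "Indexing as a Profession"]
for keyword, rotated, title in permuted_title_index(titles):
    print(rotated)
```

Each title appears once for every significant word it contains, so a user can enter the index under any of them.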

Multimedia Indexes

These indexes integrate images, sounds, and textual material.

Internet Indexes

Internet indexes exist in traditional forms, in hypermedia forms, in automatic forms, and in implicit forms.

Hypermedia Indexes

Hypermedia is text and non-textual material in an electronic form that allows users to search non-linearly by associations. This type of index allows users to thread their way to what they want through electronic nodes and links between those nodes.

First-Line Indexes

First-line indexes generally refer to poems. In these indexes all the words in the first line of the poem are listed in their alphabetical position in the index.

Faceted Indexes

Facet, by definition, means one side of something that has many sides. In a faceted indexing system, any subject is not a single unit but has many aspects; thus a faceted index attempts to discover all the individual aspects of a subject and then synthesize them in a way that best describes the subject under discussion.

Cumulative Indexes

A cumulative index is a combination or merging of a set of indexes over time. Such indexes for established works can often cover many decades. Generally, such indexes apply to journals and to large, important works and are published as separate volumes. Cumulative indexes are complex and usually are done by teams of indexers.
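
The merging step can be sketched as below, assuming each volume's index is a mapping from heading to page numbers and that locators in the cumulative index are (volume, page) pairs. The function name cumulate and the sample data are hypothetical, and the editorial work of reconciling headings that changed over the years is not modelled.

```python
def cumulate(volume_indexes):
    """Merge per-volume indexes into one index of (volume, page) locators."""
    cumulative = {}
    for volume in sorted(volume_indexes):
        for heading, pages in volume_indexes[volume].items():
            locators = cumulative.setdefault(heading, [])
            locators.extend((volume, page) for page in sorted(pages))
    # Arrange the merged headings alphabetically.
    return dict(sorted(cumulative.items(), key=lambda item: item[0].lower()))


volumes = {
    1: {"alphabet": [5], "scrolls": [12, 2]},
    2: {"alphabet": [9], "printing": [1]},
}
for heading, locators in cumulate(volumes).items():
    print(heading, locators)
```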

Coordinate Indexes

Coordinate indexes allow terms to be combined or coordinated.

Classified Indexes

A classified index has its contents arranged systematically by classes or subject headings.

Citation Indexes

A citation index consists of a list of articles, with a sub-list under each article of subsequently published papers that cite it. In other words, given a particular paper, a citation index shows who cited that paper at a later point in time. Basically, this kind of index implies that a cited paper has an internal subject relationship with the papers that cite it, and this relationship is used to cluster related documents.
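
In data terms a citation index is the inversion of the papers' reference lists. A minimal Python sketch, using a hypothetical function name and made-up paper labels:

```python
def citation_index(references):
    """Invert {citing paper: [cited papers]} into {cited paper: [citers]}."""
    cited_by = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)
    # List the cited papers, and the citers under each, in sorted order.
    return {paper: sorted(citers) for paper, citers in sorted(cited_by.items())}


references = {
    "Paper C (1994)": ["Paper A (1973)", "Paper B (1986)"],
    "Paper B (1986)": ["Paper A (1973)"],
}
for cited, citers in citation_index(references).items():
    print(cited, "is cited by", citers)
```

Papers that appear under the same cited article are presumed to share a subject, which is the clustering relationship described above.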
