© F. Dell'Orso, Bibliography Formatting Software: An Evaluation Template
Last update: May, 6, 2004


      1. Introduction
 1.1. Key issues for analysis and choice

"What kind of product would suit my needs?"
Rather than questioning the object, let's ask the subject: "What are your needs?"

 1 Canned vs flexible 
What kind of user are you?
"Well, I want a stable and efficient product..."
Sure, and we agree that being stable is even more important than being "brilliant", "over-rich" in features.
But if you had to choose between "fast and canned" and "sophisticated and elaborate", what would you prefer?
This choice involves critical consequences.
The second type of user will certainly accept a steeper learning curve.
He will be willing to read the full documentation.
He will enjoy flexibility much more than ready-made solutions.
He will love modifying the product (add and change record types, fields, term lists..., not to mention citation styles).
The first type of user will appreciate the opposite.
Needless to say we do not have just two types of users: try to locate yourself in the range.
The more you modify or customize the delivered product, as far as document types and their fields are concerned, the more you tend to isolate yourself from the context. These software products are made and shipped with default structures: record type definitions, fields, output styles and import filters which all tend to work together harmoniously.
For example, if I add, modify or suppress a field in a workform (e.g.: in "book chapter" I do not expect to find "Place of publication" but I want "ISBN"), I cannot expect the system to know all that and adjust the filter while importing from an external bibliographic database; rather I will have to intervene and adapt the filter. Ditto for output styles: e.g. Chicago or Vancouver will first react according to the default database structure as designed by the software developer and should be adapted to comply with my modifications.
So normally a product that is fairly stiff, or less customizable than others, is easier to use.
If you do not need to modify the shipped database structure you have a much quieter life, as styles and filters will work relying on the document types and fields configuration designed by the developer.
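The dependency described above can be made concrete with a small sketch. The field names, the filter mapping and the two helper functions below are all hypothetical and not taken from any actual package; the point is only that a filter written for the shipped workform keeps referring to fields that a customized workform may no longer have.

```python
# Hypothetical shipped definition of the "book chapter" reference type.
DEFAULT_BOOK_CHAPTER = {"Author", "Title", "Book title", "Publisher",
                        "Place of publication", "Year", "Pages"}

# The same reference type after a user customization:
# "Place of publication" dropped, "ISBN" added.
CUSTOM_BOOK_CHAPTER = (DEFAULT_BOOK_CHAPTER - {"Place of publication"}) | {"ISBN"}

# Hypothetical import filter: tag in the external database -> target field.
SHIPPED_FILTER = {"AU": "Author", "TI": "Title", "BT": "Book title",
                  "PB": "Publisher", "CY": "Place of publication",
                  "PY": "Year", "PG": "Pages"}


def missing_targets(filter_map, workform_fields):
    """Return the filter tags whose target field no longer exists."""
    return sorted(tag for tag, field in filter_map.items()
                  if field not in workform_fields)


def unfed_fields(filter_map, workform_fields):
    """Return the workform fields that no filter tag will ever fill."""
    return sorted(workform_fields - set(filter_map.values()))


print(missing_targets(SHIPPED_FILTER, DEFAULT_BOOK_CHAPTER))
print(missing_targets(SHIPPED_FILTER, CUSTOM_BOOK_CHAPTER))
print(unfed_fields(SHIPPED_FILTER, CUSTOM_BOOK_CHAPTER))
```

With the shipped workform nothing is reported; with the customized one, the "CY" tag has nowhere to go and "ISBN" is never filled, which is exactly the manual adaptation the user has to take on.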
But the 'active' user is also likely to exhibit the skill and the stamina required to reach the shore (other users, standard output formats ...) from which he is temporarily distant.
The trend of the market and of the leading products is to offer simpler, more limited, ready-made products. Publishers believe that the large majority of users prefer a large quantity of ready-made solutions to a tool for crafting their own (this is especially the case with import filters and output styles).

 2 Money
Using a personal bibliography management product is likely to be a commitment of at least a few years, if not of your whole professional lifetime, so the money spent on purchasing such a tool should be regarded as an investment over that period. But money can always be a problem, and it could steer your choice towards free or very cheap products.

 3 Time
"I would love ... but I have no time to read the documentation, to train myself or to attend a training class ...".
Got it: you seem to belong to the «There is never enough time to do it well, but there's always time to do it again» category.
Simple, all-intuitive software often accomplishes just trivial tasks. And if you have no time at all, just leave it and read no further here: it would be another waste of time.

 4 Platform
Mac®, Windows®, Linux®, Unix®, cross-platform ... The large majority are MS-Windows® packages. What kind of OS do you use?

 5 Network
Do you need to work in a network with real network functionality, that is, at least simultaneous write access and record locking (plus, perhaps, different authorization levels)? If so, this is likely to be a key factor, as not many products offer this full functionality.

 6 Language
Ready to read everything in English? Though you may find French, German and Czech products, they are rare examples in an English-speaking universe (web pages, menu options, help, manuals, e-mail support: all in English).

 7 Database structure
Fields should all have variable length, or in any case be 'very' hospitable: we are mainly dealing with text strings whose length cannot be predicted once and for all. Allowing no more than 40 characters for the Publisher field is nonsense: we simply do not know in advance, we need ... free space. At least some of the fields should be able to host multiple values with no practical limitation: multiple authors, multiple keywords (subject headings). Those multiple entities are not simply words: "water pollution" is one keyword. Multiple values have to be recognized and handled as such when building lists, indexes and sort sequences.
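A minimal sketch of what "handled as such" can mean in practice. The records, names and the `build_index` helper are invented for illustration: each multi-value field is a list of whole values, so the index is built per value and never per word.

```python
from collections import defaultdict

# Hypothetical records: multi-value fields are lists, so neither the length
# of a value nor the number of occurrences is fixed in advance.
records = [
    {"id": 1, "authors": ["Rossi, M.", "Bianchi, L."],
     "keywords": ["water pollution", "boron"]},
    {"id": 2, "authors": ["Verdi, G."],
     "keywords": ["water pollution"]},
]


def build_index(records, field):
    """Map each whole value of a multi-value field to the ids of its records."""
    index = defaultdict(list)
    for record in records:
        for value in record.get(field, []):
            index[value].append(record["id"])
    return dict(index)


keyword_index = build_index(records, "keywords")
for term in sorted(keyword_index):
    print(term, keyword_index[term])
```

Here "water pollution" appears in the term list as a single entry pointing to both records; "water" alone is not an entry at all.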
Are you looking for flexibility in order to add reference types and fields, and term lists?
Would you like to be able to modify field attributes?
Read 1 above. There exist products which work with a given, unmodifiable, number of reference types and fields (often including a few empty neutral-User-Custom fields), others let you add new ones: the utility of this difference is deeply appreciated when you manage your own project with specific requirements (e.g. dealing with a special collection, a very detailed bibliographic project with lots of fussy fields...).
As far as import filters and output styles are concerned there is no question: any product will allow you to modify the existing ones or create others.
Do you need to establish vertical and/or horizontal links between records, entries, record and notes etc.?
Most of the packages we are dealing with do not offer this capability apart from the connection between an entry (a name or a keyword e.g.) and the multiple records which contain it. By far, most BFS are flat file managers with no relational database structure displayed to the user.
Subfields are another prominent way to structure data: normally they are lacking in BFS database structures. The only exceptions are name fields (author, editor, secondary author etc., where one single comma marks the split into subfields) and the date field (publication year, where the software can often extract the year portion from a full date).


 8 Freedom: export
"When you enter look for the exit": it may sound like a spy story motto. Prior to selecting your product, prior to entering massive amounts of data, check the export facility, the available formats, test the export of all reference types and fields (if possible with any style attribute), default and custom. Open the export file and take a careful look at it, check how repeatable fields (namely authors and keywords) are handled: this will turn out to be crucial whenever you have to give your data to somebody else using a different system, and whenever you decide to switch to another software package. The latter is something that is very likely to happen, sooner or later : products come and go, whereas data are, and have to be, more stable; in general: data is much more important than the package you use to manipulate it.
Export should be available in delimited, tabbed and tagged formats. It is important that you can also export a code for the reference (document) type and that the occurrences of a multi-value field (authors, keywords) are properly separated.
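As a rough sketch of the two points just made, the following writes one invented record in a tagged and in a tab-delimited form. The two-letter tags only loosely resemble common tagged formats and are an assumption, as are the function names and the ";" separator; the separator is checked because author names themselves contain commas and must never contain the character used to split occurrences.

```python
# Hypothetical record with a reference type code and two multi-value fields.
record = {
    "type": "JOUR",
    "authors": ["Rossi, M.", "Bianchi, L."],
    "title": "A sample title",
    "keywords": ["water pollution", "boron"],
}


def export_tagged(record):
    """One tag per line; a multi-value field repeats its tag for each occurrence."""
    lines = ["TY  - " + record["type"]]          # reference type code first
    for author in record.get("authors", []):
        lines.append("AU  - " + author)
    lines.append("TI  - " + record["title"])
    for keyword in record.get("keywords", []):
        lines.append("KW  - " + keyword)
    return "\n".join(lines)


def export_delimited(record, field_sep="\t", value_sep=";"):
    """One record per line; occurrences inside a field are joined by value_sep."""
    def join(values):
        for value in values:
            if value_sep in value or field_sep in value:
                raise ValueError(f"separator found inside value: {value!r}")
        return value_sep.join(values)

    return field_sep.join([record["type"],
                           join(record.get("authors", [])),
                           record["title"],
                           join(record.get("keywords", []))])


print(export_tagged(record))
print(export_delimited(record))
```

Opening the resulting file and checking that two authors come back as two authors is the kind of test worth running before any massive data entry.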
Conversely: import is very important.

 9 Input/Edit
Capturing existing data from term lists. Global editing functions. Copy record. Duplicate detection via user-defined criteria. These are the basic required functions.
Characters-symbols table, case conversion, default values, date stamping, autocompletion, validation, spell-checking are much less essential.
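Duplicate detection via user-defined criteria can be pictured as follows. This is only a sketch with invented records and helper names: the user chooses which fields make up the comparison key, and the values are normalized (here crudely, ASCII letters and digits only) before being compared.

```python
import re

# Hypothetical records; 1 and 2 differ only in case and punctuation.
records = [
    {"id": 1, "title": "A Sample Title", "year": "1989"},
    {"id": 2, "title": "A sample title.", "year": "1989"},
    {"id": 3, "title": "Another title", "year": "1989"},
]


def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())


def find_duplicates(records, fields):
    """Group record ids whose normalized values match on all chosen fields."""
    groups = {}
    for record in records:
        key = tuple(normalize(str(record.get(f, ""))) for f in fields)
        groups.setdefault(key, []).append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


print(find_duplicates(records, ["title", "year"]))
print(find_duplicates(records, ["year"]))
```

The choice of fields is the user-defined part: title plus year flags only the first two records, while year alone would flag all three.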

 10 Import
Very important function.
By far the most difficult and delicate procedure within this kind of software. Here flexibility is an asset: if you are not able to shape the shipped filters, you must rely on them completely. First of all, flexibility means IF ... THEN: being able to set conditions. Conditions can also be set when IF ... THEN is not explicitly stated, because selecting predesigned subordinate options often implies conditional commands. Predesigned subordinate options are almost always present in BFS. Something like "if there are more than 3 authors, display only the first and add [et al.]" is more than a simple IF ... THEN: it has already been handled by the programmers, the code is hidden, and you only choose one or more options. Parsing is another example of imposing conditions. Parsing means fragmenting one field in order to send its chopped contents to different fields. It is important, and is often used on the 'source' field to extract journal title, volume, issue, date and pages.
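To give an idea of parsing with conditions, here is a minimal sketch. The layout of the 'source' field ("Journal Year; Volume(Issue): Pages") is an assumption made for the example, as are the pattern and the function name; real databases vary, which is precisely why filters need to be adaptable.

```python
import re

# Assumed layout of the source field: "Journal Year; Volume(Issue): Pages".
SOURCE_PATTERN = re.compile(
    r"^(?P<journal>.+?)\s+(?P<year>\d{4});\s*(?P<volume>\d+)"
    r"(?:\((?P<issue>[^)]+)\))?:\s*(?P<pages>[\d\-]+)\.?$"
)


def parse_source(source):
    """Split one 'source' string into separate fields."""
    match = SOURCE_PATTERN.match(source.strip())
    if match is None:
        # IF the layout is not recognized THEN keep the text whole for review.
        return {"unparsed_source": source.strip()}
    fields = match.groupdict()
    if fields["issue"] is None:
        # IF there is no issue number THEN do not create an empty field.
        del fields["issue"]
    return fields


print(parse_source("Journal of Examples 1992; 35(9): 100-110"))
print(parse_source("Journal of Examples 1992; 35: 100-110"))
print(parse_source("In press"))
```

The fallback branch matters as much as the pattern: a filter that silently drops what it cannot parse is worse than one that leaves the original string visible.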
It is crucial to be able to handle several formats (tabbed, delimited, tagged in various ways). Relevant issues: varying structure and position of field tags, the occurrence separator for multi-value fields, wrapped lines. None of the reviewed packages is able to import records in ISO 2709 format, including MARC bibliographic records, but nowadays MARC records are most often displayed, captured and converted in the tagged (labelled) format. One way to get them is via the Z39.50 search and retrieve protocol.
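The issues listed above (tag position, repeated occurrences, wrapped lines) can be illustrated with a small reader for one tagged record. The tag layout ("XX  - value"), the sample text and the function name are assumptions; the sketch also has a known weakness, noted in a comment, that a real filter would have to deal with.

```python
import re

# Assumed tag layout: two capital characters, optional spaces, a hyphen.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s*-\s?(.*)$")

SAMPLE = """
TY  - JOUR
AU  - Rossi, M.
AU  - Bianchi, L.
TI  - A long title that the source database wraps
      onto a second line
KW  - water pollution
"""


def parse_tagged(text, multi_value_tags=("AU", "KW")):
    """Read one tagged record, joining wrapped lines and collecting repeats."""
    record = {}
    last_tag = None
    for line in text.splitlines():
        if not line.strip():
            continue
        match = TAG_LINE.match(line)
        if match:
            last_tag, value = match.group(1), match.group(2).strip()
            if last_tag in multi_value_tags:
                record.setdefault(last_tag, []).append(value)
            else:
                record[last_tag] = value
        elif last_tag is not None:
            # Wrapped line: append it to the value read last.
            # Weakness: a wrapped line that happens to look like a tag
            # would be misread as a new field.
            extra = line.strip()
            if last_tag in multi_value_tags:
                record[last_tag][-1] += " " + extra
            else:
                record[last_tag] += " " + extra
    return record


print(parse_tagged(SAMPLE))
```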
The trend is to offer the users hundreds of ready-made filters. I would never trust any of them without double-checking what they do with the data I am interested in.

 11 Searching
Searching and retrieving is the main way of selecting part of the database. As one seldom manipulates at the same time the whole database, search is the preferred approach to data.
Presently library OPACs and BFS seem to consider a window-structured and driven search interface as the most appropriate tool. No symbols, no explicit logic is required from the user. This seems to be simple and efficient, but it is also deceptive. It makes searching easier and finding faltering.
We still think and speak with clauses and pauses. (A OR B) AND (C OR D) is not such a complicated query: I look for (children OR adolescents) AND (death OR suicide). With the mainstream dumb window-structured search interface, such a basic query statement becomes impossible to formulate, because parentheses are not provided for. The algorithm which governs the syntax and the priority among boolean operators is hidden, and the expression is commonly transformed into (children OR (adolescents AND death) OR suicide) or into (((children OR adolescents) AND death) OR suicide). In the first case priority is given to the type of operator; in the latter, priority is top down. Neither of them gives the appropriate response to the above-mentioned query.
The alternative is not necessarily full SQL (structured query language), it is enough that one can make use of parentheses.
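The difference between the three readings is easy to verify with sets. The tiny index below is invented for the purpose (record ids are arbitrary); OR is set union and AND is set intersection, with every grouping written out explicitly.

```python
# Hypothetical index: term -> set of ids of the records containing it.
index = {
    "children": {1, 2},
    "adolescents": {3, 4},
    "death": {2, 3},
    "suicide": {4, 5},
}
c, a, d, s = (index[t] for t in ("children", "adolescents", "death", "suicide"))

# What the user means: (children OR adolescents) AND (death OR suicide)
intended = (c | a) & (d | s)

# Priority by operator type: children OR (adolescents AND death) OR suicide
and_first = c | (a & d) | s

# Top-down priority: ((children OR adolescents) AND death) OR suicide
top_down = ((c | a) & d) | s

print("intended :", sorted(intended))
print("AND first:", sorted(and_first))
print("top down :", sorted(top_down))
```

On this data the three expressions retrieve three different sets of records, and both hidden rewritings bring back records the intended query excludes (record 5, about suicide but neither children nor adolescents, for instance).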
Any field (full text indexing), truncation and phrase queries are essential.
Searching in the result ('refine') and saving query expressions are essential.
The use of browsing term lists directly pointing to the records is useful.
Other aspects like soundex, relevance ranking, adjacency operators are a bit finical in this context.
Z39.50 searching of remote databases is important for retrieving and importing data.

 12 Output and formatting language
Again, the trend is to offer the user hundreds of ready-made styles rather than a powerful and rich formatting language, and once again a powerful language is one that can handle conditions (IF something is absent THEN do that). A basic language is always incorporated into these packages, and citation styles can be modified or created from scratch, but such languages tend to guarantee only minimal performance. It is fashionable to state that the package can produce HTML and XML output: at present the quality and the complexity of such formats can be truly deceiving.
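What "handling conditions" means for an output style can be sketched in a few lines. The style below is invented (it only loosely resembles a humanities-type journal citation), and the record, field names and functions are hypothetical; what matters are the two conditions: an author limit, and an issue number that is printed only when present.

```python
def format_authors(authors, max_authors=3):
    """IF there are more than max_authors THEN first author plus [et al.]."""
    if not authors:
        return ""
    if len(authors) > max_authors:
        return authors[0] + " [et al.]"
    return "; ".join(authors)


def format_reference(record):
    """Assemble a reference, skipping every element that is absent."""
    pieces = []
    authors = format_authors(record.get("authors", []))
    if authors:
        # Avoid a doubled full stop after initials such as "M."
        pieces.append(authors if authors.endswith(".") else authors + ".")
    pieces.append('"' + record["title"] + '."')
    source = record["journal"] + " " + record["volume"]
    if record.get("issue"):
        # IF an issue number is present THEN print it, ELSE print nothing.
        source += ", no. " + record["issue"]
    source += " (" + record["year"] + "): " + record["pages"] + "."
    pieces.append(source)
    return " ".join(pieces)


# Hypothetical record with four authors and an issue number.
sample = {"authors": ["Rossi, M.", "Bianchi, L.", "Verdi, G.", "Neri, P."],
          "title": "A sample title", "journal": "Journal of Examples",
          "volume": "35", "issue": "9", "year": "1992", "pages": "100-110"}
print(format_reference(sample))
```

Without such conditions a style leaves stray punctuation behind whenever a field is empty, which is usually the first thing to check in a ready-made style.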
"Subject bibliography" traditionally refers to a list which is not only sorted by one or more nested criteria, but a list where the sorting key is also a heading out of context. It is routine in large bibliographies and catalogs where the key used to sort the items is clearly displayed on top:

Reference List:

Adam, D. M. (2)

Alam, F., A. H. Soloway, R. F. Barth, N. Mafune, D. M. Adam and W. H. Knoth. "Boron Neutron Capture Therapy: Linkage of a Boronated Macromolecule to Monoclonal Antibodies Directed Against Tumor Associated Antigens." J. Med. Chem, 32 (1989): 2326-30.

Tjarks, W., A. K. M. Anisuzzaman, L. Liu, S. H. Soloway, R. F. Barth, D. J. Perkins and D. M. Adam. "Synthesis and in Vitro Evaluation of Boronated Uridine and Glucose Derivatives for Boron Neutron Capture Therapy." J. Med. Chem. 35, no. 9 (1992): 16228-786

Here the heading (the "subject" of the bibliographic list) is the author name: Adam, D. M., followed by a count of the references in which it occurs.
Two nested levels of sorting headings would be a plus, but this is normally lacking. If you can replace the full bibliographic references displayed under the headings by a reference to, for example, the record number, you get an index: useful.
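A subject bibliography of this kind, and the index derived from it, can be sketched as follows. The records, the pre-formatted "citation" strings and the function name are invented; each record is listed once under every value of the heading field, and the heading carries a counter.

```python
from collections import defaultdict

# Hypothetical records with an already formatted citation.
records = [
    {"id": 1, "authors": ["Rossi, M.", "Bianchi, L."],
     "citation": "Rossi, M. and L. Bianchi. First sample title. 1989."},
    {"id": 2, "authors": ["Verdi, G.", "Bianchi, L."],
     "citation": "Verdi, G. and L. Bianchi. Second sample title. 1992."},
]


def subject_bibliography(records, heading_field="authors", as_index=False):
    """List records under each heading; with as_index, list record numbers only."""
    groups = defaultdict(list)
    for record in records:
        for heading in record.get(heading_field, []):
            groups[heading].append(record)

    lines = []
    for heading in sorted(groups):
        entries = sorted(groups[heading], key=lambda r: r["citation"])
        if as_index:
            numbers = ", ".join(str(r["id"]) for r in entries)
            lines.append(f"{heading}: {numbers}")
        else:
            lines.append(f"{heading} ({len(entries)})")
            lines.extend("    " + r["citation"] for r in entries)
    return lines


print("\n".join(subject_bibliography(records)))
print("\n".join(subject_bibliography(records, as_index=True)))
```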
Sorting records is essential for handling data; it comes just after searching. Sorting records by more than one nested criterion (first by date; if the date is the same, then by author; under the same author, by main title, etc.) is very important (and it is another example of a 'hidden' "IF...THEN" clause).
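Nested sorting is compact to express when the sort key is a tuple, since tuples are compared element by element: the second criterion is consulted only if the first is equal, and so on. The records and the key function are invented for the example.

```python
# Hypothetical records.
records = [
    {"id": 1, "year": 1992, "authors": ["Rossi, M."], "title": "Zeta"},
    {"id": 2, "year": 1989, "authors": ["Verdi, G."], "title": "Alpha"},
    {"id": 3, "year": 1992, "authors": ["Bianchi, L."], "title": "Beta"},
    {"id": 4, "year": 1992, "authors": ["Rossi, M."], "title": "Alpha"},
]


def nested_key(record):
    """Date first, then first author, then main title."""
    first_author = record["authors"][0].lower() if record["authors"] else ""
    return (record["year"], first_author, record["title"].lower())


sorted_ids = [r["id"] for r in sorted(records, key=nested_key)]
print(sorted_ids)
```

The 1989 record comes first; among the three 1992 records Bianchi precedes Rossi, and the two Rossi records are ordered by title.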
Sorting implies, and can often hide, other important factors regarding the way characters are handled: records lacking the main heading, length of the sort key, case, digits, leading articles, the letter sequence of the selected language (Spanish sorts differently from Italian) ...
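Some of these hidden factors can be shown in a small sort-key function. The article lists are deliberately short and incomplete, the key length of 60 is arbitrary, and the function name is an assumption; language-specific letter sequences are not covered here and would need proper collation rules.

```python
# Deliberately incomplete lists of leading articles, by language code.
LEADING_ARTICLES = {
    "en": ("the ", "a ", "an "),
    "it": ("il ", "lo ", "la ", "i ", "gli ", "le ", "l'", "un ", "una ", "uno "),
}


def title_sort_key(title, language="en", max_length=60):
    """Build a sort key: ignore case, skip one leading article, cap the length."""
    key = title.strip().lower()
    for article in LEADING_ARTICLES.get(language, ()):
        if key.startswith(article):
            key = key[len(article):]
            break
    return key[:max_length]


titles = ["The Name of the Rose", "a Brief History", "Zoology"]
print(sorted(titles, key=title_sort_key))
print(title_sort_key("Il nome della rosa", language="it"))
```

A record with no title at all gets an empty key and therefore files first, which is one of the situations worth testing in any package.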

 13 Manuscript formatting
Manuscript formatting means placing markers in the document you are typing in a word processor and formatting the paper by exploiting those markers. Markers are something like (Alam 1992), which refers to the relevant record by Alam contained in your database. When you eventually format the paper, that marker can be transformed within the text (or footnote) into something like (Alam, F. et al., 1992), or into (Alam et al., Boron Neutron Capture Therapy), or into (1); the full reference can also be printed in the bibliography list at the end of the paper.
Two great advantages: 1. you store the reference just once as a record in the database with all the required data, and then you use that record as many times as you wish in different papers: a normal, though by no means trivial, way of using a database. 2. thanks to citation or output styles you can format the very same reference in n ways (traditionally: Chicago style, Vancouver, MLA, APA, Science, Harvard and hundreds of others) without changing a comma in your paper as long as the markers are there. It is simply a matter of one selection when you decide to switch, for example, from Turabian to Chicago A.
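The mechanism can be sketched as follows, under several assumptions: the marker syntax "(Name Year)", the one-record database, and the two style functions are all invented, and the bibliography entries are taken ready-made rather than restyled. The same marked-up text is formatted twice simply by passing a different style.

```python
import re

# Assumed marker syntax: "(Name Year)".
MARKER = re.compile(r"\((?P<name>[A-Z][A-Za-z]+) (?P<year>\d{4})\)")

# Hypothetical database keyed by marker contents.
DATABASE = {
    ("Rossi", "1992"): {
        "first_author": "Rossi, M.", "others": True, "year": "1992",
        "reference": 'Rossi, M. [et al.]. "A sample title." '
                     "Journal of Examples 35 (1992): 100-110.",
    },
}


def style_author_date(record, number):
    et_al = " et al." if record["others"] else ""
    return f"({record['first_author']}{et_al}, {record['year']})"


def style_numbered(record, number):
    return f"({number})"


def format_manuscript(text, database, style):
    """Replace each known marker and collect the cited references in order."""
    cited = []  # database keys, in order of first citation

    def replace(match):
        key = (match.group("name"), match.group("year"))
        if key not in database:
            return match.group(0)  # unknown marker: leave it untouched
        if key not in cited:
            cited.append(key)
        return style(database[key], cited.index(key) + 1)

    body = MARKER.sub(replace, text)
    bibliography = [database[key]["reference"] for key in cited]
    return body, bibliography


draft = "Earlier work (Rossi 1992) showed this; see (Rossi 1992) again and (Nobody 2000)."
print(format_manuscript(draft, DATABASE, style_author_date)[0])
print(format_manuscript(draft, DATABASE, style_numbered)[0])
```

The draft itself is never edited: switching style is one argument, and a marker with no matching record stays visible so that it can be fixed.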
We may even say that this was the very reason why this kind of software, which we still call bibliography formatting software, was developed in the early eighties. Developers (like Prof. Victor Rosenberg, author of ProCite) realized that a scholar might submit the same journal article to more than one journal's editorial board at the same time. It is almost the rule that different journals have their own citation policies and styles. Scholars would appreciate not having to change the output format of the references manually, either within the text or in the final reference list. Packages like these streamline the process: once the triangle works (1. references are properly stored in the database, 2. the output style is tuned and 3. the markers are correct) the writer does not need to worry about changing style or formatting citations and bibliography.
Manuscript formatting is one main function that clearly distinguishes and serves to identify this family of software packages from the others: pure personal information managers, or generic databases, not to mention word processors or spreadsheets, are seriously lacking it.
This is typically a procedure where details are countless, and incessantly increasing too; therefore we are not going to mention any of them here (for a detailed analysis see Manuscript formatting). It is notable that the market trend is to let the user stay within the document, and within the word processor, somehow 'calling' the references from the database to insert the markers and format that same document. This requires the developers to write additional portions of code in order to interact with a certain number of word processors. This is not always feasible; when it is not, markers are placed via a command given within the BFS, the document is saved in a format (e.g. RTF) that can be read by the database, and a formatted copy of it is eventually produced.
The whole procedure is so important that producers invest a lot of effort in improving its features at every release; at the same time, users choose this kind of software product precisely because this function is available. Therefore this is likely to be an important factor in selecting a package.

 14 Documentation
We consider Documentation still important. Very important. We hate learning things in this field of knowledge by trial and error, or just by the use of the mouse ... by serendipity. Manuals and online help vary to great degrees in this respect. Additionally, information can be scattered: FAQs, web pages of the publisher's site, tutorial, help, reference manual ... One solid authoritative source: what a dream. Documentation is often the last thing publishers worry about. It should always be updated, it should be revised by two kinds of skilled personnel: computer people and more user-oriented people. This is not always the case.

 15 Internet and Web publishing
This has increasingly become quite an important issue due to the evolution and diffusion of the web. Users often want to be able to publish their own database, and make it dynamically searchable, in a seamless way, without the need to convert data from their BFS into another web database management system. You will notice that few packages offer this function, though the picture may change in the future.

 Scope, purpose and use
"What are you going to use this software for"?
This could be the first question to ask yourself.
For example, let's take the two most prominent and somehow opposite goals: will you be using the software mainly to publish papers or to manage a database? In the first case the Manuscript formatting function is likely to be the, or one of the, most important features for you; in the latter case, searching and sorting will have more importance.

 



