Dino Buzzetti and Andrea Tabarroni (University of Bologna)
DATABASE EDITION OF NON-COLLATABLE TEXTUAL TRADITIONS
Fourteenth-century teaching books in Italian universities - The peculiarities of the textual tradition
Image processing has turned out to be of great help in a project concerning the critical edition of teaching books produced in the university of medicine and arts of Bologna in the late Middle Ages. The decision to produce an electronic critical edition of these texts was prompted by the recognition of the common nature of their textual tradition. The transmission of these texts and their physical production were heavily affected by common teaching techniques such as repetitio. A characteristic figure in the Bolognese scholastic tradition was in fact the repetitor, "a young master who acted as a teaching assistant for the master appointed to the ordinary course, with the special duty to 'repeat' to the students in the evening the lecture given by the master in the morning". Traces of the activity of these repetitores are preserved in the manuscripts, mainly in the form of anonymous marginal glosses or even long passages interpolated within the text, but reported by only a few copies or even a single one. As a result, "the works of the Bolognese masters of philosophy and medicine in the 14th century are often characterized by a complex textual tradition, providing evidence for a gradual process of composition through the different interventions of the master himself and of his repetitores. Hence, these texts also usually exhibit a sort of 'fluidity', affected as they are by a great number of alternative readings, scattered through the different manuscript copies, and by glosses and additions which can even be peculiar to each copy".
A typical case of this kind is the commentary on Porphyry's Isagoge written in the first decade of the 14th century by the Bolognese master Gentilis de Cingulo, the work we first started to transcribe. The text is witnessed by four manuscript copies exhibiting some major discrepancies, such as glosses and interpolations, which do not, however, characterize different redactions of the work.
For a text of this kind, "the traditional goal of assessing the text in the most reliable way", that is through a critical edition based on the canonical printed book model, "could be neither feasible nor desirable". For, "it is often not easy to decide whether a gloss or an addition stems from a later intervention by the author himself or by a repetitor (and, in the latter case, whether the repetitor is merely repeating his master's doctrines or is speaking on his own authority)". Moreover, "for the purposes of our research project, which is focussed on the early institutional framework of the study of arts and philosophy in Bologna, all the different versions of a text are of the same historical relevance".
The critical edition - Basic difficulties
How, then, could a good critical edition be produced? The "fluid" nature of the textual tradition of Gentile's commentary posed a major stumbling block. How could we construct a critical text? Was the very idea of making clear-cut choices based upon collation, in our case, a viable one, or even a sound one? The usual procedures of text construction no longer looked entirely appropriate. It was at this point that we came across the idea of a database representation of the entire textual tradition, and we saw it as a possible way out of our difficulties. It should also be stressed that in our case computers and information processing were offering a solution to a difficulty that could hardly have been met otherwise. Computers and information processing were taking us a step further in our own disciplinary domain, by enabling a methodological advance in textual criticism itself.
A provisional solution - Database administration of encoded diplomatic transcriptions
A provisional solution was found in the idea of producing encoded diplomatic transcriptions of all our manuscript witnesses. The TEI guidelines were not yet completed, but we thought we would soon be able to refer to their recommendations. Our encoded transcripts could then be stored in a database, and we would be able to retrieve and analyse all the necessary information.
Digital images and transcriptions - The kleiw database management system
That solution was superseded as soon as we realized that we could avail ourselves of digital images and appeal to a system capable of processing both digitized images and their transcriptions, such as the kleiw database management system developed by Manfred Thaller at the Max-Planck-Institut für Geschichte in Göttingen.
Logical vs physical representation - What's a (diplomatic) transcription of a text - What's a digital image of a text
The availability of digitized images made us rethink the function of a diplomatic transcription and, for that matter, of transcription altogether. Diplomatic transcriptions could no longer be conceived as a substitute for the original, but rather as a form of analysis of the information contained in it; they could more properly serve the purposes of further processing. Transcribed information, diplomatic or otherwise, could be further processed, and computers were enabling further analytical advances.
With a system such as kleiw, which provides image processing facilities, the images themselves could be dealt with as logical information, as additional data integrated into the database. Images were no longer supplied as simple illustrations, but could be processed to improve readability, or be associated with their corresponding transcriptions.
From this point of view, "both the image and the transcript" were no longer "regarded as physical reproductions referring back to the original document, but rather as analytical data pointing toward a new logical representation of the source". The idea of conceiving both the transcription and the image as a logical representation of a manuscript source led us to reconsider the very notion of an edition. Computers could provide new forms of representing a manuscript text. They were not providing only an aid to expeditiousness, or a means of coping with very large amounts of data; they were affording a way out of the snares of the printed edition, where the classical model of textual representation could not cope. We could appeal to computers for a new form of representation, precisely to overcome critical problems of textual representation.
Any edition whatsoever is a form of textual representation, and the model we had could not solve our difficulties. But digitized transcriptions and images were now thought of as a form of logical representation, and they could be put to a different purpose; they could now serve not only as a physical reproduction of a manuscript, but rather as a new conceptual representation of its content. They could be used not as a representation of a document, but as a representation of a text. Digitized images and transcriptions are documents themselves, a particular form of representation. Why should they be used to represent other documents, other forms of representation, and not the text? Could they not provide a new logical form of representing the text? So we decided to use digitized images and transcriptions not to replicate a manuscript or a printed edition in a new physical medium; we took them as new forms of textual representation, on a par with any other form, be it a manuscript or a printed book. As a form of textual representation, an edition is a document, not a text; in precisely the same way, digitized images or digitized transcriptions, as a form of representation of an information content, are data, not the information they convey. But data are processable, and processing is analysis. A database representation of a textual tradition contributes to its critical analysis in a way that overcomes the limitations of the printed model of textual representation.
Textual canonicity and textual representation - Limitations of the printed model
The limitations of the printed edition, the classical model of textual representation, are obvious in a number of cases. One is the case of handwritten "drafts" or outlines with alternative readings left over by the author in a fragmentary state. In this case, the very placing and spatial arrangement of different portions of the text are very important, and "the process of becoming a textual structure is there fixed in the spatial relations of chronologically different, but structurally equivalent textual units". How shall one reproduce in a printed edition the fragmentary nature of a text that shows itself "in the language of its spatial semantics"? A possible solution would be a diplomatic transcript and the use of diacritic marks. But "in the age of reproducibility, the attempts to represent manuscripts through description or special signs" can be seen as "an anachronism, if not a caricature of philology". Facsimiles have substituted for diacritics, but again "the distinction between physical and logical representation of a document, or between its mere reproduction and its analysis, must be carefully kept in mind". For "it is only through the apparatus", that is to say through a logical representation or a due analysis, "that the facsimile -- and finally the very manuscript itself -- becomes capable of asserting" its information content. "Editing a manuscript remains categorially different from simply imitating it".
Another example is provided by the textual tradition of 12th- and 13th-century romance literature. As has been observed, "most of us almost automatically equate texts with printed books", but the medieval idea of textual canonicity "includes both the notion of 'authorship' and a variable textuality reflecting 'scribal creativity' and refashioning". The medieval text is "fluid and dynamic", for "fidelity to an author's work generally involves what we would call changing what the author wrote". But as reproduced in a printed book, a text is fixed and immutable again, and "the form of representation is mistaken for the form of what is to be represented". To keep closer to the varied and diversified nature of medieval textuality, the study of the Old French manuscript tradition pertaining to Chrétien de Troyes' romance Chevalier de la Charrette (Lancelot) has been approached, in a project carried out at Princeton University, through the creation of a database including an encoded diplomatic transcription of all the extant manuscripts. However, "the methodological significance of a computer representation does not lie so much in its mimetic, as in its structural and logical features, which make sources available as data for further processing and analysis". The "organizing power" of a database representation is able "to augment the resources open to scholars" because it increases their options "in regard to analysis".
A wealth of very similar cases is handed down to us by that kind of "fluid" tradition typical of the teaching books produced in the university of medicine and arts of Bologna in the late Middle Ages. In such cases, the practice of teaching and repetitio made the text vary and evolve with its own tradition; the role of the author gradually fades away and is sometimes reduced to a purely eponymous function with respect to a freely developing tradition. Here again we have to cope with a different kind of textual canonicity, "where alternative readings cannot be debased to lower-rank variants".
In all these cases the appropriate editorial policy is almost mandatory: "also the so-called alternatives are [to be] edited as 'text'". A database provides the obvious solution to such a problem, and from the idea of a database as edition we came upon the idea of an edition as database. But then we have to face a new theoretical challenge. Clearly such an "edition", or more precisely a complete archive organizing all the information conveyed by all the manuscript witnesses of the text, "will be something different to a printed edition also from a theoretical point of view". But can we legitimately say that "the correct memorization of the text of the manuscripts such as it is, combined with the possibility of querying and analysing them through automated systems, can to advantage substitute for the so called critical edition in the traditional sense"? Before we try to give an answer to this question, let us briefly describe the "Gentile database", the prototype we have so far built for our commentary.
The Gentile database - Database design and construction
How was the database designed and what use was made of digital imaging? Let us recall the description we have given elsewhere:
"The kleiw database management system was chosen because it can administer images as a data-type, together with other more conventional data-types such as full text and structured alphanumerical data, all in the same processing environment. Within the system, images can be connected to textual descriptions and/or transcriptions organized as structured elements of a database. The transcriptions were accordingly arranged in a kind of hierarchical database, following the internal structure of the text. The commentary is divided into a principium (lacking in ms. S) and seventeen sections corresponding to an equal number of lemmata of Porphyry's text in the Boethian translation. Each lemma of the literal commentary comprises a divisio textus and a brief exposition of the sententia auctoris, followed by the discussion of notabilia and dubitationes. We therefore obtained, for the first lectio, the following (simplified) structure (Fig. 1).
(Fig. 1 and 2)
Every portion of the text in each of its four manuscript witnesses was then defined as the value of a structural element of the database, thus enabling us to connect it with the corresponding portion of an image. The main image files are bitmaps of manuscript pages, i.e. of the recto or the verso of a manuscript folio. By means of the image processing facilities of the system, we could obtain from each image the relevant cuttings for each portion of the text. The resulting structure for each sequence of textual units within a given manuscript can be represented by two independent tree structures built from these primitive units, very much the same as in an ODA conformant model"
(Fig. 3 and 4).
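The pairing of structural text units with image cuttings described above can be sketched as follows. This is only an illustrative model, not kleiw's actual schema: the class names, coordinates, file names, and transcription snippets are all our own placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ImageRegion:
    """A cutting from a page bitmap, given as pixel coordinates."""
    page_file: str   # bitmap of one side of a folio (name is hypothetical)
    x: int
    y: int
    w: int
    h: int

@dataclass
class TextUnit:
    """A structural element of the text in one manuscript witness."""
    label: str                            # e.g. "divisio textus"
    transcription: str                    # diplomatic transcription
    region: Optional[ImageRegion] = None  # the matching image cutting
    children: List["TextUnit"] = field(default_factory=list)

# The first lectio of one witness, following the (simplified) structure
# of Fig. 1; transcriptions and coordinates are placeholders.
lectio1 = TextUnit("lectio 1", "", children=[
    TextUnit("divisio textus", "Ista lectio dividitur ...",
             ImageRegion("S_f12r.tif", 110, 240, 820, 160)),
    TextUnit("sententia auctoris", "Sententia auctoris est ..."),
    TextUnit("notabilia", "Notandum primo quod ..."),
    TextUnit("dubitationes", "Dubitatur utrum ..."),
])
```

Since the units carry both a transcription and an image reference, the text tree and the image tree remain two independent structures built from the same primitive units, as in the ODA-style model mentioned above.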
Edition and database representation - Reconstructing and representing the text - Critical apparatus vs archive edition
But now, is a database representation of an entire textual tradition "just an aid to the critical reconstruction of the text, or may it be considered as a step towards a new form of edition"? We have answered that, indeed, it serves both purposes. For "by means of a database management system (DBMS), information can be both processed and represented", and precisely for that reason a computer based edition can be "open-ended" and "dynamic". And in fact the kleiw database management system "is a tool for processing information (in our case, for retrieving evidence, both textual and visual) and inferring analytical results (in our case, for making editorial decisions), as well as a means to represent both the data and the result of their processing (in our case, an entry in the apparatus and a reconstructed text). The enormous advantages afforded by kleiw's image processing facilities to improve readability and assess unclear manuscript evidence are not to be underestimated; but it is its power in representing and organizing both evidence and results (in our case the very process of documenting and reconstructing a text) that better suits the purpose of producing an edition".
So why should a textual scholar still want to "stress", indeed undeniably, that a database "is not an edition"? It is a claim, in our opinion, that has to be accepted, if a database is only thought of as a form of "replicating" a manuscript tradition. Indeed, "a database is by no means an edition as long as it is thought of as a sheer duplicate of its source material", and there is a point in rejecting the notion of "a new type of edition", a so called "archive edition", whose task would comprise the "archival survey of all witnesses, and thus of all variants, both in the composition and transmission of the text", a sort of "inventory", conceived "primarily", and "in the sense of modern information theory", as "an information bearer", that would "substitute for the originals under consideration". And, after all, also an image "is only the best logical approximation to a document, and not a substitute for it".
But a database "had better be thought of as a structured logical representation of the sources", and here is our answer to the problem. "An information bearer, whichever it may be, cannot be just a replicate of the original; the problem is indeed to put its logical features to good use. But how, exactly, can that be done, for the sake of producing an "edition"? The most plausible answer appears to be to organize a database as an apparatus. For that seems to be precisely what makes an edition - not just an archive - out of anything". Representing a textual tradition in database form "with commentary" is already translating encoded textual features into structures. And that could possibly be done just for the sake of documenting one or another reconstruction of the text, which is precisely the purpose an apparatus is created for. It is also the problem our "Gentile" database has to face: its claim to be a step towards a sound "critical edition in electronic form" very much depends on its solution to this problem.
Possible solutions - Historical Text Engine - Self-Documenting Image Files
As it has been stressed, an edition cannot simply comprise a comprehensive, all-inclusive archive. An editor has to make choices, evaluating and discarding irrelevant factual information. To allow selections and new arrangements of textual material, the editor should be able to provide alternative structural representations of the text. And in order to do that, the database management system should be endowed with text processing tools powerful enough to handle alternative and possibly overlapping hierarchical structures of the same data.
Another requirement for a database to serve as an edition would be the publication of its structured data. A database can be thought of as an edition not because of its content of mere transcriptions and images, but for the way they have been organized and given a certain structure. It is the apparatus that matters, the result of the editor's choices and analysis. It is structured data that have to be made publicly available. Digitized images and transcriptions are of interest, as an organized database, only if they can be merged with others as structured data. Only as structured data can they be "quoted" or referred to as an edition from one database to another.
To meet these requirements, two important developments of the kleiw database management system look very promising. Their architecture has already been designed and their implementation is in progress. The first of these developments concerns the idea of a text engine based on the notion of a historical text. The notion of an "historical text engine" is connected with the idea of a "dynamic edition", a "potentially new technique for the dissemination of manuscript materials", a new form of edition "radically different from the notion of the classical printed edition", an edition that is "open-ended" and "continuous", potentially "never finished", as opposed to a "static" and immutable one. From this point of view, the edited text of a documentary source, the text that constitutes the object of historical research, the "historical text" as it is called by Manfred Thaller, can be defined as "the formally treatable representation of the current assumptions of a researcher about what his documents actually contain". Hence the "dynamic edition of an historical text tends to approximate to that form of representation of the 'hermeneutic invariant' of a text that better suits the new exegetical practices enabled by the formal processing of textual data".
An historical text engine can then provide a mechanism for the formal treatment of enhanced strings, a new "fully integrated data type", comprising "a mixture of ASCII characters and arbitrary portions of bitmaps". Such a form of representation of textual data can take into account "several layers of traditions" and allow "a given portion of text to have more than one equally valid form". An historical text engine can enable a database management system to process just one "coherent" machine readable representation of a textual tradition, comprising either a single manuscript or "the logical sum of two or more manuscripts". Thus it can "not only make it possible to handle variants, but to treat all streams of tradition combined into a 'text' as potentially equal". In general it allows one "to define the relationship between a 'text' as a running representation of a tradited document and a 'text' as converted into a database according to some abstract model", so as to serve the purpose of reconstructing a text and organizing its critical apparatus.
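As a rough illustration of the idea, and not of the actual kleiw implementation, an enhanced string can be modelled as a sequence of plain characters, bitmap cuttings, and points of variation whose readings are treated as equally valid; all the names, sigla, and sample readings below are our own assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class BitmapRef:
    """A portion of a page bitmap standing in for undecoded characters."""
    page_file: str
    x: int
    y: int
    w: int
    h: int

@dataclass
class Variant:
    """A point of variation: one equally valid reading per witness."""
    readings: Dict[str, str]   # witness siglum -> reading

Token = Union[str, BitmapRef, Variant]

def render(text: List[Token], witness: str) -> str:
    """Flatten an enhanced string into the running text of one witness."""
    parts = []
    for t in text:
        if isinstance(t, str):
            parts.append(t)
        elif isinstance(t, Variant):
            parts.append(t.readings.get(witness, "<om.>"))
        else:
            parts.append("[image]")   # unresolved bitmap cutting
    return "".join(parts)

# One line of text in which two witnesses diverge (invented example):
line: List[Token] = ["genus ",
                     Variant({"S": "dicitur", "V": "predicatur"}),
                     " de pluribus"]
print(render(line, "S"))   # genus dicitur de pluribus
print(render(line, "V"))   # genus predicatur de pluribus
```

The same sequence thus yields the running text of any single witness, or, by keeping all readings side by side, the "logical sum" of several manuscripts at once.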
The second development of the kleiw database management system concerns the idea of a Self-Documenting Image File (SDIF). In a nutshell, SDIFs are to be thought of as an extension of the Tagged Image File Format (TIFF), especially designed to allow the import and export of portions of organized data to and from different database management systems. Besides the technical description of the "physical characteristics" of a bitmapped image as provided by a TIFF file, SDIFs would contain "all the information necessary to understand the description of the image contained within it"; they would integrate the "historical description of the meaning of an image" with "the technical description of its physical properties". Exchanging SDIFs between different systems would allow scholars to export from an archive those portions of the organized materials that are relevant to their work, recombining them on a local machine into "one consistent database". In our case, images and their transcripts could be exchanged together with their own editorial apparatus. All in all, we may conclude that "the SDIF proposal lends itself as a valid theoretical solution for the dissemination and the distributed usage - the publication, in good substance - of a database representation of texts handed down to us by all sorts of fluid manuscript traditions".
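The idea can be illustrated with a minimal sketch in which a serializable record stands in for the extended image file; the field names and values are our own and do not reproduce the actual SDIF or TIFF tag layout.

```python
import json

# Hypothetical sketch of the SDIF idea: an image bundled with all the
# information needed to interpret it, so that it can be exported from
# one database and recombined into another.
sdif_record = {
    "technical": {            # what a TIFF header already provides
        "file": "S_f12r.tif",
        "width": 2480,
        "height": 3508,
        "bits_per_sample": 8,
    },
    "historical": {           # the added self-documenting layer
        "manuscript": "S",
        "folio": "12r",
        "text": "Gentilis de Cingulo, commentary on the Isagoge, lectio 1",
        "regions": [
            {"x": 110, "y": 240, "w": 820, "h": 160,
             "unit": "divisio textus",
             "transcription": "Ista lectio dividitur ..."},
        ],
    },
}

# Serialized, the record travels between systems together with its apparatus
# and can be restored unchanged on a local machine:
payload = json.dumps(sdif_record, indent=2)
restored = json.loads(payload)
```

The point of the sketch is that the editorial apparatus (the "historical" layer) travels with the image itself, rather than living only inside one system's database.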