The following is the text of a talk given at a workshop organised by Professor Dr. Claudine Moulin and colleagues at the University of Trier Centre for Digital Humanities (the workshop was called Möglichkeiten der automatischen Manuskriptanalyse) on 24 February 2014. At the time, I was University of Wales Chair in Digital Collections at The National Library of Wales (NLW), running a collaborative research programme on the digital collections of Wales. I was interested in questions of how cultural heritage organisations which hold cohesive national
digital collections, and collaborative frameworks to support interdisciplinary
and collaborative research, can create new opportunities to
build a digital workspace that can effect the sort of transformative research
that has been the promise of digital humanities for many years. In the talk, I looked at the history of 'copying' at NLW, and at how the digital dissemination of knowledge is part of a long tradition of copying and distributing manuscripts in libraries. I also looked at the ways that some of NLW's most iconic material, including the Hengwrt Chaucer manuscript, has been copied and disseminated over the past hundred years. As new methods of manuscript digitisation (including RTI and hyperspectral imaging) are developed, I make the argument that we should be moving away from 'mass digitisation' towards 'slow digitisation': that we can learn more from looking at all the copied versions of a manuscript than from a form of 'distant reading' of whole collections. Libraries should spend effort on bringing together all existing copies of a manuscript (rotograph, photostat, photograph) with images gathered using new technologies, making the complete 'biography' of a manuscript available to scholars. I've also written about some of the preparatory work for this in an article taken from a keynote I gave at the Sheffield HRI Digital Humanities Conference in 2012. As this talk was written to be read, it's a little rough in places, but I do hope to continue this research now I am at Glasgow University and write this up as a fuller article in the near future. Some of these themes were also taken up at a workshop I organised with Andrew Prescott at NLW in 2015, and we hope to run a follow-on event next year.
As ever, I am enormously grateful to my former colleagues at NLW for their help with developing this research.
Digitization at the National Library of Wales
The National Library of Wales is a legal deposit library, established by
Royal charter in 1907. It is the preeminent repository of information
for Wales, offering a world class collection of documentary heritage, including
numerous rare, valuable and significant works. The foundation collections of NLW are its manuscript collections: the Peniarth, Llanstephan and Cwrtmawr collections. The Peniarth
Collection is listed in UNESCO’s Memory of the World register.
The Library has always had an appreciation of the importance of
state-of-the-art technologies for access to its collections for education and
research, and infrastructures in Wales support connected communities linked to
library and archive resources: since devolution, the Welsh Government has made cooperation,
collaboration and digital delivery key areas of focus.
As a result, NLW created A National Digital Public Library of Wales: a distinct, unified national collection
that is freely available to users as the “research data” for all
disciplines. Most of our content is licensed through Creative Commons licenses
for free use and re-use. The Library has built internal expertise and capacity in the entire digital lifecycle: selection, conservation, capture, management and preservation. Copyright and other intellectual property rights are cleared as a managed part of the digitisation process: where material is on deposit and/or the current rights holders are known, permission is requested, and where it is declined, materials are not used; when a current rights holder is unknown, reasonable efforts are made to identify and/or contact them. Digitised resources are licensed for re-use and re-purposing under an open license (ideally, BY-NC-SA: Creative Commons Attribution-NonCommercial-ShareAlike license). A fundamental principle is that free access is key to realising the potential community, social, research and economic benefits of digitised resources.
There are several strategic objectives for digitizing NLW collections. The most important is access: the Library has approximately 85,000 physical users per year, but over 2 million online users. The remote location of the Library is a primary driver for making resources
accessible digitally – not everyone can get to Aberystwyth to work with the
original materials! Online access ensures the material reaches researchers, students, and the public (especially the Welsh diaspora) worldwide. Digitization also offers enhanced access to primary sources, building in functionalities including the ability to search, browse, collate and annotate sources. Digitisation also supports preservation – while not a preservation medium, digital access protects rare and fragile materials from handling, and also identifies
conservation needs, as digitization selection is an opportunity for carrying out an inventory of material that is not in circulation, or uncatalogued. Another key reason for digitisation is collections
enhancement and reunification – digitization is an opportunity to bring
materials together – for example, the AHRC funded research project Imaging the Bible in Wales brought together a collection of manuscripts from
archives and special collections all around Wales. And finally, there is the potential for digital collections to effect a transformation of
scholarship: The traditional library is now a digital research
infrastructure, with reading rooms replaced by Internet browsers, and primary
sources accessible for new types of analysis using computer tools and methods.
NLW Research Programme in Digital Collections
In 2011, in recognition of these developments in digital scholarship, the Library set
up a research programme in Digital Collections with the establishment of my
post, a research Chair funded by the University of Wales. We’ve developed a
fairly large portfolio of projects with two main areas of focus: better and increased use of our existing
digital content for research, and creating new digital resources that address
specific research challenges across the disciplines. We are
building the programme on the principle of digital humanities: using digital
humanities methods and tools to foster scholarship across the disciplines, and
to act as a bridge between content and curators, building essential
collaborative relationships that integrate research into all aspects of our
collections.
It’s also built
on the assumption that digital humanities is about working
with digital content, using tools and methods for the analysis and interpretation of
this content, and communicating the results of this work to the widest possible
audience using traditional and non-traditional publishing methods, allowing
greater engagement with research and research data than was previously possible: this binds scholarship
to research infrastructures in ways that are deeper and more explicit than we
are generally accustomed to in scholarship, and makes it dependent on networks
of people (Kirschenbaum, 2011).
This
transforms humanities research in two
ways:
- Firstly, by facilitating
and enhancing existing research, by making research processes easier via the
use of computational tools and methods,
- And secondly, by enabling
research that would be impossible to undertake without digital resources and
methods, and asking new research questions that are driven by insights only
achievable through the use of new tools and methods.
Greg Crane, Humboldt Professor at the University of Leipzig,
has referred to this work as e-Wissenschaft, reflecting that the best
examples of digital humanities are a new intellectual practice with elements
that distinguish qualitatively the practices of intellectual life in this
emergent digital environment from print-based practices (Crane, 2009).
One of the
key points of divergence from traditional scholarly practice is that the
digital humanities is collaborative: as the field matures, it is becoming
recognized as one in which the best research is created through partnerships
between different aspects of research, and indeed, between researchers from
multiple disciplines and stakeholder communities – researchers across the arts
and humanities and scientific disciplines, librarians, archivists, cultural
heritage staff, funders, technical experts, data scientists… In many ways, the library is the ideal locus for digital humanities, as a place where all this comes together around the original source materials.
In order to
think through some of these questions, and to see how the research library is
becoming a digital research infrastructure, it’s useful to look at some of the
National Library’s work with manuscripts. The next few images show examples of some of the things that happen to a manuscript in a Library,
some of the conversations around its use, and the way that information about a
Library’s manuscripts is collected and used.
Copying is,
of course, part of this documentation and information gathering, and going back
to the early histories of reproduction of manuscripts allows us to see
digitization as a continuum of the use of current technologies to construct knowledge about specific
manuscripts.
In 1919, the
Library acquired a Photostat machine, and began advertising the possibility of
supplying reproductions of its manuscripts. This advertisement appeared in the journal Welsh Outlook from January 1920: "The National Library of Wales: By means of the Photostat recently installed the National Library can supply at very reasonable rates facsimile reproductions from manuscripts, books, maps, prints, drawings, etc., for the use of students and others. Enquiries should be addressed to the Librarian, National Library of Wales, Aberystwyth".
One of the main reasons for the acquisition
was the ability to make copies of manuscripts and to send them to schools in
Wales for education purposes. The image below is a negative photostat print of NLW Peniarth 610 MS 191:
Photostats
were also popular with scholars who used them for research, and to illustrate
journal articles. Foreshadowing Creative Commons licenses, the Library didn’t
restrict re-use of these images, seeing the wide circulation of its content as part
of its mission.
The
Library’s archive of correspondence shows who requested these images – and what
they did with them. By 1926, Professor John M. Manly in Chicago was writing to the
Librarian for a Photostat of Peniarth MS 392 D, the Hengwrt Chaucer…
…Which the
Library promptly sent, but for some reason sent the positive copy to Chicago, keeping
the negative:
Manly and
the Librarian kept up a lively correspondence over the years, which forms an interesting
record of ‘technology transfer’ – when Manly observed fluorescence
technology in use at the Huntington in 1930, he wrote to NLW to suggest this
was a new technology that could be used. NLW were equally enthusiastic about the new technology, and had invested in fluorescence cabinets. Ballinger wrote to Manly to report that staff and readers had obtained good results with the cabinets, with some readers spending 'a whole day' reading 'difficult' manuscripts under the lights:
Manly and Edith Rickert’s research on Hengwrt found its way back to the Library – in 1939, their
research into the Hengwrt Chaucer was published in the NLW Journal, illustrated
with a new photograph of the manuscript (a record of the order for the
photograph can be found in the Library's archives).
In the
archive of correspondence, we see the original draft of the article, and the
editorial comments by Manly and Rickert. Interestingly, the Chaucer “workshop” was
originally called the “Chaucer Laboratory” – I like to think that this use was
over-ruled as an unnecessarily scientific nomenclature, but it's an interesting way of conceptualising the ways that the research was dependent on early imaging technologies.
Other documents in the NLW archives of correspondence between Manly and Ballinger, and between Manly and Ballinger's successor as Librarian, William Llewelyn Davies, show that the connection between the University of Chicago and Aberystwyth was a mutually beneficial collaboration over many years. Manly and Rickert came to Wales to authenticate the ‘Merthyr fragment’ in 1936, and also to advise on the re-binding of Hengwrt, showing how fluid
the relationship between library and scholar was around these manuscripts.
The next
major technology to be adopted by the Library was, of course, microfilming, and
the existence of the photographic section made it possible for the Library to
adopt this technology. From 1941 to 1945, the
Library was the home of many of the treasures of the British Library and the
British Museum, which were moved to Aberystwyth for safekeeping, and stored in
the Library “Cave” for the duration of the war. During this time, many of these
materials were microfilmed – partly as an insurance against the possible loss
of the originals, but again, for access, preservation and as an attempt to
better represent the information in collections to scholars. These microfilms
found their way to the Library of Congress British Manuscripts Project (see A Checklist of the Microfilms Prepared in England and Wales for the American Council of Learned Societies, 1941-1945), promoting Welsh manuscripts to an even
wider audience.
The 'Cave' at NLW today
The 'Cave' at NLW during the Second World War
The use of digitization technologies to increase and enhance access to the collections of Wales can be seen as a continuum of the enthusiasm and innovation attached to the adoption of new technologies – be they Photostat or microfilm – throughout the history of the National Library of Wales, and as a pragmatic response to particular issues associated with the Library’s mission, collections, history, and location. The National Library has been slowly digitizing the manuscript collection and putting it online, and exploring the use of emerging imaging technologies for analysis of manuscripts.
In 2013, a
team from the Mellon-funded ‘Digitally Enabled
Scholarship with Medieval Manuscripts’ project at Yale came to NLW to carry out
multispectral imaging of our three Chaucer manuscripts. This method of capture enables
imaging across the colour spectrum, to highlight different aspects of an image.
A multispectral image is one that captures image data at specific
frequencies across the electromagnetic spectrum. The wavelengths may be
separated by filters or by the use of instruments that are
sensitive to particular wavelengths, including light from frequencies beyond the
visible light range,
such as infrared. Spectral imaging can allow extraction of
additional information the human eye fails to capture with its receptors for red, green and blue. It was originally developed for
space-based imaging.
Multispectral imaging divides the
spectrum into bands – in our case, seven. Each band yields one digital image
(in remote sensing, called a 'scene'). Bands may lie in the visible spectrum,
from 0.4 µm to 0.7 µm, called the red-green-blue (RGB) region, or in infrared
wavelengths of 0.7 µm to 10 or more µm, classified as near infrared (NIR),
middle infrared (MIR) and far infrared (FIR or thermal). The scenes are
combined to comprise a seven-band multispectral image.
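As a rough illustration of what 'combining the scenes' means in practice, here is a minimal Python sketch using NumPy. It is not the Yale team's pipeline: the band centres, the function names `stack_scenes` and `false_colour`, and the synthetic pixel data are all assumptions made for the example, and the captures are assumed to be already registered to one another.

```python
import numpy as np

# Hypothetical band centres in micrometres; the talk does not list the
# seven bands actually used, so these values are placeholders.
BAND_CENTRES_UM = [0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 1.05]


def stack_scenes(scenes):
    """Combine single-band scenes (2-D arrays) into one (H, W, bands) cube."""
    shapes = {scene.shape for scene in scenes}
    if len(shapes) != 1:
        raise ValueError("all scenes must share the same height and width")
    return np.stack(scenes, axis=-1)


def false_colour(cube, red, green, blue):
    """Map three chosen bands to the R, G and B channels, scaled to 0..1."""
    rgb = cube[:, :, [red, green, blue]].astype(float)
    lo, hi = rgb.min(), rgb.max()
    if hi == lo:
        return np.zeros_like(rgb)
    return (rgb - lo) / (hi - lo)


# Synthetic stand-ins for seven registered captures of the same page.
rng = np.random.default_rng(0)
scenes = [rng.integers(0, 256, size=(4, 5)) for _ in BAND_CENTRES_UM]
cube = stack_scenes(scenes)

# Show an infrared band as red so that it becomes visible to the eye.
composite = false_colour(cube, red=6, green=2, blue=0)
```

Keeping the seven scenes in one array means any three bands can be mapped to a viewable image afterwards, so the choice of composite does not have to be made at capture time.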
This technology has also assisted
in the interpretation of ancient papyri, such as those found at Herculaneum, by imaging the fragments in the
infrared range (1000 nm). Often, the text on the documents appears
to the naked eye as black ink on black paper. At 1000 nm, the difference
in light reflectivity makes the text clearly readable. It has also been used to
image the Archimedes palimpsest by imaging the parchment leaves
in bandwidths from 365-870 nm, and then using advanced digital image
processing techniques to reveal the undertext of Archimedes' work.
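The point about reflectivity can be made concrete with a small Python sketch. It is an assumption-laden toy, not the method used on the Herculaneum papyri or the Archimedes palimpsest: the function name `band_contrast`, the hand-drawn ink mask and the reflectance values are all invented for illustration. It simply measures, band by band, how different the ink is from its background.

```python
import numpy as np


def band_contrast(cube, ink_mask):
    """Michelson contrast between ink and background pixels, per band."""
    ink = cube[ink_mask].mean(axis=0)          # mean reflectance of ink
    background = cube[~ink_mask].mean(axis=0)  # mean reflectance elsewhere
    return np.abs(background - ink) / (background + ink)


# Toy cube with three bands: two visible and one infrared (index 2).
# In the visible bands ink and support are equally dark ("black on black").
cube = np.full((4, 4, 3), 0.10)
ink_mask = np.zeros((4, 4), dtype=bool)
ink_mask[1:3, 1:3] = True       # invented region where ink is present
cube[~ink_mask, 2] = 0.60       # the support reflects more in the infrared

contrast = band_contrast(cube, ink_mask)
best_band = int(np.argmax(contrast))  # band in which the text stands out most
```

In this toy case the infrared band is selected, which mirrors the observation that text invisible in ordinary light can become readable at around 1000 nm.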
Here you can
see the outline of different captures at each level of the spectrum, and the composite
image, incorporating all seven captures:
For the Yale
project, ultimately, these images will be presented alongside related
manuscripts from other Libraries, like the Huntington Ellesmere Chaucer, using
a ‘shared canvas’ for annotation.
The
following images show how important the early captures are in the history of a
manuscript. This is a manuscript from the Llanstephan collection. The 1941 microfilm
shows a fairly legible text – but a recent 2013 digital capture shows text
loss, probably due to a conservation incident in the 1950s.
NLW imaging experts carried out
ultraviolet imaging to see if some of the text elsewhere in the manuscript could
be made legible again, but with very limited success. This is a case where the 1940s microfilm image is the only record of some of the intellectual content of a manuscript, which shows the importance of keeping all copies of images of manuscripts taken over the years. Some processes (especially rotographs or even photostats) can capture information that is not visible in more recent images.
This shows how all types of image capture (not just digital imaging) become part
of the biography of a manuscript. Photostats, microfilms and digital captures
all tell us new things about a manuscript and what has happened to it over the years. We can’t anticipate what will happen to
manuscripts in the future, and we also can’t predict if new technology will
give us new ways to read and understand manuscripts that we may have captured
digitally: one of the great benefits of digital content is its use for rare and
unforeseen purposes, which again is an argument for retaining all historic image captures and making them available to scholars for analysis.
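One way to picture the 'biography' of a manuscript is as a simple record that keeps every capture, in whatever medium, in date order. The Python sketch below is only an illustration under stated assumptions: the class names `Capture` and `ManuscriptBiography` are hypothetical, and the shelfmark is a placeholder because the manuscript number is not given here. The two entries come from the Llanstephan example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Capture:
    """One reproduction of a manuscript, in any medium."""
    year: int
    technique: str  # e.g. "photostat", "microfilm", "digital"
    note: str = ""


@dataclass
class ManuscriptBiography:
    """All known captures of one manuscript, kept in date order."""
    shelfmark: str
    captures: List[Capture] = field(default_factory=list)

    def add(self, capture: Capture) -> None:
        self.captures.append(capture)
        self.captures.sort(key=lambda c: c.year)

    def earliest(self, technique: Optional[str] = None) -> Optional[Capture]:
        """Return the oldest capture, optionally restricted to one technique."""
        for capture in self.captures:
            if technique is None or capture.technique == technique:
                return capture
        return None


# The shelfmark is a placeholder: the talk does not give the manuscript number.
biography = ManuscriptBiography(shelfmark="Llanstephan MS (number not given)")
biography.add(Capture(2013, "digital", "shows text loss"))
biography.add(Capture(1941, "microfilm", "text fairly legible"))
```

Because older captures are never overwritten by newer ones, a query such as `biography.earliest()` can lead a scholar straight to the image that predates any damage.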
Many
different methods for digital capture and presentation of manuscripts are now
becoming part of a manuscript Library’s documentation and dissemination of
manuscripts. The ease of access to a large body of manuscripts from a collection enables the manuscript scholar to work in different ways – to take a more 'archival approach', working through large quantities of manuscripts, captured in many ways, and ideally also using the related documentary materials, such as the correspondence between Manly and the Librarians of Wales documented above.
However, if a Library is to be more than just a digital photocopy service, publishing pretty pictures on its website, there must be direct engagement with scholars who can use these images for analysis and interpretation. Libraries also need scholars to be involved in the dialogue of encouraging the adoption of new technologies – it is the researchers that can advise on candidate manuscripts for this sort of imaging and presentation, and provide annotations and shared texts. This way, all the data that we gather digitally about manuscripts can become integrated into the scholars’ toolkit, part of the digital ecosystem that supports research.
Once manuscripts
are available as digital images – especially those that capture different
aspects of the image through multispectral and other technologies – a range of
methods is available to support scholars who wish to ask new questions, or
explore old questions in new ways. For example,
similarities and distinctions in hands can be
systematically measured and calculated, enabling analysis of the number of scribes, the processes
used in creating manuscripts, etc. It’s also possible to analyse fragments for
the purpose of reconstruction, as well as
using hyperspectral and UV images to recover text. All this requires access to the full collection of images of a manuscript, including those developed in the early era of photographic reproduction. This calls for a more integrated approach to digital dissemination by libraries, and a focus on 'slow digitisation' as opposed to 'mass digitisation': deploying the time and cooperation of manuscript libraries to work slowly and acquire all available photographic documentation of a manuscript, allowing for the establishment of 'layers' of data that add information about an archive. This sort of work requires digitisation in depth rather than mass: it will take more time, but ultimately provide a richer archive for scholars.
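To give a sense of how similarities between hands might be 'measured and calculated', here is a deliberately simple Python sketch. Everything in it is an assumption for illustration: the two features per sample (imagined here as mean letter height and mean slant), the numbers themselves, the threshold and the function name `group_hands` are invented, and real palaeographic analysis uses far richer features and more careful statistics.

```python
import numpy as np


def group_hands(features, threshold):
    """Group samples whose feature vectors lie within `threshold` of each other.

    Single-linkage grouping: samples end up together if a chain of close
    pairs connects them.
    """
    n = len(features)
    labels = list(range(n))  # each sample starts in its own group

    def find(i):
        # Follow links until reaching the representative of the group.
        while labels[i] != i:
            i = labels[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(features[i] - features[j]) <= threshold:
                labels[find(j)] = find(i)

    roots = [find(i) for i in range(n)]
    # Renumber groups 0, 1, 2, ... in order of first appearance.
    order = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return [order[root] for root in roots]


# Invented measurements for four writing samples (height, slant).
features = np.array([[1.0, 0.20], [1.1, 0.25], [3.0, 1.50], [2.9, 1.40]])
groups = group_hands(features, threshold=0.3)
estimated_scribes = len(set(groups))
```

The result is only as good as the features and the threshold, which is exactly where the manuscript scholar's judgement has to come in.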
One of the
great advantages of basing a DH programme in a Library is the ability to build bridges between the expertise required to explore the potential of digital imaging; scholars who are
experts on the manuscripts; and expertise in digital methods that are used
across the disciplines as they become familiar from other projects, in a
sustainable way over the long term. This sort of collaboration will contribute to the resources available to manuscript scholars in the
digital age, which ideally should not just be about the use of static tools and methods, but about
fostering a more fluid environment of interdisciplinary co-production. The digital
research infrastructure to support this sounds a bit like the
‘manuscript laboratory’ envisaged back in 1939. We can see this vision as prescient or optimistic but, regrettably, given the lack of resources and institutional will for such partnerships, it is more likely wishful thinking.