Digitisation
We are a world leader in exploring innovative ways of digitising our 3 million herbarium specimens.
Our specimen data and images can be downloaded for free from the online catalogue, including full resolution TIFFs of specimens. All specimen images and data have a CC-0 licence.
We have now completed the digitisation of all type specimens, plus the following geographic regions and families:
- Temperate South America (region 18)
- Australia (region 7)
- Zingiberaceae
- Sapotaceae
- Gesneriaceae
- Begonia
Digitisation on Demand
We have an active digitisation programme working to make all of our material available through our online catalogue. As part of this we accept reasonable requests to digitise parts of the collection which are currently unavailable online. Please use the contact form below if you would like to make an image request.
The digitisation of the herbarium is also linked to requests for loans and destructive sampling, with specimens being imaged before destructive sampling occurs or before they are sent out on loan.
Research and Development
We have developed an integrated workflow for the digitisation of herbarium specimens which is modular and scalable, so that a single overall workflow can be used for all digitisation projects. Where possible we have incorporated automated systems to enable us to expand and speed up the digitisation process. There are three main elements: a specimen workflow, a data workflow and an image workflow.
- The specimen workflow involves the selection and preparation of specimens and folders and is closely linked to workflows for loans, incoming specimens, destructive sampling and curation.
- The image workflow incorporates image capture, processing, image management data recording, optical character recognition (OCR), quality control, image streaming online and archiving. Our automated image processing system allows images from several different imaging methods to be handled via a single system, based around a dropbox folder structure. The folders are ordered in a structured hierarchy and this provides basic image management data including the equipment and operator’s name. This information is written to the image management database creating the metadata for each file automatically. The system also creates the necessary derivatives for serving the images to the web, links the images to the database record and sends a copy of the file to our OCR workflow.
- The data workflow includes all elements of capturing and managing data associated with specimens. This databasing process is primarily focussed on the curatorial data – the geographic filing area and filing name, which is shared by all specimens within a folder. A form in the data management software allows this information to be entered once and multiple records created by scanning specimen barcodes. These minimally databased records can be enhanced using various methods including the use of OCR and citizen science projects.
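The way the folder hierarchy supplies image management data can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system's actual implementation: the three-level layout (equipment, then operator, then file), the example folder names and the function name `metadata_from_path` are all hypothetical.

```python
from pathlib import PurePosixPath


def metadata_from_path(image_path: str) -> dict:
    """Derive basic image management data from a file's place in the folder hierarchy.

    Assumed (hypothetical) layout: .../<equipment>/<operator>/<filename>
    """
    parts = PurePosixPath(image_path).parts
    if len(parts) < 3:
        raise ValueError(f"expected equipment/operator/filename, got: {image_path}")
    equipment, operator, filename = parts[-3], parts[-2], parts[-1]
    return {
        "equipment": equipment,
        "operator": operator,
        "filename": filename,
        # The file stem is taken to be the specimen barcode.
        "barcode": PurePosixPath(filename).stem,
    }


# Example path with made-up folder names and a made-up barcode.
record = metadata_from_path("dropbox/camera_1/jsmith/E00000001.tif")
print(record)
```

Because the metadata is read from where the file was dropped, the operator does not have to type it for each image; in a real system the resulting dictionary would be written to the image management database rather than printed.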
Optical Character Recognition (OCR) has been part of our image workflow since 2012, and all specimen images are run through this process. We continue to seek inventive ways to use our OCR output to enhance our digitisation workflow.
Preparation of specimens for manual and semi-automatic data entry
Currently we use OCR to sort specimens prior to databasing (i.e. by collector and country) or to enhance records that have been minimally databased.
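As a rough sketch of how OCR output might be used to sort specimens before databasing, the example below groups barcodes by the collector and country detected in their OCR text. The lookup lists, sample label text, barcodes and function names are all invented for illustration, and simple substring matching stands in for whatever matching a production workflow would use.

```python
from collections import defaultdict

# Hypothetical lookup lists; a real project would use curated authority files.
COUNTRIES = ["Chile", "Argentina", "Australia"]
COLLECTORS = ["Smith", "Jones"]


def first_match(text: str, candidates: list[str]) -> str:
    """Return the first candidate found in the text (case-insensitive), else 'unknown'."""
    lowered = text.lower()
    for candidate in candidates:
        if candidate.lower() in lowered:
            return candidate
    return "unknown"


def sort_by_ocr(ocr_by_barcode: dict[str, str]) -> dict[tuple[str, str], list[str]]:
    """Group barcodes by the (collector, country) pair detected in their OCR text."""
    groups = defaultdict(list)
    for barcode, text in ocr_by_barcode.items():
        key = (first_match(text, COLLECTORS), first_match(text, COUNTRIES))
        groups[key].append(barcode)
    return dict(groups)


# Made-up OCR output keyed by made-up barcodes.
sample = {
    "E00000001": "Flora of CHILE. Coll. J. Smith 1234",
    "E00000002": "Chile, Prov. Valdivia. leg. Smith",
    "E00000003": "Queensland, Australia. A. Jones 56",
}
batches = sort_by_ocr(sample)
for key, barcodes in batches.items():
    print(key, barcodes)
```

Each resulting batch can then be databased together, so shared fields are entered once. Substring matching is deliberately naive here (for example, "Smith" would also match "Smithson"), so anything left in an "unknown" group or matched ambiguously would still need manual review.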
Enhancement of Quality Control procedures
More recently we have explored the potential of OCR in our Quality Control processes. OCR records make it possible to check that the barcode used in each image filename matches the one read by the OCR software from the specimen image, thus helping us to correct camera operator errors.
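A check of this kind might look something like the sketch below. The barcode format (the letter "E" followed by eight digits), the function name `check_barcode` and the three result labels are assumptions made for the example; they are not taken from the actual Quality Control system.

```python
import re
from pathlib import PurePosixPath

# Assumed barcode format, for illustration only: "E" followed by eight digits.
BARCODE_PATTERN = re.compile(r"\bE\d{8}\b")


def check_barcode(image_filename: str, ocr_text: str) -> str:
    """Compare the barcode in the filename with barcodes found in the OCR text.

    Returns 'match', 'mismatch', or 'not_read' when OCR found no barcode at all.
    """
    expected = PurePosixPath(image_filename).stem
    found = BARCODE_PATTERN.findall(ocr_text)
    if not found:
        return "not_read"
    return "match" if expected in found else "mismatch"


print(check_barcode("E00000001.tif", "Royal Botanic Garden E00000001 Flora of Chile"))
print(check_barcode("E00000001.tif", "E00000002 Flora of Chile"))
print(check_barcode("E00000001.tif", "Flora of Chile"))
```

Keeping "not_read" separate from "mismatch" matters in practice: a missing OCR reading usually points to an illegible or obscured barcode, whereas a mismatch is the case most likely to indicate that the operator named the file incorrectly.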
We have been exploring several different Citizen Science platforms to help with the transcription of our specimens. These include Herbaria@Home, DigiVol, and Notes from Nature. We are also in the process of developing internal Citizen Science projects, one to help with sorting of specimens and one for the transcription of label data.
We are also developing the catalogue to include links to the living collection and images of the plant in the field.
Publications
Haston, E.M.; Cubey, R.W.N.; Pullan, M.; Atkins, H. & Harris, D.J. (2012). Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach. ZooKeys, 209: 93-102. DOI: 10.3897/zookeys.209.3121
Haston, E.M.; Cubey, R.W.N. & Harris, D.J. (2012). Data concepts and their relevance for data capture in large scale digitisation of biological collections. International Journal of Humanities and Arts Computing, 6(1-2): 111-119. DOI: 10.3366/ijhac.2012.004
Drinkwater, R.E.; Cubey, R.W.N. & Haston, E.M. (2014). The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. PhytoKeys, 38: 15-30. DOI: 10.3897/phytokeys.38.7168
Contact the Herbarium
If you have any queries relating to the Herbarium, please get in touch using the form.
Get in touch