The DIMEV consists of a collection of XML-encoded documents (for RECORDS, MANUSCRIPTS, PRINTED BOOKS, INSCRIPTIONS, BIBLIOGRAPHY, GLOSSARY, etc.) from which linked data is extracted and rendered into web pages by means of XSLT style sheets. The database is queried through a series of SEARCH pages that activate PHP-coded search scripts or link to indexes compiled in advance through XSLT transformations. The principle of “write once, use many times” applies throughout, providing a consistency of reference format and structure.
The use of XML IDs ensures that no duplication of records or witness numbers can occur. We have not encoded the IMEV and its Supplement to create an online book (for which the TEI encoding scheme might have been appropriate), but have instead created a digital original using an XML tagging scheme designed specifically for the DIMEV.
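The uniqueness guarantee can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that each record carries an `xml:id` attribute; the element name `record`, the ID values, and the function `duplicate_ids` are hypothetical and not taken from the DIMEV schema. A validating XML parser would reject duplicate IDs outright; this sketch only shows the check being made.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Qualified name under which ElementTree exposes the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(xml_text: str) -> list[str]:
    """Return every xml:id value that occurs more than once in the document."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample data: element names and IDs are invented.
sample = """<records>
  <record xml:id="record-1"/>
  <record xml:id="record-2"/>
  <record xml:id="record-1"/>
</records>"""

print(duplicate_ids(sample))  # → ['record-1']
```

An empty list means every identifier is unique, which is the condition the XML IDs are meant to enforce.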
XML was chosen because it is a standard, cross-platform, non-proprietary coding language. XML-encoded data is easy to export and translate into other formats. For online presentation, for example, it is rendered into HTML using XSL style sheets for reading on the screen; as digital media evolve, different XSL style sheets can render the same data for other formats without any need to reformat the XML-encoded data itself. Even more important for long-term sustainability, style sheets can be used to transform XML documents into other formats of XML, such as RDF/XML (RDF=“Resource Description Framework”).
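The idea of one source feeding several presentations can be sketched briefly. The example below uses plain Python functions as a stand-in for XSLT style sheets, which is a substitution made for illustration and not the DIMEV's actual method. The record structure, the element name `title`, and the functions `to_html` and `to_text` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Qualified name under which ElementTree exposes the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical source record; element names and content are invented.
SOURCE = '<record xml:id="record-1"><title>An example first line</title></record>'


def to_html(record: ET.Element) -> str:
    """Render the record as an HTML list item for on-screen reading."""
    title = escape(record.findtext("title", default=""))
    return f'<li id="{escape(record.get(XML_ID, ""))}">{title}</li>'


def to_text(record: ET.Element) -> str:
    """Render the same record as a plain-text line, e.g. for an export."""
    return f'{record.get(XML_ID, "")}: {record.findtext("title", default="")}'


record = ET.fromstring(SOURCE)
print(to_html(record))  # → <li id="record-1">An example first line</li>
print(to_text(record))  # → record-1: An example first line
```

Both outputs are derived from the same unchanged source, so adding a new output format means adding a new rendering function, not re-encoding the data.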
The data files will eventually be indexed for keyword searching within and across fields using open-source indexing software. For Middle English texts, such software has the particular advantage of supporting character swapping, so that for indexing purposes the thorn character (þ) can be indexed as “th”. Stemming and phonetic “sounds-like” analysis will also assist searchers with the vagaries of irregular spelling.
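Character swapping of this kind can be sketched in a few lines of Python. The sketch assumes a simple substitution table applied before a word is added to the index. Only the thorn mapping comes from the description above; the function name `index_key` and the lowercasing step are assumptions, and real indexing software would supply its own configuration for this.

```python
import unicodedata

# Substitution table applied before indexing. Only thorn is covered here;
# other Middle English letters would need their own (assumed) mappings.
SWAPS = {
    "þ": "th",  # lowercase thorn
    "Þ": "th",  # uppercase thorn
}


def index_key(word: str) -> str:
    """Return the normalized form under which a word is indexed."""
    word = unicodedata.normalize("NFC", word)
    swapped = "".join(SWAPS.get(ch, ch) for ch in word)
    return swapped.lower()


print(index_key("þat"))   # → that
print(index_key("oþer"))  # → other
```

Applying the same function to the searcher's query means that a search for “that” and a search for “þat” reach the same index entry.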
As semantic web technology evolves over the next decade, we expect to add new capabilities to the DIMEV by ingesting data from libraries and other resources, and by exposing our data on the Web as an RDF repository to make it available for scholarly uses we can only begin to imagine. The semantic-web principle is to make data independent of the interfaces used to manipulate and present it (such as the DIMEV’s HTML rendering and related searching and indexing capabilities). Once the DIMEV publishes its data in machine-readable RDF format, users will be able to cross-search it with information reposing in other RDF repositories around the world that make reference to its uniquely identified persons, works, and documents.