Tapas Project

From some angles, TAPAS looks very much like a content management system akin to WordPress or Omeka. Users can sign up for accounts, upload and edit content, and create publications. However, there are several important differences. Because the “content” in TAPAS is TEI data, the TAPAS platform needs to be able to handle a set of complex processes to support the conversion of TEI files to HTML (for reading online) and other formats (for visualization). It also needs to be able to perform TEI and XML-specific functions like validation, well-formedness checking, XML indexing and searching, and schema management. Finally, because TAPAS offers long-term data curation services for TEI data, it must include a repository component. Each of these features on its own presents challenges, and taken all together they represent a complex and fascinating design problem. The TAPAS platform does not yet meet all of these needs, but it has laid a firm foundation and continues to make steady progress.
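As a rough illustration of the simplest of these XML-specific functions, the sketch below checks whether a string is well-formed XML using only the Python standard library. It is not TAPAS code: the function name and the sample strings are assumptions made for this example. It also does not perform validation, since checking a file against a TEI schema requires a separate schema-aware validator.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def is_well_formed(xml_text: str) -> Tuple[bool, str]:
    """Return (True, "") if xml_text parses as XML, else (False, parser message).

    Hypothetical helper for illustration; well-formedness only, no schema validation.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Illustrative inputs: the second one never closes its <p> element.
GOOD = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>Hello</p></body></text></TEI>'
BAD = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>Hello</body></text></TEI>'

if __name__ == "__main__":
    for label, sample in (("good", GOOD), ("bad", BAD)):
        ok, message = is_well_formed(sample)
        print(label, ok, message)
```

A file that passes this check may still be invalid TEI, which is why the platform treats validation and well-formedness checking as separate functions.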

Fedora-Samvera Repository

The foundation of the TAPAS platform is a Fedora-Samvera repository (formerly Fedora-Hydra). It is an implementation (a “head”) of the Samvera repository framework, and is built on version 4.x of the Fedora repository application. The Samvera framework is an open-source environment that universities and cultural institutions worldwide use to build repository solutions based on a common set of protocols and standards. As a commonly implemented framework, Samvera supports the networking of repositories and the dynamic integration of content across repositories. Northeastern University, which hosts the TAPAS service, has extensive experience with Samvera and Fedora and has built a robust digital repository application (the DRS) with them; TAPAS draws on this pool of expertise for its development.

TEI files and related materials contributed to TAPAS (including supporting files such as personographies and page images, and auxiliary files such as ODD files) are kept in the repository for long-term storage, and are associated with TAPAS projects and collections using Fedora’s set and collection identification features. TEI data is also stored in an XML database (see “XML Database”, below) and is indexed there for rapid XML-aware retrieval.

Communications and interaction between the components of the system are handled by the Samvera application. Data in the repository is exposed to the other components via a RESTful API, and this API will also be made publicly accessible in the future to support third-party use of TAPAS data.

XML Database

To support the XML-specific features of the TAPAS service, TAPAS includes an open-source XML database component, currently built on eXist-DB. As part of the ingestion process, the Samvera application sends uploaded TEI files to eXist. eXist indexes the files to support XML-aware searching, and generates MODS metadata records using data from TEI headers and from user-provided fields.
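The idea of building a metadata record from a TEI header plus user-provided fields can be sketched as follows. This is only an illustration in Python, not the eXist application's actual logic: the function name, the choice of fields, the plain-dictionary output (rather than a MODS record), and the rule that user-provided values override header values are all assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# The TEI namespace is standard; the "tei" prefix is just a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def header_fields(tei_text: str, user_fields: Optional[Dict[str, object]] = None) -> Dict[str, object]:
    """Collect title and authors from the TEI header, then merge user fields.

    Hypothetical helper: a real pipeline would go on to serialize MODS XML.
    """
    root = ET.fromstring(tei_text)
    title_stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", TEI_NS)
    record: Dict[str, object] = {"title": None, "authors": []}
    if title_stmt is not None:
        title = title_stmt.find("tei:title", TEI_NS)
        if title is not None:
            record["title"] = "".join(title.itertext()).strip()
        record["authors"] = [
            "".join(author.itertext()).strip()
            for author in title_stmt.findall("tei:author", TEI_NS)
        ]
    # Assumption: values supplied by the user take precedence over the header.
    record.update(user_fields or {})
    return record


# Made-up sample document for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>Sample Letter</title>
    <author>Jane Doe</author>
  </titleStmt></fileDesc></teiHeader>
  <text><body><p>Text</p></body></text>
</TEI>"""

if __name__ == "__main__":
    print(header_fields(SAMPLE))
    print(header_fields(SAMPLE, {"title": "Title entered at upload"}))
```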

eXist also handles the transformation (via XSLT) of TEI files into the browser-friendly formats (generally HTML) needed by the reading interface. Each TEI file is transformed into one of several kinds of output for reading or visualization. Transformation is currently done statically, upon upload, but in the future it may be a dynamic process. All generated output files are stored in Samvera for publication and archiving.

User Interface

The TAPAS user interface was originally implemented in Drupal. However, over time we found that Drupal made it more difficult for us to provide the kinds of direct interaction with TEI data that we envisioned for TAPAS, and that it was a poor match for the repository and file management aspects of TAPAS. The user interface was therefore reimplemented as a Samvera application in Rails.

Reading Interface View Packages

The TAPAS reading interface offers several views of each TEI file, which exploit different aspects of the markup and serve different reading and viewing needs. Each view is supported by a “view package” which includes XSLT and CSS stylesheets and JavaScript, plus a manifest documenting the contents and purpose of the view package. The XSLT transformation component determines what information from the TEI source will be made available to the interface, and in what format (HTML, SVG, JSON, etc.). The CSS component provides styling and in some cases also controls the visibility of specific elements depending on user options. The JavaScript provides user interaction and control over the display of user-controlled features (such as the choice to show or hide footnotes, or the choice of colors associated with thematic tagging). View packages are stored as open-source software on GitHub. The TAPAS server periodically checks GitHub, and updates its local cache of view packages when an update has been pushed there.
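The relationship between a view package, its manifest, and the server's local cache can be sketched as a small data structure with a refresh step. Everything named here is hypothetical: the field names, the use of a revision string to detect updates, and the refresh function are assumptions for illustration and do not describe the actual manifest format or the way the TAPAS server polls GitHub.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ViewPackage:
    """Hypothetical stand-in for a view package and its manifest."""
    name: str
    revision: str              # assumed: some identifier that changes on each update
    description: str           # the manifest documents the package's purpose
    xslt: Tuple[str, ...] = ()
    css: Tuple[str, ...] = ()
    javascript: Tuple[str, ...] = ()


def refresh_cache(cache: Dict[str, ViewPackage], remote: Iterable[ViewPackage]) -> List[str]:
    """Copy new or changed packages into the cache; return the names updated."""
    updated = []
    for package in remote:
        cached = cache.get(package.name)
        if cached is None or cached.revision != package.revision:
            cache[package.name] = package
            updated.append(package.name)
    return updated


if __name__ == "__main__":
    cache: Dict[str, ViewPackage] = {}
    v1 = ViewPackage("reader", "rev-1", "Basic reading view", xslt=("reader.xsl",))
    v2 = ViewPackage("reader", "rev-2", "Basic reading view", xslt=("reader.xsl",))
    print(refresh_cache(cache, [v1]))  # first check: package is new
    print(refresh_cache(cache, [v1]))  # nothing changed since the last check
    print(refresh_cache(cache, [v2]))  # an update was published
```

Comparing a revision identifier keeps the periodic check cheap, since the package files themselves only need to be copied when something has changed.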


For more detail on the technical components, please visit our GitHub space for information and code:

Hydra Repository and web app https://github.com/NEU-DSG/tapas_rails

Drupal Front-end (TAPAS 1.0) Modules https://github.com/NEU-DSG/tapas-modules

Drupal Themes https://github.com/NEU-DSG/tapas-themes

eXist application https://github.com/NEU-DSG/tapas-xq
