TCDL Bulletin

Mechanisms for Custom Interfaces

If a digital library is to be more than a data repository, we claim that it should endeavor to provide a custom user interface to its collection. This article surveys various mechanisms for such customization.

Thomas A. Phelps and Robert Wilensky
Computer Science Division
University of California, Berkeley

Motivation

If, at the lowest level, all digital libraries represent their collections as ones and zeros, beyond that they are extremely diverse. Digital libraries may hold mathematical papers in TeX format of Wiles' proof of Fermat's Last Theorem, 3-D models of an architectural reconstruction of the Coliseum in Rome, a database of genetic information from the Human Genome Project, videos of a doctor performing an experimental operating technique, photographs of the Civil War taken by Mathew Brady, a recording of a Stockhausen string quartet played in helicopters in flight, and so on, ad infinitum.

The content of digital libraries can be grouped broadly by media type as text, image, video, data, sound, and others. Within those types are many concrete formats such as, for text, HTML, XML, Microsoft Word, PDF, PostScript, TeX/LaTeX, Lout, WordPerfect, PowerPoint, and troff, to name but a handful of the most popular. A digital library's collection or corpus typically spans many media types and formats. Moreover, as hinted at in the examples above, the intellectual content itself usually exhibits idiosyncratic characteristics that are not technical in nature.

If a digital library is to be more than a data repository, we claim that it should endeavor to provide a custom user interface to its collection with the following qualities:

  1. Beyond the inherent limitations of any physical-to-digital conversion, the interface should fully support examining and manipulating individual elements of the collection, including their special qualities, so that the user is at no disadvantage compared with possessing the physical object or the application that created it.
  2. The interface should exploit advanced features made possible by the digital representation, including remote access, full-content search, and discussion and annotation.

As practical considerations,

  1. Where the content of libraries is similar, digital libraries and, more pointedly, their users should be able to share and take advantage of the best interface techniques regardless of the source. For instance, a user should be able to take the 3-D model viewer from one site, annotation tools from another, and apply them to data from a third library.
  2. Since new ideas are continually being invented, there should be some mechanism to augment or supersede tools yet maintain the technological framework of the existing interface. For instance, given a new "sound visualization" of the 3-D model above, we would like to integrate it with the viewer, annotation tools, and data, while not requiring additional work from those three existing sites.

Survey of Mechanisms

The most popular implementation of a digital library today is as a World Wide Web server speaking HTTP over the Internet to web browsers. This has proven extraordinarily successful. Yet we believe that there is room for improvement. Let us survey the variety of current practice, and then in the next section examine one system that delivers all four of the desirable qualities presented above.

Custom Application

Before the Web, different media and collections were often supported by custom applications. These applications were frequently tailored to an exceptionally fine degree, supporting the most idiosyncratic aspects of the data.

However, custom applications require a large amount of work relative to their applicability. Even when the general functionality is common across domains, the implementation must be duplicated. For instance, hyperlinks have been retrofitted into DVI viewers, Microsoft Word, PowerPoint, UNIX manual page viewers, help systems, and seemingly every other text viewer, duplicating the work time after time.

HTML in a Web Browser

With a web browser, a digital library can take advantage of a document engine that is universally available, networked, cross platform, relatively high quality (HTML vs. ASCII), multimedia-capable and, to a limited degree, two-way, with communication via hyperlinks and forms. With JavaScript and the Document Object Model (DOM), one has programming language control of the entire document. Moreover, a server can easily generate HTML and communicate with a potentially very large database.

However, confronted with the diversity of digital library content, it is apparent that HTML, per se, is appropriate only for a certain fraction of primarily text- and graphics-based documents. Moreover, even within that subset, HTML is not appropriate for those that require fine control of appearance or, even with conversions, for many documents in different concrete formats. JavaScript controlling a DOM does not add fundamental capability to HTML; it merely brings interactive speed to manipulations that would otherwise require page regeneration by a server.

Java applet, Flash, other plug-ins

Of course, within a browser one is not limited to HTML. Java applets, Macromedia Flash, and other plug-ins, such as those for PDF or VRML, have full control over a portion of the browser window and can be programmed with general purpose programming languages. After perhaps a one-time installation of the plug-in, content written for one of these systems is easily distributed.

However, this class of system shares similar limitations. Effectively, they are restricted to their rectangle on the screen, aside from interaction with other such rectangles and limited interaction with the browser itself. They do not interact well with one another, especially if not prearranged. For example, the Adobe Acrobat plug-in has desirable tools for annotation, but the tools operate only on PDF, not on HTML or applets or other plug-ins. Likewise, unanticipated composition is virtually non-existent; for while it is straightforward to write HTML to juxtapose several applets and plug-ins on the same page, these will not communicate with one another in progressive ways.

Mozilla, Amaya, and Open Source

As compared to proprietary web browsers, the Mozilla browser and W3C's experimental Amaya browser seem to offer inventors an inviting playground in which to execute their ideas. As Open Source projects, the browsers' source code is available for arbitrary changes. Furthermore, Open Source viewers can be found for many but probably not all other document formats, which may be good bases for extending the browsers. Several researchers have taken this path, for example extending the early NCSA Mosaic to experiment with annotations, advanced stylesheets, and more.

But the situation is not ideal. In the first place, retrofitting a feature into the browser for one document format still neglects all the other document formats one works with regularly; annotation tools for web pages do nothing for email, PDF, net news, and so on. Moreover, in the case of Mozilla, the system is enormous: a recent version (it is even larger now) comprises 3817 C++ files and over 1.6 million lines of code. While it is modularized, there is nevertheless a significant amount to master before one can think about extending it.

When it comes to the essential point of distribution, Open Source systems are not necessarily an improvement over closed source. Open Source proponents claim that one can simply redistribute the source with changes, and indeed this can be done. In practice, however, any useful system is continually being improved and one would want to track new versions, but revising one's changes to match new mainline source changes is tedious at best. One can contribute the changes back to the main project developers, but they may not be accepted — especially if they are of limited applicability or if they are large and thus hard for the main developers to maintain.

The Multivalent Browser

The Multivalent Browser, developed in UC Berkeley's Digital Library Project, is our candidate for the system that provides most of the strengths of the mechanisms described above but that suffers the fewest limitations.

The Multivalent Browser is a free, Open Source, pure Java system that displays and annotates many document formats. It currently supports PDF, HTML, scanned paper, TeX DVI, UNIX manual pages, the Plucker e-book format, and more. All these formats can be annotated in situ by the user, and annotations are robustly anchored to survive edits to the base document. It has many advanced features, such as lenses, which range from the common magnification lens to one that operates on scanned paper images to show the OCR translation.

In the screen dump of the Multivalent Browser below, we see an HTML document in a window with many of the familiar web browser controls, including menubar, forward/backward buttons, and URI type-in line. If this were a PDF or TeX DVI format document, it would have controls to move from page to page. If it were a UNIX manual page, the content would be a collapsible outline.

Screen shot from the Multivalent Browser.

The HTML page is variously annotated. At the bottom is a yellow highlight, constructed by selecting a range of text and choosing a menu option. Just above it is a "short message" annotation, which has reformatted the text to open up space between lines. Floating on top of the page is a note annotation with styled text, which has itself been annotated. The red line in the middle left from "Oklahoma" to "Microsoft" is a "move text" annotation; when someone with write permission clicks on the source, the annotation executes, deleting the text from the source and inserting it at the destination. The same annotation code operates across document formats; if we were viewing the page image of scanned paper, the "short message" annotation would reformat the image.

Finally, two lenses are in operation. The magnification lens, at its default zoom factor of 200%, is enlarging text, image, and even the floating note annotation. The "Rot-13" lens implements a simple cipher, sometimes used to obscure jokes, that replaces each letter with the one 13 places further along the alphabet, wrapping around at the end. Where the lenses overlap, their effects combine, in this case giving us magnified Rot-13 text.
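
The composition of overlapping lenses can be illustrated with a small Java sketch. This is not the Multivalent implementation: the class LensDemo and its methods are hypothetical names chosen for illustration, lenses are modeled as plain functions on strings, and "magnification" is reduced to tagging the text with a zoom factor rather than drawing anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch; none of these names come from the Multivalent source.
final class LensDemo {

    /** Rot-13: shift each ASCII letter 13 places, wrapping around; other characters pass through. */
    static String rot13(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                out.append((char) ('a' + (c - 'a' + 13) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + (c - 'A' + 13) % 26));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Stand-in for the magnification lens: records the zoom factor instead of rendering. */
    static String magnify(String text) {
        return "[200%]" + text;
    }

    /** Applies every lens that covers a region, in sequence. */
    static String compose(List<UnaryOperator<String>> lenses, String text) {
        String result = text;
        for (UnaryOperator<String> lens : lenses) {
            result = lens.apply(result);
        }
        return result;
    }

    /** The region where both lenses overlap sees both effects. */
    static String overlap(String text) {
        List<UnaryOperator<String>> lenses = new ArrayList<>();
        lenses.add(LensDemo::rot13);
        lenses.add(LensDemo::magnify);
        return compose(lenses, text);
    }

    public static void main(String[] args) {
        System.out.println(overlap("Hello")); // prints "[200%]Uryyb"
    }
}
```

Because 13 is half of 26, applying rot13 twice returns the original text, which is why the same lens both obscures and reveals.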

Multivalent Architecture

The above features happen to be implemented and packaged together, but they could just as well have been third party extensions. The architecture is extremely open to extension, much more so than a web browser, whether with new document formats or arbitrary features.

A brief look at the Multivalent architecture helps assess its applicability for digital library interfaces.

  • Hubs. Conceptually, every document gets a custom browser. The "code components" active for a given document are given in a hub, which is an XML document that lists the components and whose attributes and content subtrees are passed to components when they are instantiated for the document. The annotations seen above are stored in a hub in writable storage and keyed to the URL. Because it is convenient and more efficient, all documents of a given format also share a common hub particular to the format, and all documents share a hub of universal system functionality.
  • Behaviors. The "code components" listed in hubs, behaviors are Java classes that adhere to certain protocols described below. Behaviors range from "media adaptors" for PDF and HTML, to functionality shared across document formats such as searching and annotation, to features that are hardcoded in other systems such as the menubar, other GUI widgets, and hyperlinks. While some behaviors happen to be packaged with the basic system, all could have been written as third party extensions and integrated simply by downloading a Java JAR file of classes and organizing hubs into the same directory as the main browser.
  • Document Tree. The reason behaviors can operate across document formats is that they do not separately target some large number of concrete formats, but rather a single well-defined high-level model of a document. Document structure is captured by internal nodes, and format-specific content at the leaves. The document tree is similar to HTML's DOM, but whereas that applies only to HTML and XML, in the Multivalent model behaviors can introduce new node types and whatever else is necessary — try faithfully representing PDF and implementing lenses in JavaScript and DOM. By targeting a document tree with guaranteed properties across document formats, a single implementation of a behavior, such as annotations, works on all document formats, including all those currently supported by the system and ones yet to be invented.
  • Protocols. The reason behaviors can be combined arbitrarily in hubs and not conflict with each other is that communication among them is highly stylized in protocols. Every basic operation of a digital document system is reified into a protocol: restoring from disk to active memory, building the document tree from the concrete format, formatting the document tree, painting the tree, and interacting via low-level events (mouse and keyboard) and high-level semantic events ("open document"). Any behavior can hook into any protocol and modify its results. For instance, in painting a document on the screen, the basic document is painted, then the behavior controlling the move text annotation draws a line on top. These sweeping protocol passes have proven remarkably effective at preventing conflicts between behaviors, but other more specific communication mechanisms are possible; for example, in the case of lens composition, a shared "manager" behavior computes lens overlaps and sequences composition effects.
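
To make the interplay of document tree, behaviors, and protocols concrete, here is a minimal Java sketch of a single paint pass. It is an illustration under stated assumptions, not the Multivalent API: DocNode, Behavior, MoveTextLine, and ProtocolDemo are hypothetical names, the "canvas" is just a list of strings, and the before/after hook pair is one assumed way a behavior might hook into a protocol.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical document tree node: structure in internal nodes, content at the leaves. */
final class DocNode {
    final String name;
    final String content; // null for internal nodes
    final List<DocNode> children = new ArrayList<>();

    DocNode(String name, String content) {
        this.name = name;
        this.content = content;
    }

    DocNode add(DocNode child) {
        children.add(child);
        return this;
    }
}

/** Hypothetical behavior: optional hooks around one protocol (paint). */
interface Behavior {
    default void beforePaint(DocNode root, List<String> canvas) {}
    default void afterPaint(DocNode root, List<String> canvas) {}
}

/** Draws a line on top of the painted document, in the spirit of the "move text" annotation. */
final class MoveTextLine implements Behavior {
    private final String from;
    private final String to;

    MoveTextLine(String from, String to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public void afterPaint(DocNode root, List<String> canvas) {
        canvas.add("line:" + from + "->" + to);
    }
}

final class ProtocolDemo {
    /** Base painting: emit leaf content in document order, whatever format built the tree. */
    static void paintTree(DocNode node, List<String> canvas) {
        if (node.content != null) {
            canvas.add("text:" + node.content);
        }
        for (DocNode child : node.children) {
            paintTree(child, canvas);
        }
    }

    /** One pass of the paint protocol: before hooks, base paint, after hooks. */
    static List<String> paint(DocNode root, List<Behavior> behaviors) {
        List<String> canvas = new ArrayList<>();
        for (Behavior b : behaviors) b.beforePaint(root, canvas);
        paintTree(root, canvas);
        for (Behavior b : behaviors) b.afterPaint(root, canvas);
        return canvas;
    }

    static List<String> demoCanvas() {
        DocNode root = new DocNode("body", null)
            .add(new DocNode("p", "Oklahoma"))
            .add(new DocNode("p", "Microsoft"));
        List<Behavior> behaviors = new ArrayList<>();
        behaviors.add(new MoveTextLine("Oklahoma", "Microsoft"));
        return paint(root, behaviors);
    }

    public static void main(String[] args) {
        // prints [text:Oklahoma, text:Microsoft, line:Oklahoma->Microsoft]
        System.out.println(demoCanvas());
    }
}
```

The behavior never inspects the concrete format; it sees only the tree and the shared canvas, which is the property that lets one annotation implementation serve every format a media adaptor can turn into a tree.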

Conclusion

With arbitrary functionality applied to arbitrary documents, the Multivalent Browser supports interfaces to idiosyncratic content to the same degree as custom applications. Although its protocols appear quite basic, they have, without tweaking, proven capable of supporting such advanced features as lenses and annotation. As a client-centric system, the interface to content is not confined to a specific set of features of the server where the content happens to reside; through edits to the applicable hubs, behaviors can be combined from arbitrary sources and applied as appropriate to arbitrary content. For these reasons, we call the Multivalent Browser "a platform for new ideas".

Availability

The Multivalent Browser is available from <http://www.cs.berkeley.edu/~phelps/Multivalent/>. It is available as executable and source code. It is covered by the BSD license and therefore free for all uses.

Acknowledgement

This research was supported by the Digital Libraries Initiative, under grant NSF CA98-17353.

References

[Adobe] Adobe Acrobat plugin, http://www.adobe.com/products/acrobat/readstep.html

[Amaya] World Wide Web Consortium, Amaya, http://www.w3.org/Amaya/

[Document Object Model] World Wide Web Consortium, Document Object Model, http://www.w3.org/DOM/

[Mozilla] Mozilla home page, http://www.mozilla.org/

[NCSA Mosaic] National Center for Supercomputing Applications (NCSA), Mosaic, http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/help-about.html

[Phelps and Wilensky] Thomas A. Phelps and Robert Wilensky, "The Multivalent Browser: A Platform for New Ideas", Proceedings of Document Engineering 2001, November 2001, Atlanta, Georgia.

[Plucker] Plucker e-book format, http://www.plkr.org/

Copyright © Thomas A. Phelps and Robert Wilensky