Motivation
If, at the lowest level,
all digital libraries represent their collections as ones and zeros,
beyond that they are extremely diverse.
Digital libraries may hold mathematical papers in TeX format of Wiles' proof of Fermat's Last Theorem,
3-D models of an architectural reconstruction of the Coliseum in Rome,
a database of genetic information from the Human Genome Project,
videos of a doctor performing an experimental operating technique,
photographs of the Civil War taken by Mathew Brady,
a recording of a Stockhausen string quartet played in helicopters in flight,
and so on, ad infinitum.
The content of digital libraries can be grouped broadly by media type as
text, image, video, data, sound, and others. Within those
types are many concrete formats such as, for text, HTML, XML,
Microsoft Word, PDF, PostScript, TeX/LaTeX, Lout, WordPerfect,
PowerPoint, and troff, to name but a handful of the most popular.
A digital library's collection or corpus typically spans many media types
and formats; moreover, as hinted at in the examples above, the intellectual content itself
usually exhibits nontechnical, idiosyncratic characteristics.
If a digital library is to be more than a data repository,
we claim that
it should endeavor to provide a custom user interface
to its collection with the following qualities:
- Beyond the inherent limitations
of a possible physical-to-digital conversion,
the interface should fully support the special qualities
of individual elements of the collection,
so that examining and manipulating them
is at no disadvantage compared with possessing the physical object or
using the software application that created it.
- The interface should exploit advanced features
made possible by the digital representation,
including remote access, full-content search,
and discussions and annotation.
As practical considerations,
- Where the content of libraries is similar,
digital libraries and, more pointedly, their users
should be able to share and take advantage of the
best interface techniques regardless of the source.
For instance, a user should be able to take the
3-D model viewer from one site,
annotation tools from another,
and apply them to data from a third library.
- Since new ideas are continually being invented,
there should be some mechanism to augment or supersede
tools yet maintain the technological framework
of the existing interface.
For instance, given a new "sound visualization" of
the 3-D model above, we would like to integrate it with the viewer,
annotation tools, and data,
while not requiring additional work from those three existing sites.
Survey of Mechanisms
The most popular implementation of a digital library today
is as a web server speaking HTTP
over the Internet to web browsers.
This has proven extraordinarily successful.
Yet we believe that there is room for improvement.
Let us survey the variety of current practice,
and then in the next section examine one system
that delivers all four of the
desirable qualities presented above.
Custom Application
Before the Web, different media and collections were often
supported by custom applications.
Custom applications were often tailored to exceptionally fine
degree, supporting the most idiosyncratic aspects of the data.
However, custom applications consume a large amount of work
relative to their applicability.
Even when the general functionality is common across domains,
the implementation must be duplicated.
For instance, hyperlinks have been retrofitted into DVI viewers,
Microsoft Word, PowerPoint, UNIX manual page viewers, help systems,
and seemingly every other text viewer
— time after time duplicating work.
HTML in a Web Browser
With a web browser, a digital library can take advantage of
a document engine that is universally available, networked,
cross platform, relatively high quality (HTML vs. ASCII), multimedia
and, to a limited degree, two-way, with communication via hyperlinks and forms. With JavaScript and
the Document Object Model (DOM),
one has programming-language control of the entire document.
Moreover, a server can easily generate HTML
and communicate with a potentially very large database.
However, confronted with the diversity of digital library content,
it is apparent that HTML, per se, is appropriate only for a certain fraction
of primarily text- and graphics-based documents.
Moreover, even within that subset,
HTML is not appropriate for those that require fine control of
appearance or, even with conversions,
for many documents in different concrete formats.
JavaScript controlling a DOM does not add fundamental capability to HTML;
it merely brings interactive speed to manipulations that would otherwise
require page regeneration by a server.
Java applet, Flash, other plug-ins
Of course, within a browser one is not limited to HTML. Java
applets, Macromedia Flash, and other plug-ins, such as those for PDF or VRML,
have full control over a
portion of the browser window and can be programmed with general-purpose
programming languages.
After perhaps a one-time installation of the plug-in,
content written for one of these systems is easily distributed.
However, this class of system shares similar limitations.
Effectively, they are restricted to their rectangle on the screen,
aside from interaction with other such rectangles and limited
interaction with the browser itself.
They do not interact well with one another,
especially if not prearranged.
For example, the Adobe
Acrobat plug-in
has desirable tools for annotation, but the tools operate only
on PDF, not on HTML or applets or other plug-ins.
Likewise, unanticipated composition is virtually non-existent:
while it is straightforward to write HTML to
juxtapose several applets and plug-ins on the same page,
they will not cooperate with one another in ways their authors did not plan for.
Mozilla, Amaya, and Open Source
As compared to proprietary web browsers, the
Mozilla browser
and W3C's experimental Amaya browser
seem to offer inventors an inviting playground in which to
execute their ideas. As Open Source projects, the browsers'
source code is available for arbitrary changes.
Furthermore, Open Source viewers can be found for many but probably not
all other
document formats, which may be good bases for extending the browsers.
Several researchers have taken this path, for example
extending the early NCSA Mosaic
to experiment with annotations, advanced stylesheets, and more.
But the situation is not ideal. In the first place, retrofitting a
feature into the browser for one document format still neglects all
the other document formats one works with regularly;
annotation tools for web pages do nothing for email, PDF,
net news, and so on.
Moreover, in the case of Mozilla, the system is enormous: a recent version
(it is even larger now)
comprises 3817 C++ files and over 1.6 million lines of code.
While it is modularized, there is
nevertheless a significant amount to master before one can
think about extending it.
When it comes to the essential point of distribution, Open Source
systems are not necessarily an improvement over closed source.
Open Source proponents claim that one can simply redistribute
the source with changes, and indeed this can be done. In practice,
however, any useful system is continually being improved
and one would want to track
new versions, but revising one's changes to match new mainline
source changes is tedious at best. One can contribute the changes
back to the main project developers, but they may not be
accepted, especially if they are of limited applicability or if they
are large and thus hard for the main developers to maintain.
The Multivalent Browser
The Multivalent Browser, developed in UC Berkeley's Digital Library
Project, is our candidate for the system that provides most of
the strengths of the mechanisms described above but that suffers
the fewest limitations.
The Multivalent Browser is a free, Open Source, pure Java system that
displays and annotates many document formats. It currently
supports PDF, HTML, scanned paper, TeX DVI, UNIX manual pages,
the Plucker e-book format, and more.
All these formats can be annotated in situ by the user, and
annotations are robustly anchored to survive edits to the base
document. It has many advanced features, such as lenses, which range
from the common magnification lens to one that operates on scanned
paper images to show the OCR translation.
In the screen dump of the Multivalent Browser below,
we see an HTML document in
a window with many of the familiar web browser controls,
including menubar, forward/backward buttons, and URI type-in line.
If this were a PDF or TeX DVI format document, it would have controls
to move from page to page. If it were a UNIX manual page,
the content would be a collapsible outline.

Screen shot from the Multivalent Browser.
The HTML page is variously annotated.
At the bottom is a yellow highlight,
constructed by selecting a range of text and choosing a menu option.
Just above it is a "short message" annotation, which
has reformatted the text to open up space between lines.
Floating on top of the page is a note annotation
with styled text, which has itself been annotated.
The red line in the middle left from "Oklahoma" to "Microsoft"
is a "move text" annotation;
when someone with write permission clicks on the source,
the annotation executes, deleting the text from the source
and inserting it at the destination.
The same annotation code operates across document formats;
if we were viewing the page image of scanned paper,
the "short message" annotation would reformat the image.
Finally, two lenses are in operation.
The magnification lens, at its default zoom factor of 200%,
is enlarging text, image, and even the floating note annotation.
The "Rot-13" lens implements a simple cypher,
sometimes used to obscure jokes,
that replaces each letter with the one 13 places further along the alphabet, wrapping around at the end.
Where the lenses overlap, their effects combine,
in this case giving us magnified Rot-13 text.
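The combination of overlapping lenses can be illustrated with a minimal sketch. It is not taken from the Multivalent source: the class `LensSketch`, the `Span` type, and the modeling of a lens as a function from one displayed span to another are all assumptions made for illustration. The Rot-13 transformation itself is the standard one.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: lenses as transformations that combine where they overlap. */
final class LensSketch {
    /** A run of displayed text plus the zoom factor at which it is drawn. */
    static final class Span {
        final String text;
        final double zoom;

        Span(String text, double zoom) {
            this.text = text;
            this.zoom = zoom;
        }
    }

    /** Rot-13: shift each ASCII letter 13 places, wrapping around the alphabet. */
    static String rot13(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                out.append((char) ('a' + (c - 'a' + 13) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + (c - 'A' + 13) % 26));
            } else {
                out.append(c); // digits, spaces, and punctuation pass through
            }
        }
        return out.toString();
    }

    /** Assumed lens model: each lens maps a span to a transformed span. */
    static final UnaryOperator<Span> ROT13_LENS = s -> new Span(rot13(s.text), s.zoom);
    static final UnaryOperator<Span> MAGNIFY_LENS = s -> new Span(s.text, s.zoom * 2.0);

    /** Apply, in order, every lens that covers the span. */
    static Span applyAll(List<UnaryOperator<Span>> lenses, Span span) {
        Span result = span;
        for (UnaryOperator<Span> lens : lenses) {
            result = lens.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        Span out = applyAll(List.of(MAGNIFY_LENS, ROT13_LENS), new Span("Hello, World!", 1.0));
        System.out.println(out.text + " @ " + out.zoom); // prints "Uryyb, Jbeyq! @ 2.0"
    }
}
```

Because these two lenses touch different properties of the span, the order of application does not matter here. Lenses that modify the same property would need an agreed ordering.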
Multivalent Architecture
The above features happen to be implemented and packaged together, but
they could just as well have been third-party extensions. The architecture
is extremely open to extension — much more so than a web browser —
whether with new document formats or arbitrary features.
A brief look at the Multivalent architecture
helps assess its applicability for digital library interfaces.
- Hubs. Conceptually, every document gets a custom browser. The
"code components" active for a given document are given in a hub, which
is an XML document that lists the components and whose attributes and
content subtrees are passed to components when they are instantiated
for the document. The annotations seen above are stored in a hub in
writable storage and keyed to the URL. Because it is convenient and
more efficient, all documents of a given format also share a common
hub particular to the format, and all documents share a hub of
universal system functionality.
- Behaviors. Behaviors are the "code components" listed in hubs:
Java classes that adhere to the protocols described below.
Behaviors range from "media adaptors" for PDF and HTML, to
functionality shared across document formats such as searching and
annotation, to features that are hardcoded in other systems such as
the menubar, other GUI widgets, and hyperlinks. While some behaviors
happen to be packaged with the basic system, all could have been
written as third party extensions and integrated simply by downloading
a Java JAR file of classes and organizing hubs
into the same directory as the main browser.
- Document Tree. The reason behaviors can operate across document
formats is that they do not separately target some large number of
concrete formats, but rather a single well-defined high-level model of
a document. Document structure is captured by internal nodes, and
format-specific content at the leaves. The document tree is similar
to HTML's DOM, but whereas that applies only to HTML and XML, in the
Multivalent model behaviors can introduce new node types and whatever
else is necessary — try faithfully representing PDF and implementing
lenses in JavaScript and DOM.
By targeting a document tree with guaranteed properties across
document formats, a single implementation of a behavior,
such as annotations, works on all document formats, including all
those currently supported by the system and ones yet to be invented.
- Protocols.
The reason behaviors can be combined arbitrarily in hubs
and not conflict with each other is that communication among them is
highly stylized in protocols.
Every basic operation of a digital
document system — restore from disk to active memory, build
document tree from concrete format, format document tree, paint
tree, interact via low-level events (mouse and keyboard) and high-level
semantic events ("open document") — all are reified into
protocols.
Any behavior can hook into any protocol and modify its results.
For instance, in painting a document
on the screen, the basic document is painted, then the behavior
controlling the move text annotation draws a line on top. These
sweeping protocol passes have proven remarkably effective at
preventing conflicts between behaviors, but other more specific
communication mechanisms are possible; for example, in the case of lens
composition, a shared "manager" behavior computes lens overlaps and
sequences composition effects.
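The interplay of document tree, behaviors, and protocols described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Multivalent's actual API: the names `ProtocolSketch`, `Node`, `Behavior`, `paintBefore`, and `paintAfter` are hypothetical, the "canvas" is just a list of drawing commands, and only the paint protocol is modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of behaviors hooking into a paint protocol. */
final class ProtocolSketch {
    /** Document tree node: internal nodes carry structure, leaves carry content. */
    static final class Node {
        final String name;
        final String content; // null for internal (structural) nodes
        final List<Node> children = new ArrayList<>();

        Node(String name, String content) {
            this.name = name;
            this.content = content;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Assumed protocol: a behavior may act before and/or after the base paint. */
    interface Behavior {
        default void paintBefore(Node root, List<String> canvas) {}
        default void paintAfter(Node root, List<String> canvas) {}
    }

    /** Draws a highlight underneath the text. */
    static final class Highlight implements Behavior {
        @Override
        public void paintBefore(Node root, List<String> canvas) {
            canvas.add("highlight:yellow");
        }
    }

    /** Draws the "move text" line on top of the painted document. */
    static final class MoveTextAnnotation implements Behavior {
        @Override
        public void paintAfter(Node root, List<String> canvas) {
            canvas.add("line:move-text");
        }
    }

    /** Base painting: emit leaf content in document order. */
    static void paintTree(Node node, List<String> canvas) {
        if (node.content != null) {
            canvas.add("text:" + node.content);
        }
        for (Node child : node.children) {
            paintTree(child, canvas);
        }
    }

    /** One sweeping pass: "before" hooks, base paint, then "after" hooks. */
    static List<String> paint(Node root, List<Behavior> behaviors) {
        List<String> canvas = new ArrayList<>();
        for (Behavior b : behaviors) b.paintBefore(root, canvas);
        paintTree(root, canvas);
        for (Behavior b : behaviors) b.paintAfter(root, canvas);
        return canvas;
    }

    public static void main(String[] args) {
        Node root = new Node("html", null)
                .add(new Node("p", "Oklahoma"))
                .add(new Node("p", "Microsoft"));
        List<Behavior> hub = List.of(new Highlight(), new MoveTextAnnotation());
        // prints "[highlight:yellow, text:Oklahoma, text:Microsoft, line:move-text]"
        System.out.println(paint(root, hub));
    }
}
```

Neither behavior knows about the other or about the concrete document format. Each only sees the tree and the shared pass, which is the property the text credits for letting behaviors be combined freely.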
Conclusion
With arbitrary functionality applied to arbitrary documents, the
Multivalent Browser supports interfaces to idiosyncratic content to
the same degree as custom applications. Although its protocols appear quite
basic, they have, without tweaking, proven capable of supporting such
advanced features as lenses and annotation. As a client-centric
system, the interface to content is not confined to a specific set
of features of the server where the content happens to reside; through
edits to the applicable hubs, behaviors can be combined from arbitrary
sources and applied as appropriate to arbitrary content. For these
reasons, we call the Multivalent Browser "a platform for new ideas".
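To make the idea of combining behaviors through hubs concrete, the following sketch reads behavior entries from several hub documents and layers them in the order system-wide, per-format, per-document. The hub syntax shown (`<hub>`, `<behavior class="...">`), the class `HubSketch`, and the behavior class names are invented for illustration; the actual hub schema is not given here. Only the standard Java XML parsing API is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: reading and layering hubs that list behaviors. */
final class HubSketch {
    /** One behavior entry: the class to instantiate plus the attributes passed to it. */
    static final class Entry {
        final String className;
        final Map<String, String> attributes;

        Entry(String className, Map<String, String> attributes) {
            this.className = className;
            this.attributes = attributes;
        }
    }

    /** Reads <behavior class="..."/> elements in document order (invented schema). */
    static List<Entry> parse(String hubXml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(hubXml)))
                    .getDocumentElement();
            NodeList nodes = root.getElementsByTagName("behavior");
            List<Entry> entries = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                NamedNodeMap attrs = e.getAttributes();
                Map<String, String> params = new HashMap<>();
                for (int j = 0; j < attrs.getLength(); j++) {
                    String name = attrs.item(j).getNodeName();
                    if (!name.equals("class")) {
                        params.put(name, attrs.item(j).getNodeValue());
                    }
                }
                entries.add(new Entry(e.getAttribute("class"), params));
            }
            return entries;
        } catch (Exception ex) {
            throw new IllegalArgumentException("malformed hub", ex);
        }
    }

    /** Concatenates the entries of several hubs, preserving the given order. */
    static List<Entry> layer(String... hubsInOrder) {
        List<Entry> all = new ArrayList<>();
        for (String hub : hubsInOrder) {
            all.addAll(parse(hub));
        }
        return all;
    }

    public static void main(String[] args) {
        String system = "<hub><behavior class='example.Hyperlinks'/></hub>";
        String format = "<hub><behavior class='example.HtmlAdaptor'/></hub>";
        String document = "<hub><behavior class='example.Highlight' color='yellow'/></hub>";
        for (Entry e : layer(system, format, document)) {
            System.out.println(e.className + " " + e.attributes);
        }
    }
}
```

Under this model, adding a third-party behavior to a document amounts to adding one element to the applicable hub; nothing in the server holding the content needs to change.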
Availability
The Multivalent Browser is available from
<http://www.cs.berkeley.edu/~phelps/Multivalent/>.
It is available as executable and source code.
It is covered by the BSD license and therefore free for all uses.
Acknowledgement
This research was supported by the Digital Libraries Initiative,
under grant NSF CA98-17353.
References
[Adobe] Adobe Acrobat plug-in,
http://www.adobe.com/products/acrobat/readstep.html
[Amaya] World Wide Web Consortium, Amaya,
http://www.w3.org/Amaya/
[Document Object Model] World Wide Web Consortium, Document Object Model,
http://www.w3.org/DOM/
[Mozilla] Mozilla home page,
http://www.mozilla.org/
[NCSA Mosaic] National Center for Supercomputing Applications (NCSA), Mosaic,
http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/help-about.html
[Phelps and Wilensky] Thomas A. Phelps and Robert Wilensky,
"The Multivalent Browser: A Platform for New Ideas",
Proceedings of Document Engineering 2001,
November 2001, Atlanta, Georgia.
[Plucker] Plucker e-book format,
http://www.plkr.org/
Copyright ©
Thomas A. Phelps and Robert Wilensky