Volume 6 Issue 1
Spring 2010
ISSN 1937-7266

Supporting Effective User Navigation in Digital Documents

Jennifer Pearson

Future Interaction Technology Laboratory,
University of Wales, Swansea
csjen@swan.ac.uk

Doctoral Consortium Extended Abstract

1. Introduction

Electronic documents such as PDFs are a large part of the growing digital age and will no doubt form a substantial portion of any paperless office [10]. The harsh truth, however, is that electronic documents differ greatly from traditional paper-based material. It has been widely documented that reading on paper is far more popular than reading on a computer, with many users choosing to print documents rather than read them on a computer screen.

Incorporating tools into digital document readers to aid users in day-to-day tasks will enhance their performance and hopefully increase user uptake. My research on this topic centres on several areas of document navigation, focusing specifically on current physical (paper) practices in order to enhance their digital equivalents. Each digital tool I have built has then been critically analysed by means of user studies with participants from an appropriate target audience. To date, I have investigated three areas of document navigation: Placeholders [2], Annotation [9] and Indexing [8], and have several more ideas in mind to take the topic further.

I feel that this is a promising digital library topic as it has many practical implications for the community. My work in this area has resulted in three publications [2,8,9], and feedback on its direction would now be most welcome as I complete the final stages of my PhD.

2. Background

The reluctance to read from digital screens has often been attributed to the eye strain caused by backlit displays. This has prompted the explosion of e-paper technology used in many popular ebook readers such as the Sony Reader and Amazon's Kindle. Despite these advances in technology, however, many users still avoid digital readers and continue to buy large physical books.

Another popular theory for the low popularity of digital readers is connected with the affordances that paper offers over digital. The physical properties of paper (e.g. that it is light, thin and flexible) afford many actions that are not possible on its digital counterparts. Activities such as folding, ripping and flicking all contribute to the ease with which physical documents are manipulated, and are difficult to replicate in the digital domain.

A useful way of describing the affordances that paper documents offer is the notion of "lightweight navigation," introduced by Catherine C. Marshall in the 2005 paper "Turning the Page on Navigation" [5]. Marshall defines the term "lightweight" as navigation that occurs either when people reach a particular page or when they move within an article in a way that is so unselfconscious that they aren't apt to remember it later. She then goes on to describe four separate lightweight navigation types:

  • Narrowing or broadening focus by manipulating the physical magazine
  • Letting one's eyes stray to a page element out of the textual flow
  • Looking ahead in the text to preview or anticipate
  • Looking back to re-read for context

These properties, which are common activities on paper, are rarely seen in digital document navigation. She speculates, however, that this concept of "lightweight" interaction can also be applied to digital technology, but does not give any concrete evidence to support it. There are many aspects of computerised technology that far exceed the capabilities of paper (e.g. searching and zooming), and by paying closer attention to the possibility of "lightweight" navigation (and also its opposite, "heavyweight"), digital document software can not only incorporate the physical affordances of paper but also improve upon them by surpassing paper's limitations. With this in mind, the goal is to prove by example that lightweight navigation is indeed possible on a digital level. From these examples it will then be possible to re-define the digital equivalent of the term and determine how different the physical and digital definitions are.

There are many areas of reading and manipulating digital media that suffer from unintuitive interactions and low rates of use. Focusing on these areas by prototyping digital tools with lightweight features will enable the digital definition to be determined. Thus far, the areas of research that have been evaluated are: digital placeholders, annotation and indexes.

3. Completed Work

3.1. Placeholders

Placeholders in physical documents are a long-established method of locating information and provide crucial support for readers in remembering their place in the text. Placeholders can take many different forms such as scrap paper, dog-eared corners and Post-it notes, but all share one common characteristic: they are all "lightweight." Marshall's definition of lightweight on physical documents extends to all types of physical placeholders, namely that they can "move within an article in such a way that they are not apt to remember it later."

Placeholders, typically reified in the physical world as "bookmarks," allow users the freedom to relocate information as well as offering support in placing notes in close proximity to text without damaging the document itself. Unfortunately, the equivalent tools on digital documents are far less intuitive and are consequently more time-consuming and cumbersome to use. Two examples of this poorly used, "heavyweight" placeholder interaction are found in web browsers and paginated document readers.

Bookmarks, or "favourites" as they are often called, have been a feature of web browsers since very early in their development. Unfortunately, despite extensive research into their observed usage [1], there have been very few enhancements to this feature since then. It has been documented that web bookmarking is more commonly used as a long-term archival aid than as a method of relocating well-known material. Abrams et al. [1], for example, discovered that frequency of use was not a factor in determining whether or not a page was bookmarked, with users commonly choosing to memorise URLs or use search engines to locate commonly visited sites. Their research concluded, then, that the majority of web bookmarks were used for archival purposes, i.e. when users wanted a permanent record of a particular page.

Web bookmarks generally take the form of an un-ordered list, i.e. the bookmarks appear in the list in the order in which they were added. Although this format is acceptable for bookmarking a collection of un-related web pages, it would be extremely undesirable in a document ordered by page number. Paginated document readers, being sequential, therefore order their bookmarks in a list according to where they appear in the document.

A web page can be considered as a document or a set of linked documents; in either case however, 'bookmarking' will result in a placeholder to the document as a whole as opposed to a specific portion of the page(s). In contrast, paginated document readers allow bookmarks on separate pages of a document, enabling in effect sub-document bookmarking.

Despite the obvious superiority of paginated document placeholders over web browser favourites, users still compare them unfavourably against physical bookmarks [5] and consequently make less use of them. The aim of the implemented system is to bridge the gap between the physical and digital domains to make bookmarking on paginated document readers more intuitive and hopefully increase user opinions of digital placeholding.

The System
To improve the use of placeholders in digital document readers, a visual approach which uses colour and position has been implemented. To mimic the way standard placeholders are used in books, the visual bookmarking system has used coloured 'Tabs' that stick out of the sides of a document. These 'Tabs' are positioned in the virtual space located on either the right or left hand side of the PDF viewing area. The design of this representation has been modelled in the style of a telephone directory, i.e. bookmarks that occur before the current page appear on the left of the PDF area, whereas bookmarks on the current or later pages appear on the right (see Figure 1). To make the relocation process easier, the tabs are also ordered by page number; so the later the page, the further down the display the tab appears. Thus, a bookmark on page 1 will often appear at the top left corner of the display, whereas a bookmark on the last page will be seen on the bottom right.



Figure 1: The Visual Bookmarking System

The physical size of each of these bookmark 'tabs' is dependent upon how many of them there are; the height of each tab is calculated by dividing the height of the PDF viewing area by the number of bookmarks in the document. Creating a new bookmark and similarly deleting an existing bookmark will resize and reorder every tab on the screen.
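The placement rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implemented system: the names `Tab` and `layout_tabs` are hypothetical, and it assumes that each bookmark occupies one vertical slot shared across both sides of the viewing area, which is one reading of the "telephone directory" description.

```python
from dataclasses import dataclass


@dataclass
class Tab:
    page: int      # bookmarked page number
    side: str      # "left" or "right" of the PDF viewing area
    top: float     # vertical offset of the tab from the top of the area
    height: float  # tab height


def layout_tabs(bookmarked_pages, current_page, view_height):
    """Lay out bookmark tabs following the rules described in the text.

    Assumption: vertical slots are shared by both sides, so the tab for
    the earliest page is at the top and the latest page at the bottom.
    """
    pages = sorted(set(bookmarked_pages))
    if not pages:
        return []
    # Height of each tab = viewing-area height / number of bookmarks.
    tab_height = view_height / len(pages)
    tabs = []
    for slot, page in enumerate(pages):
        # Earlier pages go left; the current page and later pages go right.
        side = "left" if page < current_page else "right"
        tabs.append(Tab(page, side, slot * tab_height, tab_height))
    return tabs


tabs = layout_tabs([40, 3, 12, 25], current_page=12, view_height=800)
```

Because the whole layout is recomputed from the list of bookmarked pages, adding or deleting a bookmark naturally resizes and reorders every tab, as described above.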

The aim of the visual bookmarking system is to allow users to quickly and easily relocate information in a document. This "lightweight" interaction has been achieved by a visual representation that mimics methods used in the physical domain (specifically utilising the 'virtual' space surrounding the PDF itself). Each bookmark in the system can be distinguished by its colour as well as its position in the display and clicking on a tab will take the user to the bookmarked page.

Results
To investigate the usability and overall value of the implemented system, a small user study was performed to gain subjective reviews from a range of users. To help users evaluate the system, a set of tasks comparing different methods of placeholding was carried out. To achieve this, two additional systems were implemented based on common methods of placeholding available to date: un-ordered menu (based on web favourites) and ordered list (based on document reader bookmarks).

The study found that the visual approach was the most popular, with an average ease of use score of 7.7 out of 10. It was anticipated that the un-ordered menu would perform the worst, and this was confirmed by an ease of use score of only 4.3 out of 10, compared with 6.4 out of 10 for the ordered list.

3.2. Annotations

Annotations in various different forms have been utilised by many for centuries. Whether it be making notes on a document, marking assignments or even professionally annotating books, annotation allows users the freedom to make their own mark on pre-existing literature. There has been much research into various aspects of digital annotation, including its lack of popularity compared to the alternative on paper. Marshall herself has commented on the "heavyweight" properties of digital annotations [4,6,7], suggesting that "support for a smooth integration of annotation with reading - is the most difficult to interpret from a design point of view and until we as designers get it right it is likely that people will continue to annotate paper materials even as they read materials in a digital library" [4].

It is clear that digital annotation is not as practical or as popular as annotation on the physical plane. The reason for this as suggested by Marshall [6] is that paper provides readers with certain affordances for this sort of active document engagement; affordances that are not easily replicated on the digital level. One crucial aspect of the annotation process that has been given relatively little attention is the concept of space; specifically where the annotations are positioned. Three such examples of physical annotation would be:

  1. Over the document itself;
  2. Around the margins of a document;
  3. On a separate medium, e.g. spare paper.

Although Marshall did identify two types of marking characteristics in her 1997 paper [4], her findings were not categorised according to removable media (i.e. spare paper or extra workbooks), instead focusing on the other characteristics identified within the paper. What has not yet been investigated is which of these methods is the most popular as well as how useful it is having more space to annotate.

To investigate these issues, a paper-based comparison study was undertaken. This study was explicitly designed to test each of the approaches listed above in order to determine the most common method, and focused strongly on the differences between annotating over the document itself and in the margins. The objectives of this study, then, were not only to determine the most popular type of annotation but, more importantly, to discover the differences between within-text and marginal annotations. The results from this study later aided in the design of a new digital system which incorporates the "lightweight" properties of physical annotation in an attempt to improve the digital annotation process.

The System
The paper study found that overall the margins were the most popular location for annotation, with 70% of the users selecting them as their first choice. With this in mind, the digital solution has been designed with an expandable margin area surrounding the PDF document. It also contains two sets of annotation tools: one for annotating over the PDF itself and another for the margin area surrounding it. The tools provided have been carefully selected to mimic the most popular methods observed in the paper study as well as improving on them by incorporating digital techniques.

The pen and eraser tools, for example, have been implemented to imitate the way free-form writing implements are used on paper, whereas the image and text box tools are additional digital features. The highlighter tool has been designed to overcome a major problem identified in the paper study, specifically relating to the separate medium. Many of the participants recruited for the paper study admitted that a separate medium was a practical location for notes as there are no space constraints, but were reluctant to use one due to several problems:

  1. Separate notes can get lost or detached from the original text;
  2. It is hard to reference a specific section of the original text from a separate medium;
  3. Separate notes mean little without the original text to relate them to.

The highlighter tool has been designed to give users unlimited space to write notes as well as providing the facility to link them to any section of the document. This has been achieved by the 'add note' feature which only shows the annotations when the mouse is hovered over the highlighted section of the document.
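One way to picture the link between a highlighted region and its note is a simple hit test on the pointer position. The following Python sketch is an assumption about how such a link could be represented, not a description of the implemented 'add note' feature; the names `Highlight` and `note_under_pointer`, and the use of rectangular regions, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Highlight:
    page: int
    x: float       # left edge of the highlighted rectangle
    y: float       # top edge of the highlighted rectangle
    width: float
    height: float
    note: str = ""  # free-form note of any length, hidden until hover

    def contains(self, page, px, py):
        """True if the pointer (px, py) on `page` lies inside this highlight."""
        return (page == self.page
                and self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def note_under_pointer(highlights, page, px, py):
    """Return the note to display for the pointer position, or None."""
    for highlight in highlights:
        if highlight.note and highlight.contains(page, px, py):
            return highlight.note
    return None


highlights = [
    Highlight(page=2, x=50, y=100, width=200, height=14, note="Check ref [5]"),
    Highlight(page=2, x=50, y=300, width=120, height=14),  # no note attached
]
shown = note_under_pointer(highlights, page=2, px=60, py=105)
```

Keeping the note attached to the highlight addresses the three problems listed above: the note cannot be detached from the text, it references a specific section, and it is always shown alongside the passage it concerns.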



Figure 2: The Margin Annotation System

Results
In order to determine the success of the system, we conducted a user study to answer several research questions and gain insight into users' opinions in order to refine and improve the system. As well as the main annotation system, two additional systems were implemented in order to conduct a comparison study to determine whether people prefer to use tools on the PDF or the margin (research question 1). In total, then, the systems used in the study were:

  • The Original System (Set of tools for the PDF, and a set of tools for the margin)
  • The Margin System (All tools on the margin only)
  • The PDF System (All tools on the PDF Only)

The results of these investigations confirm that there are some tools which are best suited to the margin, while others are best suited to the PDF. This strongly suggests that the margins are a useful addition to marking up the PDF as opposed to a straightforward alternative. In addition to this, the study also indicates that the tools in the system make up a minimal complete set, as each one was the best at a particular type of task (highlighting specific points, highlighting specific areas, making connections between points, illustrating something, making notes). Finally, when asked how aesthetically pleasing the annotations produced with the system were, participants gave an average rating of 8/10.

3.3. Indexing

Searching for relevant information within a document is a tiresome but necessary procedure in the information retrieval process. In physical books the process of locating relevant information is supported by the index; a classic structure that references key terms within a document for easy navigation.

Indexes in paper books, despite being reasonably "lightweight" in terms of their structure, still require the unavoidable process of 'looking up' terms, which requires effort from the user. Furthermore, searching for specific sections of physical documents that do not have indexes can be seen as particularly "heavyweight" as this will require an enormous amount of conscious effort.

In contrast, specific words/phrases in digital documents can be found easily by means of the Ctrl+F function. By facilitating user-defined search terms, Ctrl+F has established itself as a considerable advantage of electronic documents and is relatively "lightweight" compared to the alternative on non-indexed physical documents. However, in many document readers and web browsers this function does not return a list. Instead, it simply steps linearly through the document, highlighting each occurrence one at a time. This sequential interaction is slow and cumbersome if the main aim of the search is to get an overview of the location of text matches in a document, which is the main characteristic of the traditional index system.
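The difference between stepping through matches and obtaining an overview can be illustrated with a small Python sketch. It assumes the document is available as a list of page texts and uses a hypothetical helper, `find_all`, that collects every match location at once rather than revealing them one at a time.

```python
def find_all(pages, term):
    """Return (page_number, character_offset) for every match of `term`.

    `pages` is a list of page texts; page numbers start at 1.
    Matching is case-insensitive and non-overlapping.
    """
    needle = term.lower()
    if not needle:
        return []
    hits = []
    for page_number, text in enumerate(pages, start=1):
        haystack = text.lower()
        start = haystack.find(needle)
        while start != -1:
            hits.append((page_number, start))
            start = haystack.find(needle, start + len(needle))
    return hits


pages = ["paper and paper", "digital", "Paper index"]
hits = find_all(pages, "paper")
```

A sequential find feature would expose these hits one by one; returning the whole list is what makes an index-like overview possible.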

This concept of keyword overview can be a useful feature when performing document triage as it gives the user a solid indication of the areas of the document to observe first. Indexes in books provide this overview but still suffer from several problems:

  1. They are author-created, which restricts them to the static terms chosen when the book is created;
  2. They take time to physically navigate to.

The problems associated with the index structure on physical books however, need not carry over to the digital world. By combining the lightweight characteristics of traditional indexes with the speed of digital search we are able to create a user-defined index builder that overcomes the problems mentioned above. In addition to this, to enhance the visualisation of the index data, colour and size will be used as visual cues to indicate the relevance of each entry.

The System
The main aim of the visual index system is to allow users to create their own index, which in turn gives a visual overview of the most relevant parts of the document based on the keywords entered. It groups the results that appear on the same page (and also occurrences that appear in clustered pages) and visualises their occurrences by means of colour and size. Creating a graphical representation of the data increases cognitive activity, allowing users to better understand the underlying information [11] and giving a clear overview of relevant sections by illustrating where the highest occurrences of each keyword appear.
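The grouping step can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual algorithm: the function names are hypothetical, and "clustered pages" is taken here to mean runs of pages containing matches that are at most `max_gap` pages apart.

```python
from collections import Counter


def count_per_page(pages, term):
    """Count case-insensitive occurrences of `term` on each page (1-based)."""
    needle = term.lower()
    counts = Counter()
    if not needle:
        return counts
    for page_number, text in enumerate(pages, start=1):
        occurrences = text.lower().count(needle)
        if occurrences:
            counts[page_number] = occurrences
    return counts


def cluster_pages(counts, max_gap=1):
    """Merge nearby matching pages into clusters.

    Assumption: pages belong to the same cluster when they are no more
    than `max_gap` pages apart.
    """
    clusters = []
    for page in sorted(counts):
        if clusters and page - clusters[-1]["last"] <= max_gap:
            clusters[-1]["last"] = page
            clusters[-1]["occurrences"] += counts[page]
        else:
            clusters.append(
                {"first": page, "last": page, "occurrences": counts[page]}
            )
    return clusters


pages = ["index index", "an index", "", "", "index"]
counts = count_per_page(pages, "index")
clusters = cluster_pages(counts)
```

Each cluster then becomes one index entry whose occurrence count drives the visual cues.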

The visualisations for the index will take three forms (see Figure 3), which users will be able to toggle between by means of a radio button set on the task bar:

Colour Tag: The Colour Tag system is the same as a traditional index list layout in terms of size, but we have also coloured each link depending upon the number of occurrences of the word on that page/cluster.

Tag Cloud: The Tag Cloud system is an alphabetically sorted, size-weighted list of links that allows users to easily see the relevance of each page or cluster by means of its size and/or colour.

Graph: Harper et al. produced the SmartSkim interface [3], which presented a vertical interactive bar graph representing the document and each section's relative retrieval status values. Working with this idea in mind, we decided to incorporate a simple graph type in the visual indexing solution. The simple horizontal bar chart implemented in the system plots the pages/clusters against the number of occurrences of the keyword/phrase.

All three of these visualisations use colour to indicate which results contain the most occurrences of the keyword, i.e. a gradient from red to blue where red is a 'hot' result (lots of occurrences) and blue is a 'cold' result (few occurrences). The Tag Cloud and Graph interpretations also use size-related cues to visualise keyword occurrences.
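The colour and size cues can be expressed as simple mappings from an occurrence count to a display value. The sketch below assumes a linear blend between pure blue and pure red and a linear font-size scale; the text only specifies a red-to-blue gradient, so the exact interpolation, the function names and the default sizes are all illustrative assumptions.

```python
def heat_colour(count, max_count):
    """Map an occurrence count to an (r, g, b) tuple.

    Few occurrences give blue ('cold'); many give red ('hot').
    Assumption: a linear blend between the two colours.
    """
    if max_count <= 0:
        return (0, 0, 255)
    t = max(0.0, min(1.0, count / max_count))
    return (round(255 * t), 0, round(255 * (1 - t)))


def font_size(count, max_count, smallest=10, largest=32):
    """Map an occurrence count to a font size for a tag-cloud entry."""
    if max_count <= 0:
        return smallest
    t = max(0.0, min(1.0, count / max_count))
    return round(smallest + t * (largest - smallest))


hottest = heat_colour(10, 10)
coldest = heat_colour(0, 10)
```

Normalising by the largest count in the current result set keeps the gradient meaningful for both rare and very frequent keywords.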



Figure 3: The Three Types of Visualisation: Colour Tag, Tag Cloud and Graph

Results
The main aim of the visual index system (VIS) was to integrate the speed of digital search with the visual overview of keyword locations that comes with document indexing. It was anticipated that this would increase the speed with which users can locate relevant information and improve the effectiveness of reading the text. To investigate these issues, a pilot study was performed which gathered qualitative data in the form of subjective user questionnaires, as well as quantitative data obtained from task timings. In order to test the effectiveness of the implemented visualisations, two additional searching methods were written as a baseline to test against. These two techniques were based on long-established methods of document searching: Linear Search (Ctrl+F) and Traditional Indexing:

Linear Search: The linear search method implemented in the program is based on the standard Windows within-document find feature that allows users to sequentially progress through a document one keyword at a time.

Traditional Indexing: The traditional indexing method has been designed to look like the classic index structure, i.e. all entries are the same size and colour. It does, however, have the same 'build' feature as the visual index methods and also includes hyperlinks and tool tips.

The results from this study confirmed the effectiveness of custom index builders over standard sequential search in both the timed tasks and subjective user reviews. In addition, it was also discovered that the custom index builders significantly increase the accuracy with which users can locate relevant material in a document. The results showed that the success rate for selecting the most relevant section almost doubled when using the traditional index rather than the linear search, and increased even further when using the visual index systems. Furthermore, it was also discovered that colour and size play a key role in the visualisation of keyword occurrences. Where colour and size cues were used, users were able to locate relevant sections of a document in a shorter period of time and with higher accuracy.

4. Plan

The aim of this research is to improve digital document navigation techniques by incorporating "lightweight" interaction into their design. In doing so it is hoped that the term "lightweight" can be more clearly defined for digital documents. Thus far, incorporating this technique into usable PDF readers has resulted in user satisfaction and, where appropriate, more efficient tools.

The plan from this point on is to first identify several more areas of digital navigation that are less popular than their physical paper counterparts. Detailed investigation into these physical methods will then give insight into their "lightweight" characteristics and provide a more solid grounding on which to base a digital improvement. Implementation of the digital systems will then be followed by a series of user studies to analyse the proposed solutions and evaluate their overall success.

5. Conclusions

The term "lightweight" interaction was defined by Catherine C. Marshall and can be used to describe the characteristics of navigation on physical documents. Specifically, what she describes as "lightweight" manoeuvres are those that are so intuitive and unselfconscious that users are not apt to remember them later. The equivalent interaction on digital documents, however, has been little studied and only vaguely defined. It stands to reason that if it were possible to achieve this type of intuitive interaction on the digital level, it would increase the usability of systems that employ it.

The goal for my PhD, then, is to prove that "lightweight" navigation is possible in the digital domain by incorporating the concept into digital implementations that currently suffer from poor interactions. These prototype systems can then be tested with users to confirm whether they are indeed an improvement over their "heavyweight" equivalents, so that a digital definition of the term can be concretely specified. Thus far, three areas of digital document navigation have been implemented using "lightweight" features: placeholders, annotation and indexing.

Digital document readers are far from being a replacement for paper, but my research in this area aims to bridge the gap between the physical and digital domains in order to make the navigation of digital documents significantly less cumbersome.

References

[1] David Abrams, Ron Baecker, and Mark Chignell. Information archiving with bookmarks: personal web space construction and organization. In CHI '98: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 41-48, New York, NY, USA, 1998. ACM Press/Addison-Wesley Publishing Co.
 
[2] George Buchanan and Jennifer Pearson. Improving placeholders in digital documents. In Research and Advanced Technology for Digital Libraries, volume 5173 of LNCS, pages 1-12. Springer Berlin / Heidelberg, 2008.
 
[3] David J. Harper, Ivan Koychev, and Yixing Sun. Query-based document skimming: A user-centred evaluation of relevance profiling. In Advances in Information Retrieval, volume 2633 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2003.
 
[4] Catherine C. Marshall. Annotation: from paper books to the digital library. In DL '97: Proceedings of the second ACM international conference on Digital libraries, pages 131-140, New York, NY, USA, 1997. ACM.
 
[5] Catherine C. Marshall and Sara Bly. Turning the page on navigation. In JCDL '05: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, pages 225-234, New York, NY, USA, 2005. ACM.
 
[6] Catherine C. Marshall and A. J. Bernheim Brush. Exploring the relationship between personal and public annotations. In JCDL '04: Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, pages 349-357, New York, NY, USA, 2004. ACM.
 
[7] Catherine C. Marshall and A.J. Bernheim Brush. From personal to shared annotations. In CHI '02: CHI '02 extended abstracts on Human factors in computing systems, pages 812-813, New York, NY, USA, 2002. ACM.
 
[8] Jennifer Pearson, George Buchanan, and Harold Thimbleby. Creating visualisations for digital document indexing. In Print, 2009.
 
[9] Jennifer Pearson, George Buchanan, and Harold Thimbleby. Improving annotations in digital documents. In Print, 2009.
 
[10] Abigail J. Sellen and Richard H.R. Harper. The Myth of the Paperless Office. MIT Press, Cambridge, MA, USA, 2003.
 
[11] Colin Ware. Information Visualization: Perception for Design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.