IEEE TCDL Bulletin
Current 2006
Volume 2   Issue 2


A Spatial Hypermedia Tool to Support the Collection and Organisation of Audio Knowledge

Kirstin Lyon
Department of Computer Science
Aalborg University Esbjerg
Niels Bohrs Vej 8
DK-6700 Esbjerg



Digital libraries allow users to access large amounts of organised audio information. Users of these libraries are able to collect the information they require, which is then stored and organised for later use. A useful feature of some digital libraries is the ability to create personal spaces within the library where users may collect and organise their information in a way that suits them. Some of these personal spaces allow users to organise their information spatially. A difficulty with spatial organisation is that as the amount of information grows, users may become visually overloaded, and re-finding information becomes slower and more error-prone. To overcome this, this paper proposes a tool that allows users to organise their audio collections spatially, using audio cues to reduce the effects of visual overload.


1. Introduction

Audio collections of increasing size and complexity are created and maintained by both professional and casual users. With the growth of the Internet as a public resource, users have potentially large collections of audio information on their computers that need to be organised. Organising is useful because it allows users to find previously stored objects quickly. If objects are organised poorly, users may find it difficult to find relevant objects, or in the worst-case scenario, forget that a relevant object exists.

When searching for objects, users find it easier to use location-based searching, that is, to look first where they expect to find an object [1]. Therefore, finding an appropriate initial place for the object is important. This can be difficult, as objects may belong in more than one place, or have no appropriate place. Re-arranging previously organised objects is time-consuming and not necessarily helpful. Organising takes time; in some cases the time taken to organise is greater than the time spent working with the information. This paper suggests using spatial organisation as a way of creating implicitly organised spaces. A difficulty of spatial organisation is that as the amount of information grows, users may be overloaded visually. To reduce this, I suggest introducing audio cues into a spatial hypermedia tool.

This paper is organised as follows. Section 2 provides some background to the area of audio knowledge work. Section 3 describes a typical usability study scenario. Section 4 outlines my proposed solution. Section 5 describes my prototype. Section 6 describes some related work. Section 7 raises some future questions.

2. Background

2.1 Audio Information

Audio information consists of information that is listened to, for example, oral records, music and sound effects. To extract the information, the recordings must be listened to. Audio information can be difficult to organise for several reasons. Firstly, when audio is collected initially, it can either be transcribed in its entirety to text or stored as audio. Transcribing to text is a time-consuming, tedious task that is not easily automated [2]. A different approach is to transcribe only the part of the audio that is of interest. When the number of audio records is large, as in a digital library, it is impractical to transcribe that amount of data, so analysts must be able to work with a copy of the original recording. It is also impossible to transcribe music or sound effects.

Most audio analysis tools support linear organisation. It is difficult to show multiple relationships between files if they are organised linearly. Searching can also be challenging. If users know the name of the file, then information retrieval techniques may be used. However, if the name is not known, it is difficult to find the appropriate file with present speech recognition technology [3].

2.2 Computer-Supported Knowledge Work

Computers may be used to extend users' capabilities. They are particularly useful for completing tasks that are automatic in nature, for example, sorting a list into alphabetical order. In the case of digital libraries, keyword searches are frequent: a few keywords are entered, all items within the library are searched, and the closest results are returned. If it is known what is being looked for, this method can be successful; however, it can be less successful if the user is browsing for information.

Another approach that is being used successfully in knowledge management work is user-centred: the computer handles easily automated tasks, and users work on unstructured tasks that are difficult to automate. One research area that looks at how to extend user abilities is hypertext (or hypermedia). Organisational structures discussed within the hypermedia community include associative and spatial structures. Associative structures allow users to organise by linking items together. This takes advantage of our ability to remember objects through association. Early examples of associative hypertext tools include the "Memex" [4] and NLS [5]. They allow users to connect objects together without needing to explain their choices. Associative "trails" can be built and viewed later.

Spatial hypermedia takes advantage of our visual and spatial intelligence. Users organise objects with spatial attributes, such as colour, size, location and proximity. These tools allow users to build workspaces containing a variety of documents and pictures. As with associative hypertext tools, users do not explain their choices. The computer remembers what attributes each object has and displays them. Often in information organisations, the relationships between information can be as important as the information itself. Knowledge of a problem space evolves over time. Spatial hypermedia tools support users in creating structures when it is not clear how information should be organised.
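As an illustration of how a computer might remember these spatial attributes, the following is a minimal sketch in Java. The class names SpatialObject and Workspace, the use of an integer RGB value for colour, and the use of centre-to-centre distance as a measure of proximity are all assumptions made for this example; they are not taken from any existing spatial hypermedia tool.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical workspace item holding the spatial attributes named in the text. */
final class SpatialObject {
    final String label;
    double x, y;          // location of the top-left corner, in workspace units
    double width, height; // size
    int rgb;              // colour as 0xRRGGBB (an assumption for this sketch)

    SpatialObject(String label, double x, double y, double width, double height, int rgb) {
        this.label = label;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
        this.rgb = rgb;
    }

    double centreX() { return x + width / 2.0; }
    double centreY() { return y + height / 2.0; }

    /** Euclidean distance between object centres, used here as a stand-in for proximity. */
    double distanceTo(SpatialObject other) {
        return Math.hypot(centreX() - other.centreX(), centreY() - other.centreY());
    }
}

/** Hypothetical container that remembers every object placed in the space. */
final class Workspace {
    private final List<SpatialObject> objects = new ArrayList<SpatialObject>();

    void add(SpatialObject object) { objects.add(object); }

    /** Returns the other objects whose centres lie within radius of the target's centre. */
    List<SpatialObject> neighboursOf(SpatialObject target, double radius) {
        List<SpatialObject> result = new ArrayList<SpatialObject>();
        for (SpatialObject candidate : objects) {
            if (candidate != target && target.distanceTo(candidate) <= radius) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

In this sketch the user never states why two objects belong together; the relationship is implied by placement and can be recovered later with a call such as neighboursOf(object, 50).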

A disadvantage of these tools is that users may become overloaded visually as the amount of information increases. Various techniques are being discussed that may reduce the effects of visual overload. These include increasing the number of dimensions [6], introducing visual effects [7], and increasing the number of modalities used [8].

3. Scenario

A group of analysts research the transport networks of Denmark. A series of interviews and focus groups were organised within the various regions of Denmark with various groups of people, from those working within transport to everyday users. The results of those meetings were recorded and have been stored in a digital library. Analysts are given a list of points that are of interest to managers, including issues relating to timetables, cost and customer satisfaction.

3.1 Pre-analysis – Transcribing

Transcribing refers to the transfer of oral records to text. This can be completed by several people, including professional transcribers, secretaries or analysts. There are two strategies: either recordings are transcribed word for word by a transcriber, or they are imported directly into an analysis tool that supports audio files. The first option takes approximately one day of transcribing per hour of oral session; both oral and text records are stored for future analysis. The second option takes less time, with only the necessary sections of oral records being transferred to text. This tends to happen during the analysis stage.

3.2 Collection of Material

Analysts gather relevant material from a digital library. To find the relevant audio files, analysts search through various categories within the library, such as cities, "Amts" or "Kommunes". If the recordings have been transcribed, then they are able to perform keyword searches. However, if recordings have not been transcribed, then keyword searches are dependent on how audio files are named. At present speech recognition software is not advanced enough to accurately search audio files for keywords.

3.3 Organising and Analysing

Once gathered, texts and recordings are stored in the same place for easy discovery. Analysts begin by looking at the key points they wish to research, such as price and timetable. Starting at the beginning of the recording or text, the analysts work through it, annotating areas that match the key points to get an idea of the general opinions. This process is repeated, as it is not possible to get all the details during the first pass. Notes are compared with other networks' notes to see if a pattern develops. Knowledge develops over time. Once the analysis is complete, the findings are collected and presented in a report that is given to the managers.

4. Supporting Organisation of Oral Archives

Being able to find objects after they have been stored is important to users. As the amount of available information grows, it becomes increasingly important to be able to find the stored information again quickly and correctly.

Present text analysis tools encourage users to organise linearly. For analysis purposes, this can be restricting, as one point in a discussion may lead to several others. This is difficult to represent in a list, as a file may belong in more than one place. Moving files takes time, and moving them may not be helpful. In this method, analysts are expected to find ways to remember relationships between themes. Computers could do this automatically. Users are often not able to see all of a transcription at one time. When transcripts are long, analysts are expected to remember, or to make a note of, common themes. Some text analysis tools offer filters so themes and ideas can be grouped together into lists. When texts become heavily annotated, it can be difficult to separate out themes. Text analysis tools that do support audio files simply insert audio clips into lists with transcripts.

4.1 Spatial Organisation

Observations in office environments have shown that users often use spatial placement to remind them where particular items are [9]. This has also been demonstrated in the computer world, where placement of files helps remind users what to do with the files [1]. Users tend to favour location-based searching rather than using the "find" function.

4.2 Using Audio to Reduce Visual Overload

Visual overload is a known problem of spatial hypermedia tools. It occurs when the amount of information becomes so large that users are unable to understand it any more. This can make objects difficult to find. Most suggestions to decrease visual overload include using some form of visual cue, or adding a third dimension. However, information can be organised in many ways. Increasing the number of dimensions helps, but does not solve the problem entirely. Users have other abilities such as hearing that they can use to advantage. Our tool increases and develops the number of audio cues to reduce visual overload. Users attach sounds to different areas of their workspace. When audio files are placed near those areas, they collect that sound. Relationships between objects may be built using audio cues instead of visual cues.
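The idea that an audio file "collects" the sound of a nearby area can be sketched as follows. This is a minimal illustration under stated assumptions: each sound area is modelled as a circle with a centre and a radius, and a file takes the sound of the nearest area that covers it. The names SoundArea and SoundCollector, and the circular shape, are hypothetical and are not described in the paper.

```java
import java.util.List;

/** Hypothetical circular region of the workspace with a sound attached by the user. */
final class SoundArea {
    final String sound;
    final double x, y;   // centre of the area
    final double radius; // how far the area's influence reaches (assumed)

    SoundArea(String sound, double x, double y, double radius) {
        this.sound = sound;
        this.x = x;
        this.y = y;
        this.radius = radius;
    }
}

final class SoundCollector {
    /**
     * Returns the sound of the nearest area that covers the point (x, y),
     * or null when the point lies outside every area.
     */
    static String collectSound(List<SoundArea> areas, double x, double y) {
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (SoundArea area : areas) {
            double distance = Math.hypot(area.x - x, area.y - y);
            if (distance <= area.radius && distance < bestDistance) {
                bestDistance = distance;
                best = area.sound;
            }
        }
        return best;
    }
}
```

With this approach a file picks up a new cue simply by being dragged into a different area; no explicit categorisation step is required of the user.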

4.3 Navigation

A difficulty of spatial hypermedia is the feeling of becoming lost in space. This occurs when all areas of a workspace look similar. In the physical world, when people are new to an area, they may use tools such as maps, or look for distinguishing features in the landscape. Our prototype provides a small map that may be navigated, showing users where they are in relation to their information. Landmarks are also a feature of this prototype. They look different from audio objects, which have a sound attached.

4.4 Implications for Scenario

Analysts create a personal workspace within the library. They create duplicates of relevant files in their area. When necessary, they create landmarks to give them an idea of where they are and to help with rough categorisation. Each file has the capability of playing its own sound, as well as the landmark sounds in its neighbouring area. A mini-map is available for navigation purposes to allow wider exploration of the area.

Analysts listen to the first audio file, then store it somewhere in their workspace and move to the next audio file, which is easily compared with the first to see if some similarities exist between the two. Depending on the outcome, the second file is stored appropriately. Every time a new file is imported, the analysts listen to it and store it accordingly. This way the analysts can easily see what topics are more thoroughly covered, or see if different themes are emerging. The analysts are able to listen easily to files and can quickly rearrange the placement of files. To get an overview of their workspace, the analysts look at the map, and for finer detail, they consult the viewport.

5. Prototype

5.1 Overview

Our tool is divided into three layers: user interface; model; and, storage mechanism. We have implemented it as a monolithic system, but this structure could be divided into its three layers. Our tool is written in Java 1.5 using JOAL for our audio implementation. We used Java serialisation for our storage mechanism, but other mechanisms may be used. The model is based on other spatial hypermedia tools, such as the Visual Knowledge Builder. Our interface also has elements of most spatial hypermedia tools.

Figure 1. Image showing architecture.
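The use of Java serialisation as a storage mechanism behind a separate model layer can be illustrated with a short sketch. The classes AudioItem, WorkspaceModel and WorkspaceStore are hypothetical names chosen for this example and do not reflect the prototype's actual classes; the sketch stores to a byte array rather than a file so that it is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model entry: an audio file and its position in the workspace. */
final class AudioItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String fileName;
    final double x, y;

    AudioItem(String fileName, double x, double y) {
        this.fileName = fileName;
        this.x = x;
        this.y = y;
    }
}

/** Hypothetical model layer: knows nothing about the user interface or storage. */
final class WorkspaceModel implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<AudioItem> items = new ArrayList<AudioItem>();

    void add(AudioItem item) { items.add(item); }
    List<AudioItem> items() { return items; }
}

/** Hypothetical storage layer based on standard Java object serialisation. */
final class WorkspaceStore {
    /** Serialises the whole model into a byte array. */
    static byte[] save(WorkspaceModel model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds a model from bytes previously produced by save. */
    static WorkspaceModel load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (WorkspaceModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Model class not available", e);
        }
    }
}
```

Because only WorkspaceStore touches the serialisation classes, another storage mechanism could be substituted without changing the model or the user interface.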

5.2 Graphical User Interface

There are two sections to our graphical user interface: the viewport and the map. The viewport is the size of users' monitors and shows only a small area of the available space. Users navigate using their mouse, by dragging their cursor around the screen. To help users maintain an overview of their workspace, we have provided a map. The map shows each object in the workspace, as well as where the viewport is at present. Users can navigate using either the map or the viewport. This is similar to some computer games. We have also provided landmarks. These are also shown on the preview map.
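The relationship between the map and the viewport comes down to a change of scale between workspace coordinates and map pixels. The following sketch shows one way this could be computed; the class name MiniMap, its fields, and the choice to centre the viewport on a map click are assumptions for illustration rather than a description of the prototype.

```java
/** Hypothetical helper that converts between workspace coordinates and map pixels. */
final class MiniMap {
    final double workspaceWidth, workspaceHeight; // full size of the workspace
    final int mapWidth, mapHeight;                // size of the map in pixels

    MiniMap(double workspaceWidth, double workspaceHeight, int mapWidth, int mapHeight) {
        this.workspaceWidth = workspaceWidth;
        this.workspaceHeight = workspaceHeight;
        this.mapWidth = mapWidth;
        this.mapHeight = mapHeight;
    }

    /** Workspace x coordinate to map pixel column. */
    int toMapX(double workspaceX) {
        return (int) Math.round(workspaceX / workspaceWidth * mapWidth);
    }

    /** Workspace y coordinate to map pixel row. */
    int toMapY(double workspaceY) {
        return (int) Math.round(workspaceY / workspaceHeight * mapHeight);
    }

    /** Map pixel column back to a workspace x coordinate. */
    double toWorkspaceX(int mapX) {
        return (double) mapX / mapWidth * workspaceWidth;
    }

    /**
     * Left edge of the viewport when the user clicks the map at column mapX.
     * The viewport is centred on the click and clamped so it stays inside the workspace.
     */
    double viewportLeftFor(int mapX, double viewportWidth) {
        double left = toWorkspaceX(mapX) - viewportWidth / 2.0;
        return Math.max(0.0, Math.min(left, workspaceWidth - viewportWidth));
    }
}
```

The same conversion serves both directions of navigation: objects, landmarks and the viewport outline are drawn on the map with toMapX and toMapY, and a click on the map is translated back into a viewport position.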

5.3 Audio User Interface

In order to reduce visual overload, we have introduced audio cues into our tool. Users are able to swap between the contents of the sound object and the sound that they may have given to the object. These help users to group their information. Users choose the sound of a landmark. At present this choice is from a given list. Audio objects pick up the sounds of their closest landmarks. We use volume to show what the strongest relationship is. By hovering over a landmark, users hear the contents of the closest sound files at the same time. The loudest file is always in the centre of the screen.

Figure 2. Image of audio tool.
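Using volume to express the strength of a relationship can be sketched as a distance-to-gain mapping. The paper does not state how volume is derived, so the linear falloff below, the cut-off distance and the names CueSource and CueMixer are all assumptions. The sketch only computes gain values between 0 and 1; passing them to an audio library such as JOAL is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical landmark with a sound name and a position in the workspace. */
final class CueSource {
    final String sound;
    final double x, y;

    CueSource(String sound, double x, double y) {
        this.sound = sound;
        this.x = x;
        this.y = y;
    }
}

final class CueMixer {
    /** Assumed linear falloff: 1.0 at the landmark, 0.0 at maxDistance and beyond. */
    static double gainFor(double distance, double maxDistance) {
        if (maxDistance <= 0.0) {
            return 0.0;
        }
        return Math.max(0.0, 1.0 - distance / maxDistance);
    }

    /** Gain of every audible landmark sound for an object placed at (x, y). */
    static Map<String, Double> gainsAt(List<CueSource> sources, double x, double y,
                                       double maxDistance) {
        Map<String, Double> gains = new LinkedHashMap<String, Double>();
        for (CueSource source : sources) {
            double gain = gainFor(Math.hypot(source.x - x, source.y - y), maxDistance);
            Double previous = gains.get(source.sound);
            // Keep the strongest gain if two landmarks share the same sound.
            if (gain > 0.0 && (previous == null || gain > previous)) {
                gains.put(source.sound, gain);
            }
        }
        return gains;
    }

    /** Name of the loudest sound, that is, the strongest relationship, or null if none. */
    static String loudest(Map<String, Double> gains) {
        String best = null;
        double bestGain = 0.0;
        for (Map.Entry<String, Double> entry : gains.entrySet()) {
            if (entry.getValue() > bestGain) {
                bestGain = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Under these assumptions, an object placed a quarter of the way from one landmark to another would play the nearer landmark's sound at three times the gain of the farther one, so the strongest relationship is heard as the loudest cue.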

6. Related Work

6.1 Spatial Organisation Tools

Synchrony. Synchrony is an example of a Patron-Augmented Digital Library (PADL) [10]. It allows both librarians and patrons to contribute to the digital library. Synchrony is a 2.5D spatial hypermedia tool that allows users to organise files spatially.

Garnet. Garnet uses both spatial hypermedia and digital library technology to support information seeking and structuring [11].

Visual Knowledge Builder (VKB). VKB is an example of a 2D spatial hypertext tool [12]. Users organise files spatially using colour, size, location and proximity to express relationships between files, or files can be grouped into collections. A spatial parser provides suggestions to users about possible relationships between files by looking at various visual attributes. It also provides a history for users, so they may navigate temporally through their workspace. Linking between documents and workspaces is also supported. At present, VKB does not support the inclusion of audio files.

Data Mountain. The Data Mountain [6] is a 2.5D spatial organisation tool. 2.5D tools allow users full control over two axes, but the third remains discrete. Data Mountain was designed as an alternative to the bookmarking function found in Internet Explorer and similar applications. Users organise thumbnails of their web pages within a 2.5D environment. The further the thumbnail is from a user, the smaller it becomes. The Data Mountain uses some audio cues to reinforce what happens visually. However, no support for audio formats exists within Data Mountain.

Analysis. Spatial tools so far focus mainly on the arrangement of either text-based forms or pictures. Audio formats are usually not supported, and therefore it is not possible to use these tools for organising audio files. Spatial organisation has proven to be a successful way in which to organise documents and has already been tested with users. We believe that it will be possible to extend these tools with audio abilities, allowing users to interact with audio formats.

6.2 Media Analysis Tools

AnnoTape. AnnoTape was developed with anthropological research in mind [13]. It allows people to immediately transfer interviews into the system and only transcribe what is necessary. It is able to record, store and provide support for analysing files.

ATLAS/ti. ATLAS/ti is a powerful tool for analysing large bodies of textual, graphical, audio and video data [14]. It offers a variety of tools for accomplishing the tasks associated with any systematic approach to qualitative data. It provides tools to help manage, extract, compare, explore and reassemble meaningful segments of large amounts of data in systematic ways.

Analysis. These tools organise sections into lists, with one section coming after another. It is difficult to organise information into a list, as one issue may lead to several other issues. Our tool concentrates on audio organisation at present. Instead of list organisation, we use spatial organisation. Themes may be grouped together instead of following one after another. Audio files can be listened to easily and can be moved from one place to another within the space when appropriate.

6.3 Audio within Hypermedia

Audio Preview Cues. Audio Preview Cues allows users to interact with a music library of perhaps unfamiliar styles [15]. Each style can be previewed so that users may decide if they would like to hear more of this style. Music files are categorised into various sections, such as classical, rock, etc., and users navigate until they find the style of their choice.

Analysis. Our tool allows users to preview their audio files quickly. Instead of organising into hierarchical structures, we allow users to organise spatially.

7. Future Discussions

At present we focus on spatial placements and previewing of audio files; this could be extended to include text, video, image and any other form of information. There are still difficulties with regard to representing information that belongs in more than one place. In the future I would like to further investigate what types of relationships users want between files, and how many of these are necessary. Computers are capable of producing complex visualisations of information, but it is less certain that users are able to interpret such visualisations, or would even want them.


[1] Barreau, D., Nardi, B.A., Finding and reminding: file organization from the desktop. SIGCHI Bull. 27 (1995) 39-43.

[2] Gauvain, J.L., Lamel, L., Adda, G., Transcribing broadcast news for audio and video indexing. Commun. ACM 43 (2000) 64-70.

[3] Shneiderman, B., The limits of speech recognition. Commun. ACM 43 (2000) 63-65.

[4] Bush, V., As we may think. Atlantic Monthly (1945).

[5] Engelbart, D.C., Augmenting human intellect: a conceptual framework. Technical report, Direction of Information Sciences, Air Force Office of Scientific Research, Washington (1962).

[6] Robertson, G., Czerwinski, M., Larson, K., Robbins, D.C., Thiel, D., van Dantzich, M., Data mountain: using spatial memory for document management. In Proceedings of the 11th annual ACM symposium on User interface software and technology, ACM Press (1998) 153-162.

[7] Shipman, F.M., Marshall, C.C., LeMere, M., Beyond location: hypertext workspaces and non-linear views. In Proceedings of the tenth ACM Conference on Hypertext and hypermedia: returning to our diverse roots, ACM Press (1999) 121-130.

[8] Brown, M.L., Newsome, S.L., Glinert, E.P., An experiment into the use of auditory cues to reduce visual workload. In CHI '89: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM Press (1989) 339-346.

[9] Malone, T.W., How do people organize their desks?: Implications for the design of office information systems. ACM Trans. Inf. Syst. 1 (1983) 99-112.

[10] Goh, D., Leggett, J., Patron-augmented digital libraries. In DL '00: Proceedings of the fifth ACM conference on Digital libraries, New York, NY, USA, ACM Press (2000) 153-163.

[11] Buchanan, G., Blandford, A., Thimbleby, H., Jones, M., Integrating information seeking and structuring: exploring the role of spatial hypertext in a digital library. In HYPERTEXT '04: Proceedings of the fifteenth ACM conference on Hypertext & hypermedia, New York, NY, USA, ACM Press (2004) 225-234.

[12] Shipman, F.M., Hsieh, H., Maloor, P., Moore, J.M., The visual knowledge builder: a second generation spatial hypertext. In Proceedings of the twelfth ACM conference on Hypertext and Hypermedia, ACM Press (2001) 113-122.

[13] AnnoTape: (

[14] Atlasti: (

[15] Schraefel, M.C., Karam, M., Zhao, S., Listen to the music: Audio preview cues for exploration of online music. In Interact 2003: Bringing the Bit Together. Ninth IFIP TC13 International Conference on Human-Computer Interaction. (2003).


© Copyright 2006 Kirstin Lyon
