Volume 4 Issue 1
Spring 2008
ISSN 1937-7266

Tagging Video: Conventions and Strategies of the YouTube Community

Gary Geisler and Sam Burns

School of Information
The University of Texas at Austin
1 University Station, D7000
{geisler, sburns}@ischool.utexas.edu

The rise in popularity of web-based social tagging systems, which enable people to assign free-form terms or "tags" to resources within an information system, has recently prompted academic studies aimed at better understanding the nature and potential value of such systems. We argue that the potential value of social tagging is particularly strong for digital video (moving image) resources. While the amount of online video content available to people is rapidly increasing, the visual and temporal nature of video raises well-known problems of classification and description (e.g., the "semantic gap"). This makes the ability to effectively catalog growing stores of video resources a critical open problem.

One approach to the challenge of video description is to leverage the collective effort of a community that has an interest in a resource collection. The community members' aggregated descriptions (tags) of the content can then facilitate discovery by others within the community. Most social tagging systems let all members of the community see the tags previously used to describe content. As a result, tagging conventions and strategies take shape and, in turn, help to define the sharing mechanisms the community employs.

Our poster describes a quantitative analysis of the tags that 537,246 contributors applied to more than one million videos on YouTube, the largest social network site focused on video resources. The analysis aims to better understand three things:

- how video is described by and on behalf of members intent on sharing within a large community;
- how the tags used by this community might reflect description strategies and characteristics unique to video content;
- what implications these findings have for the design of more effective systems for collections of digital video.

In our sample of more than one million YouTube videos, a total of 517,008 distinct tags (without stemming or punctuation normalization) were used. The median number of tags applied per video was 6.0. A significant majority (66%) of the tag terms applied did not appear in the tagged video's title, description, or author fields. This suggests that many tags provide additional descriptors for the submitted video. However, our sample also contained many examples of tagging behavior in which tags do not enhance the description of the video. Instead, these tags result from system constraints and non-descriptive strategies for sharing video. A better understanding of these system constraints and user strategies can suggest design changes that might mitigate tagging errors. Such changes could also enhance the potential of tags to aid in the discovery and use of video resources in a shared environment.
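
The three statistics reported above can be computed with a short script: the number of distinct tags, the median number of tags per video, and the share of tags absent from a video's own metadata. The following is a minimal Python sketch under stated assumptions, not the authors' actual analysis pipeline. The `Video` record and the `tag_statistics` function are hypothetical names introduced for illustration. Distinct tags are compared as raw strings, mirroring the "without stemming or punctuation normalization" condition. The poster does not say how a tag was judged to "appear" in the title, description, or author fields. This sketch assumes a case-insensitive match against single-word tokens.

```python
import re
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Video:
    # Hypothetical record layout; the poster does not describe its data schema.
    title: str
    description: str
    author: str
    tags: list[str] = field(default_factory=list)


TOKEN = re.compile(r"\w+")


def metadata_tokens(video: Video) -> set[str]:
    """Lower-cased word tokens drawn from the title, description, and author fields."""
    text = " ".join((video.title, video.description, video.author))
    return {token.lower() for token in TOKEN.findall(text)}


def tag_statistics(videos: list[Video]) -> dict[str, float]:
    # Distinct tags compared as raw strings: no stemming or punctuation normalization.
    distinct = {tag for video in videos for tag in video.tags}
    per_video = [len(video.tags) for video in videos]

    applied = 0  # every tag application, counted once per video it is attached to
    novel = 0    # applications whose term is absent from that video's metadata
    for video in videos:
        known = metadata_tokens(video)
        for tag in video.tags:
            applied += 1
            # Assumption: a tag "appears" in the metadata if it equals a word token,
            # ignoring case. Multi-word tags therefore never match.
            if tag.lower() not in known:
                novel += 1

    return {
        "distinct_tags": len(distinct),
        "median_tags_per_video": median(per_video) if per_video else 0.0,
        "novel_tag_share": novel / applied if applied else 0.0,
    }
```

This sketch measures the share over tag applications rather than distinct terms. Switching to distinct terms would only change what is counted in the inner loop. Which of the two the reported 66% reflects is not stated in the poster.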

More information about our project can be found at http://gremlin.ischool.utexas.edu/youtube/

Figure 1: Thumbnail image of the poster.