Volume 6 Issue 2
Fall 2010
ISSN 1937-7266

A Topic Modeling Approach to Social Tag Prediction

Meiqun Hu

Singapore Management University
80 Stamford Road, Singapore 178902


Social tagging has gained immense popularity on the Web. Tags, which are assigned by users, are lightweight abstractions of the target Web resources. Such lightweight knowledge representation brings numerous benefits of social information processing to resource navigation and Web search. However, as much as we would like to have all Web resources adequately tagged, an enormous number of resources online have few or no tags. This work studies the use of Latent Dirichlet Allocation (LDA) for predicting tags for web documents. We propose the LDAtgg model, which extends LDA to model the mixtures of tags using latent topics. We relate the topics of tags assigned to a web document with the topics of words in its content, based on the intuition that topics discussed more often in the document are likely to appear more often in its tags. We demonstrate the effectiveness of the LDAtgg model in the tag prediction task, using news articles published online and the tag assignments to these documents collected from delicious.com. Our LDAtgg model improves prediction accuracy by more than 20% over the baseline methods.

1 Introduction

Social bookmarking, more commonly known as social tagging, has gained increasing popularity on the Web today. Many Web 2.0 sites allow users to bookmark resources with keywords, known as tags. Tags convey users' meaning and interpretation of the tagged content. Not restricted to a controlled vocabulary or a predefined hierarchy, tags are created as a result of sense-making by individual users.

Tags serve as metadata to identify, share and organize resources. Many interesting Web services can also be built upon these metadata, such as resource organization [11], [19], and retrieval [20]. Discovering (or re-discovering) information through tags is to leverage the collective wisdom to determine the relevance, popularity and timeliness of the tagged content, a process known as social information processing.

To effectively search and explore resources in a social tagging system, the resources should be adequately tagged. However, tags are scarce for a large number of resources online today. While a small number of resources attract extensive tagging, the vast majority are left untagged [6], [8], [3], [15]. Due to this scarcity, the benefits brought about by services built upon tag metadata cannot be fully reaped.

We aim to enrich tags for untagged web documents. In this work, we focus on predicting tags for web pages. Given that tags are keywords that describe or summarize a page, one may propose to select important terms from the page content as predicted tags. tf (term frequency) and tf-idf (term frequency × inverse document frequency) are commonly used criteria for selecting important terms. However, such selection methods assume that tag terms can only come from words that appear in the page content. In reality, this assumption may not always hold. To overcome the limitation of the vocabulary of a single page, one may suggest predicting words or tags from other relevant pages, e.g., those whose contents are similar to the target page. However, due to the long-tail distribution of tags on web pages in social tagging systems [2], finding relevant pages that are adequately tagged may also be a challenge.
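The tf-idf criterion mentioned above can be sketched as follows. This is an illustrative implementation over a toy corpus, not the exact weighting used in the experiments:

```python
import math
from collections import Counter

def tfidf_terms(doc_tokens, corpus, top_n=5):
    """Rank the terms of one document by tf-idf against a corpus.

    doc_tokens: list of word tokens of the target document.
    corpus: list of token lists, one per document (used for idf).
    """
    n_docs = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))            # document frequency per term
    tf = Counter(doc_tokens)
    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_n]]

corpus = [
    "tags describe web pages".split(),
    "topic models describe documents".split(),
    "web pages contain words words words".split(),
]
```

Terms frequent in the target page but rare across the corpus rank highest; terms absent from the page can never be selected, which is exactly the limitation discussed above.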

We observe that tags can be viewed as an abstraction of the content they are assigned to. Often, they are general terms for representing a certain topic. As there can be multiple topics covered by the text content of a page, one would expect each of these topics to be represented correspondingly in the tags collectively assigned to this page by multiple users. Based on these observations, we tackle the tag prediction task by leveraging the multi-topic nature of web pages and the generality of tags. We propose a probabilistic topic model for representing the correspondence between topics of words and topics of tags. Predicted tags are selected from topics that are found in the content of the page. We model the tag vocabulary explicitly, so that the model is capable of capturing tags assigned by web users but not found in the page content.

From this research, we seek to understand and answer the following questions:

  • Do tags relate to the content words of text web documents?
  • Is topic correspondence a viable relation between the tags collectively assigned to the same document by multiple users and the content words of the document?
  • How do we model the topic correspondence, learn the topics and their corresponding words and tags, and use the learnt model to predict tags for a new document?
  • Do predictions made using model(s) based on the relation of topic correspondence give better accuracy than those that do not?

Our contribution in this work includes the use of a probabilistic topic model for performing the tag prediction task. We propose the LDAtgg model, which extends latent Dirichlet allocation (LDA) [2] to model the topics for tags. We formulate a Gibbs sampling procedure for conducting inference on the model parameters. We demonstrate the effectiveness of the LDAtgg model on a real and novel collection of news articles and tag data from delicious.com. Our LDAtgg model shows superior performance. In particular, when using 100 topics, our LDAtgg model surpasses the strongest baseline method by more than 20% in tag prediction accuracy.

2 Related Work

Most research works on social tagging so far have been focusing on users [10] and their roles in relating tags to resources [16], tags to tags [8], as well as resources to resources [19]. There has also been a considerable volume of work that studies the vocabulary that emerges and evolves in a social tagging system [13]. Interested readers may refer to [17] for a more comprehensive review on the subjects. In the following, we briefly review the studies on the tag prediction task and the probabilistic topic models that have inspired us on this work.

2.1. Tag Prediction Research

There are two main branches of research in the area of tag prediction, namely user-independent tag prediction and user-dependent tag recommendation. The former aims to enrich tags for web resources and does not assume a target user, while the latter produces personalized tag recommendations for a target user who is likely to consume the recommended tags. Our work belongs to the former. We define our research problem this way to target general applications that consume tagging data, such as indexing for Web search.

For user-independent tag prediction, [8] proposed two approaches, namely feature-based binary classification and association rule mining. They investigated features such as page text, anchor text and the structure of surrounding hosts, and found that tags can be predicted with high precision but low recall using these three types of features. They also exploited frequent co-occurrences of tags to mine association rules for tag inference. However, tag inference using association rules assumes some observed tag(s) for the target document. This assumption may not be applicable to all web documents. [3] observed citation links among scholarly publications in CiteULike and CiteSeer. Tag predictions for the inadequately tagged publications are propagated along edges of the citation graph. Another study that makes use of the links between web documents for propagating tags is given by Subramanya and Liu [15], in which blog posts are the subjects of study. In contrast, our proposed model does not rely on web data beyond the content of the web document itself, and does not assume the availability of existing tags for the target document.

For user-dependent tag recommendation, tags are predicted for a given pair of target document and target user [5], [14], [16]. Personalized tag recommendation aims at accommodating individual users’ tag vocabularies. However, personalized ranking on tags can only be achieved provided that one has adequate amount of tags for the target document and sufficient knowledge about the target user.

2.2. Latent Dirichlet Allocation Research

Latent Dirichlet Allocation, which is a probabilistic model for text, was first introduced by [2]. It assumes K topics that describe a collection of documents. Each topic has a mixture (multinomial distribution) of words, and each document has a mixture of topics. The mixtures of topics for documents are governed by a Dirichlet distribution.

Our LDAtgg model proposed in this work is closely related to the Correspondence Latent Dirichlet Allocation (Corr-LDA) [1], which was originally developed for generating captions for images. An image contains multiple regions, and each word in the image caption corresponds to one of the regions. The correspondence from words to regions is assumed to follow a uniform distribution. A similar idea has been proposed for modeling the correspondence between named entities and their context words, which occur close to the mentions of the named entities [12]. For our tag prediction task, we also adopt the uniform distribution assumption for modeling the correspondence between the topic assignments for words and the topic assignments for tags.


3 The LDAtgg Approach

3.1. Problem Definition

Before we give formal definition to the tag prediction task, we first introduce the notations used to represent entities and their relationships in social tagging.

In social tagging, there are three types of entities, namely documents, users, and tags. A ternary relationship (u, d, v) exists when a user u bookmarks a document d using a tag v.

We represent each web document d by a bag of words, w_d. In representing the document in word space, we do not assume the availability of other sources of data from the web, e.g., hyperlinks from page to page or domain information about page publishers. We do, however, assume some amount of text content for each document. Content words are extracted from the HTML source to form the bag of words representation for each web page.

We also represent each web document by a bag of tags, v_d, if tags exist. We aggregate the tag assignments given by multiple users, since the tag distribution tends to converge as more and more tags are assigned to the same document [5]. When aggregating tag assignments, the same tag may be used by multiple users for the same document.

In both word space and tag space, we use the bag notation, in which term occurrences are not assumed to be unique. We use word token and tag token to refer to an occurrence in the bag of words and bag of tags. Although multiple tokens may share the same term, each token is unique in the bag. The set of terms observed in the word (or tag) space makes up the word (or tag) vocabulary. Hence, the notion of word or tag refers to an entry in the vocabulary, and the notion of word token or tag token refers to an occurrence in the bag of words or the bag of tags representation.
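The distinction between tokens (occurrences) and vocabulary entries can be illustrated with Python's standard Counter; the tag names below are made up:

```python
from collections import Counter

# Tag tokens aggregated over users who bookmarked the same page (toy data).
bag_of_tags = ["politics", "news", "politics", "economy"]

tag_counts = Counter(bag_of_tags)   # each element is a tag token (an occurrence)
tag_vocabulary = set(bag_of_tags)   # distinct tag terms form the vocabulary
```

Here the bag contains four tag tokens, while the vocabulary contains three tag terms, since "politics" was assigned by two users.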

We define our tag prediction task as follows: given a collection of web documents, in which each document has not only a bag of words but also a bag of tags, our task is to learn a model from this collection; then, given an unseen document in which only the content words are observed, we predict a ranked list of tags based on the learnt model and the observed words in the document.

3.2. LDAtgg Model

We propose the LDAtgg model, which extends LDA for the tag prediction task. Similar to LDA, we assume K topics are used to describe a collection of documents. Each document has a mixture (multinomial distribution) of topics. In LDAtgg, each topic has not only a mixture of words but also a mixture of tags. Each word token in a document belongs to one of the K topics, and so does each tag token.

In LDAtgg, we make two main assumptions when incorporating tags. First, tags and words do not share a common vocabulary, even though overlap in terms of usage is possible. Second, tag tokens are not directly related to word tokens. Instead, they are related via the topics they belong to. Figure 1 depicts our LDAtgg model in plate notation.

In LDAtgg, we hypothesize that, “the topic assignments for tag tokens sample uniformly from the topic assignments for word tokens in the same document”. If a topic is discussed more often in the document, then it is likely that more tags from this topic would be assigned to the document. We use N_d to denote the total number of word tokens found in document d, z_{d,n} to denote the topic assignment for the n-th word token in d, and y_{d,m} to denote the topic assignment for the m-th tag token in d.

Figure 1 LDAtgg Model in Plate Notation
Legend of plate notation: circles denote random variables; squares denote hyperparameters; rounded rectangles (‘plates’) denote repeated samples, with the number of repetitions written at the bottom corner; arrows indicate conditional dependencies between variables. Shaded shapes indicate observed variables, while unshaded shapes indicate latent variables.


Figure 2 Variables and Distributions in LDAtgg Model

Our hypothesis on the uniform sampling correspondence can be represented as y_{d,m} ~ Uniform(z_{d,1}, ..., z_{d,N_d}). Based on this hypothesis, the generation of the words and tags in a document, (w_d, v_d), can be described as a generative process consisting of the following two phases:

  1. First, for each word token in d:
    1. sample a topic z_{d,n} from the mixture of topics for the document, θ_d;
    2. sample a word w_{d,n} from the mixture of words, φ_{z_{d,n}}, for topic z_{d,n}.
  2. Second, for each tag token in d:
    1. sample a topic y_{d,m} uniformly from the topic assignments for word tokens in the document, {z_{d,1}, ..., z_{d,N_d}};
    2. sample a tag v_{d,m} from the mixture of tags, ψ_{y_{d,m}}, for topic y_{d,m}.

This two-phase process is repeated for every document in the collection. It is worth noting that the generation of tag tokens should be done only after all word tokens in the corresponding document have been generated.

The mixtures of topics for documents are governed by a Dirichlet distribution. If no prior knowledge is given, this Dirichlet distribution is assumed symmetric, denoted by the shared prior α. Similar assumptions are made to the mixtures of words for topics and the mixtures of tags for topics. They are governed by two separate Dirichlet distributions with priors β and γ respectively.
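As a concrete illustration, the two-phase process can be sketched in Python using only the standard library. All sizes and seeds here are illustrative, and for brevity the per-topic mixtures are drawn inside the function, whereas in the model they are shared across the whole collection:

```python
import random

def dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via Gamma draws."""
    gs = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(gs)
    return [g / s for g in gs]

def generate_document(K, word_vocab, tag_vocab, n_words, n_tags,
                      alpha=0.5, beta=0.01, gamma=0.01, seed=0):
    """Sample one document from the LDAtgg generative process (a sketch)."""
    rng = random.Random(seed)
    theta = dirichlet([alpha] * K, rng)                         # topic mixture
    phi = [dirichlet([beta] * len(word_vocab), rng) for _ in range(K)]
    psi = [dirichlet([gamma] * len(tag_vocab), rng) for _ in range(K)]

    # Phase 1: generate all word tokens first.
    z = rng.choices(range(K), weights=theta, k=n_words)         # word topics
    words = [rng.choices(word_vocab, weights=phi[t], k=1)[0] for t in z]

    # Phase 2: each tag topic is drawn uniformly from the word topics.
    y = [rng.choice(z) for _ in range(n_tags)]
    tags = [rng.choices(tag_vocab, weights=psi[t], k=1)[0] for t in y]
    return words, tags, z, y

words, tags, z, y = generate_document(
    K=3, word_vocab=["w%d" % i for i in range(10)],
    tag_vocab=["t%d" % i for i in range(5)], n_words=20, n_tags=4)
```

Note that tag topics are restricted to topics that actually occur among the word tokens, which is the uniform-correspondence hypothesis in code form.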

3.3. Parameter Estimation

Three (sets of) model parameters have to be learnt to fully describe a collection of documents using the model. They are θ, the multinomial on topics for each document; φ, the multinomial on words for each topic; and ψ, the multinomial on tags for each topic.

We adopt Gibbs sampling, a Markov chain Monte Carlo (MCMC) method, to learn the model parameters. Gibbs sampling is widely used to conduct approximate inference on the parameters of LDA-based models. Its popularity is due to its efficiency in estimating the joint posterior probabilities. It was first applied to learning LDA by Griffiths and Steyvers [4].

Our Gibbs sampler follows the two-phase generation process, and iteratively samples and updates the topic assignments based on the estimated probabilities p(z_{d,n} = k | ·) and p(y_{d,m} = k | ·). Equations 1 and 2 formulate the estimation:

p(z_{d,n} = k | ·) ∝ (c_{d,k}^¬{n} + α) · (c_{k,w_{d,n}}^¬{n} + β) / (c_{k,·}^¬{n} + Vβ) · ((c_{d,k}^¬{n} + 1) / c_{d,k}^¬{n})^{q_{d,k}}    (1)

p(y_{d,m} = k | ·) ∝ (c_{d,k} / N_d) · (q_{k,v_{d,m}}^¬{m} + γ) / (q_{k,·}^¬{m} + Tγ)    (2)

where we follow the symbols used in [7]. d, k, n and m denote the indices for document, topic, word token within a document and tag token within a document respectively. We use c and q to denote the counts of word tokens and tag tokens respectively; a dot (·) in place of an index denotes summation over that index, and V and T denote the sizes of the word and tag vocabularies. The notation ¬{d, n} denotes the exclusion of the word token indexed at n in document d. Whenever the context is clearly within the current document d, we omit the document index and use only the word token index. Similar symbols are used when counting tag tokens.

Once we have learnt the three sets of parameters, a learnt model M is obtained.
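To make the procedure concrete, the following is a minimal collapsed Gibbs sampler in the spirit of the model. This is a sketch, not the authors' implementation: for simplicity, the word-topic update uses the plain LDA conditional and omits the coupling from tag assignments back to word assignments; the tag-topic update samples each topic in proportion to that topic's share of the document's word tokens:

```python
import random

def gibbs_lda_tgg(docs_words, docs_tags, K, V, T,
                  alpha=0.5, beta=0.01, gamma=0.01, iters=200, seed=0):
    """Simplified collapsed Gibbs sampling for an LDAtgg-style model.

    docs_words / docs_tags: lists of integer-id token lists, one per document.
    Returns point estimates of theta, phi, psi.
    """
    rng = random.Random(seed)
    D = len(docs_words)
    c_dk = [[0] * K for _ in range(D)]   # word-topic counts per document
    c_kw = [[0] * V for _ in range(K)]   # word counts per topic
    c_k = [0] * K                        # total word tokens per topic
    q_kv = [[0] * T for _ in range(K)]   # tag counts per topic
    q_k = [0] * K                        # total tag tokens per topic
    z = [[rng.randrange(K) for _ in ws] for ws in docs_words]
    y = [[rng.randrange(K) for _ in ts] for ts in docs_tags]
    for d in range(D):
        for n, w in enumerate(docs_words[d]):
            k = z[d][n]; c_dk[d][k] += 1; c_kw[k][w] += 1; c_k[k] += 1
        for m, v in enumerate(docs_tags[d]):
            k = y[d][m]; q_kv[k][v] += 1; q_k[k] += 1
    for _ in range(iters):
        for d in range(D):
            N_d = len(docs_words[d])
            for n, w in enumerate(docs_words[d]):
                k = z[d][n]
                c_dk[d][k] -= 1; c_kw[k][w] -= 1; c_k[k] -= 1
                weights = [(c_dk[d][j] + alpha) *
                           (c_kw[j][w] + beta) / (c_k[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][n] = k
                c_dk[d][k] += 1; c_kw[k][w] += 1; c_k[k] += 1
            for m, v in enumerate(docs_tags[d]):
                k = y[d][m]
                q_kv[k][v] -= 1; q_k[k] -= 1
                weights = [(c_dk[d][j] / N_d) *
                           (q_kv[j][v] + gamma) / (q_k[j] + T * gamma)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                y[d][m] = k
                q_kv[k][v] += 1; q_k[k] += 1
    theta = [[(c_dk[d][k] + alpha) / (len(docs_words[d]) + K * alpha)
              for k in range(K)] for d in range(D)]
    phi = [[(c_kw[k][w] + beta) / (c_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    psi = [[(q_kv[k][v] + gamma) / (q_k[k] + T * gamma) for v in range(T)]
           for k in range(K)]
    return theta, phi, psi
```

The counts c and q mirror the count notation in the text; the returned mixtures are the usual smoothed point estimates from the final sampler state.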

3.4. Tag Prediction using LDAtgg

Predicting tags for a test document d’ starts with observing its bag of words w_{d'}. Given the learnt model M, we compute the posterior probability of each tag v:

p(v | w_{d'}, M) = Σ_{k=1..K} ψ_{k,v} · θ_{d',k}    (3)

where θ_{d',k} denotes the mixture of topics estimated for the test document.
To estimate θ_{d'}, we re-sample topic assignments for the word tokens in w_{d'}. In this case, re-sampling does not involve the phase for sampling topic assignments for tags. The re-sampling procedure degenerates into that for the LDA model, but with the topic probabilities for each word token estimated according to Equation 4:

p(z_{d',n} = k | ·) ∝ (c'_{d',k}^¬{n} + α) · φ_{k,w_{d',n}}    (4)

where c' denotes counts of word tokens in the test document, which distinguishes them from the counts c in the learnt model.
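A minimal fold-in and ranking routine might look like this. It is a sketch under the same simplifying assumptions as before; phi and psi stand for learnt per-topic word and tag mixtures over integer ids:

```python
import random

def predict_tags(test_words, phi, psi, K, alpha=0.5, iters=50, seed=0, top_n=5):
    """Fold a test document into a learnt model, then rank tags (a sketch).

    Re-sampling uses p(z = k) proportional to (c'_k + alpha) * phi[k][w];
    tags are ranked by sum_k psi[k][v] * theta'[k].
    """
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in test_words]
    c = [0] * K                            # word-topic counts for the test doc
    for k in z:
        c[k] += 1
    for _ in range(iters):
        for n, w in enumerate(test_words):
            c[z[n]] -= 1
            weights = [(c[j] + alpha) * phi[j][w] for j in range(K)]
            z[n] = rng.choices(range(K), weights=weights)[0]
            c[z[n]] += 1
    theta = [(c[k] + alpha) / (len(test_words) + K * alpha) for k in range(K)]
    n_tags = len(psi[0])
    scores = [sum(psi[k][v] * theta[k] for k in range(K)) for v in range(n_tags)]
    return sorted(range(n_tags), key=lambda v: -scores[v])[:top_n]
```

Because only the learnt φ is needed during re-sampling, the cost per test document is a few sweeps over its word tokens, which is why prediction is much cheaper than training.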


4 Experiments

We conduct experiments to evaluate our LDAtgg model on the tag prediction task. We are interested in the accuracy of the ranked tag predictions. Although the learning time required for LDA-based models can be prohibitive, once the model is learnt offline, it can be used to predict tags for test documents at a much lower computational cost than training, since the required re-sampling takes only a few iterations [7]. Therefore, time complexity is not our major concern.

4.1. Data Collection

We examine tag prediction for news articles published online. We chose this type of web document because of the availability of text content. We collected news articles from three online publishers, namely BBC, CNN and USAToday. We notice that all these publishers support tagging via links to delicious.com and other social sharing tools. Our objective in data crawling was to collect as many news pages as possible that contain both text content and tags.

Starting from the home page of each news publisher, we performed breadth-first search by following hyperlinks confined to the respective domain. We searched up to a maximum depth of 4 to obtain the initial set of URLs. For each URL in the initial set, we crawled the HTML source to extract news content. Meanwhile, we acquired tags from delicious.com for URLs in the initial set. Since not all news pages contain text content, and not all have been assigned tags in delicious.com, our final set of URLs for each publisher is the intersection of the set of URLs that contain text and the set of URLs that have been assigned tags in delicious.com. In other words, every URL in the final set for each publisher has text content as well as tags. Our crawls of the news content and tags were both conducted in April 2009. A summary of the dataset is given in Table 1.

We notice that not all URLs contain an equal amount of text, and not all URLs attract an adequate number (if any) of tags on delicious.com. As Table 1 shows, the news pages in the initial set that have attracted bookmarks constitute only a small portion, varying from 7.68% to 34.15%. Again, this demonstrates that scarcity in tags is prevalent. URLs that either do not give actual news content or have not been tagged are not included in our final set.

The overall dataset used in our experiment takes the union of the final sets of URLs from these three news publishers. The union consists of 4,493 documents in total. Table 2 summarizes some statistics for documents in this dataset.

Our crawl collected 33,222 bookmarks made by 16,272 users. To preprocess words in documents, we tokenized, normalized words to lowercase, and removed stopwords. We further removed words that appear in fewer than 3 documents. This resulted in a word vocabulary of size 24,322. To preprocess tags, since delicious.com tokenizes tags using whitespace by default, we only removed the leading and trailing punctuation of tags and normalized them to lowercase. This resulted in a tag vocabulary of size 12,468. We did not stem words or tags, since topic models are capable of distinguishing terms that share the same root into separate topics. As expected, the vocabulary for tags and the vocabulary for words are not identical.
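The word-side preprocessing can be sketched as follows. This is a toy pipeline; the stopword list and regex tokenizer are illustrative stand-ins for whatever tooling was actually used:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}   # tiny illustrative list

def preprocess(raw_docs, min_df=3):
    """Tokenize, lowercase, drop stopwords, and prune rare terms (a sketch)."""
    docs = []
    for text in raw_docs:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        docs.append([t for t in tokens if t not in STOPWORDS])
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vocab = {t for t, n in df.items() if n >= min_df}   # keep terms in >= min_df docs
    return [[t for t in tokens if t in vocab] for tokens in docs], vocab
```

Pruning by document frequency (min_df) mirrors the removal of words appearing in fewer than 3 documents described above.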

4.2. Experiment Settings

We adopt 5-fold cross-validation. For each fold, we learnt model parameters φ and ψ from 3 independent samples. Each sample was seeded randomly. Sample parameters were collected by running the Gibbs sampler for 1,000 iterations, where the first 500 iterations were for burn-in. Random seeds were re-drawn at every 100 iterations. The parameters φ and ψ used for prediction were taken as the average of the three samples.

We set the Dirichlet hyperparameters α = 50/K, β = 0.01 and γ = 0.01 following Griffiths and Steyvers [4]. These values were fixed for all folds and samples. We trained our LDAtgg model with predefined K = 50 and K = 100 for the prediction task.

4.3. Evaluation Metrics

For evaluating prediction accuracy, we examine the top 5 predicted tags for each document. The choice of top 5 follows the convention in [9]. We look at four evaluation metrics for comparing prediction performance over the top 5 predictions, namely precision, recall, F1 and NDCG. We report the performance at k ∈ {1, 2, 3, 4, 5} averaged over all test documents.
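Precision, recall and F1 at a cutoff k can be computed as follows; `gold` here stands for the set of tags actually assigned to the document:

```python
def precision_recall_f1_at_k(predicted, gold, k):
    """Precision/recall/F1 over the top-k predicted tags.

    predicted: ranked list of predicted tags; gold: set of observed tags.
    """
    top_k = predicted[:k]
    hits = sum(1 for t in top_k if t in gold)
    precision = hits / k
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1
```

Averaging these per-document values over all test documents yields the curves reported at k ∈ {1, …, 5}.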

NDCG (normalized discounted cumulative gain) evaluates ranking performance when there are multiple levels of relevance. DCG@k is defined as

DCG@k = Σ_{p=1..k} s(p) / log2(p + 1)

where s(p) denotes the relevance level (score) of the retrieved item at position p. NDCG@k is DCG@k normalized by the optimal DCG@k for the particular document, i.e., the DCG obtained when more relevant tags always precede less relevant tags. For the choice of s(p), there is no standard set of ground-truth scores for multiple levels of tag relevance. However, the tag frequency observed for each document provides a reasonable reference. We attempted six variations for deriving s(p) from the observed tag frequency. Since the six options do not differ in the conclusion we draw, we report only one option, which derives s(p) from the percentiles of the observed tag frequencies in d.
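A small implementation of this metric, assuming the linear-gain form s(p) / log2(p + 1):

```python
import math

def dcg_at_k(scores, k):
    """DCG@k with linear gains: sum of s(p) / log2(p + 1), p starting at 1."""
    return sum(s / math.log2(p + 1) for p, s in enumerate(scores[:k], start=1))

def ndcg_at_k(pred_scores, k):
    """Normalize by the ideal (descending) ordering of the same scores."""
    best = dcg_at_k(sorted(pred_scores, reverse=True), k)
    return dcg_at_k(pred_scores, k) / best if best > 0 else 0.0
```

`pred_scores` is the list of relevance scores s(p) of the predicted tags in ranked order; an already-ideal ranking scores exactly 1.0.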

4.4. Baseline Methods

For comparison, two groups of baseline methods are used.

  • Keyword-based methods. We adopt tf and tf-idf to select words from documents.
  • LDA methods. We also adopt LDA to predict words as tags. In this case, we learn LDA models on the words, which gives the per-topic word mixtures φ, and then compute the probability of assigning a word term by observing all word tokens in a test document.

One disadvantage shared by all baseline methods is that they may suggest words that never appear in the tag vocabulary. To offset this disadvantage, we apply post-hoc filtering: words that never appear in the tag vocabulary are filtered out and not ranked.
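The post-hoc filtering step amounts to the following (with illustrative names):

```python
def filter_to_tag_vocab(ranked_words, tag_vocab, top_n=5):
    """Post-hoc filtering for baselines: drop predicted words that never
    appear in the tag vocabulary, then keep the top-ranked survivors."""
    return [w for w in ranked_words if w in tag_vocab][:top_n]
```

Filtering before truncating to the top n lets lower-ranked in-vocabulary words move up, which is why it helps the keyword-based baselines.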

4.5. Performance Analysis

Figure 3 Evaluating Performance on Tag Prediction

Figure 3 shows that our LDAtgg models give superior performance in the tag prediction task. LDAtgg-100 outperforms the baseline methods in precision (Fig. (a)), recall (Fig. (b)), F1 (Fig. (c)) as well as NDCG (Fig. (d)) at all of the top 5 prediction positions.

1) LDAtgg vs. baseline: The strongest baseline method is tf-idf. At k = 1, it gives precision of 0.406, recall of 0.074, and NDCG of 0.345. Our LDAtgg-100 method gives precision of 0.437, recall of 0.089, and NDCG of 0.385, outperforming tf-idf by 7.6%, 20.3%, and 11.6% respectively. At k = 5, the improvements increase to 19.6%, 23.7% and 21.1% respectively. Although not as strong as LDAtgg-100, the LDAtgg-50 method also outperforms tf-idf by 8.0%, 11.3% and 9.0% in the respective metrics at k = 5. Note that tf-idf would perform worse were it not for the post-hoc filtering, which removes predicted words that are not found in the tag vocabulary.

While neither group of baseline methods models the tag vocabulary explicitly, keyword-based methods predict word terms from within the particular document, whereas the LDA baseline methods, i.e., LDA-50 and LDA-100, predict word terms using topic mixtures learnt from the training data. As shown in Figure 3, the keyword-based methods always outperform the LDA baseline methods. This holds for precision, recall, as well as NDCG. It suggests that word terms taken from within the page are more accurate predictions than those introduced by topic mixtures.

Between the two settings on the number of topics, we find that LDAtgg-100 always predicts better than LDAtgg-50. Intuitively, the more topics there are, the finer-grained the coverage of each topic. Hence, we expect more specific tag terms to be predicted when using a larger number of topics. However, the more topics, the more complex the model becomes, and the longer it takes to learn. Although perplexity is commonly used as a measure of goodness, finding the best-fit number of topics remains an open research problem [18], which can be an interesting direction for our future work. For this corpus, we find that both LDAtgg-50 and LDAtgg-100 give reasonably good predictions.

4.6. Tag Prediction Cases

In Table 1, we show two example pages, the top frequent tags assigned to them, and the top-ranked tag predictions given by the tf-idf and LDAtgg-100 methods. As expected, topics discussed in the page are represented accordingly in the top predictions given by LDAtgg-100. In contrast, the top predictions given by tf-idf are too specific and thus rarely used as tags for the corresponding pages.


5 Conclusion and Future Work

The tag prediction task is challenging in its own right. The novelty of this study includes the use of topic models to describe the correlation between the tags assigned to web pages and the words appearing in the content of the pages. Hypothesizing that topics that are discussed more in a document are likely to have more tags corresponding to them, we propose the LDAtgg model for performing the tag prediction task. Our experiments, conducted on a novel collection of news articles, show promising results. Our LDAtgg model with 100 topics surpasses the baseline prediction methods by over 20% in prediction accuracy. This work is our first step in tackling the tag prediction task using LDA. There is ample room for improvement. In particular, we plan to conduct more in-depth analyses of the proposed methods, to help us understand the tag prediction task and answer the following questions:

  • For what kinds of tags does LDAtgg give predictions with high confidence, and for what kinds does it not? From the current experimental results, LDAtgg is likely to predict tags that are mostly general (top-ranked, more frequently used) within their topics. It would provide more insight to explore how LDAtgg performs for infrequent tags.
  • For what kinds of topics does LDAtgg predict tags better, and for what kinds does it not? Does LDAtgg work better for more specific topics? How does topic specificity relate to the total number of topics set for training the model? How should the best-fit number of topics be set?
  • Are there characteristics of the corpus that make one method work better than another? How does LDAtgg perform when generalized to other corpora?

Acknowledgments

I thank my advisors Dr. Ee-Peng Lim (Singapore Management University) and Dr. Jing Jiang (Singapore Management University) for their guidance and support. I am also grateful to Dr. Christopher Soo Guan Khoo (Nanyang Technological University) and Dr. David Lo (Singapore Management University) for their comments and helpful suggestions on improving this work. This work is partially funded by the National Research Foundation (NRF) (NRF2008IDM-IDM004-036).


References

[1] Blei, D. M., & Jordan, M. I. (2003). Modeling Annotated Data. In Proceedings of SIGIR (pp. 127 - 134). Toronto, Canada: ACM.
[2] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993 - 1022.
[3] Budura, A., Michel, S., Cudre-Mauroux, P., & Aberer, K. (2008). To Tag or Not to Tag: Harvesting Adjacent Metadata in Large-scale Tagging Systems. In Proceedings of SIGIR (pp. 733 - 734). Singapore: ACM.
[4] Griffiths, T. L., & Steyvers, M. (2004). Finding Scientific Topics. Proceedings of the National Academy of Sciences of the United States of America, 101, 5228 - 5235.
[5] Guan, Z., Bu, J., Mei, Q., Chen, C., & Wang, C. (2009). Personalized Tag Recommendation using Graph-based Ranking on Multi-type Interrelated Objects. In Proceedings of SIGIR (pp. 540 - 547). Boston, Massachusetts, USA: ACM.
[6] Halpin, H., Robu, V., & Shepherd, H. (2007). The Complex Dynamics of Collaborative Tagging. In Proceedings of WWW (pp. 211 - 220). Banff, Alberta, Canada: ACM.
[7] Heinrich, G. (2008). Parameter Estimation for Text Analysis. Technical report, University of Leipzig, Germany.
[8] Heymann, P., Ramage, D., & Garcia-Molina, H. (2008). Social Tag Prediction. In Proceedings of SIGIR (pp. 531 - 538). Singapore: ACM.
[9] Jaschke, R., Eisterlehner, F., Hotho, A., & Stumme, G. (2009). Testing and Evaluating Tag Recommenders in a Live System. In Proceedings of the third ACM Conference on Recommender Systems (pp. 369 - 372). New York City, New York, USA: ACM.
[10] Li, X., Guo, L., & Zhao, Y. E. (2008). Tag-based Social Interest Discovery. In Proceedings of WWW (pp. 675 - 684). Beijing, China: ACM.
[11] Mika, P. (2005). Ontologies are Us: A Unified Model of Social Networks and Semantics. In Proceedings of ISWC (pp. 522 - 536). Galway, Ireland: Springer.
[12] Newman, D., Chemudugunta, C., & Smyth, P. (2006). Statistical Entity-Topic Models. In Proceedings of KDD (pp. 680 - 686). Philadelphia, Pennsylvania, USA: ACM.
[13] Sen, S., Lam, S. K., Rashid, A. M., Cosley, D., Frankowski, D., Osterhouse, J., et al. (2006). Tagging, Communities, Vocabulary, Evolution. In Proceedings of CSCW (pp. 181 - 190). Banff, Alberta, Canada: ACM.
[14] Song, Y., Zhuang, Z., Li, H., Zhao, Q., Li, J., Lee, W.-C., et al. (2008). Real-time Automatic Tag Recommendation. In Proceedings of SIGIR (pp. 515 - 522). Singapore: ACM.
[15] Subramanya, S. B., & Liu, H. (2008). SocialTagger - Collaborative Tagging for Blogs in the Long Tail. In Proceedings of the 2008 ACM workshop on Search in Social Media (pp. 19 - 26). Napa Valley, California, USA: ACM.
[16] Suchanek, F. M., Vojnovic, M., & Gunawardena, D. (2008). Social Tags: Meaning and Suggestions. In Proceedings of CIKM (pp. 223 - 232). Napa Valley, California, USA: ACM.
[17] Trant, J. (2009). Studying Social Tagging and Folksonomy: A Review and Framework. Journal of Digital Information, 10, 1 - 44.
[18] Wallach, H. M., Murray, I., Salakhutdinov, R., & Mimno, D. (2009). Evaluation Methods for Topic Models. In Proceedings of ICML (pp. 1105 - 1112). Montreal, Quebec, Canada: ACM.
[19] Wu, X., Zhang, L., & Yu, Y. (2006). Exploring Social Annotations for the Semantic Web. In Proceedings of WWW (pp. 417 - 426). Edinburgh, Scotland: ACM.
[20] Zhou, D., Bian, J., Zheng, S., Zha, H., & Giles, C. L. (2008). Exploring Social Annotations for Information Retrieval. In Proceedings of WWW (pp. 715 - 724). Beijing, China: ACM.