Volume 6 Issue 1
Spring 2010
ISSN 1937-7266

Exploratory Web Searching with Dynamic Taxonomies,
Results Clustering and Visualization

Panagiotis Papadakos

Institute of Computer Science
Computer Science Department
University of Crete, GREECE


The general objective of this dissertation is to elaborate on methods and techniques for exploring and progressively refining large volumes of heterogeneous information, through the provision of appropriate models of interaction and visualization techniques. The goal is to deliver answers of high accuracy and completeness, appropriate for supporting decision making, without obliging the user to formulate complex queries or to rely on specific user profiles. Instead, we will attempt to provide the user with "all" beneficial options for adjusting or restricting the information received, in a summarized, concise and intuitive manner. The navigation should provide appropriate data visualization techniques, in order to exploit the ability of the human brain to rapidly understand and perceive visual information and to discover patterns and relationships through it.

1 Introduction

Web Search Engines (WSEs) typically return a ranked list of documents that are relevant to the query submitted by the user. For each document, its title, URL and snippet (fragment of the text that contains keywords of the query) are usually presented. It is observed that most users are impatient and look only at the first results [1]. Consequently, when the documents that match the meaning the user intended for the query words are not in the first pages, or only a few of them are scattered across various ranks (and probably different result pages), it is difficult for the user to find the information he really wants. The problem becomes harder if the user cannot guess additional words for restricting his query, or the additional words the user chooses are not the right ones for restricting the result set.

One solution to these problems is results clustering [39], which provides a quick overview of the search results. It aims at grouping the results into topics, called clusters, with predictive names (labels), helping the user to quickly locate documents that he would otherwise not find, especially if these documents are low ranked (and thus not in the first result pages). Another solution is to exploit the various metadata that are available to WSEs (like domain, dates, language, file type, etc). Such metadata are usually exploited through the advanced search facilities that some WSEs offer, but that users very rarely use. A more flexible and promising approach is to exploit such metadata in the context of the interaction paradigm of faceted and dynamic taxonomies (FDT) [27, 33, 28], a paradigm that is increasingly used. Its main benefit is that it shows only those terms of the taxonomy that lead to non-empty answer sets, and the user can gradually restrict his focus using several criteria by clicking, providing an easy switch between searching and browsing.

The rest of this paper is organized as follows. Section 2 discusses requirements, related work and background information. Section 3 describes our vision for exploratory search systems. Section 4 describes implementation and reports preliminary experimental results, while Section 5 offers an overview of the proposed approach.

2 Requirements & Background

2.1 Results Clustering

Results clustering algorithms should satisfy several requirements. First of all, the generated clusters should be characterized by high intra-cluster similarity. Moreover, results clustering algorithms should be efficient and scalable, since clustering is an online task and the size of the retrieved document set can vary. Usually only the top-C documents are clustered in order to increase performance. In addition, the presentation of each cluster should be concise and accurate to allow users to detect what they need quickly. Cluster labeling is the task of deriving readable and meaningful (single-word or multiple-word) names for clusters, in order to help the user to recognize the clusters/topics he is interested in. Such labels must be predictive, descriptive, concise and syntactically correct. Finally, it should be possible to provide high quality clusters based on small document snippets rather than the whole documents.

In general, clustering can be applied either to the original documents (as in [5, 10]), or to their (query-dependent) snippets (as in [39, 30, 7, 40, 9, 34]). Clustering meta-search engines (e.g. clusty.com) use the results of one or more WSEs, in order to increase coverage/relevance. Therefore, meta-search engines have direct access only to the snippets returned by the queried WSEs.

Clustering the snippets rather than the whole documents makes clustering algorithms faster.

Some clustering algorithms [7, 6, 37] use internal or external sources of knowledge like Web directories (e.g. DMoz), Web dictionaries (e.g. WordNet) and thesauri, online encyclopedias and other online knowledge bases. These external sources are exploited to identify significant words/phrases that represent the contents of the retrieved documents or can be enriched, in order to optimize the clustering and improve the quality of cluster labels.

One very efficient and effective approach is the Suffix Tree Clustering (STC) [39], where search results (mainly snippets) can be clustered fast (in linear time) and incrementally, and each cluster is labeled with a phrase. Overall and for the problem at hand, we consider important the requirements of relevance, browsable summaries, overlap, snippet-tolerance, speed and incrementality as described in [39]. Several variations of STC have emerged recently (e.g. [4, 13, 34]).
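The phrase-based idea behind STC can be illustrated with a deliberately simplified sketch. It is not the algorithm of [39]: a naive enumeration of word n-grams stands in for the suffix tree (so it is not linear time), the merging of overlapping clusters is omitted, and the score (documents covered times phrase length), the stopword list and all function names are assumptions made only for this illustration.

```python
import re
from collections import defaultdict

# Assumed, tiny stopword list used only for this illustration.
STOPWORDS = frozenset({"the", "a", "of", "in", "and", "for"})


def phrases(snippet, max_len=3):
    """Return the set of word n-grams (1..max_len words) occurring in a snippet."""
    words = re.findall(r"[a-z0-9]+", snippet.lower())
    return {" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)}


def base_clusters(snippets, min_docs=2, max_len=3):
    """Group snippet ids by shared phrase and rank the groups by a simple score."""
    docs_by_phrase = defaultdict(set)
    for doc_id, snippet in enumerate(snippets):
        for phrase in phrases(snippet, max_len):
            docs_by_phrase[phrase].add(doc_id)

    clusters = []
    for phrase, docs in docs_by_phrase.items():
        words = phrase.split()
        # Skip phrases found in too few snippets or made only of stopwords.
        if len(docs) < min_docs or all(w in STOPWORDS for w in words):
            continue
        # Assumed score: number of snippets covered times phrase length in words.
        clusters.append((len(docs) * len(words), phrase, sorted(docs)))

    # Highest score first; ties are broken alphabetically by label.
    clusters.sort(key=lambda c: (-c[0], c[1]))
    return clusters


snippets = [
    "Jaguar car dealer in Crete",
    "New jaguar car models",
    "Jaguar habitat and diet",
    "The jaguar habitat in the Amazon",
]
for score, label, docs in base_clusters(snippets):
    print(score, label, docs)
```

Each resulting cluster is labeled with the phrase its snippets share, which is the property that makes this family of algorithms attractive for snippet-based results clustering.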

2.2 Exploratory Search and Information Thinning

Most WSEs are appropriate for focalized search, i.e. they make the assumption that users can accurately describe their information need using a small sequence of terms. However, as several user studies have shown, this is not the case. A high percentage of search tasks are exploratory [1]: the user does not know his information need accurately and provides only 2-5 words, so focalized search very commonly leads to inadequate interactions and poor results. Unfortunately, available UIs do not aid the user in query formulation, and do not provide any exploration services. The returned answers are simple ranked lists of results, with no organization.

We believe that modern WSEs should guide users in exploring the information space. Dynamic taxonomies [27] (faceted or not) is a general knowledge management model based on a multidimensional classification of heterogeneous data objects and is used to explore and browse complex information bases in a guided, yet unconstrained way through a visual interface. Features of faceted metadata search include (a) display of current results in multiple categorization schemes (facets) (e.g. based on metadata terms, such as size or date), (b) display of only those categories that lead to non-empty results, and (c) display of the count of the indexed objects of each category (i.e. the number of results the user will get by selecting this category). An example of the idea, assuming only one facet, is shown in Figure 1. Figure 1(a) shows a taxonomy and 8 indexed objects (1-8). Figure 1(b) shows the dynamic taxonomy if we restrict our focus to the objects {4,5,6}. Figure 1(c) shows the browsing structure that could be provided at the GUI layer and Figure 1(d) sketches user interaction.

Figure 1: Dynamic Taxonomies

The user explores or navigates the information space by setting and changing his focus. The notion of focus can be intensional or extensional. Specifically, any set of terms, i.e. any conjunction of terms (or any boolean expression of terms), is a possible focus. For example, the initial focus can be empty, or the top term of a facet. However, the user can also start from an arbitrary set of objects, and this is the common case in the context of a WSE and our primary scenario. In that case we can say that the focus is defined extensionally. Specifically, if A is the result of a free text query q, then the interaction is based on the restriction of the FDT on A (Figure 1(b) shows the restriction of a taxonomy on the objects {4,5,6}). At any point during the interaction, we compute and provide to the user the immediate zoom-in/out/side points along with count information (as shown in Figure 1(d)). When the user selects one of these points, the selected term is added to the focus, and so on. Note that the user can exploit the faceted structure and at each step may decide to select a zoom point from another facet.
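The restriction of a taxonomy to an extensional focus, and the computation of zoom-in points with counts, can be sketched as follows. This is a minimal illustration rather than the FleXplorer implementation: the taxonomy `PARENT`, the classification `INDEX` and the function names are hypothetical and are not taken from Figure 1.

```python
from collections import defaultdict

# Hypothetical single-facet taxonomy: child term -> parent term (None marks the root).
PARENT = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}

# Hypothetical direct classification of eight objects (1-8) under terms.
INDEX = {1: {"D"}, 2: {"D"}, 3: {"E"}, 4: {"E"},
         5: {"C"}, 6: {"F"}, 7: {"F"}, 8: {"B"}}


def ancestors_and_self(term):
    """Yield the term followed by all of its broader terms up to the root."""
    while term is not None:
        yield term
        term = PARENT[term]


def restrict(focus):
    """Map each term to the focus objects classified under it or a narrower term.

    Terms with an empty answer in the focus do not appear in the result.
    """
    extension = defaultdict(set)
    for obj in focus:
        for direct_term in INDEX[obj]:
            for term in ancestors_and_self(direct_term):
                extension[term].add(obj)
    return dict(extension)


def zoom_in_points(focus, term):
    """Return the children of `term` that have a non-empty answer, with counts."""
    extension = restrict(focus)
    return {child: len(extension[child])
            for child, parent in PARENT.items()
            if parent == term and child in extension}


focus = {4, 5, 6}                      # e.g. the answer A of a free text query
print(zoom_in_points(focus, "A"))      # children of the root with their counts
focus = restrict(focus)["C"]           # the user clicks on term C
print(focus, zoom_in_points(focus, "C"))
```

Selecting a zoom-in point simply replaces the focus with the objects of that term, after which the same computation is repeated on the smaller set.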

Examples of applications of faceted metadata-search include: e-commerce (e.g. ebay), library and bibliographic portals (e.g. DBLP), museum portals (e.g. [12] and Europeana), mobile phone browsers (e.g. [15]), specialized search engines and portals (e.g. [22]), Semantic Web (e.g. [11, 21]), general purpose WSEs (e.g. Google Base), and other frameworks (e.g. mSpace [29]).

2.3 Related Work

Systems like [38, 11, 19, 21, 2] support multiple facets, each associated with a taxonomy which can be predefined. Moreover, the systems described in [38, 11, 21] support ways to configure the taxonomies based on the contents of the results. Specifically, [38] enriches the values of the object descriptions with broader terms by exploiting WordNet, [11] supports rules for deciding which facets should be used based on RDF, and [21] supports reclassification of the objects to predefined types. In addition, there are works [1, 20] in the literature that compare automatic results clustering with guided exploration (through FDT). However, none of these systems applies content-based results clustering that re-constructs the cluster tree taxonomy while the user explores the answer set. Instead, they construct it once per query.

3 Vision: Advanced Exploratory Search

3.1 Information Space (Advancements)

3.1.1 Integration of Mined and Explicit Metadata

To the best of our knowledge, there are no other WSEs that offer the same kind of information/interaction. A somewhat related interaction paradigm that involves clustering is Scatter/Gather [5, 10], which allows users to select clusters; the documents of the selected clusters are then clustered again, the new clusters are presented, and so on. However, the application of results clustering on thousands of snippets would have the following shortcomings: (a) inefficiency, since real-time results clustering is feasible only for hundreds of snippets, and (b) low cluster label quality, since the resulting labels would be too general. On the other hand, dynamic taxonomies can load and handle thousands of objects very fast [33]. To this end we propose a dynamic (on-demand) integration approach. The idea is to apply the results clustering algorithm only on the top-C (usually C=100) snippets of the current focus. This approach not only can be performed fast, but is also expected to return more informative cluster labels. The approach is described in [23], where the contribution of our work lies in: (a) proposing and motivating the need for exploiting both explicit and mined metadata during Web searching, (b) showing how automatic results clustering can be combined with the interaction paradigm of dynamic taxonomies, by clustering on-demand the top elements of the user focus, (c) providing incremental evaluation algorithms, and (d) reporting experimental results that demonstrate its feasibility and effectiveness.
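The on-demand step can be sketched in a few lines. This is only an illustration of the idea of clustering the top-C snippets of the current focus, not the algorithm of [23]: `toy_cluster` is a hypothetical stand-in for a real results clustering algorithm such as NM-STC, and all names and data below are assumptions. Snippets are assumed to be non-empty.

```python
from collections import defaultdict


def top_c_of_focus(ranked_ids, focus, c=100):
    """Keep the ranking order of the query answer, but only for objects in the focus."""
    return [doc_id for doc_id in ranked_ids if doc_id in focus][:c]


def toy_cluster(snippets_by_id):
    """Stand-in clustering: group snippets by their first word (illustration only)."""
    clusters = defaultdict(list)
    for doc_id, snippet in snippets_by_id.items():
        clusters[snippet.split()[0].lower()].append(doc_id)
    return dict(clusters)


def cluster_facet(ranked_ids, snippets, focus, c=100, cluster_fn=toy_cluster):
    """Recompute the cluster facet from the top-C snippets of the current focus."""
    top = top_c_of_focus(ranked_ids, focus, c)
    return cluster_fn({doc_id: snippets[doc_id] for doc_id in top})


ranked_ids = [3, 1, 4, 2, 5]           # hypothetical ranked answer of a query
snippets = {
    1: "Hotels in Crete",
    2: "Flights to Crete",
    3: "Hotels near Heraklion",
    4: "Flights from Athens",
    5: "Hotels in Chania",
}
print(cluster_facet(ranked_ids, snippets, focus={1, 2, 4, 5}, c=3))
```

Because the facet is recomputed whenever the focus changes, objects that were ranked below the top-C of the original answer can enter the clustered set once the focus has been narrowed.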

The above approach can also be applied to other kinds of dynamically-mined metadata. With the term dynamically-mined metadata we refer to metadata which should be minable (a) from small quantities or portions of data, e.g. from the snippets of the top-K part of a query answer, and (b) in real-time. The motivation for focusing on small quantities is that (i) we may not have at our disposal large quantities (e.g. we may have access only to snippets), (ii) it may be computationally expensive to apply these mining tasks on large quantities of data, and (iii) we may want to focus on small quantities for enhancing the quality (specificity) of the mined information. In the context of a WSE, such metadata can be mined from the snippets of the top elements of the current answer. Examples of such mining tasks, apart from results clustering, include: (a) Facet and Taxonomy Mining ([6] generates facet hierarchies dynamically from text or text-annotated objects) and (b) Entity Mining, where named entity recognition (also known as entity identification and entity extraction) [25] is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, etc.

3.2 Metrics for Exploratory Search

Evaluation of exploratory systems is crucial to their success and refers to measuring the extent to which people use them to achieve goals in terms of effectiveness, efficiency and satisfaction. [14] discusses various methods and measures for controlling the experimental studies of web search interfaces and [36] proposes an evaluation method for exploratory search features. The evaluation of such systems is difficult, because of the complexity of data relationships, the diversity of displayed data, and the interactive nature of exploratory search, along with the perceptual and cognitive abilities offered. Such systems rely heavily on users’ ability to identify and act on exploration opportunities [35]. Important parts of retrieval results, such as trends, patterns, clusters, and other aggregate information, are difficult to measure and no specific metrics are available. Finally, it is difficult to come up with a universal evaluation system. So, the selection and definition of appropriate metrics for the evaluation of exploratory search and IR visualization systems is an important factor for the design and evaluation of exploratory services. These metrics can also facilitate appropriate automatic adaptation functionality, including the requirements of mobile devices and phones with small displays.

3.3 User Interaction (Advancements)

3.3.1 Ranking and Reduction of Facets and Zoom Points

Since our intention is to provide a rich variety of facets and zoom points, their ranking is important for a positive user experience. Only the top-K facets and zoom points will be presented to the user, while options to visit any facet and zoom point will be provided. As a result, we have to provide appropriate ranking methods. Such methods can be based on the count information of indexed objects, sibling and children information of zoom points, lexicographic ranking of terms, TF*IDF weighting (i.e. we can treat the terminology of a facet as a document), or a ranking method for zoom points based on the relevance of the indexed objects (i.e. zoom points indexing the most relevant documents are ranked high). Furthermore, the reduction of facets and zoom points is important for mobile devices with limited displays. In addition, the user will be able to define a specific threshold to restrict the number of results. This threshold can be defined over the objects, or for specific attributes of a facet, like the number of immediate children, the count number, etc. Using this interaction model the user could get a better understanding of the returned objects, their relationships, and their distribution over the facets.
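Two of the ranking options mentioned above, by count and by the relevance of the indexed objects, can be sketched as follows. The function names and data are hypothetical, and taking the best relevance score among the objects of a zoom point is only one assumed way of reading "indexing the most relevant documents". Each zoom point is assumed to index at least one object.

```python
def rank_by_count(zoom_points, k):
    """Return the k zoom points that index the most objects (ties: alphabetical)."""
    return sorted(zoom_points, key=lambda t: (-len(zoom_points[t]), t))[:k]


def rank_by_relevance(zoom_points, scores, k):
    """Return the k zoom points whose best-scoring object is most relevant.

    Assumption: a zoom point is as relevant as the most relevant object it indexes.
    """
    def best(term):
        return max(scores[obj] for obj in zoom_points[term])
    return sorted(zoom_points, key=lambda t: (-best(t), t))[:k]


# Hypothetical zoom points of a "format type" facet: term -> indexed object ids.
zoom_points = {"pdf": {1, 2, 3}, "html": {4, 5}, "doc": {6}}
# Hypothetical relevance scores of the objects for the current query.
scores = {1: 0.2, 2: 0.3, 3: 0.1, 4: 0.5, 5: 0.4, 6: 0.9}

print(rank_by_count(zoom_points, k=2))
print(rank_by_relevance(zoom_points, scores, k=2))
```

The two criteria can disagree: a zoom point with few objects may still be ranked first when it indexes the single most relevant document, which is why the choice of ranking method matters when only the top-K zoom points are displayed.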

3.3.2 Interactive Preference Management over FDT

IR systems should provide efficient and effective access to exploratory information needs. User actions should be extended in order to further ease the interaction. Currently, preference management and adaptation requires the user to formulate complex expressions or interact with complex UIs. Regarding FDT and personalization, little research has been done. Most FDT systems output facets in lexicographical order, or by taking into consideration the number of indexed documents. A collaborative approach that uses explicit user ratings to design a personalized FDT system is proposed in [18], where several algorithms are proposed and evaluated, based on an evaluation methodology for personalized FDTs. This work, though, does not allow the user to express any attribute-based preference. Another collaborative approach by content filtering, based on (manually or automatically) created ontologies, is proposed in [32, 31]. Relevance to users is measured by calculating the distance between values in the hierarchical ontology, and the personalization supported is simplistic and concerns only the visualization layer. Finally, an approach for data warehouses is described in [26], where at each step of navigation, the system asks the user one or more questions in order to fetch the most promising set of facets, based on a simple approximation algorithm. Again, that work does not allow users to express preferences, and the framework does not support hierarchically organized values.

The above observations justify the need for flexible and universal access methods that offer on-line preference elicitation and support. Requirements of such explorative environments include: (a) simplicity (the users should be able to use and understand the interaction immediately), (b) expressiveness (it should be possible for the user to interactively specify complex preference structures), and (c) acceptance by users (the resulting interaction should be effective and desired by the users). We will investigate how we can extend the user actions in order to further ease the interaction and to speed up the restriction of the focus to those parts of the information space that the user is interested in. These actions will affect the appearance and presentation order of facets, terms and objects of the focus. The semantics will be described using the framework described in [16], extending it to also support hierarchically organized values. This could possibly be done by supporting a form of preference inheritance.

3.3.3 Exploratory Search and Visualization

The diversity of IR visualization models raises the question of whether they can be synthesized into one visualization environment. Two basic strategies in this direction are identified: (a) display multiple visual configurations simultaneously in a larger visualization environment, and (b) synthesize various visualization approaches into one visualization approach (data structures should be compatible and their displayed attributes should be complementary) [41]. Recently, there has been an effort to design a declarative language for the specification of visualization and interaction methods, which will allow the formal expression of structure, appearance, behavior and communication between the various structures of information visualization [3, 8].

We will focus on investigating ways to integrate the different visualization models into the interaction paradigm of FDT, providing a synthesized visualization environment, which will take advantage of each visualization model’s strengths, and at the same time overcome its weaknesses.

For the above we will take into consideration psychology theories, like pre-attentive processing and Gestalt theory, for more intuitive user interaction.

4 Implementation and Experimental Evaluation

The implementation will be done in the context of Mitos [24], which is a prototype WSE (under development by the Department of Computer Science of the University of Crete and FORTH-ICS). FleXplorer is used by Mitos for offering general purpose browsing and exploration services. Currently, and on the basis of the top-K answer of each submitted query, the following five facets are created and offered to users:

  • the hierarchy of clusters derived by NM-STC (one of the two STC variants we have developed [17])
  • web domain, for which a hierarchy is defined (e.g. csd.uoc.gr < uoc.gr < gr)
  • format type (e.g. pdf, html, doc, etc), no hierarchy is created in this case
  • language of a document based on the encoding of a web page and
  • (modification) date hierarchy.
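As an illustration of the web domain facet, the following sketch derives the chain of progressively broader domains from a URL and collects the chains into a child-to-parent map. It is an assumption about how such a hierarchy could be built, not the Mitos implementation; the function names and the second URL are hypothetical. URLs are assumed to carry a scheme (e.g. `http://`), and hosts given as IP addresses are not handled.

```python
from urllib.parse import urlsplit


def domain_path(url):
    """Return the host's domain chain, most specific first."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".") if host else []
    return [".".join(labels[i:]) for i in range(len(labels))]


def domain_parents(urls):
    """Build a child -> parent map over all domain terms seen in the URLs."""
    parent = {}
    for url in urls:
        chain = domain_path(url)
        for child, broader in zip(chain, chain[1:]):
            parent[child] = broader
        if chain:
            # The top-level domain is a root of the facet.
            parent.setdefault(chain[-1], None)
    return parent


print(domain_path("http://csd.uoc.gr/index.html"))
print(domain_parents(["http://csd.uoc.gr/a", "http://www.ics.forth.gr/"]))
```

A map of this shape can serve directly as the taxonomy of a facet, with each result classified under the most specific domain of its URL.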

4.1 Preliminary Experimental Results

Loading times of FleXplorer have been thoroughly measured in [33]. In brief, the computation of zoom-in points with count information is more expensive than without. In 1 second we can compute the zoom-in points of 240,000 results with count information, while without count information we can compute the zoom-in points of 540,000 results. Execution times that correspond to the integration of FleXplorer and results clustering using the non-incremental and an incremental approach of NM-STC for the top-C elements are reported in [23]. It is evident that for top-{100,200} values, the results are presented almost instantly (around 1 second), making the proposed on-demand clustering method suitable as an online task. Moreover, we can see that there is a linear correlation between time cost and the top-C value. A significant speedup is observed when the incremental algorithm is used and the overlap of snippets is more than 50%.

5 Epilogue

Apart from the methodological issues for achieving the above, this dissertation will focus on (a) performance, so that the resulting methods and techniques are applicable on large volumes of information, (b) flexibility of applicability, so that they are also applicable to different types of information (from simple text and unstructured data, to semi-structured and structured data), and (c) adaptability (or on-site configuration) of these services. With regard to adaptability, the operation should be simple and systematic so that it can support the adaptation of services based on the environment (context) of the user.


[1] Special issue on Supporting Exploratory Search. Communications of the ACM, 49(4), April 2006.
[2] O. Ben-Yitzhak, N. Golbandi, N. Har’El, R. Lempel, A. Neumann, S. Ofek-Koifman, D. Sheinwald, E. Shekita, B. Sznajder, and S. Yogev, "Beyond basic faceted search." In Procs of the Intern. Conf. on Web Search and Web Data Mining, (WSDM’08), Palo Alto, California, USA, February 2008, pages 33–44.
[3] J. A. Cottam and A. Lumsdaine, "ThisStar: Declarative Visualization Prototype." In IEEE Symposium on Information Visualization, 2007.
[4] D. Crabtree, X. Gao, and P. Andreae, "Improving web clustering by cluster selection." In Procs of the IEEE/WIC/ACM Intern. Conf. on Web Intelligence (WI’05), Compiegne, France, September 2005, pages 172–178.
[5] D.R. Cutting, D. Karger, J.O. Pedersen, and J.W. Tukey. "Scatter/Gather: A cluster-based approach to browsing large document collections." In Procs of the 15th Annual Intern. ACM Conf. on Research and Development in Information Retrieval, (SIGIR’92), Copenhagen, Denmark, June 1992, pages 318–329.
[6] W. Dakka and P.G. Ipeirotis. "Automatic extraction of useful facet hierarchies from text databases." In Procs of the 24th Intern. Conf. on Data Engineering, (ICDE’08), Cancún, México, April 2008, pages 466–475.
[7] P. Ferragina and A. Gulli, "A personalized search engine based on web-snippet hierarchical clustering." In Procs of the 14th Intern. Conf. on World Wide Web, (WWW’05), volume 5, Chiba, Japan, May 2005, pages 801–810.
[8] G. Jaeschke, M. Leissler, and M. Hemmje, “Modeling Interactive, 3-Dimensional Information Visualizations Supporting Information Seeking Behaviors,” Knowledge and Information Visualization, 2005, pp. 119-135.
[9] F. Gelgi, H. Davulcu, and S. Vadrevu. Term ranking for clustering web search results. In 10th Intern. Workshop on the Web and Databases, (WebDB’07), Beijing, China, June 2007.
[10] M.A. Hearst and J.O. Pedersen. "Reexamining the cluster hypothesis: Scatter/Gather on retrieval results." In Procs of the 19th Annual Intern. ACM Conf. on Research and Development in Information Retrieval, (SIGIR’96), Zurich, Switzerland, August 1996, pages 76–84.
[11] M. Hildebrand, J. van Ossenbruggen, and L. Hardman, "/facet: A browser for heterogeneous semantic web repositories." In Procs of Intern. Semantic Web Conf., (ISWC’06), Athens, GA, USA, November 2006, pages 272–285.
[12] E. Hyvönen, E. Mäkelä, M. Salminen, A. Valo, K. Viljanen, S. Saarela, M. Junnila, and S. Kettula, "MuseumFinland – Finnish museums on the semantic web." Journal of Web Semantics, 3(2), 2005, p. 25.
[13] J. Janruang and W. Kreesuradej. "A new web search result clustering based on true common phrase label discovery." In Procs of the Intern. Conf. on Computational Intelligence for Modelling Control and Automation and Intern. Conf. on Intelligent Agents Web Technologies and International Commerce, (CIMCA/IAWTIC’06), Washington, DC, USA, November 2006, page 242.
[14] M. Käki and A. Aula, "Controlling the complexity in comparing search user interfaces via user studies." Inf. Process. Manage., 44(1):82–91, 2008.
[15] A. K. Karlson, G. G. Robertson, D. C. Robbins, M. P. Czerwinski, and G. R. Smith. “FaThumb: A facet-based interface for mobile search." In Procs of the Conf. on Human Factors in Computing Systems, (CHI’06), Montréal, Québec, Canada, April 2006, pages 711–720.
[16] W. Kießling. Foundations of preferences in database systems. In VLDB ’02: Proceedings of the 28th international conference on Very Large Data Bases, VLDB Endowment, 2002, pages 311–322.
[17] S. Kopidaki, P. Papadakos, and Y. Tzitzikas. STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching. In Procs of the 10th International Conf. on Web Information Systems Engineering, (WISE’09), October 2009.
[18] J. Koren, Y. Zhang, and X. Liu. Personalized interactive faceted search. In WWW’08: Procs of the 17th intern. conf. on World Wide Web, New York, NY, USA, 2008. ACM, pages 477–486.
[19] B. Kules, J. Kustanowitz, and B. Shneiderman. Categorizing web search results into meaningful and stable categories using fast-feature techniques. In Procs of the 6th ACM/IEEE-CS Joint Conf. on Digital Libraries, (JCDL’06), Chapel Hill, NC, USA, June 2006, pages 210–219.
[20] B. Kules, M. Wilson, M. Schraefel, and B. Shneiderman. From keyword search to exploration: How result visualization aids discovery on the web. Human-Computer Interaction Lab Technical Report HCIL-2008-06, University of Maryland, 2008.
[21] E. Mäkelä, E. Hyvönen, and S. Saarela. Ontogator - a semantic view-based search engine service for web applications. In Procs of Intern. Semantic Web Conf., (ISWC’06), Athens, GA, USA, November 2006, pages 847–860.
[22] E. Mäkelä, K. Viljanen, P. Lindgren, M. Laukkanen, and E. Hyvönen. Semantic yellow page service discovery: The veturi portal. Poster paper at Intern. Semantic Web Conf., (ISWC’05), Galway, Ireland, November 2005.
[23] P. Papadakos, S. Kopidaki, N. Armenatzoglou, and Y. Tzitzikas. "Exploratory Web Searching with Dynamic Taxonomies and Results Clustering". In Procs of the 13th European Conf. on Digital Libraries, (ECDL’09), Corfu, Greece, September 2009.
[24] P. Papadakos, Y. Theoharis, Y. Marketakis, N. Armenatzoglou, and Y. Tzitzikas. "Mitos: Design and evaluation of a dbms-based web search engine". In Procs of the 12th Pan-Hellenic Conf. on Informatics, (PCI’08), Samos, Greece, August 2008.
[25] M. T. Pazienza, editor. Information Extraction: Towards Scalable, Adaptable Systems, volume 1714 of Lecture Notes in Computer Science. Springer, 1999.
[26] S. B. Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania. Minimum-effort driven dynamic faceted search in structured databases. In CIKM’08: Procs of the 17th ACM conf. on Information and knowledge management, New York, NY, USA, 2008, pages 13–22.
[27] G. M. Sacco. “Dynamic taxonomies: A model for large information bases". IEEE Transactions on Knowledge and Data Engineering, 12(3):468–479, May 2000.
[28] G. M. Sacco and Y. Tzitzikas. "Dynamic Taxonomies and Faceted Search: Theory, Practice, and Experience". Springer-Verlag, 2009.
[29] M.C. Schraefel, M. Karam, and S. Zhao. “mSpace: Interaction design for user-determined, adaptable domain exploration in hypermedia". In Procs of Workshop on Adaptive Hypermedia and Adaptive Web Based Systems, Nottingham, UK, August 2003, pages 217–235.
[30] J. Stefanowski and D. Weiss. Carrot2 and language properties in web search results clustering. In Procs of the Intern. Atlantic Web Intelligence Conf., (AWIC’03), Madrid, Spain, May 2003. Springer.
[31] M. Tvarozek, M. Barla, G. Frivolt, M. Tomsa, and M. Bieliková. Improving semantic search via integrated personalized faceted and visual graph navigation. In SOFSEM, volume 4910 of Lecture Notes in Computer Science, Springer, 2008, pages 778–789.
[32] M. Tvarozek and M. Bieliková. Personalized faceted browsing for digital libraries. In ECDL, 2007, pages 485–488.
[33] Y. Tzitzikas, N. Armenatzoglou, and P. Papadakos. FleXplorer: A framework for providing faceted and dynamic taxonomy-based information exploration. In 19th Intern. Workshop on Database and Expert Systems Applications, (FIND’08 at DEXA’08), Torino, Italy, 2008, pages 392–396.
[34] J. Wang, Y. Mo, B. Huang, J. Wen, and L. He. Web search results clustering based on a novel suffix tree structure. In Procs of 5th Intern. Conf. on Autonomic and Trusted Computing, (ATC’08), volume 5060, Oslo, Norway, June 2008, pages 540–554.
[35] R. W. White, G. Marchionini, and G. Muresan. Evaluating exploratory search systems: Introduction. Information Processing and Management, 44(2): March 2008.
[36] M.L. Wilson and M.C. schraefel, “Bridging the Gap: Using IR Models for Evaluating Exploratory Search Interfaces.” In SIGCHI 2007 Workshop on Exploratory Search and HCI. ACM, April 2007.
[37] D. Xing, G.R. Xue, Q. Yang, and Y. Yu. Deep classifier: Automatically categorizing search results into large-scale hierarchies. In Procs of the Intern. Conf. on Web Search and Web Data Mining, (WSDM’08), Palo Alto, California, USA, February 2008, pages 139–148.
[38] K. Yee, K. Swearingen, K. Li, and M. Hearst. Faceted metadata for image search and browsing. In Procs of the Conf. on Human Factors in Computing Systems, (CHI’03), Ft. Lauderdale, Florida, USA, April 2003, pages 401–408.
[39] O. Zamir and O. Etzioni. Web document clustering: A feasibility demonstration. In Procs of the 21st Annual Intern. ACM Conf. on Research and Development in Information Retrieval, (SIGIR’98), Melbourne, Australia, August 1998, pages 46–54.
[40] H.J. Zeng, Q.C. He, Z. Chen, W.Y. Ma, and J. Ma. Learning to cluster web search results. In Procs of the 27th Annual Intern. Conf. on Research and Development in Information Retrieval, (SIGIR’04), Sheffield, UK, July 2004, pages 210–217.
[41] J. Zhang. Visualization for Information Retrieval. Springer-Verlag, 2008.