Volume 5 Issue 3
Winter 2009
ISSN 1937-7266

Thesis Support Information System (TSIS)

Seungwon Yang

Department of Computer Science
Virginia Tech
2030 Torgersen Hall
Blacksburg, VA 24061 U.S.A.
+1-540-231-3615
seungwon@vt.edu

ABSTRACT

Finding an appropriate dissertation topic, one that is interesting, novel, provable, has intellectual merit, and potentially has commercial value, is not always easy. Multiple variables are involved in the process, either explicitly or implicitly. This study attempts to help graduate students, especially Ph.D. students, reduce the time and effort spent identifying, developing, screening, and solidifying a novel dissertation topic (a process that may have to be restarted and repeated). To achieve this goal, efficient methodologies to retrieve sub-documents from electronic theses and dissertations (ETDs) will be developed. A strategy to find existing topics that are relevant to the users’ research interests and areas will be developed, along with a strategy to refine those topics to be novel and interesting; both will be incorporated into the proposed Thesis Support Information System (TSIS). Methodologies for generating innovative ideas by association will be adopted from the design of an existing system, CyberQuest [6]. TSIS will also include services that, given a dissertation topic, suggest relevant reference materials and summarize appropriate methodologies.

Categories and Subject Descriptors

H.1.2 Information Systems: User/Machine Systems – Human factors; H.4.2 Information Systems Applications: Types of Systems – Decision support (e.g., MIS); H.3.3 Information Storage and Retrieval: Information search and retrieval – Clustering, Query formulation.

General Terms:

Algorithms, Measurement, Documentation, Design, Experimentation, Human Factors, Theory.

Keywords:

Text information retrieval, electronic theses and dissertations (ETDs), sub-document, creativity, idea generation, information system

1. INTRODUCTION

Launching Ph.D. research involves finding an appropriate dissertation topic. Some people might find their topics rather naturally while they participate in a funded research project. Some pursue their personal research interests. These interests might be combined with the funded project, which may be an ideal situation. In this process of making a decision on doctoral research direction, people around the student will be involved. Advisors and committee members will be the major sources for advice. They provide guidance to students in adjusting the scope of the topic and direction so that the dissertation work will be achievable, novel, and interesting, as well as a significant contribution to the field.

Once the topic, relevant problem statements, research questions and ideas for methodology have been decided, students will devote themselves to finding solutions by applying various methodologies or developing a novel one if none are applicable. Data will be collected and evaluated as part of the research to assess the effectiveness of the proposed solution. The limitations of the study and additional questions that need to be addressed will be presented. A list of references is prepared, which is highly relevant to all aspects of a dissertation. The references provide a common ground between the dissertation’s author and the readers who are interested in it.

The dissertation process does not always go as smoothly as described above. One of the difficulties in finding an appropriate topic is that multiple variables affect the decision-making process. The number of values that these variables can take tends to increase as students’ knowledge and experience increase. Therefore, the more time students spend in a graduate program, the harder it may become for them to ‘just pick one of the topics.’ Variables that might be involved in dissertation topic selection, explicitly or implicitly, include:

  • Intellectual merit of the topic in the field
  • Level of innovation/creativity in the proposed solution
  • Estimated feasibility (time and effort) of the implementation of the idea/topic
  • Level of contribution to the field
  • Students’ research interests, background area, skills, personality
  • Advisors’ research interests, background area, skills
  • Future goal (e.g., going to academia or business)
  • Impact from external forces (e.g., research funding situations, job market changes, technical and societal changes, etc.)

The following sections introduce the motivation for this study, the existing problems, research questions, related studies, conceptual design ideas and their descriptions with concept maps, methodologies for testing the proposed system, both implicit and explicit contributions to the field, and an overall summary.

2. MOTIVATION

Why did it take me so long to settle on a good dissertation topic? It has been a while since I had my first dissertation topic, which was about a novel information system. I received valuable and resourceful comments whenever I discussed it with my peers and instructors. Their advice helped me to consider other aspects involved and to expand my initial idea to a broader context. My interest in creativity led me to a second topic, in which I wanted to connect online information resources and systems with an idea generation system to provide effective idea screening. Now I have a third topic, ‘Thesis Support Information System (TSIS),’ which came out of a discussion with my advisor. Again, I had discussions with, and received valuable comments from, peers and instructors both in my own field and in other fields of study.

I think that I have finally settled on this topic after a months-long quest. The quest itself was an interesting and important step toward the final decision. At times, the journey made me excited or frustrated. I met with people at different levels in academia, who helped me identify the area I might pursue in the future. With the support of the proposed TSIS, Ph.D. students will be able to arrive at their dissertation topics with less time and effort. Other researchers might also find this system valuable, because it could help them identify interesting problems that have received little attention in their disciplines. Those problems could also be a starting point for conducting research projects and writing grant proposals. When I asked one professor how she found her dissertation topic as a Ph.D. student, she answered, ‘In agony.’ It is my hope that the agony that many of my fellow Ph.D. students share will be alleviated with the help of TSIS.

3. THE PROBLEMS

  • Finding an appropriate dissertation topic and relevant resources is not always easy. One likely reason is that generating innovative ideas and identifying open problems requires human time and effort. It is also hard to determine whether ideas or problems have already been explored, or to find the next question to explore.
  • There is little assistance for identifying or suggesting appropriate sub-sections of ETDs, which could hold richer information than the keywords and overall abstract of an ETD. Electronic Theses and Dissertations (ETDs) contain various problem statements and proposed solutions, as well as future work plans and references. They could be used effectively in searching for dissertation topics, related resources, and methodologies. Therefore, more technical support is needed.

4. RESEARCH QUESTIONS

  • What is an effective way to identify open problems and promote ‘associative creativity’ to generate novel ideas to solve those problems?
  • What is an effective strategy to screen ideas to assess if they are novel, feasible to implement and potentially contributing to the field of study?
  • What is a good way to design an information system so that it could access and utilize a sub-document level of ETDs? What are the technical obstacles involved in this and how can they be resolved? Upon access, how can the sub-document components (e.g., problem statements, literature reviews, methodologies, future works and references) be related to each other and displayed in an easily understandable way to the users?

5. RELATED STUDIES

My quest for creative idea generation was one of the factors that motivated me to pursue this topic. Creativity is a very broad subject, so the general concept is not discussed here; however, understanding common creative processes is important for designing the proposed TSIS. Knörig compares multiple creative process models in his thesis [1]: six models, by Wallas, Osborn, Amabile, Poincaré, Couger, and Shneiderman. Overall, they mostly share the process of Preparation -> Idea Generation -> Evaluation -> Elaboration -> Donation. Idea Generation can be subdivided into Incubation and Illumination, where Incubation is the period during which the creative person is not explicitly contemplating an idea; it is idle time spent waiting for insights to come to mind. Csikszentmihalyi [2] reports that most highly creative people emphasize the importance of this period. When the model is implemented on a computer system, Idea Generation might not include Incubation, since true Incubation may be achieved by shutting down the computer and going out for a walk. However, Idea Generation could be enhanced by TSIS through more powerful associations of concepts between disparate areas.

Shneiderman applies a creative process in designing user interfaces for innovation support. His four-phase creativity framework, called ‘genex,’ includes: “(1) Collect: learn from previous works stored in libraries, the Web, etc.; (2) Relate: consult with peers and mentors at early, middle, and late stages; (3) Create: explore, compose, evaluate possible solutions; and (4) Donate: disseminate the results and contribute to the libraries” [3]. These four phases are non-linear and cyclical; the creative work may return to any of the previous phases. Eight more ‘concrete’ activities within the genex framework were suggested, based on all three perspectives of creativity – inspirationalism, structuralism, and situationalism:

  1. Searching and browsing digital libraries
  2. Consulting with peers and mentors
  3. Visualizing data and processes
  4. Thinking by free association
  5. Exploring solutions—what-if tools
  6. Composing artifacts and performances
  7. Reviewing and replaying session histories
  8. Disseminating results

Those eight activities resemble the process of doing Ph.D. research. The ‘Thinking by free association’ activity is one of the techniques frequently used to generate novel ideas (e.g., searching amazon.com for ‘associative thinking’ returns 1,062 books). The Idea Generation engine, illustrated in Figure 2, will employ this technique in its design.

Several creativity support tools have been developed. Warr and O'Neill present creativity support design tools that support the generation of, and interaction with, external representations to promote common ground and shared understanding among stakeholders [4]. Kerne et al. present combinFormation, a mixed-initiative creativity support tool in which a software agent and a human user help each other. Information is represented as text and image ‘surrogates.’ By associating various surrogates, users have a chance to develop novel ideas [5]. CyberQuest (CQ) [6] is a software system developed by Dr. John W. Dickey at Virginia Tech. Its author conducted more than 600 sessions with participant groups, acting as a facilitator between the system and the human participants. CQ has six steps: Problem/Aim/Dream description -> Word selection -> Idea generation -> Idea screening -> Packaging -> Reporting. At the end of a session, participants receive a detailed report about the innovative ideas generated and a potential implementation plan. Dr. Dickey said that the Idea generation step was the most interesting and fun part, but that the Idea screening step was difficult because many variables were involved in successfully implementing the generated ideas and there was no clear way to assess those variables. The proposed TSIS will attempt to provide some evidence that a generated dissertation topic is actually feasible (or not feasible) to implement.

Hassel presents language-independent and less human-resource-intensive automatic text summarization software in his dissertation [7]. He also claims that a summarization system should be easy to assemble using only a small set of basic language processing tools that are not specifically aimed at summarization. One benefit of this type of summarization tool is its easy portability. Mani mentions that demand for text summarization is growing due to advances in technology and the digitization of text information [8]. With the increasing volume of online information, it is getting harder to generate meaningful and timely summaries. For sub-document summaries of ETDs, timeliness may be less important, considering that the summaries will be prepared in advance. Hahn and Mani [9] mention that summarization methods fall into two categories. Knowledge-poor approaches do not require new rules to be added for each new application domain or language. Knowledge-rich approaches, on the other hand, attempt to grasp the meaning of the text so that they can reduce it more effectively, yielding a better summary. For ETD sub-document summaries, a balance between the two will be needed.

6. CONCEPTUAL DESIGN

Figure 1 shows a high-level conceptual design of TSIS. It is initially intended for users in academia, for example Ph.D. students, who are preparing their dissertations; however, it could be used by people in various professions, who require access to current research problems, methodologies, and literature, and would like to enhance ideas based on them. Upon having a tentative topic or problem in which they are interested, users can develop those further by scanning the ETD collection to verify if their topic has been addressed already or what the current status of the problem is. The literature relevant to their topic will be displayed so that the users can read and explore deeper into a specific aspect of the topic. Support for potential methodologies to address the research questions could be uncovered. Upon receiving the user’s query, for example a problem statement and description of the area of study, TSIS will provide a list of similar ETDs and their sub-documents, such as problem statements, literature reviews, methodologies, or ideas for future work. In addition, the users will be able to select a method, see a list of related references, or be directed to the actual ETDs that incorporate the method.


Figure 1. High-level conceptual design of the Thesis Support Information System (TSIS).

TSIS utilizes an ETD collection and the open Web as its knowledge base. Online search engines will be used in conjunction with the ETD collection to supply supporting information. A large number of ETDs have been crawled, and more are being crawled, in the Digital Library Research Laboratory at Virginia Tech, so a substantial amount of information is available. Technology to access ETDs at the sub-document level and to group the document parts will be employed.

6.1 OPEN PROBLEM ASSISTANT (OPA)


Figure 2. Open Problem Assistant component of TSIS.

Once we know the direction to our goal, the only thing left is to diligently go that way, overcoming hurdles and learning on the way. Considering the importance of finding the right direction, identifying an open problem – providing a potential direction to go – would be a very important initial step in dissertation research. Thus, the OPA component is explained first. Its conceptual design is expressed in Figure 2.

A short scenario will help illustrate how the OPA component will work. Kara has been developing her dissertation topic for some months, reading related literature and discussing it with peers and her advisor. However, she wants to make her topic more innovative. Kara finds TSIS and decides to give it a try. She enters a short description of her tentative topic, ‘Discussion Augment System,’ as well as her area of study, ‘computer science,’ and other research interests such as cognitive science and human factors. She selects four keywords from her description in case she wants to do manual matching. Then, she clicks the ‘Automatching’ button to let OPA do the job for her. Her description is parsed into keywords, which are matched against databases of problem statements and research questions in computer science, cognitive science, and human factors that have been prepared in advance.
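The ‘Automatching’ step in this scenario could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the planned OPA implementation: the stop-word list, the function names `extract_keywords` and `match_problem_statements`, the overlap-count scoring, and the two sample problem statements are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "that", "with"}

def extract_keywords(text):
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def match_problem_statements(description, statements):
    """Return (overlap_count, statement) pairs with at least one shared keyword,
    highest overlap first."""
    query = extract_keywords(description)
    scored = []
    for statement in statements:
        overlap = len(query & extract_keywords(statement))
        if overlap > 0:
            scored.append((overlap, statement))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Hypothetical problem statements standing in for the prepared databases.
statements = [
    "Supporting group discussion with an augmented system",
    "Protein folding simulation on commodity hardware",
]
results = match_problem_statements("Discussion Augment System", statements)
for overlap, statement in results:
    print(overlap, statement)
```

Exact word overlap is the simplest possible matching rule; in this sketch ‘augment’ and ‘augmented’ do not match, which suggests that stemming or a similarity measure would be needed in practice.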

The links to matching problem statement and research question sections are displayed as succinct sentences to help users understand the content behind those links. They are grouped by topic similarity. Within each group, another criterion, such as the publication year of the dissertation or the number of entries in its reference section, could be used to rank the problem statements. Kara selects a group, ‘Group Discussion Support,’ and finds an interesting link about supporting the generation of innovative ideas in a group environment. She adds an idea from this and browses through the ‘Limitations and Future Works’ databases. Kara is interested in the Idea generation process, so she clicks ‘Proceed to Idea Generation’ and enters an updated description of her topic and four keywords.
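The grouping and within-group ranking described above might look like the following sketch. It assumes that each match already carries a topic-group label from some earlier similarity step; the record fields (`group`, `summary`, `year`, `num_references`) and the sample data are hypothetical.

```python
from collections import defaultdict

def group_and_rank(records, rank_key="year"):
    """Group records by their topic label, then sort each group by rank_key,
    highest value first."""
    groups = defaultdict(list)
    for record in records:
        groups[record["group"]].append(record)
    return {
        name: sorted(items, key=lambda r: r[rank_key], reverse=True)
        for name, items in groups.items()
    }

# Hypothetical matches; the group labels are assumed to come from a clustering step.
matches = [
    {"group": "Group Discussion Support", "summary": "Supporting idea generation in groups",
     "year": 2005, "num_references": 120},
    {"group": "Group Discussion Support", "summary": "Moderating online group discussion",
     "year": 2007, "num_references": 85},
    {"group": "Visualization", "summary": "Visualizing discussion threads",
     "year": 2003, "num_references": 60},
]
by_year = group_and_rank(matches)
by_refs = group_and_rank(matches, rank_key="num_references")
```

Keeping the ranking criterion as a parameter mirrors the text, which leaves open whether publication year or reference count is the better ordering.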

The Idea generation process displays summaries of problem statements in ETDs from other disciplines, such as biology, fine arts, and history, that share concepts with Kara’s topic description keywords while lying at some ‘analog distance’ from them. Analog distance is a ‘conceptual distance’ between two concepts that are similar but not the same. One way to generate interesting ideas is to associate concepts that have some analog distance between them. After using the Idea Generation engine, Kara replaces ‘group’ with ‘innovation’ in her topic. To see if her updated topic, ‘Innovation Support Information System,’ is novel, she uses it to search again. The topic turns out to be a novel one that has not been explored enough. Having successfully updated her topic, she can now go on to see what kinds of references are related to it.
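The text does not specify how analog distance would be computed. As one possible reading, the sketch below substitutes Jaccard distance between keyword sets and keeps only candidates inside a middle band, so that they are related to the topic but not identical to it. The measure, the band limits `low` and `high`, and the sample keyword sets are all assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """1 minus the ratio of shared to total keywords; 0.0 means identical sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def analog_candidates(topic_keywords, summaries, low=0.5, high=0.9):
    """Keep summaries whose distance from the topic lies in [low, high]:
    similar enough to relate, different enough to prompt a new association."""
    picked = []
    for discipline, keywords in summaries.items():
        distance = jaccard_distance(topic_keywords, keywords)
        if low <= distance <= high:
            picked.append((discipline, round(distance, 2)))
    return sorted(picked, key=lambda pair: pair[1])

# Hypothetical keyword sets for problem-statement summaries in other disciplines.
topic = {"group", "discussion", "support", "system"}
summaries = {
    "biology": {"group", "behavior", "support", "colony"},
    "history": {"treaty", "archive"},
    "computer science": {"group", "discussion", "support", "system"},
}
candidates = analog_candidates(topic, summaries)
```

Here the biology summary is kept, while the identical computer science summary and the unrelated history summary are both filtered out.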

6.2 LITERATURE REVIEW ASSISTANT (LRA)

The LRA component matches users’ queries, which consist of about four keywords from a short description of the dissertation topic, against the Literature Review database (Figure 3). Once matches are found, it is not the reviews themselves but the references appearing inside them that are collected and merged by the merging engine. The result is displayed ranked by frequency. Each reference keeps a connection (or connections) to its original literature reviews to allow quick browsing of the resource. In reality, there are technical barriers to identifying and segmenting literature reviews and their related references in PDF files. Therefore, the implementation of LRA remains future work; its conceptual design is presented in Figure 3.
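Assuming the references have already been extracted from the matched literature reviews (the hard part, as noted above), the merging and frequency ranking could be sketched as follows. The function name `merge_references`, the review identifiers, and the reference strings are hypothetical.

```python
from collections import defaultdict

def merge_references(reviews):
    """Merge references from several literature reviews.

    reviews maps a review identifier to the list of references cited in it.
    Returns (reference, frequency, source_review_ids) tuples, most frequent
    first, with ties broken alphabetically by reference.
    """
    sources = defaultdict(set)
    for review_id, references in reviews.items():
        for reference in references:
            sources[reference].add(review_id)
    ranked = sorted(sources.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(reference, len(ids), sorted(ids)) for reference, ids in ranked]

# Hypothetical matched literature reviews and their extracted references.
reviews = {
    "etd-1": ["Shneiderman 2000", "Mani 2001"],
    "etd-2": ["Shneiderman 2000"],
}
merged = merge_references(reviews)
```

Each merged entry keeps the identifiers of the reviews it came from, which corresponds to the connection back to the original literature reviews.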


Figure 3. Literature Review Assistant component of TSIS.

6.3 METHODOLOGY ASSISTANT (MA)

The MA component should be able to present relevant known methodologies for the provided novel dissertation topics. MA reacts to the input, which consists of a description of a novel dissertation topic, by matching its key words and concepts to multiple short summaries of methodologies. All the matching methodology summaries and references that appear in those summaries are displayed. They can be grouped as ‘quantitative’ or ‘qualitative’ methodologies or by both categories. If an open problem is a truly novel one, it requires new methodologies to solve it. Nevertheless, MA could help researchers to come up with new ideas and enhanced approaches, if applicable methodologies are identified.
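A minimal sketch of the matching and grouping described above is given here, assuming that methodology summaries are stored with a keyword set and one or more category labels. The data layout, the function name `match_methodologies`, and the sample summaries are hypothetical.

```python
def match_methodologies(topic_keywords, summaries):
    """Group the names of methodology summaries that share at least one keyword
    with the topic. A summary labelled with both categories appears in both groups."""
    grouped = {}
    for summary in summaries:
        if topic_keywords & summary["keywords"]:
            for category in summary["categories"]:
                grouped.setdefault(category, []).append(summary["name"])
    return grouped

# Hypothetical methodology summaries.
summaries = [
    {"name": "Controlled user study", "keywords": {"user", "satisfaction", "system"},
     "categories": ["quantitative"]},
    {"name": "Semi-structured interviews", "keywords": {"user", "experience"},
     "categories": ["qualitative"]},
    {"name": "Mixed-methods survey", "keywords": {"system", "survey"},
     "categories": ["quantitative", "qualitative"]},
    {"name": "Gel electrophoresis", "keywords": {"protein", "dna"},
     "categories": ["quantitative"]},
]
grouped = match_methodologies({"innovation", "support", "system"}, summaries)
```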


Figure 4. Methodology Assistant component of TSIS.

7. SKETCH OF METHODOLOGIES

The Thesis Support Information System consists of three components – OPA, LRA and MA. Considering the current technical barriers, only the OPA component is planned to be prototyped and tested for effectiveness and user satisfaction. Yet, the work of others in the Digital Library Research Laboratory may lead to partial or complete implementations of other components, and then to collaborative studies integrating those components into TSIS. Hence, to be complete, possible experimental tasks are presented for all three component systems.

  • OPA component: Tasks will involve finding existing problems/topics, which are relevant to the provided user query. Other similar systems (e.g., Web search engines, DB search engines) can be used in comparisons regarding effectiveness and user satisfaction.
  • LRA component: Tasks will involve searching for a list of references which are relevant to the provided user query. Other similar systems (e.g., CiteSeer, EndNote search engine) can be used in comparisons regarding effectiveness and user satisfaction.
  • MA component: Tasks will involve finding methodologies that are relevant to the provided user query. Other similar systems (e.g., DB search engines) can be used in comparisons regarding effectiveness and user satisfaction.
  • (Experiment Setting) During a semester, a total of 40 graduate students will be recruited for the tests. First, OPA will be tested by 20 students who are randomly assigned to the system. The other 20 will use other similar systems such as Web search engines or DB search engines to complete the tasks. Demographic information, and other data, using pre- and post-questionnaires, will be collected. Experimental sessions will be video-recorded and used later for detailed analysis of user interaction with the system. The screens will be captured using Camtasia software.
  • Depending on the availability of particular ETDs in the collected set, certain constraints might be applied to the ETDs used in the experiments. For example, experimentation may focus on ETDs that come from only certain disciplines or schools, were published after 2000, and are written in English.
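The random assignment in the experiment setting, 40 participants split evenly between OPA and the comparison systems, could be carried out along these lines. The participant identifiers and the seed are placeholders; the seed is used only to make the split reproducible.

```python
import random

def assign_groups(participants, seed=None):
    """Shuffle a copy of the participant list and split it into two halves."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Placeholder identifiers for the 40 recruited graduate students.
participants = [f"P{i:02d}" for i in range(1, 41)]
opa_group, baseline_group = assign_groups(participants, seed=2009)
```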

8. CONTRIBUTION

  • Strategies to find a novel dissertation topic will be developed. As sub-strategies, techniques to identify existing dissertation topics that match the users’ research interests and area of expertise will be developed. Then, mechanisms to generate innovative topics based on the existing topics, by association with other areas, will be studied and implemented using theories from the creativity research field.
  • Screening mechanisms to verify the novelty and feasibility of newly generated ideas will be created and incorporated into the system. These mechanisms and their implementation will benefit other fields of studies (e.g., Public Administration), too, which require verification of ideas and new strategies to solve a problem.
  • Strategies to effectively extract and store sub-document level components from a large data set, such as Electronic Theses and Dissertations (ETD) collections, will be developed.
  • This study will help Ph.D. students find novel, interesting, and valuable dissertation topics in their field. The system component to be developed (OPA) will help save time and effort in examining whether the proposed topic has been already addressed. If so, students will receive guidance on where to go from there.

9. SUMMARY

In this extended abstract, ideas and high-level design plans for a system called Thesis Support Information System (TSIS) are presented. Each component of TSIS (the Open Problem Assistant, Literature Review Assistant, and Methodology Assistant) is explained, and concept map diagrams are presented to aid readers’ understanding. The emphasis of this paper is on the development of Ph.D. students’ dissertation topics, which are novel, interesting to them and others, provable, and (in some ideal cases) have commercial value. Even after an open problem is identified, generating innovative ideas to solve it may not be easy. But that ‘not easy’ process of generating innovative ideas is what distinguishes humans from machines. That is our work, but the time and effort needed in the process might be reduced with the help of a software system. It is my aim and hope that I will develop a deeper understanding of humans as well as of technologies and software systems through my research on this topic.

10. ACKNOWLEDGMENTS

I would like to thank the DL curriculum project team, LIKES project team, all my lab colleagues and friends, who supported me with their valuable insights and comments. Special thanks go to my advisor, Dr. Edward A. Fox, and Dr. John W. Dickey, who have guided me into the fields of IR and creativity.

11. REFERENCES

[1] Knörig, André. Free the body and the mind will follow: An investigation into the role of the human body in creativity, and its application to HCI. Diploma Thesis, Univ. of Applied Sciences Wedel (2006). http://andreknoerig.de/projects/free-the-body
 
[2] Csikszentmihalyi, Mihaly (1996). Creativity: flow and the psychology of discovery and invention. New York: Harper Perennial.
 
[3] Shneiderman, Ben (2000). Supporting Creativity with Powerful Composition Tools for Artifacts and Performances. In Proceedings of the 33rd Hawaii International Conference on System Sciences, Vol. 7. HICSS '00. Washington, DC, USA: IEEE Computer Society.
 
[4] Warr, A. and O'Neill, E. 2007. Tool support for creativity using externalizations. In Proceedings of the 6th ACM SIGCHI Conference on Creativity & Cognition (Washington, DC, USA, June 13 - 15, 2007). C&C '07. ACM, New York, NY, 127-136. DOI= http://doi.acm.org/10.1145/1254960.1254979
 
[5] Kerne, A., Koh, E., Smith, S. M., Webb, A., and Dworaczyk, B. 2008. combinFormation: Mixed-initiative composition of image and text surrogates promotes information discovery. ACM Trans. Inf. Syst. 27, 1 (Dec. 2008), 1-45. DOI= http://doi.acm.org/10.1145/1416950.1416955
 
[6] Dickey, J. W. 1995. CyberQuest: Problem Solving and Innovation Support System Conceptual Background and Experiences. Ablex Publishing. ISBN-13: 978-1567501179.
 
[7] Hassel, M. 2007. Resource Lean and Portable Automatic Text Summarization, PhD-Thesis, School of Computer Science and Communication, KTH, ISBN-978-917178-704-0
 
[8] Mani, I. 2001. Recent developments in text summarization. In Proceedings of the Tenth International Conference on Information and Knowledge Management (Atlanta, Georgia, USA, October 05 - 10, 2001). H. Paques, L. Liu, and D. Grossman, Eds. CIKM '01. ACM, New York, NY, 529-531. DOI= http://doi.acm.org/10.1145/502585.502677
 
[9] Hahn, U. and Mani, I. 2000. The Challenges of Automatic Summarization. Computer 33, 11 (Nov. 2000), 29-36. DOI= http://dx.doi.org/10.1109/2.881692