
Blog comments classification using tree structured conditional random fields


dc.contributor.author Jin, Wei
dc.date.accessioned 2012-11-07T18:14:49Z
dc.date.available 2012-11-07T18:14:49Z
dc.date.copyright 2012 en
dc.date.issued 2012-11-07
dc.identifier.uri http://hdl.handle.net/2429/43571
dc.description.abstract The Internet provides a variety of ways for people to easily share, socialize, and interact with each other. One of the most popular platforms is the online blog. As a result, a vast amount of new text data is generated every day in the form of blog comments and opinions about news, events, and products. However, not all comments are of equal quality. Informative or high-quality comments have a greater impact on readers’ opinions about the original post content, such as the benefits of the product discussed in the post or the interpretation of a political event. Therefore, an efficient and effective mechanism for detecting the most informative comments is highly desirable. For this purpose, sites like Slashdot, where users volunteer to rate comments based on their informativeness, can be a great resource for building such an automated system using supervised machine learning techniques. Our research concerns building an automatic comment classification system that leverages these freely available resources. Specifically, we discuss how informative comments in blogs can be detected using Conditional Random Fields (CRFs). Blog conversations typically have a tree-like structure in which an initial post is followed by comments, and each comment can be followed by other comments. In this work, we present our approach using Tree-structured Conditional Random Fields (TCRFs) to capture the dependencies in a tree-like conversational structure. This is in contrast with previous work [5], in which results produced by linear-chain CRF models had to be aggregated heuristically. As an additional contribution, we present a new blog corpus consisting of conversations of different genres from six different blog websites. We use this corpus to train and test our classifiers based on TCRFs. en
dc.language.iso eng en
dc.publisher University of British Columbia en
dc.relation.ispartof Electronic Theses and Dissertations (ETDs) 2008+ en
dc.title Blog comments classification using tree structured conditional random fields en
dc.type Text en
dc.degree.name Master of Science - MSc en
dc.degree.discipline Computer Science en
dc.degree.grantor University of British Columbia en
dc.date.graduation 2013-05 en
dc.type.text Thesis/Dissertation
dc.description.affiliation Science, Faculty of
dc.degree.campus UBCV en
dc.description.scholarlevel Graduate en

Files in this item

Files Size Format
ubc_2013_spring_jin_wei.pdf 966.0Kb Adobe Portable Document Format

