
Blog comments classification using tree structured conditional random fields

Files in this item

File: ubc_2013_spring_jin_wei.pdf
Size: 966.0 Kb
Format: Adobe Portable Document Format
 
Title: Blog comments classification using tree structured conditional random fields
Author: Jin, Wei
Degree: Master of Science - MSc
Program: Computer Science
Copyright Date: 2012
Publicly Available in cIRcle: 2012-11-07
Abstract: The Internet provides a variety of ways for people to share, socialize, and interact with each other. One of the most popular platforms is the online blog. As a result, a vast amount of new text data is generated every day in the form of blog comments and opinions about news, events, and products. However, not all comments are of equal quality. Informative or high-quality comments have a greater impact on readers’ opinions about the original post content, such as the benefits of the product discussed in the post or the interpretation of a political event. Therefore, an efficient and effective mechanism for detecting the most informative comments is highly desirable. For this purpose, sites like Slashdot, where users volunteer to rate comments based on their informativeness, can be a great resource for building such an automated system using supervised machine learning techniques. Our research concerns building an automatic comment classification system that leverages these freely available resources. Specifically, we discuss how informative comments in blogs can be detected using Conditional Random Fields (CRFs). Blog conversations typically have a tree-like structure in which an initial post is followed by comments, and each comment can be followed by other comments. In this work, we present our approach using Tree-structured Conditional Random Fields (TCRFs) to capture the dependencies in this tree-like conversational structure. This contrasts with previous work [5], in which results produced by linear-chain CRF models had to be aggregated heuristically. As an additional contribution, we present a new blog corpus consisting of conversations of different genres from 6 different blog websites. We use this corpus to train and test our TCRF-based classifiers.
URI: http://hdl.handle.net/2429/43571
Scholarly Level: Graduate

All items in cIRcle are protected by copyright, with all rights reserved.
