

We describe SemEval-2017 Task 3 on Community Question Answering. This year, we reran the four subtasks from SemEval-2016: (A) Question–Comment Similarity, (B) Question–Question Similarity, (C) Question–External Comment Similarity, and (D) Rerank the correct answers for a new question in Arabic, providing all the data from 2016 for training, and fresh data for testing. Additionally, we added a new subtask E in order to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums. A total of 23 teams participated in the task, and submitted a total of 85 runs (36 primary and 49 contrastive) for subtasks A–D. Unfortunately, no teams participated in subtask E. A variety of approaches and features were used by the participating systems to address the different subtasks. The best systems achieved an official score (MAP) of 88.43, 47.22, 15.46, and 61.16 in subtasks A, B, C, and D, respectively. These scores are better than the baselines, especially for subtasks A–C.

In greater detail, in SemEval-2015 Task 3 "Answer Selection in Community Question Answering" nakov-EtAl:2015:SemEval, we mainly targeted conventional Question Answering (QA) tasks, i.e., answer selection. In contrast, in SemEval-2016 Task 3 nakov-EtAl:2016:SemEval, we targeted a fuller spectrum of CQA-specific tasks, moving closer to the real application needs (a system based on SemEval-2016 Task 3 was integrated in Qatar Living's betasearch hoque-EtAl:2016:COLINGDEMO), particularly in Subtask C, which was defined as follows: "given (i) a new question and (ii) a large collection of question-comment threads created by a user community, rank the comments that are most useful for answering the new question". A test question is new with respect to the forum, but can be related to one or more questions that have been previously asked in the forum. The best answers can come from different question–comment threads.
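The official evaluation measure for these ranking subtasks is Mean Average Precision (MAP), the score reported in the abstract above. As a rough illustration only, here is a minimal sketch of MAP over ranked comment lists with binary Good/Bad gold labels; it is not the task's official scorer (cutoff depth and edge-case handling should be taken from the released evaluation scripts), and the function names are ours.

```python
def average_precision(ranked_labels):
    """Average Precision for one query: ranked_labels is a list of 0/1 gold
    labels (1 = 'Good' comment) in the order the system ranked the comments."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(all_ranked_labels):
    """MAP over a collection of queries (e.g., all test questions)."""
    scores = [average_precision(labels) for labels in all_ranked_labels]
    return sum(scores) / len(scores) if scores else 0.0

# Toy example: two test questions, each with 10 ranked comments.
print(mean_average_precision([
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],   # AP = (1/1 + 2/3) / 2 ~= 0.83
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],   # AP = 1/10 = 0.10
]))
```

In the toy example, the first ranking places its Good comments near the top and scores much higher than the second, which buries its only Good comment at rank 10.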

The threads are independent of each other, the lists of comments are chronologically sorted, and there is meta information, e.g., date of posting, who is the user who asked/answered the question, category the question was asked in, etc. The relationship between subtasks A, B, and C is illustrated in Figure 1. In the figure, q stands for the new question, q′ is an existing related question, and c is a comment within the thread of question q′. The edge qc relates to the main CQA task (subtask C), i.e., deciding whether a comment for a potentially related question is a good answer to the original question. This relation captures the relevance of c for q. The edge qq′ represents the similarity between the original and the related questions (subtask B). This relation captures the relatedness of q and q′. Finally, the edge q′c represents the decision of whether c is a good answer for the question from its thread, q′ (subtask A). This relation captures the appropriateness of c for q′. In this particular example, q and q′ are indeed related, and c is a good answer for both q′ and q.

Figure 1: The similarity triangle for CQA, showing the three pairwise interactions between the original question q, the related question q′, and a comment c in the related question's thread.
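The triangle suggests one simple, purely illustrative way to operationalize subtask C: score each comment c for the new question q by combining a question-question signal along the qq′ edge with a question-comment signal along the q′c edge. In the sketch below, sim_qq and good_qc are placeholder scoring functions of our own, and the interpolation weight alpha is an assumption, not something prescribed by the task.

```python
from typing import Callable, Dict, List, Tuple

def rank_comments_for_new_question(
    q: str,
    related_threads: Dict[str, List[str]],    # related question -> its comments
    sim_qq: Callable[[str, str], float],      # qq' edge: subtask B style score
    good_qc: Callable[[str, str], float],     # q'c edge: subtask A style score
    alpha: float = 0.5,                       # interpolation weight (assumed)
) -> List[Tuple[str, float]]:
    """Rank all comments from all related threads for the new question q
    by interpolating the two edges of the similarity triangle."""
    scored = []
    for q_related, comments in related_threads.items():
        b_score = sim_qq(q, q_related)
        for c in comments:
            a_score = good_qc(q_related, c)
            scored.append((c, alpha * b_score + (1 - alpha) * a_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Whether a weighted sum, a product, or a learned combination of the two edges works best is an empirical question, and in practice the combination would be tuned on the training data.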

Question-question similarity has been featured as a subtask (subtask B) of SemEval-2016 Task 3 on Community Question Answering nakov-EtAl:2016:SemEval; there was also a similar subtask as part of SemEval-2016 Task 1 on Semantic Textual Similarity agirre-EtAl:2016:SemEval1. Question-question similarity is an important problem with application to question recommendation, question duplicate detection, community question answering, and question answering in general. Typically, it has been addressed using a variety of textual similarity measures. Some work has paid attention to modeling the question topic, which can be done explicitly, e.g., using question topic and focus duan2008searching or using a graph of topic terms Cao:2008:RQU:1367497.1367509, or implicitly, e.g., using a language model with a smoothing method based on the category structure of Yahoo! Answers cao2009use or using an LDA topic language model that matches the questions not only at the term level but also at the topic level zhang2014question.

Question-answer similarity has been a subtask (subtask A) of our task in its two previous editions nakov-EtAl:2015:SemEval nakov-EtAl:2016:SemEval. This is a well-researched problem in the context of general question answering. One research direction has been to try to match the syntactic structure of the question to that of the candidate answer.
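As a concrete instance of the textual similarity measures mentioned above for question-question similarity, a common baseline is cosine similarity over TF-IDF vectors. The snippet below is a minimal sketch using scikit-learn; the example questions and the choice of library are ours, not taken from the cited systems.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

original = "How can I get a family visa for my wife in Qatar?"
candidates = [
    "What documents do I need to sponsor my wife's residence visa?",
    "Where can I find good karak tea in Doha?",
]

# Fit TF-IDF on all questions, then compare the original to each candidate.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
matrix = vectorizer.fit_transform([original] + candidates)
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()

for question, score in sorted(zip(candidates, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {question}")
```

Purely lexical measures like this are cheap but miss paraphrases, which is part of what the topic-level models cited above try to address.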
