I am graduating with a PhD in Computer Science from Emory University in May 2024. During my PhD, I was a member of the Natural Language Processing Lab, working with Dr. Jinho Choi. My research interests are in Conversational AI. I have explored many facets of conversational systems, including the development and deployment of a personal-experience-oriented dialogue system for large-scale use and the improvement of dialogue model evaluation through fine-grained behavior analysis. My most recent and ongoing work focuses on improving social understanding and commonsense reasoning in dialogue models in the era of Large Language Models.
My undergraduate research background was also in automated dialogue systems. Before coming to Emory University, I worked as an undergraduate research assistant in the Language and Interaction Lab at Michigan State University, where we developed a robotic system that learns to perform tasks from a human teacher using both language instruction and visual demonstration. I also spent a summer in the Natural Language Dialogue Group at the University of Southern California's Institute for Creative Technologies, investigating how humans share personal information in chat-oriented dialogues and applying our findings to the development of an automated system for extracting a human interlocutor's personal information.
Sarah E. Finch and Jinho Choi. 2024. ConvoSense: Overcoming Monotonous Commonsense Inferences for Conversational AI. Transactions of the Association for Computational Linguistics (TACL), under copyediting.
Sarah E. Finch, James D. Finch, and Jinho Choi. 2024. Exploring the Impact of Human Evaluator Group on Chat-Oriented Dialogue Evaluation. To be presented at the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).
Sarah E. Finch, Ellie S. Paek, and Jinho Choi. 2023. Leveraging Large Language Models for Automated Dialogue Analysis. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL).
Sarah E. Finch*, James D. Finch*, and Jinho Choi. 2023. Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL).
James D. Finch, Sarah E. Finch, and Jinho Choi. 2021. What Went Wrong? Explaining Overall Dialogue Quality through Utterance-Level Impacts. In Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI.
Sarah E. Finch*, James D. Finch*, Daniil Huryn, William Hutsell, Xiaoyuan Huang, Han He, and Jinho Choi. 2021. An Approach to Inference-Driven Dialogue Management within a Social Chatbot. In Proceedings of the 4th Alexa Prize.
Sarah E. Finch and Jinho Choi. 2020. Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols. In Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL).
Sarah E. Finch*, James D. Finch*, Ali Ahmadvand, Ingyu (Jason) Choi, Xiangjue Dong, Ruixiang Qi, Harshita Sahijwani, Sergey Volokhin, Zihan Wang, Zihao Wang, and Jinho Choi. 2020. Emora: An Inquisitive Social Chatbot Who Cares For You. In Proceedings of the 3rd Alexa Prize.
Sarah Fillwock and David Traum. 2018. Identification of Personal Information Shared in Chat-Oriented Dialogue. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC), Miyazaki, Japan.
Sarah Fillwock, Changsong Liu, and Joyce Chai. 2016. Dialogue Management for Task-Learning Human-Robot Dialogue. Poster presented at the Mid-Michigan Symposium for Undergraduate Research Experiences, July 2016.
* denotes equal contributions as first authors
I am acting as a senior research mentor to a group of undergraduate students developing a virtual assistant for Emory students that provides both chit-chat functionality and educational/coursework support.
My internship focused on the development of an article-grounded conversational question-answering dialogue system that minimized the fabrication of information. To this end, I worked on the evaluation and integration of numerous approaches to dialogue-relevant tasks into the dialogue system, including information retrieval, hallucination detection, and response generation, with a focus on prompt-based large language model approaches.
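As a rough illustration of this pipeline shape, here is a minimal sketch of article-grounded QA with a post-hoc grounding check. Everything here is a simplifying stand-in rather than the actual system: the `generate` callable represents any prompt-based LLM, TF-IDF retrieval represents the retrieval module, and the lexical-overlap check represents a real hallucination detector (which would typically be an entailment model).

```python
# Minimal sketch of an article-grounded QA pipeline with a grounding check.
# The `generate` callable stands in for any prompt-based LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(question, passages, k=3):
    """Rank article passages by TF-IDF similarity to the question."""
    vec = TfidfVectorizer().fit(passages + [question])
    scores = cosine_similarity(vec.transform([question]),
                               vec.transform(passages))[0]
    return [passages[i] for i in scores.argsort()[::-1][:k]]

def answer(question, passages, generate):
    """Generate an answer constrained to retrieved evidence, then verify it."""
    evidence = retrieve(question, passages)
    prompt = (
        "Answer using ONLY the evidence below. If the evidence is "
        "insufficient, say so.\n\n"
        + "\n".join(f"- {p}" for p in evidence)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    response = generate(prompt)
    # Crude hallucination check: flag answers with little lexical overlap
    # with the evidence; a real detector would use entailment scoring.
    answer_tokens = set(response.lower().split())
    support = max(
        len(answer_tokens & set(p.lower().split())) / max(len(answer_tokens), 1)
        for p in evidence
    )
    return response if support > 0.5 else "I could not find that in the article."
```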
(1) We developed a fine-grained dimensional evaluation of human-computer chat that reliably measures critical dialogue characteristics by quantifying the rates of several quality-related chatbot behaviors. Our results demonstrate that this method is more suitable for dimensional chat evaluation than alternative Likert-style or comparative methods, and we applied it to build a more thorough picture of the current strengths and weaknesses of state-of-the-art chatbots (a code sketch of this behavior-rate scoring appears after this list).
(2) It can be challenging to identify which dialogue behaviors lead to poor user satisfaction, even though such an understanding would illuminate fruitful areas for improvement. We developed a regression-based approach that decomposes conversation-level user ratings into utterance-level quality contributions as a step towards identifying low-quality dialogue behaviors (a code sketch of this decomposition also appears after the list).
(3) A wide variety of evaluation strategies currently exist for dialogue systems, which makes it difficult to compare approaches across works. To characterize this diversity, we analyzed the evaluation protocols of twenty recent dialogue system papers, focusing on human evaluation, and synthesized a set of human evaluation dimensions for dialogue.
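To make the behavior-rate idea in (1) concrete, here is a minimal sketch; the behavior labels are illustrative stand-ins, not the exact label set from our evaluation.

```python
# Behavior-rate scoring sketch: a chatbot's score on each dimension is the
# proportion of its turns that annotators marked with that behavior.
from collections import Counter

BEHAVIORS = ["self_contradiction", "ignore", "irrelevant", "empathetic"]

def behavior_rates(annotated_turns):
    """annotated_turns: list of sets, each the behaviors marked on one turn."""
    counts = Counter(b for turn in annotated_turns for b in turn if b in BEHAVIORS)
    n = len(annotated_turns)
    return {b: counts[b] / n for b in BEHAVIORS}

# Example: three annotated bot turns.
turns = [{"empathetic"}, {"ignore", "irrelevant"}, set()]
print(behavior_rates(turns))
# {'self_contradiction': 0.0, 'ignore': 0.333..., 'irrelevant': 0.333..., 'empathetic': 0.333...}
```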
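And a minimal sketch of the decomposition idea in (2), under the simplifying assumption that a conversation's overall rating is roughly additive over utterance types; the three utterance types here are hypothetical stand-ins for learned categories.

```python
# Regression decomposition sketch: encode each conversation as counts of its
# utterance types and recover per-type impacts as least-squares coefficients.
import numpy as np

# Rows = conversations; columns = counts of utterance types A, B, C.
X = np.array([[3, 1, 0],
              [1, 0, 2],
              [2, 2, 1],
              [0, 1, 3]], dtype=float)
y = np.array([4.5, 2.0, 4.0, 1.5])  # conversation-level user ratings

impacts, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, w in zip("ABC", impacts):
    print(f"estimated impact of utterance type {name}: {w:+.2f}")
# Strongly negative coefficients point at low-quality dialogue behaviors.
```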
I was the co-lead of Emory University's 6-person student team in the 4th Amazon Alexa Prize, a challenge to develop the most engaging and capable dialogue agent. We explored a dialogue management approach based on semantic graphs and inferential reasoning (sketched below) and advanced to the Finals round of the competition.
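As a toy illustration of the inference-driven idea (not our system's actual ontology or rule engine), consider forward-chaining a single hand-written rule over a small predicate graph; a real system would iterate rules to a fixpoint over a much richer graph.

```python
# Toy sketch: user utterances populate a semantic graph of (subject,
# relation, object) predicates, and inference rules derive implied
# predicates that the chatbot can respond to.
facts = {("user", "own", "dog"), ("dog", "type_of", "pet")}

def apply_rules(facts):
    """Rule: owning something that is a type of pet implies liking animals."""
    derived = set(facts)
    for s, r, o in facts:
        if r == "own" and (o, "type_of", "pet") in facts:
            derived.add((s, "like", "animals"))
    return derived

facts = apply_rules(facts)
if ("user", "like", "animals") in facts:
    print("It sounds like you're an animal lover! Have you had your dog long?")
```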
I was the co-lead of Emory University's 14-person student team in the 3rd Amazon Alexa Prize, a challenge to develop the most engaging and capable dialogue agent. Based on our proposal, we were one of 10 teams invited to participate out of approximately 400 applicants. In July 2020, we won the competition, advancing through two user-rating-based elimination rounds and then receiving the highest overall rating from the panel of final judges.
Accurate emotional understanding of a conversational partner allows for more sophisticated dialogue strategies. We investigated a language-only approach to predicting a user's current engagement online, as a conversation unfolds, with the ultimate goal of using predicted engagement as a feature for initiative transfer.
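A minimal sketch of the language-only setup, with a toy training set standing in for real annotated conversations; the actual work used far richer data and conversational context.

```python
# Engagement prediction sketch: classify each user utterance as
# engaged/disengaged from lexical features alone.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = ["tell me more about that!", "wow, that is so interesting",
              "i don't know", "ok", "sure whatever", "what happened next?"]
engaged = [1, 1, 0, 0, 0, 1]  # toy labels

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(utterances, engaged)

# Online use: score each incoming utterance as the conversation unfolds.
print(model.predict_proba(["that sounds amazing, go on"])[0][1])
```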
I worked in the Natural Language Dialogue Group with Dr. David Traum during this 10-week NSF REU program. Inspired by previous work showing that human-human conversations contain a significant amount of personal-experience and personal-information sharing, we explored in detail the types of personal information that people tend to share in conversation and developed a preliminary word-embedding-based approach for extracting such information from user utterances.
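A toy sketch of the word-embedding idea: compare utterance tokens to prototype words for each personal-information category and flag the closest category above a similarity threshold. The 2-d vectors stand in for real pretrained embeddings (e.g., word2vec or GloVe), and the categories are illustrative.

```python
# Personal-information detection sketch via embedding similarity.
import numpy as np

EMB = {  # toy embeddings; real ones would be pretrained 300-d vectors
    "teacher": np.array([0.9, 0.1]), "engineer": np.array([0.8, 0.2]),
    "chicago": np.array([0.1, 0.9]), "tokyo":   np.array([0.2, 0.8]),
}
PROTOTYPES = {"occupation": EMB["engineer"], "hometown": EMB["tokyo"]}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect(utterance, threshold=0.95):
    """Flag tokens whose embedding is close to a category prototype."""
    hits = []
    for token in utterance.lower().split():
        if token in EMB:
            cat, score = max(((c, cos(EMB[token], p))
                              for c, p in PROTOTYPES.items()),
                             key=lambda x: x[1])
            if score >= threshold:
                hits.append((token, cat))
    return hits

print(detect("I work as a teacher in Chicago"))
# [('teacher', 'occupation'), ('chicago', 'hometown')]
```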
As an undergraduate research assistant, I was involved in the development of an end-to-end multimodal task-learning robot capable of understanding a human teacher's verbal instructions.