Harsh Trivedi

I am a PhD student at Stony Brook University, advised by Niranjan Balasubramanian. I have interned at AI2 multiple times working with Ashish Sabharwal and Tushar Khot, and was a visiting researcher at NYU in 2021 working with Sam Bowman's group.

I do research in NLP and AI. I am broadly interested in developing reliable, generalizable, and explainable AI systems and in evaluating them rigorously. Specifically, my research spans multi-step reasoning, AI agents, question answering, AI safety, and efficient NLP.

Much of my research centers on multi-step reasoning and asks two questions: (i) how to evaluate models so that we know they are employing reliable multi-step reasoning (e.g., taking every step of reasoning, without shortcuts or hallucination), and (ii) how to teach models to reason this way. In that theme, I have built benchmarks (AppWorld, MuSiQue) as well as evaluation (DiRe), training (TeaBReaC, Multee), prompting (IRCoT, DecomP), and explanation (SuQA) methods.

Previously, I worked on question answering as a multi-step reasoning problem. These days, I am focused on AI agents that can autonomously take actions (interactive coding, function calling, UI interactions) in the world (e.g., across your app accounts) to achieve complex tasks for you (e.g., "I owe money to friends on Splitwise. Pay them on Venmo."). Check out our work, 🌎 AppWorld, an environment and benchmark for reliable evaluation of such agents.

I am giving talks on AppWorld and its future directions at many places (details). If you are around, consider joining, and let's chat. If your group would like a talk as well, or you need any help using or extending AppWorld, please reach out!

Publications

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

ACL 2024 Conference 🏆 Best Resource Paper Award

Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian

Paper Website Tweet Video Blog Poster Code Leaderboard Copy Bib

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

ACL 2023 Conference

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

Paper Tweet Video Code Copy Bib

Decomposed Prompting: A Modular Approach for Solving Complex Tasks

ICLR 2023 Conference

Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

Paper Code Copy Bib

Two-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

ML Safety @ NeurIPS 2022 Workshop 🏆 Best Paper Award

Alicia Parrish*, Harsh Trivedi*, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Amanpreet Singh Saimbhi, Samuel R. Bowman

Paper Copy Bib

Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts

EMNLP 2022 Conference

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

Paper Video Code Copy Bib

MuSiQue: Multihop Questions via Single-hop Question Composition

TACL, presented at NAACL 2022 Journal

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

Paper Video Code Leaderboard Copy Bib

Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

Learning with Natural Language Supervision @ ACL 2022 Workshop

Alicia Parrish*, Harsh Trivedi*, Ethan Perez, Angelica Chen, Nikita Nangia, Jason Phang, Samuel R. Bowman

Paper Video Data Copy Bib

Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension

EMNLP 2021 Conference

Naoya Inoue, Harsh Trivedi, Steven Sinha, Niranjan Balasubramanian, Kentaro Inui

Paper Video Code Copy Bib

IrEne-viz: Visualizing Energy Consumption of Transformer Models

EMNLP 2021 Demo

Yash Kumar Lal, Reetu Singh, Harsh Trivedi, Qingqing Cao, Aruna Balasubramanian, Niranjan Balasubramanian

Paper Code Copy Bib

What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?

ACL 2021 Conference

Nikita Nangia*, Saku Sugawara*, Harsh Trivedi, Alex Warstadt, Clara Vania, Samuel R. Bowman

Paper Video Code Copy Bib

IrEne: Interpretable Energy Prediction for Transformers

ACL 2021 Conference

Qingqing Cao, Yash Kumar Lal, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Paper Video Code Copy Bib

Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning

EMNLP 2020 Conference

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

Paper Code Video Slides Copy Bib

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

ACL 2020 Conference

Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Paper Video Slides Code Copy Bib

Repurposing Entailment for Multi-Hop Question Answering Tasks

NAACL 2019 Conference

Harsh Trivedi, Heeyoung Kwon, Tushar Khot, Ashish Sabharwal, Niranjan Balasubramanian

Paper Code Slides Copy Bib

Controlling Information Aggregation for Complex Question Answering

EMNLP 2019 Conference

Heeyoung Kwon, Harsh Trivedi, Peter Jansen, Mihai Surdeanu, Niranjan Balasubramanian

Paper Poster Copy Bib

* denotes equal contribution