Human-Agent Interaction

October 3, 2024

Designing intelligent systems.

Why do we want agents?

Humans can only produce a limited amount of intelligent output, and that output doesn't scale well. Intelligent systems can turn energy, hardware, and software into intelligence that does scale, a kind of technological photosynthesis. But if intelligent systems require humans in the loop, they can only scale as fast as humans can. Building agentic behavior into these intelligent systems reduces the need for human effort.

But, we won’t achieve reliable, aligned, intelligent systems without improving:

  1. Model performance, accuracy, and safety
  2. Cost efficiency of models/compute
  3. Model reliability
  4. Human-agent interaction
  5. …and much more

Background

Human-computer interaction (HCI) is a large field of research & development focused on improving communication between humans and technology. It encompasses everything from UI/UX design to consumer hardware design to brain-computer interfaces (BCIs).

Interaction and communication are forms of compression. We take abstract thoughts and compress them into actions, language, and cues that convey our intentions. Though helpful, this articulation is lossy, and significant meaning can be lost between our intentions and the eventual outcomes.

HCI is already an active field of research into facilitating interaction with deterministic technologies. What about probabilistic, non-deterministic technologies like large language models (LLMs)?

Human-agent interaction

Human-agent interaction (HAI) focuses on the interaction between users and non-deterministic systems. It can be summarized in the following research question: How can we facilitate harmonious interaction between humans and intelligent systems to produce aligned outcomes with minimal human effort?

A fundamental challenge with agentic interactions is that as the action space expands, each decision requires more deliberation. It's difficult to interact with new systems that are open-ended by default. To visualize this, let's look at the three big forms of AI generation: completions, chat interfaces, and agents.

Completions

With GPT-3 and earlier instruction-following models, outputs took the form of short completions. Though this was a restrictive way of requesting output, the user interaction was quite simple: type some content, then have the model complete your intended sequence.

The scope of completion inference was narrow, which kept the interaction between the user and the model simple. As a result, products like GitHub Copilot have seen widespread adoption because they're helpful and easy to use.
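To make that interaction model concrete, here's a minimal sketch of a completion-style request using the OpenAI Python SDK. The model name, prompt, and parameters are illustrative assumptions rather than details from this post.

```python
# Completion-style interaction: the user supplies a prefix and the model
# simply extends it. Model name and parameters are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "def fibonacci(n):\n    "  # the user's partial sequence

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    max_tokens=64,
)

# The entire interface is one text box: prefix in, continuation out.
print(prompt + response.choices[0].text)
```

The action space is tiny, so there's little for the user to decide beyond what to type.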

Chat interfaces

Once GPT-3.5 and ChatGPT were released, the primary form of user-model interaction shifted to chat interfaces. This enabled a much wider variety of interactions with LLMs: users could ask questions, get summaries, generate custom content, or simply hold a conversation.

While these interfaces have unlocked more value from LLMs, the expanded space of possible user interactions has made UI/UX trickier. Complex behaviors like multi-step reasoning and in-context learning became possible, but the burden of detailed instruction and context-building remained on the user. Unlike well-designed graphical user interfaces (GUIs), open-ended interfaces like chatbots can leave users with a cold-start problem and decision fatigue.
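For contrast, here's a minimal sketch of the chat-style interaction using the same SDK. The roles, system prompt, and model name are again assumptions made for illustration.

```python
# Chat-style interaction: the user (or the product on their behalf) now has
# to specify roles, context, and instructions, not just a prefix.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You are a concise research assistant."},
    {"role": "user", "content": "Summarize the main challenges of human-agent interaction."},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)

print(response.choices[0].message.content)
```

The open-ended prompt box is exactly where the cold-start problem shows up: the model can do almost anything, so the user has to decide what to ask and how much context to supply.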

Agents

Now, as models continue to advance, many are excited about the potential of agents. The idea of LLMs collaborating to make decisions and execute actions promises large strides towards scalable economic freedom and artificial general intelligence (AGI). Of course, agents are still built with a human in the loop, but the human's role is abstracted away to eliminate as much work as possible.

Yet today's agents are often unreliable or misaligned with user intentions. Because human instruction is minimal relative to the number of actions an agent takes, user intention is easily under-communicated.

Any system can perform as expected with enough human control. That’s exactly how coding works: a human precisely instructing a computer. The goal of agents is to minimize human effort by offloading intelligence to LLMs, while still producing aligned outcomes. Is it possible to enable sufficient human control without requiring an equivalent amount of human effort?

Towards a solution

We need to understand that agents won't become usable just because the models "get better". At the heart of misaligned agents is a misunderstanding of the user's intentions. This isn't a technology problem; it's a design problem.

We need more experimentation in human-agent interaction to answer important design research questions:

  • What are better ways of context-building and personalization for agents?
  • How can we give users maximum control over agents without requiring an equivalent amount of responsibility?
  • How can we introduce agentic systems incrementally to increase user adoption over the upcoming years?

Context-building and personalization

Agents cannot operate on the same wavelength as a user without proper context. These systems need to be close to the user, with a personalized understanding of the user’s intentions. Whether it’s a personal assistant or an enterprise copilot, interacting with an uninformed agent can create a laborious UX.
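As a hypothetical sketch of what context-building could look like, the snippet below folds a persistent user profile into the agent's standing instructions so it doesn't start from zero on every request. The profile fields, names, and prompt wording are invented for illustration and aren't drawn from this post.

```python
# Hypothetical context-building: compress what we know about the user into
# standing context that accompanies every agent request.
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    name: str
    role: str
    preferences: list[str] = field(default_factory=list)
    recent_tasks: list[str] = field(default_factory=list)

def build_system_prompt(profile: UserProfile) -> str:
    """Turn a stored profile into standing instructions for the agent."""
    return (
        f"You are assisting {profile.name}, a {profile.role}.\n"
        f"Known preferences: {', '.join(profile.preferences) or 'none recorded'}.\n"
        f"Recent tasks: {', '.join(profile.recent_tasks) or 'none recorded'}.\n"
        "Prefer actions consistent with this context; ask before deviating."
    )

profile = UserProfile(
    name="Ada",
    role="data engineer",
    preferences=["terse answers", "Python over SQL for transforms"],
    recent_tasks=["migrated the nightly ETL job"],
)
print(build_system_prompt(profile))
```

The point is not the specific fields but that the agent carries user intent across interactions instead of asking the user to rebuild context every time.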

Control vs. effort

How do you design an experience that minimizes human effort while still enabling human control? It's the difference between designing a user-driven UX and an AI-driven UX. Agents haven't yet reached the level of reliability or personalization needed to handle decision-making on their own. But when they do, new UX paradigms may place the agent in the driver's seat and the user in the role of a supervising authority.
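One way to picture that supervisory role is an agent loop that only pauses for human approval on risky actions, concentrating control where it matters instead of spreading it across every step. The sketch below is purely hypothetical: the names, the risk threshold, and the planning stub are invented, and the LLM call is replaced with a placeholder.

```python
# Hypothetical control-vs-effort trade-off: the agent drives, and the human
# only reviews actions whose estimated risk crosses a threshold.
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    risk: float  # 0.0 (harmless) to 1.0 (irreversible)

def propose_next_action(goal: str, history: list[str]) -> Action | None:
    """Placeholder for an LLM planning call; returns None when done."""
    if len(history) >= 3:
        return None
    return Action(f"step {len(history) + 1} toward: {goal}", risk=0.3 * len(history))

def run_agent(goal: str, approval_threshold: float = 0.5) -> list[str]:
    history: list[str] = []
    while (action := propose_next_action(goal, history)) is not None:
        if action.risk >= approval_threshold:
            # Human effort is spent only on the risky steps.
            if input(f"Approve '{action.description}'? [y/N] ").strip().lower() != "y":
                break
        history.append(action.description)
    return history

if __name__ == "__main__":
    print(run_agent("book a flight to SFO"))
```

Lowering the threshold trades effort for control; raising it trades control for effort. Where that dial should sit, and who gets to set it, is exactly the open design question.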

User adoption

Distributing new technologies is sometimes just as hard as building them. Consumer technologies introduced decades ago still haven't been fully adopted, and technological literacy has become a legitimate obstacle to distribution. Building and distributing new forms of interaction with intelligent systems may have to happen incrementally to promote user adoption.

As we pave the way for agentic systems, we must invest in AI safety and reliability. As models get more powerful, exploits become more dangerous; as agentic systems take on more responsibility, mistakes carry bigger consequences. We must ensure these systems are safe and reliable before they are distributed.

Wrapping up

At the end of the day, LLMs will always hallucinate, because model error will always stem from human error. As long as we keep humans in the loop, humans will continue to make mistakes when instructing models and communicating intention.

Building better human-agent interactions is the best way to effectively reduce human error, in turn reducing model error.
