Reliable AI Software

July 14, 2024

Beyond model performance.

A few weeks ago, I was talking to Guillermo Rauch, CEO of Vercel.

"What's the main bottleneck towards truly AI-driven software?" The answer: reliability.

AI models of today are powerful. Chaining together models to create agents can yield powerful outcomes. Providing context and tooling can make these agents even more powerful. But, powerful outcomes don't guarantee a good user experience.

Current AI models aren't reliable enough. They aren't consistent. A single prompt can generate near-infinite variance in responses. Even as models become more interpretable and steerable, responses remain quite random.

Observe how the best software platforms that use AI well don't even feel AI-powered. I've never searched on Google and gotten a wildly irrelevant search result. Nor have I opened Spotify and been greeted with an interface that morphs and "generates" itself dynamically. They feel completely deterministic.

Deterministic software is controllable software. It puts the user in control, which is where the user should be. If I do X, I should get Y, every single time.

Non-deterministic, probabilistic software is uncontrollable software. It means every time I prompt ChatGPT, I'm not sure if I'll get a good or bad answer. This can create moments of user delight, but also moments of frustration. Unpredictability destroys trust.

TL;DR: Reliable software doesn't just perform reliably. It feels reliable.

The question then becomes: How do you build reliable software on top of generative models, which are inherently non-deterministic?

1. Defined workflows

Tooling exists to facilitate and execute workflows. Google helps you search the Internet. Salesforce manages customer relationship data. Google and Salesforce serve many different workflows within their products, but those workflows are intentionally designed and communicated to users.

In comparison, what workflows does ChatGPT execute? It's hard to name them all. ChatGPT does an excellent job of exposing frontier models to users, but it is by no means a universal tool. It's a good tool for a great many things, but not a great tool for anything specific. Don't get caught using AI to provide an infinite number of workflows in your product.

Reliable software is built around defined workflows. Every workflow exhibits its own unique outcomes and edge cases. Great software yields predictable outcomes and handles edge cases gracefully. An AI product with an arbitrary number of workflows yields an infinite number of outcomes and edge cases. Not handling those outcomes and edge cases properly results in unreliability.

So, we must define and design intentional workflows.

2. Contextual automation

The main sell of AI is automation. Models can't outperform humans (yet), so their value comes from executing a set of tasks faster than a human can. Automation reduces user touchpoints, which saves time, but it also reduces user control. User control isn't required to create user trust, but it is an important factor when introducing a new product.

Put simply, AI can help save time through automation, but we must keep the user in control.

This is why context is so important. Context is information that helps an AI system act and automate in alignment with the user's intent. Context can come from a few sources: instruction, integration, and personalization.

Instructional context is exactly what it sounds like: a user manually instructing an AI system to perform a given task according to their preferences. This is the typical interaction, but it has some issues.

  1. Instruction is a heavy user interaction. To instruct, a user is responsible for aggregating all of the relevant information for the AI system: typing out instructions, copy-pasting outside information, uploading a photo or file, etc. It can be a lot of work.
  2. Instruction can be an inefficient form of communication. It may be hard to convey thoughts and intended actions in words, and it can be inaccessible for users who struggle with literacy.

Integrated context is the use of service integrations for gathering context. Large amounts of data can be searched and retrieved from outside services to inform context (e.g. Google, Facebook, Amazon), all without any user touchpoints. This can be very powerful for gathering context that a user may not even have easy access to.

Personalized context is the use of user-specific data for gathering context. Passively collecting user data from previous interactions can be extremely powerful in understanding user intent and preferences, making for more personal, accurate decisioning for AI systems. The challenging aspect of personalization is actually collecting an adequate reservoir of a user's preferential information.

Context gathering is method-agnostic: all three approaches (and more) can be combined to gather enough context to execute informed automations. Proper contextual automation lightens the user's burden without sacrificing control.
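The three sources above can be merged into a single context object before any automation runs. A rough sketch, where the field names and the flattened-string format are illustrative assumptions, not a prescribed API:

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    # What the user explicitly asked for (instructional context).
    instruction: str = ""
    # Data pulled from outside services (integrated context).
    integrations: dict[str, str] = field(default_factory=dict)
    # Passively learned user preferences (personalized context).
    preferences: dict[str, str] = field(default_factory=dict)


def build_prompt_context(ctx: Context) -> str:
    """Flatten all context sources into one block an AI system can act on."""
    parts = [f"User instruction: {ctx.instruction}"]
    parts += [f"[{service}] {data}" for service, data in ctx.integrations.items()]
    parts += [f"Preference: {key} = {value}" for key, value in ctx.preferences.items()]
    return "\n".join(parts)
```

For example, an instruction like "book a table" plus a calendar integration and a learned cuisine preference would all land in one context block, so the automation acts on everything known about the user's intent rather than the instruction alone.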

3. Controlled transparency

Think about the last time you filled out an onboarding flow. You probably went through multiple steps of the flow and filled out each input, being intentional with the information you gave whether it was your name, email, or credit card information.

Now, imagine going through another onboarding flow, but this one is automated by AI. Your personal AI assistant fills out your name, email, address, and credit card information automatically, without confirmation or permission.

That sounds great, right? Your AI assistant just saved you a few minutes. But what if it wasn't done properly? What if it gave out your home address when it wasn't required, and you would've preferred not to provide it? What about your credit card information? You have no idea what your AI assistant did, and no way to find out.

Lack of visibility can create anxiety and a lack of trust. It's crucial that users have visibility into every decision that is being made on their behalf. AI making decisions on a user's behalf without visibility doesn't just create room for error, it creates distrust.

Reliable AI software must have controlled transparency, meaning the level of transparency exposed to a user is tailored to their preferences and their trust in the system. For example, if a user is new, has no impression of the product, and has no personalized context detailing their preferences, all AI-based decisioning should be extremely visible and transparent. On the other hand, if a user has been active for a while, trusts the product, and has personalized context detailing their preferences, AI-based decisioning can be somewhat muted and hidden from view to create a cleaner, simpler UX.

Most importantly, visibility into all AI processes should be customizable and accessible to the user, enabling them to observe, interrupt, and engage in all workflows on the product. Visibility gives control back to the user.
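This trust-based dial could be as simple as mapping a user's tenure and accumulated preference profile to a visibility level. The thresholds and level names below are arbitrary assumptions for illustration:

```python
def transparency_level(days_active: int, has_preference_profile: bool) -> str:
    """Return how visible AI decisioning should be for this user.

    New users with no stored preferences see and confirm every decision;
    established, profiled users get a quieter UX. Thresholds are illustrative.
    """
    if days_active < 14 or not has_preference_profile:
        return "full"      # surface and confirm every AI decision
    if days_active < 90:
        return "summary"   # show a digest of decisions, interruptible
    return "minimal"       # act quietly, but keep a reviewable log
```

Whatever the level, the "minimal" tier still implies a reviewable log: muting visibility by default is acceptable, removing access to it is not.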

4. Guardrails

At the end of the day, AI is non-deterministic. Humans are non-deterministic as well. It's not a bug, it's a feature. Intelligence must be non-deterministic — if it were deterministic, humans would be nothing more than pre-programmed robots.

Just as parents can set rules for their children and managers can set rules for their employees, users should be able to set rules for non-deterministic systems. These rules can be referred to as guardrails. Guardrails act as rule-based boundaries: they preserve creative, intelligent output while satisfying user preferences. Though AI functions non-deterministically, the rules that guardrails impose are deterministic, and can guarantee that certain lines are never crossed.

For example, guardrails on an AI email manager can ensure that important emails are never automatically sent or replied to. Likewise, guardrails on an AI stock trader can ensure that certain stocks are never traded, or that trades never execute outside set price limits.

Guardrails are a deterministic way to give users control over a non-deterministic system.

Wrapping up

A common theme across all of these principles is: to make AI software reliable, follow the principles of regular software UI/UX design.

It's easy to get so caught up in the temptation of creating new paradigms of software and human-computer interaction with generative AI that we forget the value generative AI delivers today. Don't mistake the potential of AGI for currently accessible value.

To build good AI software, we must build reliable software.
