
The Chatbot and the Freudian Slip

Much of the recent conversation about AI tools revolves around the idea that as the technology advances, we will no longer need specialized interfaces. Instead, we will be able to execute any task with natural language commands. Whether the interface metaphor is a command line, a chatbot, or a search bar, the premise is the same. Input words, output magic.

The pro-chat position assumes that humans are inherently good at communicating their needs. But isn’t communication and understanding THE fundamental human problem? Natural language is by nature ambiguous, context-dependent, and imprecise. There are as many interpretations of a given sentence as there are people: language is burdened with subjective associations, and we are burdened with having to communicate with that language. We are notoriously bad at precisely articulating our needs. Doing so requires us to first effectively identify them, which is a hurdle on its own. Learning to ask the right questions is a quest of a lifetime; it seems suspicious that we’d count on it for the adoption of the most powerful technology yet created.

The concept of the Freudian slip illustrates my point: in everyday discourse, we often believe we’ve simply misspoken when we say something unintended. From a Freudian perspective, however, these ‘slips’ are not mistakes at all; they reveal unconscious thoughts or intentions that we’re unable to symbolize and therefore dismiss as ‘random’. Our use of language, in other words, is anything but the uniform application of a fixed syntax of meaning. Replicating human communication as the ‘ideal’ non-interface denies the ambiguity of human thought and the fluidity of language.

Observing friends and family firmly outside the tech umbrella try to use ChatGPT has led me to think that there are fundamental faults in this approach. What often happens is some version of the following:

  • The person assumes the AI has more context about their intent than it actually has.
  • Because of the above, the person doesn’t understand the extent to which the structure of their command affects the AI’s output.
  • The frictionless nature of the chat interface paired with the user’s poor prompting eventually leads to the user becoming bored.

My interpretation is that boredom in this case stems from a perceived lack of agency (“this is as good as it gets?”): the user assumes the AI’s output will be of similar quality regardless of their input, and so abandons the effort.

The conversational nature of chat leads the user to anthropomorphize the AI, producing a false sense of comprehension: they assume a level of understanding that doesn’t exist. Paradoxically, relying on natural language as the interface for AI requires us to learn a new language. The linearity of chat presents other issues: it makes it difficult to iterate on results, to branch out from an output, and to navigate back to or reference earlier points in a given process.

The interface imperative

Consider a future where any task commonly executed on a computer can be done by voice command, or through a universal search bar / command line. The lack of visual cues, processes, and elements to manipulate leads to confusion and boredom. As a reminder, ‘what time is it’ accounts for a surprising share of Google search queries.

If we venture further: why not execute tasks by simply thinking of a prompt? The boundary between intent and action becomes blurred. I can now do ‘anything’, but paradoxically, if so, what should I do? Anxiety stems from a perceived excess of choice. I struggle to distinguish between my own thoughts and intentions and the AI’s outputs. The erosion of boundaries leads to a short circuit of action and consequence, and a diminished sense of self and agency.

The cognitive load paradox

In open-ended interfaces, like a chat, the perceived cognitive load is heavily on the machine. The user is told ‘ask me anything (and I will do all the work)’. In reality, the effective load falls on the user, who needs to conceptualize their needs without guidance, articulate them precisely, and navigate the obscure, seemingly infinite capabilities of the tool. This misalignment leads to the loss of the user’s sense of agency, boredom, disengagement, and generally disappointing outcomes.

At the other extreme are interfaces that place the perceived cognitive load on the user: tools with so many specialized features that they distract the user from the task at hand and lead them to underutilize the tool’s capabilities.

In an optimal scenario, the perceived and effective cognitive load is balanced between the user and the machine by the interface. The user feels in control and understands their role in the interaction. Their perceived agency matches the actual influence they have. The interface provides sufficient structure to frame the interaction, and gives real-time visual feedback to the user.

Transitional spaces and sufficient friction

The (graphical) interface serves a purpose far more profound than its mere functionality. Consider this analogy with the Lacanian concept of transitional spaces: an analyst’s office serves as a necessary frame, a controlled environment where the patient is free to self-explore and grow. Within this space, the patient can articulate their experiences, thoughts, and emotions in a way that brings them insight into their psyche. While the patient is certainly free to articulate their experiences outside of sessions, those expressions don’t carry the same transformative power. The analyst’s office, as a transitional space, is the essential context for analysis to happen. In the same vein, in human-computer interaction, the ideal interface sufficiently delimits the potential or suggested courses of action for the user, mediates a meaningful feedback loop, and preserves the user’s sense of control over the output.

A case for spatial

The human experience is inherently a spatial one. We relate to the world through our body as it moves through and manipulates the surrounding environment. Our interpretation of the spatial relations between things has a semantics of its own, one we can translate into words like ‘close to’, ‘under’, ‘along’, or ‘on top of’. Our understanding of these relations, however, is pre-linguistic: we grasp them instinctively.

None of this is new. The concept of direct manipulation was introduced in 1982, following principles that are just as relevant, if not more so, in the context of AI: continuous representation of the object of interest, physical actions instead of complex syntax, continuous feedback, and reversible, incremental actions. Since then, we’ve gotten desktops, touch screens, pinch-to-zoom. If you’ve ever seen a baby use a touch screen before they can speak, you’ll understand my point.

It’s underwhelming that with all the power of AI at our fingertips (pun intended) we’re now rebranding the terminal as the accessible interface of the future. Meh.

In the case of AI, the main advantage of a spatial interface is that it allows complexity to be visualized without increasing the user’s cognitive load, because we understand spatial relations naturally. It’s not a language we need to learn. We don’t even need to go as far as 3-dimensional space to benefit from this intuition: compared to a linear interface, a 2-dimensional canvas already significantly expands the possibilities for manipulating inputs and for visualizing or comparing outputs.

Work in progress

Materials, not mirrors

To summarize: the goal shouldn’t be to make AI more human-like, because humans are inherently confused. Replicating human communication as the ‘ideal’ non-interface denies the ambiguity of human thought. It’s a lose-lose scenario. A better goal would be to create interfaces that make the capabilities of AI more accessible, comprehensible, and controllable, not more human-like. These interfaces should be flexible, scalable, and modular, but interfaces nevertheless.