
(More) Skeptical Remarks about AI

I made the remarks below to a CEO forum on June 21, 2024. The AI enthusiasts were vocally over the top; the skeptics were quiet but privately supportive of my viewpoint.

Suppose someone said that smartphones were on the cusp of generating widespread transformations.

You might reasonably ask, “Where have you been these last twenty years, Rip Van Winkle?”

Smartphone apps like Uber and Airbnb have revolutionized transport and travel. Mobile search and social media have crushed mainstream media and advertising.

Given how far we have already come, is it likely that smartphones are at an inflection point? Similarly, with AI. Its applications have already been transformational. Indeed, it is AI tools and techniques that make smartphones smart. Nearly every smartphone app – from texting to sexting, mapping to matchmaking, video editing to streaming, Uber ridesharing to Airbnb rentals – incorporates AI. When we speak to our phones asking for weather forecasts or driving directions, we engage AI’s Natural Language Processing capabilities.

Moreover, AI’s widespread use precedes and goes far beyond smartphones. A 1956 workshop at Dartmouth kicked off academic AI research. In the following decades, practical applications evolved. Starting in the 1970s, George Lucas’s Star Wars epics dazzled audiences with AI special effects and animations. ‘Fuzzy logic’ proposed by UC Berkeley’s AI guru, Lotfi Zadeh, in 1965, was used to control a Japanese subway in 1987. By 1990, Japanese consumer electronics companies were using fuzzy logic in camcorders, vacuum cleaners, room heaters, and air-conditioners.

In 2006 – a year before Apple’s iPhone – Oxford’s Nick Bostrom noted that cutting-edge AI had “filtered into general applications, often without being called AI because once something becomes useful enough and common enough it’s not labelled AI anymore.”

Sixteen years later, the claim that AI has just reached a take-off stage is perplexing. Merely maintaining historical growth rates from a high base should be a challenge.

Looking more closely at how AI became mainstream is instructive.

Traditional pre-AI software applications performed deterministic calculations. Payroll processing and the optimization of complex operations were archetypal examples.

More often than not, however, uncertainties frustrate demonstrably correct solutions. Ambiguous information or incomplete knowledge makes calculating what’s truly best impossible. We must make do with guesses and approximations. Likewise, we often don’t use numbers or algebraic symbols to specify problems or discuss solutions. From everyday speech to Supreme Court deliberations, our discourse relies on ambiguous language – including analogies and metaphors.

Zadeh’s 1965 “fuzzy logic” and natural language processing thus epitomize the more realistic aspirations of AI.

But how to combine the digital computer’s capacity to flawlessly manipulate 1s and 0s with the incompleteness and imprecision of human knowledge and discourse?

One early approach incorporated specialized expertise. Medical rules of thumb were a popular basis for the early expert systems. But this approach was limited to problems where experts had codifiable knowledge.
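
As a rough illustration (a toy sketch with invented rules and thresholds, not drawn from any actual expert system), codified expertise amounted to hand-written if-then rules of thumb:

```python
# Toy sketch of a rule-based "expert system": hand-coded rules of thumb.
# The rules and thresholds below are invented purely for illustration.
def diagnose(temperature_f, has_cough):
    if temperature_f > 100.4 and has_cough:
        return "possible flu - refer to physician"
    if temperature_f > 100.4:
        return "fever of unknown origin"
    return "no rule applies"

print(diagnose(101.2, True))  # -> possible flu - refer to physician
```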

Other AI applications used statistical approximations. Humans merely specified the data – text and images, not just numbers – from which computers inferred statistical patterns. No understanding of the underlying process or consideration of contextual meaning was necessary. The dictum repeated endlessly in elementary statistics classes, that “correlation is not causation,” was brushed aside. AI programs did not even have to be told which variables mattered or to what degree. They used data mining to calculate the variable weights that best fit the observations.
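
By contrast, here is a minimal sketch of the statistical approach, using made-up numbers: the program sees only observations and computes whatever weights best fit them, with no rules and no understanding of why they fit:

```python
# Minimal sketch of statistical fitting: no domain rules, just the weights
# that best fit the observations. The data below are invented for illustration.
import numpy as np

# Each row is one observation of two input variables; y is the outcome.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([5.0, 4.0, 11.0, 10.0])

# Least-squares "data mining": find the weights that best fit the data.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print(weights)      # fitted weights, roughly [1., 2.]
print(X @ weights)  # the model's predictions for the same observations
```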

AI programs used statistical correlations to mimic natural language. Actual natural language often requires reading minds — contextual interpretation of intent. The meaning of a simple ‘what!’ depends on context and tone. Going back to MIT’s Eliza, a 1960s-era psychotherapeutic chatterbot, AI programs used correlations as substitutes for any mindreading.

Statistical AI could also improve through trial and error. But again, this ‘machine learning’ did not require domain expertise, judgments about “lessons learned,” or understanding or consideration of context.

Nonetheless, the cost-effectiveness of statistical AI that did not require specialized expertise vastly broadened the scope of AI applications. Google’s search algorithm, which handily outperformed Yahoo’s human catalogers of the internet, was a striking example.

At the same time, AI hasn’t sailed smoothly in every sea. Belying dire predictions, AI did not dominate or displace human “knowledge work.” Knowledge-intensive jobs grew, and wages stayed high.

AI even failed to automate many tasks that don’t require much thinking or training. Going back to Apple’s much-ridiculed 1993 Newton, handwriting was supposed to replace typing. In 2001, Bill Gates predicted that pen-based tablets would become “the most popular form of PC sold in America” in five years. They didn’t come close. Now, finally, convertible PCs with pens and touch screens have found a market, but keyboards remain the dominant input device. AI-enabled handwriting and voice recognition remain frustratingly hit or miss. Similarly, we usually still prefer the precision and accuracy of clicking or tapping on a button to giving voice instructions to personal assistants (like Siri or Alexa).

Where has the accuracy of statistical AI been acceptable, and where has it not?

Accuracy often depends on the ambiguity of inputs and outputs. Printed words that use standard fonts are less ambiguous than idiosyncratically handwritten words. Unsurprisingly, Optical Character Recognition software scans printed books and documents far more accurately than handwriting recognition programs.

Ambiguous outputs similarly undermine machine learning. Unquestionably correct or wrong results have helped make face recognition highly accurate. In contrast, correctly deciphering spoken words (“there” or “their”?) requires knowing the speaker’s intent. But statistical correlations cannot reliably discover intent, just as they cannot establish cause.

Accuracy also depends on the stability and uniformity of the process that generates the data used by AI applications. Physical or physiological processes, governed by invariant laws of nature, are usually stable. In contrast, human behavior and choices are subject to the whimsical vagaries of social attitudes and the zeitgeist. Statistical predictions about creditworthiness or purchasing behavior can, therefore, be highly inaccurate.

Data produced by a uniform process provides a more reliable basis for statistical inference. For example, OCR algorithms scan text more accurately if trained with materials in the same language and script. Conversely, data shaped in diverse ways by different contextual factors – if the observations are like unhappy families, each unhappy in its own way – can make statistical inferences practically useless.

Acceptable accuracy depends on the cost of mistakes — the stakes — and the price-performance of the alternatives. Nearly every ad that Google and Meta Platforms throw at me is utterly remote from my interests. But the stakes are low and even the wildly inaccurate targeting of algorithmic advertising beats the alternative of blind advertising.

In some creative applications of AI, accuracy can be both unknowable and irrelevant. There are no correct special effects in Star Wars movies or animations in video games and cartoons. There is no objective benchmark for restoring old movie prints — who knows what the original looked like? But automated AI restoration wins because it is much cheaper and faster than human restoration.

Turning to the current AI mania.

Ignorance of AI’s seven-decade history may explain some over-the-top predictions about its future. But even some savvy techies who are aware of what came before assert that Large Language Models – often now conflated with all of AI – are game changers. A veteran software entrepreneur believes AI is still in its “early infancy.” He argues that “earlier incarnations, such as protein folding and chess playing, were esoteric and of little relevance to the general public. The chat interface to LLMs has suddenly made AI accessible to the wider public. New ideas and applications are exploding. The real creativity is coming from people using it and suggesting new uses, rather than from the engineers creating it.”

I believe it is fair to say that before LLMs, most people were passive consumers, often unaware of the AI in their mobile phones, search engines, and social media. Certainly, LLMs have an arresting capacity for seemingly intelligent, natural language conversations with non-technical users, and they promise to automate many analytical and creative tasks. Could these abilities make LLMs a “killer app” for AI to an even greater degree than the AI that has long been embedded in smartphones?

The analogy with spreadsheets is seductive. Spreadsheets had simple user interfaces that allowed people with limited technical expertise to build useful programs. Running on cheap personal computers, they offered compelling value in many applications that did not require the power of mainframes. Symbiotically, they helped expand the personal computer market, prompting investments in better computers.

LLMs have even simpler and more natural user interfaces than spreadsheets. Yet under the hood, LLMs run statistical engines subject to the same issues that delineated the practical scope of earlier AI applications. As with earlier AI, LLMs can shine in creative applications, such as image generation, where accuracy is irrelevant. Conversely, as with other statistical AI models, ambiguous inputs and outcomes derail their reliability and limit self-corrective learning. They can trip over data that is not generated by a stable process or is highly dependent on context.

Relying on statistical correlations rather than deductive logic or math, LLMs have offered bizarre solutions to reasoning problems, highlighting, for example, the risks of being attacked by a cabbage while rowing across a river. The Khan Academy’s AI tutor for kids struggles with elementary math. (It miscalculated subtraction problems such as 343 minus 17, couldn’t consistently round answers or calculate square roots, and typically didn’t correct mistakes when asked to double-check its solutions.)

Throwing every possible kind of data into LLMs’ training pots does not improve accuracy and reliability. Medical data does not make responses to legal or engineering questions any better. Training on Swahili literature does not sharpen statistical summaries of Shakespeare’s plays. Bulking up LLMs with disparate data so that they can answer every question under the sun may increase their propensity to fantasize or hallucinate.

Spreadsheets, in contrast, didn’t overpromise and underdeliver. They didn’t tell jokes or write essays, but for their more targeted functions, they followed the user’s instructions precisely and correctly.

The chatty user-friendliness of LLMs isn’t a free lunch. It may well be a significant limitation. Yes, users need less knowledge of input rules and conventions than in their interactions with a spreadsheet, traditional search engine, or photo editor. But free-form inputs are also more ambiguous. Natural language prompts are more likely to evoke inaccurate or useless responses than traditional keyword searches.

In low-risk uses, people will tolerate LLM mistakes for the sake of convenience, as they do with autocomplete howlers in their text messages. The multi-trillion-dollar question is whether the benefits from low-stakes uses can cover the costs.

One important reason for the nearly immediate popularity of spreadsheets (besides their ease of use) was that they ran on personal computers and not expensive mainframes. Similarly, Uber and Airbnb apps provided cheap, reliable alternatives to taxis and hotels through smartphones that users already owned. In contrast, LLMs require users to purchase more expensive hardware. Moreover, user hardware accounts for a fraction of the costs of building, training, and operating LLMs. For now, and as in the 1999 internet bubble, manic investors are willing to subsidize uneconomic uses. What happens when the music stops?

At best, LLMs are akin to a new high-powered automobile engine that can win car races but makes too much noise and guzzles too much gas for street use. The hype notwithstanding, LLMs aren’t like Nikola Tesla’s alternating current inventions that drastically changed the economics of electrification. Why then gamble on the transformative acceleration of AI and ignore so many other possibilities for innovation and operational improvements the world offers?