Generative AI 2024 Retrospective
2024 witnessed a parade of increasingly powerful AI model releases, culminating in OpenAI’s groundbreaking “o3”. While I’m certain these advancements are significant, I don’t think they have translated into noticeable improvements for the average user.
What people are paying attention to, though, are the amazing product features coming out of Anthropic and Google. I’ve heard a lot about people building software with Claude’s Artifacts. And Google’s Deep Research product, released a few days ago, has completely changed search.
The Rise of AI Agents
This pattern offers a clue to where AI is going. As models saturate “intelligence”, I think the next wave of innovation will come from the product makers instead of research labs. We’ll increasingly see product companies break out of the cliched “chatbot” UX and explore agents as the new paradigm for interacting with large language models. It is an exciting time to be a maker.
Here is my rundown of 2024’s most significant developments. This list covers only a fraction of the exciting things that happened in 2024. If you think I missed anything important, let me know by email.
Gemini 1.5 Deep Research
Release Date: Dec 2024
If you have not used Deep Research, this is how it works. You tell it what to research. The model comes up with a research plan, which you can modify. It then searches the web, summarizes content from multiple webpages, and compiles a detailed document for you.
Deep Research has completely replaced search for me. It represents what ‘agentic AI’ should truly be: it takes instructions, runs its plan by you, then works tirelessly in the background to get you results.
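For the makers reading this, the plan, review, and compile loop described above can be sketched in a few lines of Python. This is only an illustration of the workflow shape, not how Google built Deep Research: every name here (ResearchPlan, draft_plan, fetch_and_summarize, run_research, add_pricing_step) is hypothetical, and the model and web calls are replaced with stand-in functions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchPlan:
    topic: str
    steps: list[str] = field(default_factory=list)


def draft_plan(topic: str) -> ResearchPlan:
    """Stand-in for the model proposing a research plan (hypothetical)."""
    return ResearchPlan(topic, [
        f"Find recent overviews of {topic}",
        f"Collect primary sources on {topic}",
    ])


def fetch_and_summarize(step: str) -> str:
    """Stand-in for browsing webpages and summarizing them (hypothetical)."""
    return f"Summary for: {step}"


def run_research(topic: str,
                 review: Callable[[ResearchPlan], ResearchPlan]) -> str:
    # 1. The agent drafts a plan; 2. the user gets a chance to modify it.
    plan = review(draft_plan(topic))
    # 3. The agent works through each approved step on its own.
    findings = [fetch_and_summarize(step) for step in plan.steps]
    # 4. The findings are compiled into a single document.
    lines = [f"Report: {plan.topic}"] + [f"- {item}" for item in findings]
    return "\n".join(lines)


def add_pricing_step(plan: ResearchPlan) -> ResearchPlan:
    """Example of a user edit: append one extra step to the plan."""
    plan.steps.append(f"Compare pricing for {plan.topic}")
    return plan


if __name__ == "__main__":
    print(run_research("AI coding assistants", add_pricing_step))
```

The review callback is the part that matters: the user edits the plan once up front, and everything after that runs without further prompting.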
Deep Research is only available via the paid Gemini Advanced subscription at $20/month.
ChatGPT Search
Release Date: Oct 2024
This was a stinker. This feature allows ChatGPT to include search results in its responses, but it was universally panned. Use Perplexity, Kagi Search, or Gemini Advanced instead.
OpenAI o1
Release Date: Sept 2024
The o1 model was a revolution in how well LLMs can think. Previously, LLMs mostly behaved like intuition machines: they gave good answers quickly but struggled with systematic thinking. The o1 model, on the other hand, takes a long time to respond but is capable of thinking logically.
While this type of slow, resource-intensive, yet analytical reasoning isn’t universally beneficial, it offers significant progress for mathematicians, scientists, programmers, and traders who require rigorous logic.
OpenAI o3
Release Date: Announced on Dec 20th, 2024, public release in Jan 2025
OpenAI’s o3 model, the culmination of “12 Days of OpenAI”, surprised everyone by scoring close to 90% on the ARC-AGI semi-private evaluation, the gold standard of AI benchmarks. The previous best, OpenAI’s o1 model, only came in at 32%. It won’t be long before LLMs “solve” the existing ARC-AGI evaluation.
The designer of ARC-AGI is impressed.
Claude 3 and 3.5
Release Dates: Claude 3 in March, Claude 3.5 in June, and Claude 3.5 Haiku in Nov
Many people I know use Claude as their LLM of choice, and I understand why. In terms of capabilities, Claude is as good as ChatGPT. Talking to Claude feels like talking with a friend. ChatGPT’s default personality, on the other hand, tends to be patronizing. Anthropic kept updating its models and making them more powerful, though OpenAI and Google are still on the cutting edge.
DeepSeek-V3
Release Date: December 2024
DeepSeek-V3 is a model released by the Chinese AI company of the same name, DeepSeek. It was not the best model on any metric. However, it was significantly cheaper to train than the leading models from the US. DeepSeek-V3 is significant because it shows there is still room to reduce the cost and size of models through software optimization. We should expect to see similar improvements from US AI companies as they apply the same optimizations to their training processes.
Sora
Release Date: Dec 2024
OpenAI released Sora at sora.com in December. Previously, Sora was only available to researchers. Now, anyone with a ChatGPT Pro account can try it.
Veo
Release Date: Dec 2024 (kind of)
Google announced an image-to-video model called Veo in December. Initial impressions seem good. However, it is not publicly available yet: only enterprise customers (Vertex AI) can use it.
Copilot, Cursor, and Aider
In 2024, we saw a lot of IDEs adopting generative AI. The most popular one is of course Microsoft’s VS Code with GitHub Copilot. It is worth trying. Cursor is another AI-powered IDE that received a lot of hype in 2024, although not everyone is positive on it. I also want to call out Aider as a worthy contender. Instead of an IDE, Aider is a kind of command-line agent that will write code for you. It uses a different workflow, but helps you write code all the same.
I don’t know which technology will win. What is obvious to me is that the average developer’s productivity will dramatically increase with these tools. They are still not quite right for me, but it is very early, and I think we’ll see significant improvements as models get longer context windows and engineers get better at building local “agents”.
Claude Artifacts
Release Date: Aug 2024
Claude’s Artifacts feature has been genuinely amazing. In certain conversations, Claude will open a document or a React app in a separate window and collaboratively edit it with you. It is an innovative and refreshing way of working on a document together. Claude can even publish your React app when you’re finished! This is the kind of product release that really impresses me. I am excited to see what Anthropic comes up with next.
NotebookLM
Release Date: Not applicable - this product keeps evolving
This is what I wrote about NotebookLM on a Discord thread a few weeks ago:
- NotebookLM is the most startup thing Google has done lately. It came under the radar. It added a podcast feature for NO reason. And they are now going hog wild with the whole podcast thing. I love it.
To which someone replied:
- It also has a discord linked on it’s main UI where the devs hang out which I find deeply unusual for a FAANG product
NotebookLM was originally created to help you read a book or organize notes on a particular subject. Then someone added the ability for it to generate podcasts. Now it’s the most podcastic AI product ever. I love that this is happening in Google, a company often criticized for its bureaucracy and lack of innovation.
Wrap Up
That’s my recap of AI in 2024. Shoot me an email or ping me on social media if I missed anything.