I spent the last year thinking that if we just fed ChatGPT more data it would eventually become a genius. I assumed that scale equals intelligence.
But recently I tried to use an LLM for a complex multi-step tax reconciliation. It failed. It didn’t just fail; it confidently lied about the numbers. It reminded me that these models are great at sounding like they know the answer but terrible at actually reasoning through a problem.
That experience sent me down a rabbit hole. I wanted to know if we are hitting a ceiling.
Here is what I found and why the industry is pivoting hard.
The “Scaling Hypothesis” Is Hitting a Wall
For the last decade, the strategy was simple: make the model bigger.
The technical term is the Scaling Hypothesis. The idea is that if you dump enough data and computing power into a neural network, intelligence just “emerges”. It worked for a long time. It got us from GPT-2 to GPT-4.
But the data covered in the course transcript points to diminishing returns.
- Cost vs. Gain: Compute costs are roughly doubling from one model generation to the next, while the performance gains keep shrinking (see the back-of-the-envelope math after this list).
- Data Exhaustion: We have effectively indexed the entire public internet. There isn’t much high-quality human data left to feed the beast, leading to a “data wall”.
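To put a rough number on “diminishing returns”: early scaling-law research (Kaplan et al., 2020) found that a model’s loss falls only as a small power of the compute spent on it. As a back-of-the-envelope illustration, using a compute exponent in the ballpark of what that work reported (the exact value varies by setup):

$$L(C) \propto C^{-\alpha}, \qquad \frac{L(2C)}{L(C)} = 2^{-\alpha} \approx 0.97 \quad \text{for } \alpha \approx 0.05$$

In plain terms: doubling the compute bill shaves only about 3% off the loss. That is the wall.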
The research report highlights a landmark study by Apple researchers titled “The Illusion of Thinking”. It showed that when you increase the complexity of a reasoning task, pure LLMs suffer a “complete accuracy collapse.” They don’t think harder; they just give up or guess.
Enter Neurosymbolic AI
The industry solution is a shift toward Neurosymbolic AI.
This is a fancy way of saying we need to combine two different types of thinking. The course instructor, Brian, explains it using Daniel Kahneman’s System 1/System 2 framework:
- System 1 (LLMs): Fast, intuitive, and pattern-matching. This is what ChatGPT does. It predicts the next word based on vibes and probability.
- System 2 (Symbolic): Slow, deliberate, and logical. This is math. This is code. This is logic that can be verified.
Pure LLMs are all intuition and no logic. That is why they hallucinate. They don’t know that $2+2=4$; they just know that “4” usually follows “2+2=”.
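To make the difference concrete, here is a toy sketch. Nothing in it is a real LLM; the frequency table and prompts are made up purely to contrast pattern lookup with actual computation:

```python
# A toy illustration: "System 1" as pure pattern lookup versus
# "System 2" as actual, verifiable computation.
from fractions import Fraction

# System 1: completions the model has "seen" in training, with frequencies.
seen_completions = {"2+2=": {"4": 0.93, "5": 0.04, "22": 0.03}}

def system1_guess(prompt: str) -> str:
    # Pick the statistically most common continuation. No arithmetic
    # happens; the answer is right only because "4" was common in the data.
    options = seen_completions[prompt]
    return max(options, key=options.get)

def system2_compute(prompt: str) -> str:
    # Actually evaluate the expression, so the answer can be checked.
    a, b = prompt.rstrip("=").split("+")
    return str(Fraction(a) + Fraction(b))

print(system1_guess("2+2="))    # "4" -- a lucky statistical guess
print(system2_compute("2+2="))  # "4" -- because 2 + 2 really is 4
```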
The new wave of AI, like Google DeepMind’s AlphaProof or OpenAI’s o1, is hybrid. These systems use the neural network to understand your question (System 1) and then hand it off to a logic engine or a reinforcement-learning search process to actually solve the problem (System 2).
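At a very high level, the hand-off pattern looks something like the sketch below. The call_llm() helper is a hypothetical stand-in for any LLM API, and sympy plays the symbolic engine; real systems like AlphaProof are vastly more sophisticated, but the division of labor is the same:

```python
# A minimal sketch of the neurosymbolic hand-off. call_llm() is a
# hypothetical stand-in for an LLM API; sympy plays the symbolic engine.
import sympy

def call_llm(prompt: str) -> str:
    """Hypothetical System 1: translate a word problem into a formal
    expression. Stubbed with a fixed response for illustration."""
    return "48500 * 21 / 100"

def solve(question: str) -> str:
    # System 1: the neural side turns messy natural language into a
    # machine-checkable expression.
    expression = call_llm(f"Translate to arithmetic: {question}")
    # System 2: the symbolic side computes an exact answer instead of
    # predicting plausible-looking digits.
    result = sympy.sympify(expression)
    return f"{question} -> {result}"

print(solve("What is 21% of $48,500?"))  # -> 10185
```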
Why This Matters for Accounting
I write about this because the implications for finance are massive.
If you are using AI for creative writing, hallucinations are fine. If you are using it for an audit trail, they are fatal.
The move to Neurosymbolic AI means we might finally get tools that can handle causality. As the research notes, current LLMs struggle to understand cause and effect, often conflating correlation with causation. A hybrid system essentially “checks its work” using a logic layer before giving you an answer.
For us, that means reliability.
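To make that concrete, here is a minimal sketch of what that logic layer could look like in an accounting workflow. The draft_entries() helper is a hypothetical stand-in for an LLM call; the deterministic verifier is the part that matters:

```python
# A sketch of the "checks its work" layer for accounting.
# draft_entries() is a hypothetical LLM call; the verifier is plain
# deterministic logic that no amount of confident prose can talk past.
from decimal import Decimal

def draft_entries(source_doc: str) -> list[dict]:
    """Hypothetical System 1: an LLM drafts journal entries from a document."""
    return [
        {"account": "Cash",    "debit": Decimal("1000.00"), "credit": Decimal("0.00")},
        {"account": "Revenue", "debit": Decimal("0.00"),    "credit": Decimal("1000.00")},
    ]

def is_balanced(entries: list[dict]) -> bool:
    # System 2: total debits must equal total credits. Exact decimal
    # arithmetic -- no floating-point drift, no "close enough".
    total_debits = sum(e["debit"] for e in entries)
    total_credits = sum(e["credit"] for e in entries)
    return total_debits == total_credits

entries = draft_entries("invoice_0042.pdf")
if is_balanced(entries):
    print("Draft entries balance; route them to a human for review.")
else:
    raise ValueError("Debits != credits -- reject the draft and re-prompt.")
```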
Key Takeaways
- Scaling is Stalled: Simply making models bigger is no longer yielding massive intelligence jumps.
- Reasoning Gap: LLMs are great at language but terrible at logic and planning.
- The Hybrid Future: The industry is moving to Neurosymbolic AI, which combines neural intuition with hard logic rules.
- Reliability Over Creativity: For accountants, this shift is crucial because it prioritizes factual accuracy over plausible text generation.
- Data Limits: We are running out of human internet data, which is forcing a change in how these models are built.
Want to earn CPE for this topic?
- Compare Options: See how we stack up against others in our 2025 Flexible CPE Guide
- Understand the Format: Read how Nano-Learning works for CPAs.
- Check Your State: Ensure you are compliant with our State Requirements Guide.
- What is EverydayCPE?