I tested Fable 5 on a real engineering project. Here is what happened.

I tested Fable 5 last night, and I must say, the excitement was real. After all, this is the model that has been talked about as the frontier of frontier models. The take-your-software-engineering-job-and-not-blink model. The model that apparently made Wall Street go into a tizzy. Sorry, I digress.

I did not want to test it on a toy app. No todo list. No “build me a weather app.” No tiny proof of concept where everything fits nicely into one prompt and the demo looks magical because the problem is artificially simple. I wanted to test it on something closer to real engineering work.

Luckily, I had one. A project I am actively building. The repo is not massive by enterprise standards, but it is also not small. It has around 22 MB of tracked repository content, 513 tracked files, 32.7k lines of code, 14k lines of documentation, and 17 fairly detailed design and mockup images.

And yes, at this stage the documentation runs to a little over 40% of the code by line count. That is not an accident. At the beginning of any serious AI-assisted build, specs matter as much as code. Maybe more.

The project itself is still in the early stage, but a fair amount of thinking and implementation has already gone into it. This was not a “start from scratch” project. It already had architecture decisions, existing implementation, design direction, gaps, loose ends, and technical debt starting to form. Basically, a normal real-world software project.

The stack

The stack is also not exactly lightweight. It is a Python, AWS serverless, React/Vite, and Terraform platform. It includes Bedrock and Claude-based agents, a custom LLM gateway, Step Functions, DynamoDB, Slack approval flows, GitHub and Jira connectors, evidence packs, and a few enterprise workflow patterns that always sound simple in a diagram and then become interesting in implementation.

So the goal was simple. Give Fable 5 something real. Not impossible. Not trivial. Real.

The setup

Before starting, I was not sure how quickly the tokens would burn. So I loaded up a reasonable amount of on-demand tokens. I made sure the laptop would not sleep while plugged in. I enabled bypass permissions mode.

And I picked Fable 5 Extra because, based on the cost-to-improvement graphs Anthropic shared, it looked like the best bang for the buck. At least on paper.

The plan

I did not want the work to vanish inside a long chat transcript. That is one of the biggest problems with using AI coding tools for serious work.

A lot happens. Some of it is useful. Some of it is questionable. Some of it is brilliant. Some of it is confidently wrong. And unless you have a system to capture decisions, progress, evidence, blockers, and task state, the whole thing becomes a very expensive stream of consciousness.

For this project, I already had a system in place. It creates task packs, tracks progress, documents evidence, captures blockers, and keeps work from becoming transient. So even if the run went sideways, the output would not be completely lost.
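To make the idea concrete, here is a minimal sketch of what one task-pack record could look like. It is not the actual system used on this project: the `TaskPack` class, its field names, the `Status` values, and the example ID are all hypothetical, chosen only to show how progress, evidence, and blockers can live outside the chat transcript.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    BLOCKED = "blocked"
    DONE = "done"


@dataclass
class TaskPack:
    """Hypothetical record for one unit of work (illustrative schema only)."""
    pack_id: str
    goal: str
    status: Status = Status.TODO
    evidence: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)

    def add_evidence(self, note: str) -> None:
        # Recording evidence implies work has started.
        self.evidence.append(note)
        if self.status is Status.TODO:
            self.status = Status.IN_PROGRESS

    def block(self, reason: str) -> None:
        # A blocker is kept on the record instead of being lost in the chat.
        self.blockers.append(reason)
        self.status = Status.BLOCKED

    def complete(self) -> None:
        # Refuse to close a pack that has nothing to show for it.
        if not self.evidence:
            raise ValueError("cannot mark a task pack done without evidence")
        self.status = Status.DONE


# Example with made-up values.
pack = TaskPack(pack_id="ui-01", goal="Make the first tab demoable")
pack.add_evidence("Tab renders with seeded data")
pack.complete()
print(pack.status.value, len(pack.evidence))
```

The point of a structure like this is that a failed or abandoned run still leaves behind a record of what was attempted, what was shown to work, and what got stuck.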

The specs were solid. The task packs were mostly independent. The documentation was reasonably clear. So I went in fairly confident.

I also used Sonnet before the run to help create a curated prompt. The idea was to prioritize high-value gap analysis and task-pack creation first, and then move into more open-ended implementation.

In other words, I did not want Fable to just start coding randomly. I wanted it to understand the project first. Then find gaps. Then structure the work. Then implement. Very boring. Very necessary.

The run

The first 15 minutes were not very encouraging. Around 900 tokens in, the request failed. It retried. Then it started moving.

At around 20 minutes, it had used about 2k tokens. At 30 minutes, around 13k. At 35 minutes, it had finished the main analysis part at around 18k tokens. At 40 minutes, it was around 25k tokens and the first UI task pack was complete. At 50 minutes, around 35k tokens. At the one-hour mark, around 45k tokens.
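For a rough sense of the burn rate, the checkpoints above can be turned into tokens per minute. This is only a sketch: the figures are my rounded readings, and the `checkpoints` list and `burn_rates` helper are names made up for this illustration.

```python
# (elapsed minutes, approximate tokens used), taken from the checkpoints above.
checkpoints = [
    (20, 2_000),
    (30, 13_000),
    (35, 18_000),
    (40, 25_000),
    (50, 35_000),
    (60, 45_000),
]


def burn_rates(points):
    """Return (start_min, end_min, tokens_per_min) for each consecutive pair."""
    rates = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        rates.append((t0, t1, (n1 - n0) / (t1 - t0)))
    return rates


for start, end, rate in burn_rates(checkpoints):
    print(f"{start}-{end} min: ~{rate:,.0f} tokens/min")
```

Once the slow start was over, the rate sat at roughly 1,000 to 1,400 tokens per minute for the rest of the first hour.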

At about 1 hour 10 minutes, I stopped the flow briefly to add a couple of instructions and one additional mockup.

By this point, it had clearly understood the repo structure. It had started with gap analysis. It was creating task packs. It was documenting its progress reasonably well. And interestingly, even after around 1.5 hours, it had not used half the max context window.

That part was impressive. It felt like it had a lot of room to keep working.

The first disappointment

After a couple of hours, the task finished. But I did not have a good feeling. I had been watching the UI as it progressed, and it did not look demoable.

Unfortunately, I was right. None of the 10 tabs in the app were showing anything I would confidently demo. So much for the hype.

It was already late, and I did not have the patience to sit through another careful planning loop. So I gave it a very open-ended instruction:

“Make all 10 tabs demoable. Do whatever it takes. Document blockers if any.”

Then I went to sleep.

The surprise

It ran for around three more hours. And this time, it did a genuinely impressive job.

Was it perfect? No. Was it where I hoped the UI would be? Also no. But did it move the project forward meaningfully? Yes. Absolutely.

It took the app from “this is not demoable” to “there is something real here.” It found some nasty bugs. It improved parts of the implementation. It made the tabs more coherent. It helped expose missing pieces in the spec. And it gave me enough working surface area to pivot, add new detail, and tighten the next set of tasks.

That last part matters.

Sometimes the value of these models is not that they finish the work perfectly. It is that they drag the project forward far enough that the next layer of truth becomes visible.

You see what is missing. You see where the spec was weak. You see where the architecture was hand-wavy. You see where your own assumptions were doing unpaid labour.

That is useful.

My takeaways

First, Fable 5 can definitely handle long-running complex tasks better than previous models I have used. It does not feel like it gets lost as quickly. It can hold the shape of a larger project for longer. It can move through analysis, planning, implementation, and documentation without immediately falling apart.

That is a big deal.

Second, it is slow. But it is slow in a way that feels deliberate. Not always exciting to watch. Sometimes painfully boring. But mostly useful.

Third, it burns tokens, but not in the ridiculous way I feared. I have burned tokens faster in verbose chats with smaller models where half the conversation was just me correcting misunderstandings. This felt more controlled.

Fourth, guardrails help. But too many guardrails can hurt. This was interesting. The more tightly I constrained it, the more careful it became, but also the less bold it became. The more open-ended prompt I gave at the end actually produced the most visible progress.

That does not mean “just ask it to do everything.” That is still a good way to manufacture chaos. But it does mean there is a balance. Good specs. Clear task packs. Strong evidence capture. Then give it enough room to solve the problem.

The cost

I burned through my 5-hour quota. I used around 20% of my weekly quota. I also used some on-demand tokens.

But honestly, the damage was worth the output. It was not as bad as I had imagined. And the project moved forward enough that I would happily run this kind of workflow again.

Not casually. Not for every small task. But for serious implementation passes where the repo has enough structure and the work is worth the cost.

The verdict

If you think you can one-shot a serious engineering project and get the moon from one prompt, you are probably in for a rude shock.

Fable 5 will not magically turn vague intent into production-ready software. It will not replace engineering judgment. It will not save you from weak specs. It will not make bad architecture good just because the model is expensive.

But it also will not just give you a pizza when you asked for the moon.

It might actually help build the rocket.

You still need to point it in the right direction. You still need to inspect the work. You still need to know what good looks like. But once you do, it can move a serious amount of work.

I will definitely use it more over the coming days.