Gemini 3.5 Flash Reads a Paper, Codes the Demo, Ships the Site

Jeff Dean posted a video this month showing Google's Gemini 3.5 Flash ingesting a dense academic paper and producing a working interactive website that explains it — autonomously, end-to-end, in one prompt. The implication for consulting work isn't that the AI is impressive. It's that the moat for the rest of us just moved.

Jeff Dean is Google's chief scientist. He has been at Google since 1999, was one of the lead architects of MapReduce and Bigtable, and has arguably been involved in training more frontier models than any other public AI researcher. When he posts a short video demonstrating something new from Gemini, the AI community pays close attention.

In mid-May, he posted a short video on X. The setup: Gemini 3.5 Flash, the smaller and faster of Google's two current frontier models, was given a single PDF of an academic paper. The prompt was something like "read this paper and produce an interactive website that explains it."

The output, about ninety seconds later: a working website with embedded explainer animations, equations that resolved into interactive sliders, a clean three-section layout, and a one-paragraph summary on the front page that captured the paper's actual contribution. The site loaded in a browser. It worked. By our estimate, the same build would have taken a graduate student about three days.

The video's most-quoted reply, from researcher Meghdad Farahmand, captured a common reaction:

"This is amazing. Paper → interactive website demo is exactly what I would have loved to have for some of the more complex long NLP papers."

That reaction ("I want this") is the thing every business owner, consultant, and content producer should sit with for a moment. Customers are about to start having the same reaction to many kinds of work, including the work they currently pay you for.

This post isn't about Gemini 3.5 Flash. The model is impressive. There will be a more impressive one next quarter. The post is about what changes for consulting work — the kind of work Blue Octopus does, the kind your competitors do, the kind you might do if you're an engineer-with-a-business-card — when paper-to-demo becomes a single prompt instead of a three-day engagement.

[Illustration: Octo at a desk, reading a paper PDF while a tablet beside it renders a finished interactive website with charts, sliders, and a "live demo" button.]

The moat that just moved

For most of the last decade, the moat for engineering-flavored consulting work was: we can build it. Customers had ideas, ambitions, mockups, sketches. The consultant's value was in turning those into actual working software. The price was set by how hard the building was and how few people could do it.

That moat has been eroding since GitHub Copilot launched in 2021. It eroded further with Claude Code and Codex in 2025. The Gemini 3.5 Flash demonstration is the moment it became obvious that the moat was never the building. The moat was knowing what to build, and knowing it was worth building.

The customer's actual problem was never "I can't write the code." Their actual problem was: I have a paper / a workflow / a process / a problem I don't fully understand, and I don't know what the right deliverable is, and I don't know how to specify it, and I don't have time to figure out which AI tool I'd use.

The Gemini demo skipped all of that. The paper itself was the specification. The tool selection was a single product (Gemini 3.5 Flash). The customer's intent was implied by the document. The implementation was free.

The consulting work that survives this moment is not the work of implementing. That tier is now a commodity, and the price will keep falling. The consulting work that survives is the work of intentionality — figuring out what to build, whether to build it, whether to use AI at all for this customer's actual problem, and which AI if so.

We wrote about this exact pattern in the context of AI memory a few months ago, before this demonstration made it obvious. The pattern: when the building gets cheap, the architecting becomes the scarce resource. When the architecting gets cheap, the deciding becomes the scarce resource. The work upstream of the cheap thing is what stays valuable.

What changes for the working consultant

If you're an engineer-consultant in 2026, three concrete things change because of this demonstration.

Discovery sessions need to deliver value on day one. The "tell us about your business so we can write a proposal" conversation is now a luxury you can't sell. Customers will reasonably expect that an hour-long conversation produces some deliverable — a rough mock, a workflow diagram, a list of concrete next steps. Gemini-flavored tools can produce all three during the call. If your discovery process isn't keeping up, the customer is going to do their own discovery on a Saturday with Gemini and call you with a fully-spec'd project the following Tuesday.

The "we built it" deliverable needs to come with the "we figured out it was worth building" justification. A customer who watched Gemini build a site in 90 seconds will ask, reasonably, why your engagement quote is what it is. The honest answer isn't "we wrote the code that Gemini could have written." The honest answer is "we figured out which of seven possible directions you should actually be pursuing, and we built the one we thought was right, and here's why the other six were wrong." That justification has to be visible in the deliverable, or the customer correctly perceives they overpaid.

Domain expertise is now the priced layer. The AI can build a generic site. It cannot build the site that's right for the specific weirdness of your customer's industry. The HVAC business owner who is your customer doesn't need a generic site; they need a site that handles the seasonal pricing logic, the dispatch-fee passthrough, the warranty registration flow, the manufacturer rebate documents that have to attach to invoices. The customer's industry is the place where the AI's general knowledge tapers off and your specific knowledge becomes valuable. If you don't have a specific industry, you don't have a defensible position.

This shift is what we wrote about in our piece on Forward Deployed Engineering and at greater length in the post on what AI actually takes from a business and what it can't. The TL;DR: AI takes execution and leaves intentionality. The consulting work that survives is the intentionality work.

[Illustration: Octo at a desk with three monitors. The left shows a generic AI-generated website mockup, the right shows the same site with industry-specific details (seasonal logic, dispatch-fee passthrough, real customer fields), and the center shows Octo's notes on the differences.]

How we're using Gemini 3.5 Flash in practice

A working consultant should be using Gemini, Claude, and (where appropriate) one of the open-weight Llama or Qwen variants — not "the AI tool," but the right AI tool for the job. Here's where we use Gemini 3.5 Flash specifically.

Discovery summaries. After a discovery call, we feed Gemini the transcript and ask for a one-paragraph summary, three numbered concerns, and a list of unanswered questions. The output is in the proposal in under an hour. This is the kind of job Gemini handles well: ingest a long document, produce a useful structured summary. The model is fast at it, taking three to ten seconds for a 5,000-word transcript.
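A minimal sketch of the discovery-summary step, in Python. The prompt wording, the JSON keys, and the `call_model` parameter are assumptions made for illustration, not our production prompt or any vendor's API: `call_model` stands in for whatever client function sends a prompt to the model and returns its text. The idea it illustrates is that asking for a fixed structure lets you check the reply before it goes anywhere near a proposal.

```python
import json
from typing import Callable

# Hypothetical prompt wording; adjust to taste.
PROMPT_TEMPLATE = """You are summarizing a client discovery call.
Reply with a JSON object that has these keys:
  summary: one paragraph, as a string
  concerns: a list of exactly three strings
  open_questions: a list of strings

Transcript:
{transcript}
"""


def build_prompt(transcript: str) -> str:
    """Insert the call transcript into the fixed prompt template."""
    return PROMPT_TEMPLATE.format(transcript=transcript.strip())


def parse_summary(raw: str) -> dict:
    """Parse the model's reply and reject anything off-structure."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string")
    concerns = data.get("concerns")
    if not isinstance(concerns, list) or len(concerns) != 3:
        raise ValueError("expected exactly three concerns")
    questions = data.get("open_questions")
    if not isinstance(questions, list):
        raise ValueError("open_questions must be a list")
    return {
        "summary": summary.strip(),
        "concerns": concerns,
        "open_questions": questions,
    }


def summarize_call(transcript: str, call_model: Callable[[str], str]) -> dict:
    """call_model is a placeholder: any function mapping prompt text to reply text."""
    return parse_summary(call_model(build_prompt(transcript)))
```

Passing the model in as a plain function keeps the sketch independent of any particular SDK and lets the validation be exercised with a stub.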

Document-to-mock conversion. When a customer shares a PDF — a brochure, a spec, a contract, a research paper, anything dense — Gemini 3.5 Flash will produce a workable visualization in a single prompt. We use this as a starting sketch for our follow-up conversation, not as a final deliverable. The customer sees what their thing could look like, and the conversation moves from abstract to concrete.

Workflow mapping. Given a transcript of "tell me how your business runs," Gemini will produce a flowchart of the workflow in Mermaid diagram syntax. We feed it the conversation; out comes a diagram. We refine the diagram with the customer in the next session.
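For readers who have not seen Mermaid, here is a small sketch of the target format. It renders a hand-written list of steps into flowchart text. The step names are a made-up HVAC example and `to_mermaid` is a hypothetical helper, not something Gemini or Mermaid ships; in the workflow described above, the model writes the diagram text directly from the transcript.

```python
def to_mermaid(steps: dict[str, str], edges: list[tuple[str, str, str]]) -> str:
    """Render steps (node id -> label) and edges (src, dst, edge label)
    as Mermaid flowchart text, top to bottom."""
    lines = ["flowchart TD"]
    for node_id, label in steps.items():
        # Double quotes would end the quoted label early, so swap them out.
        safe_label = label.replace('"', "'")
        lines.append(f'    {node_id}["{safe_label}"]')
    for src, dst, edge_label in edges:
        if src not in steps or dst not in steps:
            raise ValueError(f"edge {src} -> {dst} references an undefined step")
        arrow = f"-->|{edge_label}|" if edge_label else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


# Made-up example data for illustration only.
EXAMPLE_STEPS = {
    "intake": "Customer calls",
    "dispatch": "Dispatch tech",
    "invoice": "Send invoice",
}
EXAMPLE_EDGES = [
    ("intake", "dispatch", ""),
    ("dispatch", "invoice", "job done"),
]

# print(to_mermaid(EXAMPLE_STEPS, EXAMPLE_EDGES)) outputs:
# flowchart TD
#     intake["Customer calls"]
#     dispatch["Dispatch tech"]
#     invoice["Send invoice"]
#     intake --> dispatch
#     dispatch -->|job done| invoice
```

Text in this shape can be pasted into any Mermaid renderer, which is what makes it a convenient thing to refine with the customer in the next session.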

Quick-and-dirty internal tools. When we need a one-off tool (a script to clean a customer's data file, a one-page form, a simple internal calculator), Gemini 3.5 Flash is faster than writing it ourselves. The cost under Google's tiered API pricing is on the order of pennies per task. We treat it like a very fast intern who never gets tired.
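To make "pennies per task" concrete, here is a back-of-envelope estimate. The per-million-token prices below are placeholders chosen for illustration, not Google's published rates, and the 1.33 tokens-per-word ratio is only a rough rule of thumb for English text. Substitute current numbers from the pricing page before relying on the result.

```python
def estimate_cost_usd(
    input_words: int,
    output_words: int,
    usd_per_million_input_tokens: float,
    usd_per_million_output_tokens: float,
    tokens_per_word: float = 1.33,  # rough rule of thumb, an assumption
) -> float:
    """Estimate the API cost of one task from word counts and token prices."""
    input_tokens = input_words * tokens_per_word
    output_tokens = output_words * tokens_per_word
    total = (
        input_tokens * usd_per_million_input_tokens
        + output_tokens * usd_per_million_output_tokens
    )
    return total / 1_000_000


# Placeholder prices, NOT real rates: $0.50 per million input tokens,
# $3.00 per million output tokens.
example_cost = estimate_cost_usd(
    input_words=5_000,   # a discovery-call transcript
    output_words=400,    # a summary with concerns and open questions
    usd_per_million_input_tokens=0.50,
    usd_per_million_output_tokens=3.00,
)
```

With these placeholder numbers, a 5,000-word transcript and a 400-word reply come to roughly half a cent. Even if real prices are several times higher, a single task stays in the range of pennies.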

For comparison: we use Claude Sonnet or Opus for engineering work (in our experience it is better at long-context reasoning, better at tool use, and less likely to fabricate dependencies). We use Gemini 3.5 Flash for throughput: ingesting documents, producing summaries, and generating drafts that we then refine. Different jobs, different tools.

We unpack the broader comparison in a separate post on which AI for which job.

The honest part

A few caveats worth naming.

The 90-second demo was the best take. Anyone who's run AI demos knows the published video is the one that worked. The first five tries probably didn't. Most of the AI demonstrations in 2026 are real, but rehearsed. The capability is there; the reliability isn't always.

The output is a starting point, not a finished product. The interactive paper-explainer site Gemini produced in 90 seconds is good. It is not, however, the site a researcher would publish alongside a peer-reviewed conference paper. It needs polish, fact-checking, copy editing, and design refinement. The "free" part of the workflow stops about 70% of the way to a deliverable.

The capability does not generalize as well as the demo suggests. Paper-to-interactive-site works well because the input is structured and the output is bounded. Conversation-to-business-workflow works less well because the input is unstructured (a conversation with a business owner) and the output is unbounded (a working production system). The capability gradient is real, and the easy demos are easier than the hard real-world tasks.

None of this changes the underlying point. The moat moved. The work upstream of the implementation is the work that gets paid for now. If you're an engineer-consultant whose pitch is "we can build it," the next 18 months are going to be uncomfortable. If your pitch is "we figure out what to build, and we build it," the next 18 months are going to be the best stretch the practice has had.

What to do this week if you're a consultant

Three concrete moves.

Set up a paid Gemini account and a paid Claude account. Use both. Learn which jobs each one is better at. If you're consulting on AI and you've only used one of them, you don't have the right comparative basis to advise customers.

Build a discovery-call workflow that produces a deliverable by the end of the call. A summary, a workflow diagram, a rough mock, a list of next steps: something the customer takes away. The AI tools of 2026 make this realistic. The customer's expectation has shifted; meet it.

Stop describing your work as "we build software." The framing that survives the next year is "we figure out what software to build, and then we build the right thing." The "what to build" part is now the priced layer. Lead with it.

If you're an operator who wants to talk through this (what AI is taking, what it isn't, and where your business sits on that gradient), get in touch. We're a small consultancy with a forward-deployment model. We are not short of work; we are deliberate about which engagements we take. The right early conversations are the ones where the customer already understands that the AI is the cheap part.

