Everyone doing SEO knows the recipe. It hasn't changed in twenty years.
In 2005, competitive analysis meant keyword density. Pull up the top 10 Google results. Run them through a tool like GRKda. See that the top pages used your keyword at 2.3% density. Optimize your page to match. Watch it climb.
By 2015, the variables had multiplied. Backlink profiles, domain authority, content length, heading structure. By 2026, a single content brief analyzes 50+ variables from the top-ranking competitors: semantic keywords, word count ranges, heading hierarchies, "People Also Ask" data, content gaps, search intent classification, topic saturation scores.
The tools got more sophisticated. But the recipe never changed. Look at what's winning. Figure out what those pages have in common. Build something that matches or exceeds those patterns.
The channel was Google search then. The channel is AI systems now. Same recipe.
Why does this matter now?
Google's AI Overviews now appear on 25-50% of searches, depending on the query type and who's measuring. Google's AI Mode passed 1 billion monthly users within a year of launch. ChatGPT has 900 million weekly active users. Perplexity has 45 million monthly active users and growing. Claude and Gemini are becoming default research tools for millions more.
These systems don't show ten blue links. They synthesize an answer from multiple sources and cite where they got it. The question isn't "Do I rank?" anymore. It's "Am I cited?"
The question isn't "Do I rank?" anymore. It's "Am I cited?"
This practice is called Generative Engine Optimization, or GEO. The competitive analysis methodology is familiar: analyze the sources getting cited, reverse-engineer what they have in common, replicate the pattern.
Most of what makes content rank well also makes it citable. But understanding how AI systems find and select sources changes which optimizations matter most.
How do AI systems actually find what to cite?
Most GEO articles skip this question entirely. They describe what to optimize without explaining the retrieval mechanism that determines what gets cited and what gets ignored. The mechanism has a name: Query Fan Out.
AI systems are not search engines. They don't maintain their own web indexes or run their own crawlers. When ChatGPT, Claude, Perplexity, or Google AI Mode needs to answer a question, the system decomposes the prompt into multiple focused sub-queries. For real-time or research queries, those sub-queries run against external search engines (Bing in ChatGPT's case, Google for AI Mode) and the retrieved results are synthesized into the response. Even systems with large training corpora reach for live search when recency or specificity matters.
What does Query Fan Out look like in practice?
Google's own AI Mode documentation describes the process directly: the system "divides your question into subtopics and searches for each one simultaneously across multiple data sources." Google patents detail the mechanism further: generating "synthetic search queries" executed "in parallel or in series" as part of a generative search workflow.
Consider what happens when someone asks an AI system "best marketing platform for small teams." It doesn't search that phrase and summarize the results. It generates sub-queries:
- "best marketing platforms small business 2026"
- "marketing tools for small teams comparison"
- "all-in-one marketing platform reviews"
- "marketing platform pricing comparison"
- "alternatives to HubSpot for small teams"
Each sub-query hits Google or Bing independently. The AI retrieves results from all of them, then synthesizes the response.
Why does this change everything?
Three data points reshape GEO strategy entirely.
First: 95% of ChatGPT fan-out sub-queries have zero traditional search volume. No human types them. They're machine-generated decompositions that don't appear in any keyword research tool.
Second: 32.9% of pages cited by ChatGPT appeared only in fan-out sub-query results, not in the original prompt's search results. Nearly a third of all citations come from content that wouldn't be found by optimizing for the user's prompt alone.
Third: 87% of SearchGPT citations match Bing's top organic results, while only 56% match Google's. For ChatGPT specifically, Bing rankings matter more than Google rankings. Most GEO strategies are optimizing for the wrong search engine.
Query Fan Out is the mechanism behind everything that follows.
They don't search the way a person does. They split one question into many, and cite whatever wins each one.
What is the CITED Framework?
The CITED Framework is a structured approach to Generative Engine Optimization, organized around the Query Fan Out retrieval mechanism. It covers five pillars:
|
Pillar |
Question it answers |
How it connects to QFO |
|---|---|---|
|
Crawl |
Can AI systems find and retrieve your content? |
Pages must be indexable by Google and Bing, where sub-queries run |
|
Inform |
Is your content structured for easy extraction? |
Structure determines what survives the synthesis step after retrieval |
|
Trust |
Do AI systems consider you authoritative enough to cite? |
Authority determines ranking position in sub-query results |
|
Evaluate |
Are you tracking your AI citations? |
Monitoring should cover sub-query visibility, not just head terms |
|
Distribute |
Are you present where AI systems actually look? |
Multiple surfaces capture different sub-queries from the same fan-out |
Most GEO articles stop at the first three pillars (Crawl, Inform, Trust). They're necessary. They're not sufficient. The first three pillars determine whether your content deserves to be cited once found. Evaluate tells you whether it's working. Distribute determines whether AI systems encounter you across enough sub-queries in the first place.
That last pillar is the one most articles on the topic skip entirely. And the data says it's the most important one.
Most frameworks stop at the first three pillars. The data says the one they skip matters most.
How do GEO and SEO overlap?
Compare the factors that help content rank on Google with the factors that help content get cited by AI, and the overlap is significant. The core fundamentals carry over. But each discipline has its own priorities that the other doesn't share.
Shared foundations (these matter equally for both)
|
Factor |
SEO |
GEO |
|---|---|---|
|
Title optimization |
High |
High |
|
Content depth |
High |
High |
|
Heading structure (H2/H3) |
High |
High |
|
Internal linking |
High |
High |
|
Domain authority |
Critical |
High |
|
Canonical tags |
Baseline |
Baseline |
SEO-specific (these don't affect AI citation)
|
Factor |
Importance |
|---|---|
|
Mobile optimization |
Critical |
|
Backlink acquisition |
Critical |
|
URL structure |
High |
|
Meta descriptions (CTR) |
High |
|
Image alt text |
High |
|
Open Graph / Twitter Cards |
High |
GEO-specific (these don't affect Google ranking)
|
Factor |
Importance |
|---|---|
|
Third-party distribution |
Critical |
|
AI crawler access (robots.txt) |
Critical |
|
Schema.org structured data |
Critical |
|
Server-side rendering |
Critical |
|
Answer-first paragraphs |
Critical |
|
Section length (75-150 words) |
Critical |
|
Named entity density |
High |
|
Question-format headings |
High |
|
FAQ schema |
High |
|
Comparison tables |
High |
The shared foundations are real. But SEO has its own priorities (mobile, backlinks, click-through optimization) and GEO has its own (structure, entities, distribution). Neither is a subset of the other.
If your SEO is solid, you're not starting over. You're adding a second checklist.
GEO isn't a separate discipline. It's the same fundamentals, with a new set of priorities layered on top.
What signals separate ranking from citation?
Five signals consistently appear in cited content that don't show up in traditional SEO analysis. These aren't arbitrary preferences. They determine what survives the synthesis step after QFO retrieval has already found the content. Crawl and Distribute get content into the retrieval pool. These signals determine whether it's extractable once there.
1. Section word count
AI retrieval systems chunk content at heading boundaries, and research shows sections in the 75-150 word range (100-200 tokens) are optimal for extraction as citations. Too long and the system can't isolate the relevant passage. Too short and there isn't enough context for a standalone citation. SEO doesn't care about section length. AI citation does.
2. Question-format headings
Cited pages use H2 headings phrased as questions more often than top-ranking pages do. "What is keyword research?" instead of "Keyword Research." This maps directly to how QFO generates sub-queries: when an AI system decomposes a prompt, those sub-queries often take question form. Headings that match the sub-query get retrieved.
3. FAQ schema
Pages with FAQPage structured data get cited at higher rates than equivalent content without it. The FAQ format gives AI systems pre-structured question-answer pairs. Each FAQ entry functions as an independently retrievable unit that can match against a specific sub-query.
4. Comparison tables
Pages with HTML tables comparing options, features, or approaches are preferred by AI systems that need to synthesize structured comparisons. Tables are extractable in ways that prose paragraphs are not. When a sub-query asks for a comparison, tabular content wins.
5. Named entity density
Cited pages contain more specific, named references: statistics with sources, brand names, named frameworks, concrete claims. Content with 15 or more connected named entities shows 4.8x higher selection probability by AI engines. Entities help the system match content to specific sub-queries. Vague claims don't get retrieved. Specific ones do.
Why do comprehensive pages outperform narrow ones?
Query Fan Out explains something that otherwise seems counterintuitive. A single comprehensive page covering multiple related intents consistently outperforms multiple narrow pages targeting one intent each. The reason: a comprehensive page gets retrieved for more sub-queries from the same fan-out. Each well-structured section acts as an independently citable unit matched against a different sub-query. One page, multiple retrieval paths.
Semrush's QFO experiment demonstrated this directly: optimizing content to address fan-out sub-queries produced a 150% increase in AI citations across test articles.
These signals aren't replacing the SEO checklist. They're additions. New variables in the same competitive analysis that's been running for years.
Each well-structured section is an independently citable unit. One page, many retrieval paths.
Why is distribution the highest-impact pillar?
82% of AI citations come from earned media, not the brand's own website. That number, from Muck Rack's analysis of over one million AI citations, changes everything about how to think about GEO.
You can optimize your own site perfectly. Clean structure, answer-first paragraphs, every Schema.org tag in place. And still barely get cited, because AI systems preferentially cite sources that aren't you.
Query Fan Out explains why distribution works. Each fan-out sub-query is an independent search. Your canonical website might rank for two of those sub-queries. A G2 review page might rank for another. A Reddit thread mentioning the product might rank for a fourth. A guest post on Search Engine Land might rank for a fifth. More surfaces mean more sub-queries covered. Distribution isn't just "be in more places." It's "appear in more sub-query results."
What does the research show?
A Stacker study of 87 stories across 30 clients, analyzing 2,600+ prompts across 8 AI platforms, found that earned media distribution produces a median 239% lift in AI citations compared to brand-owned content alone, with 64% of those citations coming from third-party publisher sources. According to AirOps' 2026 State of AI Search report, approximately 85% of brand mentions in AI-generated answers come from external, third-party domains rather than brand-owned pages.
Where does each AI platform look?
Each AI platform cites radically different sources. According to Profound, only 11% of domains get cited by both ChatGPT and Perplexity. A single-platform strategy misses most of the opportunity.
|
Platform |
Top Sources |
What This Means |
|---|---|---|
|
ChatGPT |
Wikipedia (7.8%), Reddit (1.8%), Forbes (1.1%), G2 (1.1%) |
Get listed on G2 and industry directories |
|
Perplexity |
Reddit (6.6%), YouTube (2%), Gartner (1%), Yelp (0.8%) |
Engage authentically on Reddit, create YouTube content |
|
Google AI Mode |
Reddit (2.2%), YouTube (1.9%), Quora (1.5%), LinkedIn (1.3%) |
Publish LinkedIn articles, create video with transcripts |
What does the distribution playbook look like?
The playbook isn't complicated. It's just different from SEO link-building:
- Get listed on directories. G2, Capterra, Product Hunt. Each listing is an additional surface for appearing in fan-out sub-queries.
- Be present where AI looks. Reddit, YouTube, LinkedIn articles. Not as marketing. As genuine expert participation.
- Earn third-party mentions. Guest posts, expert commentary, industry roundups. These carry more citation weight than anything on your own domain.
- Stay consistent. AI systems scan for agreement across independent sources. If your positioning on G2 contradicts your website, AI systems flag the inconsistency and may exclude you entirely.
SEO had link-building. GEO has earned distribution. The mechanism is different, but the principle is the same: external validation matters more than self-promotion.
What about infrastructure-level distribution?
Beyond earned media, there's a smaller class of distribution tactic that works at the infrastructure level rather than the content level: depositing content on scholarly repositories like Zenodo, attaching DOIs and BibTeX citation files, and using ScholarlyArticle schema. These attach academic-publishing infrastructure to commercial content. That puts it inside retrieval pipelines (DataCite, OpenAIRE, Google Scholar) that AI training corpora and live retrievers already query.
It's a smaller-volume play than the earned-media playbook above, and the evidence base is still observational. We unify these tactics into a formal framework called Academic Citation Infrastructure (ACI) in a companion methods paper that documents the pattern, analyzes the retrieval mechanism, and proposes a controlled experiment for validation.
The highest-impact lever in GEO isn't on your website at all.
How far behind is AI citation measurement?
Crawl, Inform, Trust, and Distribute all map cleanly to traditional SEO concepts. Evaluate is the pillar where most businesses are furthest behind.
What tools exist today?
SEO has had Google Search Console for nearly two decades. For AI citations, the tooling exists but adoption is still in its infancy. Most businesses don't know if ChatGPT recommends their product, if Perplexity cites their content, or if Google's AI Overview is hallucinating outdated information about their brand.
Google Search Console is starting to show AI Overview impressions. Platforms like Profound, Otterly, and ILLIXIS track AI citations across multiple platforms. But the ecosystem is still where SEO analytics were in the mid-2000s: fragmented and unfamiliar to most teams. Even manual spot-checks (asking ChatGPT and Perplexity about your brand, recording what they say) puts you ahead of most competitors.
How volatile are AI citations?
Research analyzing roughly 80,000 prompts per AI platform found that 40-60% of cited domains change within a single month for identical queries. Losing a citation doesn't mean permanent failure. It means the system is rotating sources, and staying in the rotation requires fresh content and consistent presence.
How much does freshness matter?
AI systems show strong preference for recent content, citing URLs that are roughly 400 days newer than organic Google results. Content updated every 90-120 days maintains significantly higher visibility in AI-powered search compared to static content. The Evaluate pillar isn't a one-time check. It's an ongoing practice of tracking, spotting gaps, and iterating.
AI citation measurement is where SEO analytics were in the mid-2000s. Starting now compounds into a lead.
How do you get started?
If you're already doing SEO, here's how to layer GEO on top.
Week 1: Secure the foundation (things you probably already have)
- Verify canonical tags, title optimization, content depth, internal linking. If your SEO audit is clean, skip this.
- Check your
robots.txtfor AI crawlers. Unblock GPTBot, ClaudeBot, PerplexityBot if blocked. - Submit your sitemap to Bing Webmaster Tools. ChatGPT's browsing uses Bing's index, and 87% of SearchGPT citations match Bing's top results. Bing optimization is now a GEO priority, not an afterthought.
Week 2-3: Add the five new signals
- Add answer-first paragraphs to your top 5 pages. Put the conclusion under the heading, then explain.
- Restructure sections to 75-150 words per heading. Add comparison tables where relevant.
- Add Schema.org markup (Organization, Product/Service, FAQ) if missing. Add FAQ sections with question-format H2s.
- Audit named entity density. Add specific statistics, named frameworks, and concrete data points.
Week 3-4: Distribute where AI looks
- Create a G2 profile with a rich description. This is ChatGPT's top product source.
- Publish a LinkedIn article linking back to your canonical content.
- Start engaging on Reddit in your industry's subreddits. Expert answers, not marketing.
- Audit your brand messaging across all platforms for consistency.
Week 4+: Start evaluating
- Check what ChatGPT and Perplexity say about your brand. Record it.
- Identify 3-5 queries where competitors get cited and you don't.
- Create or retrofit content targeting those gaps.
- Set a 90-day reminder to refresh all GEO-targeted content.
Same game. New surface.
SEO competitive analysis has been the same discipline for twenty years. Analyze what's winning. Reverse-engineer the pattern. Build something that matches it.
GEO adds a retrieval mechanism most people haven't studied. When someone asks an AI system a question, it doesn't search the web the way a human would. It decomposes the prompt into sub-queries and searches for each one independently. Understanding that mechanism is the difference between optimizing blindly and optimizing for the system that actually selects what gets cited.
Five new content signals separate citation from ranking. Distribution turns out to be the highest-impact lever, because more surfaces mean more sub-queries covered. And the measurement approach is still in its infancy, which means the people who start now have a compounding advantage.
The CITED Framework is the playbook for this new surface. Not because AI citation is some mysterious new discipline. Because it's the same competitive analysis game it's always been, with a retrieval mechanism most people haven't studied, new variables most teams haven't measured, and a distribution layer most businesses haven't built.
Check your robots.txt. Submit the sitemap to Bing. Write an answer-first paragraph. See if ChatGPT knows you exist.
The recipe hasn't changed. The surface has. The teams who figure that out first will own the citations while everyone else is still optimizing for the wrong channel.
Ready to operationalize? The CITED Framework reference defines all five pillars operationally, with the concrete signals and actions for each. From there, the ACI Playbook is the step-by-step template for the highest-leverage slice of CITED: the infrastructure-level distribution plays (methods-lite papers, ScholarlyArticle schema, Zenodo DOIs, citation files).
Written by Nuno Andrade, founder of ILLIXIS.


