DIRECT ANSWER
Q. How do you get cited by ChatGPT?
A. Write a 60 to 120 word direct answer at the top of each page using the literal question as an H2, ship with named authorship and inline source citations, and make sure GPTBot can crawl you. The rest is patience and measurement.
EVIDENCE Across 5 client sites where I retrofitted this pattern in 2025 to 2026, ChatGPT citation rates on tracked buying-intent queries went from 0% to between 9% and 31% within 12 weeks.

About 18% of ChatGPT conversations trigger a web search, and only ~15% of the pages it pulls in get cited.

That is the surface. Per Profound’s tracking across late 2025 and early 2026, roughly 1 in 5 conversations sends ChatGPT to the live web; per AirOps’s March 2026 analysis of 548,534 pages across 15,000 prompts, ChatGPT cites only ~15% of what it retrieves. When it does cite, it typically names 4 to 6 sources per cited turn. If you are one of those domains, you are at the top of the AI funnel for that query — a citation that travels through trust transfer, not click-through.

This is the playbook I run for boutique B2B clients to get into that citation list. It is structural, mostly cheap, and the results show up in 8 to 12 weeks.

What ChatGPT actually reads

ChatGPT does not have one corpus. It pulls from two:

  1. Training data. Web pages crawled before the model’s cut-off. Older content has had more time to land here. New content needs other paths in.
  2. Live web retrieval. ChatGPT Search and the embedded browsing tool fetch pages in real time. The live layer runs heavily on Bing’s index, with some direct fetches via OAI-SearchBot.

Both matter for citation. Training-data citations dominate evergreen queries (“what is X”). Live-retrieval citations dominate news, prices, comparisons, and anything with a date in it.

If you are not in Bing’s index you miss the live layer. If your content is younger than the training cut-off you miss the training layer. The strongest sites are in both.

The three things that lift citation rate

After running this retrofit on 5 client sites, the same three patterns moved the needle every time.

1. The answer-first paragraph

Every page that targets a citable query needs a 60 to 120 word direct answer in the first paragraph after the H1. Not a hook. Not context. The answer.

Use the literal question as an H2 immediately under the H1. The pattern an LLM extractor recognizes is: H2 question → 80-word paragraph that answers it → optional follow-up nuance.

Bad opening:

“In today’s fast-moving digital landscape, the question of X has become increasingly important. In this post, we’ll cover the key concepts and explore why they matter.”

Good opening:

“X is [definition in one sentence]. The two reasons it matters in 2026 are [reason 1] and [reason 2]. The standard implementation looks like [3-bullet sketch].”

The second one extracts cleanly. The first one extracts into nothing.
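
The pattern above can be checked mechanically. What follows is a minimal sketch, not a tool the playbook depends on: it assumes the page has already been reduced to an ordered list of `(tag, text)` blocks, and the function name `check_answer_first` is hypothetical. The 60 to 120 word bounds come from the rule above.

```python
from typing import List, Tuple

# Assumed input shape: ordered (tag, text) pairs, e.g. ("h2", "What is X?").
Block = Tuple[str, str]


def check_answer_first(blocks: List[Block], min_words: int = 60,
                       max_words: int = 120) -> List[str]:
    """Return a list of problems; an empty list means the pattern holds."""
    tags = [tag for tag, _ in blocks]
    if "h1" not in tags:
        return ["no H1 found"]
    i = tags.index("h1")

    problems = []
    # The block right after the H1 should be an H2 phrased as a question.
    if i + 1 >= len(blocks) or blocks[i + 1][0] != "h2":
        return ["no H2 immediately under the H1"]
    if not blocks[i + 1][1].strip().endswith("?"):
        problems.append("H2 is not phrased as a question")

    # The block after that H2 should be the direct-answer paragraph.
    if i + 2 >= len(blocks) or blocks[i + 2][0] != "p":
        problems.append("no paragraph directly under the question H2")
        return problems

    n = len(blocks[i + 2][1].split())
    if not (min_words <= n <= max_words):
        problems.append(
            f"answer paragraph has {n} words, expected {min_words} to {max_words}"
        )
    return problems
```

Run against each target page before shipping a rewrite, this catches the common failure of a hook paragraph sitting between the H2 and the answer. It only counts words; whether the paragraph actually answers the question is still an editorial call.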

2. Named authorship and a real bio

LLMs disproportionately cite content with a clear author. “by Niko Alho” beats “Posted by admin” by a wide margin. The author needs:

  • A name in the byline and a <link rel="author"> or Person schema
  • A bio block with credentials, named clients or employers, and a portrait photo
  • A consistent author URL (/about or /team/[slug]) linked from the article

The signal you are sending is: there is a human behind this claim. LLMs cite humans more confidently than they cite domains. See E-E-A-T guidance for the broader story.

3. Inline citations inside the answer paragraph

A paragraph that opens with “According to [study from publisher 2025], X is true” is far more citable than the same paragraph with no source. The LLM is doing exactly what you do, looking for evidence under the claim, and your inline citation makes its job easy.

Three rules:

  • Cite real, reachable URLs. Broken links hurt.
  • Cite primary sources, not “as reported by another blog.”
  • Cite at the sentence level, not in a bibliography at the bottom. Citations adjacent to the claim get pulled.

The crawl layer

Before any of this work matters, GPTBot and OAI-SearchBot have to be able to fetch your pages. Check three things.

robots.txt. Allow GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, ClaudeBot, and Google-Extended. Blocking these in 2026 is opting out of AI citation.
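
A quick way to verify the robots.txt rule is to parse the file and ask, bot by bot, whether the homepage is fetchable. This sketch uses Python's standard `urllib.robotparser`; the `blocked_bots` helper, the sample file, and the `example.com` URL are illustrative assumptions. The bot list is the one named above.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the list in this article.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
           "PerplexityBot", "ClaudeBot", "Google-Extended"]


def blocked_bots(robots_txt: str, url: str = "https://example.com/") -> list:
    """Return the AI bots that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt that blocks one bot and allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(SAMPLE))  # ['GPTBot']
```

The standard-library parser follows the basic robots.txt rules, so treat the result as a first pass; a CDN or firewall can still block a bot that robots.txt allows.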

Bing Webmaster Tools. Submit your sitemap. Run an indexing audit. Resolve any “discovered but not crawled” pages. ChatGPT Search leans heavily on Bing’s index; if you are not in Bing, you are largely absent from ChatGPT’s live layer.

JS-rendered content. Do not assume LLM bots render JavaScript the way Googlebot does; several of them appear to fetch raw HTML only. If your content is hydrated client-side with no SSR, you are betting on every bot rendering correctly. The safer move is SSR with a static HTML fallback. See hydration and Next.js SEO for the technical pattern.
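
One way to see what a bot that does not execute JavaScript would get is to look for the answer text in the raw HTML response. This is a minimal sketch using the standard `html.parser` module; the helper name `answer_in_raw_html` and the two sample documents are assumptions for illustration, and fetching the HTML is left out.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def answer_in_raw_html(raw_html: str, answer_snippet: str) -> bool:
    """True if the snippet appears in the visible text of the unrendered HTML."""
    extractor = _TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    # Collapse whitespace on both sides so line breaks do not cause misses.
    text = " ".join(" ".join(extractor.chunks).split())
    return " ".join(answer_snippet.split()) in text


# Hypothetical server-rendered and client-rendered versions of the same page.
ssr = "<html><body><h1>T</h1><p>X is a thing.</p></body></html>"
csr = ('<html><body><div id="root"></div>'
       '<script>render("X is a thing.")</script></body></html>')
```

If the check returns `False` for the raw response of a live page, the answer paragraph only exists after hydration, which is the case the paragraph above warns about.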

Schema markup: what helps, what does not

The cheap structured-data wins.

  • Article schema with author, datePublished, dateModified. Mandatory.
  • Person schema on the author page, linked from the article.
  • FAQPage schema when the article has a real Q and A block. The single biggest extraction lift I have measured.
  • HowTo schema for procedural content. Slightly less powerful than FAQ but still helpful.

What does not move the needle: Organization schema, BreadcrumbList, WebSite. They are fine to ship, but they do not change citation rate on their own. See schema markup for the full implementation guide.
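
As a rough illustration of the markup listed above, the sketch below builds Article and FAQPage JSON-LD with the standard `json` module. The property names (`headline`, `datePublished`, `dateModified`, `author`, `mainEntity`, `acceptedAnswer`) are schema.org vocabulary; the function name `build_article_jsonld` and any values passed to it, such as URLs and dates, are placeholders and not taken from a real page.

```python
import json


def build_article_jsonld(headline, author_name, author_url,
                         date_published, date_modified, faqs):
    """Return JSON-LD strings: one Article, plus one FAQPage if faqs is non-empty.

    `faqs` is a list of (question, answer) pairs taken from a real Q and A block.
    """
    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "dateModified": date_modified,
        # Nested Person; author_url should point at the author page.
        "author": {"@type": "Person", "name": author_name, "url": author_url},
    }
    docs = [article]
    if faqs:
        docs.append({
            "@context": "https://schema.org",
            "@type": "FAQPage",
            "mainEntity": [
                {
                    "@type": "Question",
                    "name": question,
                    "acceptedAnswer": {"@type": "Answer", "text": answer},
                }
                for question, answer in faqs
            ],
        })
    return [json.dumps(doc, indent=2) for doc in docs]
```

Each returned string would go inside its own `<script type="application/ld+json">` tag in the page head. Only emit the FAQPage block when the visible page really contains those questions and answers.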

Page length: a non-factor

Conventional SEO wisdom says longer pages win. LLM citation breaks that pattern.

A 600-word post by a named expert with one strong direct-answer paragraph gets cited more often than a 3,000-word generic listicle. The LLM is not weighting depth; it is weighting extractability and source credibility.

This does not mean go short. Aim for the length the topic deserves, usually 1,200 to 2,500 words. Just stop padding to hit a word count. Every section past “the answer” should add a real distinction, not hot air.

The measurement loop

You cannot improve what you cannot see. ChatGPT does not push citation data to Search Console. You need at least one of:

  • Profound. GEO-native platform. Tracks share-of-voice across LLMs at the query level. Best onboarding. About $499/mo entry.
  • Ahrefs Brand Radar. Mentions across ChatGPT, Perplexity, Gemini, Copilot. Bundled with Ahrefs Enterprise; add-on otherwise.
  • DataForSEO LLM mentions API. Programmatic citation pulls. For teams building custom dashboards. Usage-based.
  • Manual scrapes. Run your 30 to 100 target queries through ChatGPT each week. Log the citations. Tedious; works.

Whichever you pick, build a weekly review where you look at three things: which queries cite you, which queries cite competitors but not you, and which sentences are being lifted from your content. The third one is the most actionable signal in the loop.
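
For the manual route, the first two review questions reduce to simple set logic over a citation log. The sketch below assumes a log shaped as a dict from query to the list of domains cited that week; `weekly_review` and the example domains in the comments are hypothetical. It does not cover the third question (which sentences get lifted), which needs the answer text itself.

```python
def weekly_review(citation_log, own_domain, competitors):
    """Summarise one week of logged citations.

    citation_log: dict mapping each tracked query to the list of cited domains,
                  e.g. {"crm pricing 2026": ["rival.io"]}.
    """
    competitor_set = set(competitors)

    # Queries where your domain appears among the cited sources.
    cited = sorted(q for q, domains in citation_log.items()
                   if own_domain in domains)

    # Queries where at least one competitor is cited and you are not.
    competitor_only = sorted(
        q for q, domains in citation_log.items()
        if own_domain not in domains and competitor_set & set(domains)
    )

    rate = len(cited) / len(citation_log) if citation_log else 0.0
    return {"cited": cited,
            "competitor_only": competitor_only,
            "citation_rate": rate}
```

Tracking `citation_rate` week over week gives the same kind of number as the 0% to 9-31% figures quoted earlier, and `competitor_only` is the natural worklist for the next round of rewrites.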

The retrofit, week by week

A realistic 4-week sprint to push citation rate on an existing site.

Week 1. Audit. Identify the top 30 to 50 buying-intent queries. Run them through ChatGPT. Log which ones return citations, which ones cite competitors, which ones return sourceless answers. This is your baseline.

Week 2. Rewrite. Pick the 10 highest-value queries. For each, find or create the target page. Rewrite the first 200 words to follow the answer-first pattern. Add the H2 question. Add inline citations.

Week 3. Schema and authorship. Ship Article, Person, and FAQPage schema across the 10 pages. Add real bio blocks. Verify all schema in the Rich Results Test.

Week 4. Crawl and tracking. Open robots.txt to all LLM bots. Submit Bing sitemap. Set up Profound or a manual tracking sheet. Bookmark the queries.

Then wait. Citation rates lift starting week 4 to 6, with the bulk of movement happening between week 8 and week 12. Some queries never lift — that usually means the answer is fundamentally not differentiated, and the work is editorial, not structural.

Sentence-level iteration

Once you have a few citations, the highest-leverage work is sentence-level rewrites.

Example. A client’s page got cited for “what is X” by Perplexity but never by ChatGPT. The Perplexity-cited sentence was a definition with a precise number. The same page in ChatGPT was getting outcited by a competitor whose first sentence framed the answer as “X is the new Y” — a contrast frame rather than a definition.

We rewrote our opening sentence as a contrast frame, kept the precise number in the next sentence. Three weeks later, ChatGPT started citing the page.

The lesson: each LLM has a slightly different extraction style, and the cited sentence is the unit you optimize, not the page. See intent classification with AI for the broader pattern of querying LLMs about their own preferences.

What gets oversold

A few things that sound like GEO best practices but do not earn their cost.

Stuffing content with question-pattern H2s. Three good Q-and-A H2s outperform fifteen forced ones. The model penalizes thin extraction; if every H2 is “What is X?” with two sentences under it, the page reads as low-information.

Filing for inclusion in OpenAI’s training data. OpenAI does not have an opt-in submission process. The way in is the open web.

Writing content “for ChatGPT” without thinking about humans. A page that reads as if it was written for an LLM gets pruned by the LLM. The model is trying to cite content humans found useful; if your page reads as machine-bait, you go down the ranking.

What to do tomorrow

If you read nothing else, do this:

  1. Pick your top 10 buying-intent queries. Open ChatGPT. Ask each. Log who gets cited.
  2. For the queries where competitors are cited, open the competitor’s page. Look at the first 200 words. Compare to yours.
  3. Rewrite your first 200 words. Ship today. Check back in 4 weeks.

There is no clever shortcut. The work is structural, repeatable, and rewards patience. The teams that start it in 2026 will own their categories in 2027. The ones that wait will be invisible in the layer of search that is replacing the SERP.

Web-search rate: ~18% of ChatGPT conversations trigger a search (Profound).
Retrieval to cite: ~15% of retrieved pages get cited (AirOps).
Time to lift: 8 to 12 weeks from retrofit to first citation.
FIG. 01 · THE CITATION LOOP. Six stages, weekly cadence: Audit (buying-intent queries) → Rewrite (answer-first openings) → Schema (author, FAQ, article) → Crawl (GPTBot, OAI-SearchBot) → Track (Profound, Brand Radar) → Iterate (sentence-level).
Questions people actually ask
Q01 Does ChatGPT crawl my website?
Two crawlers fetch content for ChatGPT: GPTBot (for training data) and OAI-SearchBot (for live answers when ChatGPT searches the web). Both respect robots.txt. A third agent, ChatGPT-User, fetches a page when a user action in ChatGPT requests it. If you block them you cannot be cited; if you allow them you might be.
Q02 Do I need to be in Bing's index?
For ChatGPT Search, yes, because ChatGPT's live web layer runs heavily on Bing. Run a Bing Webmaster Tools audit and submit your sitemap. About 30% of citation issues I have seen trace back to incomplete Bing indexing.
Q03 Is FAQPage schema still useful?
Yes, for AI extraction. ChatGPT and Perplexity both prefer cleanly structured Q and A blocks. FAQPage schema is the cheapest lift to get there. Google restricted FAQ rich results in its classic SERP to a small set of government and health sites in 2023, but the structured data still helps LLMs.
Q04 What length should the cited answer be?
60 to 120 words. Long enough to be self-contained. Short enough to lift cleanly into a context window. The full post can be 800 to 2,500 words, but the cited paragraph should stand alone.
Q05 How is this different from classical SEO?
Classical SEO ranks a document. ChatGPT cites a sentence. The optimization unit shrinks from page to passage. See generative engine optimization for the full picture.
Q06 Can I check whether ChatGPT has cited me?
Yes, with Profound, Ahrefs Brand Radar, DataForSEO's LLM mentions API, or by running the queries in ChatGPT yourself and inspecting the citations. Search Console does not show LLM citations.
Sources & further reading
  1. [01] documentation
  2. [02] How ChatGPT sources the web. Profound · 2026 (report)
  3. [03] report
  4. [04] report
  5. [05] Generative Engine Optimization (Princeton paper). Aggarwal et al., Princeton University · 2024 (research)
  6. [06] tool
  7. [07] Article structured data. Google Search Central · 2025 (documentation)
Niko Alho

I run agentic SEO and build custom AI for B2B companies. Based in Turku.
