by blakelapides
Your technical SEO audit passed. Screaming Frog is clean, Lighthouse scores are solid, and your Googlebot rendering report looks fine. But there’s a category of crawler most audit frameworks weren’t built to account for – and for JS-heavy e-commerce stacks, it’s a silent GEO liability that compounds with every piece of content you publish.
AI crawlers are not Googlebot. The rendering assumptions baked into your technical SEO tooling do not apply to them.
How AI Crawlers Differ from Googlebot
Googlebot uses a Chrome-based rendering engine with a queued JavaScript execution pipeline. It processes dynamic content, waits on async calls, and eventually sees most of what a user in a browser would see. This rendering infrastructure took Google years to build and is actively maintained.
Most AI crawlers – including GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and the majority of LLM training and real-time retrieval bots – do not execute JavaScript. They fetch the raw HTML response. If your product descriptions, buying guides, gemstone education content, or structured editorial copy is rendered client-side via React, Vue, or similar frameworks, those AI crawlers receive an empty application shell.
For a DTC fine jewelry retailer whose key content – the 4 Cs explainer, the ring configurator’s educational copy, the diamond certification guide – is loaded via client-side fetch calls or rendered in a SPA, this means AI platforms may be training on or retrieving a content-empty version of your most authoritative pages. The gap between what Googlebot sees and what GPTBot sees can be substantial, and it’s invisible in most audit workflows.
Running an AI Crawler Audit
The audit requires checking what an HTTP-only client – no JavaScript execution – receives from your key content pages. Several methods work:
curl-based raw HTML inspection. Run the following against your highest-priority content pages:
curl -s -A "GPTBot/1.0" https://yourdomain.com/your-content-page/ | grep -iE "<h[1-3]|description|schema"
Examine the raw HTML response. Is the core content present in the initial payload? Are your H2 headings, product description copy, and schema markup visible without JavaScript execution? If you see an app shell with an empty root container element and minimal content, you have a rendering problem with real GEO consequences.
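If you want more than a grep over the response, the same check can be scripted to report the signals worth comparing across pages. What follows is a minimal sketch using only the Python standard library. The `RawHtmlSummary` class, the `summarize` and `fetch_raw_html` helpers, and the simplified user-agent string are illustrative assumptions, not official tooling from any crawler operator, and the word count is a rough approximation.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: a simplified user-agent token, matching the curl example.
CRAWLER_UA = "GPTBot/1.0"


class RawHtmlSummary(HTMLParser):
    """Collects headings, visible word count, and JSON-LD blocks from raw HTML."""

    HEADINGS = {"h1", "h2", "h3"}
    SKIPPED = {"script", "style"}  # contents are not visible copy

    def __init__(self):
        super().__init__()
        self.headings = []       # (tag, text) pairs in document order
        self.word_count = 0      # words found outside script/style elements
        self.json_ld_blocks = 0  # <script type="application/ld+json"> count
        self._heading = None     # heading tag currently open, if any
        self._buffer = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self.json_ld_blocks += 1
        elif tag in self.HEADINGS:
            self._heading, self._buffer = tag, []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._heading:
            text = " ".join("".join(self._buffer).split())
            self.headings.append((tag, text))
            self._heading = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        self.word_count += len(data.split())
        if self._heading:
            self._buffer.append(data)


def summarize(html: str) -> dict:
    """Summarize what an HTTP-only client can read from an HTML string."""
    parser = RawHtmlSummary()
    parser.feed(html)
    parser.close()
    return {
        "headings": parser.headings,
        "word_count": parser.word_count,
        "json_ld_blocks": parser.json_ld_blocks,
    }


def fetch_raw_html(url: str, user_agent: str = CRAWLER_UA) -> str:
    """Fetch a page without executing JavaScript, as an HTTP-only crawler would."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=15) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (placeholder URL; replace with one of your own content pages):
# print(summarize(fetch_raw_html("https://yourdomain.com/your-content-page/")))
```

A page that returns zero headings, a word count near zero, and no JSON-LD blocks is serving an application shell to anything that does not run JavaScript.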
Google’s URL Inspection Tool as a proxy. The “View Tested Page” output in Google Search Console shows what Googlebot received – the closest widely-available proxy for a JS-rendering-capable crawler. Compare that against your raw curl output. Content present in Googlebot’s view but absent in the curl response represents your AI crawler visibility gap.
Screaming Frog with JavaScript rendering disabled. Run a secondary crawl of your domain with rendering set to “Text Only” rather than “JavaScript” (Configuration > Spider > Rendering). Compare the output – word count per page, heading structure, schema detection – against your standard rendered crawl. Pages showing significant content loss in the non-rendered crawl are your priority remediation targets.
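Whichever pair of outputs you compare, rendered versus raw, the comparison reduces to a per-page diff of word counts and headings. Here is a minimal sketch of that diff, assuming you have already exported both crawls into per-page records. The `PageSnapshot` record, the function names, and the 50% loss threshold are all hypothetical choices for illustration, not values taken from any tool.

```python
from dataclasses import dataclass, field


@dataclass
class PageSnapshot:
    """Hypothetical per-page record built from one crawl export."""
    url: str
    word_count: int
    headings: list = field(default_factory=list)


def visibility_gap(rendered: PageSnapshot, raw: PageSnapshot) -> dict:
    """Compare what a rendering crawler sees with what an HTTP-only client sees."""
    if rendered.word_count > 0:
        lost = max(rendered.word_count - raw.word_count, 0) / rendered.word_count
    else:
        lost = 0.0
    raw_headings = set(raw.headings)
    missing = [h for h in rendered.headings if h not in raw_headings]
    return {
        "url": rendered.url,
        "content_loss": round(lost, 2),  # share of rendered words absent from raw HTML
        "missing_headings": missing,
    }


def remediation_targets(pairs, loss_threshold=0.5):
    """Return URLs whose content loss meets the assumed threshold, worst first."""
    gaps = [visibility_gap(rendered, raw) for rendered, raw in pairs]
    flagged = [g for g in gaps if g["content_loss"] >= loss_threshold]
    flagged.sort(key=lambda g: g["content_loss"], reverse=True)
    return [g["url"] for g in flagged]
```

For example, a buying guide with 1,200 rendered words and 60 words in the raw response has a content loss of 0.95 and would be flagged; a page that drops from 400 to 380 words would not.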
Prioritize auditing: cornerstone buying guides, diamond and gemstone education pages, category landing pages with substantial editorial content, and any pages you’re actively targeting for AI citations.
CDN and Cloudflare Bot-Blocking Edge Cases
Rendering is the second problem. The first – and more immediate – is whether AI crawlers are reaching your origin server at all.
Cloudflare’s bot management tools and similar WAF configurations frequently use aggressive Bot Score thresholds that flag legitimate AI crawlers alongside malicious bots. Cloudflare’s bot score runs from 1 to 99, with lower scores indicating likelier automation, so if your configuration applies a JS challenge to, or outright blocks, requests scoring below a threshold, GPTBot and ClaudeBot may never receive a response from your server. The rendering question becomes moot.
To check: navigate to Cloudflare’s Security > Events log and filter for firewall events. Look for blocked or challenged requests from known AI crawler user agents. If you’re seeing blocked events from the agents below, you have a zero-content problem that precedes any rendering discussion.
AI crawler user agents to review and allow-list:
- `GPTBot` – OpenAI’s web crawler
- `ClaudeBot` – Anthropic’s crawler
- `PerplexityBot` – Perplexity AI
- `GoogleOther` – Google’s generic crawler for non-Search product teams, including research and development fetches
- `FacebookBot` – Meta’s AI-training crawler
- `Bytespider` – ByteDance / TikTok AI systems
Allowing these crawlers through your WAF does not materially increase security risk, provided you are not trusting the user-agent string alone: it is trivially spoofed, so where an operator publishes IP ranges, match on those as well. These are legitimate crawlers from major platforms with acceptable use policies. Blocking them is an invisible self-inflicted GEO penalty.
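Triage of exported security events can be scripted along the same lines. The sketch below counts blocked or challenged requests per AI crawler and, because a user-agent string can be spoofed, also checks the client IP against a range table. Treat every specific here as an assumption: the event keys (`user_agent`, `action`, `client_ip`), the action names, and the helper functions are hypothetical, and the CIDR blocks are reserved documentation ranges standing in for whatever each operator actually publishes.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# User-agent tokens from the list above; matched as case-insensitive substrings.
AI_CRAWLER_TOKENS = [
    "GPTBot", "ClaudeBot", "PerplexityBot",
    "GoogleOther", "FacebookBot", "Bytespider",
]

# Placeholder CIDR blocks (reserved documentation ranges), NOT real crawler IPs.
# Substitute the ranges each operator publishes, where available.
PUBLISHED_RANGES = {
    "GPTBot": [ip_network("192.0.2.0/24")],
    "PerplexityBot": [ip_network("198.51.100.0/24")],
}

# Assumption: action names as they appear in your exported event records.
BLOCKING_ACTIONS = {"block", "challenge", "js_challenge", "managed_challenge"}


def identify_crawler(user_agent: str):
    """Return the matching crawler token, or None for any other client."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def ip_is_verified(crawler: str, client_ip: str) -> bool:
    """True if the IP falls inside a known range for that crawler."""
    address = ip_address(client_ip)
    return any(address in network for network in PUBLISHED_RANGES.get(crawler, []))


def blocked_ai_crawlers(events) -> Counter:
    """Count blocked or challenged events, keyed by (crawler, verification status)."""
    counts = Counter()
    for event in events:
        crawler = identify_crawler(event.get("user_agent", ""))
        if crawler and event.get("action") in BLOCKING_ACTIONS:
            verified = ip_is_verified(crawler, event["client_ip"])
            counts[(crawler, "verified" if verified else "unverified")] += 1
    return counts
```

Blocked events in the "verified" bucket are the ones to fix first: they are genuine crawler requests your edge is turning away. The "unverified" bucket mixes spoofed traffic with crawlers whose operators you have no range data for, so it needs a manual look before any allow rule is written.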
Remediation Paths
If your audit surfaces an AI visibility problem, remediation options range in complexity and organizational lift:
Static site generation or incremental static regeneration. The cleanest long-term solution for content-heavy pages. Pages pre-rendered at build time serve complete HTML to all crawlers regardless of JS capability. Next.js ISR and similar patterns work well for product category pages with relatively stable content structures.
Server-side rendering for high-value content pages. If a full SSG migration isn’t feasible, prioritize SSR for your highest-value content: buying guides, educational resources, cornerstone category pages. Product pages with real-time inventory and pricing can remain client-side rendered without meaningful GEO loss – AI systems are less interested in dynamic transactional data than in evergreen editorial content.
Pre-rendering services. Prerender.io, Rendertron, and similar tools serve cached, fully-rendered HTML to identified crawlers while maintaining your SPA architecture for users. This is a lower-lift solution with known tradeoffs: cache freshness management and user-agent detection configuration require ongoing maintenance.
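Both tradeoffs, user-agent detection and cache freshness, come down to a small routing decision made per request. The following is a simplified sketch of that decision, not how Prerender.io or Rendertron implement it. The token list, the one-day freshness window, the cache shape (`path -> (html, rendered_at)`), and the function names are all assumptions for illustration.

```python
import time

# Assumption: crawlers that should receive the cached, pre-rendered snapshot.
PRERENDER_TOKENS = ("gptbot", "claudebot", "perplexitybot")

# Assumption: snapshots older than one day are treated as stale.
MAX_SNAPSHOT_AGE_SECONDS = 24 * 60 * 60


def wants_prerendered(user_agent: str) -> bool:
    """True when the user agent matches one of the configured crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in PRERENDER_TOKENS)


def choose_response(user_agent, snapshot_cache, path, now=None):
    """Decide how to answer a request.

    Returns ('spa', None) for ordinary visitors, ('prerendered', html) when a
    fresh snapshot exists, and ('refresh', None) when the snapshot is missing
    or stale and must be re-rendered before serving.
    """
    if not wants_prerendered(user_agent):
        return ("spa", None)
    now = time.time() if now is None else now
    entry = snapshot_cache.get(path)  # hypothetical cache: path -> (html, rendered_at)
    if entry is None or now - entry[1] > MAX_SNAPSHOT_AGE_SECONDS:
        return ("refresh", None)
    return ("prerendered", entry[0])
```

The maintenance burden lives in the two constants: the token list goes stale as new crawlers appear, and the freshness window has to match how often the underlying content actually changes.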
Inline critical content in the initial HTML payload. For teams that can’t immediately implement SSR or pre-rendering, ensuring that key heading structure, above-the-fold editorial content, and schema markup are present in the raw HTML response – even if supplementary content loads dynamically – reduces the AI visibility impact without a full architecture change.
Prioritizing the Fix
Not every page requires SSR remediation. Prioritize based on two criteria: content value (how much authoritative, citable content lives on this page, and is it content AI platforms would extract when answering relevant queries?) and crawl exposure (how discoverable is this page via external links, sitemaps, and known crawl paths?). Pages scoring high on both are your immediate remediation targets.
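The two criteria can be turned into a rough ranking. This is a sketch under stated assumptions: the 1-to-5 scales, the cutoff of 4 for "high", and the multiplied score are arbitrary illustrative choices, and the scores themselves are judgment calls your team assigns rather than anything measured by a tool.

```python
from dataclasses import dataclass


@dataclass
class PageScore:
    url: str
    content_value: int   # 1 (thin) to 5 (authoritative, citable), assigned by the team
    crawl_exposure: int  # 1 (orphaned) to 5 (externally linked, in sitemap)


def prioritize(pages, high=4):
    """Split pages into immediate targets (high on both criteria) and the rest.

    Both lists are ordered by content_value * crawl_exposure, highest first.
    """
    ranked = sorted(
        pages,
        key=lambda p: p.content_value * p.crawl_exposure,
        reverse=True,
    )
    immediate = [
        p.url for p in ranked
        if p.content_value >= high and p.crawl_exposure >= high
    ]
    later = [p.url for p in ranked if p.url not in immediate]
    return immediate, later


# Hypothetical example pages for a jewelry retailer.
pages = [
    PageScore("/cart", 1, 5),
    PageScore("/4cs-guide", 5, 5),
    PageScore("/cert-guide", 5, 2),
    PageScore("/engagement-rings", 4, 4),
]
immediate, later = prioritize(pages)
```

Here the 4 Cs guide and the engagement ring category page land in the immediate list. The certification guide scores high on content value but low on exposure, which suggests fixing its internal linking and sitemap coverage alongside any rendering work.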
Run your AI crawler audit this week. Start with curl against your five most important content pages. The gap between your rendered content and your raw HTML response is your current AI visibility ceiling – and for most JS-heavy stacks, that gap is significantly larger than the team expects.

