// TWO_JOBS

The Two Jobs of a Bot-Friendly Website

Most advice tells you to throw open the doors to AI crawlers. The sites that win in AI search are legible to the bots that cite them and deliberate about the ones that only consume them.

By Tina Chu | July 2026 | 7 min read

Non-human traffic crossed 50% of the internet for the first time this year, according to Cloudflare's network data. A little over half of all requests now come from bots, and 52% of crawler requests exist to train AI models, up from 22% a year earlier. For most companies, the single most important visitor to their website is no longer a person. It is a crawler deciding whether to read you, remember you, and recommend you.

What is a bot-friendly website? A bot-friendly website is one that AI systems can read, understand, and cite, and whose owner has made a deliberate choice about which AI companies may train their models on its content.

TL;DR: Being bot-friendly is two jobs. Make your site readable to the AI systems that cite and recommend you, and decide deliberately whether AI companies may train on your content. For most brands, letting training crawlers in builds the AI's knowledge of you; publishers should gate or charge. Measure AI citations, not crawler visits.

Hour bar chart: of every hour spent researching online, 15 minutes reach the open web and 45 minutes stay inside search results and AI answers

That reframes what "bot-friendly" means. The phrase usually gets treated as a hospitality problem: open the doors and let the crawlers in. The reality has two halves. You want to be maximally readable to the AI systems that can cite you and send you qualified buyers. You also want to make a deliberate business decision about the ones that consume your content and return nothing. A website tuned for AI does both jobs at once.

Two-panel diagram: bots that cite you get maximum legibility, bots that consume you get a deliberate allow, block, or charge decision
// JOB_ONE

Job one: be readable to the AI that can recommend you

Start with an uncomfortable question: can AI systems actually read your website? For a surprising number of companies the answer is no, and nobody in the building knows it.

The most common cause is invisible to a human visitor. Many modern websites only assemble their content after running code inside the visitor's browser, and most AI crawlers never run that code. Your prospect sees a polished page. The AI sees an empty one. Free scanners now exist that show you exactly what an AI receives when it fetches your page, so this is a question your web team can answer in an afternoon.
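If you would rather check this yourself than rely on a scanner, the idea can be sketched in a few lines of Python. This is a minimal illustration, not a full audit: it assumes you already have the raw HTML your server returns (before any browser code runs), and the names `visible_text`, `missing_phrases`, and both sample pages are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects the text present in raw HTML, ignoring script and style bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html: str) -> str:
    """Approximates what a crawler that does not run JavaScript can read."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


def missing_phrases(raw_html: str, key_phrases: list[str]) -> list[str]:
    """Returns the key phrases that do not appear in the raw HTML text."""
    text = visible_text(raw_html).lower()
    return [p for p in key_phrases if p.lower() not in text]


# Hypothetical pages: one assembled in the browser, one rendered on the server.
CLIENT_RENDERED = (
    '<html><body><div id="root"></div>'
    '<script>render("Pricing starts at $49")</script></body></html>'
)
SERVER_RENDERED = (
    "<html><body><h1>Pricing</h1>"
    "<p>Pricing starts at $49 per month.</p></body></html>"
)

print(missing_phrases(CLIENT_RENDERED, ["Pricing starts at $49"]))
print(missing_phrases(SERVER_RENDERED, ["Pricing starts at $49"]))
```

The first page reports the phrase as missing even though a human visitor would see it; the second does not. Run against your own key pages, a non-empty result is a signal worth taking to your web team.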

The second cause is self-inflicted. Plenty of companies block the very AI systems they want to be recommended by, through permission settings someone configured years ago and nobody has revisited. If your brand never shows up in AI answers, check whether you told the AI to go away.
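One place to look is robots.txt, assuming that is where your permission settings live. The sketch below uses Python's standard `urllib.robotparser` to list which AI crawlers a given robots.txt turns away. The user-agent tokens and the sample file are assumptions for illustration; confirm the current tokens in each vendor's documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumed AI crawler user-agent tokens; verify against each vendor's docs.
AI_USER_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
    "CCBot",
]


def blocked_agents(robots_txt: str, page_url: str, agents=AI_USER_AGENTS) -> list[str]:
    """Returns the agents that robots.txt disallows from fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, page_url)]


# Hypothetical robots.txt with a rule someone added years ago.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_agents(SAMPLE_ROBOTS, "https://example.com/pricing"))
```

For the sample file this prints `['GPTBot']`. Note that robots.txt is only one layer: a firewall or CDN rule can block the same crawlers without leaving any trace in this file.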

Once AI can get in and read you, the work is making yourself easy to quote. AI systems lift passages, not pages, so content wins when each section answers one question on its own: clear headings, short self-contained explanations, plain tables where they fit. One practitioner reported a 10% organic traffic lift from a handful of changes, and the biggest single gain came from serving AI crawlers a simplified, cleanly structured version of each page.

Your facts need the same treatment as your prose. What you do, what it costs, who it serves: state these in the structured formats machines can verify, not just in marketing copy. The reason is a new step in the buying journey that most teams have not noticed. Before an AI assistant recommends a vendor, it checks whether that vendor actually meets the buyer's requirement. It can only check claims it can read. A site that states its facts plainly qualifies for the shortlist. A site that buries them in a brand video does not.
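One widely used structured format is schema.org markup embedded as JSON-LD. The following is a minimal sketch of how what you do and what it costs might be expressed that way. The company, description, and price are placeholders, and the helper names `organization_jsonld` and `as_script_tag` are hypothetical; your own pages may call for different schema.org types.

```python
import json


def organization_jsonld(name, url, description, offers):
    """Builds a schema.org Organization document; all values are placeholders."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
        "makesOffer": [
            {
                "@type": "Offer",
                "name": offer["name"],
                "price": offer["price"],
                "priceCurrency": offer["currency"],
            }
            for offer in offers
        ],
    }


def as_script_tag(doc):
    """Wraps the document in the script tag used to embed JSON-LD in a page."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(doc, indent=2)
        + "\n</script>"
    )


# Placeholder facts for an imaginary company.
doc = organization_jsonld(
    name="Example Co",
    url="https://example.com",
    description="Invoice automation for mid-sized logistics firms.",
    offers=[{"name": "Starter plan", "price": "49.00", "currency": "USD"}],
)
print(as_script_tag(doc))
```

The point is less the exact vocabulary than the habit: each fact a buyer's assistant might need to verify appears once, in plain text, in a form a machine can parse without watching a video.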

One rule ties it together: write your knowledge pages to answer questions, not to sell. AI systems are not looking for sales copy; they are looking for clear explanations. Keep the persuasion on your marketing pages, where humans decide, and make your explainers genuinely useful.

If you take one action from this section, put these five questions to your web team this week. Can AI crawlers read our pages without running browser code? Are we blocking any AI systems by accident? Is our key business information stated in structured, machine-readable form? Does each explainer section stand on its own? And which AI systems actually cite us today, versus merely visiting us?

// JOB_TWO

Job two: be deliberate about who consumes you

Most guides stop at job one. Job two is the more interesting business decision. Being readable to AI is not the same as donating your content to it for free. The same Cloudflare report that counted the bots also documented the economics: AI crawlers increasingly take content to train models without sending any visitors back, and some heavily crawled content categories lost up to 40% of their human traffic in under a year. Google still accounts for roughly 88% of referral traffic, which is why publishers now plan for a "Google Zero" scenario where that stream thins out.

Proportion bar chart: Google is roughly 88% of the referral traffic websites receive from other sites, everyone else combined is 12%
Bar chart of US organic search traffic declines at major tech publications, peak month versus January 2026: Digital Trends down 97%, ZDNET down 90%, The Verge down 85%, WIRED down 62%, CNET down 47%, Mashable down 30%

Slamming the doors is the wrong response. Separate the crawlers by what they give you instead. The systems that cite you, surface you in answers, and send qualified visitors deserve every ounce of readability you can offer. The ones that only ingest your content for training are a business decision: allow them, block them, or charge them. A market for that last option now exists. More than 50 licensing agreements have been signed between publishers and AI companies since 2023, and infrastructure providers now offer pay-per-crawl, metering access the way a utility meters power. The old system was a polite notice on your website that crawlers could ignore. The new controls are enforcement.

Stacked bar chart of crawler requests by purpose in June 2026: 52% AI training, 36% mixed use, 12% search that can cite you; AI training was 22% a year earlier

Which choice is right depends on what your content is for. If content is your product, meaning you are a publisher, a data or research firm, or a course business, training crawlers are extracting your inventory, and gating or charging them is a real revenue decision. If content is your marketing, which describes most B2B companies, the calculus flips: being in a model's training data is how the model comes to know your brand at all, and what a model already knows before it retrieves anything is a major driver of whether it recommends you. For those companies, blocking training bots is self-harm dressed as protection. The genuine risks are narrower: if how-to blog traffic is your main acquisition channel, AI trained on your explainers will answer those questions directly and thin that funnel, and openly published proprietary frameworks can be absorbed and repeated to anyone. The response to both is the same: move the moat to what AI cannot replicate, not to the firewall.

Either way, make the choice consciously, at the infrastructure level, rather than discover months later that you made a decision by default.
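To make the allow, block, or charge decision concrete, here is a rough sketch of a policy table a server or edge layer might consult per request. Everything in it is an assumption for illustration: the mapping of user-agent tokens to purpose, the two example policies, and the `decide` helper. It also trusts the user-agent string, which can be spoofed, so real enforcement would need to verify the crawler's identity as well.

```python
from dataclasses import dataclass

# Assumed mapping of user-agent tokens to purpose; confirm with vendor docs.
CRAWLER_PURPOSE = {
    "OAI-SearchBot": "search",
    "PerplexityBot": "search",
    "GPTBot": "training",
    "ClaudeBot": "training",
    "CCBot": "training",
}

# HTTP status returned for each action (402 is "Payment Required").
ACTION_STATUS = {"allow": 200, "block": 403, "charge": 402}


@dataclass(frozen=True)
class Decision:
    action: str
    status: int


def decide(user_agent: str, policy: dict[str, str]) -> Decision:
    """Looks up the crawler's purpose and applies the site's policy for it."""
    lowered = user_agent.lower()
    for token, purpose in CRAWLER_PURPOSE.items():
        if token.lower() in lowered:
            action = policy.get(purpose, "allow")
            return Decision(action, ACTION_STATUS[action])
    # Humans and unrecognized agents pass through untouched.
    return Decision("allow", 200)


# Content is the product: keep citing bots in, charge training bots.
PUBLISHER_POLICY = {"search": "allow", "training": "charge"}
# Content is marketing: let both in.
MARKETING_POLICY = {"search": "allow", "training": "allow"}

gptbot = "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"
print(decide(gptbot, PUBLISHER_POLICY))
print(decide(gptbot, MARKETING_POLICY))
```

The value of writing the policy down as data is that the choice becomes visible and reviewable, rather than being scattered across settings nobody remembers making.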

// MEASURE_CITATIONS

Measure citations, not crawls

A trap sits between the two jobs. It is easy to see AI crawlers all over your traffic logs and assume you are winning. Crawls are not citations. An AI can visit your site thousands of times without ever mentioning you to a single buyer. Visits measure interest in your content as raw material. Citations measure whether the AI actually recommends you. Track both, and treat the gap between them as your real to-do list.
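The gap can be tracked with very little tooling. The sketch below counts crawler visits per AI system from a list of user-agent strings and sets them beside citation counts. It assumes you collect citations separately, for example by testing buyer prompts by hand or with a tracking tool. The token-to-system mapping, the function names, and the sample numbers are all hypothetical.

```python
from collections import Counter

# Assumed mapping from user-agent token to the AI system behind it.
BOT_TO_SYSTEM = {
    "GPTBot": "ChatGPT",
    "OAI-SearchBot": "ChatGPT",
    "ClaudeBot": "Claude",
    "PerplexityBot": "Perplexity",
}


def crawls_by_system(user_agents):
    """Counts log entries per AI system, ignoring agents it does not recognize."""
    counts = Counter()
    for user_agent in user_agents:
        lowered = user_agent.lower()
        for token, system in BOT_TO_SYSTEM.items():
            if token.lower() in lowered:
                counts[system] += 1
                break
    return counts


def crawl_citation_gap(crawls, citations):
    """Lists systems with the fewest citations first, most-crawled first on ties."""
    rows = [
        {
            "system": system,
            "crawls": crawls.get(system, 0),
            "citations": citations.get(system, 0),
        }
        for system in set(crawls) | set(citations)
    ]
    return sorted(rows, key=lambda r: (r["citations"], -r["crawls"], r["system"]))


# Made-up inputs: user agents from a traffic log, citations from prompt testing.
log_user_agents = ["GPTBot/1.1"] * 3 + ["PerplexityBot/1.0", "Mozilla/5.0 Chrome/126.0"]
observed_citations = {"Perplexity": 2}

for row in crawl_citation_gap(crawls_by_system(log_user_agents), observed_citations):
    print(row)
```

Systems that crawl heavily but never cite you rise to the top of the list, which is the to-do list described above.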

// STAY_HONEST

Two things that keep this honest

Your website is necessary but not sufficient. Roughly 85% of what AI says about a brand comes from off-site sources: Reddit, YouTube, review platforms, third-party mentions. A perfectly bot-friendly site is about 15% of the picture. Fix the site because it is the part you fully control, then go do the off-site work.

None of this is a separate discipline with its own bag of tricks, either. Google's own guidance on AI search is that it is still search, done for people. Build for humans, make the result machine-readable, and skip the gimmicks that get punished later. The sites that win are readable by a person, parseable by a model, and clear about which models get in.

// HOW_WE_APPLY_THIS

How Novastacks Applies This

// FAQ

FAQ

Can AI crawlers read content that only loads with JavaScript?

No. Most non-Google AI crawlers fetch raw HTML without running scripts. If your content only appears after JavaScript executes, those crawlers see an empty page. Server-side rendering or static HTML fixes this.

Should I block AI training crawlers?

It depends on what your content is for. If content is your product, such as publishing, data, or courses, gating or charging training crawlers is a real revenue decision. If content is your marketing, being in training data helps AI models know and recommend your brand, so blocking usually hurts you.

What is llms.txt, and do I need it?

llms.txt is a file that points AI agents to your most important pages, like a table of contents. Adoption is real but narrow, and Google advises skipping it. It is low effort and low risk, but not a growth lever.

Do frequent AI crawler visits mean AI is recommending my brand?

No. Crawls are not citations. A bot can fetch your site thousands of times without the model ever mentioning you. Measure actual citations in AI answers, not crawler activity in your logs.

How much of what AI says about my brand comes from my own website?

Roughly 15%. About 85% of what AI says about a brand comes from off-site sources such as Reddit, YouTube, reviews, and third-party mentions. A bot-friendly site is necessary, not sufficient.

// GET_STARTED

Get a Bot-Friendly Site Audit

Find out whether AI crawlers can read and cite your content. We'll assess your technical foundation and identify quick wins.