llms.txt is a plain-text file that gives AI systems a structured summary of your site's content, purpose, and key pages. Here's what it does, how to write one, and whether it actually moves the needle.
The file lives at yourdomain.com/llms.txt and gives AI language models a structured overview of your site: what it's for, who runs it, what the key pages are, and what information AI systems should prioritize or ignore.
Think of it as a site-level briefing document written specifically for AI. While your sitemap.xml tells crawlers which URLs exist, and your robots.txt tells crawlers what they can access, llms.txt tells AI systems what the site means and where to look when answering questions about it.
The format was proposed by Jeremy Howard (co-founder of fast.ai) in September 2024. Howard's observation was simple: AI language models are increasingly being used to browse and summarize websites, but they have no standard way to quickly understand a site's structure and intent. llms.txt fills that gap.
The proposal spread quickly through developer communities and was adopted by a wave of tech and content sites. By mid-2025, a growing number of AI engines explicitly check for llms.txt when indexing sites, though adoption across engines varies.
llms.txt uses Markdown. The structure is intentionally simple:
# Site Name
> One sentence description of what this site is and who it's for.
## About
Extended description of the site, its purpose, and key topics it covers.
This should be 2-4 sentences that give an AI system enough context to
understand what questions this site can help answer.
## Key pages
- [Home](https://yourdomain.com/): The main tool / landing page
- [Learn](https://yourdomain.com/learn/): Educational guides on [topic]
- [Glossary](https://yourdomain.com/glossary/): Definitions of key terms
## Content
- [Article title](https://yourdomain.com/article/): One-line description
- [Article title](https://yourdomain.com/article/): One-line description
## Do not index
- /api/ — API endpoints, not for public consumption
- /internal/ — Internal documentation
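If your page list already lives in a CMS or config file, the template above can be generated rather than written by hand. The following is a minimal sketch, not a reference implementation: the function name `render_llms_txt` and its parameters are hypothetical, and the section names simply mirror the template.

```python
def render_llms_txt(name, summary, about, sections, excluded=()):
    """Render an llms.txt document in the layout of the template above.

    sections: mapping of section heading -> list of (title, url, description).
    excluded: iterable of (path, reason) pairs for the "Do not index" section.
    """
    lines = [f"# {name}", f"> {summary}", "", "## About", about]
    for heading, links in sections.items():
        lines += ["", f"## {heading}"]
        lines += [f"- [{title}]({url}): {desc}" for title, url, desc in links]
    if excluded:
        lines += ["", "## Do not index"]
        # Assumption: a plain hyphen separates the path from the reason.
        lines += [f"- {path} - {reason}" for path, reason in excluded]
    return "\n".join(lines) + "\n"


example = render_llms_txt(
    name="Site Name",
    summary="One sentence description of what this site is and who it's for.",
    about="Extended description of the site, its purpose, and key topics.",
    sections={
        "Key pages": [
            ("Home", "https://yourdomain.com/", "The main tool / landing page"),
        ],
    },
    excluded=[("/api/", "API endpoints, not for public consumption")],
)
print(example)
```

Generating the file from the same data that drives your navigation is one way to keep it from drifting out of date as pages are added or removed.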
| File | Purpose | Format | Primary consumer |
|---|---|---|---|
| robots.txt | Access control — what bots can fetch | Key-value directives | All crawlers |
| sitemap.xml | URL discovery — what pages exist | XML | Search crawlers |
| llms.txt | Intent and context — what the site means | Markdown | AI language models |
All three can coexist without conflict. robots.txt is required if you want any crawler-level access control. sitemap.xml is highly recommended for any site with multiple pages. llms.txt is additive: it provides context on top of what the crawler already discovers.
The most useful llms.txt files include four elements:
The one-sentence summary should answer the question: "What is this site, and what can it help me with?" AI systems use this when deciding whether to consult your site for a given query.
Bad: "Welcome to our website."
Good: "letthebots.in is a free tool that checks whether AI search engines (ChatGPT, Claude, Perplexity, Gemini, Copilot) can find, read, and cite any URL."
List your most important pages with one-line descriptions. This helps AI systems navigate to the most relevant content without crawling your entire site.
A brief outline of the topics and questions your site is authoritative on. This is especially useful for content sites, documentation, and knowledge bases.
List the API endpoints, admin paths, and private sections that AI systems shouldn't index or reference in answers. This complements robots.txt by giving context for why certain paths are off-limits.
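A quick way to confirm that a draft covers these four elements is a small lint script. This is a rough sketch under stated assumptions: the function name `check_llms_txt` is hypothetical, and the section names it looks for ("Key pages", "Topics", "Do not index") are taken from the examples in this article, not from any formal standard.

```python
import re


def check_llms_txt(text):
    """Report which of the four elements appear in an llms.txt draft."""
    lines = [line.strip() for line in text.splitlines()]
    # Collect the level-2 headings, lowercased for comparison.
    headings = {line[3:].strip().lower() for line in lines if line.startswith("## ")}
    # True if at least one "- [title](url)" list item appears anywhere in the file.
    has_link = any(re.match(r"- \[[^\]]+\]\([^)]+\)", line) for line in lines)
    return {
        "summary": any(line.startswith("> ") and len(line) > 2 for line in lines),
        "key_pages": "key pages" in headings and has_link,
        "topics": "topics" in headings,
        "exclusions": "do not index" in headings,
    }
```

The check is deliberately shallow: it confirms the sections exist, not that the summary is any good. A draft that reads "Welcome to our website." would still need a human rewrite.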
Here's what the llms.txt for letthebots.in looks like:
# Let The Bots In
> Free AI-readiness checker: paste any URL to instantly see which AI engines can find, read, and cite your site.
## About
Let The Bots In (letthebots.in) is a free, no-account tool for checking
AI search visibility. It runs six parallel checks — robots.txt access,
content readability, structured data coverage, entity authority, content
extractability, and freshness — and returns a 0-100 score with a full
per-bot Crawler Gate. The full fix guide is unlocked with an email address.
## Key pages
- [Checker](https://letthebots.in/): Paste a URL to run an AI-readiness scan
- [GEO Guide](https://letthebots.in/learn/geo/): What is Generative Engine Optimization?
- [robots.txt Guide](https://letthebots.in/learn/robots-txt-ai-bots/): How to configure robots.txt for AI bots
- [Glossary](https://letthebots.in/glossary/): Definitions of GEO, AI crawlers, and related terms
## Topics
AI search visibility, GEO (Generative Engine Optimization), robots.txt
configuration, AI crawler user agents, structured data for AI, llms.txt,
entity signals, ChatGPT citations, Perplexity citations, Claude citations.
## Do not index
- /api/ — scan API endpoints, not for public consumption
- /r/ — user report pages with ephemeral scan results
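To illustrate how a consumer might use a file like this for navigation, here is a hedged sketch that pulls the link entries out of each section. It is not how any particular AI engine parses llms.txt; the `extract_links` helper and the regular expression are assumptions made for illustration.

```python
import re

# Matches "- [Title](url): description"; the description part is optional.
LINK_ITEM = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def extract_links(text):
    """Return {section heading: [(title, url, description), ...]}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_ITEM.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return sections


sample = """\
# Let The Bots In
## Key pages
- [Checker](https://letthebots.in/): Paste a URL to run an AI-readiness scan
- [Glossary](https://letthebots.in/glossary/): Definitions of GEO, AI crawlers, and related terms
## Do not index
- /api/ - scan API endpoints, not for public consumption
"""
links = extract_links(sample)
print(links["Key pages"][0])
```

Plain path entries such as those under "Do not index" are not Markdown links, so they are skipped and that section comes back as an empty list.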
Does llms.txt actually move the needle? The honest answer: it's a supporting signal, not a primary driver. AI engines that check llms.txt use it for context and navigation, helping them understand your site faster. But if your robots.txt blocks the crawler, or your content is JavaScript-rendered, llms.txt won't fix those problems.
Think of it as hygiene rather than a lever. A site with solid access, readable content, and strong structured data will see marginal improvement from llms.txt. A site with access or readability failures will see no improvement until those upstream problems are resolved.
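Because access problems come first, it can help to check robots.txt before spending time on llms.txt. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt body against a few crawler user agents. The helper name `blocked_bots`, the sample rules, and the bot list are illustrative assumptions, not an exhaustive or official set.

```python
from urllib.robotparser import RobotFileParser


def blocked_bots(robots_txt, url, user_agents):
    """Return the user agents that robots_txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in user_agents if not parser.can_fetch(ua, url)]


# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
bots = ["GPTBot", "ClaudeBot", "PerplexityBot"]
print(blocked_bots(robots_txt, "https://yourdomain.com/learn/", bots))
```

If a crawler you care about shows up in the result, fix that rule first. An llms.txt file cannot help a bot that is not allowed to fetch your pages.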
At letthebots.in, we include llms.txt presence in the Freshness & Hygiene category (10 points). Creating one takes 15 minutes. But fix access and readability first.
To deploy it, create llms.txt in the root of your site and serve it at yourdomain.com/llms.txt with a plain-text MIME type (text/plain or text/markdown).

There's also an llms-full.txt convention: a longer version that includes your full site content in a single file for AI systems that prefer batch ingestion over individual page crawling. It is more relevant for documentation sites and knowledge bases than for typical marketing sites.
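Most static hosts already serve .txt files as text/plain, so no code is usually needed. If your site runs on a custom application server, the idea can be sketched as a tiny WSGI app that answers /llms.txt with an explicit Content-Type. The `make_app` factory and the file path passed to it are hypothetical; adapt the pattern to whatever framework you actually use.

```python
from pathlib import Path


def make_app(llms_path):
    """Build a minimal WSGI app that serves one llms.txt file as plain text."""
    llms_path = Path(llms_path)

    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/llms.txt" and llms_path.is_file():
            body = llms_path.read_bytes()
            status = "200 OK"
        else:
            body = b"Not found\n"
            status = "404 Not Found"
        start_response(status, [
            # Explicit plain-text MIME type with a UTF-8 charset.
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

For a local check, the app can be passed to `wsgiref.simple_server.make_server` from the standard library and the response headers inspected with any HTTP client.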
There's no RFC or W3C standard for llms.txt. It's a community convention with growing adoption. It's known to be checked by Perplexity (which has explicit documentation on it), Claude.ai, and several smaller AI search engines. OpenAI and Google have not publicly confirmed whether they use it.
Given the low cost of creating one, deploy it whether or not every major engine currently reads it. The web has a long history of early-adopter conventions that later became table stakes.