
Sitemap and robots.txt: the SEO basics most people get wrong

A robots.txt file can block Google without anyone knowing. A missing sitemap prevents page discovery. Test #38 checks both in seconds.

Key Takeaways
  • Test #38 checks the presence and accessibility of sitemap.xml and robots.txt: both present = score 100, only one present = 70, neither = 20
  • If robots.txt contains "Disallow: /" for all user-agents, the score drops to 20 — your site is invisible to every search engine
  • These are basic configuration files. Fixing them takes 5 minutes, but the SEO impact is immediate and measurable

There are spectacular SEO mistakes — massive duplicate content, manual Google penalties. And then there are the silent ones. The kind that go unnoticed for months because nobody thinks to check two plain text files at the root of the site.

The sitemap.xml file tells search engines "here are the pages that exist on my site." The robots.txt file tells them "here are the areas you can access." When the first is missing, Google has to guess which pages exist. When the second is misconfigured, Google can be blocked without anyone knowing.

Orilyt's test #38 checks both files in a single pass. It verifies their presence, accessibility, consistency — and detects the critical case where robots.txt blocks all crawling. These are the SEO foundations. If they're shaky, nothing you build on top will hold.
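The scoring rule from the takeaways can be sketched as a small function. This is an illustrative simplification (Orilyt's actual implementation isn't public, and the function name is mine):

```python
def score_test_38(sitemap_ok: bool, robots_ok: bool, robots_blocks_all: bool) -> int:
    """Score the sitemap/robots.txt check as described in the article.

    A total crawl block ("Disallow: /" for all user-agents) caps the
    score at 20 no matter what else is present.
    """
    if robots_blocks_all:
        return 20
    if sitemap_ok and robots_ok:
        return 100
    if sitemap_ok or robots_ok:
        return 70
    return 20
```

Note that a perfectly valid sitemap is worthless if robots.txt blocks all crawling, which is why that case short-circuits everything else.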

Illustration: SEO test for sitemap.xml and robots.txt (accessibility check, format validation, and crawl directive analysis)

Sitemap.xml: your site's map for Google

A sitemap.xml is an XML file that lists all the URLs you want indexed. It sits at the site root (e.g., yoursite.com/sitemap.xml) and helps search engines discover your pages without having to follow every internal link.

Test #38 checks several things about the sitemap:

  1. Accessibility — is /sitemap.xml reachable (HTTP 200)? If it returns a 404 or 500, search engines cannot read it
  2. Detection via robots.txt — if robots.txt contains a "Sitemap:" directive, the test uses that URL first. This is the recommended way to declare the sitemap location
  3. Valid XML format — does the file contain a <urlset> or <sitemapindex> tag? A file that returns HTML or plain text is not a valid sitemap
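Checks 2 and 3 are easy to reproduce yourself. Here is a minimal Python sketch (function names are mine, not Orilyt's; the real test also fetches the URLs over HTTP and checks the status codes):

```python
import re

def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Extract 'Sitemap:' directive URLs from a robots.txt body.

    The field name is matched case-insensitively, one directive per line.
    """
    return [m.group(1) for m in re.finditer(r"(?im)^\s*sitemap:\s*(\S+)", robots_txt)]

def looks_like_sitemap(body: str) -> bool:
    """A valid sitemap has a <urlset> or <sitemapindex> root element.

    An HTML error page served with a 200 status fails this check, which
    catches the common 'soft 404' misconfiguration.
    """
    return "<urlset" in body or "<sitemapindex" in body
```

For example, `sitemap_urls_from_robots("User-agent: *\nSitemap: https://yoursite.com/sitemap.xml")` returns the declared sitemap URL, which you would then fetch and pass to `looks_like_sitemap`.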

Without a sitemap, Google can still index your site by following links. But it will do so more slowly, potentially missing orphan pages — those with no internal links pointing to them.

A sitemap doesn't guarantee indexation. But its absence guarantees that Google will have to guess your site's structure — and it will often guess wrong.

Robots.txt: your site's access controller

The robots.txt file is a plain text file at the site root (yoursite.com/robots.txt). It tells crawlers which parts of the site they can explore and which are off-limits.

Test #38 checks the critical aspects of robots.txt:

  1. Accessibility — is robots.txt present and reachable (HTTP 200)? Its absence isn't blocking, but having one is best practice
  2. Sitemap reference — does robots.txt contain a "Sitemap:" line pointing to sitemap.xml? This is the standard way to declare the sitemap location
  3. Total block — the critical case: if robots.txt contains "User-agent: *" followed by "Disallow: /", the entire site is blocked for all search engines. Immediate score: 20/100
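The total-block case from check 3 can be detected in a few lines. A simplified sketch, assuming one User-agent line per group (a production parser should also handle groups with multiple User-agent lines and Allow overrides, per the Robots Exclusion Protocol):

```python
def blocks_everything(robots_txt: str) -> bool:
    """Detect the critical case: 'User-agent: *' followed by 'Disallow: /'.

    Tracks whether the current rule group applies to all crawlers, and
    returns True as soon as that group disallows the root path.
    """
    applies_to_all = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            applies_to_all = (value == "*")
        elif field == "disallow" and applies_to_all and value == "/":
            return True
    return False
```

Run against the staging file described below (`User-agent: *` plus `Disallow: /`), this returns True; against a targeted rule like `Disallow: /wp-admin/`, it returns False.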

The most dangerous case is also the most common: a site pushed to production with a pre-production robots.txt that blocks all crawling. The developer added "Disallow: /" to prevent indexing of the staging site, then forgot to remove it. The site is live, works perfectly — but Google can't see it.

Common mistakes (and how to fix them)

Most sitemap and robots.txt problems come from the same source: files created once and never rechecked. Here are the most frequent mistakes:

  1. Missing sitemap — the site never had one, or the plugin that generated it was deactivated. Fix: enable the native WordPress sitemap feature (available since WP 5.5) or use an SEO plugin like Yoast or Rank Math
  2. Robots.txt blocks everything — inherited from development or staging. Fix: replace "Disallow: /" with targeted rules (block /wp-admin/ but not the rest). Verifiable in 10 seconds
  3. Outdated sitemap — the file exists but contains deleted URLs or pages returning 404. Fix: regenerate the sitemap via your SEO plugin. Most do it automatically when properly configured
  4. No sitemap reference in robots.txt — the sitemap exists, but robots.txt doesn't mention it. Fix: add a "Sitemap: https://yoursite.com/sitemap.xml" line at the end of robots.txt
  5. Wrong sitemap format — the file returns HTML instead of XML (custom error page returning a 200 status). Fix: verify that the sitemap URL returns proper XML with the correct Content-Type
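Putting fixes 2 and 4 together, a sane robots.txt for a typical WordPress site looks like this (replace the domain with your own; the Allow line for admin-ajax.php is a common WordPress convention, since some plugins need it for front-end features):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml
```

Crawling stays open for the whole public site, the admin area is excluded, and the sitemap location is declared where every crawler will find it.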

All of these fixes take less than 5 minutes. The effort-to-impact ratio is exceptional: a few lines of configuration can unblock the indexing of hundreds of pages.

The business value: a quick win for every audit

For freelancers and agencies, sitemap and robots.txt problems are gold in a client audit. They're easy to explain, fast to fix, and visually striking in the report.

In the Orilyt report, test #38 generates concrete FIA (Fact, Impact, Action) recommendations:

  1. Fact: "No accessible sitemap.xml file" or "robots.txt blocks all search engines (Disallow: /)"
  2. Impact: "Google doesn't know your site's structure" or "No page on your site can appear in search results"
  3. Action: "Generate a sitemap via your SEO plugin and add it to robots.txt" or "Remove the Disallow: / directive from robots.txt"

The case of a robots.txt blocking everything is particularly powerful in client meetings. When you show a client that their site has been literally invisible to Google for months, the urgency is immediate. The fix takes 2 minutes. The ROI of the audit is proven on the spot.

A site with a robots.txt blocking Google is like a store with the shutters down. The building is there, the products are on the shelves — but nobody can get in.

Two files, zero excuses

Sitemap.xml and robots.txt are the two most basic files in technical SEO. They require no budget, no advanced skills, no code changes. Just a 30-second check. And yet, thousands of sites live with a missing sitemap or a robots.txt that sabotages their visibility.

Orilyt's test #38 automates this check. It detects missing files, inconsistencies between robots.txt and the sitemap, and most importantly the critical case of total crawl blocking. It's a minimal SEO hygiene check — but an essential one.

If you run audits for clients, start here. A problem found here is fixed in 5 minutes and immediately demonstrates the value of your work. It's the perfect quick win.

Check any site's sitemap and robots.txt
Run a free audit and see if the SEO foundations are in place — sitemap, robots.txt, and 56 other automated tests.
Launch a free audit