
Sitemap Comparator

Diff two XML sitemaps in your browser. Find URLs only in A, only in B, common to both, and likely renames via similarity matching. 100% local.

100% Private & Secure

All processing happens locally in your browser. Your files never leave your device.

Client-Side Processing · No Server Uploads · No Registration Required

Keywords

sitemap diff, compare sitemaps, sitemap comparison tool, find renamed urls, url similarity, site migration audit, xml sitemap diff, seo migration checklist

How to use

1. Paste the first sitemap XML into the Sitemap A panel, or drop a saved sitemap.xml file onto the drop zone. Sitemap A is usually the 'before' state: your current production sitemap, the source you are migrating from, or a competitor snapshot.

2. Paste or upload the second sitemap into the Sitemap B panel. Sitemap B is usually the 'after' state: the new sitemap, staging build, or target you are migrating to.

3. Open Comparison options if you want to adjust how URLs are normalized. The defaults (ignore case, strip trailing slash, drop www., ignore http vs https, sort query parameters) catch most cosmetic differences. The similarity threshold defaults to 0.85; raise it to be stricter, lower it to surface more potential renames.

4. Click Compare sitemaps. The tool parses both XML inputs, normalizes every URL, and classifies them into four buckets: Common, Only in A, Only in B, and Similar.

5. Open each tab to review the results. The Similar tab pairs likely renames between the Only-in-A and Only-in-B sets. These are your redirect candidates. Export any tab to CSV to feed it into your redirect map, spreadsheet, or CMS migration tool.

Features

Four-Bucket Diff With Live Stats

Every URL is classified instantly into Common, Only in A, Only in B, or Similar. A live stats bar shows the count in each bucket so you can size the migration at a glance, which helps when triaging which redirect groups to handle first.

Similarity Detection for Renames

After normalization, the remaining diffs are run through a Levenshtein-based similarity matcher. Slug renames (/pricing-2024/ → /pricing/), article URL changes, and CMS path migrations surface as paired suggestions instead of disappearing into the Only-in-A / Only-in-B columns.

Configurable URL Normalization

Five normalization toggles let you tune how aggressive the matching is: ignore case, strip trailing slash, drop www., ignore http vs https, and sort query parameters. Each can be turned off when you specifically need to compare those dimensions.

Sitemap Index Aware

If either input is a <sitemapindex> rather than a <urlset>, the tool detects this and surfaces a clear warning, parsing the index entries so you know which child sitemaps to feed in next.

CSV Export for Every Bucket

Each result tab has a CSV export button. Drop the resulting file into Sheets, Excel, or any redirect-map builder to plan the migration without retyping URLs.

Why Choose This Tool?

Your URLs Never Leave Your Browser

The sitemap files you compare often include staging URLs, pre-launch content, gated client work, or competitor crawls you do not want sitting in a third-party SaaS log. Every byte of XML stays in your browser memory — there is no upload step, no network call, no server-side persistence.

Built For Real Migration Workflows

Most online sitemap diff tools give you a flat A vs B comparison and leave the redirect mapping as homework. The Similar tab does that homework for you: it pairs likely renames so the export becomes the first draft of your 301 map instead of a raw list of missing URLs.

Handles Cosmetic Noise Out Of The Box

Trailing slashes, www vs apex, http vs https, query parameter order, case differences — these are usually not real differences between sitemaps, just inconsistencies in how URLs are emitted. Smart normalization treats them as equal by default, so the diff focuses on URLs that genuinely changed.

Transparent, Open-Source Logic

The comparison runs from the open-source @anthropic-tools/tools-core library shared with the REST API. You can audit exactly how URLs are parsed, normalized, and matched — no black-box scoring, no hidden weighting.

Sitemap Diffing for Site Migrations: A Practical Survival Guide

Why Sitemap Diffs Are A Migration Cornerstone

Site migrations — domain changes, CMS swaps, redesigns, locale rollouts, IA overhauls — share a single failure mode: URLs that existed in the old site but are not redirected in the new one. Every unmapped URL is a 404 the moment the migration lands, which means lost crawl equity, lost ranking, lost referral traffic, and, if you depend on organic search, lost revenue. The job of the migration is not just to ship the new pages; it is to make sure every URL that mattered before still resolves to something useful after.

Sitemap diffing is the cheapest, fastest way to spot the gap. An XML sitemap is the site's own declaration of what it considers crawlable and indexable. Compared across a migration boundary — old sitemap vs new sitemap — the diff tells you exactly which URLs were left behind, which are new, and which look like renames that need explicit 301s. Crawler tools like Screaming Frog or Sitebulb give richer signals (status codes, response headers, internal link counts) but they are slower to run, more expensive, and overkill for a fast sanity check the day before a launch.
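The core of such a diff is small enough to sketch. The following Python example is an illustration only, not this tool's implementation: the function names `extract_locs` and `diff_sitemaps` and the sample XML are assumptions made for the example. It reads the <loc> values from two <urlset> documents and compares them as sets, without any normalization.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_text: str) -> set[str]:
    """Return the set of <loc> values found inside <url> entries."""
    root = ET.fromstring(xml_text)
    locs = set()
    for url in root.iter(f"{SITEMAP_NS}url"):
        loc = url.find(f"{SITEMAP_NS}loc")
        if loc is not None and loc.text:
            locs.add(loc.text.strip())
    return locs


def diff_sitemaps(xml_a: str, xml_b: str) -> dict[str, list[str]]:
    """Classify URLs with plain set operations (no normalization yet)."""
    a, b = extract_locs(xml_a), extract_locs(xml_b)
    return {
        "common": sorted(a & b),
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Hypothetical sample data for the illustration.
OLD_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing-2024/</loc></url>
</urlset>"""

NEW_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing/</loc></url>
</urlset>"""

result = diff_sitemaps(OLD_XML, NEW_XML)
print(result["only_in_a"])  # ['https://example.com/pricing-2024/']
```

A raw set comparison like this treats every cosmetic variant as a different URL, which is why a normalization pass normally runs first.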

What URL Normalization Buys You

Two sitemaps generated by different systems will almost never agree on cosmetic details: one emits trailing slashes, the other does not; one uses www., the other does not; one orders query parameters alphabetically, the other in declaration order; one writes hostnames in lowercase, the other in mixed case. These are rarely real differences in how the page is served, because most servers canonicalize them, but a naive string comparison treats them as completely different URLs.

Normalization collapses those cosmetic variants into a single canonical form before the comparison runs. By default this tool lowercases the host, strips trailing slashes on non-root paths, drops the www. prefix, ignores http vs https, and sorts query parameters alphabetically. The diff then operates on the normalized form, so URLs that only differ in cosmetic ways land in the Common bucket where they belong. Each normalization is a separate toggle: if you specifically care about, say, http vs https as a real difference, turn the protocol normalization off and see those URLs split into the Only-in-A and Only-in-B buckets.
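A minimal Python sketch of the five toggles described above follows. It is not the tool's actual code: the function name `normalize_url`, its parameter names, and the choice to lowercase the whole URL when `ignore_case` is on are assumptions for illustration. Fragments are dropped, and query values are re-encoded by `urlencode`, so the output is a comparison key rather than a URL to publish.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalize_url(
    url: str,
    ignore_case: bool = True,
    strip_trailing_slash: bool = True,
    drop_www: bool = True,
    ignore_scheme: bool = True,
    sort_query: bool = True,
) -> str:
    """Build a comparison key for a URL; each flag mirrors one toggle."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme          # urlsplit already lowercases the scheme
    host = parts.netloc
    path = parts.path or "/"       # treat a missing path as the root
    query = parts.query

    if ignore_case:
        host, path, query = host.lower(), path.lower(), query.lower()
    if drop_www and host.lower().startswith("www."):
        host = host[4:]
    if strip_trailing_slash and len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"   # never strip the root path itself
    if sort_query and query:
        pairs = sorted(parse_qsl(query, keep_blank_values=True))
        query = urlencode(pairs)

    prefix = "" if ignore_scheme else f"{scheme}://"
    return f"{prefix}{host}{path}" + (f"?{query}" if query else "")


key_a = normalize_url("HTTP://WWW.Example.com/Pricing/?b=2&a=1")
key_b = normalize_url("https://example.com/pricing?a=1&b=2")
print(key_a)           # example.com/pricing?a=1&b=2
print(key_a == key_b)  # True
```

Passing `ignore_scheme=False` keeps the scheme in the key, so the http and https variants of a page no longer collapse into one entry.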

Similarity Matching: Catching Renames Automatically

The hardest part of a migration audit is not the URLs that vanished or the URLs that were added — it is the URLs that were renamed. A slug change from /pricing-2024/ to /pricing/, a category restructure from /blog/category/foo/ to /articles/foo/, or a CMS migration that swaps /products/widget/ for /p/widget/ all look like one URL disappearing and another appearing. Without explicit help, they bury themselves in the Only-in-A and Only-in-B columns, easy to overlook.

After the normalized diff is computed, this tool runs Levenshtein-distance similarity across the still-different URLs from both sides. For each unmatched URL in A, it searches for the closest match in B and pairs them if the similarity meets the threshold. Similarity is 1 minus the edit distance divided by the length of the longer URL, so the 0.85 default allows an edit distance of at most 15% of that length after normalization. The result is the Similar tab: a side-by-side list of probable rename pairs with the similarity percentage, ready to be exported and turned into 301 rules.
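The score itself can be sketched in a few lines of Python. This is an illustrative implementation of normalized Levenshtein similarity, not the tool's source; the function names are assumptions, and so is the choice to compare host plus path with the scheme removed.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a                      # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1 minus edit distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3

short_score = similarity("example.com/pricing-2024", "example.com/pricing")
long_score = similarity(
    "example.com/guides/technical-seo-audit-checklist-2024",
    "example.com/guides/technical-seo-audit-checklist",
)
print(round(short_score, 4))  # 0.7917
print(round(long_score, 4))   # 0.9057
```

Under these assumptions the same five-character edit scores very differently depending on URL length: the short pair lands below 0.85 and the long pair above it. If short slugs are not showing up in the Similar tab, lowering the threshold is worth trying.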

Greedy uniqueness keeps the pairing clean: once a URL in B is paired with a URL in A, it is removed from the candidate pool. This means each A-URL gets at most one suggested B-target and vice versa — the suggestions are useful as a draft redirect map rather than a noisy any-to-any matrix. Tune the threshold up if you see false positives, down if you suspect real renames are being missed.
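The greedy pairing described above could look roughly like the Python sketch below. The function `pair_renames` and its first-come, first-served iteration order over A are assumptions for illustration; the tool's actual ordering and tie-breaking may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def pair_renames(only_a, only_b, threshold=0.85):
    """Pair each A URL with its closest remaining B URL, at most once each."""
    remaining = list(only_b)
    pairs = []
    for url_a in only_a:
        best_url, best_score = None, 0.0
        for url_b in remaining:
            score = similarity(url_a, url_b)
            if score > best_score:
                best_url, best_score = url_b, score
        if best_url is not None and best_score >= threshold:
            pairs.append((url_a, best_url, round(best_score, 4)))
            remaining.remove(best_url)   # greedy uniqueness: B target is used up
    return pairs


# Hypothetical normalized URLs for the illustration.
only_a = [
    "example.com/guides/technical-seo-audit-checklist-2024",
    "example.com/careers",
]
only_b = [
    "example.com/contact-sales",
    "example.com/guides/technical-seo-audit-checklist",
]
pairs = pair_renames(only_a, only_b)
print(pairs)
```

Because the pairing is greedy, an earlier A URL can claim a B target that a later A URL would have matched slightly better. That is a reasonable trade-off for a draft redirect map that a person reviews anyway.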

Reading The Diff: A Practical Workflow

The fastest way to use the output is a four-step pass:

  • Only in A → check for redirect coverage. Every URL here exists in the old sitemap but not the new one. If there is a matching entry in the Similar tab, plan a 301 to its B-side counterpart. If not, decide: was the page intentionally retired (in which case a 410, or a 301 to the closest topical replacement), or was it accidentally orphaned (in which case fix the migration before launch)?
  • Similar → review and approve. Each suggested pair is a draft redirect. Eyeball the similarity score: anything above 0.90 is usually a confident rename; 0.85–0.90 deserves a quick human check; below 0.85 (if you lowered the threshold) often pairs unrelated URLs that happen to share a substring.
  • Only in B → check for orphans and pollution. New URLs are expected, but watch for staging-only URLs that should not have made it into the production sitemap, faceted-navigation pages that should be noindex, or duplicates from a sloppy URL builder. The Only-in-B bucket is also where you spot pages that exist on the new site but have no internal link from the old site's content — those will need editorial backfills.
  • Common → spot-check for trailing-slash and protocol drift. Even with normalization, glance at the Common tab to confirm the URLs you expect on both sides actually landed there. A surprisingly small Common bucket is usually a sign that one sitemap has a systematic cosmetic difference your normalization toggles do not currently cover.
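The score bands from the Similar step can be applied mechanically before the human review. The Python sketch below is only an illustration: the `Pair` type, the `triage` function, and the band names are invented for the example, and the cut-offs are the rules of thumb from the list above rather than anything built into the tool.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    url_a: str
    url_b: str
    similarity: float


def triage(pairs, confident=0.90, default_threshold=0.85):
    """Sort suggested rename pairs into review bands by similarity score."""
    bands = {"confident": [], "quick_check": [], "suspect": []}
    for pair in pairs:
        if pair.similarity > confident:
            bands["confident"].append(pair)
        elif pair.similarity >= default_threshold:
            bands["quick_check"].append(pair)
        else:
            # Only reachable if the threshold was lowered below the default.
            bands["suspect"].append(pair)
    return bands


# Hypothetical scores for the illustration.
bands = triage([
    Pair("example.com/a-long-article-slug-2024", "example.com/a-long-article-slug", 0.93),
    Pair("example.com/blog/post-one", "example.com/articles/post-one", 0.87),
    Pair("example.com/team", "example.com/terms", 0.71),
])
for name, items in bands.items():
    print(name, len(items))
```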

Sitemap Indexes vs URL Sets

The sitemaps.org protocol allows two root types: <urlset>, which lists actual page URLs, and <sitemapindex>, which lists the URLs of other sitemap files. Large sites that exceed the 50,000-URL or 50 MB (uncompressed) per-sitemap limit split their sitemaps into multiple files and ship an index. If you feed a sitemap index into the comparator, the diff will not be on real page URLs; it will be on the URLs of the child sitemaps themselves, which is rarely what you want.

The tool detects this case and surfaces a warning at the top of the affected panel. The fix is to download each child sitemap separately and feed them in pairs, or to merge them locally before comparing. Future versions may handle index expansion automatically; for now the explicit warning prevents the silent footgun of comparing index entries instead of page URLs.
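Detecting the root type before diffing is straightforward. The Python sketch below shows one way to do it with the standard library; `root_kind` and `child_sitemap_urls` are hypothetical helper names, and the sample index is made up for the example.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def root_kind(xml_text: str) -> str:
    """Return 'urlset', 'sitemapindex', or 'unknown' based on the root element."""
    root = ET.fromstring(xml_text)
    local_name = root.tag.rsplit("}", 1)[-1]   # drop the "{namespace}" prefix
    return local_name if local_name in ("urlset", "sitemapindex") else "unknown"


def child_sitemap_urls(xml_text: str) -> list[str]:
    """List the <loc> of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(f"{NS}sitemap/{NS}loc")
        if loc.text
    ]


# Hypothetical sitemap index for the illustration.
INDEX_XML = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
</sitemapindex>"""

if root_kind(INDEX_XML) == "sitemapindex":
    print("Sitemap index detected; download and compare these children instead:")
    for child in child_sitemap_urls(INDEX_XML):
        print(" ", child)
```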

Where Sitemap Diffs Fit In A Bigger Migration Plan

A sitemap diff is a necessary but not sufficient step. The strongest migration audits combine: (a) a sitemap diff like this tool to surface the URL-level deltas; (b) a server-log analysis to spot URLs Google crawls that are not in the sitemap at all (often the long tail with the most accumulated equity); (c) a Google Search Console export of top-performing pages to confirm none of the top traffic URLs are in the Only-in-A bucket without a redirect; (d) a post-launch crawl with status-code checks to verify every planned redirect is actually wired up correctly in production. Treat the sitemap diff as the first 80% of the audit you can run in 30 seconds, then layer the more expensive techniques on top for the high-stakes pages.

Privacy: Why Local Processing Matters Here

Sitemap files routinely include pre-launch URLs, gated content, staging environments, client work under NDA, and competitor crawls you would rather not have logged by a third-party service. Browser-side processing — the entire pipeline runs in the same tab that loaded the page — means the XML you paste never crosses a network boundary. You can verify this in your browser's DevTools network panel: after the page loads, the Compare button triggers zero HTTP requests. The same comparison logic is available as an open-source library if you need to run it in a CI pipeline or a Node.js script, but the web tool itself is fully self-contained.

Frequently Asked Questions

Is my sitemap data sent to a server?

No. Parsing, normalization, and matching all run inside your browser tab using JavaScript loaded from a static site. You can confirm this in your browser's network panel — clicking Compare triggers no outbound requests.

What is the maximum sitemap size I can compare?

There is no hard cap, but for very large sitemaps (tens of thousands of URLs on each side) the similarity step is limited by an internal cap on the number of URL pairs it evaluates, which keeps the comparison fast. When that cap is hit, the tool surfaces a warning so you know some similarity suggestions may be missing.

Does it support <sitemapindex> files?

Partially. The tool detects when the input is a sitemap index and parses its child-sitemap URLs, but it does not fetch those children automatically: browsers block cross-origin fetches unless the remote server sends CORS headers that allow them. Download each child sitemap separately and feed them in pairs, or merge them locally first.

What does the similarity threshold actually mean?

It is the minimum normalized Levenshtein similarity (1 minus edit distance divided by the length of the longer URL) required to pair two URLs as a likely rename. At 0.85, the edit distance between the two normalized URLs can be at most 15% of the longer URL's length. Raise it for stricter matching, lower it to surface more candidates.

Why are URLs normalized before comparison?

Sitemaps from different systems often disagree on cosmetic details — trailing slashes, www prefixes, http vs https, query parameter ordering, case. None of those are usually real differences in how the page is served. Normalization collapses them so the diff focuses on URLs that genuinely changed.

Which XML formats are supported?

Standard sitemaps.org 0.9 schema with <urlset> and <url><loc> entries, optionally including <lastmod>. Sitemap indexes (<sitemapindex> with <sitemap><loc>) are recognized but flagged with a warning. Image and video sitemap extensions are parsed for their <loc> values but the extra metadata is currently ignored.

What happens to <lastmod>, <changefreq>, and <priority>?

<lastmod> is preserved and shown in the result table alongside each URL. <changefreq> and <priority> are not part of the comparison: Google has publicly stated it ignores them, and the diff is on URLs, not metadata.

What is the CSV format on export?

Common, Only-in-A, and Only-in-B exports have two columns: url and lastmod. The Similar export has four: url_a, url_b, similarity (0–1, four decimals), and edit_distance. Values containing commas, quotes, or newlines are properly quoted per RFC 4180.
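To show what the Similar export looks like, here is a Python sketch that writes the four columns with the standard `csv` module, whose default quoting wraps fields containing commas, quotes, or newlines and doubles embedded quotes. The function `similar_to_csv` and the sample rows are assumptions for the example, not the tool's exporter.

```python
import csv
import io


def similar_to_csv(rows) -> str:
    """Serialize (url_a, url_b, similarity, edit_distance) rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)   # default dialect: minimal quoting, CRLF line endings
    writer.writerow(["url_a", "url_b", "similarity", "edit_distance"])
    for url_a, url_b, score, distance in rows:
        writer.writerow([url_a, url_b, f"{score:.4f}", distance])
    return buffer.getvalue()


# Hypothetical rows; the second URL contains a comma to show the quoting.
csv_text = similar_to_csv([
    ("example.com/pricing-2024", "example.com/pricing", 0.7917, 5),
    ("example.com/search?tags=a,b", "example.com/search?tags=a,c", 0.963, 1),
])
print(csv_text)
```

Reading the text back with `csv.reader` returns the original values unchanged, which is a quick way to confirm that a downstream spreadsheet or redirect-map builder will not split a URL at an embedded comma.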
