The most damaging machine-readable web mistakes are using div-soup instead of semantic HTML, publishing invalid or missing structured data, locking content inside images or JavaScript, using inconsistent entity names, ignoring heading hierarchy, skipping validation tools, and marking up content that never appears on the page. Each mistake directly reduces how well search engines and AI systems can parse, trust, and surface your content.
Search engines have always been software parsing text. Now AI agents, large language models, and generative search surfaces are doing the same — except they go further, extracting entities, relationships, and factual claims from your pages to feed into answers and citations. If your pages are not structured for machines to read, you are invisible to a growing share of the web’s discovery layer.
These seven mistakes are not theoretical. Terry Samuels and the practitioners at Salterra University see them on client sites weekly — sites that wonder why their perfectly written content never ranks or gets cited by AI. Let’s go through each one, explain why it hurts, and show you the fix.
The mistake: Wrapping every element in a generic <div> or <span> with no semantic meaning. Navigation built from divs. Articles wrapped in divs. Sidebars, footers, headers — all divs.
Why it hurts: When a crawler or language model processes your page, semantic elements carry meaning that divs do not. An <article> tag tells a parser “this is a self-contained piece of content.” A <nav> tag signals navigation to be skipped when extracting body content. A <main> tag identifies the primary content region. Without these signals, parsers have to guess — and they often guess wrong, pulling in boilerplate navigation text instead of your actual article, or failing to identify where the real content begins.
The fix: Audit your HTML output and replace meaningless wrapper divs with the correct semantic elements: <header>, <main>, <article>, <section>, <aside>, <nav>, <footer>. Use <figure> and <figcaption> for images with captions. Use <time> with a machine-readable datetime attribute for publication dates. The HTML5 element specification exists precisely so machines know what they are looking at.
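One way to start that audit is to count tags. The sketch below is a rough illustration using only Python's standard library; the function name audit_semantics, the two sample snippets, and the choice of main, article, and nav as the landmarks to check are assumptions made for this example, not part of any standard tool.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements recommended above in place of generic wrappers.
SEMANTIC_TAGS = {"header", "main", "article", "section", "aside",
                 "nav", "footer", "figure", "figcaption", "time"}
GENERIC_TAGS = {"div", "span"}


class TagCounter(HTMLParser):
    """Counts how often each start tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def audit_semantics(html: str) -> dict:
    """Return generic vs. semantic tag totals and any landmark that never appears."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    counts = parser.counts
    return {
        "generic": sum(counts[t] for t in GENERIC_TAGS),
        "semantic": sum(counts[t] for t in SEMANTIC_TAGS),
        # Hypothetical minimum set of landmarks for an article page.
        "missing_landmarks": sorted(
            t for t in ("main", "article", "nav") if counts[t] == 0
        ),
    }


# Hypothetical sample markup: the same content as div-soup and as semantic HTML.
div_soup = '<div class="nav"><div>Home</div></div><div class="post"><div>Body text</div></div>'
semantic = "<nav><a href='/'>Home</a></nav><main><article><p>Body text</p></article></main>"

print(audit_semantics(div_soup))
print(audit_semantics(semantic))
```

A high generic count is not a problem by itself, since divs are still fine for layout. The signal to look for is a page where the landmark list comes back non-empty.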
The mistake: Publishing pages with no schema.org markup at all, or copy-pasting JSON-LD snippets that contain errors, outdated properties, or mismatched types.
Why it hurts: Structured data is the fastest and most explicit way to communicate factual claims to machines. A valid Article schema with a named author and datePublished gives Google explicit authorship and freshness information that supports your E-E-A-T signals. A FAQPage schema describes your questions and answers in machine-readable form, although since 2023 Google has limited FAQ rich results to a narrow set of authoritative government and health sites. Invalid markup (unclosed brackets, wrong property names, required fields left blank) gets silently ignored by parsers. You put in the work and get none of the benefit.
The fix: Implement structured data for every major content type on your site: Article (or BlogPosting) for editorial content, BreadcrumbList for navigation context, Organization and Person on your About page, FAQPage where appropriate. After implementing, run every page through Google’s Rich Results Test and the Schema Markup Validator at validator.schema.org. Fix every error before publishing. Treat structured data like code: it either works or it does not.
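Generating JSON-LD from code rather than pasting snippets removes the unclosed-bracket class of error entirely. The following is a minimal sketch, assuming a plain Article object. The helper names, the EXPECTED_FIELDS checklist, and the sample headline and date are all illustrative assumptions; the checklist is not Google's official list of required properties.

```python
import json

# Properties this sketch insists on. Illustrative only, not an official requirement list.
EXPECTED_FIELDS = ("@context", "@type", "headline", "author", "datePublished")


def build_article_jsonld(headline: str, author_name: str, date_published: str) -> dict:
    """Assemble a minimal schema.org Article object as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }


def missing_fields(data: dict) -> list:
    """List expected properties that are absent or empty."""
    return [field for field in EXPECTED_FIELDS if not data.get(field)]


def to_script_tag(data: dict) -> str:
    """Serialize with json.dumps so brackets and quoting are always well formed."""
    # Escaping "</" keeps a stray "</script>" inside a string from ending the block early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


# Hypothetical headline and date, used only to show the output shape.
article = build_article_jsonld("Seven Machine-Readable Web Mistakes", "Terry Samuels", "2024-01-15")
print(missing_fields(article))
print(to_script_tag(article))
```

This only guarantees well-formed JSON with the fields you chose to require. It does not replace the validators named above, which check the markup against the schema.org vocabulary and Google's rich result requirements.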
The mistake: Placing important text — key statistics, product specifications, contact information, pricing — inside images, or rendering body content exclusively through client-side JavaScript that requires execution to appear.
Why it hurts: Most AI systems and many crawlers do not execute JavaScript or read image pixels as text. Even Googlebot, which does render JavaScript, queues pages for rendering as a separate step that can lag behind the initial crawl. Content that lives only inside an image is invisible to every machine that cannot perform OCR, which is most of them. Content that requires JavaScript execution may never be indexed by crawlers operating at scale, and is almost certainly invisible to AI agents pulling raw HTML for analysis.
The fix: Keep all substantive content in raw HTML. Use alt text on every image, but do not treat alt text as a substitute for putting important information in the document itself. If you use a JavaScript framework, implement server-side rendering (SSR) or static site generation so the full HTML arrives in the initial server response. Check your pages by viewing source: if the content is not in the source HTML, machines cannot reliably read it.
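The view-source check can be automated roughly as follows. This sketch extracts the text a non-rendering crawler would see (everything outside script and style blocks) and reports which key phrases are absent. The function names, the two sample pages, and the price string are hypothetical, and the substring match is deliberately naive.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes from raw HTML, skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def raw_html_text(html: str) -> str:
    """Return the text present in the HTML source without executing any JavaScript."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def phrases_missing_from_source(html: str, phrases: list) -> list:
    """Return the phrases that do not occur in the raw HTML text (case-insensitive)."""
    text = raw_html_text(html).lower()
    return [p for p in phrases if p.lower() not in text]


# Hypothetical pages: a client-rendered shell and its server-rendered equivalent.
client_shell = '<html><body><div id="root"></div><script>render({price: "$49 per month"})</script></body></html>'
server_rendered = "<html><body><main><p>Plans start at $49 per month.</p></main></body></html>"

print(phrases_missing_from_source(client_shell, ["$49 per month"]))
print(phrases_missing_from_source(server_rendered, ["$49 per month"]))
```

To use it on a live page, pass in the HTML exactly as the server returns it, before any browser rendering. A phrase that is split across several elements may be reported as missing, so keep the test phrases short.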
The mistake: Referring to the same person, brand, place, or concept under different names across your site and across the web — “Terry Samuels” on one page, “T. Samuels” on another, “Terry S.” in a byline, and a completely different handle in social profiles.
Why it hurts: Search engines and AI systems build knowledge graphs by linking mentions of the same entity across sources. Inconsistent naming breaks those links. Google cannot confidently connect your author bylines to a Knowledge Panel. AI systems sourcing factual claims about your organization may fail to recognize that two mentions refer to the same entity, diluting your authority signals and making co-citation analysis unreliable.
The fix: Choose a canonical form for every important entity — your name, your brand, your products, your location — and use it consistently everywhere: your website, your structured data, your social profiles, your guest posts, your press mentions. In schema markup, use the sameAs property to explicitly link your entity to its authoritative references: Wikipedia, Wikidata, LinkedIn, Google Business Profile. Consistency is how machines build trust.
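As a small illustration of both halves of that fix, the sketch below builds a Person object with a sameAs list and flags byline variants that drift from the canonical form. The profile URLs are placeholders on example.com, and the helper names are assumptions for this example.

```python
import json

CANONICAL_NAME = "Terry Samuels"  # canonical form chosen once and reused everywhere


def person_jsonld(name: str, same_as: list) -> dict:
    """Build a schema.org Person whose sameAs links point at authoritative profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "sameAs": same_as,
    }


def noncanonical_mentions(canonical: str, mentions: list) -> list:
    """Return mentions that differ from the canonical form after trimming whitespace."""
    return [m for m in mentions if m.strip() != canonical]


# Placeholder profile URLs. Substitute the entity's real authoritative pages.
profiles = [
    "https://www.example.com/profiles/terry-samuels",
    "https://social.example.com/in/terry-samuels",
]
# Byline variants taken from the example in the text above.
bylines = ["Terry Samuels", "T. Samuels", "Terry S."]

print(json.dumps(person_jsonld(CANONICAL_NAME, profiles), indent=2))
print(noncanonical_mentions(CANONICAL_NAME, bylines))
```

The comparison is an exact string match on purpose: for entity names, a near match is still a mismatch worth fixing by hand.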
The mistake: Using heading tags in whatever size looks best visually — skipping levels, using multiple H1s, using H4s where H2s belong, or styling non-heading text with heading tags to make it bold and large.
Why it hurts: Heading tags are a machine-readable outline of your document. Crawlers and AI systems use them to understand the topic structure of a page — what the page covers, what subtopics it contains, how ideas relate to each other. A broken heading hierarchy produces a broken outline. The parser cannot identify your main topic, your subtopics, or the logical flow of your argument. This directly affects how your content is chunked and indexed for retrieval.
The fix: One H1 per page, matching the primary topic. H2s for major sections. H3s for subsections within those sections. Never skip a level. Never use heading tags for decorative text. Your headings should read like a table of contents — if you extracted only the heading text, a reader (or machine) should understand exactly what the page covers and how it is organized. Use your browser’s accessibility inspector to audit heading structure before publishing.
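These rules are mechanical enough to check in code. The following sketch, with a hypothetical audit_headings helper and made-up sample markup, collects heading levels in document order and reports two of the problems described above: an H1 count other than one, and any jump that skips a level.

```python
import re
from html.parser import HTMLParser

HEADING = re.compile(r"^h([1-6])$")


class HeadingCollector(HTMLParser):
    """Records the level of every h1-h6 element in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        match = HEADING.match(tag)
        if match:
            self.levels.append(int(match.group(1)))


def audit_headings(html: str) -> list:
    """Return a list of hierarchy problems; an empty list means the outline is clean."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    problems = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    previous = 0
    for level in parser.levels:
        # Going deeper by more than one step means a level was skipped.
        if level > previous + 1:
            origin = f"h{previous}" if previous else "the start of the page"
            problems.append(f"h{level} after {origin} skips a level")
        previous = level
    return problems


# Hypothetical outlines: one clean, one with a second h1 and a skipped level.
print(audit_headings("<h1>Guide</h1><h2>Setup</h2><h3>Install</h3><h2>Usage</h2>"))
print(audit_headings("<h1>Guide</h1><h4>Setup</h4><h1>Another title</h1>"))
```

Moving back up the outline (an H2 after an H3) is allowed, since that is how a new section starts. The check cannot tell whether a heading is decorative, so that part of the audit still needs a human read.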
The mistake: Writing schema markup and semantic HTML by hand — or generating it with a tool — and then never testing it. Assuming it works because it was published.
Why it hurts: Structured data errors are silent. A malformed schema does not throw a visible error on the page. It just gets ignored. Invalid HTML passes visually in every browser. Meanwhile, your carefully written Article schema has a misspelled property name and has been returning zero signals to Google for six months. You have no idea, because you never checked.
The fix: Build testing into your publishing workflow, not as an afterthought. Use Google’s Rich Results Test for schema validation. Use the W3C Markup Validation Service for markup quality. Use the Enhancements reports in Google Search Console to monitor structured data errors at scale. Set a calendar reminder to audit structured data health monthly. Many practitioners also use third-party browser extensions such as Structured Data Testing Tool to check markup during development before pages go live.
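A pre-publish step in that workflow might look like the sketch below, which pulls every JSON-LD block out of a page and confirms that it at least parses and declares a type. The class and function names are assumptions for illustration. This is a syntax-level smoke test only: it will not catch a misspelled property name, which is what the validators listed above are for.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def check_jsonld(html: str) -> list:
    """Return one readable error per JSON-LD block that fails a basic check."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    errors = []
    for index, raw in enumerate(parser.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"block {index}: invalid JSON ({exc.msg})")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            # A top-level @graph wrapper carries its types on the nested nodes.
            if not isinstance(item, dict) or ("@type" not in item and "@graph" not in item):
                errors.append(f"block {index}: missing @type")
    return errors


# Hypothetical page fragment with an unclosed brace in its JSON-LD.
broken = '<script type="application/ld+json">{"@type": "Article", "headline": "x"</script>'
print(check_jsonld(broken))
```

Running a check like this in a build or publishing hook turns a silent failure into a visible one before the page goes live.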
The mistake: Adding structured data properties — review scores, prices, product availability, author credentials — that do not correspond to content a human can actually see on the page.
Why it hurts: Google’s spam policies explicitly prohibit structured data that describes content not present on the page. This is called “misleading structured data” and it can result in manual actions that strip your rich result eligibility entirely. Beyond penalties, it is a trust problem: if your schema claims a 4.9-star rating but no reviews appear on the page, you are asking machines to surface claims you cannot substantiate. AI systems and crawlers increasingly cross-reference structured data against visible page content to detect this pattern.
The fix: Apply a simple rule — every claim in your structured data must be visible to a human reading the page. If you mark up author, the author’s name must appear in the byline. If you mark up aggregateRating, the rating and review count must appear in the visible content. If you mark up a dateModified, the update date should be visible on the page. Structured data should describe the page. It should never be used to inject claims the page does not make.
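That rule can be spot-checked by comparing marked-up values against the page's visible text. The sketch below is a rough illustration covering only the author name and aggregate rating; the helper names, the sample object, and its values are hypothetical. The substring match is naive (a review count of 12 would match the text "2012"), so treat a clean result as a hint rather than proof.

```python
def claimed_values(data: dict) -> dict:
    """Pull a few commonly marked-up claims out of a JSON-LD dict as strings."""
    claims = {}
    author = data.get("author")
    if isinstance(author, dict) and author.get("name"):
        claims["author.name"] = str(author["name"])
    rating = data.get("aggregateRating")
    if isinstance(rating, dict):
        for key in ("ratingValue", "reviewCount"):
            if key in rating:
                claims[f"aggregateRating.{key}"] = str(rating[key])
    return claims


def unsupported_claims(data: dict, visible_text: str) -> list:
    """Return the claim paths whose values never appear in the visible text."""
    text = visible_text.lower()
    return sorted(
        path for path, value in claimed_values(data).items()
        if value.lower() not in text
    )


# Hypothetical structured data and two versions of the page's visible text.
markup = {
    "@context": "https://schema.org",
    "@type": "Book",
    "author": {"@type": "Person", "name": "Terry Samuels"},
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4.9, "reviewCount": 87},
}
print(unsupported_claims(markup, "By Terry Samuels. A guide to structured data."))
print(unsupported_claims(markup, "By Terry Samuels. Rated 4.9 out of 5 from 87 reviews."))
```

Any path in the first result is a claim the markup makes that the page does not, which is exactly the pattern the rule above is meant to prevent.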
Making a page machine-readable means structuring your content so that automated systems (search engine crawlers, AI language models, and data agents) can parse it accurately without human interpretation. This involves using semantic HTML elements with clear meaning, implementing schema.org structured data, maintaining consistent entity references, and organizing content with a logical heading hierarchy. The goal is to remove ambiguity from your pages so machines extract exactly what you intend.
Structured data is not a direct ranking factor in the traditional sense, but it influences rankings indirectly in several ways. It enables rich results that can increase click-through rates. It clarifies entity relationships that contribute to E-E-A-T signals. It helps AI-powered search surfaces cite and surface your content accurately. In competitive niches, correctly implemented structured data tends to be an advantage, particularly in AI-generated answer results and featured snippet placements.
You need both semantic HTML and structured data; the two work together rather than substituting for each other. Semantic HTML gives parsers structural context: where the article starts, where navigation ends, what the main content region is. Structured data gives parsers factual claims: who wrote it, when, what it is about. Removing either one leaves machines with incomplete information. Strong machine-readable pages use both: clean semantic markup as the foundation and accurate structured data as the explicit annotation layer on top.
The practitioners at Salterra Digital Services have built a full curriculum covering semantic HTML, schema.org implementation, entity optimization, and AI search readiness. Salterra University teaches these skills through practitioner-led training grounded in real client work — not generic SEO theory. If you want to build machine-readable pages that perform in both traditional and AI-powered search, that is exactly where the training lives.
Terry has 30+ years in software and SEO. He’s the founder of Salterra Digital Services and SEO Spring Training, host of the Roundtable SEO Mastermind, and lead instructor at SEO University — teaching the exact tactics his team uses on client work.
This guide is one lesson from The Machine-Readable Web course. Get every lesson, framework and checklist, plus the full 38-course catalog, inside SEO University.