7 Machine-Readable Web Mistakes That Kill Your Results

The 7 Biggest Machine-Readable Web Mistakes

The most damaging machine-readable web mistakes are using div-soup instead of semantic HTML, publishing invalid or missing structured data, locking content inside images or JavaScript, using inconsistent entity names, ignoring heading hierarchy, skipping validation tools, and marking up content that never appears on the page. Each mistake directly reduces how well search engines and AI systems can parse, trust, and surface your content.

Search engines have always been software parsing text. Now AI agents, large language models, and generative search surfaces are doing the same — except they go further, extracting entities, relationships, and factual claims from your pages to feed into answers and citations. If your pages are not structured for machines to read, you are invisible to a growing share of the web’s discovery layer.

These seven mistakes are not theoretical. Terry Samuels and the practitioners at Salterra University see them on client sites weekly, on sites whose owners wonder why their well-written content never ranks or gets cited by AI. Let’s go through each one, explain why it hurts, and show you the fix.

Mistake 1: Div-Soup Instead of Semantic HTML

The mistake: Wrapping every element in a generic <div> or <span> with no semantic meaning. Navigation built from divs. Articles wrapped in divs. Sidebars, footers, headers — all divs.

Why it hurts: When a crawler or language model processes your page, semantic elements carry meaning that divs do not. An <article> tag tells a parser “this is a self-contained piece of content.” A <nav> tag signals navigation to be skipped when extracting body content. A <main> tag identifies the primary content region. Without these signals, parsers have to guess — and they often guess wrong, pulling in boilerplate navigation text instead of your actual article, or failing to identify where the real content begins.

The fix: Audit your HTML output and replace meaningless wrapper divs with the correct semantic elements: <header>, <main>, <article>, <section>, <aside>, <nav>, <footer>. Use <figure> and <figcaption> for images with captions. Use <time datetime=""> with a machine-readable date value for publication dates. The HTML specification defines these elements precisely so machines know what they are looking at.
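A first-pass audit can be automated. The sketch below is one possible approach, not a standard tool: it uses Python’s built-in html.parser to count semantic elements against generic wrappers and to flag missing landmarks. The names audit_semantics, SEMANTIC_TAGS, and the two sample strings are illustrative assumptions.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements named in the fix above, plus the generic wrappers they replace.
SEMANTIC_TAGS = {"header", "main", "article", "section", "aside",
                 "nav", "footer", "figure", "figcaption", "time"}
GENERIC_TAGS = {"div", "span"}


class TagCounter(HTMLParser):
    """Counts how many times each opening tag appears."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def audit_semantics(html: str) -> dict:
    """Return semantic/generic tag totals and any missing key landmarks."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    semantic = sum(n for tag, n in parser.counts.items() if tag in SEMANTIC_TAGS)
    generic = sum(n for tag, n in parser.counts.items() if tag in GENERIC_TAGS)
    # Which landmarks to require is a judgment call; these three are an assumption.
    missing = sorted({"main", "article", "nav"} - set(parser.counts))
    return {"semantic": semantic, "generic": generic, "missing_landmarks": missing}


# Hypothetical sample markup for illustration.
DIV_SOUP = '<div class="nav"><span>Home</span></div><div class="post"><div>Body text</div></div>'
SEMANTIC = '<nav><a href="/">Home</a></nav><main><article><p>Body text</p></article></main>'

if __name__ == "__main__":
    print(audit_semantics(DIV_SOUP))
    print(audit_semantics(SEMANTIC))
```

A high generic count is not a problem by itself, since divs are still fine for layout. The signal to act on is a page with no <main>, <article>, or <nav> at all.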

Mistake 2: Missing or Invalid Structured Data

The mistake: Publishing pages with no schema.org markup at all, or copy-pasting JSON-LD snippets that contain errors, outdated properties, or mismatched types.

Why it hurts: Structured data is the most explicit way to communicate factual claims to machines. A valid Article schema with a named author and datePublished gives Google unambiguous authorship and date information that supports your E-E-A-T signals. A FAQPage schema can make your questions eligible for rich results, although since 2023 Google has shown FAQ rich results only for a narrow set of well-known government and health sites, so do not count on the visual treatment. Invalid markup (unclosed brackets, wrong property names, required fields left blank) is typically ignored without any visible error. You put in the work and get none of the benefit.

The fix: Implement structured data for every major content type on your site: Article (or BlogPosting) for editorial content, BreadcrumbList for navigation context, Organization and Person on your About page, FAQPage where appropriate. After implementing, run every page through Google’s Rich Results Test and the Schema Markup Validator at validator.schema.org. Fix every error before publishing. Treat structured data like code: it either works or it does not.
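Generating JSON-LD from code rather than pasting it by hand removes the unclosed-bracket class of error. The following is a minimal sketch, assuming an Article with a single Person author. The function names, the EXPECTED_FIELDS list, and the sample date are illustrative assumptions, and the field list is not Google’s official requirement set.

```python
import json

# Fields this sketch treats as required. This is an assumption for
# illustration, not an official list from Google or schema.org.
EXPECTED_FIELDS = {"@context", "@type", "headline", "author", "datePublished"}


def build_article_jsonld(headline: str, author_name: str, date_published: str) -> dict:
    """Build a schema.org Article object as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date string
    }


def missing_fields(data: dict) -> list:
    """Return expected fields that are absent or left blank."""
    return sorted(f for f in EXPECTED_FIELDS if not data.get(f))


def to_script_tag(data: dict) -> str:
    """Serialize the object into a JSON-LD script element."""
    body = json.dumps(data, indent=2)  # json.dumps always emits balanced brackets
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical values for illustration.
    article = build_article_jsonld(
        "7 Machine-Readable Web Mistakes", "Terry Samuels", "2024-01-15"
    )
    print(missing_fields(article))
    print(to_script_tag(article))
```

A local check like missing_fields catches blanks early, but it does not replace the Rich Results Test, which knows the actual requirements for each rich result type.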

Mistake 3: Content Locked Inside Images or JavaScript

The mistake: Placing important text — key statistics, product specifications, contact information, pricing — inside images, or rendering body content exclusively through client-side JavaScript that requires execution to appear.

Why it hurts: Many AI crawlers and other bots do not execute JavaScript or read image pixels as text. Even Googlebot, which does render JavaScript, queues pages for rendering after the initial crawl, so JavaScript-dependent content can be indexed later than content in the initial HTML. Content that lives only inside an image is invisible to every machine that does not perform OCR, which is most of them. Content that requires JavaScript execution may never be indexed by crawlers operating at scale, and is almost certainly invisible to AI agents pulling raw HTML for analysis.

The fix: Keep all substantive content in raw HTML. Use alt text on every image, but do not treat alt text as a substitute for putting important information in the document itself. If you use a JavaScript framework, implement server-side rendering (SSR) or static site generation so the full HTML is delivered in the initial server response. Check your pages by viewing source: if the content is not in the source HTML, machines cannot reliably read it.
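The view-source check can be scripted. The sketch below shows one way to do it, under the assumption that you already have the raw HTML as a string (for example, fetched without running JavaScript). It extracts text outside script and style elements and reports which key phrases are absent. The class and function names, the sample pages, and the sample price are hypothetical.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collects text that exists in the HTML source without running scripts."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore anything inside script/style; keep other non-empty text.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html: str) -> str:
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def phrases_missing_from_source(html: str, phrases: list) -> list:
    """Return the phrases that do not appear in the static text of the page."""
    text = static_text(html).lower()
    return [p for p in phrases if p.lower() not in text]


# Hypothetical pages: one renders its price client-side, one ships it in HTML.
CLIENT_RENDERED = '<html><body><div id="root"></div><script>render("Price: $49 per month")</script></body></html>'
SERVER_RENDERED = '<html><body><main><p>Price: $49 per month</p></main></body></html>'

if __name__ == "__main__":
    print(phrases_missing_from_source(CLIENT_RENDERED, ["Price: $49"]))
    print(phrases_missing_from_source(SERVER_RENDERED, ["Price: $49"]))
```

Running this against a short list of must-have phrases per template (price, phone number, key specification) is usually enough to reveal a page whose content only exists after JavaScript runs.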

Mistake 4: Inconsistent Entity Names and References

The mistake: Referring to the same person, brand, place, or concept under different names across your site and across the web — “Terry Samuels” on one page, “T. Samuels” on another, “Terry S.” in a byline, and a completely different handle in social profiles.

Why it hurts: Search engines and AI systems build knowledge graphs by linking mentions of the same entity across sources. Inconsistent naming breaks those links. Google cannot confidently connect your author bylines to a Knowledge Panel. AI systems sourcing factual claims about your organization may fail to recognize that two mentions refer to the same entity, diluting your authority signals and making co-citation analysis unreliable.

The fix: Choose a canonical form for every important entity — your name, your brand, your products, your location — and use it consistently everywhere: your website, your structured data, your social profiles, your guest posts, your press mentions. In schema markup, use the sameAs property to explicitly link your entity to its authoritative references: Wikipedia, Wikidata, LinkedIn, Google Business Profile. Consistency is how machines build trust.
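Both halves of this fix can be sketched in a few lines. The example below assumes you keep a list of known non-canonical spellings and scans page text for them, then builds a Person object whose sameAs array points at external profiles. The variant list, page paths, helper names, and profile URLs are placeholders, not real references.

```python
import json

CANONICAL_NAME = "Terry Samuels"
# Non-canonical spellings to hunt for, taken from the example in this section.
KNOWN_VARIANTS = ["T. Samuels", "Terry S."]


def find_variants(pages: dict) -> dict:
    """Map each page path to the non-canonical name variants found in its text."""
    report = {}
    for path, text in pages.items():
        hits = [variant for variant in KNOWN_VARIANTS if variant in text]
        if hits:
            report[path] = hits
    return report


def person_jsonld(name: str, same_as: list) -> dict:
    """Build a schema.org Person that links the entity to its other profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "sameAs": list(same_as),
    }


if __name__ == "__main__":
    # Hypothetical page text keyed by path.
    pages = {
        "/about": "Written by Terry Samuels",
        "/blog/post-1": "By T. Samuels",
        "/blog/post-2": "Byline: Terry S.",
    }
    print(find_variants(pages))

    # Placeholder URLs; substitute the entity's real profile pages.
    profiles = [
        "https://www.linkedin.com/in/your-profile",
        "https://www.wikidata.org/wiki/your-entity-id",
    ]
    print(json.dumps(person_jsonld(CANONICAL_NAME, profiles), indent=2))
```

Plain substring matching is deliberately simple. It works here because none of the variants is contained in the canonical name, but a longer variant list may need word-boundary matching.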

Mistake 5: No Clear Heading Hierarchy

The mistake: Using heading tags in whatever size looks best visually — skipping levels, using multiple H1s, using H4s where H2s belong, or styling non-heading text with heading tags to make it bold and large.

Why it hurts: Heading tags are a machine-readable outline of your document. Crawlers and AI systems use them to understand the topic structure of a page — what the page covers, what subtopics it contains, how ideas relate to each other. A broken heading hierarchy produces a broken outline. The parser cannot identify your main topic, your subtopics, or the logical flow of your argument. This directly affects how your content is chunked and indexed for retrieval.

The fix: One H1 per page, matching the primary topic. H2s for major sections. H3s for subsections within those sections. Never skip a level. Never use heading tags for decorative text. Your headings should read like a table of contents — if you extracted only the heading text, a reader (or machine) should understand exactly what the page covers and how it is organized. Use your browser’s accessibility inspector to audit heading structure before publishing.
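The rules above are mechanical enough to check in code. This is a minimal sketch using Python’s built-in html.parser, assuming the page HTML is available as a string. It enforces exactly one H1, an H1 as the first heading, and no skipped levels on the way down. The names HeadingCollector and audit_headings and the sample markup are illustrative.

```python
import re
from html.parser import HTMLParser

HEADING = re.compile(r"h[1-6]")


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for every h1-h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # [level, text] while inside a heading

    def handle_starttag(self, tag, attrs):
        if HEADING.fullmatch(tag):
            self._current = [int(tag[1]), ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if self._current is not None and HEADING.fullmatch(tag):
            self.headings.append((self._current[0], self._current[1].strip()))
            self._current = None


def audit_headings(html: str) -> list:
    """Return a list of human-readable heading hierarchy problems."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    problems = []
    h1_count = sum(1 for level, _ in collector.headings if level == 1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    previous = None
    for level, text in collector.headings:
        if previous is None:
            if level != 1:
                problems.append(f"first heading is h{level}, expected h1")
        elif level > previous + 1:
            # Going deeper by more than one level skips part of the outline.
            problems.append(f"h{previous} to h{level} skips a level at '{text}'")
        previous = level
    return problems


# Hypothetical sample markup.
BROKEN = "<h1>Guide</h1><h4>Setup</h4><h1>Another title</h1><h2>Details</h2>"
CLEAN = "<h1>Guide</h1><h2>Setup</h2><h3>Install</h3><h2>Usage</h2>"

if __name__ == "__main__":
    print(audit_headings(BROKEN))
    print(audit_headings(CLEAN))
```

Moving back up the outline (an H3 followed by an H2) is allowed, since that is how a new section starts. Only downward jumps of more than one level are flagged.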

Mistake 6: Skipping Validation and Testing Tools

The mistake: Writing schema markup and semantic HTML by hand — or generating it with a tool — and then never testing it. Assuming it works because it was published.

Why it hurts: Structured data errors are silent. A malformed schema does not throw a visible error on the page. It just gets ignored. Invalid HTML passes visually in every browser. Meanwhile, your carefully written Article schema has a misspelled property name and has been returning zero signals to Google for six months. You have no idea, because you never checked.

The fix: Build testing into your publishing workflow rather than treating it as an afterthought. Use Google’s Rich Results Test for schema validation. Use the W3C Markup Validation Service for markup quality. Use Google Search Console’s Enhancements reports to monitor structured data errors at scale. Set a calendar reminder to audit structured data health monthly. Many practitioners also use browser extensions like Structured Data Testing Tool (third-party) to check markup during development before pages go live.
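A lightweight pre-publish check can catch the silent failures described above before the external tools ever see the page. The sketch below is one possible shape for such a check, not a replacement for the Rich Results Test. It pulls every JSON-LD block out of a page, confirms it parses, and flags Article properties that are not on a small allow-list. The allow-list, the names, and the sample page (including its deliberate datePublised typo) are assumptions for illustration.

```python
import json
from html.parser import HTMLParser

# Tiny allow-list for illustration only. The real schema.org Article type
# defines many more properties than these.
KNOWN_ARTICLE_PROPS = {"@context", "@type", "headline", "author",
                       "datePublished", "dateModified", "image"}


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld(html: str) -> list:
    """Return a list of problems found in the page's JSON-LD blocks."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    problems = []
    if not extractor.blocks:
        problems.append("no JSON-LD block found")
    for i, raw in enumerate(extractor.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append(f"block {i}: invalid JSON ({err.msg})")
            continue
        if isinstance(data, dict) and data.get("@type") == "Article":
            for key in sorted(set(data) - KNOWN_ARTICLE_PROPS):
                problems.append(f"block {i}: unrecognized property '{key}'")
    return problems


# Hypothetical page: block 1 has a misspelled property, block 2 is truncated.
SAMPLE_PAGE = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Example", "datePublised": "2024-01-15"}
</script>
<script type="application/ld+json">
{"@type": "Article", "headline":
</script>
"""

if __name__ == "__main__":
    for problem in check_jsonld(SAMPLE_PAGE):
        print(problem)
```

A function like this fits naturally into a build step or pre-publish hook, where a non-empty problem list can block the release.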

Mistake 7: Marking Up Content That Is Not Visible on the Page

The mistake: Adding structured data properties — review scores, prices, product availability, author credentials — that do not correspond to content a human can actually see on the page.

Why it hurts: Google’s structured data guidelines explicitly prohibit marking up content that is not visible to readers of the page. Google treats this as misleading structured data, and it can result in manual actions that strip your rich result eligibility entirely. Beyond penalties, it is a trust problem: if your schema claims a 4.9-star rating but no reviews appear on the page, you are asking machines to surface claims you cannot substantiate. AI systems and crawlers increasingly cross-reference structured data against visible page content to detect this pattern.

The fix: Apply a simple rule — every claim in your structured data must be visible to a human reading the page. If you mark up author, the author’s name must appear in the byline. If you mark up aggregateRating, the rating and review count must appear in the visible content. If you mark up a dateModified, the update date should be visible on the page. Structured data should describe the page. It should never be used to inject claims the page does not make.
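The visibility rule can be approximated with a simple cross-check. The sketch below assumes you already have the parsed JSON-LD as a dictionary and the page’s visible text as a string. It tests whether the author name and rating figures from the markup literally appear in that text. The function names, the Book example, and the review count of 127 are hypothetical.

```python
def claims_from_jsonld(data: dict) -> dict:
    """Pull out the structured data values a reader should be able to see."""
    claims = {}
    author = data.get("author")
    if isinstance(author, dict) and author.get("name"):
        claims["author.name"] = str(author["name"])
    rating = data.get("aggregateRating")
    if isinstance(rating, dict):
        for key in ("ratingValue", "reviewCount"):
            if key in rating:
                claims[f"aggregateRating.{key}"] = str(rating[key])
    return claims


def unsupported_claims(data: dict, visible_text: str) -> list:
    """Return the claim paths whose values never appear in the visible text."""
    text = visible_text.lower()
    return sorted(
        path
        for path, value in claims_from_jsonld(data).items()
        if value.lower() not in text
    )


# Hypothetical markup and page text for illustration.
MARKUP = {
    "@context": "https://schema.org",
    "@type": "Book",
    "author": {"@type": "Person", "name": "Terry Samuels"},
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.9",
        "reviewCount": "127",
    },
}
PAGE_WITHOUT_REVIEWS = "By Terry Samuels. A practical guide to structured data."
PAGE_WITH_REVIEWS = "By Terry Samuels. Rated 4.9 out of 5 based on 127 reviews."

if __name__ == "__main__":
    print(unsupported_claims(MARKUP, PAGE_WITHOUT_REVIEWS))
    print(unsupported_claims(MARKUP, PAGE_WITH_REVIEWS))
```

Literal substring matching is a rough heuristic. It will miss values that are displayed in a different format, such as a dateModified shown as “January 15” on the page, so treat a flagged claim as a prompt for a manual look rather than a verdict.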
