
Please don't implement WebMCP on your site. Support a11y / accessibility features instead. If browser or LLM providers care, they will build to use existing specs meant to help humans better interact with the web.


While you absolutely should, I would argue that MCP access would be the OPTIMAL level of accessibility.


Why? What does it add that accessibility features don't cover? And if there's a delta there, why have everyone build WebMCP into their sites rather than improve accessibility specs?


Because, thinking bigger picture, having an AI assistant acting on your behalf might be more effective than slow navigation via accessibility features?

I get the wider point that if accessibility features were good enough at describing the functionality and intent then you wouldn't need a separate WebMCP.

So what does WebMCP do that accessibility doesn't?

Seems to me, on a cursory reading, it's around providing a direct JS interface to the web site (as opposed to DOM forms).

Kind of mixing an API and a human UI into one single page.
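To make that concrete, a page might expose a tool roughly like this. This is only a sketch based on the current WebMCP explainer: the `navigator.modelContext` / `provideContext` names come from the proposal and may change, and `addToCart` is a hypothetical function the page's human UI would also call.

```html
<!-- A shopping page registering a structured "add to cart" tool for agents,
     alongside its normal human UI. Names are from the draft proposal and
     may differ in the final spec. -->
<script>
  navigator.modelContext?.provideContext({
    tools: [{
      name: "add-to-cart",
      description: "Add a product to the shopping cart by its product ID.",
      inputSchema: {
        type: "object",
        properties: {
          productId: { type: "string", description: "ID shown on the product page" },
          quantity: { type: "integer", minimum: 1 }
        },
        required: ["productId"]
      },
      async execute({ productId, quantity = 1 }) {
        // Reuse the same client-side logic the human-facing button calls.
        await addToCart(productId, quantity); // hypothetical page function
        return { content: [{ type: "text", text: `Added ${quantity} x ${productId}` }] };
      }
    }]
  });
</script>
```

The point being: the tool's name, description, and input schema are handed to the agent directly, instead of the agent inferring them from the rendered page.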


Navigation shouldn't be slow when using accessibility features though. The browser already provides the accessibility tree, with full context and semantics of what is on the page and what can be interacted with.

I take the same issue when MCP servers are created for CLI tools. LLMs are very good at running Unix commands - make sure your tool has good `--help` docs and let the LLM figure it out just like a human would.


I guess I was asking, assuming that WebMCP isn't totally misguided (which of course is an assumption): is there anything that current accessibility standards can learn from WebMCP? I.e. why did they feel the need to create it?


I'm not aware of anything WebMCP could add that wouldn't be more useful as an improvement to accessibility tooling instead.

MCP is ultimately another solution to trying to make RPC(ish) situations more RESTful. I.e. they need self-documenting, discoverable APIs.

That's exactly what you can get from both HTML and the accessibility tree, though. We don't need another implementation for it. My guess (conjecture here) is that all the skills, MCP, WebMCP, etc talk is a manifestation of all the model providers and VCs backing them trying desperately to have others find ways to make LLMs worth the cost.


Isn't ARIA there to describe the structure of the page so that, say, visually impaired users gain the same information as any other user? I.e. the interpretation of what that page then does, and so the appropriate action to take, is largely left to the human user after that description. Just as when you load a page and look at it, the human brain works out what to do based on those visual and textual clues.

This leaves agents trying to work out page intent, allowed values for text fields, parsing returned pages to work out success or failure, etc.

I'm assuming that's why they want what is effectively an in-page API: it massively improves machine accessibility and can piggyback on browser authentication systems so the agent can operate on the user's behalf.


The website is the API though. HTML is one of the few RESTful systems people still use today; build semantics into the page and both humans and LLMs can understand how to use it.

A11y specs and APIs are just a way of presenting those semantics differently, often for those who can't see the page, whether visually impaired or in this case an LLM.

At least in my view, we should expect anything claimed to be artificial intelligence to be able to interact with things much like a human would. I'm not going to build an MCP for a CLI tool, for example, I'll just make sure it has a useful man page or `--help` command.
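To illustrate the "website is the API" argument: a plain semantic HTML form already tells any consumer (browser, screen reader, or LLM) what it does and what it needs, with no extra API layer. A minimal sketch (the `/subscribe` endpoint is made up for the example):

```html
<form action="/subscribe" method="post">
  <!-- label text, input type, and the required attribute are all
       machine-readable semantics, not just visual presentation -->
  <label for="email">Email address (required)</label>
  <input type="email" id="email" name="email" required autocomplete="email">
  <button type="submit">Subscribe to the newsletter</button>
</form>
```

The method, action, input type, label, and `required` attribute are exactly the kind of self-documenting, discoverable interface MCP is trying to provide — the page itself is the tool description.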


I think you are confusing two things.

- the semantics of a form and a button and the resulting HTTP POST/GET

- what the page actually does!

So I can have two pages, both with HTML forms, and what they actually do on submission might be completely different: one buys a potted plant, the other submits a tax return.

I.e. the meaning of the action is in the non-semantic elements: the free text, the images, the context.

This is the stuff that's hard for the agent to easily determine - is this a form for submitting a tax return or not?

If what you said were true, there would already be agents out there that use ARIA info to seamlessly operate the web. As far as I can see, people have tried to use that information to improve agents' use of the web but have met limited success, and that's for well-annotated sites — not because sites aren't ARIA enabled.


A human needs to be able to distinguish the buttons though, both visually and via accessibility tools.

I would hope those two buttons and forms include labels, description text, indicators for required fields, etc. All of that should live in the HTML, with attributes added as needed for a11y. LLMs can use that; they don't need yet another API to describe it.
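For example, the distinguishing context from the tax-return case above can live right in the markup using standard HTML and ARIA, nothing WebMCP-specific (the field names and endpoint here are invented for illustration):

```html
<form action="/tax-return" method="post" aria-describedby="form-intro">
  <!-- Free-text purpose, linked to the form itself via aria-describedby -->
  <p id="form-intro">Submit your self-assessment tax return.</p>

  <label for="nino">National Insurance number (required)</label>
  <input id="nino" name="nino" required aria-describedby="nino-hint">
  <!-- Allowed-value guidance an agent or screen reader can read directly -->
  <p id="nino-hint">Format: two letters, six digits, one letter, e.g. QQ123456C.</p>

  <button type="submit">Submit tax return</button>
</form>
```

An agent walking the accessibility tree gets the form's purpose, the field's label, its hint text, and its required state, the same way a screen reader does.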


> they don't need yet another API to describe it.

WebMCP isn't accessibility support for humans, it's accessibility support for agents, which despite all the hype are less capable than humans at working out what's going on, and find functions and data schemas easier to understand than a web page designed for humans (whether that's a partially sighted human or not).


Not from a legal perspective.


Or have an a11y standard for MCPs, where they can't show UI elements and have to respond only with text, so that voice readers could work out of the box.

This would be a game changer. Currently voice readers do not work very well with websites: a11y is a clunky set of tags that you provide to elements, and users need to move around elements with back/tab and try to build a mental model of what the website looks like. With MCP and voice chat, it is like talking to a person.


Agreed that better browser support for accessibility tools is always welcome. I just don't think MCP is required at all there, the APIs are already well documented and built right into the browser.


Don't use accessibility features either. Just build for humans and let AI understanding take care of understanding all of the details.


Following accessibility best practice is what designing for humans looks like.


The best practices are changing. Many accessibility features were built because the computer couldn't understand the page correctly. For example, something that looks like a checkbox but is actually just a div would not get recognized properly. Now with AI, the AI understands what a checkbox is and can tell from the styling that there is a checkbox there.


That's a huge resource cost though, and simply unnecessary. We should be building semantically valid HTML from the beginning rather than leaning on a GPU cluster to parse the function based on the entire HTML, CSS, and JS on the page (or a screenshot requiring image parsing by a word predictor).


That's the point of solving problems with LLMs. We pay a large resource cost, but in return we get general intelligence to understand things.


We should try to avoid hitting that resource cost on every use where possible though. A CLI tool should have good `--help` docs, for example, rather than expecting every inference run to scour the CLI tool's source code to figure out how to use it.


Or just use <input type="checkbox"> in the first place and save humans and machines a whole bunch of time.
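I.e. the contrast is between the native control, which every browser, screen reader, and crawler already understands, and the div version, where all of that must be rebuilt by hand (a sketch of both; the class name and wiring are illustrative):

```html
<!-- Native control: role, checked state, keyboard handling, and
     label association all come for free. -->
<label><input type="checkbox" name="tos" required> I accept the terms</label>

<!-- Div "checkbox": role, state, and focusability must be added manually,
     plus JS to toggle aria-checked — and anything forgotten is invisible
     to assistive tech, leaving an AI to guess from the styling. -->
<div class="fake-checkbox" role="checkbox" aria-checked="false" tabindex="0"
     onclick="this.setAttribute('aria-checked',
       this.getAttribute('aria-checked') === 'true' ? 'false' : 'true')">
  I accept the terms
</div>
```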


That's already possible today, yet there are still people who don't do it, which is why a more general solution in the screen reader is needed rather than requiring every site developer to do something special.


We shouldn't create general solutions for people building software poorly. We should help people build software better, in this by helping to promote the use of a11y specs.

This is actually exactly where model providers could be doing some good. If they said a11y is the way for LLMs to interact with the web and helped push developers to docs, tutorials, etc the web would be better off. Google did effectively just that with HTTPS, they told everyone use it or lose SEO value rather than slapping some solution on Google's end to paper over poor security practices.



