AMD got rid of much of the older baggage when they designed 64-bit mode. There's a 32-bit segmented mode with call gates and rings that nobody uses. That was dropped for 64-bit mode, but it's still there in 32-bit mode in shipping processors.
Intel has been burned in the past, though. The Pentium Pro (1995), the first good superscalar microprocessor, was a 32-bit machine that could also run 16-bit code.
Intel thought 16-bit was on the way out, so the 16-bit hardware was not fast. Unfortunately for Intel, Microsoft Windows users were still running too much 16-bit code. It ran Windows NT fine; Windows 95, not so much. As a result, the Pentium Pro did not sell well. Intel had to come out with the Pentium II (1997), which was similar to the Pentium Pro but had better 16-bit performance.
> Unfortunately for Intel, Microsoft Windows users were still running too much 16-bit code.
The proposal [0] still keeps support for 32-bit user space – or at least as much of it as is commonly used by mainstream operating systems (Windows and Linux). In this case, they are removing obscure features which aren't used by current versions of mainstream operating systems, so the only real impact is going to be on (1) people running non-mainstream operating systems, (2) people virtualising legacy versions of mainstream operating systems, and (3) apps that try to directly access IO ports from user space – a supported (albeit very rarely used) feature on Linux, and not officially supported by Microsoft on Windows (although third-party kernel drivers have been used to implement it).
Virtualisation of legacy/non-mainstream OSes which use these features will be possible through software emulation, although obviously that has a performance cost.
I wonder what the impact will be on OpenVMS for x86-64?
Is Intel going to drop these legacy features across their entire product line? Or maybe their low-end SKUs will drop them, but they'll keep some high-end SKU which retains them? Although one wonders how long that will last.
>AMD got rid of much of the older baggage when they designed 64-bit mode
Depends on whether you view MMX and SSE as baggage. They were OK at the time, but I guess we could do a lot better today.
Having said that, I still wish they'd open up X86s. Or maybe even a new x86 that breaks most of the backward compatibility (pure 64-bit mode) but is freely open to implement. Meaning if you want backward compatibility, you need to pay AMD or Intel for it.
PPro ran Windows 95 fine, just not at a favorable price/performance point. PPro sold fine for what it was: the very high end, highest price of the Intel line.
Its performance advantage was only on 32-bit code; 16-bit code was not further optimized. The high price (and high manufacturing cost) certainly didn't help either.
That "older baggage" takes up less than 5% of the chip. While I'm all for simplifying the platform, let's not act like it's a major step. It's not going to make much difference size- or speed-wise, but it's just plain time to put 32-bit to rest.
> Those modes and microcode need to be validated, tested and add bloat to the boot software stack.
Just allowing booting directly in 64-bit mode and making SMM a 64-bit mode would be enough to avoid the boot software stack problems. Getting rid of "legacy mode" doesn't require all of that.
Fabulous. Might actually make me like Intel again. The legacy stuff is a nightmare, their segmented memory nonsense was also a waste of development time and I'm glad to see them finally acknowledge it.
I wonder how this will affect quirks such as A20, IRQ remapping, etc. All oversights and mistakes made by Intel over the years - none of which really hurt development or performance these days per se, but are definitely not fun to work with either.
Something like this is the only thing that would save the x86 architecture before ARM inevitably took over.
> All oversights and mistakes made by Intel over the years
Engineering is not about achieving perfection. It's about organizing compromises to create a product that provides good utility at a fair price. These were not oversights or mistakes; they were intentional design decisions.
Likewise.. you can see segmented memory as "nonsense" and a "waste of development time" but Intel clearly didn't fail at anything. They've been one of the largest and most successful chip manufacturers for decades, because their products provided real utility at affordable consumer prices and software developers gladly put up with the technological choices to be a part of that giant market.
> They've been one of the largest and most successful chip manufacturers for decades
Probably the 2 biggest reasons for this are that only 2 companies are allowed to "touch" x86, and that the second company was easily undermined by nefarious business practices. This resulted in a lot of things that were not "oversights or mistakes" but rather for the longest time lax approaches under a distinct lack of pressure.
It's hard to say what the x86 world would look like today if someone like Nvidia had a license.
> It's hard to say what the x86 world would look like today if someone like Nvidia had a license.
There's some alternate universe out there where x86 died because Nvidia was charging $1200 for a mid tier CPU so arm was able to creep up even faster :/.
>They've been one of the largest and most successful chip manufacturers for decades, because their products provided real utility at affordable consumer prices...
Do not forget all of the anti-competitive AMD nonsense over the years. Sabotaging AMD performance on benchmarking is my stand-out, but there were also the payouts to Dell and others to purchase exclusively Intel chips.
Market capture via corner cutting excuses lazy engineering by evil bloated monopolist. Got it.
(I'm still shocked how they've begun losing to AMD. Having all of the money, dirty benchmarking tricks, and "let's just hire all of the people" haven't worked out, incredibly.)
Anyone on the business side knows that Intel has succeeded splendidly at insider Defense deals, proprietary components and market manipulation to crush rivals. Engineering is always on display, but it does not tell the full financial story of Intel "greed is good" Corp.
> Something like this is the only thing that would save the x86 architecture before ARM inevitably took over.
Eh, this feels like a rounding error in ISA desirability compared to pointer authentication, or something like CHERI. Sure, it'd be _nice_ to have the legacy stuff cleaned up, but it's not costing much human effort to deal with compared to the effort spent securing native code.
It'd make the chips easier to design, I guess. Maybe motherboards and BIOSes too?
I could imagine a more user-noticeable effect in VM cold-start times, perhaps, though I'd think physical hardware boot times would still be dominated by other things (e.g. actually loading the kernel into memory); the minimal implementation of transitioning from real mode to long mode is a few hundred instructions.
It might not make too much of a difference to software developers targeting x86, but dumping legacy crap might make Intel's job easier (and/or more profitable, because they can reclaim some die space) when it comes to implementing their future silicon.
A cynic might wonder if it would also help to renew the Intel/AMD x86 patent moat now that the core x86-64 specification is over 20 years old. If the plan is largely just to strip things out it might be hard to shoehorn in patent claims, but I'm sure a way could be found.
Segmented memory was not a "waste of development time" at the time. It allowed the 8086 to be source-compatible (and compatible with machine-translated binaries) with the 8080 while expanding the address space considerably. It allowed considerable advances while retaining some measure of backward compatibility, like x86-64 did.
The way Intel did it was absolutely horrible. Had the left shift of segment selectors on the 8086 been 8 bits rather than 4, the extra 15MB of address space would have given DOS apps substantially more headroom. Having programmed C and assembly on both the x86 and the 68000 in the 1980s, x86 was an absolute nightmare to program. Even in C you had to always be intimately aware of your memory model at all times on x86 (remember near, far and huge pointers??!!?) annotating pointers to work around limitations of the architecture, whereas on 68k everything was just a 32 bit pointer. Oh the loss of productivity that one decision driven by the desire to save money avoiding putting an extra 4 address pins on the CPU ended up costing the industry. The horror.
I always thought the point with memory segments was that you could load a program verbatim without adjusting any pointers and do tricks like Terminate and Stay Resident and executing one program from another. E.g. Command.com and edlin.com don't need to know where they're loaded when they're both in memory, they just start at CS:0100.
> their segmented memory nonsense was also a waste of development time and I'm glad to see them finally acknowledge it.
People tend to remember mostly how difficult it was to program with segmentation in 286's 16-bit mode and earlier.
But in 32-bit mode, user-space code didn't need to bother if the OS didn't use it.
I've read several papers lately about schemes with compiler/OS co-design for securing the return pointer on the stack, function pointers and other sensitive variables from hacking attacks.
This problem was already mostly solved on x86-32 where the stack can be in its own segment.
But to achieve this in 64-bit mode (and on ARM and RISC-V), researchers have resorted to various tricks such as randomising addresses regularly [SafeHidden], switching access to the (safe) stack on and off with Intel MPK, using the Intel shadow-stack (in CET) for storing variables [CETIS], and even running user code in privileged mode [Seimi], [InversOS](ARM) ...
Some of these schemes use the CPU in ways it wasn't intended, and are therefore perhaps broken on future CPUs. Several are only available on newer Intel processors with certain extensions, and most have considerable run-time overhead, so you will never see anything like them in a mainstream OS. But x86-32's segment-based safe stack might have been adopted, given that "Safe Stack" schemes relying on randomisation already got widespread adoption.
> I wonder how this will affect quirks such as A20, ... All oversights and mistakes made by Intel over the years
Well, the A20 gate was actually IBM's creation, in order to make the PC-AT system compatible with running real-mode DOS programs.
Intel simply, later, incorporated it into the CPU because it was already part of the PC architecture (and had been for years) at that time.
Now, one could argue Intel had no business incorporating it into the x86 ISA, but it wasn't ever their oversight; they were just reacting to the reality of the systems in which the vast majority of their CPUs were used.
I thought it was a 286 erratum -- where the address didn't wrap around correctly in real mode. The "fix" was IBM repurposing the keyboard controller to gate A20 and restore the wrapping behaviour.
It does break the illusion. At the same time, it allowed another ~64k of memory, which was important. DOS eventually sat up there in the non-wrapped memory.
"their segmented memory nonsense was also a waste of development time and I'm glad to see them finally acknowledge it."
The Burroughs B5000 had segments. Roger Schell said segmentation was added to Intel chips by a Burroughs guy when they requested hardware-enforced security to protect memory. They wanted to use it in upcoming OSes like GEMSOS. Rings came from SCOMP, which also had an IOMMU in the 1980s. Descriptors evolved into capability-based computer systems. Intel tried with the iAPX 432 and i960MX (a good one) and lost billions. CHERI on RISC-V is the best of that lineage right now.
More recently, schemes like Native Client and Code Pointer Integrity used segments. The reviewer who broke the software versions of Code Pointer Integrity couldn't bypass the segment-enforced version. There's probably a lot of unexplored territory in combining modern memory protection with segments. I'd like the alternatives to have as much assurance as the segment-based designs gave us. They're getting hacked more often, though.
The IA-32 ISA has been so successful _because_ of its no-compromise backward compatibility, whether you like that or not.
See all the revolutionary ISAs from the 80s, 90s and 00s that did away with all that legacy crap. Where are they now? I think the only one still working is POWER.
ARM also is very backwards compatible, there have only been two major breaks in compatibility. The ARM6 (which dropped legacy memory addressing modes) and aarch64 (which made certain extensions required + dropped support for old ARM7 and ARM9 modes).
Yeah, sorry; thumb was part of the ARM7/9 features I was referring to. It also dropped the hardware Java support.
To be fair, both were always "optional" features; but yes, there are thousands of devices that shipped with them (one of the biggest being the GBA, with the majority of games heavily relying on THUMB).
The grandparent is referring to the fact the ARMv6-M and ARMv7-M architectures are Thumb only (and are different subsets of Thumb 1 and Thumb 2, to keep it spicy); so a Cortex M has zero instruction set overlap with a pre-Thumb ARM core, even though both are "32 bit ARM."
Or perhaps they have been successful primarily because of luck. They were lucky to be selected by IBM for the PC. Once they were riding the PC wave, their position as market leader (by volume, not quality) kept them there.
I highly recommend not assuming that the engineers working for Intel are idiots. If something seems like an "obvious" mistake to you, it is highly likely that you have zero clue about the constraints that forced the decision.
While the _engineers_ at Intel are definitely not idiots - otherwise the company couldn't have had this much success - the quasi-engineer layer above them, which makes the real technical decisions, has shown a true summit of folly, multiple times.
- Foolhardy snatch 1: iAPX 432.
- Degradation in the first half of the 1990s, culminating in the FDIV bug, with the marvellous savior "ASCI Red" paying for all the OoO development (PPro and what followed).
- Foolhardy snatch 2: RAMBUS + Itanium. Near collapse with the P4, and miracle #2 with the Israel team, Core, and stealing AMD64.
I can't imagine that anyone who adopted the EPIC-based Itanium had real engineering competence. (Not sure about the iAPX 432; that's a complicated issue.)
At lower layers, like ISA details, every Intel move looks like it predicts less than one move ahead! Maybe half a move sounds more accurate. Loads of examples. And it also seems that a guy who last remembered being an "engineer" 30-40 years ago is now the decision maker.
> Something like this is the only thing that would save the x86 architecture before ARM inevitably took over.
On the contrary, it will make it worse. People choose x86 and the PC for the backwards compatibility, and Intel throwing that away is not going to make them any better than ARM.
Do you really need to use segmented memory? You can just go flat mode and put code segment and data segment in the same entry of the GDT. And run everything in ring0.
Even as a retro-grouch (with a bunch of 386 and 486 boxes, some still in use), this makes sense. Not mentioned, but wondering what sort of TPM/DRM bolt-ons will become mandatory as part of this?
I see no real benefit to this. The x86 compatibility doesn't get in the way in any modern architecture - all the 16/32 bit mode stuff is all 'emulated' with microcode, and there are no dedicated 16 bit registers anymore...
Nor does it get in the way of software - as soon as you've switched to 64 bit mode, you can ignore all legacy stuff.
That's the right way to maintain legacy compatibility - make sure it's all contained and doesn't get in anyone's way, and you can leave it there forever.
Can't help but agree. I especially don't like removing ring3 I/O port access. It was/is the fastest way to trap out of a VM without SYSCALL or some other mechanism that I could measure.
Segmentation also has its uses, and it's a shame that they are steadily removing it. Having a way to set up a 1:1 memory range without even touching the page tables has always been faster than messing with pages. Indeed, it's always been more performant to either have a fully static page table or just disable paging completely and run in 32-bit mode. For certain tasks, it's still the fastest.
But these are all niche uses. These days, who thinks about anything other than Windows and Linux use cases?
To me though, AMD64 made most problems go away, and the architecture is just fine. Perhaps they want to do away with many of the now rarely used instructions?
> as soon as you've switched to 64 bit mode, you can ignore all legacy stuff.
That, I assume, is the point. If it isn't used by the vast majority why spend the silicon keeping it present?
> all the 16/32 bit mode stuff is all 'emulated' with microcode
That microcode still needs to be maintained, tested, & verified, with each chip iteration, so it is more than just a tiny bit of silicon that could be repurposed.
> and you can leave it there forever
If it is there it will be expected to work (otherwise why keep it?) and they need to keep verifying that containment. I'm not much of a hardware guy myself so maybe the risk profile is different - but I've tripped over, or been mugged by, so-called dead code that ends up having “interesting” side effects, enough times for me to not want a bunch of it in my CPU's design if it is avoidable.
Removing it has costs somewhere of course, both in Intel's design work and potential compatibility issues in minority use cases. But deciding to leave it in has costs elsewhere, so the options & risks need to be looked into in either case.
3. The "Moore's law is dead, long live Wirth's law" answer: They want to increase the clocks, but timings are so tight that they need to remove circuitry and delete bits of microcode to increase it further.
Is that true, though? 32-bit Windows 2003 Enterprise or Datacenter editions were perfectly capable of accessing 64GB of RAM [0]. And the difference between the theoretical 4GB and the usual 3.xGB came down to memory reserved for the integrated GPU and virtual address space mapped for other hardware.
P.S. I think the "3.2" figure irked me more than the usual "32bit OSes can't use more than 4GB". That's not a real limit and never was. Even people with no IT knowledge would have noticed it varies quite a bit anyway. I guess I expected a little more from a tech reporter.
They have their place. They're not perfect though.
Also, I'm not perfect, I have an attachment to the physical realm, and I desire not simply to run software, but to run it on hardware, whatever that means. Oftentimes, I desire simply to run hardware for its own sake, because it brings me pleasure.
1. Reading the current spec, what I wonder about most is paging being mandatorily enabled. This requires prefilled page-table structures in RAM. Does it mean the CPU can't start until the ME (or another external agent) generates these structures?
It also looks really weird compared to other ISAs such as System/Z or SPARC, which really benefit from not using page translation in kernel mode. I'd expect an easier way to switch paging on and off...
2. It looks like a good moment to relax memory ordering from TSO, with easy control using flags. (Its use would grow gradually over the years, yep.)
As long as their scalar and vector units still support packed 8/16/32-bit data, this seems fine? Too bad there’s no OS telemetry on the number of legacy apps still in operation.
What does linux have to do with anything? They can use 64-bit only mode any time they like, and plenty of Linux machines for the last 4-5 years have been pure 64. It's mostly people using commercial software (games, stuff like Skype) that have to have multiarch, but less and less of those are 32-bit as time passes.
Additionally, of the three main x86 OSes (Windows, macOS and Linux), Linux's solution to dual-mode is easily the worst, specifically so that it is easy to run it in pure 64-bit mode.
Yes. It'll still be there. The only difference will be that you won't be able to do stuff that CR0 can control like disabling the FPU for whatever reason.
It looks like they’re keeping x87 support, which feels like a missed opportunity. Does any modern compiler generate these instructions?
Stripping out x87 and the associated 80 bit registers, weird modes, etc would seem right in line with this effort. If someone really needs x87 support, trap those instructions and emulate in kernel. Nothing with truly high performance requirements will be using them. Intel could even provide the code.
I suspect the main thing keeping x87 support is that 32-bit ABIs for x86 still use it, just for returning floats and doubles from functions, which makes trap-and-emulate not an effective solution.
There's still no replacement for high precision long doubles on x86; 128 bit float support seems unlikely, and double-double has not caught on. Sometimes 64 bits just ain't enough.
Absolutely. Piling on tech debt forever is like trying to eat endlessly and never go to the toilet, pardon the analogy. Legacy needs to be dealt with constantly and relentlessly. With a suitable overlap window for transition (but we've had enough of an x86-32 compatibility window already).
If I'm reading the pdf correctly it proposes dropping 32-bit ring 0 (OS) but not 32-bit ring 3 (userspace), so most end-users are unlikely to notice much change even if they do have 32-bit legacy apps kicking around. I think 32-bit VMs would no longer be supported, though.
The VM runs in the host's ring 3, but the VM has virtual rings of its own. Removal of 32-bit ring 0 support, including in VM guest mode, could make it infeasible to run 32-bit OSes in VMs with reasonable performance on the new chips. The VM hypervisor software would have to switch to emulation QEMU-style whenever the guest was running its kernel, or revive the old scanning and patching tricks from the early days of VMware before x86 had any hardware VM support.
You're kind of forced to. I have tried to do userspace emulation using a VMM on amd64, but it's a bit hard. It is possible to handle exceptions in userspace, but you can't do page table modifications because many important instructions are ring0-only. That said, if you simplify a bit and handle paging in the host-side VMM, it still works fine. You still have to do system call emulation by trapping in the SYSCALL handler. SYSCALL forces you into kernel mode, unfortunately.
So bottom line is that no matter which way you try to do the userspace emulation, some parts are going to be ring0, and thus the answer to your question is indeed yes.
I don't really know x86 VM architecture well enough to say -- I was guessing by analogy to Arm. But a lot of the complexity they want to get rid of is system-level stuff, so if you leave it in for VMs you lose a lot of the point. It might be emulatable by the hypervisor, I guess.
There are similar efforts to revamp C++ or replace it with another language, too.
Rust is a famous example; there's also Carbon[1] and cppfront[2].
Even so, this may be a controversial opinion, but with C++20, it has become fairly straightforward to program relatively safely and avoid some old footguns.
I use "modern C++" (a subset of it that suits my goals) and haven't had any real problems over the last 3 years. My backend C++ application servers are running along just fine.
I have the same experience. All the tools at my disposal including mutrace for measuring contention, I just feel like C++ is a solid language. Sure, I wish there was more compile-time stuff, like the ability to force a function argument to be compile-time, but in my experience, modern compilers often have surprisingly deep reach, and you can expect much of your code to just be compiled away if written a certain way.
I recently wrote a server-client framework from scratch and used protobuf for messaging. The server is heavily multi-threaded with a lot of complex asynchronous rules, and there are no issues. If anything, I would hope for more concurrency primitives, as the C++ standard library is very high quality, and I would like concurrent hash maps and such.
Same point can be made about anything. Never breaking BC is easier, and more comfortable, but in the end it's slow and inevitable death.
If C++ had cleaned up its legacy, Rust wouldn't need to exist, for example.
Java has long had a policy of never breaking BC, but few years ago they started marking packages deprecated for removal. They're very slow and systematic about it, as they should be, but the point is, eventually those packages will be removed. And that's good. It means Java has a future.
Rust is still in the noise vs the amount of market share that C++ has. That said I love it as a language but it's hardly "taking over", I have tried to push it at my company a bit for command line utilities.
Just don't use "legacy parts" and be happy. Backwards compatibility is a great asset. Alternatively do something like Google's Carbon. Seems like a great product. Just do not call it C++. If it gets accepted by wider community you would know you are on the right path.
>"When you need to integrate with other code, you can't necessarily understand the code semantics without knowing a lot more of the spec."
I mostly design and create new products, but on some occasions I've had situations like this. There was not a single time that a quick Google search - and now ChatGPT - did not shed light on it.
>"Also, I challenge any normal programmer to remember all of the "good" C++ rules regarding overload resolution in an arbitrary context."
Knowing / remembering every nook and cranny does not make for a good programmer (except language experts who write standard libraries and design / implement language itself of course). There are way more important things to consider.
Luckily I'm only working on those code bases so I don't care if it causes other programmers' heartache. If you don't like your code base, then I swear it's never been easier to find a job on newer code bases.
I wonder how much they can gain from it. Suppose they push all 16-bit and 32-bit kernel-mode code into microcode, basically building an interpreter inside the CPU. This would sacrifice performance but not compatibility. Today's design is probably already close.
The design of current x86 systems is that multiple regions under the 4GB boundary are occupied by I/O, some tied to the processor (APIC, HPET, etc.) and some to extension cards which can't operate above 32-bit addresses.
3.2GB is an example figure - it could be 3.5GB, 3.6GB... - but it reflects that RAM can't be mapped across all of the lower 4GB. Memory controllers typically remap part of it to a range above the 4GB boundary, so a 32-bit PAE-capable or 64-bit OS can still use it.
OTOH, using a 64-bit OS is recommended with >=1.5GB of RAM, because if the OS can't map all RAM _twice_, plus all device memory, into the virtual address space, tricks with memory banking (frequent page remapping) become necessary to move data around.
Invoking Cunningham's, I'd guess it's memory mapped I/O occupying 0xE0000000 - 0xFFFFFFFF. That's the top ⅛ of the address space, which is every binary address starting with 111.
In researching this I learned that some 32-bit hardware could utilize up to 64GB of RAM using Physical Address Extension. Each process is still limited to 4GB, though.
QEMU [0] emulates many systems, including the 32-bit Intel architecture. For retro gaming specifically I can recommend PCem [1], which also emulates a wide range of sound and graphics cards, from IBM MDA to 3dfx Voodoo 2.
For REALLY old games PCem works, but for later games (think Half-Life 2 era and onwards) it becomes very tedious to work with VMs, especially when you have to do GPU passthrough.
Unless by that you also mean running old OSes (e.g. Windows 95... which you probably can't run already, in a world with UEFI and no A20), you're fine: they are removing ring-0 32-bit, but not ring-3 32-bit.
Almost certainly. The underlying form of the IP cross licensed is in patents on implementation of x86_64 cores. Those will still be on the table for as long as the patent terms last.