The use of highly parallel GPUs as general purpose compute elements is a major trend. Most graphics boards have more compute power than the main CPU they serve. The Titan and Summit supercomputers have most of their compute power in the GPUs. The limits of GPU parallelism haven't been reached yet. Machine learning runs well on GPUs, and it's the most compute-hungry workload now getting mainstream use. That technology isn't near a ceiling.
As machine learning/AI/neuron emulation becomes more useful, we'll see hardware specialized for that. It's not yet clear what shape that hardware will take. Look for "works fine, but is slow on GPUs" results to lead to such hardware, rather than "build it and they will come" projects like the Human Brain Project.
There's still more improvement possible in storage devices. With a 20TB SSD expected in 4 years, things are looking good in that area. For compute elements, cooling limits transistor density. Storage tends not to be limited by heat dissipation.
It's a pretty good piece. The main thing it skips over is the economic implications of transistor costs no longer declining. Sure, Google can add a more or less arbitrarily large number of servers, but what are the implications of the likely reality that those servers aren't improving in price/performance to the degree that they once were?
At Hot Chips a couple of years back Robert Colwell, who was director of the microsystems technology office at DARPA at the time, had a very interesting presentation on where things were going. One of the things that stuck with me at the time was his contention that there are lots of ways to improve performance etc. over time but CMOS was really pretty special.
"Colwell also points out that from 1980 to 2010, clocks improved 3500X and micro architectural and other improvements contributed about another 50X performance boost. The process shrink marvel expressed by Moore’s Law (Observation) has overshadowed just about everything else."
Even if we don't produce smaller circuits, we still might produce cheaper ones. There is still plenty of room to reduce fab costs, especially since we basically replace them every few years. Consider what would happen if Intel spent its time iterating solely on cost. We have also made headway on power reduction, and I could see further improvements there. In aggregate, I could imagine that even if we don't see greater chip density, we could see, say, AWS compute power per unit cost continuing Moore's trend for some time.
That's true. Seeing mask sets and EDA tooling get cheap at 90nm or 65nm could open up serious possibilities. The laptop still serving me well runs a CPU done on 65nm with other chips from even older nodes.
I was recently thinking about Moore's Law and whether it is truly coming to an end. My initial reaction is that whether the number of transistors doubles doesn't matter in itself; what matters is whether our compute power is doubling.
This led me to think that maybe Moore's Law is looking at the wrong metric, and that there may be a more fundamental law regarding increased capacity. Some evidence of this is in the drop in prices of cloud computing and the increases in network capacity. If chips stopped getting more powerful, in theory, we might not notice, as the capability of the entire network continues to increase at the familiar rate.
However, this raises the question: did this increased capacity actually begin with Moore's Law and computer processing in general, or is there an overarching law of progress that has always existed? To test this, I'd need to go back and look at the growth of industrialization. I suspect that as the capabilities of the factories slowed, our network speed (rail and sea, then road, and then air) increased. As the network speed reached a plateau, we began to find places where we could manufacture more cheaply, thereby increasing the output per cost. Does the same growth exist in agriculture? What major industry does not fit?
There is some evidence that a law similar to Moore's law exists not only for technology. What effect does that have in our understanding of why things grow?
[edit: this was an excerpt from my YC application, answering the question about what you have discovered]
I think technological progress is more like a logistic curve, with exponential-like growth at the beginning and then a leveling off. Look at technologies that have already had time to mature. The speed of airplanes grew tremendously while jet engines and wing shapes were undergoing heavy refinement, but then it leveled off and hasn't really budged in decades.
Technologies can take leaps and bounds, too, though. If you view the modern computer as an evolution of the abacus the flat part of the curve was the first few thousand years. It's possible quantum computing, say, or neural nets will be leaps of that order.
Or maybe not. It's impossible to predict true breakthroughs.
>> If you view the modern computer as an evolution of the abacus
I'm not sure what value that has for predicting the future. You could view the airplane as the evolution of the chariot. Or not. Does it make a difference?
It does, because otherwise you think technology never changes quickly. My point was while it's true we're probably reaching the point of diminishing returns for silicon etched circuits, that's not the end of the line. I'd be surprised if we didn't have ubiquitous quantum computers in thirty years or so unless something even better came along.
Here's one of his graphs from 1900 to 1995 starting with census counting technology in 1900. If you allow GPUs as info tech then we are about in line with his most optimistic projection.
While we're reaching the physical limits of classical chip design, do we have any ideas what the limits are on the algorithm side of things? As much speed up has come from software as hardware according to a few reports.
> do we have any ideas what the limits are on the algorithm side of things?
I've always believed that humans do not have the ability to program things smarter than themselves, because we do not understand our own intelligence, so we have no way to reproduce it.
At the time, I said the only alternative I can think of is make random permutations and pick the best one, and go from there. But I said this as a ridiculous suggestion that no one would actually do.
But then Google actually did exactly that with their Go machine!
So, that's what I think: The future of computing will be based on randomness and the job of the programmer will be to guide it, but not program it directly. (Can you imagine programming a webpage this way? Or writing a book this way?)
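To make the "make random changes, pick the best one, and go from there" idea concrete, here is a toy Python sketch. It is not how Google's Go program works; it is only a minimal random-mutation hill climb on a string. The names `evolve`, `score`, and `ALPHABET`, and the target phrase, are made up for illustration.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # hypothetical search alphabet


def score(candidate: str, target: str) -> int:
    """Fitness: number of positions where candidate matches target."""
    return sum(a == b for a, b in zip(candidate, target))


def evolve(target: str, generations: int = 20000, seed: int = 0) -> str:
    """Mutate one random character at a time, keeping changes that do not hurt."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best = "".join(rng.choice(ALPHABET) for _ in target)
    for _ in range(generations):
        if best == target:
            break
        i = rng.randrange(len(target))
        mutant = best[:i] + rng.choice(ALPHABET) + best[i + 1:]
        # Selection step: the loop only compares scores; it never
        # contains instructions for how to construct the answer.
        if score(mutant, target) >= score(best, target):
            best = mutant
    return best


if __name__ == "__main__":
    print(evolve("moores law"))
```

The programmer's job here is reduced to writing `score`, which is roughly the "guide it, but not program it directly" role described above.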
(Totally self-promoting here) I did a lot of work along this line in my PhD -- stochastic architectures for probabilistic computation. http://ericjonas.com/pages/circuits.html There's an increasing amount of interest in this space.
Your work is interesting. The concepts remind me of work on analog implementations of neural circuitry. See, the brain is likely a general-purpose analog computer with digital-like parts, so analog implementations were thought to offer improvements. One did, with a wafer-scale method I didn't see coming.
It would be interesting to see someone combine the principles of your work with analog implementations on a decent process node. Yours is kind of like a hybrid between properties of analog and digital cells. The real thing might be even more effective albeit harder to automate. There's some analog EDA but it's almost always custom work.
Sounds like really neat work. It reminds me of an idea I've toyed with just a little: you know how chemical reaction networks can compute? E.g. https://johncarlosbaez.wordpress.com/2014/03/23/programming-... Last I heard, they'd used DNA strand displacement in vitro for simple neural network stuff. So I wondered how practical it'd be to encode a Bayes net as a reaction network, with the Gibbs sampling done directly by thermodynamics. This wouldn't compete with the sort of thing we do in silicon; maybe it'd work out for tiny diagnostic systems in synthetic bio? With less overhead than the neural nets.
(Well, that took more space than I wanted. I downloaded one of your circuits papers for a gander.)
The point of machines is that they do a single task better than us. A dishwasher is superhuman at washing plates but nothing else. If we model machines to be more like us they will end up with the same disadvantages of a normal human. At that point why not employ real humans in the first place?
> At that point why not employ real humans in the first place?
Cost, mostly. If you can replicate a human-level intelligence for 10,000 (or 100,000), have it work 24/7 with no time off, and scale out to thousands of them, then you'd have something absolutely terrifying in its capability.
Imagine an AWS of 1,000 von Neumann-level intelligences working cooperatively 24/7 on a problem.
It's the stuff of sci-fi right now, but maybe one day. We know it's physically possible to build human-level intelligences, since we ourselves prove it; the rest is 'just' engineering.
While we may not have the ability to program things 'smarter' than ourselves, we do have the ability to create tools that extend our intelligence. Machine learning, randomness, and even new JavaScript frameworks all fall under this umbrella.
>But then Google actually did exactly that with their Go machine!
I would argue that this is not what they did. What they did was take something a human player does (study games to learn how to play) and then parallelize it to a level that a human can't match. It is basically the equivalent of having one Go player playing against an entire team of Go players who are all experts.
Every day, humans "program" things smarter than themselves. Every parent who raises a child smarter than themselves does exactly that. Smarter-than-human computers will be created the same way we create smart humans.
I do not think this is equal. Babies may actually be "brighter" than parents, without the parents having to actually "do" anything to produce the higher IQ baby. But building an artificial system means some humans actually have to design it.
Yes, parents have to do something to end up with brighter children than themselves. If you put a child in a dark room from birth with no human interaction, I can guarantee that they will not come out very bright 20 years later (there have been some very sad examples of this in history).
The point I was making is that intelligent agents are built all the time by less intelligent agents. Humans will build more intelligent AI using exactly the same approach, and these AIs can do the same.
Of course, building an AI via evolutionary training opens up a whole lot of control issues. Do we really want to be creating entities vastly more intelligent than ourselves when we have no real idea of their interests and when we occupy something that is valuable (e.g. matter and energy) to this entity?
It is worth remembering that just because we are reaching some physical limits of "classical" chip design does not mean performance improvements are over. For example, both x86 and arm date back to the 80s. While I have no doubt that the implementation of these architectures has improved to reflect modern manufacturing capabilities, this still suggests that there is room for architectural improvements in performance.
Beyond simple architectural improvements, we could still move beyond basic transistor based computing. The most common example is quantum computing (which offers an asymptotic improvement in some cases); however, I can imagine there being other classical devices that can compute certain functions more efficiently than a pure transistor based solution can.
> For example, both x86 and arm date back to the 80s.
Modern x86 and high performance ARM cores are almost unrecognizable compared to processors in the late 90s, much less the 80s. (Also, ARM was founded in 1990, not the 80s).
> While I have no doubt that the implementation of these architectures has improved to reflect modern manufacturing capabilities, this still suggests that there is room for architectural improvements in performance.
There are still performance improvements to be had, but they're not going to be anywhere near the performance scaling of Moore's law. The rate of architectural improvements is also slowing down (and they are increasingly applicable to only a smaller and smaller fraction of workloads).
> I can imagine there being other classical devices that can compute certain functions more efficiently than a pure transistor based solution can.
Like...? Quantum is 10+ years away right now and unlikely to get fast any time soon. CMOS has had decades and billions of dollars invested in scaling; it's going to be a long time before any of the current "CMOS killers" (virtually all of which are still transistors) reach parity.
> Beyond simple architectural improvements, we could still move beyond basic transistor based computing. The most common example is quantum computing (which offers an asymptotic improvement in some cases)
The number of problems for which quantum computing offers a speedup is very limited. It is absolutely not a general, all-purpose computational architecture.
Having X86 and ARM chips means you can run legacy code on them from a long time ago, which is why they are so popular. But once we go to optical chips for quantum computing there will be no legacy apps from X86 or ARM on them. Apps would have to be written from scratch.
You can tell that Microsoft knows that Windows is getting long in the tooth, and has to support legacy code, and still has old code in it for compatibility reasons. They look at other operating systems like Linux to port their enterprise apps to in order to sell tech support and take a stab at Oracle, MySQL, and PostgreSQL. They know that Linux gets ported to different platforms and so can SQL Server to give them a larger marketshare.
The old X86 and ARM designs are limited due to legacy support of older programs. But they are marketed as backward compatible with older chips.
So you got a lot of backward compatible CPUs out there, that are limited because they have to support legacy code. The new processors that don't have backward compatibility should run faster with fewer quirks and use new technology not from the 1980s, but programs have to be written from scratch or ported from other platforms.
Windows 10 is the last version of Windows because Microsoft knows that it will eventually have to drop compatibility in order to compete with the new systems and new processors out there that don't have legacy support. The X86-64 chips are a dead-end, and Microsoft has to look to other technologies and a different operating system. Linux is a good choice to support even if Microsoft does not officially have a Linux distro yet. If they open source their enterprise tools like Dotnet or CLR or Roslyn or Visual Studio Code, and port SQL Server to Linux and OSX and other platforms, they can sell tech support for it via their paid hotlines. That even extends to making iOS and Android apps for Office and other things.
Microsoft is going to move away from X86-64 and Windows eventually, and focus on The Cloud instead and Azure in hosting VPS operating systems. Then when the new design of processors come out that put X86 and ARM to shame they can port their programs to that new platform.
If you remember the original Micro-Soft business model was to make programs for computers that other companies made and make them for different operating systems. They only got into DOS because IBM made them an offer they couldn't refuse. Windows was basically their attempt at making a Mac GUI for DOS, and working with IBM to bring OS/2 was yet another GUI attempt, but they quit OS/2 and focused on Windows instead. OS/2 was going to be ported to PowerPC, MIPS, Alpha, SH4 and other RISC processors because IBM and Microsoft saw the limitations of the 80X86 processors and wanted something new. But it fell apart. Then Windows NT 4.0 was ported to MIPS, Alpha, etc but abandoned. Windows RT was ported to ARM but flopped. Every attempt to move away from 80X86 processors met with disaster because people wanted to run legacy code.
But soon it will be a new day with new processors and new computers not based on 1980s designs and using Linux or some other FOSS OS and connecting with Cloud computers.
A ton of it did. There are many subtopics for algorithms, microarchitecture tricks, hardware accelerators, I/O schemes, improvements for RTL/transistor optimization, and so on. Each has enough papers that it can be hard to find stuff. Most of the best stuff gets patented and controlled by dominant companies.
I like PG's idea [1] of trying to write a compiler that can utilize code to run on multiple cores, as if the cores were running in series, not parallel (think batteries).
That's the holy grail of The Cloud: just write a description of what you want and what you want happens. DWIM programmatic casting.
I think pg originated his "sufficiently smart compiler" startup idea in his PyCon talk. You can find it online somewhere. The other takeaway from his talk was: just lie to customers about it being automated, manually farm out the parallelize-all-the-code tasks to workers/interns/turks while saying it's "automatic," then eventually figure out how to automate it yourself later so you don't need pesky humans in the loop.
What's with "the cloud" here? That's the holy grail of any type of programming on any type of computing device at any scale. If that were possible it would be just as useful on a phone or a PC or a personal server.
While PG's idea is likely too hard to be doable, there is an interesting practical approach with things like Elixir, which make multicore fairly easy using functional programming:
"""
Other languages skirt these issues by running on a single CPU with multiple processes to achieve concurrency; however, as Moore's Law pushes us towards increasing multicore CPUs, this approach becomes unmanageable. Elixir's immutable state, Actors, and processes produce a concurrency model that is easy to reason about and allows code to be written distributively without extra fanfare.
"""
While "cloud" may be a big part of the future of computing, I think the relationship painted between that and the end of Moore's law is tenuous at best.
I believe a more plausible link exists between the end of Moore's law and the rise of open hardware as explained in this article:
The article argues a very interesting point and there might definitely be an opportunity for more competitive open hardware. At the same time, it feels kind of sad that it would take a technical constraint for this to happen; that is, rather than a change in culture.
The logistic function (or "S-curve") describes systems that expand first at exponential rates, then at ever-slower rates as they approach a ceiling. With respect to new technologies, it's been observed that adoption rates and most measurable improvements follow a logistic function.
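As a rough illustration of that shape, here is a small Python sketch of a logistic curve. The parameter names (`ceiling`, `rate`, `midpoint`) and the helper `growth_ratio` are my own labels for illustration, not taken from any particular model of technology adoption.

```python
import math


def logistic(t: float, ceiling: float = 1.0, rate: float = 1.0,
             midpoint: float = 0.0) -> float:
    """Value of the logistic (S-curve) at time t."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def growth_ratio(t: float) -> float:
    """Factor by which the default curve grows over one time unit from t."""
    return logistic(t + 1.0) / logistic(t)


if __name__ == "__main__":
    for t in (-10, -5, 0, 5, 10):
        print(f"t={t:>3}  value={logistic(t):.6f}  "
              f"one-step growth={growth_ratio(t):.4f}")
```

Far to the left of the midpoint, the one-step growth factor stays close to a constant (e, for a rate of 1), which is what exponential growth looks like. Far to the right it falls toward 1, meaning each further step adds almost nothing.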
For example, people did not go from buying 1 car to buying 10 cars and then 100 cars - most of us hit saturation somewhere between 1 and 2, and stayed there. Similarly, the average speed of our cars did not increase at an increasing rate except in the earliest years of automotive technology; we have a "highway maximum" between 50 and 70 MPH, and a "technical maximum" in the high 200's range for production cars which do not rely on rockets or other not street-legal tricks (a quick search brings up the Hennessey Venom GT at 270.49 MPH). Likewise applied to fast moving vehicles as a whole, including supersonic aircraft and rockets, we've already brushed up against vehicle weight and power density limits that slow the rate of improvement in acceleration.
Applied to the Moore's Law measures, that indicates we have an "endless sunset" period ahead of us where we'll still get more doublings of semiconductors, but they'll come increasingly slowly as more fundamental innovations become necessary to realize them.
What we don't know is whether there is a logistic curve on technology as a whole. Belief in the Singularity is premised on this not being the case.
I am of the strong opinion that most technological advancement that is mistaken for endless exponential improvement (eg Moore's law) actually follows a physical sigmoidal curve. I also agree that "the singularity" is a belief based proposition and I personally think it amounts to techno-woo.
However, there is one interesting argument for the continuation of technological advancement beyond what we might call "singularity" levels today: if technology maintains an exponential trajectory for another century or so through a few more breakthroughs, then we would have some amazing tech.
So, it is not required that tech advancement is exponential -- as long as we are still early enough in the sigmoidal curve that more exponential (and linear) advancement is still to come. With biotech, quantum and the algorithmic side of AI I think we still have quite a bit of advancing to do. That said, the singularity stuff is still ridiculous woo woo.
>What we don't know is whether there is a logistic curve on technology as a whole. Belief in the Singularity is premised on this not being the case.
That's a version of the Singularity too extreme even for Kurzweil. It's premised on technological growth not hitting the log portion of the curve before machine intelligence passes humans.
Some of those are demand limited. People can't use 10 or 100 cars; street-legal limits aren't technology limits.
When a technology limit is reached, but not a demand limit, interest and capital flow to other technologies for meeting that need, starting a new s-curve. eg peak oil prices lead to fracking.
Whether it will be at Moore rates we don't know. But the possibility of far superior information technology has a proof by example: biological neurons.
People in silicon valley who believe in AI/ML/Battery Technology/Solar/Wind Turbines/Space Travel/Fusion Energy throw around "exponentially accelerating progress" as a platitude so obviously true that it doesn't warrant questioning. (PC quote with my edit)
"And despite Silicon Valley's ostensible belief in rationality and empirical evidence, we continue to assert this despite the data strongly suggesting that it just ain't so." - Patrick Collison
The race to 7nm is expensive.
I think the next big leap will be in Chip Design software which lives under a rock inside a ditch in an anachronism that is EDA - electronic design automation. There's got to be a better way that allows a chip designer to produce a high fidelity mixed signal analog chip at speed and scale without having to pay $25K+ for the software licenses and tooling.
VLIW has been around for a while (Itanium is probably the most famous "general purpose" example) and has failed to gain traction outside of GPUs and DSP (ie not "ordinary code").
Say, simple random sample of all the other code being run.
> VLIW has been around for a while
Yup. I never claimed otherwise.
> Itanium is probably the most famous "general purpose" example
Yup.
> and has failed to gain traction outside of GPUs and DSP (ie not "ordinary code").
Yup.
Still, yet again, over again, one more time, once again, VLIW can get 9:1 speedup.
Why mention this? Because the OP was talking about the challenges of getting faster computing. Well, if we want faster computing, one approach that, indeed, works on general purpose code, and gives ballpark 9:1 speedup is, and may I have the envelope please [drum roll, please], VLIW. Really new? Nope. Tried with Itanium? Yup. Works? Yup.
Bottom line -- we still have 9:1 speedup available to us.
Maybe an objection is that we pay a factor of 24 in transistor count and electrical power but get only a factor of 9 in performance.
Citing a paper from the mid 90s isn't a credible answer in 2016.
All VLIW does is move a lot of on-chip logic off-chip into the compiler. This only works for a small set of computing tasks - which is why the closest thing we have to VLIW today lives in GPUs. And why Itanium was nicknamed Itanic.
It's a non-starter for general computing because as soon as you start dealing with real-time conditions the compiler can't optimise in advance, the speed advantage turns into a speed penalty.
> Citing a paper from the mid 90s isn't a credible answer in 2016.
I think it's unfair to generalize from VLIW to everything published 20+ years ago. Plenty of the things discovered back then, or earlier, are still applicable and in production today. Your compiler picked most of its low hanging fruit ages ago. Lots of PhD theses are rehashing old ideas, often unknowingly. Results are results, what matters more is whether there's a relevant context, as you've highlighted.
> It's a non-starter for general computing because as soon as you start dealing with real-time conditions the compiler can't optimise in advance, the speed advantage turns into a speed penalty.
VLIW is just a way of achieving instruction level parallelism. That 9:1 speed-up is vs a single issue, in-order core, which PCs haven't used since the mid-90s. Modern superscalar processors can not only handle multiple instructions in parallel, but extensions like AVX allow very wide instructions for embarrassingly parallel things like matrix operations. We've probably achieved 95% of the theoretical speedup from amazing VLIW.
So, just when did we get that 8:1 or 9:1 speedup without having the processor clocks run faster? IBM was doing speculative execution, branch prediction, out of order execution, and vector instructions also in the 1990s. There were careful instruction level traces of the advantages and speedups. Of course, vector instructions are for special purpose code, e.g., the ubiquitous inner products in linear algebra, probability, and digital filtering, but 9:1 was on general purpose code. IIRC, the instruction traces didn't show 9:1 or anywhere near that.
Both mechagodzilla and TheOtherHobbes have pretty much covered my response. The only thing I wanted to add is that you're acting incredibly defensive in response to what is essentially a request for context. The 9:1 speedup you quoted doesn't exist in a vacuum (and repeating the phrase "general purpose code" doesn't fix that).
We all know quite well what "general purpose code" is.
I'm sorry about Itanium, but VLIW has to remain a possible path to faster cores. That my information is old does not mean it is wrong: The guy got 9:1 speedup on 24-way VLIW. Saying that the tricks of branch prediction, out of order execution, speculative execution, register renaming, etc. make VLIW forever obsolete is shaky without some solid references.
Moreover, if we want faster cores, then obvious, right in front of us, are two possibilities: (1) Design instruction sets that make VLIW easier to do and more productive. (2) Integrate and coordinate up and down the stack, that is, from application, e.g., collection classes, string operations, function calling, memory management, exceptional condition handling, to compilers to instructions to VLIW to the gate level logic and look for speedups. E.g., the now famous instruction sets look like they were designed for assembly language programming, and likely no compiler makes good use of all the instructions. E.g., C code, supposedly fast, forces the programmer to do the multidimensional array indexing arithmetic themselves, and to the machine language it all looks like normal work. In fact, that addressing is necessarily ubiquitous across computing, so maybe have an instruction for it. Same for string compare -- the usual C approach is just comparing one character at a time in a loop -- bummer.
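For anyone who hasn't written it by hand, the "multidimensional array indexing arithmetic" in question is the row-major offset calculation. A minimal Python sketch, where the helper name `flat_index` is made up for illustration:

```python
from typing import Sequence


def flat_index(indices: Sequence[int], shape: Sequence[int]) -> int:
    """Row-major offset of a multidimensional index into a flat buffer."""
    if len(indices) != len(shape):
        raise ValueError("indices and shape must have the same length")
    offset = 0
    for i, n in zip(indices, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        # One multiply and one add per dimension (Horner's scheme).
        offset = offset * n + i
    return offset


if __name__ == "__main__":
    data = list(range(3 * 4))          # a 3x4 array stored flat
    print(data[flat_index((1, 2), (3, 4))])
```

To the machine this is just ordinary multiplies and adds mixed in with everything else, which is the point being made about the instruction stream.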
In tool making, a key is to design the right tools. Else end up with a huge toolbox where most of the tools are used little or not at all. IMHO, we are still looking for the right tools in the stack from logic gates to microcode, register sets, instructions, caches, compilers, and applications. Then, VLIW in some form needs to be kept in mind.
Moore's Law has been slowing down for a long time. In 1965 Moore wrote a paper predicting a doubling every year. In 1975 it was revised to a doubling roughly every 2 years, with 18 months, as suggested by someone else, being the accepted target for quite a while.
Intel stopped keeping up with the 2-year pace back in 2012.
Intel's CEO announced that "our cadence today is closer to two and a half years than two." This is scheduled to hold through the 10 nm node in late 2017.
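To put those cadences side by side, here is a quick Python sketch of what each doubling period compounds to over a decade. It assumes steady exponential growth at each cadence, which is a simplification, and the function name `growth_factor` is just for illustration.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Total multiplier after `years` if the quantity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    cadences = {
        "1 year (1965 paper)": 1.0,
        "18 months": 1.5,
        "2 years (1975 revision)": 2.0,
        "2.5 years (Intel's stated cadence)": 2.5,
    }
    for label, period in cadences.items():
        print(f"{label:<36} x{growth_factor(10, period):,.0f} per decade")
```

Going from a 2-year to a 2.5-year cadence sounds minor, but over ten years it is the difference between a 32x and a 16x improvement.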