Why do GPUs dwarf the computational power of CPUs?

In summary: a GPU is optimized for parallel processing, allowing it to perform tasks like graphics rendering at much higher speeds. This is because a GPU has far more cores than a CPU, so it can process many operations simultaneously. Furthermore, GPUs have much higher memory bandwidth, making them more efficient at handling large amounts of data. While CPUs keep improving and adding cores, GPUs will likely continue to outperform them on tasks that require massive parallelization.
  • #1
FishmanGeertz
The ATI Radeon HD 6990 is capable of about 5 teraflops (five trillion floating-point operations per second), while the top-tier CPUs from Intel and AMD can only churn out about 250 gigaflops, even after they have been overclocked quite a bit. Why do GPUs dwarf the computational horsepower of central processing units?

When do you think we'll see TFLOP processors in PCs? Nvidia's "CUDA" technology allows the GPU to perform some CPU tasks, and the difference in speed and performance is like night and day. I remember seeing a video on YouTube about how Intel had designed an 80-core CPU with about 2.5 teraflops of computing power, but it was for supercomputers and research purposes, not for home computers.
 
  • #2
A "Dual Core" CPU has just that, two cores, the ones in my laptop operate @ 1.6 GHz. The GPU you are talking about has 3072 cores, operating at 0.83 GHz (830 MHz). The key is that the GPU can MASSIVELY parallelize.

Additionally, an Intel Core i7 965 Extreme (3.20 GHz, DDR3-1333) has a memory bandwidth of ~24 GB/sec, while the stated GPU has a bandwidth of 320 GB/sec. Essentially this means a GPU can massively parallelize operations on large matrices.

In short, you can generalize this as: A CPU is about quality over quantity, a GPU is about quantity over quality.
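To make the contrast concrete, here is a minimal sketch in CUDA C++ of the same element-wise addition done both ways: a single CPU core walking the array in a loop, versus one GPU thread per element. The kernel name, array size, and launch configuration are made up for illustration.

Code:
// Minimal sketch: the same element-wise add done serially and in parallel.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addKernel(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n)
        c[i] = a[i] + b[i];
}

void addOnCpu(const float* a, const float* b, float* c, int n)
{
    for (int i = 0; i < n; ++i)                     // one core, one element at a time
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                          // ~1M elements (illustrative)
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;      // enough blocks to cover n
    addKernel<<<blocks, threads>>>(a, b, c, n);     // thousands of threads in flight
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);                    // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}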
 
  • #4
The processing power and parallelization of modern GPUs are extremely exciting.
 
  • #5
I'm sure I read something not so long ago about hackers now targeting GPUs as well.

Anyone else see this?
 
  • #6
Well, the enormous power of GPUs makes them ideal for brute-force attacks on password hashes, encryption keys, etc.
 
  • #8
Also, The Portland Group has freaking amazing CUDA-Fortran/C/VB compilers that I can't wait for my department to get their hands on :biggrin:
 
  • #9
jarednjames said:
That's the one.

Something along these lines: http://www.tomshardware.com/news/nvidia-gpu-wifi-hack,6483.html

Yeah, TKIP is bad. CCMP-AES is far better, but in the end wireless security is pretty hard to achieve.

AIR&SPACE said:
Also, The Portland Group has freaking amazing CUDA-Fortran/C/VB compilers that I can't wait for my department to get their hands on :biggrin:

VB?
 
  • #10
AIR&SPACE said:
In short, you can generalize this as: A CPU is about quality over quantity, a GPU is about quantity over quality.

No, both are about the highest quality. The reason for the difference is what each is optimized for. I have done graphics programming in OpenGL, and many of the instructions you send to the graphics card can be parallelized. The same is not true for the CPU. Example:

Draw a triangle v1, v2, v3 with color mappings c1, c2, c3 (about 5-10 lines of code).
A GPU (which is rigorously optimized for rendering) will draw the pixels simultaneously. Unfortunately, reading pixels back from the GPU is generally very, very slow.

If a CPU does this, it will draw each pixel one at a time, mostly because the CPU was developed as a more generic processor: it cannot assume that the previous state is independent of the next state when optimizing how instructions are processed. You will not find details of how a GPU is designed, because that would give an unfair advantage to competitors; what you will find are specs on how to use the GPU and how to write programs optimized for different GPUs.
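As a rough sketch of that "one thread per pixel" idea (this is not how any particular GPU or driver actually rasterizes a triangle; the gradient formula, image size, and names are made up for illustration):

Code:
// Sketch: every pixel is computed by its own thread from the same formula.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void shadePixels(unsigned char* rgb, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;            // this thread's column
    int y = blockIdx.y * blockDim.y + threadIdx.y;            // this thread's row
    if (x >= width || y >= height) return;

    int idx = 3 * (y * width + x);
    rgb[idx + 0] = (unsigned char)(255 * x / (width - 1));    // red ramps left to right
    rgb[idx + 1] = (unsigned char)(255 * y / (height - 1));   // green ramps top to bottom
    rgb[idx + 2] = 128;                                       // constant blue
}

int main()
{
    const int width = 640, height = 480;                      // illustrative image size
    unsigned char* rgb;
    cudaMallocManaged(&rgb, 3 * width * height);

    dim3 block(16, 16);                                       // 256 threads per block
    dim3 grid((width + 15) / 16, (height + 15) / 16);         // cover the whole image
    shadePixels<<<grid, block>>>(rgb, width, height);
    cudaDeviceSynchronize();

    printf("top-left pixel: %d %d %d\n", rgb[0], rgb[1], rgb[2]);
    cudaFree(rgb);
    return 0;
}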

The CPU, however, is optimized for executing instructions one after another. Now you can get CPUs with 16 cores, and there are even CPUs with 100 cores on FPGAs. Right now I see multithreading growing because a lot of applications these days process massive amounts of data where the current data point does not need the result of the previous or future data point. E.g. in a physics simulation, you can allocate memory for the next state, simulate a dt step for each object in the current state, and save the result in the next-state memory. Then you work back and forth between the two memory locations. In this design the update of every object can be parallelized.
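A minimal CUDA C++ sketch of that double-buffered update, assuming a simple gravity-only step (the struct layout, particle count, and time step are illustrative):

Code:
// Each particle's next state depends only on its own current state, so all of
// them can be updated in parallel; then the current/next buffers are swapped.
#include <cstdio>
#include <cuda_runtime.h>

struct Particle { float x, y, vx, vy; };

__global__ void stepKernel(const Particle* cur, Particle* next, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    Particle p = cur[i];          // read only from the current buffer
    p.vy += -9.81f * dt;          // simple gravity
    p.x  += p.vx * dt;
    p.y  += p.vy * dt;
    next[i] = p;                  // write only to the next buffer
}

int main()
{
    const int n = 1 << 16;        // 65,536 particles (illustrative)
    Particle *cur, *next;
    cudaMallocManaged(&cur,  n * sizeof(Particle));
    cudaMallocManaged(&next, n * sizeof(Particle));
    for (int i = 0; i < n; ++i) cur[i] = {0.0f, 100.0f, 1.0f, 0.0f};

    const float dt = 0.01f;
    int threads = 256, blocks = (n + threads - 1) / threads;
    for (int step = 0; step < 100; ++step) {
        stepKernel<<<blocks, threads>>>(cur, next, n, dt);
        cudaDeviceSynchronize();
        Particle* tmp = cur; cur = next; next = tmp;   // ping-pong the buffers
    }

    printf("particle 0 after 100 steps: x=%.2f y=%.2f\n", cur[0].x, cur[0].y);
    cudaFree(cur); cudaFree(next);
    return 0;
}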

Also, on a GPU you cannot easily define your own color blending function; it is optimized for specific ones (this is changing with programmable shaders, where you can specify the function), e.g. you can only do linear forms of blending. On a CPU you can do non-linear blending, and it will generally be faster because instructions like power, sqrt, sine, cosine, etc. are hardware optimized, whereas a GPU generally does not need all of those functions, just sine, cosine, and a few others.

Note: nowhere will you find the specific details of how a GPU or a CPU works internally, since that would give an advantage to competitors. But you can find docs on how to use their functionality and speculate about the internal design.

What it comes down to is that drawing can easily be parallelized (just apply the same equation to each pixel) because the pixels do not depend on each other, whereas general CPU applications (e.g. compression, decompression) use algorithms that require the current state to be calculated before the next state can be.

This is changing, though: the number of cores is increasing, and because of this multi-threaded algorithms are being developed. The biggest example is games doing physics calculations on separate threads. Compilers now build on separate threads, and so do compression algorithms.

... and one day we might have a 1K core CPU.
 
  • #11
AIR&SPACE said:
A "Dual Core" CPU has just that, two cores, the ones in my laptop operate @ 1.6 GHz. The GPU you are talking about has 3072 cores, operating at 0.83 GHz (830 MHz). The key is that the GPU can MASSIVELY parallelize.

Additionally, an Intel Core i7 965 Extreme (3.20 GHz, DDR3-1333) has a memory bandwidth of ~24 GB/sec, while the stated GPU has a bandwidth of 320 GB/sec. Essentially this means a GPU can massively parallelize operations on large matrices.

In short, you can generalize this as: A CPU is about quality over quantity, a GPU is about quantity over quality.

So a single GPU has hundreds or thousands of tiny cores, or "stream processors", each operating at around 850 MHz, while a CPU has only 2-6 cores operating at a few GHz?

Hypothetically, if a CPU had 800 highly parallelized cores running at a few GHz each, what kind of computational horsepower would we be looking at? And how could we use it?

What we now have inside our home computers would have been considered a multi-million-dollar supercomputer during 1997-2000. I wonder if we'll see EXAFLOP PC hardware 10-12 years from now.

Off-topic, but you're saying hackers are learning how to utilize the monstrous computing power of GPUs to crack password hashes and encryption?
 
  • #12
Note that CPUs in PCs also have parallel mathematical sub-units, dating back to the Pentium III, known as SSE. Not as many parallel units as a GPU, but there are math libraries that use SSE.

Pentium III version:
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

AMD / Intel SSE3 version from 2004:
http://en.wikipedia.org/wiki/SSE3

AMD/ATI also has its own math library for its GPUs, ACML-GPU. I think there's also a math library with a common interface that can use either ATI or Nvidia GPUs.
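To illustrate the SSE idea on the CPU side, here is a minimal sketch using the real SSE intrinsics _mm_load_ps, _mm_add_ps, and _mm_store_ps: one instruction operates on four packed floats at a time. The array contents and sizes are made up for illustration; this is plain host-side C/C++ for any x86 CPU with SSE (Pentium III or later).

Code:
// Sketch: SSE processes four floats per instruction instead of one.
#include <cstdio>
#include <xmmintrin.h>   // SSE intrinsics

int main()
{
    alignas(16) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(16) float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    alignas(16) float c[8];

    for (int i = 0; i < 8; i += 4) {                 // 4 floats per iteration
        __m128 va = _mm_load_ps(&a[i]);              // load 4 packed floats
        __m128 vb = _mm_load_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);              // one instruction, 4 adds
        _mm_store_ps(&c[i], vc);
    }

    printf("%g %g %g %g ...\n", c[0], c[1], c[2], c[3]);  // 11 22 33 44
    return 0;
}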
 
  • #13
Just to throw it out there, one of the problems faced at the moment is heat dissipation.

They are (or were?) looking at multi-layer CPUs to allow more cores to be packed in (3D processors), but the heat generated just couldn't be removed quickly enough.
 
  • #14
jarednjames said:
Just to throw it out there, one of the problems faced at the moment is heat dissipation.

They are (or were?) looking at multi-layer CPUs to allow more cores to be packed in (3D processors), but the heat generated just couldn't be removed quickly enough.

You mean stacking transistors on top of each other instead of trying to fit more cores onto dies horizontally?
 
  • #15
FishmanGeertz said:
You mean stacking transistors on top of each other instead of trying to fit more cores onto dies horizontally?

Yes.
 
  • #16
jarednjames said:
Yes.

Is that scientifically feasible? What about the problem with heat generation and cooling?
 
  • #17
FishmanGeertz said:
Is that scientifically feasible? What about the problem with heat generation and cooling?

You quoted post #13 (below) where I talked about multi-layer (3D) processors having heat dissipation problems to ask me if I meant stacking the processors.

When I confirmed this, you then told me it would have problems due to heat dissipation.

We've gone full circle.
jarednjames said:
Just to throw it out there, one of the problems faced at the moment is heat dissipation.

They are (or were?) looking at multi-layer CPUs to allow more cores to be packed in (3D processors), but the heat generated just couldn't be removed quickly enough.
 
  • #18
FishmanGeertz said:
Hypothetically, if a CPU had 800 highly parallelized cores running at a few GHz each, what kind of computational horsepower would we be looking at? And how could we use it?

Oh, we're well beyond that. They just don't stuff everything into one box, but instead have many computers working together on the same data.

According to Wikipedia the most powerful supercomputer at the moment is Tianhe-1A, with 14,336 hex-core CPUs and 7,168 GPUs. That comes to 86,016 CPU cores (14,336 × 6) and 3,211,264 GPU shaders (7,168 × 448 shaders per Tesla M2050 GPU). It's rated at 2.566 petaFLOPS.
 
  • #19
Speedo said:
Oh, we're well beyond that. They just don't stuff everything into one box, but instead have many computers working together on the same data.

According to Wikipedia the most powerful supercomputer at the moment is Tianhe-1A, with 14,336 hex-core CPUs and 7,168 GPUs. That comes to 86,016 CPU cores (14,336 × 6) and 3,211,264 GPU shaders (7,168 × 448 shaders per Tesla M2050 GPU). It's rated at 2.566 petaFLOPS.

I thought the fastest supercomputer was IBM's "Roadrunner."
 
  • #20
According to this it was dethroned in 2009 and is now #7.
 
  • #21
I wonder how long it will be until the absolute ceiling of Moore's law is reached.
 
  • #22
FishmanGeertz said:
I wonder how long it will be until the absolute ceiling of Moore's law is reached.

Not long... not long at all.
 
  • #23
KrisOhn said:
Not long... not long at all.

What will they do to further improve the performance and efficiency of microprocessors?
 
  • #24
FishmanGeertz said:
What will they do to further improve the performance and efficiency of microprocessors?

I think one of the biggest problems is leakage current in transistors. When you make transistors that small, small amounts of current leak through them, which is why they heat up. If we could get rid of that leakage, then we could start stacking layers and layers of transistors like JnJ was saying.

I unfortunately have no idea how feasible this is; I am no electrical engineer, that's for sure.
 
  • #25
KrisOhn said:
I think one of the biggest problems is leakage current in transistors. When you make transistors that small, small amounts of current leak through them, which is why they heat up. If we could get rid of that leakage, then we could start stacking layers and layers of transistors like JnJ was saying.

I unfortunately have no idea how feasible this is; I am no electrical engineer, that's for sure.

Could they do this same thing with GPU dies?
 
  • #26
FishmanGeertz said:
Could they do this same thing with GPU dies?

Yes they could.
 
  • #27
KrisOhn said:
Yes they could.

Could they fit multiple GPU cores on a single die? Or would this generate too much heat?
 
  • #28
FishmanGeertz said:
Could they fit multiple GPU cores on a single die? Or would this generate too much heat?

I'm not sure; it depends on how densely they're packed. That would have to be answered by someone with more knowledge.
 
  • #29
KrisOhn:

I originally had a design for a computer unit that I call the AT1000. It consisted of around 35 billion GPUs interconnected by a transition gate matrix so the physical configuration of the processor could be reconfigured as needed. Multiple 14 cm silicon wafers, with about 750,000 GPUs per side, had lasered holes for interconnections via micrometallic bridges. Up to 7,000 wafers or more, depending on thickness, were then stacked in a special cylinder that supplied I/O and power and that was originally to be flooded with R12 (dichlorodifluoromethane). I came up with this design around 1975, before the advent of large-scale integration. R12 flooding would keep the wafers around 2 deg. C. I later opted for Monsanto's TP10, or Fluorocarbon Inert, a long-chain fluorocarbon that is less reactive than R12 but would not cool as well, keeping core temps below 12 deg. C. So I would recommend immersing the chip assembly in a suitable coolant, with an adequate heat sink as part of your chip carrier.

I don't think that multiple layers of devices are a good idea, due to point heating that could cause localized disruption of devices in the other layers.
 
  • #30
Although I'm not entirely sure I understand everything that was said there, damn does it sound cool.

I have a question though: how would the AT1000, made out of 35 billion circa-1975 GPUs, stack up against something like this:
http://en.wikipedia.org/wiki/Tianhe-1A
 
  • #31
KrisOhn said:
Although I'm not entirely sure I understand everything that was said there, damn does it sound cool.

I have a question though: how would the AT1000, made out of 35 billion circa-1975 GPUs, stack up against something like this:
http://en.wikipedia.org/wiki/Tianhe-1A


It is becoming very difficult for electronic engineers to significantly increase the performance and efficiency of microprocessors. The next step for PC hardware is 28 nm chips. Moore's law might become utterly obsolete at around 11 nm.

That's when they will have to really put their heads together and think of something revolutionary if they want to keep making faster and faster hardware. There are some ideas on the slate, but most of them are purely theoretical/experimental.
 
  • #32
KrisOhn:

I had a reply that I tried to enter the day before yesterday, but there was a glitch and the post was lost.

However the gist was this:

The AT1000 and its associated GPUs contain adaptive electronics, much more sophisticated than the GPUs of the day; each GPU can be configured electronically by arrays similar to PEELs (Programmable Electrically Erasable Logic). I figure that the Tianhe-1A may just be adequate to load the P7E BIOS into the AT1000 unit. My original estimate for loading and compiling the P7E BIOS using Cray computers circa 1990 was about 3.38 years of computer time. My estimate is that the Tianhe-1A might take about 6 or more days.

FishmanGeertz:

I believe that the upper limit on computer clock speed comes from EM effects beyond 10 GHz; it might be necessary to couple devices with waveguides. Imagine using X-rays to clock a computer circuit: very, very difficult, but not necessarily impossible.

As for the heating problem, immersing the chip in a suitable non-polar refrigerant, coupled to a suitable heat sink as part of the chip carrier, should solve it. Liquid coolant allows for maximum junction heat transfer over the entire surface of the device, not just through the substrate.
 
  • #33
Eimacman said:
KrisOhn:

I had a reply that I tried to enter the day before yesterday, but there was a glitch and the post was lost.

However the gist was this:

The AT1000 and its associated GPUs contain adaptive electronics, much more sophisticated than the GPUs of the day; each GPU can be configured electronically by arrays similar to PEELs (Programmable Electrically Erasable Logic). I figure that the Tianhe-1A may just be adequate to load the P7E BIOS into the AT1000 unit. My original estimate for loading and compiling the P7E BIOS using Cray computers circa 1990 was about 3.38 years of computer time. My estimate is that the Tianhe-1A might take about 6 or more days.

FishmanGeertz:

I believe that the upper limit on computer clock speed comes from EM effects beyond 10 GHz; it might be necessary to couple devices with waveguides. Imagine using X-rays to clock a computer circuit: very, very difficult, but not necessarily impossible.

As for the heating problem, immersing the chip in a suitable non-polar refrigerant, coupled to a suitable heat sink as part of the chip carrier, should solve it. Liquid coolant allows for maximum junction heat transfer over the entire surface of the device, not just through the substrate.

Do you think we'll ever see EXAFLOP processors in home computers?
 
  • #34
FishmanGeertz said:
Do you think we'll ever see EXAFLOP processors in home computers?

You might, but your computer still won't be able to play the latest games.
 
  • #35
jhae2.718 said:
You might, but your computer still won't be able to play the latest games.

An EXAFLOP GPU would absolutely butcher Crysis: 2560x1600, 32x MSAA, everything else maxed out, on six monitors, and you would still get over 100 FPS! I'm not sure how it would perform in 2030-2035 PC games.
 

FAQ: Why do GPUs dwarf the computational power of CPUs?

Why are GPUs better at handling parallel processing compared to CPUs?

GPUs are designed specifically to handle large amounts of parallel processing. They have a larger number of cores compared to CPUs, which allows them to perform many tasks simultaneously. This makes them more efficient at tasks that require a lot of calculations, such as graphics rendering and machine learning algorithms.

How do GPUs achieve higher computational power compared to CPUs?

GPUs have a highly specialized architecture that is optimized for handling large amounts of data simultaneously. They have a large number of cores that process data in parallel, whereas CPUs have fewer cores that work largely sequentially. This allows GPUs to perform more calculations in a shorter amount of time, resulting in higher computational power.

Can GPUs be used for general-purpose computing?

Yes, GPUs can be used for general-purpose computing, but they are best suited to highly parallel tasks. This means they are not ideal for everyday tasks such as web browsing or word processing, but they excel at tasks that require a lot of calculations, such as scientific simulations or video rendering.

Are there any limitations of GPUs compared to CPUs?

While GPUs have higher computational power, they are not as versatile as CPUs. They are designed to handle specific kinds of tasks and are less efficient at tasks that require sequential processing. Additionally, GPUs have a limited amount of memory compared to CPUs, which can be a limiting factor for certain applications.

How can GPUs and CPUs work together to improve overall performance?

GPUs and CPUs can work together through a process called heterogeneous computing. This involves offloading specific tasks to the GPU while the CPU handles other tasks. This allows for better utilization of resources and can significantly improve overall performance. However, it requires specialized programming techniques and is not suitable for all applications.
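A minimal sketch of that offloading pattern in CUDA C++: the kernel launch returns immediately, the CPU does unrelated work in the meantime, and the two are synchronized before the GPU result is used. The kernel, sizes, and the CPU-side work are all made up for illustration.

Code:
// Sketch of heterogeneous computing: GPU and CPU work overlapping in time.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float* data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // Offload the bulk, data-parallel part to the GPU...
    scaleKernel<<<(n + 255) / 256, 256>>>(data, n, 3.0f);   // returns immediately

    // ...while the CPU handles an unrelated, sequential task.
    double cpuSum = 0.0;
    for (int i = 1; i <= 1000; ++i) cpuSum += 1.0 / i;      // stand-in CPU work

    cudaDeviceSynchronize();                                // wait for the GPU
    printf("GPU result: data[0] = %g, CPU result: %g\n", data[0], cpuSum);

    cudaFree(data);
    return 0;
}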
