Hello there again!
In this article I plan to tell you something about video memory, and also to give some examples of video cards that are among the best and most worth choosing! It is my first video-related post on my site, but video cards are just as hardware-related as the other computer components. So let’s talk!
Video memory in computers is like eyes in our body
When using computers, we need to see something on their monitor screens, just like we see images on a TV. And for rendering those images, computers need a special segment of memory dedicated to that task. Usually this job is done by so-called video cards, also known as GPUs (Graphics Processing Units), which plug into the motherboard’s PCI-E slots (on modern motherboards, that is; older ones have legacy slots like AGP and plain PCI). Having no video memory at all means we will see nothing on the screen: we may turn the PC on, but we’ll find ourselves staring at an empty screen that may tell us “No signal”! That would make our machine useless, so having some video memory is definitely a must!
There is not a single type of video memory
We may get some video memory from a portion of the system RAM (this is called shared memory and needs no separate card), but it depends on the CPU having integrated graphics, so it’s not possible everywhere. And anyway, shared memory is usually at most 1 gigabyte, while for heavier video tasks like playing games or creating content (and also for serious computing tasks, we’ll get into that too), it is much better to choose a separate video card. Cards go up to 48 GB of memory!
But wait, what memory are we talking about? When thinking of computer memory in general, the first thing that comes to mind is the RAM. However, that’s not all. And while there are some similarities between RAM and GPU memory in how they work, they are two separate things (well, except for the “shared memory” case, where those similarities are exactly what makes sharing possible).
Like RAM, video memory temporarily stores data (graphics data), and as soon as new graphics data is required (because images on the screen may change all the time, as the OS loads or as our tasks make progress) the data in the video memory is replaced and we see other things on the screen. When the computer is turned off, all data in the video memory is lost. So it’s volatile memory, just like the RAM. We cannot rely on video memory for permanently saving data.
The memory part of a GPU, being of the random-access type, is also called “Video RAM” or VRAM. And besides desktop PCs, we find VRAM in laptops too. So when talking about video memory in a computer, it is either shared memory (coming from the CPU’s integrated graphics, if there is any, and the regular RAM) or dedicated video RAM on the GPU card; and this card, besides VRAM, also contains processing units (yes, GPUs have processors too, and there are plenty of cores in there, as we’ll see).
Alternatively, GPUs are called discrete graphics cards. This is exactly because their video memory and processors are their own, completely separate from the motherboard’s CPU and RAM.
To put it shortly, in a desktop PC we may have integrated graphics or discrete graphics cards. For the former we don’t need to purchase anything separate (the computer already comes with a video module), but we cannot play demanding games, do 3D modeling or other heavy video-content-creation tasks with this memory. We can instead do basic work like browsing the Internet, listening to music or creating documents. The latter (using one GPU or even more) gives us much larger opportunities to benefit from, and a gaming user would surely NOT be content with just integrated graphics!
And maybe we want a higher-resolution image on our screens, so that more icons / items are visible on it: 640*480 pixels is definitely nowhere near as large as 1920*1080, and it’s very likely we cannot reach very high resolutions on integrated graphics. Discrete graphics has much better chances of helping us out here. And if we want a better color palette in our systems, GPUs score once more against integrated graphics.
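To get a feel for why higher resolutions demand more video memory, here is a rough back-of-the-envelope sketch of my own (assuming a plain uncompressed 32-bit-per-pixel frame buffer, which is a simplification of how real GPUs manage memory):

```python
def framebuffer_bytes(width, height, bytes_per_pixel=4):
    """Raw size of one uncompressed frame at the given resolution."""
    return width * height * bytes_per_pixel

# One 640*480 frame at 32-bit color: about 1.2 MB
print(framebuffer_bytes(640, 480) / 1024**2)    # 1.171875
# One 1920*1080 frame at 32-bit color: about 7.9 MB
print(framebuffer_bytes(1920, 1080) / 1024**2)  # 7.91015625
```

So a single Full HD frame takes almost seven times the memory of the old 640*480 standard, and that is before counting textures, buffers and everything else a game keeps in VRAM.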
Video cards are a whole world of processors
If CPUs were considered the brain of computers, graphics cards were called their soul. But in recent years GPU power has increased enough to make them usable in supercomputing tasks, thanks to their large number of processor cores. A GPU does not only have a memory part (nowadays usually measured in gigabytes); there is also an internal army of processors, and since GPUs are dedicated to parallel processing, there are MANY cores inside them. Of course GPU models differ in power and performance, there are plenty of offers, but the core count of a GPU can exceed 4000 (yes, it’s true).
Thanks to this abundance of processor cores, GPUs do an excellent job with parallel processing, being able to perform many more operations at once than CPUs. This is one reason GPUs are used in Cray supercomputers, but gaming users are not excluded from the powers GPUs bring. First and foremost, the computations made by those internal cores are related to pixels (it’s video, after all!), so let’s just call it pixel processing.
Accelerating applications is an important criterion, and by dividing tasks among so many processors, GPUs offer better ways to accelerate in the fields of Artificial Intelligence, supercomputing, and regular graphics. Per unit of energy, GPUs are rated to do much more work than CPUs. Other domains where GPUs excel are automotive, robotics, and the life sciences.
Surely, regular CPUs will keep their indispensable roles; they are not going to disappear because of how powerful graphics units have become. Each type of processing unit has its well-defined role.
NVIDIA cards make use of their many processors via the Compute Unified Device Architecture (CUDA), an extension of the C language that helps programmers use all those cores for general computation purposes. Programmers have to learn & develop highly parallel algorithms. Number crunching also fits in very well: CUDA does 32-bit integer and floating-point operations, and double-precision floating-point calculation is also supported.
Since we have so many cores here compared to regular CPUs, CUDA is much more suitable for huge calculation workloads with these types of numbers. And there are also libraries that allow even larger floating-point numbers to be tackled (arbitrary-precision software libraries). But this matter concerns number lovers, not gamers!
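CUDA itself is programmed in C, but the data-parallel idea is easy to sketch in any language. Below is my own illustration of a classic element-wise operation (“SAXPY”: y = a*x + y), written serially; in a CUDA kernel the loop disappears, because each index becomes its own thread running on one of those thousands of cores:

```python
def saxpy(a, x, y):
    # Element-wise a*x + y over two lists.
    # In a CUDA kernel there is no loop: each index i is handled by
    # a separate thread, identified via its block and thread IDs.
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
# [12.0, 24.0, 36.0]
```

The function name and values here are just for illustration; the point is that every output element is independent of the others, which is exactly the kind of work a GPU spreads across its cores.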
CUDA is also good for large datasets. Also, CUDA-oriented GPUs can have a special category of cores (besides the CUDA ones) called Tensor Cores, counting up to several hundred (e.g. 640 Tensor Cores on the NVIDIA Titan V). These Tensor Cores are good at Artificial Intelligence, Deep & Machine Learning, and at evaluating pretty large mathematical computations and problems. We will not find them in the more modest gaming-oriented GPUs like the NVIDIA GTX 16 series (the RTX 20 series, on the other hand, does include them).
Some NVIDIA cards also have a third distinct category of cores, the Ray Tracing (RT) cores. Ray tracing follows the path of light rays between our eyes and the objects those rays interact with, for realistic, movie-like imagery. There are NVIDIA cards featuring all three kinds of cores (like the NVIDIA Quadro RTX line; you will see an example later).
As for AMD, the many cores inside their GPUs are called Stream processors. Compared to their CUDA counterparts, these are smaller, less complex, and run at lower frequencies. There is definitely an architectural difference between CUDA and Stream, so it is not that simple to compare, say, 1600 CUDA cores to 1600 Stream processors.
Physical and hardware considerations
Older GPU cards, like my oldest one at home (an NVIDIA GeForce 9500 GT), have no active cooling, but more modern GPUs (from the GDDR5/GDDR6 generations) have at least one fan, and the larger the video memory, the more extended the cooling zone (two or even three fans mounted on the card), because these pieces accumulate heat too, especially when overclocked. In a previous article I mentioned the water cooling option, but that is for custom builders and implies aftermarket work. As of this writing, I don’t know whether any GPUs come with waterblocks pre-installed.
When placing a GPU card on the motherboard, it is best to choose the PCI-Express slot closest to the CPU. It is surely x16 (meaning 16 lanes), which ensures the best bandwidth and thus good performance for the video card. The other PCI-E slots may be shorter (and run with fewer lanes), like x8/x4/x1, depending also on the total number of PCI-E lanes on the motherboard, but the slot closest to the CPU is typically x16-long and x16-wired.
You should also be prepared to need a good power supply if your video card features larger memory like 4/8/11 GB or more. Such cards’ power consumption is high as well, and you will very probably need a special PCI-E power connector (6-pin, or 8-pin on hungrier cards) from the PSU plugged into the video card. Connecting the card to the PCI-E slot on the motherboard will NOT be enough!
From personal experience: I have one GDDR3 1GB NVIDIA 9500 GT (no extra PCI-E power needed), four other NVIDIA 2GB GDDR5 cards (I don’t remember each full name, but one is a Zotac, while the others are either GT 1030 or GTX 1050), and the card with the most memory is a GIGABYTE GeForce GTX 1050 Ti Windforce OC (long name…), 4 GB, GDDR5.
The 2GB cards have a single cooling fan and don’t require any extra PCI-E connector, while also being single-slot (I will explain this below), and of course the GDDR3 card is single-slot without an extra connector too. On the other hand, the Gigabyte 4GB card (which is still NVIDIA-based) has two large cooling fans, needs an extra PCI-E 6-pin connector, and it’s also dual-slot!
But what do these 1-slot and 2-slot terms mean? You should be careful about this detail too when purchasing a video card. GPUs plug into one single PCI-E slot on the motherboard, but as they feature more memory, active cooling and thousands of processors inside, their physical size grows too, and many high-end video cards are wide enough to take up more than one slot’s worth of space! The so-called dual-slot GPUs still plug into a single motherboard slot, but they are wide enough to leave the next PCI-E slot “stuck” under their shroud, so you will lose access to that second slot.
Be even more careful: there are triple-slot GPUs too, like the AORUS NVIDIA GeForce RTX 2070 SUPER 8GB. You buy a strong GPU, but at the same time you may lose access to extra PCI-E slots, because the space allocated between the slots on the motherboard is not wide enough.
Extra Bonus Malus: there is even a Colorful iGame GTX 1080 KUDAN video card that is quad-slot! Depending on your motherboard, if you install a GPU like that you may lose access to all the other PCI-E slots, because they end up beneath the card.
And actually, there are more GPU form factors, intermediary ones like 1.5-slot and 2.5-slot. So we have several criteria to pay attention to when deciding it’s time for newer video memory in the house. Gaming users needing great video would especially run into this multi-slot problem, since it is pretty hard to find a high-memory GPU that is not that bulky.
But the KFA2 GeForce® GTX 1070 KATANA Single Slot is an 8GB GDDR5 piece that provides reasonable memory without taking more space than needed. Maybe you want to benefit from every one of the motherboard’s slots, plugging in other cards like expansion ones (for extra SATA, M.2, U.2 ports) or even another GPU.
Yes, we can have several GPUs on the same motherboard. And in order to use them together as a single rendering device, combining their processing power (note that in practice the video memory is mirrored between the cards rather than truly summed), we need the SLI technology.
Scalable Link Interface (SLI) was introduced by NVIDIA in 2004 (the name revives 3dfx’s Scan-Line Interleave from 1998) and allows multiple GPU cards (even 3 or 4, not only two!) to work together on the motherboard. There are small devices called SLI bridges for physically interconnecting the cards, and three modes in which SLI can work:
- Anti-aliasing mode, which targets good image quality rather than high frame rates; as its name says, this method splits the anti-aliasing workload between the multiple GPUs;
- Alternate Frame Rendering (AFR), which achieves high frame rates; if you use two GPUs with this technique, each of them renders a full frame: the first takes care of the even-numbered frames and the second handles the odd-numbered ones, so it’s also an arithmetical matter;
- Split Frame Rendering (SFR), which takes a frame and, after analyzing it, divides it horizontally into as many parts as there are GPUs (two, three, four…), thus distributing the workload equally between them. The frame rates are not as high as with the above-mentioned AFR approach.
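The frame-distribution arithmetic behind AFR is simple enough to sketch (my own illustrative code, not any real driver logic):

```python
def afr_gpu_for_frame(frame_number, gpu_count=2):
    """Alternate Frame Rendering: frame N goes to GPU (N mod gpu_count),
    so with two GPUs one renders the even frames, the other the odd ones."""
    return frame_number % gpu_count

# With two GPUs, frames 0..5 alternate between GPU 0 and GPU 1:
print([afr_gpu_for_frame(n) for n in range(6)])  # [0, 1, 0, 1, 0, 1]
```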
Normally, you can choose which of these three SLI operating modes to use by entering the NVIDIA Control Panel software, which should be installed on the computer along with the regular video drivers.
However, this SLI technology only applies to NVIDIA-based cards. AMD has its own multi-GPU technology, called CrossFire. If you choose CrossFire, or if you are simply interested in finding out more about multi-GPU usage, here are some other interesting details:
- NVIDIA SLI requires identical graphics cards on the motherboard, much like RAID technology does with disk drives: a big RTX 2070 Super will only pair with another RTX 2070 Super (8 GB + 8 GB), and certainly not with a 4 GB GTX 1050 Ti OC. The brand may differ (like AORUS vs non-AORUS), but the chip has to be the same. CrossFire allows two structurally different models, but they must still belong to the same architecture (R9 with R9, but not R9 + RX).
- NVIDIA SLI requires us to interconnect the cards via that bridge (or by cable), whereas CrossFire lets the cards communicate via the PCI-Express interface itself (3.0, though 4.0 has also come out this year).
- NVIDIA setups are pricier for SLI configurations, because NVIDIA asks for so-called SLI certifications from motherboard manufacturers like GIGABYTE, MSI, EVGA, ASUS; AMD does not require such certifications, and it is said that many more motherboards support CrossFire than SLI.
As for myself, I have never tried either of these technologies, since I am not an eager gaming user. I loved doing research on numbers, which required CPUs, lots of RAM and many terabytes of disk drives. Although CUDA is also helpful for doing calculations on those many-cored GPUs, I studied the matter and found no way to work with large integers (100 decimal digits and more) via CUDA. But true gaming users and intensive video content creators can surely harness plenty of video power via SLI, or rather CrossFire if the latter proves more generous.
They also come in a RAM-like generation flavor
Just as RAM has gone through several successive generations on its development roadmap (DDR, DDR2, DDR3, DDR4, with DDR5 to come), discrete graphics cards have their own line of generations. So there are different GDDR versions (the extra G stands for Graphics, of course). And the generation numbering is more advanced on the video side: while regular RAM is still at DDR4, we can already see GDDR6 video cards! There has also been a GDDR5X sub-generation.
Now, I will not dive into generations older than GDDR3. The GDDR3 standard is still used in some low-end, entry-level (and obviously older) video cards; I myself have a GDDR3 1GB NVIDIA 9500 GT card at home. And it doesn’t work with CUDA on Arch Linux, I can tell you that too.
GDDR4 saw only limited use; it’s said to have shipped with just a few cards, so it’s difficult to spot on the market.
GDDR5 is still en vogue, being capable of speeds up to 5 GHz and normally 2.5x to 3x faster than GDDR3. My other five video cards at home are all GDDR5 (2 GB each, the largest one 4 GB; I was never that gaming-oriented a user). Although GDDR5 and its sister generation GDDR5X are no longer the most recent video generations, we can still find plenty of GDDR5 cards.
Finally, GDDR6 is the real news in the video domain. It almost doubles the transfer speeds of GDDR5 and has lower power consumption (a real progress, easing electric bills too).
There is also another type of video memory called HBM (High Bandwidth Memory), now at its second generation, HBM2. SK Hynix and Samsung are the companies manufacturing this standard.
NVIDIA and AMD provide the best-rated video cards!
We know that in the CPU field there are two market giants: Intel and AMD dominate. The situation is similar when we talk about NVIDIA, AMD and their far more core-rich GPU cards.
NVIDIA has been very good at producing GPUs for many years, and AMD is responsible for cards like the Radeon suite. So not only do Intel and AMD compete on CPUs: we may also speak of a GPU war between AMD and NVIDIA! And it’s said that there is no true winner on the GPU side.
NVIDIA is famous for manufacturing cards that use the CUDA parallel-programming platform (based on programming languages like C and Fortran), and besides the gaming-oriented GeForce line it also makes the workstation-class Quadro RTX cards. AMD, in turn, makes the above-mentioned Radeons, built on architectures like Navi. As for a parallel programming platform, on AMD cards we may find OpenCL.
Some extremely powerful cards from NVIDIA, as of today, are the Turing GPUs belonging to the NVIDIA GeForce RTX suite. NVIDIA follows a certain naming policy, so in increasing order of quality we meet products like the NVIDIA GeForce RTX 2060, 2070 and 2080, and there are also three “Super” counterparts of those. Above them all sits the flagship, the “2080 Ti”.
The NVIDIA GeForce RTX 2080 Ti flagship uses the GDDR6 video memory generation, features 11 GB of memory, and has 4352 CUDA processor cores. Its base clock speed is 1350 MHz, and the boost clock speed is 1545 MHz. Although this frequency is lower than what flagship CPUs can reach today, remember that the best desktop CPUs available today top out at 64 logical processors aka threads, running somewhere above the 4000 MHz mark.
The memory bandwidth is an overwhelming 616 GB/s. The memory interface width is 352 bits, while the memory speed is 14 gigabits (i.e., one-eighth of a gigabyte) per second per pin. The card also supports ray tracing, and at most 4 displays (four monitors connected simultaneously). You can use HDMI 2.0 and DisplayPort 1.4 cables with this video card.
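These three memory figures are tied together: the bandwidth is the interface width in bits times the per-pin speed in Gbps, divided by 8 bits per byte. A quick sanity check of the published numbers (the function is mine, just for the arithmetic):

```python
def memory_bandwidth_gbs(interface_bits, speed_gbps):
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return interface_bits * speed_gbps / 8

# 352-bit interface at 14 Gbps per pin:
print(memory_bandwidth_gbs(352, 14))  # 616.0, matching the spec sheet
```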
It is Dual-Slot and needs two extra PCI-E 8-pin connectors (they actually come in a 6+2-pin flavor, we must not confuse them with 8-pin CPU power connectors). Its cost is above $1000 USD.
Note: electronically speaking, the architecture GPU processors are built on differs from the principles used when manufacturing CPUs. I cannot thoroughly explain those manufacturing differences here, but if we put the problem arithmetically and multiply, say, 4100 by 64 and then 1545 by 4352, the second result (on the GPU side) clearly overwhelms the first. So, in supercomputing, video rendering and similar kinds of tasks, GPUs have a great advantage over classic processors!
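Spelling that multiplication out (keep in mind this is only a crude clock-times-cores product, not a real performance metric; architectures differ far too much for that):

```python
# Rough clock * unit-count products (MHz * units), nothing more rigorous:
cpu_product = 4100 * 64    # a flagship desktop CPU: 64 threads above 4000 MHz
gpu_product = 1545 * 4352  # RTX 2080 Ti: 4352 CUDA cores at 1545 MHz boost

print(cpu_product)                # 262400
print(gpu_product)                # 6723840
print(gpu_product / cpu_product)  # roughly 25.6x in the GPU's favor
```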
It’s good for us to know that some NVIDIA cards contain other types of processor cores besides the CUDA part.
The NVIDIA Quadro RTX 8000 is a real video behemoth with no fewer than 48 GB of GDDR6 memory, 4608 CUDA cores, 576 Tensor Cores, 72 RT cores, 672 GB/s of memory bandwidth, and a 1730 MHz boost clock. It supports DirectX 12.0 and is capable of driving two simultaneous displays at 7680*4320 pixels and 60 Hz! Let’s think a bit about this resolution: 7680 pixels wide and 4320 tall. Far away from the 640*480 and 800*600 oldies! And you get it on TWO monitor screens at once. This is what “simultaneous displays” means.
It is rated as the best ray-tracing graphics card in the world!
Smaller resolutions (at higher refresh rates, like 3840*2160 at 120 Hz) can be driven on FOUR monitors at once. This is possible because the card has four DisplayPort connectors, so we can attach 4 monitors to it.
We may see other huge parameters like 130.5 Tensor TFLOPS for deep learning, and its maximum DisplayPort 1.4 resolution is HDR 7680*4320 at 60 Hz. So 7680 pixels in width and 4320 in height. How many of us have ever seen such a broad image on a monitor?
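To appreciate those 7680*4320 figures, simple arithmetic is enough (nothing card-specific here):

```python
pixels_8k  = 7680 * 4320  # one 8K frame: 33,177,600 pixels
pixels_vga = 640 * 480    # one old VGA frame: 307,200 pixels

# One 8K frame holds as many pixels as 108 full VGA screens:
print(pixels_8k // pixels_vga)  # 108
```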
You have to be rich and well-determined to use this card, since its price can be over $5000 USD. After all, Quadros are meant for professionals, so a CEO will more likely be interested in purchasing some units for their company. This kind of card would be overkill for you gamers!
While the GeForce division of NVIDIA cards is suitable for Gaming, the Quadro GPUs are dedicated to workstation environments, for tasks like machine learning, scientific calculations, digital content creation or DCC, computer-generated imagery, computer-aided design.
The workstation orientation of Quadros really makes sense if we remember that workstations usually need larger resources. After all, the RTX 8000 has more than four times the memory capacity of the 2080 Ti! Plus extra cores and a higher boost clock.
Obviously the gaming-oriented 2080 Ti flagship is way more affordable than the Quadro, so those of you who have satisfying earnings and go for gaming should just enjoy the huge CUDA power the 2080 Ti comes with. Moreover, the 2080 Ti runs on PCI-E 3.0, so let’s wait and see whether in 2020 a new NVIDIA GeForce flagship comes out, maybe a 3080 Ti benefiting from the even higher speeds of the PCI-E 4.0 standard.
Remember that as of right now, PCI-E 4.0 is available only on AMD X570 motherboards, and partially on their B450 and X470 ones, as stated by Tom’s Hardware in July 2019. Intel motherboards still stick with the PCI-E 3.0 standard. We have to wait and see further progress on the way.
But these two video cards are just the top of the top, and surely not every one of us is in a position to aim for the best of the best. There are also more affordable GPUs that still provide lots of performance. And since a pretty large part of PC users are interested in gaming GPUs (which can of course also be used for creating content), we would rather go for GeForce or Radeon cards, not for Quadros that may cost thousands of bucks apiece. So let’s see some milder video cards.
The NVIDIA GeForce RTX 2070 Super is kind of a younger sister (or let’s say cousin, given the extra “Super” in the name) to the 2080 Ti. It has “only” 8 gigabytes of GDDR6 memory, 2560 CUDA processor cores, a 215 W thermal design power (TDP), a 1605 MHz core clock (Boost: 1770 MHz), and the Turing architecture, like the 2080 Ti. There are also Tensor and RT cores; its memory bandwidth is 448 GB/s, the memory interface width is 256 bits, and the memory speed is 14 gigabits per second. It can connect to up to four monitors via DisplayPort and HDMI.
It is dual-slot and not as pricey as the flagship; a future user should be prepared to spend around $500–$700 US for one. Or even more if they want to go SLI (and not CrossFire, since this is NVIDIA). Be careful: you will need two extra PCI-E power connectors for it, one 6-pin and one 8-pin.
And now, let us leave NVIDIA for a moment and look at a great AMD video card too.
The AMD Radeon RX 5700 is an 8-gigabyte GDDR6 card considered among the best for 2K gaming (where 2K stands for resolutions around 2000 horizontal pixels). It belongs to the Navi 10 architecture, its TDP is 180 or 185 W (depending on where you find the specs; on AMD’s site it’s 185, but either way it’s lower and better than that of the 2070 Super above), its core clock is 1465 MHz, and it has 2304 Stream processors. Its form factor is dual-slot, so you will end up with an unusable PCI-E slot underneath this card!
The boost clock is 1725 MHz, the memory speed is 14 Gbps, the memory interface is 256 bits, and the memory bandwidth is 448 GB/s. For connectivity it supports DisplayPort (3 connectors) and one HDMI (no DVI or VGA, those two being too old by now). You will need two extra PCI-E power connectors: one 8-pin (6+2) and one 6-pin. You may pay less than $400 US for one card, or of course “x” times as much if you want an “x”-element CrossFire configuration!
Let’s go down to 1080p resolutions and see another best-rated GPU card from NVIDIA.
The NVIDIA GeForce GTX 1660 Super has 6 GB of GDDR6 memory, 1408 CUDA cores at 1530 MHz (Boost: 1785 MHz), a 125 W TDP, and it is dual-slot. It does not support real-time ray tracing (there are no RT cores). It has one DisplayPort, one HDMI connector and one DVI, so you can theoretically connect up to 3 monitors. Its memory bandwidth is rated at 336 GB/s, the memory speed is 14 Gbps, and the memory interface width is 192 bits. People wanting to purchase it should expect to spend less than $300 US, and you will also have to use an additional 8-pin PCI-E connector.
Having said all this, there are many other features / parameters a GPU can have; the full specification sheets are much longer.
And to conclude…
Video cards are nowadays meaningful for other tasks too, beyond simply seeing images on our monitors. Their industry has made, is making and surely will keep making great strides, providing us with more and more powerful GPUs with plenty of CUDA / Stream processor cores, and we are not finished.
The astronomical computation power GPUs reach (especially when interconnected…) makes them suitable for supercomputing and advanced science purposes. And regular PC users also have many video features to benefit from, whether gaming, watching movies or creating any kind of video content (or, why not, doing mathematical research, provided those numbers are supported). There is already a number of activity fields where GPUs have put CPUs in the shade!