CPU and PCI-Express
There’s a lot of excitement over PCIe lanes! However, in reality they have almost nothing to do with deep learning speed. If you have only one GPU, PCIe lanes are only needed to transfer data between CPU RAM and GPU RAM within a reasonably short time.
For example, an ImageNet mini-batch of 32 images (32×225×225×3) at 32-bit precision takes about 1.1 milliseconds to transfer with 16 lanes, 2.3 milliseconds with 8 lanes, and 4.5 milliseconds with 4 lanes.
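As a sanity check, these figures can be reproduced from the batch size and the PCIe bandwidth. A minimal sketch, assuming PCIe 3.0 with roughly 0.985 GB/s of usable bandwidth per lane (the exact usable bandwidth varies by system):

```python
# Theoretical CPU -> GPU transfer time for one ImageNet mini-batch,
# assuming PCIe 3.0 at ~0.985 GB/s of usable bandwidth per lane.
BYTES_PER_FLOAT32 = 4
batch_bytes = 32 * 225 * 225 * 3 * BYTES_PER_FLOAT32  # ~19.4 MB

def transfer_ms(lanes, gb_per_s_per_lane=0.985):
    """Milliseconds to move one mini-batch over `lanes` PCIe lanes."""
    bandwidth = lanes * gb_per_s_per_lane * 1e9  # bytes per second
    return batch_bytes / bandwidth * 1e3

for lanes in (16, 8, 4):
    print(f"{lanes:2d} lanes: {transfer_ms(lanes):.1f} ms")
# 16 lanes ~1.2 ms, 8 lanes ~2.5 ms, 4 lanes ~4.9 ms: close to the
# 1.1/2.3/4.5 ms figures above, and halving the lanes doubles the time.
```

The key observation is simply that transfer time scales inversely with lane count, and even the 4-lane case is a few milliseconds.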
These are theoretical numbers; in practice, PCIe transfers are often about twice as slow, but they are still fast! PCIe latency is in the nanosecond range, so it is negligible.
When we put this all together, we get the following timings for an ImageNet mini-batch of 32 images on a ResNet-152:
Forward and backward pass: roughly 216 milliseconds (ms)
CPU-to-GPU transfer with 16 PCIe lanes: about 2 ms (1.1 ms theoretical)
Thus going from 4 to 16 PCIe lanes yields a performance gain of only about 3.2 percent. If you use PyTorch’s data loader with pinned memory, you get this speed for free. So don’t waste your money on PCIe lanes if you have a single GPU!
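The ~3.2 percent figure follows directly from these timings. A quick back-of-the-envelope check, applying the roughly 2x real-world slowdown mentioned above (so ~2 ms for 16 lanes and ~9 ms for 4 lanes), and assuming the transfer is not overlapped with compute:

```python
# Step time = compute + transfer (with pinned memory and asynchronous
# copies the transfer overlaps with compute, shrinking the gap further).
compute_ms = 216.0          # ResNet-152 forward + backward pass
transfer_16_lanes_ms = 2.0  # ~2x the 1.1 ms theoretical figure
transfer_4_lanes_ms = 9.0   # ~2x the 4.5 ms theoretical figure

step_16 = compute_ms + transfer_16_lanes_ms
step_4 = compute_ms + transfer_4_lanes_ms
speedup_pct = (step_4 / step_16 - 1) * 100
print(f"4 -> 16 lanes speedup: {speedup_pct:.1f}%")  # ~3.2%
```

Since the transfer is a tiny fraction of the 216 ms step, quadrupling the lane count barely moves the total.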
When choosing PCIe lanes for a motherboard and CPU, make sure the combination supports the number of GPUs you want to run. If you buy a motherboard with slots for two GPUs and want to add a second GPU eventually, make sure you also buy a CPU that supports two GPUs; don’t just look at the total number of PCIe lanes.
PCIe Lanes and Multi-GPU Parallelism
Do PCIe lanes matter when training networks on multiple GPUs with data parallelism? I published a paper on this subject at ICLR 2016, and the answer is: if you have more than 96 GPUs, then PCIe lanes are vital.
However, with 4 or fewer GPUs it is not a big deal; if that is the scale at which you parallelize, I would not worry about PCIe lanes.
If I had four GPUs, I would make sure to get at least 8 PCIe lanes per GPU (32 lanes in total). But since almost nobody runs systems with more than 4 GPUs, the rule of thumb is: do not pay extra for more PCIe lanes per GPU. It really does not matter much!
Needed CPU Cores
To make the right choice of CPU, we first need to understand the CPU’s role and how it relates to deep learning.
What does the CPU do for deep learning? The CPU does almost no computation while your deep nets run on the GPU. Mostly it (1) initiates GPU function calls and (2) executes CPU functions.
By far the most useful job for your CPU is data preprocessing. There are two common data processing strategies, each with different CPU requirements.
The first strategy is to preprocess while you train:
Load mini-batch
Preprocess mini-batch
Train on mini-batch
The second strategy is to preprocess everything before any training:
Load preprocessed mini-batch
Train on mini-batch
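The first strategy can be sketched as a pool of worker threads that load and preprocess upcoming mini-batches while the training loop consumes the ones already prepared. The names `load_batch`, `preprocess`, and `train_step` are placeholders for whatever your real data pipeline does:

```python
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    """Placeholder: read raw mini-batch i from disk."""
    return list(range(i * 4, i * 4 + 4))

def preprocess(batch):
    """Placeholder: augment/normalize a raw mini-batch."""
    return [x * 2 for x in batch]

def train_step(batch):
    """Placeholder: one optimizer step on a prepared mini-batch."""
    return sum(batch)

losses = []
with ThreadPoolExecutor(max_workers=4) as pool:  # ~4 threads per GPU
    # Workers load and preprocess ahead of the training loop.
    futures = [pool.submit(lambda i=i: preprocess(load_batch(i)))
               for i in range(8)]
    for fut in futures:
        losses.append(train_step(fut.result()))  # consume ready batches
```

In PyTorch this is what the `num_workers` argument of the data loader does for you; the point of the sketch is only that loading and preprocessing overlap with training, which is why extra cores help here.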
For the first strategy, a good CPU with many cores can boost performance significantly. For the second strategy, you do not need a very fast CPU. For the first strategy, I recommend a minimum of 4 threads per GPU, which usually means 2 cores per GPU.
I have not done hard tests on this, but you should see a performance gain of roughly 0-5 percent per additional core/GPU.
For the second strategy, I recommend a minimum of 2 threads per GPU, which usually means one core per GPU. You will not see significant performance gains from additional cores when you use the second strategy.
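The two rules of thumb above can be condensed into a small helper (a sketch; the 4-and-2-threads-per-GPU numbers are the recommendations from this section, not hard limits):

```python
def recommended_workers(num_gpus, preprocess_while_training=True):
    """Suggested data-loading threads: ~4 per GPU when preprocessing
    during training, ~2 per GPU when preprocessing was done up front."""
    threads_per_gpu = 4 if preprocess_while_training else 2
    return num_gpus * threads_per_gpu

print(recommended_workers(2))         # strategy 1, two GPUs -> 8
print(recommended_workers(2, False))  # strategy 2, two GPUs -> 4
```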
Needed CPU Clock Rate (Frequency)
When people think of fast CPUs, they usually think of the clock rate: 4 GHz is better than 3.5 GHz, or is it? This is generally true when comparing processors with the same architecture, e.g. “Ivy Bridge”, but it compares poorly between architectures, and it is far from the best measure of performance.
In the case of deep learning, there is very little computation for the CPU to do: increment a few variables here, evaluate some Boolean expression there, make some function calls on the GPU or within the program. All of these depend on the CPU core clock rate.
While this reasoning seems sensible, the CPU nevertheless sits at 100 percent utilization when I run deep learning programs. So what is going on? I ran some CPU core-rate underclocking experiments to find out.
CPU underclocking on MNIST and ImageNet: performance is measured as the time taken for 100 epochs of MNIST or half an epoch of ImageNet at different CPU core clock rates, where the highest clock rate is taken as the baseline for each CPU.
For comparison: upgrading from a GTX 580 to a GTX Titan is about +20 percent performance; from a GTX Titan to a GTX 980 another +30 percent; and GPU overclocking yields about +5 percent performance for any GPU.
It is important to note that these experiments were run on rather dated hardware; however, the results should be similar for modern CPUs and GPUs.
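The normalization used in these experiments is straightforward: run the same workload at each clock rate and express performance relative to the fastest setting. A sketch with hypothetical timings (the numbers below are illustrative only, not the measured results):

```python
# Hypothetical epoch times (seconds) at different CPU core clock rates.
epoch_time_s = {3.5: 100.0, 3.0: 101.0, 2.5: 103.0}  # illustrative only

baseline = epoch_time_s[max(epoch_time_s)]  # fastest clock = baseline
relative_perf = {ghz: baseline / t for ghz, t in epoch_time_s.items()}
for ghz, perf in sorted(relative_perf.items(), reverse=True):
    print(f"{ghz:.1f} GHz: {perf:.3f}x")  # 1.000x at the highest clock
```

If underclocking by a full gigahertz costs only a few percent, as in this illustration, the CPU clock rate is clearly not the bottleneck.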