System Performance: Multi-Tasking

One of the key drivers of advancements in computing systems is multi-tasking. On mobile devices, this is quite lightweight - background email checks while the user is playing a mobile game are a common case. To optimize the user experience in such scenarios, mobile SoC manufacturers started integrating heterogeneous CPU cores - some with high performance for demanding workloads, and others frugal in terms of power consumption, die area, and performance. This trend is now slowly making its way into the desktop PC space.

Multi-tasking in typical PC usage is much more demanding compared to phones and tablets. Desktop OSes allow users to launch and utilize a large number of demanding programs simultaneously. Responsiveness is dictated largely by the OS scheduler allowing different tasks to move to the background. Intel's Alder Lake processors work closely with the Windows 11 thread scheduler to optimize performance in these cases. Keeping these aspects in mind, the evaluation of multi-tasking performance is an interesting subject to tackle.

We have augmented our systems benchmarking suite to quantitatively analyze the multi-tasking performance of various platforms. The evaluation involves triggering a VLC transcoding task to transform 1716 3840x1714 frames encoded as a 24fps AVC video (Blender Project's 'Tears of Steel' 4K version) into a 1080p HEVC version in a loop. VLC internally uses the x265 encoder, and the settings are configured to allow the CPU usage to be saturated across all cores. The transcoding rate is monitored continuously. One full transcoding pass is allowed to finish before starting the first multi-tasking workload - the PCMark 10 Extended bench suite. A comparative view of the PCMark 10 scores for various scenarios is presented in the graphs below. Also available for concurrent viewing are scores in the normal case, where the benchmark was processed without any concurrent load, and a graph presenting the loss in performance.
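The performance-loss figure referred to above can be thought of as the share of the unloaded score that disappears once the transcoding load is active. The following is a minimal sketch of that arithmetic, not the review's actual tooling: the function name `performance_loss_pct` and the example scores are hypothetical, and a run that times out under load is represented here as `None`.

```python
from typing import Optional


def performance_loss_pct(idle_score: float,
                         loaded_score: Optional[float]) -> Optional[float]:
    """Return the percentage of the unloaded score lost under load.

    A loaded_score of None stands for a run that timed out, in which
    case no loss figure can be computed and None is returned.
    """
    if loaded_score is None:
        return None
    if idle_score <= 0:
        raise ValueError("idle_score must be positive")
    return (idle_score - loaded_score) / idle_score * 100.0


# Hypothetical scores for illustration only, not measurements from this review.
print(performance_loss_pct(2000.0, 1500.0))  # 25.0
print(performance_loss_pct(2000.0, None))    # None (benchmark timed out)
```

A lower percentage means the system retained more of its unloaded performance while the transcode was running.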

UL PCMark 10 Load Testing - Digital Content Creation Scores

 

UL PCMark 10 Load Testing - Productivity Scores

 

UL PCMark 10 Load Testing - Essentials Scores

 

UL PCMark 10 Load Testing - Gaming Scores

 

UL PCMark 10 Load Testing - Overall Scores

The presence of a transcoding workload on the CPU cores makes handling other tasks an uphill battle for low-power PCs. The PCMark 10 workloads above bring out that aspect. The ECS LIVA Z2 and LIVA Z3 are able to handle only the 'Productivity' and 'Essentials' workload components, and time out on the others. Other than in the 'Gaming' component, the June Canyon NUC is the most effective at handling multi-tasking due to its active cooling - it has the least performance loss across almost all PCMark 10 components.

Following the completion of the PCMark 10 benchmark, a short delay is introduced prior to the processing of Principled Technologies WebXPRT4 on MS Edge. Similar to the PCMark 10 results presentation, the graph below shows the scores recorded with the transcoding load active. Available for comparison are the scores recorded with the CPU dedicated to the benchmark and a measure of the performance loss.

Principled Technologies WebXPRT4 Load Testing Scores (MS Edge)

The June Canyon's WebXPRT4 scores are well behind those of the Jasper Lake-based units under normal conditions. However, the addition of the transcoding workload results in a significant loss in performance for the latter set. The June Canyon has limited performance loss, with its active cooling probably allowing it to go the extra mile in the presence of heavy sustained workloads.

The final workload tested as part of the multitasking evaluation routine is CINEBENCH R23.

3D Rendering - CINEBENCH R23 Load Testing - Single Thread Score

 

3D Rendering - CINEBENCH R23 Load Testing - Multiple Thread Score

The June Canyon NUC with its active cooling comes out on top with the transcoding load active.

After the completion of all the workloads, we let the transcoding routine run to completion. The monitored transcoding rate throughout the above evaluation routine (in terms of frames per second) is tabulated below.

VLC Transcoding Rate (Multi-Tasking Test) - Frames per Second

System                                   Enc. Pass #1  PCMark 10  WebXPRT4  Cinebench  Enc. Pass #2
ECS LIVA Z3 (Pentium Silver N6000)       0.1541        0.1071     0.1294    0.1476     0.1424
ECS JSLM-MINI (Pentium Silver N6000)     0.2223        0.1635     0.1635    0.2092     0.2216
ZOTAC ZBOX CI331 nano (Celeron N5100)    0.3016        0.1943     0.1859    0.2025     0.1864

The transcoding rates drop under simultaneous loading, as expected. For the JSLM-MINI, the first-pass and second-pass rates are nearly equal, pointing to the absence of throttling. However, both the LIVA Z3 and the ZBOX CI331 nano suffer from reduced rates in the second pass - the internal temperatures are high enough for the CPU to be throttled after extended sustained loading.
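The first-pass versus second-pass comparison can be checked directly from the tabulated rates. The sketch below uses the figures from the table above; the helper names and the 2% tolerance used to flag likely throttling are assumptions made for illustration, not part of the review's methodology.

```python
# Transcoding rates in frames per second, taken from the table above.
# Order: Enc. Pass #1, PCMark 10, WebXPRT4, Cinebench, Enc. Pass #2.
RATES = {
    "ECS LIVA Z3": (0.1541, 0.1071, 0.1294, 0.1476, 0.1424),
    "ECS JSLM-MINI": (0.2223, 0.1635, 0.1635, 0.2092, 0.2216),
    "ZOTAC ZBOX CI331 nano": (0.3016, 0.1943, 0.1859, 0.2025, 0.1864),
}


def second_pass_drop_pct(rates):
    """Percentage by which the second pass is slower than the first pass."""
    first, last = rates[0], rates[-1]
    return (first - last) / first * 100.0


def likely_throttled(rates, tolerance_pct=2.0):
    """Flag a system whose second pass is slower by more than the tolerance.

    The 2% default is an arbitrary illustrative threshold (an assumption).
    """
    return second_pass_drop_pct(rates) > tolerance_pct


for system, rates in RATES.items():
    drop = second_pass_drop_pct(rates)
    print(f"{system}: second pass {drop:.1f}% slower, "
          f"likely throttled: {likely_throttled(rates)}")
```

With these numbers, the JSLM-MINI's second pass is only about 0.3% slower than its first, while the LIVA Z3 loses about 7.6% and the ZBOX CI331 nano about 38.2%.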


52 Comments


  • xol - Friday, July 8, 2022 - link

    Correction (?)

    Neither of these reviewed products has an Intel UHD Graphics 605 .. (that's a 14nm Gemini part with 18 EU eg here https://ark.intel.com/content/www/us/en/ark/produc...

    .. Intel seems to have not published a 'number' for this iGPU and seems to distinguish them by number of EU eg Jasper Lake 24EU eg https://www.intel.co.uk/content/www/uk/en/products...
  • xol - Friday, July 8, 2022 - link

    Somehow messed up the link :

    UHD 605 https://ark.intel.com/content/www/us/en/ark/produc...
  • mode_13h - Friday, July 8, 2022 - link

    Thanks for your coverage of fanless mini-PCs. However, I really wish you'd include something with "big cores", so we can get a sense of the scale of performance difference between them and Tremont.

    Another nice-to-have would be at least a few benchmarks including a Raspberry Pi 4. However, it has serious thermal throttling issues, unless it's actively cooled or you use a substantial passive cooling solution.
  • mode_13h - Friday, July 8, 2022 - link

    I guess the ideal comparison would be a Tiger Lake-based system, since that's the same vintage and similar manufacturing tech as Tremont. Probably much harder to find in a fanless mini-PC, unless we're talking about an industrial PC, but I'd love even to see a comparison between two NUCs: Tiger Lake vs. Tremont.
  • mode_13h - Friday, July 8, 2022 - link

    Or maybe Ice Lake would be even better, but did they make Ice Lake-based NUCs?
  • abufrejoval - Thursday, July 14, 2022 - link

    Yes, Tiger Lake NUCs were made, but also very hard to come by: I have both.

    In a way they are perfect to showcase the benefit of E/P cores …in the case of Intel: AMD is really another story.

    The two NUCs look nearly identical on the outside, but inside they are very different beasts.

    For starters: The Tiger Lake NUC11 (i7-1165G7 with 96EU Xe iGPU) is configured with a 64 Watt PL2, a rather long TAU and even the PL1 is 30 Watts by default, I believe. There is a reason it comes with a 90 Watts power brick! I changed PL2 to 50, TAU to 10 seconds and PL1 to 15 Watts to ensure the fan would never howl the way it does with the defaults.

    I’ve seen HWinfo report a 5GHz maximum clock, but 4.7GHz is the official top speed. It’s at 64 Watts and near 5GHz clocks that I have measured 1707/5808 Geekbench 4 results on Linux (always a bit faster than on Windows). Jasper Lake doesn’t quite play in the same league at 781/2540 using 3.3 GHz and 10 Watts. In Watts/compute power Tiger Lake looks rather worse than Jasper Lake, but when it comes to rendering a complex web page or recalculating a giant Excel sheet, its sprinting power certainly has it appear much faster.

    At 64 Watts the Tiger Lake is a desktop CPU, shoehorned into mobile power envelopes. And when it’s constrained to the levels that passive cooling can manage (see the Supermicro SYS-E100-12T-H review here), it really struggles to deliver that performance. The great thing about the Tiger Lake NUC is that you can change PL1, PL2 and TAU to pretty much anything you want and when you set it to the 10 Watts the Jasper Lake gets to use as an absolute maximum, it starts to do rather badly.

    Some of that is because the iGPU always gets preference, leaving close to nothing to the CPU. But some of that is that the remaining power budget forces very low frequencies, where the big Core CPU loses against the Atom cores running at a full speed with these Watts.

    Jasper Lake, like all the other Atoms since the J1900, never slows down. I’ve never seen it drop below its “Turbo” clock unless idle, even on a mix of Prime95 and Furmark, and I’ve never seen it exceed 10 Watts of combined CPU+GPU power consumption either.

    I also have two Ryzen 5800U based notebooks (1443/7855 on Geekbench4), one of which can be switched between 15 and 28 Watts of TDP. When Tiger Lake and Zen 3 are strictly set to the same power levels, Tiger Lake has to run much slower even with half the cores: Ryzen beats it with a much smaller energy footprint per core. But with Tiger Lake left at the default NUC settings (which a battery powered notebook could not support), its four cores will beat an eight core Zen 3 at 15 Watts in Geekbench, which luckily never seems to exceed TAU.

    Intel needs E/P because P cores need too much power at the clock rates they require to beat a Ryzen core, and only with E cores they can hit the efficiency of Zen cores in fully multi-threaded loads.
  • mode_13h - Thursday, July 14, 2022 - link

    Wow, another awesome post! Thanks for taking the time to relate your findings. Very interesting!

    > the iGPU always gets preference, leaving close to nothing to the CPU.

    Very key point, but also one that Intel could conceivably address, to some extent, in future BIOS updates. Not that they're likely to, if it had been on the market for a while when you tested, but it's conceivable.

    > in Geekbench, which luckily never seems to exceed TAU.

    Another great point! I have never run Geekbench myself, and I haven't noticed reviewers mention this key detail.
  • Foeketijn - Saturday, September 3, 2022 - link

    Don't you want to write for Anand?
  • stanleyipkiss - Friday, July 8, 2022 - link

    Zotac makes a fanless zbox with a 1165G7
  • xol - Friday, July 8, 2022 - link

    Benches I've seen suggest both are very similar in multi to a low-power Skylake i3, eg an i3-6100T (2 core 4 thread, very common thin client chip) - the gfx capability also seems a close match for the 24EU part [probably a very similar part with improved HEVC support] (the 32EU N6000 should be better)

    For single threaded the old Skylake is ~+50% faster, and from Skylake to Alder Lake it's nearly 2x, so nearly 3x from N5100 to i5-12500 for single thread

    I have an old fanless Atom Z3735F (22nm) and these new SoCs are an impressive step up (~7x both cpu and gpu) -- I think the Pi Model B latest is very roughly 2x better than that but nowhere near the 5100T in any metric.

    tldr both benches would have been a wash one way or the other.
