Chang Hyun Park and David Black-Schaffer (Uppsala University)


Greetings everyone!

This week we have the pleasure of welcoming Chang Hyun Park and David Black-Schaffer for a special team seminar with not one but two engaging presentations!

They will tell us about their research on virtual memory and instruction scheduling.

When

Thursday, May 5 @ 9 am

Where

Room Ada Lovelace or online (more info at the end of this message)

Speaker 1

Chang Hyun Park, Assistant Professor @ Uppsala Architecture Research Team, Uppsala University, Sweden

Title

Every Walk’s a Hit: Making Page Walks Single-Access Cache Hits

Abstract

As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the latency of each access. The first approach is accomplished by opportunistically “flattening” the page table: merging two levels of traditional 4 KB page table nodes into a single 2 MB node, thereby reducing the table’s depth and the number of indirections required to traverse it. The second is accomplished by biasing the cache replacement algorithm to keep page table entries during periods of high TLB miss rates, as these periods also see high data miss rates and are therefore more likely to benefit from having the smaller page table in the cache than to suffer from increased data cache misses.
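For readers unfamiliar with page table layouts, the arithmetic behind "flattening" can be sketched as follows. This is only an illustration, not the speakers' implementation: it assumes an x86-64-style layout (48-bit virtual addresses, 4 KB pages, 8-byte entries, 9 index bits per level), and the names walk_indices, node_bytes, CONVENTIONAL and FLATTENED are hypothetical. Which levels are merged in practice, and when, is decided opportunistically according to the abstract; the sketch simply merges both pairs.

```python
# Illustrative sketch only. Assumes an x86-64-style layout:
# 48-bit virtual addresses, 4 KB pages, 8-byte page table entries.
PAGE_OFFSET_BITS = 12   # 4 KB pages
ENTRY_BYTES = 8         # size of one page table entry

CONVENTIONAL = [9, 9, 9, 9]   # four levels, 9 index bits each
FLATTENED = [18, 18]          # two pairs of levels merged (assumption)


def node_bytes(index_bits):
    """Size in bytes of a page table node indexed by `index_bits` bits."""
    return (1 << index_bits) * ENTRY_BYTES


def walk_indices(vaddr, level_bits):
    """Split a virtual address into per-level table indices, root first.

    The number of indices returned equals the number of memory
    accesses a full page walk needs for this layout.
    """
    shift = PAGE_OFFSET_BITS + sum(level_bits)
    indices = []
    for bits in level_bits:
        shift -= bits
        indices.append((vaddr >> shift) & ((1 << bits) - 1))
    return indices


if __name__ == "__main__":
    vaddr = 0x7F12_3456_7ABC
    print("4 KB node:", node_bytes(9), "bytes")
    print("merged node:", node_bytes(18), "bytes")
    print("conventional walk:", walk_indices(vaddr, CONVENTIONAL))
    print("flattened walk:", walk_indices(vaddr, FLATTENED))
```

Under these assumptions, a 9-bit node holds 512 entries (4 KB), while merging two levels gives an 18-bit index and a 2 MB node, so a full walk drops from four accesses to two.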

Speaker 2

David Black-Schaffer, Professor @ Uppsala Architecture Research Team, Uppsala University, Sweden

Title

Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors

Abstract

Flexible instruction scheduling is essential for performance in out-of-order processors. This is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete flexibility in choosing ready instructions for execution, but at the cost of significant scheduling energy. In this work we seek to reduce the instruction scheduling energy by reducing the depth and width of the IQ. We do so by classifying instructions based on their readiness and criticality, and using this information to bypass the IQ for instructions that will not benefit from its expensive scheduling structures and delay instructions that will not harm performance. Combined, these approaches allow us to offload a significant portion of the instructions from the IQ to much cheaper FIFO-based scheduling structures without hurting performance. As a result we can reduce the IQ depth and width by half, thereby saving energy. Our design, Delay and Bypass (DNB), is the first design to explicitly address both readiness and criticality to reduce scheduling energy. By handling both classes we are able to achieve 95% of the baseline out-of-order performance while only using 33% of the scheduling energy. This represents a significant improvement over previous designs which addressed only criticality or readiness (91%/89% performance at 74%/53% energy).