Omer Khan, an assistant professor of electrical and computer engineering, is part of a team of researchers designing a chip that has proved faster and more efficient in simulations.
Khan and his fellow researchers have developed a computer chip that can store more information and work faster than the industry standard. George Kurian and Srinivas Devadas, both of MIT, were also part of the team.
The new design, according to the researchers, is 15 percent faster and 25 percent more energy efficient. They achieved this by rethinking the cache, which stores data close to the processor to speed up access to it.
Currently, caches are arranged hierarchically – that is, according to how quickly each can be accessed. Each chip contains multiple processors (or cores), and each core gets its own private cache. An additional cache (the last-level cache, or LLC) is there to handle overflow. Information that the core has already requested from the main memory – as well as information stored near it – ends up in the core’s private cache to make it more accessible for subsequent requests. If this information isn’t requested again soon, it gets shifted to the LLC and then back to the main memory.
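The conventional hierarchy described above can be sketched as a toy model. This is only an illustration, not the researchers' simulator: the class name `TwoLevelCache`, the cache sizes, and the least-recently-used eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model of one core's private cache backed by a last-level cache (LLC)."""

    def __init__(self, private_size, llc_size):
        # OrderedDict keeps the least recently used entry first.
        self.private = OrderedDict()
        self.llc = OrderedDict()
        self.private_size = private_size
        self.llc_size = llc_size

    def access(self, addr):
        """Return the level that served the request: 'private', 'llc' or 'memory'."""
        if addr in self.private:
            self.private.move_to_end(addr)  # mark as most recently used
            return "private"
        source = "llc" if addr in self.llc else "memory"
        self.llc.pop(addr, None)
        self.private[addr] = True  # requested data moves into the private cache
        if len(self.private) > self.private_size:
            victim, _ = self.private.popitem(last=False)
            self.llc[victim] = True  # stale data is shifted to the LLC
            if len(self.llc) > self.llc_size:
                self.llc.popitem(last=False)  # and eventually back to main memory
        return source


cache = TwoLevelCache(private_size=2, llc_size=4)
levels = [cache.access(a) for a in [1, 2, 1, 3, 2]]
print(levels)  # ['memory', 'memory', 'private', 'memory', 'llc']
```

The last request shows the hierarchy at work: address 2 was pushed out of the private cache by address 3, so it is served from the LLC rather than from main memory.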
The system usually works well, but not always. If requested information exceeds the capacity of a private cache, then the chip expends a lot of time and energy finding space for the information. With the chip that Khan and his fellow researchers have designed, excess information is split between the private cache and the LLC. Each cache keeps its share of the information, eliminating the need to search for space.
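A small simulation suggests why an oversized working set hurts a conventional private cache and how splitting it could help. It is a deliberately simplified sketch: both function names are hypothetical, the conventional cache is assumed to evict the least recently used entry, and the "split" policy here simply pins the first lines that fit and leaves the rest in the LLC, which is not necessarily how the researchers' design decides placement.

```python
from collections import OrderedDict


def lru_private_hits(trace, private_size):
    """Private-cache hits when every requested line is pulled into the private cache."""
    private = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in private:
            private.move_to_end(addr)
            hits += 1
        else:
            private[addr] = True
            if len(private) > private_size:
                private.popitem(last=False)  # evict to make room
    return hits


def split_private_hits(trace, private_size):
    """Private-cache hits when lines that do not fit simply stay in the LLC."""
    private = set()
    hits = 0
    for addr in trace:
        if addr in private:
            hits += 1
        elif len(private) < private_size:
            private.add(addr)
        # Otherwise the line is served from the LLC and nothing is evicted.
    return hits


trace = [0, 1, 2] * 4  # three lines reused in a loop, private cache holds two
print(lru_private_hits(trace, 2))    # 0
print(split_private_hits(trace, 2))  # 6
```

In this toy trace the conventional cache never scores a private hit, because each new line evicts the one needed next, while the split arrangement keeps two of the three lines permanently close to the core.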
The design has drawn interest among computer enthusiasts, and recently received attention in Ars Technica, the popular technology news website. Yogi Patel of Ars Technica writes:
When the data being stored exceeds the capacity of the core’s private cache, the chip [designed by Khan, Kurian and Devadas] splits up the data between private cache and the LLC. This ensures that the data is stored where it can be accessed more quickly than if it were in the main memory.
Another case addressed by the new work occurs when two cores are working on the same data and are constantly synchronizing their cached version. Here, the technique eliminates the synchronization operation and simply stores the shared data at a single location in the LLC. Then the cores take turns accessing the data, rather than clogging the on-chip network with synchronization operations.
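The shared-data case can be illustrated with a rough count of on-chip traffic. The function `count_traffic` and its invalidate-on-write rule are assumptions chosen for the example, not a description of the actual coherence protocol on the chip.

```python
def count_traffic(accesses, shared_in_llc):
    """Count (invalidations, llc_accesses) for a list of (core, is_write) accesses.

    With shared_in_llc=False each core keeps a private copy, and a write
    invalidates every other core's copy. With shared_in_llc=True the data
    lives only in the LLC and cores take turns accessing it there.
    """
    holders = set()  # cores that currently hold a private copy
    invalidations = 0
    llc_accesses = 0
    for core, is_write in accesses:
        if shared_in_llc:
            llc_accesses += 1  # every access goes to the single LLC copy
            continue
        if core not in holders:
            llc_accesses += 1  # fetch a fresh private copy
            holders.add(core)
        if is_write:
            invalidations += len(holders - {core})
            holders = {core}
    return invalidations, llc_accesses


# Two cores alternately writing the same data.
accesses = [(0, True), (1, True)] * 3
print(count_traffic(accesses, shared_in_llc=False))  # (5, 6)
print(count_traffic(accesses, shared_in_llc=True))   # (0, 6)
```

Under these assumptions both arrangements reach the LLC the same number of times, but keeping the data in one place removes the invalidation messages that would otherwise cross the on-chip network.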