Back to Basics
The fundamental task of a processor is to manage the flow of data through its computational units. However, in the past two decades, each successive generation of processors for personal computers has added more transistors dedicated to increasing the performance of spaghetti-like integer code. For example, it is well known that typical integer code is branchy and that branch misprediction penalties are expensive; to minimize the impact of branch instructions, transistors were spent on highly accurate branch predictors. Beyond branch predictors, sophisticated cache hierarchies with large tag arrays and predictive prefetch units attempt to hide the complexity of data movement from software, further increasing the performance of single-threaded applications. This pursuit of single-threaded performance is also visible in recent proposals for extraordinarily deeply pipelined processors, designed primarily to speed up single-threaded applications at the cost of higher power consumption and larger transistor budgets.
The fundamental idea of the CELL processor project is to reverse this trend: give up the pursuit of single-threaded performance in favor of allocating additional hardware resources to parallel computation. That is, minimal resources are devoted to the execution of single-threaded workloads, so that multiple DSP-like processing elements can be added to perform more parallelizable multimedia-type computations. This shift in focus, from single-threaded integer performance to multithreaded, easily parallelizable multimedia-type performance, recurs throughout the examination of the first implementation of the CELL processor.