Deeply Coupled Computing
October 28, 2004
Document 085-100400-001, MPACx: A Scalable, Configurable Infrastructure For Deeply Coupled Computing

Abstract

Modern semiconductor fabrication technology has made the incremental cost of producing embedded microprocessors essentially free. Consequently, uniprocessors are now giving way to multiprocessor system configurations. For high performance systems the design challenge has become leveraging and scaling free processors to maximize application performance and efficiency. Deeply coupled processor architectures combined with System-on-Chip implementation technologies are enabling new, economically attractive parallel processor system implementations as well as opportunities for improved programming flexibility and system-level co-development automation. CPU Technology presents a fourth-generation, deeply coupled multiprocessor architecture, a cost-efficient System-on-Chip implementation methodology and software co-development tools to enable cost-effective implementations of a broad range of high performance electronic systems.

1. Introduction

Continuing reductions in transistor feature size, and the resulting increases in transistor density, mean that millions of transistors can now be added to an electronic system for pennies of incremental material cost, so incremental processors are essentially free (Figure 1). The central processing unit, the traditional core of a computing system, can now be supported, surrounded or replaced by multiple processors that share processing loads to increase overall system performance and flexibility. Some high-end supercomputers with attractive price-per-peak-performance ratios are now being implemented with tens of thousands of clustered
microprocessors. High performance embedded systems are being implemented with dozens of processors on a single chip. Multi-core standard microprocessors have been introduced that will soon replace their single-processor predecessors.

Modern semiconductor fabrication processes combined with current System-on-Chip (SoC) integration technology are providing a dramatic breakthrough for parallel computing: the ability to integrate an entire 64-bit multi-processor node, along with its controllers and associated networking support circuitry, onto a single chip. This integration capability will produce dramatic price-to-performance improvements, with system costs for sustained performance trending below $10/GFlop and system power-to-performance ratios trending below 1W/GFlop. Hundreds of 64-bit processors will soon fit onto a PC motherboard. Like past integration milestones, such as the single-chip calculator and the single-chip microcontroller, this transition to the single-chip multiprocessor node will support disruptive innovations in high performance system implementation.

Figure 1 – Microprocessor transistor costs trending to zero

Mere scaling and integration of “free” processors does not guarantee system performance gains. Systems must be carefully designed to avoid system-level chokepoints and thereby achieve real productivity improvements from scaling. Communication bandwidth imbalances are limiting many of today’s large clustered computers to sustained performance levels of less than 10% of peak ratings for a broad range of applications.

Computer system latencies have actually been increasing when measured as the number of processor clock cycles required to access off-chip memory. To cope with system imbalances, today’s most popular microprocessor architectures carry added complexity and cost, the result of every conceivable effort to keep the processor busy while it awaits the arrival of remotely stored information. Balanced scaling of processor performance, storage bandwidth and communication bandwidth enables parallel systems to achieve productivity gains from free processor resources and become broadly adopted.

Deeply coupled computing architectures provide the means for effectively scaling and productively utilizing parallel processing systems across a broad range of workloads and applications. These deeply coupled architectures enable complete inter-processor interaction and synchronization to occur at intervals ranging from instantaneous on-chip to less than one microsecond for massively-scaled systems, and enable workloads to be effectively shared across large arrays of processors. High bandwidth, low latency inter-processor interactions allow fine grain parallelism to be exploited, providing the means to extract parallelism from seemingly sequential code. Application software that is performance-constrained due to insufficient computational or processor-to-memory bandwidth can be made to run faster using deeply coupled architectures and fine-grained parallel programming techniques.

Parallel processor system development has been ongoing for more than thirty years with many impressive technical achievements but relatively few commercial successes. Widespread adoption has been limited by the lack of high productivity software to run on parallel machines.
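
The idea that seemingly sequential code can hide parallelism can be illustrated with a small sketch. The Python below is not MPACx code and does not use any CPU TECH tool; the function names and the choice of a thread pool are assumptions made only for illustration. It takes a loop that looks sequential because of its running total, splits the data into independent chunks, and combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum_of_squares(data):
    """A loop that appears sequential because of the running total."""
    total = 0
    for x in data:
        total += x * x
    return total


def chunked_sum_of_squares(data, num_workers=4):
    """Same result, computed as independent partial sums (hypothetical helper).

    Addition is associative, so each chunk can be reduced on its own
    worker and the partial totals combined at the end.
    """
    # Ceiling division so every element lands in exactly one chunk.
    chunk = max(1, (len(data) + num_workers - 1) // num_workers)
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(sequential_sum_of_squares, pieces))
    return sum(partials)


if __name__ == "__main__":
    values = list(range(1000))
    print(chunked_sum_of_squares(values) == sequential_sum_of_squares(values))
```

In standard CPython the global interpreter lock prevents threads from speeding up this kind of arithmetic, so the sketch shows only the structure of the decomposition. Any real gain depends on hardware and a runtime that execute the chunks concurrently.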
Many mainstream software applications have been incorrectly viewed as sequential code that is inherently not “naturally parallelizable”. Programming methods have changed little over the past thirty years. A root problem is the lack of underlying parallel compiler and mapping tools needed to automatically extract parallelism from application source code and distribute workloads efficiently across parallel processors. The relatively few parallel software codes in use today have resulted from heroic manual programming efforts.

The time is ripe for investment and advancement in the parallel computing market. New deeply coupled processor architectures, multiprocessor interconnect technologies and methods of parallel programming and optimization are all feasible or available. The door is again open for computer architects to innovate beyond the microprocessor generation, and for companies to capitalize on a myriad of opportunities offered by modern semiconductor process economics.

Disruptive innovation in parallel computing can result from successfully blending multiple competencies. First is the ability to develop configurable system and processor architectures that scale in a balanced, cost-effective and power-efficient manner. Second is the ability to develop large, complex, fault-tolerant System-on-Chip designs in nanometer-era processes in acceptable design cycles with minimal development cost and risk. Third is the ability to provide compatible pathways for existing long-lived applications to run on new machine architectures. Fourth is the ability to deliver innovative programming technologies including automated code partitioning, system mapping, scheduling and performance monitoring to a new generation of parallel programmers.

CPU Technology, Inc. (CPU TECH) has assembled the team and technologies to meet these challenges and opportunities for widespread adoption of System-on-Chip-based systems in the parallel computing era.
Emerging solutions and underlying technologies are described in the following sections that apply to a broad range of systems, from single-chip high-performance embedded systems to world-class, massively-scaled supercomputers. Opportunities exist for partnering in and benefiting from the marriage of these tools, technologies and products.

3. Processor Options

Scalar processor core integration options for the MPACx processing unit include cores developed by a customer, by CPU TECH, or by third-party IP suppliers, such as ARM, MIPS or PowerPC cores. The scalar processor executes instructions, communicates, and arranges data elements for the application-specific processor engine(s). End system requirements influence the choice of processor cores.

For customers with investments in their own processor development, CPU Technology can quickly capture, model, simulate, verify and integrate this content into a System-on-Chip design.

For high performance systems, special processor architectures may be required that can be optimized for deep coupling, multi-processor scalability, specific system performance, or environmental or physical requirements such as power, size or weight. CPU TECH develops unique processor cores for these applications and can customize them to meet extreme system requirements.

CPU Technology’s scalar processor cores are designed for highly reliable operation, as required within critical real-time embedded systems and high performance computers, using anti-tamper, fault detection/correction and fault-tolerant system design practices. CPU TECH processor cores execute a range of compatible instruction sets as well as custom application-optimized instruction sets. These cores emanate from the company’s processor architecture design expertise, design and verification processes, automated tools and unique experience in designing software-compatible systems.
CPU TECH’s processor cores provide system and software designers with unprecedented visibility into and control of the architectural features affecting system performance. Using CFrame, CPU TECH’s configurable processor framework, processor structures and parameters can be readily modified and verified. Using MPACx, CPU TECH’s processor cores can be deeply coupled to provide ultra-low-latency interaction between execution pipelines of co-located processor cores. CPU TECH processor core options range from 16- and 32-bit cores to the true 64-bit Quintillium™ core. CPU TECH processor cores can expose hardware functions to the programmer for optimal parallel software productivity.

If binary software backward-compatibility is required, and other system design goals can be readily met, then industry-standard processors may be used. Licensable ARM, MIPS and PowerPC cores are options that come “as is”, so changes to their internal architecture are restricted. External standard microprocessors from AMD, IBM, Intel and others can also be used as processor units interconnected by MPACx.

4. System-on-Chip Design Expertise

The electronics industry has seen four major eras of computer technology: the ENIAC era, the mainframe era, the minicomputer era and the microprocessor era. Each era has been characterized by the progressive capabilities of its underlying technology, namely tubes, transistors, integrated circuits and very large-scale integration (VLSI). Each new underlying technology has brought order-of-magnitude improvements in performance, reliability, power consumption, level of integration and cost. Resulting breakthroughs in price/performance have opened new applications and new markets previously untouched by electronic computing systems.

A new period of computer technology is emerging beyond the microprocessor called the parallel processor era, with System-on-Chip (SoC) integration as its enabling technology.
SoC integration technology brings forward complex system development processes, practices and techniques developed and refined over the past fifty years into the field of integrated circuit design. As in past generational transitions, SoC promises order-of-magnitude improvements over microprocessor era technology in size, weight and power, with increased performance and reliability. The parallel processor era will open new markets and foster many new products and applications.

Over the past fifteen years, CPU Technology has developed a suite of proprietary system-level design automation tools called SystemLab™ that enable highly productive and effective System-on-Chip design. Using SystemLab, CPU TECH engineers can rapidly architect, model, simulate, analyze and debug complex systems in a virtual environment. This speeds the architecture and design processes and accelerates a wide range of system trade-off analyses, increasing the quality of the resulting system design.
CPU Technology has successfully delivered highly integrated SoC-based systems to a variety of customers and end applications, and has received ISO 9001:2000 registration for its system development processes.

5. Parallel Developer Tool Chain

Programming multiple processors effectively is significantly more complex than equivalent programming for a single microprocessor. To help developers effectively utilize and manage the available processing resources of an MPACx array, CPU TECH provides popular language compilers, loaders, launchers, powerful debuggers with multiple processor utilities and a robust graphical user interface. These tool chain elements are tailored to target system configurations, and are provided to our SoC-based system customers.

CPU Technology provides programmer tool chain support for several prominent parallel programming methodologies including:

❑ Explicit message passing
❑ Compilation with embedded directives
❑ Explicit thread programming
❑ Compilation with automatic parallelization
❑ Explicit process parallelization using dataflow or assembly line representations

Figure 5 – SystemLab Multiprocessor Performance Monitor for software load balancing

Message passing and barrier synchronization are well supported by the MPACx hardware architecture, providing a compatible pathway for accelerating existing parallel applications that incorporate the MPI standard.

In conjunction with on-chip Real-Time Non-intrusive Instrumentation (RTNI), CPU TECH’s parallel developers’ tool chain can be used to evaluate, debug and tune parallel code on target hardware. Full multiprocessor register and bus visibility, breakpoint, trap, single step and trace capabilities and a range of performance monitoring tools enhance developer productivity.

CPU TECH is fully committed to advancing the state-of-the-art in parallel application authoring tools and programming technologies for multiprocessor systems.
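
The message passing and barrier synchronization pattern named in this section can be sketched in a few lines. The Python below is a generic illustration using only the standard library; it is not the MPACx interface or an MPI binding, and `run_ring` and its parameters are hypothetical names. Each worker posts a message to its right-hand neighbor, all workers meet at a barrier, and each then reads the message addressed to it.

```python
import queue
import threading


def run_ring(num_workers, values):
    """Each worker sends its value to the next worker in a ring,
    waits at a barrier, then adds the value it received to its own."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    barrier = threading.Barrier(num_workers)
    results = [None] * num_workers

    def worker(rank):
        right = (rank + 1) % num_workers
        inboxes[right].put(values[rank])   # explicit message send
        barrier.wait()                     # every send is posted before any receive
        received = inboxes[rank].get()     # explicit message receive
        results[rank] = values[rank] + received

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_ring(4, [1, 2, 3, 4]))  # prints [5, 3, 5, 7]
```

In this small example the blocking receive alone would be enough for correctness. The barrier is included to show where a global synchronization point sits relative to the send and receive steps.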
Our development teams are breaking new ground in parallel compilation, parallel processor system analysis and the overall acceleration of parallel system design processes.

6. Summary

In the world of computer science, entry into the 21st century has been accompanied by the confluence of multiple technologies, all available to serve as primary elements for producing more powerful electronic systems from essentially free parallel processing resources. These capabilities support the development of computing systems containing hundreds of thousands, even millions, of internal processing nodes with large distributed, shared memory and the ability to perform processor interactions and synchronization with minimal latency.

It is the confluence of these interdisciplinary technologies, and the capability to cost-effectively combine them in high performance systems development, that is truly significant, synergistic and disruptive to current paradigms. The combination of these technologies provides high performance computer architects and developers with the ability to innovate again beyond the microprocessor era, into the era of parallel processing.

CPU Technology invites the participation of academia, government and industry to pursue rapidly expanding opportunities for innovation and order-of-magnitude gains in general purpose and embedded computing systems. These capabilities can be applied to a broad range of scientific, defense, business and industrial domains for the improvement of productivity, global competitiveness, security and quality of life. New applications, products and systems will emerge as imagination is applied to technology. High performance computing technology can now be advanced through more efficient scaling, and be more broadly applied to mass market products and mainstream applications.