Deeply Coupled Computing

October 28, 2004 Document 085-100400-001

MPACx: A Scalable, Configurable Infrastructure

For Deeply Coupled Computing

Abstract

Modern semiconductor fabrication technology

has made the incremental cost of producing

embedded microprocessors essentially free.

Consequently, uniprocessors are now giving way

to multiprocessor system configurations. For high

performance systems the design challenge has

become leveraging and scaling free processors to

maximize application performance and efficiency.

Deeply coupled processor architectures

combined with System-on-Chip implementation

technologies are enabling new, economically

attractive parallel processor system implementations

as well as opportunities for improved

programming flexibility and system-level

co-development automation. CPU Technology

presents a fourth generation, deeply coupled

multiprocessor architecture, a cost-efficient

System-on-Chip implementation methodology

and software co-development tools to enable

cost-effective implementations of a broad range of

high performance electronic systems.

1. Introduction

Millions of transistors can now be added to electronic

systems for pennies of incremental material

cost, due to continuing reductions in transistor

feature sizes resulting in increased transistor densities,

so incremental processors are essentially free

(Figure 1). The central processing unit, the traditional

core of a computing system, can now be

supported, surrounded or replaced by multiple

processors that share processing loads to increase

overall system performance and flexibility.

Some high-end supercomputers with attractive

price-per-peak-performance ratios are now being

implemented with tens of thousands of clustered

microprocessors. High performance embedded

systems are being implemented with dozens of

processors on a single chip. Multi-core standard

microprocessors have been introduced that will

soon replace their single-processor predecessors.

Modern semiconductor fabrication processes

combined with current System-on-Chip (SoC)

integration technology are providing a dramatic

breakthrough for parallel computing — the ability to

integrate an entire 64-bit multi-processor node

along with its controllers and associated networking

support circuitry onto a single chip. This integration

capability will produce dramatic price-to-performance

improvements, with system costs for

sustained performance trending below

$10/GFlop, and system power-to-performance

ratios trending to less than 1W/GFlop. Hundreds of

64-bit processors will soon fit onto a PC motherboard.

Like past integration milestones, such as the

single-chip calculator and the single-chip microcontroller,

this transition to the single-chip multiprocessor

node will support disruptive innovations

in high performance system implementation.
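
The two trend figures above translate directly into system-level bounds. The following is a minimal arithmetic sketch, not a vendor sizing tool: the constants are the ratios quoted in the text, while the function name `system_bounds` and the 1 TFlop example size are assumptions chosen for illustration.

```python
# Illustrative arithmetic only. The two ratios are the trend targets quoted
# in the text; the 1 TFlop system size below is an assumed example.
COST_PER_GFLOP_USD = 10.0   # "below $10/GFlop", used as an upper bound
POWER_PER_GFLOP_W = 1.0     # "less than 1W/GFlop", used as an upper bound


def system_bounds(sustained_gflops):
    """Return (cost in USD, power in watts) upper bounds for a system."""
    return (sustained_gflops * COST_PER_GFLOP_USD,
            sustained_gflops * POWER_PER_GFLOP_W)


cost_usd, power_w = system_bounds(1000.0)  # 1 TFlop sustained (hypothetical)
print(f"cost < ${cost_usd:,.0f}, power < {power_w:,.0f} W")
```

Under these ratios, a hypothetical system sustaining 1 TFlop would cost less than $10,000 and draw less than 1 kW.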

Figure 1 – Microprocessor transistor costs trending to zero

Mere scaling and integration of “free” processors

does not guarantee system performance gains.

Systems must be carefully designed to avoid

system-level chokepoints and thereby achieve real

productivity improvements from scaling. Communication

bandwidth imbalances are limiting many of

today’s large clustered computers to sustained

performance levels of less than 10% of peak ratings

for a broad range of applications. More respectable

Computer system latencies have actually been

increasing when measured as the number of

processor clock cycles required to access off-chip

memory. To cope with system imbalances, today’s

most popular microprocessor architectures carry

increased complexity and cost, the result of every conceivable

effort to keep the processor busy

while it awaits the arrival of remotely stored

information. Balanced scaling of processor performance,

storage bandwidth and communication

bandwidth enables parallel systems to achieve

productivity gains from free processor resources

and become broadly adopted.
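
One common way to reason about this kind of imbalance is a roofline-style bound, in which the attainable rate is the lower of the compute peak and the rate at which data can be delivered. The sketch below uses that model as an assumption; it is not taken from the paper, and the function name, the peak figure, the bandwidths and the flops-per-byte intensity are all hypothetical values chosen for illustration.

```python
def sustained_gflops(peak_gflops, bandwidth_gbytes_s, flops_per_byte):
    """Attainable rate: capped by compute or by data delivery, whichever is lower."""
    return min(peak_gflops, bandwidth_gbytes_s * flops_per_byte)


peak = 100.0       # hypothetical node peak, GFlop/s
intensity = 0.5    # hypothetical application: 0.5 flop per byte moved

for bw in (10.0, 50.0, 200.0):  # hypothetical bandwidths, GB/s
    rate = sustained_gflops(peak, bw, intensity)
    print(f"{bw:6.1f} GB/s -> {rate:6.1f} GFlop/s ({rate / peak:.0%} of peak)")
```

With these assumed numbers, the lowest bandwidth holds the node to a small fraction of its peak, and only when bandwidth scales with processor performance does the sustained rate reach the peak rating.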

Deeply coupled computing architectures provide

the means for effectively scaling and productively

utilizing parallel processing systems across a

broad range of workloads and applications. These

deeply coupled architectures enable complete

inter-processor interaction and synchronization to

occur at intervals ranging from instantaneous

on-chip to less than one microsecond for

massively-scaled systems, and enable workloads

to be effectively shared across large arrays

of processors.

High bandwidth, low latency inter-processor interactions

allow fine grain parallelism to be exploited,

providing the means to extract parallelism from

seemingly sequential code. Application software

that is performance-constrained due to insufficient

computational or processor-to-memory bandwidth

can be made to run faster using deeply coupled architectures

and fine-grained parallel programming

techniques.
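
As a small illustration of parallelism hidden in seemingly sequential code, consider a running sum. Written as a loop, every step depends on the previous one, yet the same result can be computed as a tree of pairwise additions in which all additions at one level are independent. This is a generic sketch using Python's standard library, not CPU TECH's tooling; `tree_sum` is a hypothetical name, and Python threads are used only to show the dependency structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add


def sequential_sum(values):
    total = 0
    for v in values:   # each step depends on the result of the previous one
        total += v
    return total


def tree_sum(values, pool):
    """Sum by pairwise levels; all additions within a level are independent."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        left, right = level[0::2], level[1::2]
        # With an odd count, the last element has no partner and is carried up.
        carry = [left[-1]] if len(left) > len(right) else []
        # pool.map pairs left[i] with right[i] and stops at the shorter list.
        level = list(pool.map(add, left, right)) + carry
    return level[0]


data = list(range(1, 101))
with ThreadPoolExecutor(max_workers=4) as pool:
    assert tree_sum(data, pool) == sequential_sum(data) == 5050
```

The tree form needs about log2(n) dependent levels instead of n dependent steps, which only pays off when inter-processor communication is cheap relative to the work per step. For floating-point data, the changed order of additions can alter rounding.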

Parallel processor system development has been

ongoing for more than thirty years with many

impressive technical achievements but relatively

few commercial successes. Widespread adoption

has been limited by the lack of high productivity

software to run on parallel machines.

Many mainstream software applications have been

incorrectly viewed as sequential code that is

not “naturally parallelizable”.

Programming methods have changed little over the

past thirty years. A root problem is the lack of underlying

parallel compiler and mapping tools needed to

automatically extract parallelism from application

source code and distribute workloads efficiently

across parallel processors. The relatively few

parallel software codes in use today have resulted

from heroic manual programming efforts.

The time is ripe for investment and advancement in

the parallel computing market. New deeply coupled

processor architectures, multiprocessor interconnect

technologies and methods of parallel programming

and optimization are all feasible or available.

The door is again open for computer architects to

innovate beyond the microprocessor generation, and

for companies to capitalize on a myriad of opportunities

offered by modern semiconductor process

economics.

Disruptive innovation in parallel computing can

result from successfully blending multiple competencies.

First is the ability to develop configurable

system and processor architectures that scale in a

balanced, cost-effective and power-efficient

manner. Second is the ability to develop large,

complex, fault-tolerant System-on-Chip designs in

nanometer era processes in acceptable design cycles

with minimal development cost and risk. Third is the

ability to provide compatible pathways for existing

long-lived applications to run on new machine architectures.

Fourth is the ability to deliver innovative

programming technologies including automated

code partitioning, system mapping, scheduling and

performance monitoring to a new generation of

parallel programmers.

CPU Technology, Inc. (CPU TECH) has assembled

the team and technologies to meet these challenges

and opportunities for widespread adoption of

System-on-Chip-based systems in the parallel

computing era. Emerging solutions and underlying

technologies are described in the following sections

that apply to a broad range of systems, from

single-chip high-performance embedded systems

to world-class, massively-scaled supercomputers.

Opportunities exist for partnering in and benefiting

from the marriage of these tools, technologies and

products.

3. Processor Options

Scalar processor core integration options for the

MPACx processing unit include those developed by

a customer, CPU TECH or third party IP suppliers

such as ARM, MIPS or PowerPC cores. The scalar

processor executes instructions, communicates, and

arranges data elements for the application-specific

processor engine(s).

End system requirements influence the choice of

processor cores. For customers with investments in

their own processor development, CPU Technology

can quickly capture, model, simulate, verify and

integrate this content into a System-on-Chip design.

For high performance systems, special processor

architectures may be required that can be optimized

for deep coupling, multi-processor scalability,

specific system performance, environmental or

physical requirements such as power, size or weight.

CPU TECH develops unique processor cores for

these applications and can customize them to meet

extreme system requirements.

CPU Technology’s scalar processor cores are

designed for highly reliable operation, as required

within critical real-time embedded systems and

high performance computers, using anti-tamper,

fault detection/correction and fault-tolerant system

design practices. CPU TECH processor cores

execute a range of compatible instruction sets as well

as custom application-optimized instruction sets.

These cores emanate from the company’s processor

architecture design expertise, design and verification

processes, automated tools and unique experience

in designing software-compatible systems.

CPU TECH’s processor cores provide system and

software designers with unprecedented visibility

into and control of the architectural features

affecting system performance. Using CFrame,

CPU TECH’s configurable processor framework,

processor structures and parameters can be readily

modified and verified. Using MPACx, CPU TECH’s

processor cores can be deeply coupled to provide

ultra-low-latency interaction between execution

pipelines of co-located processor cores. CPU TECH

processor core options range from 16- and 32-bit

cores to the true 64-bit Quintillium™ core.

CPU TECH processor cores can expose hardware

functions to the programmer for optimal parallel

software productivity.

If binary software backward-compatibility is

required, and other system design goals can be

readily met, then industry-standard processors may

be used. Licensable ARM, MIPS and PowerPC cores

are options that come “as is”, therefore changes to

their internal architecture are restricted. External

standard microprocessors from AMD, IBM, Intel

and others can also be used as processor units interconnected

by MPACx.

4. System-on-Chip Design Expertise

The electronics industry has seen four major eras of

computer technology: the ENIAC era, the mainframe

era, the minicomputer era and the microprocessor

era. Each era has been characterized by the

progressive capabilities of its underlying technology,

namely tubes, transistors, integrated circuits

and very large-scale integration (VLSI). Each new

underlying technology has brought order-of-magnitude

improvements in performance, reliability,

power consumption, level of integration and cost.

Resulting breakthroughs in price/performance have

opened new applications and new markets previously

untouched by electronic computing systems.

A new period of computer technology is emerging

beyond the microprocessor called the parallel

processor era, with System-on-Chip (SoC) integration

as its enabling technology. SoC integration technology

brings forward complex system development

processes, practices and techniques developed and

refined over the past fifty years into the field of

integrated circuit design. As in past generational

transitions, SoC promises order-of-magnitude

improvements over microprocessor era technology

in size, weight and power, with increased performance

and reliability. The parallel processor era will

open new markets and foster many new products

and applications.

Over the past fifteen years, CPU Technology has

developed a suite of proprietary system-level design

automation tools called SystemLab™ that enable

highly productive and effective System-on-Chip

design. Using SystemLab, CPU TECH engineers

can rapidly architect, model, simulate, analyze and

debug complex systems in a virtual environment,

speeding the architecture and design processes and

accelerating the performance of a wide range of

system trade-off analyses to increase the quality of

the resulting system design.

CPU Technology has successfully delivered highly

integrated SoC-based systems to a variety of

customers and end applications, and has received

ISO 9001:2000 registration for its system development

processes.

5. Parallel Developer Tool Chain

Programming multiple processors effectively is

significantly more complex than equivalent

programming for a single microprocessor. To help

developers effectively utilize and manage the

available processing resources of an MPACx array,

CPU TECH provides popular language compilers,

loaders, launchers, powerful debuggers with

multiple processor utilities and a robust graphical

user interface. These tool chain elements are tailored

to target system configurations, and are provided to

our SoC-based system customers.

CPU Technology provides programmer tool chain

support for several prominent parallel programming

methodologies including:

Explicit message passing

Compilation with embedded directives

Explicit thread programming

Compilation with automatic parallelization

Explicit process parallelization using dataflow or assembly line representations

Figure 5 – SystemLab Multiprocessor Performance Monitor

for software load balancing

Message passing and barrier synchronization

are well supported by the MPACx hardware

architecture, providing a compatible pathway for

accelerating existing parallel applications that

incorporate the MPI standard.
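
The message passing and barrier synchronization pattern can be sketched in a few lines. The example below is only an illustration of the programming model: it uses Python standard-library threads and queues as stand-ins, not MPI calls or any MPACx interface, and the ring layout, worker count and message values are assumptions.

```python
import queue
import threading

N_WORKERS = 4
barrier = threading.Barrier(N_WORKERS)
inboxes = [queue.Queue() for _ in range(N_WORKERS)]  # one mailbox per worker
results = [None] * N_WORKERS


def worker(rank):
    # Phase 1: send a message to the right-hand neighbour (ring layout).
    inboxes[(rank + 1) % N_WORKERS].put(rank * 10)
    # Barrier: no worker proceeds until every worker has sent its message.
    barrier.wait()
    # Phase 2: receive the message sent by the left-hand neighbour.
    results[rank] = inboxes[rank].get()


threads = [threading.Thread(target=worker, args=(r,)) for r in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [30, 0, 10, 20]
```

Each worker communicates only through explicit messages and shares no intermediate state, so the cost of the barrier and of each send sets the smallest unit of work worth distributing.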

In conjunction with on-chip Real-Time Non-intrusive

Instrumentation (RTNI), CPU TECH’s

parallel developers’ tool chain can be used to evaluate,

debug and tune parallel code on target hardware.

Full multiprocessor register and bus

visibility, breakpoint, trap, single step and trace

capabilities and a range of performance monitoring

tools enhance developer productivity.

CPU TECH is fully committed to advancing the

state-of-the-art in parallel application authoring

tools and programming technologies for multiprocessor

systems. Our development teams are

breaking new ground in parallel compilation,

parallel processor system analysis and the overall

acceleration of parallel system design processes.

6. Summary

In the world of computer science, entry into the 21st

century has been accompanied by the confluence of

multiple technologies, all available to serve as

primary elements for producing more powerful electronic

systems from essentially free parallel

processing resources. These capabilities support the

development of computing systems containing

hundreds of thousands, even millions of internal

processing nodes with large distributed, shared

memory and the ability to perform processor interactions

and synchronization with minimal latency.

It is the confluence of these interdisciplinary technologies,

and the capability to cost-effectively

combine them in high performance systems development

that is truly significant, synergistic and

disruptive to current paradigms. The combination of

these technologies provides high performance

computer architects and developers with the ability

to innovate again beyond the microprocessor era,

into the era of parallel processing.

CPU Technology invites the participation of

academia, government and industry to pursue

rapidly expanding opportunities for innovation and

order-of-magnitude gains in general purpose and

embedded computing systems. These capabilities

can be applied to a broad range of scientific, defense,

business and industrial domains for the improvement

of productivity, global competitiveness,

security and quality of life. New applications,

products and systems will emerge as imagination is

applied to technology. High performance computing

technology can now be advanced through more

efficient scaling, and be more broadly applied to

mass market products and mainstream applications.

 

 


Last modified: July 07, 2011