Renesas Technology Develops Massively Parallel Processor Based on Matrix Architecture; Low-Power Processor Achieving 40 GOPS Performance at 200 MHz Operation
TOKYO -- Renesas Technology Corp. today announced the development of a massively parallel processor based on a matrix architecture suitable for processing image and audio multimedia data. This innovatively configured processor is a massively parallel programmable device*1 featuring tight coupling of 2,048 processing elements and 1 Mbit of SRAM (static random-access memory), and has been confirmed to achieve 40 GOPS (giga operations per second) performance at a 200 MHz clock frequency. Renesas Technology researchers unveiled details at the 2006 IEEE International Solid-State Circuits Conference (ISSCC), being held in San Francisco from February 5.
Image and audio multimedia data processing is essential for digital home appliances and other electronics, and involves a combination of complex operations such as fast Fourier transform (FFT), convolution, and sum of absolute difference operations. Until now, these operations have generally been processed by hard-wired logic circuits or a DSP (digital signal processor). However, recent dramatic advances in multimedia applications, such as the rapid increase in pixel counts in image applications, have created demand for major improvements in multimedia data processing performance. At the same time, there is growing demand for such processing to be implemented with programmable devices in order to simplify support for various multimedia data standards.
One way of improving processing performance is to raise the operating frequency by moving to finer semiconductor processes. However, it will be difficult to keep achieving major performance gains this way while holding down power consumption, or to reach the required performance levels with conventional DSP and similar architectures. Meanwhile, a coarse-grained MIMD (multiple instruction, multiple data) processor has been announced as an architecture that increases processing performance, but it too has difficulty reducing power consumption. To solve these issues, Renesas Technology has developed a matrix type processor based on a different memory technology from that of a DSP or MIMD type processor. This new processor is a fine-grained SIMD (single instruction, multiple data) type massively parallel programmable device, featuring the following structural characteristics.
1. Basic configuration: 2-bit processing elements (PEs), each with 512 bits of
SRAM assigned as its data registers
2. 2,048 PEs and a total of 1 Mbit of SRAM, together with tight
coupling between PEs
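As a quick consistency check on these figures, the short calculation below (a sketch; the variable names are illustrative, not from the release) confirms that 2,048 PEs with 512 bits each add up to exactly 1 Mbit (2^20 bits). It also shows the aggregate datapath width per clock, assuming all 2-bit PEs are active at once.

```python
# Configuration figures stated in the release (names are illustrative).
NUM_PES = 2048          # processing elements
BITS_PER_PE_SRAM = 512  # SRAM bits assigned to each PE as data registers
PE_WIDTH_BITS = 2       # each PE operates on 2-bit data

total_sram_bits = NUM_PES * BITS_PER_PE_SRAM
print(total_sram_bits)             # 1048576
print(total_sram_bits == 2 ** 20)  # True: exactly 1 Mbit

# Aggregate bits handled per clock, assuming every PE is active.
bits_per_cycle = NUM_PES * PE_WIDTH_BITS
print(bits_per_cycle)              # 4096
```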
The key to the increased performance of this processor lies in how efficiently the individual processing elements are operated. Also, the layout and connection of the processing elements and data registers are important factors in achieving reductions in area and power consumption. These issues are handled by means of the following techniques.
(1) Connection between data registers and PEs, and interconnection
between PEs
1. H channel (Horizontal Channel) connecting PEs and data registers
This path transfers data between each processing element and its
data registers and forms the basic datapath for operations. Data
transfers complete in one clock cycle without mutual interference.
2. V channel (Vertical Channel) interconnecting PEs
This path transfers data between PEs. A V channel can perform
parallel data transfers between PEs a fixed distance apart, which
lets the butterfly computations*2 essential to digital signal
processing be executed efficiently.
Both the H channel and V channel achieve a high transfer
speed of 816 Gbps (gigabits per second) at 200 MHz operation.
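To illustrate why fixed-distance transfers suit butterfly computations, here is a minimal iterative radix-2 FFT (the textbook Cooley-Tukey algorithm, not Renesas's implementation). Within each stage, every butterfly combines two elements exactly `half` positions apart; the idea that each such pair maps onto a V channel transfer between PEs is an assumption made for illustration.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    half = 1
    while half < n:
        # In this stage every butterfly pairs element k with element k + half,
        # so all partners sit the same distance apart and can be exchanged in parallel.
        w_step = cmath.exp(-2j * cmath.pi / (2 * half))
        for start in range(0, n, 2 * half):
            w = 1 + 0j
            for k in range(start, start + half):
                top, bottom = a[k], a[k + half] * w
                a[k], a[k + half] = top + bottom, top - bottom  # the butterfly
                w *= w_step
        half *= 2
    return a


print([round(abs(v), 6) for v in fft_radix2([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```

The distance doubles from stage to stage (1, 2, 4, ...), so a path that moves data between PEs at a fixed, selectable distance covers every stage.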
(2) PE circuit configuration
A problem with a standard SIMD processor is that, because all
PEs share one instruction stream, it cannot perform conditional
jumps. This processor addresses this with a special technique in
the 2-bit PE circuit configuration. Each PE has a 1-bit register
called a V flag (valid flag), which selects whether an H channel
or V channel data transfer, or the PE operation itself, is
executed. This allows the equivalent of a conditional jump to be
performed every clock cycle, greatly helping to speed up
butterfly computations.
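The V flag amounts to per-PE predication: every PE receives the same operation, but only those whose flag is set commit the result. The toy model below uses hypothetical names (`PE`, `simd_step`, `predicated_if_else`) and captures only this idea, not the chip's actual instruction set. It runs a per-PE if/else by executing both arms with complementary flags.

```python
from dataclasses import dataclass


@dataclass
class PE:
    value: int
    v_flag: bool = True  # models the 1-bit valid flag


def simd_step(pes, op):
    """Broadcast one operation to every PE; only PEs whose V flag is set commit it."""
    for pe in pes:
        if pe.v_flag:
            pe.value = op(pe.value)


def predicated_if_else(pes, predicate, then_op, else_op):
    """Per-PE if/else without branching the shared instruction stream."""
    # Evaluate the condition once so both arms see the original values.
    conds = [predicate(pe.value) for pe in pes]
    for pe, c in zip(pes, conds):
        pe.v_flag = c
    simd_step(pes, then_op)       # "then" arm: only PEs where the condition held
    for pe, c in zip(pes, conds):
        pe.v_flag = not c
    simd_step(pes, else_op)       # "else" arm: the remaining PEs
    for pe in pes:
        pe.v_flag = True          # re-enable all PEs afterwards


pes = [PE(v) for v in [3, -1, 4, -5]]
# Negate negative values, double the others.
predicated_if_else(pes, lambda v: v < 0, lambda v: -v, lambda v: v * 2)
print([pe.value for pe in pes])  # [6, 1, 8, 5]
```

In hardware the PEs run in lockstep rather than in a Python loop; the model only shows how a flag turns a branch into two masked steps.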
(3) Two-bank SRAM circuitry with read-modify-write operation
A PE has two inputs and one output, so keeping it busy every
cycle would normally require a 3-port data register (two reads
and one write per cycle). Instead, the following configuration
implements this with single-port SRAM.
1. The SRAM is split into two banks, and the two input operands
are read one from each bank.
2. The output data is written back over one of the operands
just read, in the same cycle, by means of a memory
read-modify-write operation.
As a result, the sequence of reading, computing and writing
completes in one clock cycle, and the data registers occupy
only a small area.
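The cycle described above can be modelled behaviourally as follows. This is a word-level sketch with hypothetical names (`TwoBankRegisterFile`, `cycle`), not a circuit description. Each bank is addressed only once per clock: bank B is read, and bank A performs a read-modify-write at a single address.

```python
class TwoBankRegisterFile:
    """Behavioural model of one PE's data registers split into two single-port banks."""

    def __init__(self, words_per_bank):
        self.bank_a = [0] * words_per_bank
        self.bank_b = [0] * words_per_bank

    def cycle(self, addr_a, addr_b, op):
        """One clock: read an operand from each bank, compute, then write the result
        back over the bank-A operand, completing a read-modify-write access."""
        a = self.bank_a[addr_a]       # bank A: read half of the read-modify-write
        b = self.bank_b[addr_b]       # bank B: its only access this cycle
        result = op(a, b)             # the PE operation
        self.bank_a[addr_a] = result  # bank A: write half, same address as the read
        return result


rf = TwoBankRegisterFile(words_per_bank=4)
rf.bank_a[0] = 5
rf.bank_b[1] = 7
r = rf.cycle(0, 1, lambda x, y: x + y)
print(r, rf.bank_a, rf.bank_b)  # 12 [12, 0, 0, 0] [0, 7, 0, 0]
```

The trade-off is that the result always replaces one of its operands, so a value that must survive has to be copied before it is used as the overwritten input.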
Results
A prototype processor using the new technology was implemented in a 90 nm CMOS (complementary metal-oxide semiconductor) process with a core area of 3.1 mm², and achieved processing performance of 40 GOPS at a 200 MHz clock frequency with 250 mW power dissipation. These figures represent approximately 70 times higher performance per unit area and 13 times higher performance per unit power than a conventional in-house DSP.
About Renesas Technology Corp.
Renesas Technology Corp. is one of the world's leading semiconductor system solutions providers for mobile, automotive and PC/AV (audio visual) markets and the world's No. 1 supplier of microcontrollers. It is also a leading provider of LCD Driver ICs, Smart Card microcontrollers, RF-ICs, High Power Amplifiers, Mixed Signal ICs, System-on-Chip (SoC), System-in-Package (SiP) and more. Established in 2003 as a joint venture between Hitachi, Ltd. (TOKYO:6501) (NYSE:HIT) and Mitsubishi Electric Corporation (TOKYO:6503), Renesas Technology achieved consolidated revenue of 1002.4 billion JPY in FY2004 (ended March 2005). Renesas Technology is based in Tokyo, Japan, and has a global network of manufacturing, design and sales operations in around 20 countries with about 26,000 employees worldwide. For further information, please visit http://www.renesas.com
Notes: 1. Programmable device: A generic term for a processor
whose circuit configuration can be changed.
2. Butterfly computation: A computation method used in fast
Fourier transforms and similar algorithms. So called because
the computation inputs are cross-multiplied, giving the data
flow a butterfly-like appearance.