
Renesas Technology Develops Massively Parallel Processor Based on Matrix Architecture; Low-Power Processor Achieving 40 GOPS Performance at 200 MHz Operation.


TOKYO -- Renesas Technology Corp. today announced the development of a massively parallel processor based on a matrix architecture suitable for image and audio multimedia data processing.

This innovatively configured processor is a massively parallel programmable device*1 featuring tight coupling of 2,048 processing elements and 1 Mbit of SRAM, and has been confirmed to achieve 40 GOPS (giga operations per second) performance at a 200 MHz clock frequency.

Renesas Technology researchers unveiled details at the 2006 IEEE International Solid-State Circuits Conference (ISSCC) being held in San Francisco from February 5.

The image and audio multimedia data processing capability is essential for digital home appliances and other electronics, and involves a combination of complex operations such as fast Fourier transform (FFT), convolution, and sum of absolute difference operations. Up to now, processing of these operations has generally used hard-wired logic circuits or a DSP (digital signal processor) specialized for digital signal processing. However, recent dramatic advances in multimedia applications such as the rapid increase in pixel counts in image applications have increased demands for major improvements in multimedia data processing performance. At the same time, there is a growing demand for such processing to be implemented by means of programmable devices in order to simplify support for various multimedia data standards.
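
As a concrete illustration of one of the operations named above, the sum of absolute differences (SAD) between two pixel blocks can be sketched in a few lines of Python. The function name and the sample blocks are illustrative assumptions and are not taken from the Renesas design.

```python
def sum_of_absolute_differences(block_a, block_b):
    """Return the SAD of two equally sized sequences of pixel values."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    # Every |a - b| term is independent of the others, which is why the
    # operation suits many small processing elements working in parallel.
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


# Hypothetical 4-pixel blocks, chosen only for illustration.
reference = [10, 20, 30, 40]
candidate = [12, 18, 33, 40]
sad = sum_of_absolute_differences(reference, candidate)
print(sad)  # 2 + 2 + 3 + 0 = 7
```

A lower SAD means the two blocks match more closely, which is how the operation is typically used in image and video processing.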

One way of improving processing performance is to increase the operating frequency through the use of finer semiconductor processes. However, it will be difficult to continue to gain major improvements in performance while maintaining lower power consumption, and to achieve the required levels of performance with conventional DSP and similar architectures. Meanwhile, a coarse-grained MIMD (multiple instruction multiple data) processor has been announced as an architecture that increases processing performance, but this also has issues with reducing power consumption.

To solve these issues, Renesas Technology has developed a matrix type processor based on a different memory technology from that of a DSP or MIMD type processor.

This new processor is a fine-grained SIMD (single instruction multiple data) type massively parallel programmable device, featuring the following structural characteristics.
   1. Basic configuration: 2-bit processing elements (PEs) and 512-bit
      SRAM assigned as data registers

   2. 2,048 PEs and a total of 1 Mbit SRAM, together with tight
      coupling between PEs
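
The figures in this list are consistent with one another: 2,048 PEs with 512 bits each give exactly 1 Mbit. The short Python sketch below checks that arithmetic and also shows one way a 2-bit PE could handle wider operands, by working through them 2 bits per step with a carry. The slice-by-slice scheme and the function name are assumptions made for illustration. The announcement does not describe how wide operands are sequenced.

```python
NUM_PES = 2048      # processing elements (from the announcement)
BITS_PER_PE = 512   # SRAM bits assigned to each PE as data registers
PE_WIDTH = 2        # each PE handles 2 bits at a time

total_sram_bits = NUM_PES * BITS_PER_PE  # 1,048,576 bits = 1 Mbit


def add_in_2bit_slices(a, b, width=16):
    """Add two unsigned `width`-bit values 2 bits per step (assumed scheme).

    Returns (sum modulo 2**width, number of steps taken).
    """
    result, carry, steps = 0, 0, 0
    for shift in range(0, width, PE_WIDTH):
        slice_a = (a >> shift) & 0b11
        slice_b = (b >> shift) & 0b11
        partial = slice_a + slice_b + carry
        result |= (partial & 0b11) << shift   # keep the low 2 bits
        carry = partial >> PE_WIDTH           # carry into the next slice
        steps += 1
    return result, steps


total, steps = add_in_2bit_slices(0x1234, 0x0FFF)
print(total_sram_bits, hex(total), steps)  # 1048576 0x2233 8
```

Under this assumed scheme a 16-bit addition takes eight steps in one PE, while all 2,048 PEs work on their own data at the same time.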


The key to the increased performance of this processor lies in how efficiently the individual processing elements are operated. Also, the layout and connection of the processing elements and data registers are important factors in achieving reductions in area and power consumption.

These issues are handled by means of the following techniques.
(1)  Connection between data registers and PEs, and interconnection
     between PEs

     1.  H channel (Horizontal Channel) connecting PEs and data registers

         This is a connection path for performing data transfer
         between a processing element and data register, comprising a
         basic path for operations. Data transfers are performed in
         one clock cycle without mutual interference.

     2.  V channel (Vertical Channel) interconnecting PEs

         This is a connection path for performing data transfer
         between PEs. A V channel can perform parallel data transfer
         between PEs at a fixed distance, and this transfer path
         enables the butterfly computations*2 essential to digital
         signal processing operations to be processed efficiently.

         Both the H channel and V channel achieve a high transfer
         speed of 816 Gbps (gigabits per second) at 200 MHz operation.

(2)  PE circuit configuration

     A problem with a standard SIMD processor is its inability to
     perform conditional jumps. This processor employs a special
     technique in the 2-bit PE circuit configuration. Each PE has a
     1-bit register called a V flag (valid flag), which selects
     whether or not an H channel or V channel data transfer, or a PE
     operation itself, is to be executed. By this means, the
     equivalent of a conditional jump can be performed every clock
     cycle, greatly helping to speed up butterfly computations.

(3)  Two-bank SRAM circuitry with read-modify-write operation

     A PE basically has two inputs and one output. Therefore, a
     3-port data register is necessary to operate a PE continuously,
     but the following configuration is used in order to implement
     this with single-port SRAM.

     1.  The SRAM consists of 2 banks. The two input data are read
         from these 2 banks, one from each.

     2.  Output data is written over the data used in the read, with
         this overwriting performed in the same access by means of a
         memory read-modify-write operation.

         As a result, the sequence from reading to computation, and
         then to writing, can be completed in one clock cycle, and
         data registers that are small in area have been realized.
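
The three techniques above can be sketched together as a toy Python model. This is a behavioural illustration under stated assumptions, not the Renesas circuit: all function and class names are hypothetical, the "fixed distance" partner is modelled as PE index XOR distance, the butterfly uses add and subtract only (a Walsh-Hadamard transform, with FFT twiddle factors omitted), and the choice of which bank receives the write-back is assumed.

```python
import operator


# (1) V channel: every PE fetches from the PE a fixed distance away.
def v_channel_fetch(values, distance):
    """PE i receives the value held by PE (i XOR distance)."""
    return [values[i ^ distance] for i in range(len(values))]


# (2) V flag: one broadcast instruction, committed only where the flag is set.
def predicated(op, own, other, v_flags):
    """Apply op(own, other) in flagged PEs; the others keep their value."""
    return [op(a, b) if f else a for a, b, f in zip(own, other, v_flags)]


def butterfly_stage(values, distance):
    """One add/subtract butterfly stage built from (1) and (2)."""
    partner = v_channel_fetch(values, distance)
    lower = [(i & distance) != 0 for i in range(len(values))]
    upper = [not f for f in lower]
    # Lower legs compute partner - own; upper legs are masked off.
    step1 = predicated(lambda a, b: b - a, values, partner, lower)
    # Upper legs compute own + partner; lower legs are masked off.
    return predicated(operator.add, step1, partner, upper)


def walsh_hadamard(values):
    """Run log2(n) butterfly stages; len(values) must be a power of two."""
    distance = 1
    while distance < len(values):
        values = butterfly_stage(values, distance)
        distance *= 2
    return values


# (3) Two single-port banks with read-modify-write on one of them.
class TwoBankRegisterFile:
    def __init__(self, words_per_bank):
        self.bank_a = [0] * words_per_bank
        self.bank_b = [0] * words_per_bank
        self.cycles = 0

    def execute(self, op, addr_a, addr_b):
        """Read one operand per bank, then overwrite the bank A operand."""
        operand_a = self.bank_a[addr_a]                  # read, bank A
        operand_b = self.bank_b[addr_b]                  # read, bank B
        self.bank_a[addr_a] = op(operand_a, operand_b)   # write-back, bank A
        self.cycles += 1                                 # modelled as 1 cycle
        return self.bank_a[addr_a]


print(walsh_hadamard([1, 2, 3, 4]))  # [10, -2, -4, 0]

rf = TwoBankRegisterFile(words_per_bank=4)
rf.bank_a[0], rf.bank_b[2] = 5, 3
print(rf.execute(operator.add, 0, 2), rf.cycles)  # 8 1
```

In the butterfly stage, no PE branches: the same two instructions are issued to every PE, and the per-PE flags decide which PEs commit each one. The register file model touches each bank once per operation, which is why single-port banks are sufficient in this sketch.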


Results

A prototype processor using the new technology was implemented in 90 nm CMOS with a core area of 3.1 mm², and achieved processing performance of 40 GOPS at a 200 MHz clock frequency and 250 mW power dissipation. These figures represent approximately 70 times better performance per unit area and approximately 13 times better performance per unit power than a conventional in-house DSP.
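
The headline ratios can be derived directly from the figures quoted in this release. The Python sketch below does only that arithmetic. The implied DSP baseline values are an inference that takes the stated 70x and 13x ratios at face value; they are not numbers published by Renesas.

```python
GOPS = 40.0            # giga operations per second
CLOCK_HZ = 200e6       # 200 MHz clock
POWER_W = 0.250        # 250 mW
CORE_AREA_MM2 = 3.1    # core area in square millimetres
NUM_PES = 2048         # processing elements
CHANNEL_BPS = 816e9    # quoted H/V channel transfer speed

ops_per_cycle = GOPS * 1e9 / CLOCK_HZ                    # 200 operations
gops_per_watt = GOPS / POWER_W                           # 160 GOPS/W
gops_per_mm2 = GOPS / CORE_AREA_MM2                      # about 12.9 GOPS/mm²
channel_bits_per_cycle = CHANNEL_BPS / CLOCK_HZ          # 4080 bits
channel_bits_per_pe = channel_bits_per_cycle / NUM_PES   # about 2 bits per PE

# Inferred, not published: baseline implied by the stated ratios.
implied_dsp_gops_per_watt = gops_per_watt / 13
implied_dsp_gops_per_mm2 = gops_per_mm2 / 70

print(f"{ops_per_cycle:.0f} ops/cycle, {gops_per_watt:.0f} GOPS/W, "
      f"{gops_per_mm2:.1f} GOPS/mm2, {channel_bits_per_pe:.2f} bits/PE/cycle")
```

The channel figure works out to roughly 2 bits per PE per clock cycle, which is in line with the 2-bit PE width described earlier.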

About Renesas Technology Corp.

Renesas Technology Corp. is one of the world's leading semiconductor system solutions providers for mobile, automotive and PC/AV (Audio Visual) markets and the world's No.1 supplier of microcontrollers. It is also a leading provider of LCD Driver ICs, Smart Card microcontrollers, RF-ICs, High Power Amplifiers, Mixed Signal ICs, System-on-Chip (SoC), System-in-Package (SiP) and more. Established in 2003 as a joint venture between Hitachi, Ltd. (TOKYO:6501) (NYSE:HIT) and Mitsubishi Electric Corporation (TOKYO:6503), Renesas Technology achieved consolidated revenue of 1002.4 billion JPY in FY2004 (end of March 2005). Renesas Technology is based in Tokyo, Japan and has a global network of manufacturing, design and sales operations in around 20 countries with about 26,000 employees worldwide. For further information, please visit http://www.renesas.com
Notes: 1. Programmable device: A generic term for a processor
          whose circuit configuration can be changed
       2. Butterfly computation: A computation method used in fast
          Fourier transforms and the like. So called because
          computation inputs undergo cross-multiplication, presenting
          a butterfly-like appearance.
COPYRIGHT 2006 Business Wire
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2006, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Publication: Business Wire
Date: Feb 9, 2006