### Adaptive processor architectures for detector applications

Prof. Dr.-Ing. habil. Michael Hübner

**Chair for Embedded Systems in Information Technology (ESIT)** Faculty of Electrical Engineering and Information Technology, Ruhr-University of Bochum, Germany





## Content

- Introduction of traditional and new processor architectures
  - Standard RISC processor ATOM
  - Standard DSP processor from Texas Instruments
  - Novel heterogeneous processor including RISC and DSP
- Motivation of an adaptive processor concept
  - Requirements using the processor for detector application
  - Integration of the sensor data into the processor datapath
  - Description of the extended datapath
- How to develop and simulate an adaptive processor
  - High level design flow for processors
  - Benchmarking and emulation
- Conclusion and outlook


## (Non) adaptive processor architectures RISC / CISC



From: Taking a closer look at Intel's Atom multicore processor architecture, Stephen Blair-Chappell

## (Non) adaptive processor architectures DSP

- Traditional processor, developed for dataflow oriented applications
- Example: Texas Instruments DSP Processor
  - Developed for data-flow-oriented applications (data throughput plays an important role here, as does power consumption)
  - VLIW architecture (256 bit), deep pipeline (can exceed 20 stages)
  - $\rightarrow$ Data dependencies are resolved by the compiler at design time

$\rightarrow$ Suited to application scenarios with little control flow (the compiler parallelizes at design time), **but**

$\rightarrow$ any control flow reduces the performance tremendously

- So why not an adaptive processor, or at least a heterogeneous architecture?
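The design-time parallelization mentioned above can be illustrated with a minimal sketch of greedy list scheduling. It is not the TI compiler's algorithm. It assumes that the 256-bit instruction word holds eight 32-bit operation slots and that every operation has unit latency; `Op`, `schedule`, and `BUNDLE_WIDTH` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field

# Assumption: a 256-bit instruction word holds eight 32-bit operation slots.
BUNDLE_WIDTH = 8


@dataclass
class Op:
    name: str
    # Names of the operations whose results this operation reads.
    deps: list = field(default_factory=list)


def schedule(ops, width=BUNDLE_WIDTH):
    """Greedily pack operations into bundles at "design time".

    An operation is placed only after all operations it depends on have
    been issued in an earlier bundle (unit latency assumed).
    """
    issued_in = {}   # operation name -> index of the bundle that issues it
    bundles = []
    pending = list(ops)
    while pending:
        bundle = []
        for op in pending:
            ready = all(dep in issued_in for dep in op.deps)
            if ready and len(bundle) < width:
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic or unresolved dependency")
        for op in bundle:
            issued_in[op.name] = len(bundles)
        bundles.append([op.name for op in bundle])
        pending = [op for op in pending if op.name not in issued_in]
    return bundles


# Data-flow kernel: four independent multiplies feeding an adder tree.
dot4 = [
    Op("m0"), Op("m1"), Op("m2"), Op("m3"),
    Op("a0", ["m0", "m1"]), Op("a1", ["m2", "m3"]),
    Op("sum", ["a0", "a1"]),
]
print(schedule(dot4))  # [['m0', 'm1', 'm2', 'm3'], ['a0', 'a1'], ['sum']]

# Strictly serial chain: every bundle carries one operation, seven slots stay empty.
chain = [Op("x0"), Op("x1", ["x0"]), Op("x2", ["x1"])]
print(schedule(chain))  # [['x0'], ['x1'], ['x2']]
```

The data-flow kernel needs three bundles for seven operations, while the serial chain leaves most slots empty. A scheduler of this kind only sees straight-line code, which is one way to read the slide's point that control flow hurts a statically scheduled machine.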

## The hybrid way: C6A816x Integra™ DSP + ARM processor

### Cores

- C674x<sup>™</sup> Programmable, Floating/Fixed Point DSP Core up to 1.5 GHz
- ARM Cortex A8<sup>™</sup> (MPU) up to 1.5 GHz
- 3D Graphics Engine up to 27M polygons/s (C6A8168 only)
- Display Subsystem interface to multiple, simultaneous HD displays

#### Memory

- ARM: 32KB L1 I-Cache, 32KB L1 D-Cache, 256KB L2
- DSP: 32KB L1 I-Cache, 32KB L1 D-Cache, 256KB L2
- External Interfaces: Two DDR3-1600 Controllers and NAND

### Peripherals

- Gigabit EMAC x2
- USB 2.0 Ctlr/PHY x 2
- PCIe 2.0 x1; Supports 2 lanes
- SATA 3.0Gbps supports 2 external drives
- HDMI 1.3 Tx
- SD/SDIO
- McASP x3, McBSP
- SPI, GPIO, I2C, UART, EMAC

### Power

- Total power: typically 5-6 W




### Switched Central Resource (SCR)

#### Peripherals

| PCIe (2 lanes) | McASP x3, SPDIF, McBSP | I2C x2 | UART x3 | SPI | USB 2.0 x2 | GPIO | GMII EMAC x2 |
|----------------|------------------------|--------|---------|-----|------------|------|--------------|

#### **Memory Interfaces**

| DDR3 x2 | SDIO/SD | Async EMIF/NAND | SATA2 x2 |
|---------|---------|-----------------|----------|

## The adaptive processor

- The application depends on the "position" of the processor, e.g. the trigger level in ATLAS
- ... from billions of events to hundreds, from petabytes to hundreds of megabytes...
- Applications have different "requirements": control-flow overhead, data-flow overhead, or even both in separate phases of the application



## The adaptive processor: Advantages

- The adaptive processor is able to "react" to application requirements
- It can be deployed without modification in many applications (it starts as a "general purpose processor" and ends as an "application-specific processor")
- The monitoring can be adapted to many signatures; even a "history" can be stored and reused (keyword: case-based reasoning from AI)
- And: it combines the methods of embedded computing with those of supercomputing (keywords: multicore, power-saving modes, etc.)
- **BUT:** How can a processor be placed as close as possible to where the data are produced?
- E.g. ATLAS level 1: Tight coupling of the processor to the sensor
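The monitoring-and-history idea can be sketched in software as follows. The slides do not specify the monitoring mechanism, so everything here is an assumption made for illustration: the signature fields, the thresholds, the configuration names, and the classes `Signature` and `AdaptiveController` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical runtime signature measured over one monitoring window."""
    branch_ratio: float  # branches per executed instruction
    mem_ratio: float     # loads and stores per executed instruction


def classify(sig, branch_threshold=0.2, mem_threshold=0.4):
    """Pick a configuration from scratch (thresholds are illustrative)."""
    if sig.branch_ratio >= branch_threshold:
        return "control_oriented"
    if sig.mem_ratio >= mem_threshold:
        return "dataflow_oriented"
    return "general_purpose"


class AdaptiveController:
    """Stores past (signature, configuration) cases and reuses the nearest one."""

    def __init__(self, tolerance=0.05):
        self.tolerance = tolerance
        self.history = []  # list of (Signature, configuration name)

    def _nearest_case(self, sig):
        best_config, best_dist = None, None
        for past, config in self.history:
            dist = max(abs(past.branch_ratio - sig.branch_ratio),
                       abs(past.mem_ratio - sig.mem_ratio))
            if best_dist is None or dist < best_dist:
                best_config, best_dist = config, dist
        if best_dist is not None and best_dist <= self.tolerance:
            return best_config
        return None

    def select(self, sig):
        """Return (configuration, reused_from_history)."""
        config = self._nearest_case(sig)
        if config is not None:
            return config, True
        config = classify(sig)
        self.history.append((sig, config))
        return config, False


controller = AdaptiveController()
print(controller.select(Signature(0.30, 0.10)))  # new case, classified from scratch
print(controller.select(Signature(0.32, 0.12)))  # close to a stored case, reused
print(controller.select(Signature(0.05, 0.60)))  # data-flow dominated phase
```

The controller only falls back to `classify` when no stored case lies within the tolerance, which mirrors the "history can be stored and reused" idea in its simplest form. A hardware realization would look quite different.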




# Data path of a processor



Image source: Tanenbaum, Structured Computer Organization

# Example: pixel-detector-specific microarchitecture



Image source: Tanenbaum, Structured Computer Organization



# (Modern) processor design





# (Modern) processor design



# **Conclusion and outlook**

- Co-description of processors and the surrounding architecture from a high level of abstraction (model based)
- Fast simulation by using virtual prototyping platforms and FPGA-based emulators
- Processing real measured data
- Two PhD theses are currently under investigation at ESIT

Figure labels: software/HW partitioning, co-design, instruction set, ADL, microarchitecture adaptivity, processor bus interface, pipeline integration, virtual platform (OVP, Virtualizer), interfaces, rapid prototyping, LUT width, routing channels, FPGA, hybrid prototyping

# Thanks for your interest!

### **Contact:**

- Prof. Dr.-Ing. habil. Michael Hübner
- Chair for Embedded Systems in Information Technology (ESIT)
- Ruhr-University of Bochum (RUB)
- Building ID/1, Room 341
- Tel.: +49 234 32 25975
- Email: <u>michael.huebner@rub.de</u>

# **Collaboration welcome!**

### Embedded Systems: Examples from daily life: "Ubiquitous Computing"



→ Interaction with the environment and via the network leads to the term "Cyber Physical Systems"

Copyright ESIT, RUB, Prof. Dr.-Ing. Michael Hübner