Lab-STICC
From sensors to knowledge: Communicate and decide

Computing in network for AI applications


Design of a computing router for computing in network


Memory is a vital part of any digital system: it stores the programs and the data on which these
programs operate. Memory must therefore be (very) fast, but ideally also (very) large and inexpensive.
Since these three requirements cannot be met simultaneously, solutions based on a memory hierarchy,
combining various memory technologies, have been proposed. Cache memory, embedded as close as
possible to the processor on the same chip, is a key element of this hierarchy, giving the programmer the
illusion of a memory that is both fast and large.
Through the numerous hardware solutions implemented from generation to generation, more and more
transistors in a circuit are allocated for the sole purpose of improving memory access. In many cases, more
than 80% of the area of a chip is dedicated to caches, memories, memory controllers, interconnects,
etc., whose sole purpose is to store/transfer data or to control that storage/transfer. A memory access
is far more expensive than an arithmetic operation [1]. As a result, the total energy spent moving data
has reached excessive proportions: in a mobile system, memory-related activity alone can consume up to
62% of the energy [2].

Assigning a processor to compute a small number of very simple, basic operations is extremely inefficient
from an energy point of view. Consider a simple arithmetic operation such as an addition: it requires two
read accesses (to load the two operands) and one write access (to store the result), i.e., three memory
accesses for a single, simple arithmetic operation. In a current many-core architecture interconnected
through a network-on-chip (NoC), the data must travel a long path, as highlighted in the left-hand figure,
from the main memory to the cores: AXI fabric, DMA (Direct Memory Access), NoC router, L2 cache,
L1 cache.

The main idea is therefore to add some processing capabilities inside the routers of the network-on-chip
to perform these very simple operations. The platform identified is OpenPiton, and the target application
is SqueezeNet.


The main goal of the post-doc position is to enhance the existing routers of the network-on-chip with
simple computing capabilities (addition/subtraction, multiplication, comparison, etc.). The challenge is to
design this new router without degrading the initial performance of the network-on-chip. The main
expected outcome is a new programmable hardware router able to compute very simple operations on a
set of data. Performance will be measured through simulation tools and on an FPGA board.


  • VHDL, Verilog, SystemVerilog
  • CAD tools
  • C/C++
  • FPGA
  • English


Duration: up to December 2025
Start: ASAP 2024
Salary: according to experience; from 2300 € after tax (young researcher)


Research team: ARCAD, Lab-STICC
Address: Lab-STICC, rue Saint-Maudé 56100 Lorient, France


Kevin Martin (Lab-STICC, ARCAD team, Lorient) - kevin.martin@univ-ubs.fr


Send resume and application letter to kevin.martin@univ-ubs.fr


[1] Mark Horowitz. Computing’s energy problem (and what we can do about it). In 2014 IEEE International
Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014.
[2] Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur,
Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu. Google Workloads for
Consumer Devices: Mitigating Data Movement Bottlenecks. In Proceedings of the Twenty-Third International
Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’18,
2018. ACM.


