Topic: Source-channel coding for learning
Funding: This PhD position will be conducted within the framework of the Labex CominLabs project CoLearn (Coding for Learning)
Expected start date: October 1st, 2021
Keywords: Asymptotic and non-asymptotic information theory, measure theory, source and channel coding, machine learning.
Every minute, 500 hours of video are uploaded to YouTube and 240,000 images are added to Facebook. Since it is physically impossible for this huge mass of data to be entirely processed and viewed by humans, advanced machine learning methods are needed to sort, organize, and recommend the content to users. However, as a preliminary step, the data must be transmitted from the location where they are collected to the server where they are processed. The conventional data transmission framework assumes that the server completely reconstructs the data, possibly with some distortion. Instead, this thesis aims to develop a novel communication framework in which the server may also apply a learning task directly on the coded data. We aim to carry out an information-theoretic analysis in order to understand the fundamental limits of such systems, and to develop novel coding techniques allowing for both learning and data reconstruction from the coded data.
To perform learning, one straightforward approach is to use standard coding techniques for data transmission and to perform learning after data reconstruction. However, it is questionable whether a coding scheme designed with respect to a distortion criterion also optimizes the learning performance. Hence, the first fundamental question the candidate will address is: “is there a tradeoff in terms of coding rate between distortion and learning performance?” Moreover, the source-channel separation theorem states that, under asymptotic conditions, the source coding system and the channel coding system can be designed completely independently of each other, without any loss in performance compared to a joint design of the two systems. Therefore, the second fundamental question we aim to investigate is: “is source-channel separation still optimal for learning, under both asymptotic and non-asymptotic conditions?”
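As a concrete reference point for the distortion side of this tradeoff, the classical rate-distortion function of a Bernoulli(p) source under Hamming distortion, R(D) = H(p) − H(D) for 0 ≤ D ≤ min(p, 1 − p), can be evaluated in a few lines. This is an illustrative sketch only; the function names are ours and are not part of the project description:

```python
import math

def binary_entropy(q):
    """Binary entropy H(q) in bits, with the convention H(0) = H(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def rate_distortion(p, D):
    """Rate-distortion function R(D) = H(p) - H(D) of a Bernoulli(p)
    source under Hamming distortion; R(D) = 0 for D >= min(p, 1-p)."""
    if D >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(D)
```

For instance, a fair binary source (p = 0.5) requires 1 bit per symbol for lossless reconstruction (D = 0), and the required rate decreases as the tolerated distortion D grows. Whether the same rate also suffices for a given learning task is precisely the kind of question this thesis will formalize.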
The few works in the literature that deal with the tradeoff between reconstruction and learning performance have either considered a particular setup of the general problem described here, e.g. [2, 3], or have neglected the channel coding part, e.g. [4]. In this PhD, the candidate will consider the general setup described above and search for the fundamental information-theoretic limits governing the tradeoff between data reconstruction and learning performance. Moreover, the candidate will investigate the most promising source and channel coding solutions in order to approach the bounds derived in the first step.
One of the envisaged applications is acoustic signal classification from underwater sensors. The data, collected from acoustic sensors, are transmitted over an underwater acoustic channel to a gateway in order to be classified, e.g. as biological or geological sound. The coding schemes developed during the PhD may be applied in this context.
The candidate should have earned an MSc degree, or equivalent, in one of the following fields: information theory, signal processing, or applied mathematics. They should have a strong background in probability and information theory. Some knowledge of machine learning would also be appreciated. The candidate should be familiar with Matlab, C/C++, or Python.
How to apply:
Please send an e-mail to the contacts listed below explaining in a few lines your interest in this subject, and attach:
- A full CV, including a list of projects and courses related to the subject
- Complete academic records (from Bachelor to MSc)
- 1 or 2 references
Applications will be reviewed as they arrive, until a candidate is selected.
Dr. Elsa Dupraz, IMT Atlantique / Lab-STICC UMR CNRS 6285
Dr. Philippe Mary, INSA de Rennes / IETR UMR CNRS 6164
[1] V. Kostina, “Lossy data compression: non-asymptotic fundamental limits”, PhD dissertation, Princeton University, 2013.
[2] E. Tuncel, D. Gündüz, “Identification and lossy reconstruction in noisy databases”, IEEE Trans. on Inf. Theory, 2013.
[3] S. Sreekumar, D. Gündüz, “Distributed hypothesis testing over discrete memoryless channels”, IEEE Trans. on Inf. Theory, 2019.
[4] M. Raginsky, “Learning from compressed observations”, in Proc. IEEE ITW, 2007.