Separating Explicit and Implicit Controls for Expressive Real-Time Neural Synthesis

Information

Type
Thesis/HDR defense

Location
Ircam, Salle Igor-Stravinsky (Paris)

Date
October 31, 2025

Nils Demerlé's thesis defense

Nils Demerlé, a PhD candidate in the EDITE doctoral school (ED 130), conducted his doctoral research, entitled “Separating Explicit and Implicit Controls for Expressive Real-Time Neural Synthesis”, within the Analysis–Synthesis team of the STMS Laboratory (IRCAM, CNRS, Sorbonne Université, Ministry of Culture), under the supervision of Philippe Esling and the co-supervision of Guillaume Doras.

Jury composition:

  • Joshua REISS – Professor, Queen Mary University of London – Reviewer
  • Nao TOKUI – Artist and Researcher, Neutone – Reviewer
  • Anna HUANG – Assistant Professor, MIT – Examiner
  • Atau TANAKA – Professor, Goldsmiths University – Examiner
  • Tatsuya HARADA – Professor, University of Tokyo – Examiner
  • Alexandre DEFOSSEZ – Researcher, Kyutai – Examiner

Abstract:
Recent advances in machine learning have profoundly transformed our relationship with sound and musical creation. Deep generative models are emerging as powerful tools that can support and extend creative practices, yet their adoption by artists remains limited by the question of control. Current approaches either rely on explicit parameters (notes, instruments, textual descriptions) or on abstract representation spaces that enable the exploration of subjective concepts such as timbre and style, but are harder to integrate into musical workflows.

This thesis aims to reconcile these two paradigms of explicit and implicit control to design expressive audio synthesis tools that can be seamlessly integrated into music production environments. We begin with a systematic study of neural audio codecs, the building blocks of most modern generative models, identifying design choices that influence both audio quality and controllability. We then explore methods to jointly learn explicit and implicit control spaces, first in a supervised setting, and later through AFTER, a framework designed for the unsupervised case. AFTER enables realistic and continuous timbre transfer across a wide range of instruments while preserving control over pitch and rhythm.

Finally, we adapt these models for real-time use through lightweight, streamable diffusion architectures and develop an intuitive interface integrated into digital audio workstations. The thesis concludes with several artistic collaborations, demonstrating the creative potential and practical impact of these generative approaches.

