Keynote Speakers



Pietro Ferraro
SPIE Fellow, H-index: 64
Institute of Applied Sciences & Intelligent Systems Campi Flegrei, Italy

Title:  Learning strategies for the recognition and classification of micro-objects through holographic footprints

Bio-sketch:  Dr. Pietro Ferraro received the doctor of Physics degree, summa cum laude, from the University of Napoli “Federico II”, Italy, in 1987.
Soon after, he joined Aeritalia-Alenia Aeronautics (the major aerospace company in Italy) as a researcher to carry out applied research in optical non-destructive testing of carbon fiber materials. From 1991 to 1993 he was Principal Investigator (PI) on behalf of the Composite Materials Research Center of Alenia for two R&D projects within a Cooperative Research and Development Agreement (CRADA) between Finmeccanica (Roma) and United Technologies Research Center (UTRC), Electronics & Photonics Group (directed by Dr. A.J. De Maria), East Hartford, CT (USA).
The two research projects were on “Non destructive testing of large composite aircraft structures by Holography methods” (PI for UTRC Dr. Karl A. Stetson) and “Fiber Optic Bragg Grating Sensors” (PI for UTRC Dr. J. R. Dunphy). During this cooperation he contributed to pioneering work on Fiber Bragg Gratings for strain sensing, the development of related instrumentation, and the process of embedding optical fibers in composite materials, for which 3 patents were awarded jointly to Finmeccanica and UTRC. In 1993 he joined the Consiglio Nazionale delle Ricerche (CNR) Optics Group at the Institute of Cybernetics, Pozzuoli (Napoli), Italy, as Associate Researcher to develop interferometric and holographic methods for testing and characterization of optical components and materials. In 2001 he joined the CNR–Institute of Microelectronics and Microsystems, Napoli, as Researcher. In 2003 he joined the National Institute for Applied Optics (INOA) as Senior Research Scientist. Since 2005 he has been Head of the Research Line and Group on behalf of CNR in Optical diagnostics, Interferometric and Microscopy.


Vittorio Murino
Director of the Pattern Analysis & Computer Vision (PAVIS), IAPR Fellow, H-index: 56
Italian Institute of Technology (IIT), Italy

Title:  Multimodal Scene Understanding Leveraging Acoustic Images
Abstract:  In this talk, I will address multimodal scene understanding using acoustic and video signals. More specifically, I will introduce a new acoustic modality, namely acoustic images, which is not commonly considered in the literature. I will demonstrate the potential of this modality for scene understanding: it yields more robust and effective multimodal learning than learning from standard single-microphone audio data alone.
First, I investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images. Since monaural audio signals have been shown to lack robustness to variable environmental sound conditions, I consider a new dataset composed of RGB data, raw audio signals, and acoustic images, acquired by a hybrid audio-visual sensor in which the visual and acoustic images are aligned in space and synchronized in time. Using this richer information, I train deep learning audio models in a teacher-student fashion. The related experiments suggest that the learned representations are more powerful and generalize better than the features learned by models trained on single-microphone audio data alone.
Second, I leverage the direct use of acoustic images for audio-visual scene understanding. In this type of image, each acoustic pixel is characterized by a spectral signature associated with a specific direction in space, obtained by processing the audio signals coming from an array of microphones. By coupling such an array with a video camera, we obtain spatio-temporal alignment of acoustic images and video frames. Since this constitutes a powerful source of self-supervision, I propose to exploit this data in an ad-hoc designed (unsupervised) learning pipeline, without resorting to expensive data annotations. Specifically, I introduce a distillation scheme exploiting a self-supervised learning mechanism, such that the richer information content of acoustic images can be transferred to produce more powerful audio and visual feature representations. Such representations can then be employed for downstream tasks such as classification and cross-modal retrieval, without the need for a microphone array.
Third, recognizing that 2D planar arrays are cumbersome and not as widespread as the ordinary microphones found onboard optical cameras, I aim to exploit this richer modality while using only standard microphones and cameras. Hence, I propose to generate synthetic acoustic images from common audio-video data for the task of audio-visual localization. The synthetic acoustic images are produced by a novel deep architecture, based on Variational Autoencoder and U-Net models, which is trained to reconstruct the ground-truth spatialized audio data collected by a microphone array from the associated video and its corresponding monaural audio signal. In other words, the model learns to mimic what an array of microphones would produce under the same conditions. We assess the quality of the generated synthetic acoustic images on the task of unsupervised sound source localization, both qualitatively and quantitatively, while also considering standard generation metrics.
To recap, in this talk I present the potential of an unconventional acoustic modality, acoustic images, which, in association with synchronized video data and suitably designed learning strategies, has been shown to produce powerful feature representations that can be exploited in several downstream tasks such as audio and visual classification, cross-modal retrieval, and sound source localization.

Bio-sketch:  Vittorio Murino is a full professor at the University of Verona, Italy, and Senior Video Intelligence Expert at the Ireland Research Centre of Huawei Technologies (Ireland) Co., Ltd. in Dublin. He received the Laurea degree in Electronic Engineering in 1989 and the Ph.D. in Electronic Engineering and Computer Science in 1993 from the University of Genova, Italy. He was chairman of the Department of Computer Science from 2001, the year of its foundation, to 2007, and director of the PAVIS (Pattern Analysis and Computer Vision) department at Istituto Italiano di Tecnologia in Genova, Italy, from 2009 to 2019.
His main research interests include computer vision and machine learning, and more specifically statistical, probabilistic, and deep learning techniques for image and video processing, applied to (human) behavior analysis and related applications such as video surveillance and biomedical imaging.
Prof. Murino is a co-author of more than 400 papers published in refereed journals and international conferences, a member of the technical committees of important conferences (CVPR, ICCV, ECCV, ICPR, ICIP, etc.), and a guest co-editor of special issues in relevant scientific journals. He is also a member of the editorial boards of the journals Computer Vision and Image Understanding and Machine Vision & Applications. Finally, Prof. Murino is an IEEE Senior Member and IAPR Fellow.


Plenary Speaker

Konstantin Bulatov
Federal Research Center "Computer Science and Control" of RAS, Russia

Title:  Anytime Algorithms of Machine Vision
Abstract:  In this talk, we will touch on the subject of anytime algorithms, which are iterative interruptible processes whose result quality depends on the time invested in the computation, and on their application to machine vision problems. A brief overview of the properties of anytime algorithms will be given, along with formulations of the basic problems that are addressed when applying an anytime algorithm in a computing system. Examples will be given relating to information extraction, text recognition, and computed tomography. The talk is aimed at machine vision research and systems engineers, in the hope that this overview will give them additional angles from which to look at machine vision problems and the implementation of intelligent systems.

Bio-sketch:  Konstantin Bulatov was born in Petrozavodsk, Russian Federation, in 1991. He received a Specialist degree in applied mathematics from the National University of Science and Technology “MISiS”, Moscow, Russia, in 2013. He obtained his Ph.D. degree in computer science in 2020 from the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, Russia.

Since 2014 he has been employed at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, Russia, and since 2016 he has also been employed at Smart Engines Service LLC, Moscow, Russia. He has been teaching a “Combinatorial optimization” course at the Moscow Institute of Physics and Technology (State University). His fields of study are computer vision, image processing, and document recognition systems.



"We sincerely invite you and your colleagues to mark this event on your calendar and make your plans to come to Rome, Italy!"
Copyright © 2021 The 14th International Conference on Machine Vision (www.icmv.org)