The researcher’s life is a long pathway, often in both space and time. Here’s mine (click on the map or see project descriptions below):
Comparing cortical feedback (fMRI) to Self-Supervised DNNs
The promise of artificial intelligence in understanding biological vision relies on comparing computational models with brain data, with the goal of capturing functional principles of visual information processing. Convolutional neural networks (CNNs) have successfully matched the transformations in hierarchical processing occurring along the brain’s feedforward visual pathway extending into ventral temporal cortex.
However, we have yet to learn whether CNNs can successfully describe feedback processes in early visual cortex. Here, we investigated similarities between human early visual cortex and a CNN with an encoder/decoder architecture, trained with self-supervised learning to fill occlusions and reconstruct an unseen image. Using Representational Similarity Analysis (RSA), we compared 3T fMRI data from a non-stimulated patch of early visual cortex in human participants viewing partially occluded images with the CNN layer activations for the same images. Results show that our self-supervised image-completion network outperforms a classical supervised object-recognition network (VGG16) in terms of similarity to fMRI data. This provides additional evidence that optimal models of the visual system might come from less feedforward architectures trained with less supervision. We also find that CNN decoder pathway activations are more similar to brain processing than encoder activations, suggesting an integration of mid- and low/middle-level features in early visual cortex. Challenging an AI model to learn natural image representations via self-supervised learning and comparing them with brain data can help us constrain our understanding of information processing, such as neuronal predictive coding. [biorxiv – journal paper]
The analysis framework is composed of two parts: the encoder/decoder (artificial) neural network and brain imaging data collection. (a) Each image was passed through the network. For each layer, we extracted the activations, selected one quadrant at a time, and applied a PCA transformation to reduce the dimensionality to 1024 components, obtaining one 1024-d vector per layer (15), per quadrant (2), and per image (24). We used these vectors to compute Representational Dissimilarity Matrices (RDMs). (b) fMRI data were collected from participants whilst they viewed the same images that were fed into the network (in testing), and RDMs were computed from these data. We compared the RDMs of the network and the brain data (cross-validated; see [Walther et al., 2016]) for every CNN layer (15 layers analysed), human visual area (V1 and V2), and image-space quadrant (occluded and non-occluded quadrants).
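The RDM comparison described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: it uses plain correlation distance rather than the cross-validated distance cited above, the Spearman comparison is an assumption, the random arrays stand in for PCA-reduced layer activations and voxel patterns, and every function name is hypothetical.

```python
import numpy as np

def compute_rdm(patterns):
    """Return an image-by-image RDM using correlation distance (1 - Pearson r).

    patterns: array of shape (n_images, n_features), one row per image.
    """
    return 1.0 - np.corrcoef(patterns)

def upper_triangle(rdm):
    """Vectorise the off-diagonal upper triangle of a square RDM."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]

def rank_values(values):
    """Rank values from 0 to n-1 (assumes no ties, as with continuous data)."""
    return np.argsort(np.argsort(values))

def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between two RDMs (Pearson correlation of ranks)."""
    ranks_a = rank_values(upper_triangle(rdm_a))
    ranks_b = rank_values(upper_triangle(rdm_b))
    return np.corrcoef(ranks_a, ranks_b)[0, 1]

# Synthetic stand-ins: 24 images, 1024 PCA components, 300 voxels (assumed count).
rng = np.random.default_rng(0)
n_images, n_components, n_voxels = 24, 1024, 300
layer_features = rng.standard_normal((n_images, n_components))
voxel_patterns = rng.standard_normal((n_images, n_voxels))

model_rdm = compute_rdm(layer_features)   # one RDM per layer and quadrant
brain_rdm = compute_rdm(voxel_patterns)   # one RDM per visual area and quadrant
similarity = rdm_similarity(model_rdm, brain_rdm)
print(f"model-brain RDM similarity: {similarity:.3f}")
```

In a full analysis this comparison would be repeated for each of the 15 layers, both visual areas, and both quadrants.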
Deep learning methods for MRI data analysis
Understanding brain mechanisms associated with sensory perception is a long-standing goal of Cognitive Neuroscience. Using non-invasive techniques such as fMRI, researchers can now study brain activity in different areas. An essential step in many functional and structural neuroimaging studies is segmentation, the operation of partitioning MR images into anatomical structures. Current automatic (multi-)atlas-based segmentation strategies often lack accuracy on difficult-to-segment brain structures and, since these methods rely on atlas-to-scan alignment, they may require long processing times. Alternatively, recent methods based on Convolutional Neural Networks (CNNs) are enabling the direct analysis of out-of-the-scanner data. However, current CNN-based solutions partition the test volume into 2D or 3D patches, which are processed independently. This entails a loss of global contextual information, which negatively affects segmentation accuracy. In these works, we introduce CEREBRUM, an optimised end-to-end CNN that segments a whole T1w MRI brain volume at once, without partitioning the volume, preprocessing it, or aligning it to an atlas. Different quantitative measures demonstrate an improved accuracy of this solution when compared to state-of-the-art techniques. Moreover, through a randomised survey involving expert neuroscientists, we show that subjective judgements favour our solution over widely adopted atlas-based software. We delivered two tools:
Work done in collaboration with the Department of Information Engineering, University of Brescia (Italy).
Transfer learning of DNN representations for fMRI decoding
Deep neural networks have revolutionised machine learning, with unparalleled performance in object classification. However, in brain imaging (e.g., fMRI), the direct application of Convolutional Neural Networks (CNNs) to decoding subject states or perception from imaging data seems impractical given the scarcity of available data.
In this work we propose a robust method to transfer information from deep learning (DL) features to brain fMRI data with the goal of decoding. By adopting Reduced Rank Regression with Ridge Regularisation we establish a multivariate link between imaging data and the fully connected layer (fc7) of a CNN. We exploit the reconstructed fc7 features by performing an object image classification task on two datasets: one of the largest fMRI databases, taken from different scanners from more than two hundred subjects watching different movie clips, and another with fMRI data taken while watching static images. fc7 features could be significantly reconstructed from the imaging data, and led to significant decoding performance. The decoding based on reconstructed fc7 outperformed the decoding based on imaging data alone. In this work we show how to improve fMRI-based decoding benefiting from the mapping between functional data and CNN features.
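To make the link between imaging data and CNN features more concrete, here is a minimal sketch of one standard way to combine a ridge penalty with a rank constraint: fit the ridge solution, then project it onto the leading right singular vectors of the fitted values of the ridge-augmented problem. It is an illustration on synthetic data, not the implementation used in the paper; the function name, the dimensions, and the values of `rank` and `ridge_lambda` are all assumptions.

```python
import numpy as np

def reduced_rank_ridge(X, Y, rank, ridge_lambda):
    """Fit Y ≈ X @ B with a ridge penalty and a rank constraint on B.

    X: (n_samples, n_voxels) imaging data, Y: (n_samples, n_features) CNN
    features. Both are assumed to be (approximately) centred.
    """
    n_voxels = X.shape[1]
    # Ridge solution: (X'X + lambda * I)^-1 X'Y
    gram = X.T @ X + ridge_lambda * np.eye(n_voxels)
    B_ridge = np.linalg.solve(gram, X.T @ Y)
    # Ridge equals ordinary least squares on an augmented design matrix.
    X_aug = np.vstack([X, np.sqrt(ridge_lambda) * np.eye(n_voxels)])
    fitted_aug = X_aug @ B_ridge
    # Keep only the leading right singular directions of the fitted values.
    _, _, Vt = np.linalg.svd(fitted_aug, full_matrices=False)
    V_r = Vt[:rank].T
    return B_ridge @ V_r @ V_r.T

# Synthetic example: 50 "voxels" predicting 20 "fc7-like" features of rank 3.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_features, true_rank = 120, 40, 50, 20, 3
B_true = (rng.standard_normal((n_voxels, true_rank))
          @ rng.standard_normal((true_rank, n_features)))
X_train = rng.standard_normal((n_train, n_voxels))
X_test = rng.standard_normal((n_test, n_voxels))
Y_train = X_train @ B_true + 0.1 * rng.standard_normal((n_train, n_features))
Y_test = X_test @ B_true + 0.1 * rng.standard_normal((n_test, n_features))

B_hat = reduced_rank_ridge(X_train, Y_train, rank=true_rank, ridge_lambda=1.0)
Y_pred = X_test @ B_hat  # reconstructed features for held-out samples
reconstruction_r = np.corrcoef(Y_test.ravel(), Y_pred.ravel())[0, 1]
print(f"held-out reconstruction correlation: {reconstruction_r:.3f}")
```

The reconstructed features (`Y_pred` here) are what a downstream classifier would then consume in place of the raw imaging data.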
The potential advantage of the proposed method is twofold: the extraction of stimulus representations by means of an automatic (unsupervised) procedure, and the embedding of high-dimensional neuroimaging data into a space designed for visual object discrimination, which yields a more manageable space from a dimensionality point of view. [paper]
Work done in collaboration with the Department of Cognitive Neuroscience, Maastricht University (The Netherlands), the Department of Information Engineering, University of Brescia (Italy), and the Sagol Brain Institute, Wohl Institute for Advanced Imaging, Tel-Aviv Sourasky Medical Center, Tel-Aviv (Israel).
Inter-subject audiovisual decoding in fMRI using high-dimensional regression
Major methodological advancements have recently been made in the field of neural decoding, which is concerned with the reconstruction of mental content from neuroimaging measures. However, in the absence of a large-scale examination of the validity of decoding models across subjects and content, the extent to which these models can be generalised is not clear. This study addresses the challenge of producing generalisable decoding models, which allow the reconstruction of perceived audiovisual features from human functional magnetic resonance imaging (fMRI) data without prior training of the algorithm on the decoded content. We used data from more than 200 subjects watching several movie clips to perform full-brain fMRI decoding (42k voxels).
We applied an adapted version of kernel ridge regression combined with temporal optimisation on data acquired during film viewing (234 runs) to generate standardised brain models for sound loudness, speech presence, perceived motion, face-to-frame ratio, lightness, and color brightness. The prediction accuracies were tested on data collected from different subjects watching other movies, mostly in a different scanner. [paper]
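As a rough illustration of why the kernel form suits whole-brain data, the sketch below fits plain linear-kernel ridge regression: the system to solve has one row per time point rather than one per voxel, which matters when voxels (42k in the study) far outnumber samples. It does not reproduce the adaptations or the temporal optimisation mentioned above; the data are synthetic, and the function names, dimensions, and `ridge_lambda` value are assumptions.

```python
import numpy as np

def fit_kernel_ridge(X_train, y_train, ridge_lambda):
    """Solve (K + lambda * I) alpha = y with a linear kernel K = X X'."""
    K = X_train @ X_train.T                      # (n_samples, n_samples)
    return np.linalg.solve(K + ridge_lambda * np.eye(K.shape[0]), y_train)

def predict_kernel_ridge(X_train, alpha, X_test):
    """Predict the target for new samples from the dual coefficients."""
    return (X_test @ X_train.T) @ alpha

# Synthetic data: a few latent sources drive many voxels and one feature.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_latent = 200, 100, 1000, 5
mixing = rng.standard_normal((n_latent, n_voxels))

def simulate(n_samples):
    """Return (voxel time series, feature time course) for a synthetic run."""
    latent = rng.standard_normal((n_samples, n_latent))
    voxels = latent @ mixing + 0.1 * rng.standard_normal((n_samples, n_voxels))
    loudness = latent[:, 0] + 0.1 * rng.standard_normal(n_samples)
    return voxels, loudness

X_train, y_train = simulate(n_train)   # stands in for the training subjects
X_test, y_test = simulate(n_test)      # stands in for held-out subjects/movies

alpha = fit_kernel_ridge(X_train, y_train, ridge_lambda=10.0)
y_pred = predict_kernel_ridge(X_train, alpha, X_test)
accuracy = np.corrcoef(y_test, y_pred)[0, 1]
print(f"prediction accuracy (Pearson r): {accuracy:.3f}")
```

Here the solve involves a 200 x 200 matrix instead of a 1000 x 1000 one, while giving the same predictions as ridge regression in voxel space.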
Work primarily done in the Functional Brain Center, Tel Aviv Sourasky Medical Center, Tel Aviv (Israel), in collaboration with the Department of Cognitive Neuroscience, Maastricht University (The Netherlands), and the Department of Information Engineering, University of Brescia (Italy).
Shot scale detection and authorship recognition
What can you learn from a single frame of a movie?
The scale of a shot, i.e. the apparent distance of the camera from the main subject of a scene, is one of the main stylistic and narrative functions of audiovisual products, conveying meaning and inducing the viewer’s emotional state.
Figure: Shot scale classes from the movie Hugo (Scorsese, 2011).
The statistical distribution of different shot scales in a film may be an important identifier of an individual film, an individual author, and of various narrative and affective functions of a film. To understand to what extent the shot scale distribution (SSD) of a movie can serve as its fingerprint, shot scale must be recognised automatically on a large movie corpus. In our work we propose an automatic framework for estimating the SSD of a movie by using inherent characteristics of shots containing information about camera distance, without the need to recover the 3D structure of the scene. [shot_scale, over-the-shoulder]
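Once each frame (or second) of a movie carries a shot scale label, the SSD itself is simply the normalised histogram of those labels. The sketch below shows that final step only, under stated assumptions: the three class names, the function names, and the use of total variation distance to compare two movies are all hypothetical choices made for illustration, and the labels are made up rather than produced by a classifier.

```python
from collections import Counter

# Assumed three-class labelling; the actual class set may differ.
SHOT_SCALES = ("close-up", "medium", "long")

def shot_scale_distribution(frame_labels):
    """Return the fraction of frames assigned to each shot scale."""
    if not frame_labels:
        raise ValueError("frame_labels must not be empty")
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return {scale: counts[scale] / total for scale in SHOT_SCALES}

def ssd_distance(ssd_a, ssd_b):
    """Total variation distance between two distributions (0 = identical)."""
    return 0.5 * sum(abs(ssd_a[s] - ssd_b[s]) for s in SHOT_SCALES)

# Made-up per-frame labels for two hypothetical movies.
movie_a = ["close-up", "close-up", "medium", "long"]
movie_b = ["long", "long", "medium", "long"]

ssd_a = shot_scale_distribution(movie_a)
ssd_b = shot_scale_distribution(movie_b)
print(ssd_a)
print(f"distance between the two SSDs: {ssd_distance(ssd_a, ssd_b):.2f}")
```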
In another work, we show how low-level formal features, such as shot duration (the length of camera takes) and shot scale (the distance between the camera and the subject), are distinctive of a director’s style in art movies. Until now, such features were thought not to vary enough to be distinctive of an author. However, our investigation of the full filmographies of six different authors (Scorsese, Godard, Tarr, Fellini, Antonioni, and Bergman), a total of 120 movies analysed second by second, confirms that these shot-related features do not appear as random patterns in movies from the same director. For feature extraction we adopt methods based on both conventional and deep learning techniques. Our findings suggest that sequential feature patterns, i.e. how features evolve in time, are at least as important as the related feature distributions. To the best of our knowledge, this is the first study dealing with automatic attribution of movie authorship, which opens up interesting lines of cross-disciplinary research on the impact of style on the aesthetic and emotional effects on viewers. [movie_director, movie_director_full_analysis]
Work done in collaboration with the Department of Information Engineering, University of Brescia (Italy).
Figaro: Hair detection, segmentation, and hairstyle classification in the wild
Hair strongly characterises human appearance. Hair detection in images is useful for many applications, such as face and gender recognition, video surveillance, and hair modelling. We tackle the problem of hair analysis (detection, segmentation, and hairstyle classification) from unconstrained views by relying only on textures, without a priori information on head shape and location and without body-part classifiers. We first build a hair probability map by classifying overlapping patches, described by features extracted from a CNN, using a Random Forest. Then, modelling hair (resp. non-hair) from high (resp. low) probability regions, we segment uncertain areas at pixel level using LTP features and an SVM. For the experiments we extend Figaro, an image database for hair detection, to Figaro1k, a new version with more than 1,000 manually annotated images. The achieved segmentation accuracy (around 90%) is superior to the known state of the art. Images are finally classified into hairstyle classes: straight, wavy, curly, kinky, braids, dreadlocks, and short.
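The step from patch-level classification to a pixel-level probability map can be sketched as follows. This is only an illustration of how overlapping patch scores might be averaged and then split into confident hair, confident non-hair, and uncertain regions; the patch probabilities are hard-coded stand-ins for classifier output, and the function names, patch size, stride, and the 0.25/0.75 thresholds are assumptions rather than values from the paper.

```python
import numpy as np

def probability_map(image_shape, patch_size, stride, patch_probs):
    """Average overlapping patch probabilities into a per-pixel map.

    patch_probs lists one hair probability per patch, in row-major scan order.
    """
    height, width = image_shape
    accum = np.zeros(image_shape)
    counts = np.zeros(image_shape)
    index = 0
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            window = (slice(top, top + patch_size),
                      slice(left, left + patch_size))
            accum[window] += patch_probs[index]
            counts[window] += 1
            index += 1
    # Pixels covered by no patch keep probability 0.
    return accum / np.maximum(counts, 1)

def split_regions(prob_map, low=0.25, high=0.75):
    """Return boolean masks for hair, non-hair, and uncertain pixels."""
    hair = prob_map >= high
    non_hair = prob_map <= low
    uncertain = ~(hair | non_hair)
    return hair, non_hair, uncertain

# Toy 8x8 image scanned with 4x4 patches at stride 2 (3 x 3 = 9 patches).
# Stand-in scores: left patches look like hair, right patches do not.
patch_probs = [0.9, 0.5, 0.1] * 3
prob_map = probability_map((8, 8), patch_size=4, stride=2,
                           patch_probs=patch_probs)
hair, non_hair, uncertain = split_regions(prob_map)
print(np.round(prob_map[0], 2))
print(f"uncertain pixels: {int(uncertain.sum())}")
```

In the pipeline described above, only the uncertain pixels would go on to the second, pixel-level classification stage.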