Subhrajyoti Dasgupta
Learning with machines.

I am a Master's student at Mila and Université de Montréal. I'm primarily interested in audio-visual learning, visual scene understanding and computational photography.

During my graduate studies, I was fortunate to work with Ruohan Gao (Meta), Prof. Mohamed Elhoseiny (KAUST) and Prof. Dinesh Manocha (UMD). I am also grateful to have collaborated with Senthil Yogamani (Qualcomm), Prof. Ciarán Eising (University of Limerick) and other wonderful researchers at Qualcomm and Valeo AI.

Previously, I worked under the guidance of Prof. Ujjwal Bhattacharya at the CVPR Unit, Indian Statistical Institute, Kolkata. Earlier, I completed a short stint at the Bhabha Atomic Research Centre, Mumbai, where I worked on Devanagari text recognition in limited-data settings. I graduated from Amity University with a Bachelor's degree in Computer Science and Engineering, First Class with Distinction.

My earlier research has focused on audio-visual co-segmentation, audio-visual summarization and medical signal processing.

Email  /  CV  /  LinkedIn  /  Twitter  /  GitHub

Updates

[Jul '24]: Meerkat is accepted at ECCV 2024! Check here!
[Jul '23]: AdVerb is accepted at ICCV 2023! Check here!
[Jul '23]: UnShadowNet is accepted in IEEE Access journal!
[Sep '22]: Joined Mila as a Master's student in a program supervised by Prof. Yoshua Bengio.
[Nov '21]: Presented AudViSum at BMVC 2021! [Presentation]
[Oct '21]: AudViSum accepted at BMVC 2021!
[Sep '21]: Presented Listen to the Pixels at ICIP 2021! [Presentation]
[May '21]: Listen to the Pixels accepted at ICIP 2021!
[Jan '21]: Presented CardioGAN at ICPR 2020! [Presentation]

Research
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury*, Sayan Nag*, Subhrajyoti Dasgupta*, Jun Chen, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha
ECCV, 2024
Code / Dataset / Website / BibTex

We present Meerkat, an audio-visual LLM equipped with a fine-grained understanding of image and audio, both spatially and temporally. With a new modality alignment module based on optimal transport and a cross-attention module that enforces audio-visual consistency, Meerkat can tackle challenging tasks such as audio-referred image grounding, image-guided audio temporal localization, and audio-visual fact-checking. Moreover, we carefully curate AVFIT-3M, a large dataset of 3M instruction-tuning samples collected from open-source datasets, and introduce MeerkatBench, which unifies five challenging audio-visual tasks.
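
To give a flavour of the optimal-transport idea behind the alignment module, here is a minimal, hypothetical sketch of entropic optimal transport computed with Sinkhorn iterations to softly match audio tokens to visual patches. This is not the paper's code; all shapes, names and hyperparameters are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def sinkhorn(cost, eps=0.05, n_iters=50):
        # cost: (n_audio, n_visual) pairwise cost matrix.
        # Entropic OT with uniform marginals over tokens/patches.
        n, m = cost.shape
        K = torch.exp(-cost / eps)            # Gibbs kernel
        a = torch.full((n,), 1.0 / n)         # audio marginal
        b = torch.full((m,), 1.0 / m)         # visual marginal
        u, v = torch.ones(n), torch.ones(m)
        for _ in range(n_iters):              # alternating scaling updates
            u = a / (K @ v)
            v = b / (K.T @ u)
        return u[:, None] * K * v[None, :]    # transport plan

    audio = torch.randn(10, 64)    # 10 audio tokens, 64-d (hypothetical)
    visual = torch.randn(49, 64)   # 7x7 visual patches (hypothetical)
    cost = 1 - F.normalize(audio, dim=1) @ F.normalize(visual, dim=1).T
    plan = sinkhorn(cost)          # soft audio-to-visual correspondences
    attn = plan / plan.sum(dim=1, keepdim=True)  # row-normalised weights

The rows of the (renormalised) transport plan can then serve as soft attention weights from each audio token over the visual patches, which is the kind of cross-modal correspondence the alignment module aims to establish.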

AdVerb: Visually Guided Audio Dereverberation
Sanjoy Chowdhury, Sreyan Ghosh, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi, Dinesh Manocha
ICCV, 2023
Code / Website / BibTex

AdVerb leverages visual cues of the environment to estimate clean audio from reverberant audio. For instance, given a reverberant sound produced in a large hall, our model attempts to remove the reverb effect to predict the anechoic or clean audio.

UnShadowNet: Illumination Critic Guided Contrastive Learning for Shadow Removal
Subhrajyoti Dasgupta, Arindam Das, Senthil Yogamani, Sudip Das, Ciarán Eising, Andrei Bursuc, Ujjwal Bhattacharya
IEEE Access, 2023
Paper / BibTex

Shadow removal is a challenging task, not least because paired labelled data is largely unavailable. We propose a weakly supervised method, guided by an illumination critic, that uses contrastive learning to remove shadows efficiently.
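
As a rough illustration of the contrastive component (this is not the paper's actual loss; names and shapes are assumptions), here is the generic InfoNCE-style objective such methods build on, pulling a shadow-free estimate toward a well-lit reference and pushing it away from shadowed patches:

    import torch
    import torch.nn.functional as F

    def info_nce(anchor, positive, negatives, temperature=0.1):
        # anchor, positive: (d,) embeddings; negatives: (k, d).
        anchor = F.normalize(anchor, dim=0)
        positive = F.normalize(positive, dim=0)
        negatives = F.normalize(negatives, dim=1)
        pos = (anchor @ positive) / temperature     # similarity to positive
        neg = (negatives @ anchor) / temperature    # similarities to negatives
        logits = torch.cat([pos.unsqueeze(0), neg]).unsqueeze(0)
        target = torch.zeros(1, dtype=torch.long)   # positive is class 0
        return F.cross_entropy(logits, target)

    # Hypothetical usage with embeddings from a feature encoder.
    loss = info_nce(torch.randn(128), torch.randn(128), torch.randn(8, 128))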

AudViSum: Self-Supervised Deep Reinforcement Learning for Diverse Audio-Visual Summary Generation
Sanjoy Chowdhury, Aditya P. Patra, Subhrajyoti Dasgupta, Ujjwal Bhattacharya
BMVC, 2021
Code / Presentation / BibTex

We generate representative and diverse audio-visual summaries by exploiting both the audio and visual modalities, unlike prior works. We also present a new dataset built on TVSum and OVP with audio and visual annotations.

Listen to the Pixels
Sanjoy Chowdhury, Subhrajyoti Dasgupta, Sudip Das, Ujjwal Bhattacharya
ICIP, 2021
Code / Presentation / BibTex

Audio-visual co-segmentation and sound source separation using a novel multimodal fusion mechanism, which also addresses partially occluded sound sources and co-segmentation of multiple similar-sounding sources.

CardioGAN: An Attention-based Generative Adversarial Network for Generation of Electrocardiograms
Subhrajyoti Dasgupta, Sudip Das, Ujjwal Bhattacharya
ICPR, 2020
Presentation / BibTex

Generating synthetic ECGs with an attention-based generative adversarial network, enabling data sharing without risk of privacy breach.

Projects
Detection and Recognition of Handwritten Devanagari Text in Documents

While there is a large body of literature on detecting and recognising English text in natural scenes and documents, regional languages had received comparatively little attention at the time of this study. This project, carried out at the Bhabha Atomic Research Centre, Mumbai, had to contend with a severe shortage of data and the nuances of the Devanagari script. Learning strategies suited to constrained settings, such as few-shot learning and transfer learning, were used. The project was implemented in Python with Keras, alongside OpenCV, Matplotlib and other scientific tools; a sketch of the transfer-learning setup is shown below.
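
Below is a minimal transfer-learning sketch in Keras in the spirit of the limited-data setting described above. It is not the project's actual code; the backbone, input size and class count are illustrative assumptions.

    import tensorflow as tf
    from tensorflow import keras

    NUM_CLASSES = 46  # e.g. Devanagari base characters (illustrative)

    # Start from ImageNet features and train only a small head,
    # since there is too little data to train a network from scratch.
    base = keras.applications.MobileNetV2(
        input_shape=(96, 96, 3), include_top=False, weights="imagenet")
    base.trainable = False  # freeze the pretrained backbone

    model = keras.Sequential([
        base,
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=10)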

Code
Studying ways to solve challenges faced by the LHC (CERN) with Machine Learning

The LHC produces an enormous amount of data every day, which must be processed and used efficiently for further research. This study examined how machine learning can be applied to particle identification, particle track reconstruction, clustering of particles by similarity, and the detection of rare decays; a toy particle-identification example is sketched below. The scope for machine learning in the proposed SHiP experiment was also studied.
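
As a toy illustration of the particle-identification task (entirely synthetic; the features and labels are made up for the example), a standard classifier can be trained on per-particle kinematic features:

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for kinematic features such as momentum,
    # energy deposit and track curvature (purely illustrative).
    X = rng.normal(size=(1000, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # fake two-class labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = GradientBoostingClassifier().fit(X_train, y_train)
    print("toy accuracy:", clf.score(X_test, y_test))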

Personal

I often like to go out with my camera to cover music festivals and capture people and moments. Check out my work on 500px. Besides, I enjoy a wide variety of movies and music, and keep a large collection of films, from Kubrick to Nolan and Bergman to Ray.


Template Credits: Dr. Jon Barron