• Bio
  • Research
  • Experience
  • Projects
  • Publications
Enhancing Visual Spatial Reasoning through Multi-Task Learning
LXMERT

Enhancing Visual Spatial Reasoning through Multi-Task Learning

Developed a vision-language model incorporating spatial features like depth maps, 3D coordinates, and edge maps through multi-task learning, improving spatial understanding and achieving state-of-the-art results on the Visual Spatial Reasoning (VSR) dataset.

Jun 16, 2024
Adversarial Attacks on Aligned Large Language Models
Transformers

Adversarial Attacks on Aligned Large Language Models

Developed a novel adversarial attack technique for Large Language Models, leveraging regularized gradients with continuous optimization to generate valid tokens and significantly improve attack efficiency and success rates

Jun 8, 2024
Adversarial Exploitation in Robot Vision-Language Navigation
CLIP

Adversarial Exploitation in Robot Vision-Language Navigation

Developed a novel algorithm to adversarially modify images, exploiting the semantic representation gaps in vision-language models, enabling controlled redirection of robot navigation paths through minimal image alterations. Improved security through a detection mechanism sensitive to noise in manipulated images.

Mar 15, 2024
Inherent Vulnerabilities of Vision Transformers for Medical Image Classification
ViT

Inherent Vulnerabilities of Vision Transformers for Medical Image Classification

Explored vulnerabilities in vision transformers for medical image classification, demonstrating that minor modifications can significantly alter representations, and developed an efficient detection method to enhance model reliability.

Mar 2, 2024
Multimodal Classification with LLMs - Unstructured Text Feature Extraction
Finetuning LLM

Multimodal Classification with LLMs - Unstructured Text Feature Extraction

Developing text data processing pipelines with private large language models to extract meaningful features with in-context reasoning and developing fusion models to improve classification scores by combining unstructured and tabular data in multimodal settings.

Aug 14, 2023
See all

© 2024 Me. This work is licensed under {license}

Published with Hugo Blox Builder — the free, open source website builder that empowers creators.