Adversarial Attacks on Aligned Large Language Models

Jun 8, 2024

Developed a novel adversarial attack technique for large language models. The technique uses regularized gradients with continuous optimization to generate valid tokens, improving both attack efficiency and success rate.

Last updated on Jul 8, 2024

Transformers LLaMa-2

Authors

Chashi Mahiul Islam

Doctoral Student, Dept. of Computer Science
