Adversarial Attacks on Aligned Large Language Models

Jun 8, 2024 · 1 min read

Developed a novel adversarial attack technique for Large Language Models that uses regularized gradients with continuous optimization to generate valid tokens, significantly improving attack efficiency and success rates.
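
The summary above does not spell out the algorithm, so the following is only a toy sketch of the general idea of regularized continuous optimization over token embeddings, not the project's actual method. Everything in it is an assumption made for illustration: the four-entry `vocab`, the squared-distance stand-in for an adversarial loss, the hypothetical helper names `nearest_token` and `optimize_token`, and the hyperparameters `lr` and `lam`. A continuous vector is updated by gradient descent while a regularizer pulls it toward the nearest real token embedding, so the final projection lands on a valid token.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_token(x, vocab):
    """Index of the vocabulary embedding closest to x."""
    return min(range(len(vocab)), key=lambda i: sq_dist(x, vocab[i]))


def optimize_token(vocab, target, steps=200, lr=0.1, lam=0.5, seed=0):
    """Toy regularized continuous optimization (illustrative assumption).

    Minimizes ||x - target||^2 + lam * ||x - e||^2, where e is the
    vocabulary embedding currently nearest to x (held fixed within a step).
    The first term stands in for an attack loss; the second is the
    regularizer that keeps x close to a valid token.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(len(target))]
    for _ in range(steps):
        e = vocab[nearest_token(x, vocab)]
        # Gradient of the regularized objective with respect to x.
        grad = [
            2 * (xi - ti) + 2 * lam * (xi - ei)
            for xi, ti, ei in zip(x, target, e)
        ]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    # Project the continuous solution onto the discrete vocabulary.
    return nearest_token(x, vocab), x
```

Calling `optimize_token([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.9, 0.8])` returns a token index together with the continuous vector it was projected from. In a real attack the stand-in loss would be replaced by a gradient obtained from the target model, which this sketch does not attempt.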