Llama 2

Adversarial Attacks on Aligned Large Language Models

Developed a novel adversarial attack technique for large language models that uses regularized gradients with continuous optimization to generate valid tokens, significantly improving attack efficiency and success rates.

Jun 8, 2024