Adversarial Attacks on Aligned Large Language Models
Developed a novel adversarial attack technique for large language models that combines regularized gradients with continuous optimization to generate valid tokens, significantly improving attack efficiency and success rates.
Jun 8, 2024