Machine learning-based spam detection models learn from labeled training data and classify incoming emails after the training phase. We study a class of vulnerabilities in such detection models, in which an attacker crafts spam emails so that a trained model misclassifies them at detection time. However, feature extraction methods often make it difficult to translate a change in the feature space back into a change in the textual email space. This paper proposes a new attack method that makes guided changes to text data by leveraging generated adversarial examples that purposely modify the features representing an email. We study different feature extraction methods built on various Natural Language Processing (NLP) techniques. We develop effective methods to translate adversarial perturbations in the feature space back into a set of “magic words”, or malicious words, in the text space that cause misclassifications desirable from the attacker’s perspective. We show that our attacks are effective across different datasets and various machine learning methods in white-box, gray-box, and black-box attack settings. Finally, we discuss a preliminary exploration of countermeasures. We hope our findings and analysis will enable future studies of defensive solutions against this new class of attacks.
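The core idea of translating feature-space perturbations back to text can be sketched as follows. This is a minimal illustrative example, not the paper’s actual method: it assumes a bag-of-words feature space where each component corresponds to one vocabulary word, and picks the words whose weights the adversarial perturbation boosts the most as candidate “magic words”. All names (`magic_words`, the toy vocabulary and perturbation) are hypothetical.

```python
# Hypothetical sketch: map a feature-space adversarial perturbation back to
# candidate "magic words" under a bag-of-words vocabulary. Illustrative only;
# the actual attack handles richer NLP feature extractors.
import numpy as np

def magic_words(perturbation, vocabulary, top_k=5):
    """Return up to top_k words whose feature components the adversarial
    perturbation increases the most (candidates to insert into an email)."""
    # Indices sorted by perturbation magnitude, largest first.
    order = np.argsort(perturbation)[::-1][:top_k]
    # Keep only words the perturbation actually pushes upward.
    return [vocabulary[i] for i in order if perturbation[i] > 0]

# Toy example: a 6-word vocabulary and a perturbation favouring three words.
vocab = ["meeting", "free", "winner", "report", "urgent", "schedule"]
delta = np.array([0.0, 0.8, 0.5, -0.1, 0.3, 0.0])
print(magic_words(delta, vocab, top_k=3))  # most-boosted words first
```

In a real pipeline, inserting these words into an email shifts its feature vector toward the adversarial perturbation, which is the step the paper’s guided text-modification method formalizes.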