Online bullying, attacks, and the promotion of prohibited materials on the Internet are on the rise. The main channels for exchanging information and for selecting and promoting personnel within the groups behind such activity are web resources, social networks, and e-mail. In this regard, an important task is to identify communication topics and connections, and to monitor the behavior of, and predict the threats emanating from, individual users, groups, and network communities that generate and distribute bullying content on the Internet. This article is devoted to research and data mining on the detection of bullying on the Internet. The study examines the development of a data set for training machine learning algorithms on Kazakh-language text, in which each entry is labeled according to whether or not it consists of bullying phrases.