Analyzing Greek Trap Trends

Posted by Zeses Pitenis on November 21st

Popular words, themes and named entities with spaCy

To start things off, I was never into the new rap/trap wave of Greek music. The lyrics sounded odd to me in a Greek song, and they were never something I could find myself humming. To be honest, the whole premise of life on the streets, the transition from poverty to loads of money and cars, and the ever-popular sexist comments always put me off, as they had nothing to do with the actual situation in my country.

However, after I returned to Greece from the UK this past May, I came to realize that this genre of music was not just a phase anymore; it was playing everywhere, and representative artists were topping the IFPI charts. In addition, due to the recent surge of reggaeton worldwide, many Greek songs adopted the characteristic reggaeton rhythm while keeping the lyrical topics the same, possibly to attract more listeners. And it worked.

In my last blog post, I used the lyricsgenius package to create a lyrics corpus via the Genius API. Oops, I did it again: this time I built a corpus from rap/trap artists who topped the Official IFPI Digital Singles Chart, choosing 12 names from that list. I collected a maximum of 20 songs per artist, which brought the dataset to a total of 209 songs. Shall we see what these artists are rapping about? What themes are trending among them? Let’s delve into the lyrics.

import pandas as pd
from collections import Counter
df = pd.read_csv('greek_trap_corpus.csv', encoding='utf-8')

I’m particularly interested in the most frequent words and the top named entities in the dataset I collected, and in creating bar plots and a word cloud to showcase the results. I will be using the Greek model of spaCy for tokenization and named entity recognition (NER). The NER accuracy is not as high as for other languages, currently standing at 71%, but I thought I’d try it anyway.


Part 1: Importing and Making Use of spaCy for Information Extraction

import spacy
from spacy.lang.el.stop_words import STOP_WORDS

nlp = spacy.load('el_core_news_sm')

stop_words = ['κι', "απ'", 'μες', "μ'", "ό,τι", "σ'", "τ'", 'σα', "είν'", 'ή', "γι'", "'ναι", "είμ'", "ν'", 'ό']
for w in stop_words:
    STOP_WORDS.add(w)

Greek is not a widely used language nowadays, so the spaCy model still needs improvement. That is why I added some extra stop words before continuing with the analysis. Then I create lists for the tokens and the named entities, which we will use for our bar plots and word cloud.

tokens = []
ents = []
for song in df['lyrics']:
    doc = nlp(song)
    tokens.append([token.text for token in doc if not token.is_stop and not token.is_punct])
    ents.append([ent.text for ent in doc.ents])
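
To see why the extra stop words matter, here is a minimal sketch of the filtering step on its own. It assumes plain whitespace splitting as a stand-in for spaCy's tokenizer, and the base stop list is a tiny hypothetical subset rather than spaCy's actual list, so it only illustrates the logic.

```python
# Illustrative stand-in for the spaCy pipeline: plain whitespace tokens,
# so this demonstrates the filtering logic only, not real tokenization.
BASE_STOP_WORDS = {"και", "να", "το", "με", "εγώ"}  # hypothetical subset, not spaCy's list
EXTRA_STOP_WORDS = {"κι", "απ'", "μες", "σ'"}       # a few of the additions from the post


def content_words(text, stop_words):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


line = "Κι εγώ μες το σπίτι"
before = content_words(line, BASE_STOP_WORDS)
after = content_words(line, BASE_STOP_WORDS | EXTRA_STOP_WORDS)

print(before)  # short forms such as "κι" and "μες" survive the base list
print(after)   # only the content word is left
```

Without the additions, short forms like "κι" and "μες" would show up among the most frequent words and crowd out the content words.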

Part 2: Getting Word and Entities Frequencies

word_freq = Counter(w.lower() for wl in tokens for w in set(wl))

ent_freq = Counter(e for el in ents for e in set(el))

filtered_word_freq = {k:v for k, v in word_freq.items() if k not in STOP_WORDS}

wrong_ents = ['Τώρα', 'Σαν', 'Άμα', 'Πάντα', 'Άσε', 'Μου', 'Yah', 'Λένε', 'Πόσο', 'Είσαι', 
              'Πάω', 'Νιώθω', 'Ξέρω', 'Ποιος', 'Ρωτάνε', 'Τους', 'Απλ΄ά', 'Εσύ', 'Μαύρα', 'Πίσω',
              'Aye', 'Είπα', 'Μόλις', 'Παίζω', 'Απλά', 'Λες', 'Πονάει', 'Παίρνω', 'Πόσοι', 'Λίγο', 'Μπορεί',
              'Λέγε', 'Βγήκα', 'Τόσες', "Τ'", 'Πάλι', 'Τόσοι', 'Τρέχω', 'Ψάχνω', 'Ήμουν', 'Πριν', 'Κάτσε',
              'Βγάζω', 'Θέλουνε', 'Κάνουμε', 'Κάποτε', 'Λέω', 'Μες', 'Μπαίνω', 'Έχουμε', 'Απόψε', 'Γάμησε',
              'Γαμώ', 'Δουλεύω', 'Είχα', 'Κανένας', 'Κοιτάνε', 'Μάλλον', 'Μαγειρεύω', 'Μιλάνε',
              'Μπλεγμένος', 'Μόνιμα', 'Ξεκίνησα']
filtered_ent_freq = {k:v for k, v in ent_freq.items() if k not in wrong_ents}

As you can see, I use the Counter class to build frequency dictionaries for words and named entities. Because each song's list is passed through set() first, an item is counted once per song, so the numbers reflect how many songs it appears in. The extra lines of code are there because spaCy's stop word list has some omissions (in its defense, some of these are just short forms) and because the built-in NER is not very accurate. If you know Greek, you can easily spot common verbs and adverbs in the list above; the model probably confused them with named entities because their first letter was capitalized. So I'm filtering out what I can until I come up with a way to improve the model, I guess. And now to the fancy part.
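
The difference between counting every occurrence and counting once per song is easy to miss, so here is a small sketch with toy data (the `songs` list is made up and stands in for `tokens`). It also lowercases before deduplicating, a slight variation on the code above, where `set(wl)` runs first and "Yah" and "yah" in the same song would each be counted.

```python
from collections import Counter

# Toy token lists standing in for `tokens`; one inner list per song.
songs = [
    ["Yah", "yah", "λεφτά", "yah"],
    ["λεφτά", "σπίτι"],
]

# Raw frequency: every occurrence counts.
raw_freq = Counter(w.lower() for song in songs for w in song)

# Song frequency: each word counts at most once per song.
# Lowercasing happens before deduplication so "Yah" and "yah" merge.
song_freq = Counter(w for song in songs for w in {t.lower() for t in song})

print(raw_freq["yah"], song_freq["yah"])      # 3 1
print(raw_freq["λεφτά"], song_freq["λεφτά"])  # 2 2
```

Counting per song keeps one repetitive chorus from dominating the ranking, which seems to be the intent of the `set()` call.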


Part 3: Plotting the Data and Comments

import seaborn as sns
sns.set(rc={'figure.figsize':(20,10)})
sns.set_style('ticks')

sns.set_palette(sns.color_palette("Paired"))
common_words_df = pd.DataFrame(list(filtered_word_freq.items()), columns=['word', 'count'])
common_words_sorted = common_words_df.sort_values(by='count', ascending=False)
sns.barplot(x='word', y='count', data=common_words_sorted[0:20])

Figure: Top Words in Greek Trap

After converting the word frequency dictionary to a dataframe and sorting it, we plot the 20 most frequent words in a bar plot. The list contains the following words:

  1. κάνω (I'm doing)
  2. θέλω (I want)
  3. yah
  4. yeah
  5. ξέρω (I know)
  6. πες (say)
  7. φράγκα (money - colloquial)
  8. λένε (they say)
  9. πουτάνα (whore)
  10. λεφτά (money)
  11. βλέπω (I see)
  12. λέει (he/she says)
  13. θέλουν (they want)
  14. πίνω (I drink)
  15. μέρα (day)
  16. ποιος (who)
  17. σπίτι (house)
  18. μάτια (eyes)
  19. λες (you say)
  20. κάνει (he/she does)
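
If you only need a ranked list like this one and not the plot, a lighter alternative is `Counter.most_common`. This is a sketch with made-up counts standing in for `filtered_word_freq`, not the numbers from the corpus.

```python
from collections import Counter

# Toy counts standing in for `filtered_word_freq`; the numbers are made up.
filtered_word_freq = {"κάνω": 5, "θέλω": 4, "yah": 3, "σπίτι": 1}

# Wrapping the plain dict in a Counter gives access to most_common(),
# which returns (word, count) pairs sorted by descending count.
top_words = Counter(filtered_word_freq).most_common(3)

print(top_words)  # [('κάνω', 5), ('θέλω', 4), ('yah', 3)]
```

The dataframe route is still the convenient one for seaborn, since `barplot` takes the column names directly.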

While many of these are common verbs, the appearance of words like whore and money so high in the list indicates that the general trend of trap music is present among Greek artists as well. To get a more complete view of what's trending in their songs, we plot a word cloud:

from wordcloud import WordCloud
import matplotlib.pyplot as plt
wordcloud = WordCloud(width=1600, height=800, max_font_size=200).generate_from_frequencies(filtered_word_freq)
plt.figure(figsize=(20,10),facecolor='k')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()

Figure: Greek Trap Word Cloud

Looking at the image above, an English speaker can certainly recognize several words, since they are in English (e.g. team, flex, hits, flow, club, gang, bitch). As for me, a speaker of both Greek and English, well, my opinion is the same as it was before I wrote this. In a few words, Greek trappers do whatever they want with their money: they buy anything from designer clothes to drugs and prostitutes.

sns.set_palette(sns.color_palette("RdBu", n_colors=20))
common_ent_df = pd.DataFrame(list(filtered_ent_freq.items()), columns=['entity', 'count'])
common_ent_sorted = common_ent_df.sort_values(by='count', ascending=False)
sns.barplot(x='entity', y='count', data=common_ent_sorted[0:20])

Figure: Top Entities in Greek Trap

We plotted the data the same way as before, only changing the color palette for fun. Let's take a closer look at the top 20 entities:

  1. Gucci
  2. Αθήνα (Athens)
  3. Ελλάδα (Greece)
  4. FY
  5. Skive
  6. Ortiz
  7. Xannie
  8. TV
  9. Fendi
  10. Παρίσι (Paris)
  11. Light
  12. Μπάτσοι (Cops)
  13. Μιλάνο (Milan)
  14. Mente Fuerte
  15. God
  16. Ευρώπη (Europe)
  17. Young
  18. Louis
  19. Benzo
  20. Molly

While several of these can be explained as words repeated within certain songs (e.g. Benzo, short for Mercedes-Benz, a popular car), we can see some clear patterns in the lyrics. First of all, Gucci is the top entity in our sample; along with Fendi and Louis (Vuitton), it is a well-known, expensive brand used as an indicator of wealth in many of these songs. The geographical entities are usually there to assure the listener that the trapper is either the best in a country or visits many cities every year (or week). There are several mentions of the artists themselves, which is common in rap and its subgenres; the most notable is Skive, considered the first producer to bring trap music to Greece. Last but not least, the cops are mentioned a lot, as are two drugs under their colloquial names.
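
The grouping described above could also be made explicit in code. The sketch below rolls entity counts up into themes. The `THEMES` mapping is hand-made from my reading of the list (treating Xannie and Molly as the two drug names is an assumption), `theme_totals` is a hypothetical helper, and the example counts are placeholders, not results from the corpus.

```python
from collections import Counter

# Hand-made mapping based on the discussion above; an assumption, not model output.
THEMES = {
    "Gucci": "brand", "Fendi": "brand", "Louis": "brand",
    "Αθήνα": "place", "Ελλάδα": "place", "Παρίσι": "place",
    "Μιλάνο": "place", "Ευρώπη": "place",
    "Xannie": "drug", "Molly": "drug",
}


def theme_totals(ent_freq, themes=THEMES):
    """Sum entity counts per theme; unmapped entities go to 'other'."""
    totals = Counter()
    for entity, count in ent_freq.items():
        totals[themes.get(entity, "other")] += count
    return totals


# Placeholder counts for illustration only.
example = {"Gucci": 3, "Fendi": 1, "Αθήνα": 2, "Skive": 2}
totals = theme_totals(example)

print(totals["brand"], totals["place"], totals["other"])  # 4 2 2
```

In the post's pipeline, `filtered_ent_freq` could be passed in place of `example` to compare how much of the entity list falls under each theme.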

To sum up what this data analysis contributed: we already knew about the content of these songs. Nevertheless, I had to do it for confirmation. I still need to figure out what makes these particular lyrics so popular. Many people claim that it's just the extremely catchy beats and tunes that get them into this type of music. But then you see them humming the lyrics quite often in clubs, or even quoting them on casual coffee dates.

Thanks again for reading and see you in the next blog post, which I hope is soon!