Irony Detection in the Use of Emojis

Posted by Zeses Pitenis on December 12th

Annotating tweets as literal or ironic with LightTag

Ever since I started working on my dissertation on offensive language detection, I have been really enthusiastic about annotating and working with textual data. Observing Internet language on microblogging platforms like Twitter, I pitched an idea to my annotators for a conference project (more on that soon): irony is sometimes expressed through emojis, so why not pick some and check how they work in a sentence? So Efthymia and I started planning this project and picked three tricky emojis: 😂, 👍 and 🙂. We chose them intuitively, because we think they are the most prominent emojis for expressing irony. That is to say, we think they might serve as indicators of extralinguistic behavior.

After checking their Emojipedia entries, we settled on labelling the tweets containing each emoji as Laughter, Approval and Contentment, respectively. Our group of annotators consisted of undergraduates, a graduate, and Master's/PhD students from the Aristotle University of Thessaloniki (AUTH), Greece, all Modern Greek speakers who study or have studied language. Modern Greek has not been studied extensively in sentiment analysis, which is why it has been the focus of this blog so far.
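Each tweet gets the label of the emoji it was collected with. As a minimal sketch of that mapping, assuming a plain substring check (the post does not say how tweets containing more than one of the three emojis were handled), a hypothetical `labels_for` helper could look like this:

```python
# Emoji-to-label mapping from the post; dict order is 😂, 👍, 🙂
EMOJI_LABELS = {
    '😂': 'Laughter',
    '👍': 'Approval',
    '🙂': 'Contentment',
}


def labels_for(text: str) -> list[str]:
    """Hypothetical helper: labels of every target emoji found in the text."""
    return [label for emoji, label in EMOJI_LABELS.items() if emoji in text]


print(labels_for('Τέλεια μέρα 🙂👍🏽'))  # ['Approval', 'Contentment']
```

Because 👍🏽 is the 👍 code point followed by a skin-tone modifier, the substring check still finds it. A tweet with several target emojis gets several labels and would need a tie-breaking rule.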

Part 1: Collecting and Sampling the Tweets

The tweets were collected over a week, ending at the end of November, when the annotation team started working on the project. Each of the three emojis was used as a search keyword. Once again, tweepy handled the data mining; URLs were removed and user mentions were replaced by @USER for the sake of anonymity. The initial collection contained 40,276 tweets, with the following label distribution.

Laughter       36821
Approval        2573
Contentment      882
Name: label, dtype: int64
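The cleaning step described above (removing URLs and replacing user mentions with @USER) can be sketched as follows. The regular expressions are assumptions for illustration, since the post does not give the exact patterns it used:

```python
import re

# Assumed patterns, not necessarily the ones used for this dataset
URL_RE = re.compile(r'https?://\S+|www\.\S+')
MENTION_RE = re.compile(r'@\w+')


def clean_tweet(text: str) -> str:
    """Remove URLs, replace user mentions with @USER, collapse whitespace."""
    text = URL_RE.sub('', text)
    text = MENTION_RE.sub('@USER', text)
    return re.sub(r'\s+', ' ', text).strip()


print(clean_tweet('@maria_k χαχα δες εδώ https://t.co/abc123 😂'))
# @USER χαχα δες εδώ 😂
```

URLs are stripped before mentions are replaced, so handles that appear inside a link disappear with it.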

Since we are testing emojis and how they may be used ironically, we needed to narrow the tweets down to longer ones, which give our annotators more linguistic content to base their decisions on. We settled on tweets longer than 70 characters, so as not to exclude too much data. However, 😂 is the most used emoji on Twitter and accounted for the vast majority of the collected tweets (36,821 of 40,276). Even restricted to tweets longer than 70 characters, 😂 returned an immense number of samples (11,243), so we kept all the 👍 and 🙂 tweets and randomly sampled only the 😂 ones.

import pandas as pd

# `combined` holds the collected tweets, with 'clean_text' and 'label' columns
# Get tweets longer than 70 characters
tweets_70_chars = combined[combined['clean_text'].str.len() > 70]

# Sample 997 Laughter tweets, keep all the others, and re-concatenate the dfs
laughter_70_chars = tweets_70_chars[tweets_70_chars['label'] == 'Laughter']
no_laughter = tweets_70_chars[tweets_70_chars['label'] != 'Laughter']
sampled_laughter = laughter_70_chars.sample(n=997, random_state=1)

dataset = pd.concat([no_laughter, sampled_laughter])

After filtering by length, we were left with a small number of tweets with 🙂 (338) and roughly twice as many with 👍 (678). To keep 😂 the most frequent label, as in the original distribution, we sampled roughly three times as many tweets with 😂 (997). The label distribution of the final dataset used for annotation, a total of 2,013 tweets, is shown below.

Laughter       997
Approval       678
Contentment    338
Name: label, dtype: int64

Part 2: LightTag and Annotators' Observations

LightTag is the first annotation tool I ever used, and I still use it: its features keep improving, it has a clean interface, and it is simple and easy to use for any annotation task. So naturally, we decided to annotate with LightTag. We uploaded the collected and cleaned dataset to the platform in CSV format, set up the name and the specifics of the task (e.g. the number of annotators per data point, the designated team, and plain guidelines indicating the common meaning of the emojis in question) and boom! Our team started annotating the tweets as Literal or Non Literal/Ironic with regard to the use of the three emojis. The six annotators were split into two teams of three and each saw approximately 1,000 tweets, so that every tweet received three judgements from which to derive a golden label.

Interface for project management (Older projects have been digitally removed)
Interface for Annotators
Interface for Annotators with Classes

After completing their tasks in LightTag, our annotators shared their comments on the process. They found the three emojis to differ in difficulty. The ‘Face With Tears of Joy Emoji’ (😂) was admittedly the most controversial and ambiguous of all. When ironic, it was thought to serve as a mild, under-the-radar ironic comment that aims to avoid possible confrontation, since it does not entirely lose its literal sense. The annotators agreed that to be 100% sure of their decisions they would need access to the tweets’ context, that is, attached photos, videos or other tweets. They also observed that the most challenging cases were those where the text itself was linguistically ironic, making it difficult to discern the emoji’s contribution to the irony being expressed. It was therefore proposed that future studies examine whether the irony stems from the emoji alone, from the text alone, or from both together.

Moreover, the ‘Slightly Smiling Face Emoji’ (🙂) and the ‘Thumbs Up Sign Emoji’ (👍), indicating contentment and approval respectively, were deemed more straightforward and less enigmatic in both their literal and their non-literal/ironic sense. Overall, the annotators were guided almost entirely by their linguistic ability. Now that we have presented our data collection strategy and what happened during the annotation process, let's move on to the Python side of things. With the annotation finished, the next step after downloading the JSON file from LightTag is to transform the results into a tidy format to work with. You can find all the code and information for this in the very helpful LightTag docs.
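The results file used below has one row per tweet, with a count column per class (`Literal` and `Non Literal/Ironic`). As a minimal sketch of that reshaping, assuming a hypothetical long-format table with one row per annotator decision (the real LightTag export schema, covered in the LightTag docs, may differ), `pd.crosstab` produces those count columns:

```python
import pandas as pd

# Hypothetical long format: one row per (tweet, annotator) decision
long_df = pd.DataFrame({
    'example_id': [1, 1, 1, 2, 2, 2],
    'content': ['tweet A'] * 3 + ['tweet B'] * 3,
    'annotator': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6'],
    'class_name': ['Literal', 'Literal', 'Non Literal/Ironic',
                   'Non Literal/Ironic', 'Non Literal/Ironic',
                   'Non Literal/Ironic'],
})

# One row per tweet, one count column per class (missing classes -> 0)
wide = pd.crosstab([long_df['example_id'], long_df['content']],
                   long_df['class_name'])
wide = wide.reindex(columns=['Literal', 'Non Literal/Ironic'], fill_value=0)
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```

Further per-tweet fields, such as the metadata column, could be carried along the same way by adding them to the crosstab index.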

Part 3: Working with Annotation Results

Now let's manipulate the data to draw some inferences. We first load the data and clean the original sentiment labels, which are stored in the metadata column, using regular expressions. The initial dataset is pictured below.

import pandas as pd

data = pd.read_csv('emoji_results.csv', encoding='utf-8')

Annotated Tweet DataFrame

# Keep only the emoji label from each metadata string. A negated character
# class such as [^(Laughter|...)] matches single characters, not whole words,
# so extracting the label directly is more reliable.
data['metadata'] = data['metadata'].astype(str).str.extract(
    r'(Laughter|Contentment|Approval)', expand=False)

From here, we define two functions: one to get the golden label, i.e. the majority judgement on whether the emoji was used literally or ironically, and one to calculate the agreement among our annotators. We calculate agreement per tweet, from the label assignments it received, rather than between individual annotators, since each tweet was viewed by a randomly assigned team of three. However, due to some confusion during annotation, a few tweets were mistakenly seen by more than three annotators; where this produced a tie between the two labels, the tweet has no golden label. We mark these ties as REV (as in review) and remove them, which drops 9 tweets and leaves 2,004.

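def lit_or_not(row):
    """Golden label by majority vote: LIT, IRO, or REV for a tie."""
    lit = row['Literal']
    iro = row['Non Literal/Ironic']
    if lit > iro:
        return 'LIT'
    if iro > lit:
        return 'IRO'
    return 'REV'  # tie: no majority, flagged for review


def agreement(row):
    """Per-tweet agreement, written for teams of three annotators.

    A 2-1 split is reported as 66% and 3 or more votes for one label as
    100% (this also covers splits such as 3-1); any other split, e.g. 2-0,
    returns None.
    """
    lit = row['Literal']
    iro = row['Non Literal/Ironic']
    if (lit == 2 and iro == 1) or (iro == 2 and lit == 1):
        return '66%'
    if lit >= 3 or iro >= 3:
        return '100%'
    return None


data['label'] = data.apply(lit_or_not, axis=1)

# Drop tied tweets; .copy() avoids pandas' SettingWithCopyWarning below
data = data[data['label'] != 'REV'].copy()

data['agreement'] = data.apply(agreement, axis=1)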
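The `agreement` function only distinguishes 2-1 splits from splits with three or more votes for one label, and returns None otherwise. As a hedged complement, not the method used in this post, the sketch below computes a per-tweet majority share that works for any number of annotators, and Fleiss' kappa, a chance-corrected measure that, like our per-tweet approach, only assumes every item receives the same number of ratings, not that the same raters label every item. `majority_share` and `fleiss_kappa` are hypothetical helpers, shown on a toy table:

```python
import numpy as np
import pandas as pd


def majority_share(row):
    """Fraction of a tweet's annotators who chose the majority label.

    Works for any number of annotators, e.g. 2-1 -> 0.67, 3-0 -> 1.0, 3-1 -> 0.75.
    """
    lit = row['Literal']
    iro = row['Non Literal/Ironic']
    return max(lit, iro) / (lit + iro)


def fleiss_kappa(counts):
    """Fleiss' kappa for an (items x categories) array of rating counts.

    Every row must sum to the same number of raters n >= 2; the raters
    themselves may differ from item to item.
    """
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    if not np.allclose(counts.sum(axis=1), n_raters):
        raise ValueError('every item needs the same number of ratings')
    # Observed agreement: share of agreeing rater pairs per item, averaged
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the overall proportion of each category
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = (p_j ** 2).sum()
    return (p_bar - p_e) / (1 - p_e)


# Toy example: four tweets, three annotators each
toy = pd.DataFrame({'Literal': [2, 3, 0, 1], 'Non Literal/Ironic': [1, 0, 3, 2]})
toy['majority_share'] = toy.apply(majority_share, axis=1)
kappa = fleiss_kappa(toy[['Literal', 'Non Literal/Ironic']].to_numpy())
print(toy)
print(round(kappa, 3))  # 0.333
```

On the real data, kappa would be computed before the count columns are dropped, on the rows where `Literal + Non Literal/Ironic == 3`. `fleiss_kappa` raises an error if the rows have different totals, and its result is undefined (NaN) if every rating falls into a single category.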

After assigning the label and agreement columns, the last two changes to our dataframe, for tidiness, are to give the columns appropriate names and to drop those we no longer need. We can then draw some conclusions.

data = data.rename(columns={'content':'tweet', 'metadata':'sentiment'})

data = data.drop(columns=['example_id', 'Literal', 'Non Literal/Ironic'])

Next, we count how often each automatic label, assigned according to the emoji found in the tweet, co-occurs with each golden label assigned by the annotators.

co_occurence = data.groupby(['sentiment', 'label']).size().to_frame('count').reset_index()
     sentiment label  count
0     Approval   IRO    127
1     Approval   LIT    550
2  Contentment   IRO     64
3  Contentment   LIT    273
4     Laughter   IRO    615
5     Laughter   LIT    375
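To turn these co-occurrence counts into per-emoji irony percentages, one option is to pivot the table and divide. The counts are copied from the table above so that the sketch runs on its own:

```python
import pandas as pd

# Counts copied from the co-occurrence table above
co_occurence = pd.DataFrame({
    'sentiment': ['Approval', 'Approval', 'Contentment', 'Contentment',
                  'Laughter', 'Laughter'],
    'label': ['IRO', 'LIT', 'IRO', 'LIT', 'IRO', 'LIT'],
    'count': [127, 550, 64, 273, 615, 375],
})

# One row per emoji label, one column per golden label
table = co_occurence.pivot(index='sentiment', columns='label', values='count')
table['total'] = table['IRO'] + table['LIT']
# Share of each emoji's tweets judged ironic, in percent
table['ironic_%'] = (100 * table['IRO'] / table['total']).round(2)
print(table)
```

The totals add up to the 2,004 labelled tweets. Starting from the tweet-level frame, `pd.crosstab(data['sentiment'], data['label'], normalize='index')` would give the same proportions in one call.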

Thoughts and Conclusions

In a sample of 2,004 tweets, our annotators fully agreed 52.35% of the time. Of the three emojis, 😂, the most represented in the dataset, was also the one most often used ironically, in 62.12% of its instances. The second most frequent emoji, 👍, was ironic in only 18.76% of its instances, and 🙂 in 18.99%. More data is needed to draw clearer conclusions, but this pilot project suggests that 😂 needs further examination: it is the most common emoji on social media platforms (and was the Oxford Dictionaries Word of the Year 2015), and it seems to be used ambiguously. The ambiguity of the tears of joy emoji also shows up in the per-tweet agreement among the annotators.

confusion = data.groupby(['sentiment', 'agreement']).size().to_frame('count').reset_index()
     sentiment agreement  count
0     Approval      100%    473
1     Approval       66%    192
2  Contentment      100%    223
3  Contentment       66%    111
4     Laughter      100%    353
5     Laughter       66%    624
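The same pivot gives the share of fully agreed tweets per emoji. Note that this table sums to 1,976 tweets, 28 fewer than the 2,004 labelled ones: `groupby` drops rows whose key is missing, and `agreement` returns None for vote splits it does not cover. The overall 52.35% figure therefore divides the 1,049 unanimous tweets by 2,004. Counts are copied from the table above:

```python
import pandas as pd

# Counts copied from the agreement table above
confusion = pd.DataFrame({
    'sentiment': ['Approval', 'Approval', 'Contentment', 'Contentment',
                  'Laughter', 'Laughter'],
    'agreement': ['100%', '66%', '100%', '66%', '100%', '66%'],
    'count': [473, 192, 223, 111, 353, 624],
})

table = confusion.pivot(index='sentiment', columns='agreement', values='count')
table['rated'] = table['100%'] + table['66%']
# Share of each emoji's tweets on which all annotators agreed, in percent
table['full_%'] = (100 * table['100%'] / table['rated']).round(2)
print(table)

# Overall figure from the post: unanimous tweets over all 2,004 labelled tweets
overall = round(100 * table['100%'].sum() / 2004, 2)
print(overall)  # 52.35
```

This gives about 71% full agreement for 👍, 67% for 🙂 and 36% for 😂.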

While our annotators mostly agreed on the other two emojis, they fully agreed on only about a third of the 😂 tweets. In the future, we hope to extend the scope of this project to more emojis and, of course, more data to annotate. Nevertheless, emojis are a promising data collection strategy for sentiment analysis tasks, since they extend the writer's message with additional, valuable information about their feelings.

Team members in alphabetical order:

Efthymia Apokatanidis, Co-Author/Masters Student in Theoretical and Applied Linguistics

Pavlos Avramidis, Undergraduate Philology Student, Department of Linguistics

Vasilis Ioannidis, Philology Graduate, Department of Medieval and Modern Greek Studies

Georgia Karathanasi, Masters Student in Theoretical and Applied Linguistics

Maria Nomikou, Undergraduate Philology Student, Department of Linguistics

Theodoros Xioufis, PhD Candidate in Theoretical Linguistics