Exploring Genocide Discourse on YouTube: A Case Study of the Israel-Hamas War

Mila Georgieva, Valerie Cortés, Shiyun Qian, Talida Munteanu, Steven Delmotte

Introduction

Over the last decade, social media has evolved into a powerful force, playing a pivotal role in shaping public opinion and influencing discussions across a wide range of topics. Among these digital platforms, YouTube stands out as the preeminent video-sharing platform, registering about one billion hours of daily content consumption by users worldwide (Dean 2023). The spectrum of content creators on YouTube, as highlighted by Rieder et al. (2022), is remarkably diverse, ranging from “amateurs engaging in intimate sharing of their everyday experiences, to star YouTubers with millions of subscribers, to established television networks and music labels that use the platform to distribute their content to mass audiences, and in particular younger viewers” (Rieder et al. 2022, p. 2). Hence, we now witness the phenomenon of micro-celebrity (Lewis 2020), in which strong voices can influence their followers’ opinions and reshape their perspectives by feeding them information or entertainment.

As users engage in creating and disseminating audiovisual content regardless of their level of expertise, divergent opinions become polarized on platforms like YouTube. This participatory digital landscape brings its own challenges. Extensive research has documented the prevalence of extreme political content (Ribeiro et al. 2020) and the propagation of misinformation (Bounegru et al. 2020) on YouTube. Ha et al. (2022) go a step further, characterizing the platform as fertile ground for the germination of conspiracy theories. Regarding mainstream media, Glaesener (2023) raises concerns about their dominance on YouTube in his investigation of German YouTube sources of information on the Russia-Ukraine war. He argues that understanding the influence of mainstream media on the platform is crucial, posing the question of whether YouTube’s content is dominated by mainstream media or diversified through channels independent of traditional media.

The October 7 attacks by Hamas and Israel’s subsequent response have sparked a plethora of controversies among social media users, whose discourse has taken different forms. Meanwhile, TikTok has faced allegations of influencing young users’ views on Gaza (Malik 2023), while Palestinians have claimed that their content is not being promoted on social media platforms (Siddiqui et al. 2023). Hence, we aimed to understand how mainstream media content informing users about the Israel-Hamas war is displayed on YouTube, and how commenters react to YouTube videos on the topic of genocide.

Considering these aspects, our research explores the discourse on YouTube, with a specific focus on the nuanced and complex topic of genocide within the context of the Israel-Hamas war. Following Glaesener’s study (2023), and with an eye to source diversity as a key to understanding the digital landscape, we investigate the dynamics of content dissemination along with how discourse is constructed around genocide. To illuminate this landscape, we analyze search results for specific queries, namely “genocide”, “Gaza genocide”, “Jewish genocide”, and “October 7 genocide”, during discrete periods in October, November, and December. By scrutinizing these queries, our research aims not only to uncover prevailing thematic structures but also to discern patterns in the evolution of content over time.

Drawing on insights from Rieder et al.’s (2023) exploration of YouTube’s influence on political opinions, our research seeks to challenge prevailing assumptions, subjecting the conventional wisdom of YouTube’s Western-centric dominance in political discourse to scrutiny. Acknowledging the complexity of opinions within the digital realm, our paper adopts a digital methods approach (Rogers 2019), enabling us to uncover thematic structures, scrutinize the most connected videos in the video network, and trace the evolution of discourse from a user’s perspective.

Research Questions

Building on the previous section, this paper aims to answer three main research questions:
  • RQ 1: What are the predominant topics within the videos from October 7 to December 14?
  • RQ 2: What types of information clusters can be found within the video network?
  • RQ 3: What do the comment sections of the top three videos for this period reveal about the discourse on the war?

Methodology

The first stage of our data collection process used YouTube Data Tools (Rieder 2015) to gather a video list for all four queries in different periods. We then downloaded the video transcripts for each dataset. In total, we collected 1,997 video transcripts; some were in French, Hindi, Spanish, and Hebrew, and these were translated into English for thematic analysis. We constructed a dataset for each query per month and uploaded the datasets into 4CAT (Peeters & Hagen 2018) to tokenize the transcripts and remove stopwords and emojis. This part of our methodology was inspired by Shekhar & Saini’s (2021) topic modeling research. We started with data scraping and moved on to data cleaning, both automated by the tools we used (YouTube Data Tools and 4CAT).
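The cleaning step can be illustrated with a minimal Python sketch. It is not 4CAT’s implementation: the stopword list and emoji ranges below are small, hypothetical stand-ins chosen for illustration, tokenization is a simple regular expression, and the sketch assumes English text (the transcripts were translated into English beforehand).

```python
import re

# Hypothetical, deliberately tiny stopword list; real tools ship far longer lists.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "this", "that", "for", "with", "as", "at", "by",
}

# Common emoji and pictograph code-point ranges (not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]")

# A token is a run of lowercase letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_tokens(transcript: str) -> list[str]:
    """Lowercase a transcript, strip emojis, tokenize it, and drop stopwords."""
    text = EMOJI_RE.sub(" ", transcript.lower())
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


print(clean_tokens("The war in Gaza 🔥 is ongoing"))  # ['war', 'gaza', 'ongoing']
```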

The next step was exploratory data analysis “to understand better the main features of data, variables, and relationships that hold them” (Shekhar & Saini 2021). Accordingly, we exported the most frequent words and bigrams for each dataset and analyzed them.
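Counting the most frequent words and bigrams per dataset amounts to simple tallying over the cleaned tokens. The sketch below assumes each transcript has already been tokenized and forms bigrams only within a single transcript; the function name and the `n` cutoff are illustrative assumptions, not 4CAT settings.

```python
from collections import Counter


def top_words_and_bigrams(docs: list[list[str]], n: int = 10):
    """Return the n most frequent words and bigrams across tokenized transcripts."""
    words: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in docs:
        words.update(tokens)
        # Adjacent token pairs within one transcript; pairs never span two transcripts.
        bigrams.update(zip(tokens, tokens[1:]))
    return words.most_common(n), bigrams.most_common(n)


docs = [["gaza", "strip", "gaza", "strip"], ["gaza", "strip", "war"]]
top_words, top_bigrams = top_words_and_bigrams(docs, n=2)
print(top_bigrams[0])  # (('gaza', 'strip'), 3)
```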

During this process, we encountered similar findings across all 12 datasets (each of the four queries had three datasets, one per month), meaning that there were no significant thematic changes over the three months for any query. Instead, we identified more general topics that were dominant across all datasets throughout the three months. Based on this preliminary finding, we restructured our data corpus and merged all transcripts into one single text (214,290 characters). One of our challenges was that the free version of ChatGPT 3.5 limits the number of characters it can process, and our corpus exceeded that limit. Thus, we uploaded the corpus into 4CAT and extracted a list of bigrams with their frequencies. We then analyzed the most frequent bigrams and processed 300 of them with an AI tool (ChatGPT 3.5), instructing it to use Latent Dirichlet Allocation (LDA) (Blei et al. 2001) as the topic modeling algorithm to select five topics from the bigrams. The number of topics was informed by our own prior qualitative analysis, in which we found six general topics. The prompt we used with GPT was as follows:

[You [GPT 3.5] are a topic modeling expert. Prompt: Using Latent Dirichlet Allocation algorithm, you are going to (1) find six dominant topics from the following bigrams (2) provide a name for each topic.
[header word_1 word_2 value]
[list of bigrams]
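For readers unfamiliar with LDA, the sketch below shows what the algorithm itself computes, using collapsed Gibbs sampling in plain Python. It is a swapped-in illustration, not the procedure used in this study (which relied on ChatGPT 3.5), and in practice a library implementation such as scikit-learn’s `LatentDirichletAllocation` or gensim’s `LdaModel` would be preferable. LDA takes documents as input, here assumed to be one token list per transcript, where tokens could be words or joined bigrams such as `gaza_strip`; the hyperparameters `alpha`, `beta`, and `iters` are illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns one list per topic of (token, count) pairs, most frequent first.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * n_topics for _ in docs]               # tokens per (document, topic)
    nkw = [defaultdict(int) for _ in range(n_topics)]  # tokens per (topic, word)
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic assignment per token

    # Initialise every token with a random topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(z = t | all other assignments), up to a normalising constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    return [
        sorted(((w, c) for w, c in nkw[k].items() if c > 0), key=lambda wc: -wc[1])
        for k in topics
    ]


# Made-up documents, for illustration only.
example_docs = [["gaza_strip", "air_strike", "gaza_strip"], ["united_nations", "security_council"]]
for k, topic in enumerate(lda_gibbs(example_docs, n_topics=2)):
    print(k, topic[:3])
```

The per-topic word counts returned here are what a human would then inspect to name each topic, which corresponds to step (2) of the prompt.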

To identify the information clusters within our data, we used YouTube Data Tools (Rieder 2015) to download a co-commenting network for all four queries. The result was a network of 1,709 videos (nodes) and 44,942 edges, linked based on users’ commenting patterns. We then used Gephi to visualize the network, filter the clusters, and identify different user communities. Finally, we conducted a qualitative analysis of the most viewed and commented videos in the network to find how these videos are connected and how users engage with the videos they comment on.
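The co-commenting logic can be sketched as follows, under an assumption we state explicitly: two videos are linked when at least one user commented on both, and the edge weight counts the shared commenters. YouTube Data Tools’ exact rule may differ. The sketch also produces an edge table with `Source`, `Target`, `Weight`, and `Type` columns, a layout Gephi’s spreadsheet import recognizes; all function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def co_comment_edges(commenters_by_video: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Link videos that share commenters; weight = number of shared commenters."""
    # Invert the mapping: user -> set of videos the user commented on.
    videos_by_user: dict[str, set[str]] = defaultdict(set)
    for video, users in commenters_by_video.items():
        for user in users:
            videos_by_user[user].add(video)
    # Every pair of videos co-commented by the same user gains one unit of weight.
    weights: dict[tuple[str, str], int] = defaultdict(int)
    for videos in videos_by_user.values():
        for a, b in combinations(sorted(videos), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(edges: dict[tuple[str, str], int]) -> dict[str, int]:
    """Sum of edge weights per video, a rough 'most connected' measure."""
    degree: dict[str, int] = defaultdict(int)
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


def edges_to_csv(edges: dict[tuple[str, str], int]) -> str:
    """Render an undirected edge table suitable for Gephi's spreadsheet import."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight", "Type"])
    for (a, b), w in sorted(edges.items()):
        writer.writerow([a, b, w, "Undirected"])
    return out.getvalue()
```

Community detection itself (for example Gephi’s modularity-based clustering) is left to Gephi in this sketch.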

For the final direction of analysis, we extracted the top 4,000 comments for each of the top three videos (the most viewed and commented ones) with YouTube Data Tools’ Video Comments module (Rieder 2015). These comments were then explored using 4CAT’s word tree module as well as the Jason Davies Word Tree website (Wattenberg and Viégas 2008). As existing work on word trees shows, this “visualization and information-retrieval technique [...] enables rapid querying and exploration of bodies of text” (Wattenberg and Viégas 2008). In the context of our research, it therefore allows us to identify and visually present the wider narratives and patterns within the videos’ comment sections. Since each comment section contains more than 28K comments, a sample of 4,000 comments is not fully representative, which is one of the limitations of this analytical approach. Finally, the word tree presented in this paper (Fig. 1) highlights an even more filtered selection from all three videos’ comment sections and serves only as a template for understanding how the word tree analysis was assembled and the corresponding findings. The comments in this “template” were selected manually, based on observations already made during the research process, which narrows the representativeness of the vast number of comments even further. As shown in Figure 4, the main keywords used as “root words” in all three cases were “Genocide”, “Israel”, “Palestine”, and “Hamas”. The keywords were identified with the help of 4CAT’s processors and further manual exploration of the datasets, with the final goal of extracting information relevant to the context of “Genocide”.

Fig.1. Word tree template for understanding the findings from the comment sections.
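The core idea of a word tree can be sketched in a few lines: for a chosen root word, collect the words that follow it in each comment and merge shared continuations into a tree with counts. This is a simplified illustration, not the code behind 4CAT’s module or the Jason Davies Word Tree site; the tokenizer, the `depth` parameter, and the example comments are made up for illustration.

```python
import re

Tree = dict  # word -> (count, subtree)


def build_word_tree(comments: list[str], root: str, depth: int = 3) -> Tree:
    """Merge the up-to-`depth` words that follow `root` in each comment into a counted tree."""
    tree: Tree = {}
    root = root.lower()
    for comment in comments:
        tokens = re.findall(r"[a-z0-9']+", comment.lower())
        for i, token in enumerate(tokens):
            if token != root:
                continue
            node = tree
            for nxt in tokens[i + 1 : i + 1 + depth]:
                count, child = node.get(nxt, (0, {}))
                node[nxt] = (count + 1, child)
                node = child
    return tree


def print_word_tree(node: Tree, indent: int = 0) -> None:
    """Print branches, most frequent continuation first."""
    for word, (count, child) in sorted(node.items(), key=lambda kv: -kv[1][0]):
        print("  " * indent + f"{word} ({count})")
        print_word_tree(child, indent + 1)


# Made-up comments, for illustration only.
comments = [
    "Genocide is happening in Gaza",
    "genocide is not the right word",
    "stop the genocide now",
]
print_word_tree(build_word_tree(comments, "genocide", depth=2))
# is (2)
#   happening (1)
#   not (1)
# now (1)
```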

Findings

The results reveal three major findings. First, contrary to our expectations, there is no significant development in the discourse on genocide in the context of the Israel-Hamas war. Instead, the four queries yield relatively similar search results. For the video transcripts, we found that the most frequent bigrams for the four queries were similar throughout the three months of analysis. Using the bigrams from our data corpus (all transcripts merged), we manually identified six predominant topics and compared them with those delivered by GPT 3.5, as seen in Fig. 2. We found that two of the five GPT topics (Israel-Palestine conflict and political issues/international relations) are similar to ours in keywords, while the remaining three were clustered by GPT into even broader categories than the ones we identified.

Fig.2. Comparison of topics identified within 300 most frequent bigrams.
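One way to make the keyword comparison between manual and GPT topics more systematic is Jaccard similarity between keyword sets, matching each GPT topic to its closest manual topic. This is a suggested sketch, not the procedure behind Fig. 2; the topic names, keyword sets, and the 0.3 threshold below are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of keywords two topics have in common: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_topics(manual: dict[str, set[str]], gpt: dict[str, set[str]], threshold: float = 0.3):
    """Map each GPT topic to (best manual topic or None, similarity score)."""
    matches = {}
    for gpt_name, gpt_words in gpt.items():
        best_name, best_score = max(
            ((name, jaccard(gpt_words, words)) for name, words in manual.items()),
            key=lambda pair: pair[1],
        )
        matches[gpt_name] = (best_name if best_score >= threshold else None, best_score)
    return matches


# Hypothetical keyword sets, for illustration only.
manual = {"conflict": {"gaza", "israel", "war"}, "politics": {"biden", "united", "nations"}}
gpt = {"israel-palestine": {"gaza", "israel", "hamas"}, "humanitarian": {"food", "aid", "water"}}
print(match_topics(manual, gpt))
# {'israel-palestine': ('conflict', 0.5), 'humanitarian': (None, 0.0)}
```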

Additionally, as shown in Fig. 3, Gaza is the most paired word in the top 30 bigrams. It is connected with words related to the conflict itself (bombing, Israel, war, strip) and to its impact on the population (food, people, Palestinians). The absence of other visibly frequent pairings may be due to the size of our transcript corpus, which contains different forms of similar words (e.g., support, supported, supports) that split their counts.
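The pairing count behind the observation about Gaza, and the way word variants can keep a pairing out of the top bigrams, can be illustrated as follows. The frequencies are made up, and the `variants` mapping is a hypothetical stand-in for proper lemmatization or stemming.

```python
from collections import Counter


def merge_variants(bigram_counts: Counter, variants: dict[str, str]) -> Counter:
    """Re-key bigram frequencies after mapping word variants to a base form."""
    merged: Counter = Counter()
    for (a, b), n in bigram_counts.items():
        merged[(variants.get(a, a), variants.get(b, b))] += n
    return merged


def pairing_counts(bigram_counts: Counter, top_n: int) -> Counter:
    """For the top_n bigrams, count how many of them each word takes part in."""
    counts: Counter = Counter()
    for (a, b), _ in bigram_counts.most_common(top_n):
        counts[a] += 1
        if b != a:
            counts[b] += 1
    return counts


# Made-up frequencies, for illustration only.
freqs = Counter({("gaza", "strip"): 9, ("gaza", "bombing"): 6, ("israel", "gaza"): 5,
                 ("support", "israel"): 3, ("supports", "israel"): 2, ("supported", "israel"): 2})
variants = {"supports": "support", "supported": "support"}

print(pairing_counts(freqs, top_n=3).most_common(1))  # [('gaza', 3)]
# Split across variants, "support israel" misses the top 3; merged (7), it enters it.
print(merge_variants(freqs, variants).most_common(3))
```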

Likewise, in all three months, the same three videos were the most viewed and commented on. Thus, our second major finding is that while most videos in our dataset come from official media organization channels, such as Fox News, Al Jazeera English, and NBC News, the most viewed and commented ones are by individual content creators [e.g., the channel of Priya Jain, which creates educational content tailored for India (Jain 2018); Last Week Tonight’s channel, a “news satire television program hosted by comedian John Oliver” (“Last Week Tonight with John Oliver” 2024); and the TV show Piers Morgan Uncensored (Morgan 2021)].

Turning to the comment sections of the three most viewed and commented videos, Priya Jain’s video shows the greatest diversity of polarized opinions and the most engagement, with more than 94K comments. Here, the topic of genocide is understood through self-identification and references to other historical events: being “Kashmiri Hindus” is presented as a predisposition to support Israel, while the “Sikh community in 1984”, the “Mappilah riot”, and the “Bangladesh Liberation War” are cited as reasons to support or condemn the discussed genocide. Support for either Palestine or Israel is strongly expressed through Indians’ national belonging, in comments such as “i am indian i support palestine” or “indians are with israel”. Such generalizing statements are the prevalent form of commenting on this video. The most apparent connection to the topic of genocide, however, develops around three other terms: “Israel”, “Hamas”, and “Terrorism”. A very strong critique is directed at Israel in comments like “israel is right but rapping women…”, alongside an overall frustration with the violence and actions against humanity, as in “hamas is terrorist yes or cowards who kill women”, “hamas and islam threat to humanity”, and “hamas terrorists killed jewish children”.