Filip Hossner (KInIT, filip.hossner@kinit.sk) (facilitator)
Adrián Bindas (KInIT, adrian.bindas@kinit.sk) (facilitator)
Afina Krisdianatha (a_finavien@outlook.com)
Aanila Tarannum (atarann@iu.edu)
Jurij Smrke (jurij.smrke@mirovni-institut.si)
Loizos Bitsikokos (lbitsiko@purdue.edu)
Elena Aversa (elena.aversa@polimi.it) (designer)
GitHub repository (includes datasets and code for data wrangling and analysis)
Video of Hashtag Co-occurrence Temporal Network on YouTube [1, 2]
This project involves a micro-audit of the TikTok algorithm, in the context of the EU’s Digital Services Act (European Parliament, 2022), aiming to assess the platform’s recommender system’s compliance with the law. We used three pairs of archetypal TikTok user accounts to test the app’s personalization of content and advertisements and its profiling of users.
Social platforms face a crisis: they are riddled with disinformation, harmful narratives and inappropriate content, and they further the spread of propaganda and machine-generated content, while content moderation is being restricted and the behavior of recommender systems remains opaque and (intentionally) obfuscated. The damaging behavior of recommender systems is difficult to overlook, as it showcases bias and echo chambers, raises privacy concerns, and fails to adhere to legal obligations.
To accurately describe and examine the problematic interactions between user-generated content and recommender systems, an efficient and representative analysis of recommender behaviour is required. This can be facilitated by so-called algorithmic audits.
In general, algorithmic auditing is a process of dynamic black-box assessment of real-world (AI-based) software system behavior. Since social media recommender systems are black-boxes (we cannot analyze or influence their inner workings), audits must explore their properties behaviorally: user interactions with an algorithm are simulated (e.g., content visits), and observed responses (e.g., recommended videos) are examined for the presence of the audited phenomenon. Typically, bots or human agents are employed to simulate such user interactions.
This novel type of social media algorithm analysis is in line with Article 37 of the Digital Services Act (DSA), which requires Very Large Online Platforms (VLOPs) such as TikTok, YouTube, Instagram, Facebook and LinkedIn, and Very Large Online Search Engines (VLOSEs) such as Google Search, to undergo regular assessments by external auditors to evaluate their compliance with DSA regulations. Our analysis of DSA audit reports published in Q4 2024 revealed that TikTok is the only platform that received a “negative” appraisal regarding compliance with recommender system transparency requirements.
The goal of our project is to employ algorithmic auditing to assess whether TikTok’s recommender systems protect minors, refrain from profiling based on sensitive characteristics, and allow users to turn off personalization of their video feeds.
The behavior of recommender systems has wide ethical and legal implications. The DSA defines obligations for very large online platforms related to 1) recommender system transparency and user choices in recommender systems (Articles 27 and 38), 2) a ban on profiling minors in advertising systems (Article 28), and 3) restrictions on advertisements based on sensitive personal data (Article 26).
This project is also closely related to the EU-funded vera.ai project, in which novel tools for supporting media professionals in combating false information are developed. Moreover, it relates to AI-Auditology, a research project in which we aim to fundamentally change the oversight of social media AI algorithms by a novel paradigm of model-based algorithmic auditing.
This summer school project serves a noble purpose and will contribute to a positive societal impact on the online media environment. Currently, media regulators, NGOs, and researchers have no real tool to quantitatively investigate whether big tech follows through on its self-regulatory promises or adheres to relevant legislation on tackling recommender system biases, mitigating toxic content, or preventing misinformation spread. The findings and outcomes of this summer school project will help to advance research and development towards our long-term goal of algorithmic auditing becoming a powerful watchdog tool, fully in line with European values.
A previous study by Mosnar et al. (2025) examining the TikTok platform confirmed a high impact of personalisation factors, such as geographical location and passive watching, on recommendations. The study also showed low reproducibility and short-term validity of the findings of previous audits.
Subsequent research on this topic, utilizing methods described above, can help examine and explain the behavior of the TikTok recommender system and improve the auditing platform.
Our data was harvested from TikTok using Zeeschuimer and by performing GDPR data exports.
We generated and saved data pertaining to our online activities. For each of the research questions we created two users (name, age, location, bio, etc.) and manually performed the initial search, following and liking actions (set-up phase) to send strong signals about our users’ preferences. After that, we did one or two 1-hour rounds (depending on the research question) of manual browsing (scroll phase).
Considering our research questions the most important data gathered during these activities was:
the list of favorited videos with timestamps,
the list of watched videos with timestamps,
the descriptions of videos,
the hashtags used in the videos.
For more details see the datasets available in our project’s repository.
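As an illustration, the sketch below shows one way such exports can be loaded for analysis; the file layout and field names ("Activity", "Favorite Videos", "FavoriteVideoList", "Link", "Date") are assumptions about the GDPR export and Zeeschuimer formats rather than a documented schema, so the datasets in the repository remain authoritative.

```python
import json
from pathlib import Path

# Minimal loading sketch; field names below are assumptions about the GDPR
# export structure and may differ between export versions.

def load_favorited_videos(gdpr_export: Path) -> list[dict]:
    """Return favorited videos (URL + timestamp) from a TikTok GDPR export."""
    with gdpr_export.open(encoding="utf-8") as f:
        data = json.load(f)
    favorites = (
        data.get("Activity", {})
        .get("Favorite Videos", {})
        .get("FavoriteVideoList", [])
    )
    return [{"url": item.get("Link"), "time": item.get("Date")} for item in favorites]

def load_zeeschuimer_items(ndjson_path: Path) -> list[dict]:
    """Zeeschuimer saves one captured item per line (NDJSON)."""
    with ndjson_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```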
Video description texts were pre-processed using the Python language processing library nltk (Bird et al., 2009) and regular expressions. Text was lowercased, hashtags were detected, punctuation and stop words were removed, and the text was tokenized and lemmatized. For each user archetype, two sets of video recommendation metadata are available: one for the set-up phase and one for the browsing phase.
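The following is a minimal sketch of that pre-processing pipeline, assuming English-language descriptions and standard nltk resources; the exact steps and parameters used in the repository may differ.

```python
import re
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads (newer nltk versions may also need "punkt_tab").
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(description: str) -> tuple[list[str], list[str]]:
    """Return (hashtags, lemmatized tokens) for one video description."""
    text = description.lower()
    hashtags = re.findall(r"#\w+", text)              # detect hashtags
    text = re.sub(r"#\w+", " ", text)                 # drop them from the running text
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in word_tokenize(text) if t not in STOP_WORDS]
    return hashtags, [LEMMATIZER.lemmatize(t) for t in tokens]
```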
Auditing of the social platform can provide valuable information about the design and behavior of its recommender systems. Analysis of data gathered during the audit can reveal details about recommender “explore” and “exploit” phases and the effect of personalization on recommendations.
The following main research question will be answered as part of the summer school project:
Do the TikTok recommender systems fulfill obligations stated by the Digital Services Act?
More specifically, we will aim to address the following specific research questions:
Are users’ sensitive and protected characteristics (e.g., health data) omitted from TikTok’s personalized ad recommendations as required by Article 26(3) DSA?
Are minors protected from profiling-based advertisements under the requirements of Article 28 DSA?
Does TikTok enable users to opt-out of personalized recommendations in accordance with Article 38 DSA?
We tested whether opting out of the “personalized feed” is effective for avoiding personally curated content, as required under Article 38 of the DSA. We also tested whether minors are profiled for advertising, which is explicitly prohibited by Article 28. Finally, we simulated two young adult users with health issues to check whether their personal information on physical illnesses would influence the advertisements on their for-you page. As information concerning a user’s health is a protected characteristic under Article 9(1) of Regulation (EU) 2016/679 (European Parliament, 2016), Article 26(3) DSA explicitly prohibits such personalization.
To simulate authentic and believable user behaviour on TikTok, we created six user accounts constructed to resemble “user archetypes”, i.e. users with specific characteristics. The two health-related archetypes were: (1) a 21-year-old Dutch male student who has type 2 diabetes, and (2) a 21-year-old Dutch female student who is worried about sexually transmitted infections. The minor archetypes were: (1) a 16-year-old Dutch girl who is interested in skincare routines, and (2) a 15-year-old Dutch boy who is neurodivergent. For the advertisement-related audit we customized two different profiles of the same user archetype, a 22-year-old Dutch man interested in travelling content, with one profile opting out of personalized recommendations and the other opting in.
The accounts were created using the mobile application and an email address; for each account we set a nickname, date of birth and bio (a short user description visible under the profile). Each audit began with three seed searches that signaled the user’s interests to the recommender system. We watched, liked, and saved 10 videos from the search results and followed creators who posted relevant videos. The audit was executed manually in the Firefox browser with users based in the Netherlands.
Table: User archetype characteristics.

| Archetype | Gender | Search Terms | Hashtag Terms |
|---|---|---|---|
| Travel (Opt-in) | Male | what to wear, best games 2025, best country in asia travel | #fashion #gaming #traveltok |
| Travel (Opt-out) | Male | what to wear, best games 2025, best country in asia travel | #fashion #gaming #traveltok |
| Minor (autism) | Male | neurodivergent, autism, video games | #neurodivergent #stimming #autismcommunity |
| Minor (skincare) | Female | olivia rodrigo, skincare routine, outfit inspo | #aliviarodrigo #skincare #dutchgirlthings |
| Health (diabetes) | Male | diabetes type 2, diabetes meal ideas, diabetes reversal | #diabetes #diabetes_treatments #diabetesrecipe |
| Health (STI) | Female | std signs females, std symptoms vs uti, std prevention | #std #stdawareness #stdproblems |
To aid in the interpretation of the results, hashtag semantic networks were created for each archetype. Nodes in the network represent hashtags, and two hashtags are connected if they appear in the same video. Edges can be weighted according to the number of video co-occurrences, and a temporal aspect can be incorporated to reflect the timing of when hashtags appear together in the scrolling process. Node attributes indicate whether a hashtag appears during the setup phase, the scrolling phase, or both. An example semantic network can be seen in Figure 1.
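A minimal sketch of how such a network can be built with networkx is shown below; the input structure (a list of videos, each with a "hashtags" list and a "phase" label) is an assumption for illustration, with the actual fields defined by the datasets in the repository.

```python
from itertools import combinations

import networkx as nx

def build_cooccurrence_network(videos: list[dict]) -> nx.Graph:
    """Hashtags as nodes; edge weight = number of videos in which two hashtags co-occur."""
    g = nx.Graph()
    for video in videos:
        tags = set(video["hashtags"])
        for tag in tags:
            if tag not in g:
                g.add_node(tag, phases=set())
            # record whether the hashtag was seen in the set-up phase, the scroll phase, or both
            g.nodes[tag]["phases"].add(video["phase"])
        for a, b in combinations(sorted(tags), 2):
            if g.has_edge(a, b):
                g[a][b]["weight"] += 1
            else:
                g.add_edge(a, b, weight=1)
    return g
```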
Our first browsing session (a couple of hours after the set-up phase) yielded a very small amount of personalized content; however, that amount increased in our second browsing session (a day after the set-up phase). This means one (or both) of two things:
feed personalization only kicks in after about a day;
the recommender algorithm only had enough data to work well after also gathering data from the first browsing session.
Article 26, paragraph 3 of the DSA states that online platforms are prohibited from displaying ads based on profiling that utilizes sensitive personal data. This includes information like a person’s race, religion, political opinions, sexual orientation, health status, etc.
These types of data are considered “special categories” under EU privacy law (Article 9 GDPR). The goal of this rule is to protect users from being unfairly targeted or discriminated against based on deeply personal characteristics when they are shown ads online.
In our audit scenarios, we created two archetypal user profiles who were concerned about diabetes and sexually-transmitted infections (STIs). These concerns were signaled to the recommender system through seed searches of relevant terms.
The for-you page of an account indicating that the user suffers from diabetes did not initially adapt to their searches and engagement with diabetes-related content. However, during a second round of scrolling the following day, the feed significantly featured diabetes-related videos.
These videos were primarily educational or recipe-focused, but they also included misinformation regarding "diabetes reversal". Notably, no health-related advertisements were found in the feed during approximately two hours of browsing.
For the STI-concerned archetype (Sanne), the seed searches led the algorithm to fill the for-you page with softcore sexual content, some of it generated by AI. On the next day, her feed received more educational content on sexual health, although sexual videos were still strewn in as well.
Figure 2: Networks of the hashtags for videos appearing in the set-up and scrolling phases. In red, the hashtags appearing in both phases.

Minors face greater risks on TikTok because its recommendation systems can amplify harmful content, impacting their mental health and compromising their privacy. Protecting them from profiling-based advertising is one of the key objectives of the DSA, as seen under Article 28, which restricts online platforms from profiling minors based on personal information.
It is worth noting a wording contradiction in TikTok’s Children’s Privacy Policy, where TikTok states that they “do not engage in profiling which results in legal or similarly significant effects, as defined under applicable law.” However, the platform still uses minor users’ interactions (likes, comments, etc.) to predict and suggest content that aligns with their interests, and continuously refines recommendations on the for-you page. This act of predicting minors’ interests based on their online behavior is defined as ‘profiling’ under Article 4(4) of the General Data Protection Regulation (GDPR), as referred to in Article 28 of the DSA.
Both accounts belonging to minors received content that can be identified as advertisements, as defined in the DSA. These videos were labeled “paid partnership” and contained the hashtag #ad. The ads were related to PC gaming and skincare (with HEMA partnership). Both topics are relevant to the minor accounts’ seed searches. Ads shown to minors were tailored to their interests.
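For illustration, a heuristic like the sketch below can flag such videos in the collected metadata; the label strings and field names are assumptions based on what is visible in the interface rather than a documented TikTok schema, and our actual annotation was done manually.

```python
# Hypothetical ad-flagging heuristic; "label", "hashtags" and the marker
# strings are assumptions for illustration, not TikTok's official metadata.
AD_MARKERS = ("paid partnership", "promotional content")

def looks_like_ad(video: dict) -> bool:
    """Flag a video as advertising if it carries an ad-style label or the #ad hashtag."""
    label = (video.get("label") or "").lower()
    hashtags = {tag.lower() for tag in video.get("hashtags", [])}
    return any(marker in label for marker in AD_MARKERS) or "#ad" in hashtags
```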
Personalized advertisements curated to the interests of the minors’ accounts indicate a clear misalignment between regulatory requirements and platform compliance. Article 28 of the DSA prohibits profiling minors for advertising purposes based on their personal data, so these findings raise concerns about the platform’s adherence to the law.
Figure 3: Scrolling sequence per category: focus on ads frequency.

We created two identical users (name, age, location) and manually performed the initial search, following and liking actions (set-up phase) to send strong signals about our users’ preferences: we signaled interest in fashion, gaming and travelling.
After that, we did two rounds (about 1 hour each) of manual browsing (scroll phase). We liked and bookmarked all the videos we considered in line with our interests. During the analysis we used the bookmarks as the main indicator of the recommender algorithm doing its job of personalization.
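In practice this boils down to the share of watched videos that were bookmarked, as in the sketch below; the input structures and the "id" field are assumptions for illustration.

```python
def relevant_share(scroll_log: list[dict], bookmarked: set[str]) -> float:
    """Percentage of watched videos that were bookmarked as interest-relevant."""
    if not scroll_log:
        return 0.0
    hits = sum(1 for video in scroll_log if video["id"] in bookmarked)
    return 100 * hits / len(scroll_log)

# e.g. compare relevant_share(opt_in_log, opt_in_bookmarks)
#      with    relevant_share(opt_out_log, opt_out_bookmarks)
```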
Based on our micro-audit, we can say that opting out of feed personalization does not significantly change what content gets displayed on the for-you page. In other words, opting out of content personalization does not appear to have the expected impact.
Figure 4: Scrolling sequence: focus on favourite content.

The GDPR data is ready for download very quickly after the request (a matter of minutes), but it can take up to a day for it to be populated with all the data. Generally, we observed a lag of a couple of hours.
Liking behaves erratically. It is common for likes to disappear from the interface of newly created users. They might or might not eventually show up in the GDPR data.
You cannot turn off feed/search personalization in the browser; you need to use the mobile app.
You cannot follow hashtags in the browser; you need to use the mobile app.
Based on the Digital Services Act (DSA), which addresses disinformation and the transparency of online content in the European Union, we performed a micro-audit on TikTok to investigate the platform’s compliance with three articles of the DSA. The audit was conducted using user profiles based on user archetypes resembling real users on social platforms. In order to signal interests to the recommender, the user accounts underwent a “set-up” phase of watching and interacting with videos and following creators based on their interests. Afterwards, users scrolled through their “for you page” to observe recommendations from the system. We collected the recommended videos and metadata on each video, along with the GDPR data export for each user.
RQ 1: Are users’ sensitive and protected characteristics (e.g., health data) omitted from TikTok’s personalized ad recommendations as required by Article 26(3) DSA?
Based on our findings, no advertisements related to the users’ health data were found in the feed. This behaviour of the recommender system is in compliance with the DSA. Note that the scroll phase was conducted just a few hours after the set-up phase; as noted in the general findings, the scroll phase should have been executed after a more significant delay in order to capture more representative recommendations.
Additionally, multiple cases of sexual content and misinformation were spotted, which is concerning given the sensitivity of health-related topics and their possible negative impact on users.
RQ 2: Are minors protected from profiling-based advertisements under the requirements of Article 28 DSA?
The micro-audit has shown that videos marked as “Advertisement” were not shown to the minor users (aged 15 and 16). However, videos tagged as “Promotional content” or “#ad” did appear in the minors’ feeds. Annotation and categorization of the feeds revealed that both minor users were subjected to personalization and were recommended videos marked as “Promotional content” or “#ad”. The appearance of this type of content, which qualifies as advertisement as defined in the DSA, indicates a misalignment between the requirements stated in the DSA and platform compliance.
RQ 3: Does TikTok enable users to opt-out of personalized recommendations in accordance with Article 38 DSA?
A comparison of the opt-in and opt-out feeds revealed that the user who opted out of personalized recommendations received a smaller amount of personalized content (15.34% relevant content) than the user who opted in (18.22% relevant content) during the same amount of time. However, as personalized content still appeared for the opted-out user, the opt-out does not have the expected impact, signaling a possible failure to adhere to the DSA requirements.
Generally, the results indicate that the TikTok platform might be failing to fulfill its obligations under the DSA (as shown in RQ2 and RQ3). However, micro-audits alone are not a sufficient method to decide this with enough certainty.
The landscape of social platforms is hostile to auditors and researchers, creating a need for more efficient auditing methods. Currently, one such approach is algorithmic auditing that leverages automated agents to assess recommender systems and provide data on their behaviour.
If the results suggested by this micro-audit were backed by a larger amount of data and verified using a more comprehensive audit, the platform’s failure to adhere to its legal obligations would indicate a failure to protect minors and preserve users’ rights.
Mosnar, M., Skurla, A., Pecher, B., Tibensky, M., Jakubcik, J., Bindas, A., Sakalik, P., & Srba, I. (2025). Revisiting Algorithmic Audits of TikTok: Poor Reproducibility and Short-term Validity of Findings. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 3357–3366). ACM.
European Parliament, & Council of the European Union. (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016… (GDPR). Official Journal of the European Union, L 119, 1–88.
Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O’Reilly Media.
European Parliament & Council of the European Union. (2022). Regulation (EU) 2022/2065 of the European Parliament and of the Council of 19 October 2022 on a single market for digital services … (Digital Services Act). Official Journal of the European Union, L 277, 1–102.