Giovanna Salazar (10848223)
Agnieszka Walewinder (10848290)
María Belén Muñoz Román (10848673)
Since its founding in 2006, Twitter has positioned itself as one of the most popular microblogging tools, used daily by Internet users. By microblogging, we mean a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web (Kwak et al., 2010). As of June 30, 2014, it counted around 271 million monthly active users, and about 500 million Tweets were published every day.
Throughout its existence, Twitter has transitioned from a banal, ambient friend-following platform that asked users "What are you doing?" (2006 to 2009) to an event- and news-following platform that asked "What's happening?" (2009 to 2012) (Rogers 2013). From 2012 onwards, Twitter replaced its question with a generic tagline that reads "Compose new Tweet". Richard Rogers describes these periods as Twitter I, Twitter II, and Twitter III, in order to conceive of Twitter as an object of study (see Rogers 2013).
The main research project in which we participated during the Digital Methods Winter School 2015 was led by Daniela Stockmann, Assistant Professor with Tenure at the Department of Political Science at Leiden University. It belongs to the strand of research that may be called Twitter impact studies, meaning that it focuses on the role of the platform in a particular event. The main project studies the Twitter activity related to the Hong Kong protests that happened between October 1 and October 15, 2014, by collecting and analyzing data sets from the Twitter Streaming API, the Twitter Search API and the Firehose API related to the following hashtags: #hongkong #occupycentral #umbrellarevolution #occupyadmiralty #hk929 #hkstudentstrike. The researchers particularly seek to identify bias in the information by analyzing the data sets retrieved from the different APIs, taking into consideration the ways the APIs work.
Twitter's Streaming API returns real-time data from Twitter for given parameters, such as keywords, user IDs or geographical parameters. It provides non-historical data with a 1% rate limit, meaning that the data set must stay below 1% of the whole volume of Twitter traffic; Twitter describes the data sets returned through this API as consisting of the most relevant tweets. Twitter's Search API, on the other hand, provides historical data sets from Twitter by querying a username or a hashtag. The Firehose API is a private data set that gives access to 100% of all public tweets, excluding deleted accounts and deleted tweets.
It is worth noting that the main project also retrieved data sets from Sina Weibo and Tencent Weibo, which are microblogging platforms in China similar to Twitter. Nevertheless, the present research report focuses only on the data sets from Twitter, specifically from the Streaming and the Search APIs, and on the data they both registered for October 13th and 14th, 2014, because these are the data and dates that were assigned to the present sub-group.
The Search API and Streaming API datasets were retrieved with the Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT), a tool for capturing and analyzing Twitter data (Borra and Rieder 2014, 262).
The datasets used were HongKongProtests, which corresponds to the Streaming API data, and hongkonglookups, which contains the Search API data. In both cases the following query was entered in the Query field: #hongkong OR #occupycentral OR #umbrellarevolution OR #occupyadmiralty OR #hk929 OR #hkstudentstrike, and the date parameters were set from October 13th, 2014 to October 14th, 2014.
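To make the selection criteria concrete, the following minimal sketch shows how such an OR query and date range could be expressed. It is an illustration only and not how DMI-TCAT works internally: the tweet dictionary and its field names (`hashtags`, `created`) and both function names are assumptions introduced here.

```python
from datetime import date

# Hashtags taken from the query used in this report.
HASHTAGS = ["hongkong", "occupycentral", "umbrellarevolution",
            "occupyadmiralty", "hk929", "hkstudentstrike"]


def build_query(hashtags):
    """Join hashtags into an OR query string like the one entered in the Query field."""
    return " OR ".join(f"#{tag}" for tag in hashtags)


def matches(tweet, hashtags, start, end):
    """Return True if the tweet carries a queried hashtag and falls in the date range.

    `tweet` is a hypothetical dict with 'hashtags' (list of str) and
    'created' (datetime.date); this is not the DMI-TCAT data model.
    """
    wanted = {tag.lower() for tag in hashtags}
    tagged = any(tag.lower() in wanted for tag in tweet["hashtags"])
    return tagged and start <= tweet["created"] <= end


query = build_query(HASHTAGS)

# Hypothetical sample tweet, only for demonstration.
sample = {"hashtags": ["HongKong", "news"], "created": date(2014, 10, 13)}
in_range = matches(sample, HASHTAGS, date(2014, 10, 13), date(2014, 10, 14))
```

The hashtag comparison is case-insensitive in this sketch, which is a design assumption so that #HongKong and #hongkong are treated as the same tag.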
Among the many data options available, the following datasheets were extracted: (i) Hashtag frequency, which displayed the most used hashtags; (ii) User visibility (mention frequency), which showed statistics on the users who were mentioned the most in other users' tweets; (iii) Identical tweet frequency (RT), which ordered the results by the content that was retweeted the most; and (iv) User stats (individual), which showed overall information about the individual users who tweeted about the Hong Kong protests, of which only the total number of tweets per user was used.
Due to time constraints, and because this research dealt with big data, the subsequent analysis focused on the top 10 results of each category (hashtags, mentions, retweets and users) within the Search and Streaming API data on the given dates (October 13th and 14th, 2014).
In this section the findings of the analysis are described. The top 10 hashtags are the first category introduced: one column contains the results from the Streaming API and another the results from the Search API, along with their frequency. The differences and similarities are then described; these findings are the main focus of the report. The same procedure is used for the following categories: mentions, retweets and users.
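The four top 10 lists can be thought of as simple frequency counts over the tweets in a dataset. The sketch below illustrates that idea with Python's standard library; it is not the DMI-TCAT implementation, and the sample records, the field names (`user`, `text`) and the regular expressions are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample records; not taken from the datasets in this report.
tweets = [
    {"user": "alice", "text": "RT @bob: Barricades up #HongKong #OccupyCentral"},
    {"user": "carol", "text": "RT @bob: Barricades up #HongKong #OccupyCentral"},
    {"user": "alice", "text": "Thanks @bob for the update #hongkong"},
]

HASHTAG = re.compile(r"#(\w+)")   # \w also matches non-Latin characters such as 香港
MENTION = re.compile(r"@(\w+)")


def top_n(counter, n=10):
    """Return the n most frequent items as (item, count) pairs."""
    return counter.most_common(n)


# (i) Hashtag frequency, counted case-insensitively.
hashtag_freq = Counter(
    tag.lower() for t in tweets for tag in HASHTAG.findall(t["text"])
)
# (ii) Mention frequency: how often each user name is mentioned.
mention_freq = Counter(
    name for t in tweets for name in MENTION.findall(t["text"])
)
# (iii) Identical tweet frequency: identical retweet texts counted together.
retweet_freq = Counter(
    t["text"] for t in tweets if t["text"].startswith("RT @")
)
# (iv) Tweets per user in the data set.
user_freq = Counter(t["user"] for t in tweets)

top_hashtags = top_n(hashtag_freq)
```

Whether hashtags are folded to lower case and whether retweets count as mentions are design choices of this sketch; the tool used in the report may count differently.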
Comparison between top 10 hashtags on the 13th (with frequency)
SEARCH API - October 13th // STREAMING API - October 13th
In both APIs the first four places are the same. The 5th and 6th places are switched, meaning that #hk and #umbrellamovement hold different places in the two lists. The Streaming API list includes #HongKongProtest, which the Search API list does not; instead, the Search API list has #香港 (Hong Kong) in 10th place, which does not appear in the Streaming API list. As a result, the lists are shifted against each other from the 7th place onward. In general, the hashtag frequencies are higher in the Streaming API.
Comparison between top 10 hashtags on the 14th (with frequency)
SEARCH API - October 14th // STREAMING API - October 14th
The first 7 hashtags are the same in both the Search and Streaming APIs, in name and in order: HongKong, OccupyCentral, UmbrellaRevolution, OccupyHK, Argentina, HK and UmbrellaMovement. The hashtag HongKongProtests appears only in the Search API list, whereas the hashtag Admiralty appears only in the Streaming API list. The hashtag China is listed in both, but in a different position: 8th place in the Search API and 9th place in the Streaming API.
The frequency of hashtags is higher in the Streaming API than in the Search API. For example, the hashtag hongkong appeared 13,491 times according to the Search API and 14,366 times according to the Streaming API, a difference of 875. Similarly, occupyhk appeared 3,034 times according to the Search API and 3,229 times according to the Streaming API, a difference of 195.
#Argentina appears on the 14th because of a soccer game that took place in Hong Kong on the same day as a protest. Since the final score was 7-0 for Argentina, several tweets combined the hashtags for Hong Kong and Argentina, which brought #Argentina into the top 10 hashtag list.
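The two differences above can be reproduced with a small calculation. The sketch below also derives the difference relative to the Search API count, which is an addition for illustration and not a figure reported by the tools; the function name `compare` is hypothetical.

```python
def compare(search_count, stream_count):
    """Return the absolute difference and the difference as a percentage of the Search API count."""
    diff = stream_count - search_count
    pct = 100 * diff / search_count
    return diff, round(pct, 1)


# Counts for October 14th as reported in this section.
hongkong = compare(13491, 14366)
occupyhk = compare(3034, 3229)
```

For both hashtags the Streaming API count is roughly 6 percent above the Search API count, which suggests that the gap is proportional rather than specific to one hashtag, although two examples are not enough to generalize.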
Comparison between top 10 mention frequency within Search and Streaming APIs on the 13th
|Search API||Stream API|
|1. hkdemonow (1034)||1. hkdemonow (1101)|
|2. youngposthk (864)||2. youngposthk (901)|
|3. fion_li (688)||3. fion_li (720)|
|4. SCMPVideoMoJo (451)||4. SCMPVideoMoJo (470)|
|5. BBCBreaking (439)||5. BBCBreaking (466)|
|6. leungfaye (418)||6. galileo44 (448)|
|7. galileo44 (414)||7. leungfaye (432)|
|8. PenguinSix (341)||8. freakingcat (366)|
|9. freakingcat (309)||9. PenguinSix (356)|
|10. Zuki_Zucchini (290)||10. Zuki_Zucchini (301)|
The top 10 lists of both Twitter APIs include exactly the same mentions, in the same order from the first place to the fifth. The main difference is that the 6th and 7th places, and the 8th and 9th places, are switched between the Streaming and Search APIs: leungfaye and galileo44 swap places, as do PenguinSix and freakingcat. The frequency is higher in the Streaming API.
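The position switches can also be found programmatically by comparing the rank of each mention in the two lists. The following is a minimal sketch with the user names from the table for the 13th; it assumes that the left column of the table is the Search API list and the right column the Streaming API list, and the function name `rank_shifts` is hypothetical.

```python
# Top 10 mentioned users on October 13th, in rank order.
search = ["hkdemonow", "youngposthk", "fion_li", "SCMPVideoMoJo", "BBCBreaking",
          "leungfaye", "galileo44", "PenguinSix", "freakingcat", "Zuki_Zucchini"]
stream = ["hkdemonow", "youngposthk", "fion_li", "SCMPVideoMoJo", "BBCBreaking",
          "galileo44", "leungfaye", "freakingcat", "PenguinSix", "Zuki_Zucchini"]


def rank_shifts(a, b):
    """Map each item found in both lists to (rank in a, rank in b) if the ranks differ."""
    rank_in_b = {name: i + 1 for i, name in enumerate(b)}
    return {
        name: (i + 1, rank_in_b[name])
        for i, name in enumerate(a)
        if name in rank_in_b and rank_in_b[name] != i + 1
    }


shifts = rank_shifts(search, stream)
```

Here `shifts` contains only the four users whose positions differ, each with its Search API rank followed by its Streaming API rank.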
Comparison between top 10 mention frequency within Search and Streaming APIs on the 14th
|Search API||Stream API|
|1. hkdemonow (1765)||1. hkdemonow (1866)|
|2. freakingcat (840)||2. freakingcat (866)|
|3. fion_li (778)||3. fion_li (812)|
|4. SCMP_News (667)||4. SCMP_News (702)|
|5. youngposthk (568)||5. youngposthk (592)|
|6. JournoDannyAsia (398)||6. JournoDannyAsia (420)|
|7. Zuki_Zucchini (398)||7. Zuki_Zucchini (362)|
|8. nytchinese (301)||8. nytchinese (312)|
|9. wilfredchan (300)||9. wilfredchan (311)|
|10. george_chen (286)||10. george_chen (308)|
Both the Search and Streaming APIs include the same mentions in their top 10 lists, in exactly the same order. In this case too, the Streaming API's frequency is generally higher than the Search API's.
In order to make the comparison as visible as possible, this section will include descriptive tables.
Comparison between top 10 retweets within Search and Streaming APIs on the 13th
Streaming API
|RT @BBCBreaking: #HongKong #OccupyCentral - police say their goal is to clear road blocks to restore traffic & not to clear demonstrators||253|
|RT @cnnireport: Its been two weeks and protesters are still sleeping in the streets of #HongKong: http://t.co/QHZs6RVZyI http://t.co/yaeb||210|
|RT @aguerosergiokun: #HongKong http://t.co/19Gr41GJc5||202|
|RT @BBCBreaking: #HongKong police begin removing barricades erected by pro-democracy protesters||190|
|RT @BBCNewsAsia: Clashes between #HongKong pro-democracy activists and #OccupyHK opponents http://t.co/3gpMw1Q9Ly http://t.co/qhb6f9iFHs||169|
|RT @gloomynews: 香港の民主占拠デモ現場にマスク姿の反占拠デモ隊が大量乱入、反占拠派タクシー集団がバリケード突破目指し突入、親北京デモ隊が大規模行進開始との情報。 RT @SCMPVideoMoJo #HongKong #OccupyCentral http://||166|
|RT @adamnajberg: When the Hong Kong police take away metal barricades #OccupyCentral protesters build their own. http://t.co/3cra1RL3AJ||157|
|RT @BBCWorld: Clashes in #Hong Kong as masked men move in on #OccupyCentral protesters http://t.co/wcqtgpTtuX||131|
|RT @JeromeTaylor: These are the kind of people Chinese state media have called radicals & thugs #HongKongProtests http://t.co/8yFfghAlu7||129|
|RT @SCMPChinese: #OccupyCentral 【學聯港府明對話或難實現 學聯望港府今午後決定】有港媒引述消息稱，雙方在明天下午實現對話的可能性很微，學聯常委方志信表示，希望港府在今天下午之前有所決定。http://t.co/80ieCMhzbu http://||122|
Search API
|RT @BBCBreaking: #HongKong #OccupyCentral - police say their goal is to clear road blocks to restore traffic & not to clear demonstrators||241|
|RT @cnnireport: Its been two weeks and protesters are still sleeping in the streets of #HongKong: http://t.co/QHZs6RVZyI http://t.co/yaeb||185|
|RT @BBCBreaking: #HongKong police begin removing barricades erected by pro-democracy protesters||177|
|RT @aguerosergiokun: #HongKong http://t.co/19Gr41GJc5||169|
|RT @BBCNewsAsia: Clashes between #HongKong pro-democracy activists and #OccupyHK opponents http://t.co/3gpMw1Q9Ly http://t.co/qhb6f9iFHs||163|
|RT @gloomynews: 香港の民主占拠デモ現場にマスク姿の反占拠デモ隊が大量乱入、反占拠派タクシー集団がバリケード突破目指し突入、親北京デモ隊が大規模行進開始との情報。 RT @SCMPVideoMoJo #HongKong #OccupyCentral http://||161|
|RT @adamnajberg: When the Hong Kong police take away metal barricades #OccupyCentral protesters build their own. http://t.co/3cra1RL3AJ||147|
|RT @SCMPChinese: #OccupyCentral 【學聯港府明對話或難實現 學聯望港府今午後決定】有港媒引述消息稱，雙方在明天下午實現對話的可能性很微，學聯常委方志信表示，希望港府在今天下午之前有所決定。http://t.co/80ieCMhzbu http://||123|
|RT @BBCWorld: Clashes in #Hong Kong as masked men move in on #OccupyCentral protesters http://t.co/wcqtgpTtuX||120|
|RT @WilliamsJon: Perhaps most incredible photo of #HongKong you will ever see: protests last night via @hkdemonow http://t.co/hSuYMXHTCF||109|
The first two positions are the same in both APIs. The retweets in third and fourth place are switched between the Search and the Streaming APIs. The fifth, sixth and seventh places are the same in both lists.
The eighth place of the Streaming API list appears in ninth place on the Search API list (@BBCWorld). The ninth place of the Streaming API list is not on the Search API list (@JeromeTaylor). The tenth place of the Streaming API list appears in eighth place on the Search API list (@SCMPChinese). The tenth place of the Search API list is not included in the Streaming API list (@WilliamsJon).
Comparison between top 10 Retweets on Search and Streaming APIs from the 14th
Streaming API
|RT @nytchinese: 周一，香港数百反占中人士试图拆除路障时与示威者爆发冲突，并指责美国背后指使。By @ChuBailiang @PekingMike #OccupyCentral #HongKong http://t.co/Ppc7EqqxHf||132|
|RT @JohnSaeki: Barricades give the finger in #hongkong http://t.co/Q5YJ2b4PnF||103|
|RT @fion_li: Police removing reinforced barricades at Queensway #OccupyCentral #occupyhk http://t.co/bUFGdQ97A3||84|
|RT @george_chen: BREAKING: Pro-Beijing anti-#OccupyCentral protesters tried to block @nytimes HK distributions http://t.co/M3mTkGWf3P http||84|
|RT @VivienneChow: Together #HKers guard their city. Pic via Wan Leung #OccupyCentral #UmbrellaMovement #art #culture #hope #HongKong http:||78|
|RT @tomgrundy: Barriers being reinforced in tunnel #OccupyHK #Occupycentral http://t.co/tvQXF5kQ8p||71|
|RT @arabthomness: #Hongkong: scene from Hong Kong tonight after police tried to take down road blocks. #OccupyHK #UmbrellaRevolution http:/||70|
|RT @wilfredchan: just happened: protesters successfully hold off riot police in Lung Wo Road with umbrellas barricades #OccupyCentral http||64|
|RT @PhelimKine: #China govt mouthpiece People's Daily gives #HongKong #OccupyCentral a #Tiananmen era warning http://t.co/SNHuI6B0Qr http:/||62|
|RT @cronicaweb: ¡Goool de #Argentina! Paliza al poderosísimo #HongKong. Gaitán con un terrible zurdazo pone el partido 3-0... #Ohhhhhhhh||61|
Search API
|RT @nytchinese: 周一，香港数百反占中人士试图拆除路障时与示威者爆发冲突，并指责美国背后指使。By @ChuBailiang @PekingMike #OccupyCentral #HongKong http://t.co/Ppc7EqqxHf||128|
|RT @JohnSaeki: Barricades give the finger in #hongkong http://t.co/Q5YJ2b4PnF||98|
|RT @christineparis9: Это гениально ❗️Баррикады в Гонконге 😂 #Гонконг #HongKong @UmbrellaRevHK http://t.co/deuZwnBsBW||87|
|RT @george_chen: BREAKING: Pro-Beijing anti-#OccupyCentral protesters tried to block @nytimes HK distributions http://t.co/M3mTkGWf3P http||81|
|RT @fion_li: Police removing reinforced barricades at Queensway #OccupyCentral #occupyhk http://t.co/bUFGdQ97A3||78|
|RT @VivienneChow: Together #HKers guard their city. Pic via Wan Leung #OccupyCentral #UmbrellaMovement #art #culture #hope #HongKong http:||76|
|RT @tomgrundy: Barriers being reinforced in tunnel #OccupyHK #Occupycentral http://t.co/tvQXF5kQ8p||68|
|RT @arabthomness: #Hongkong: scene from Hong Kong tonight after police tried to take down road blocks. #OccupyHK #UmbrellaRevolution http:/||65|
|RT @wilfredchan: just happened: protesters successfully hold off riot police in Lung Wo Road with umbrellas barricades #OccupyCentral http||63|
|RT @cronicaweb: ¡Goool de #Argentina! Paliza al poderosísimo #HongKong. Gaitán con un terrible zurdazo pone el partido 3-0... #Ohhhhhhhh||58|
The first two positions on both top 10 lists show the same retweets, namely the ones from nytchinese and JohnSaeki. The remaining retweets appear in both lists, in some cases in a different order, and the 10th position in both lists is a retweet from the same user, cronicaweb. There are two exceptions:
Retweet from christineparis9 (3rd position) appears only on the Search API
Retweet from PhelimKine (9th position) appears only on the Streaming API
As in the case of hashtags, one retweet appears here because of the soccer game between Argentina and Hong Kong.
Comparison between top 10 Users on Search and Streaming APIs from the 13th
|Search API||Stream API|
|from_user_name (tweets in data set)||from_user_name (tweets in data set)|
|1. BoomboomFengur (344)||1. BoomboomFengur (448)|
|2. hongkongcang (263)||2. rightnowio_feed (267)|
|3. FollowHKNews (255)||3. hongkongcang (263)|
|4. rightnowio_feed (247)||4. FollowHKNews (255)|
|5. hk928umbrella (241)||5. hk928umbrella (242)|
|6. GodBlessFreedom (193)||6. iamthor_us (199)|
|7. iamthor_us (193)||7. Daoish (198)|
|8. askabear81 (182)||8. askabear81 (182)|
|9. tax_free (173)||9. tax_free (175)|
|10. kelvw (172)||10. kelvw (172)|
As the table shows, there is a 90 percent match between the users of the Streaming and Search APIs on the 13th. Apart from one differing user in each list, the users are the same and only their order changes.
The first user is the same in both cases. The second user of the Search API appears in third place on the Streaming API list; as a result, the third place of the Search API moves to fourth place on the Streaming API list. The fifth place is the same in both APIs. The user in sixth place on the Search API list appears only there, while the sixth place of the Streaming API list is in seventh place on the Search API list. The 8th, 9th and 10th places are the same in both APIs.
Comparison between top 10 Users on Search and Streaming APIs from the 14th
|Search API||Stream API|
|from_user_name (tweets in data set)||from_user_name (tweets in data set)|
|1. hk928umbrella (389)||1. hk928umbrella (393)|
|2. freakingcat (338)||2. BoomboomFengur (339)|
|3. FollowHKNews (324)||3. freakingcat (338)|
|4. BoomboomFengur (288)||4. FollowHKNews (324)|
|5. hongkongcang (208)||5. rightnowio_feed (255)|
|6. tax_free (192)||6. hongkongcang (208)|
|7. hkdemonow (189)||7. hkdemonow (194)|
|8. kelvw (182)||8. tax_free (193)|
|9. akiba2013 (176)||9. kelvw (184)|
|10. Winter_IceCream (171)||10. akiba2013 (178)|
As can be seen, 9 out of 10 users are the same for both the Search and Streaming APIs. The user hk928umbrella is first on both lists. Only one user, Winter_IceCream, is shown by the Search API and not by the Streaming API, and one user, rightnowio_feed, appears on the Streaming API list but not on the Search API list.
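The overlap between the two user lists can be checked with set operations. The sketch below uses the user names from the table for the 14th; the variable names are hypothetical and the calculation only illustrates how the 9 out of 10 figure is obtained.

```python
# Top 10 users on October 14th, in rank order, as listed in the table.
search_users = ["hk928umbrella", "freakingcat", "FollowHKNews", "BoomboomFengur",
                "hongkongcang", "tax_free", "hkdemonow", "kelvw",
                "akiba2013", "Winter_IceCream"]
stream_users = ["hk928umbrella", "BoomboomFengur", "freakingcat", "FollowHKNews",
                "rightnowio_feed", "hongkongcang", "hkdemonow", "tax_free",
                "kelvw", "akiba2013"]

# Users present in both lists, and users unique to each list.
shared = set(search_users) & set(stream_users)
only_search = set(search_users) - set(stream_users)
only_stream = set(stream_users) - set(search_users)

# Share of the Search API top 10 that also appears in the Streaming API top 10.
overlap_pct = 100 * len(shared) / len(search_users)
```

Set operations ignore the order of the lists, so this check complements a comparison of positions and does not replace it.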
The main purpose of the present report was to identify the differences between the datasets from the Streaming API and the Search API that were retrieved for the main research project on the Hong Kong protests. We were assigned the dates October 13th and 14th, 2014. Our analysis showed that the datasets from the two APIs for the 13th and 14th did not show relevant differences.
In the particular case of hashtags, for October 13th, the Streaming and the Search API showed nine out of ten hashtags in common, although in a different order. On the 14th, the first seven of the ten hashtags were the same in both lists. We also noticed that the Streaming API showed a much higher frequency than on other days. With regard to mentions, the lists matched on both days and showed the same results; on the 13th, two pairs of positions were switched, but everything else remained in the same order. In the case of retweets, the majority remained in the same place of the top 10 list in both APIs. Finally, similar results appeared in the case of users: on both the 13th and the 14th, 90 percent of the users were the same in both APIs, so there was only one change in the top 10 users. Apart from this, there were small changes in the order within the lists, but these do not amount to major differences between the APIs. The Streaming API's frequency remained higher, although the difference was generally smaller.
Borra, E. and Rieder, B. (2014). "Programmed method: developing a toolset for capturing and analyzing tweets." Aslib Journal of Information Management, 66(2): 262-278.
Kwak, H., Lee, Ch., Park, H. and Moon, S. (2010). "What is Twitter, a social network or a news media?" Proceedings of the 19th International Conference on World Wide Web. New York: 591-600.
Rogers, R. (2013). "Debanalizing Twitter: The Transformation of an Object of Study." Proceedings of ACM Web Science 2013. Paris: May 2013.
Twitter. "Twitter Reports Second Quarter 2014 Results." Investor Twitterinc. December 2, 2014. January 15, 2014 <https://investor.twitterinc.com/releasedetail.cfm?ReleaseID=862505>.