Comments for the Author:
Reviewer #1: Summary:
This work provides a sentiment-based analysis of tweets during the 2019 Chinese National Day. The study includes both English and Chinese tweets and explores the differences in sentiment across time and location. Additionally, it presents some linguistic features to determine which topics drive the reported sentiment.
Reasons to reject:
- The paper lacks technical depth. No novel methods are introduced.
- There are some design issues in the labeling procedure which compromise the results of the study.
- It is unclear whether the reported classifier performance is on training, testing, or unseen data, which affects all other experiments conducted.
Detailed comments:
- The claim in line 31 that online opinions during specific events are "supposed to be more explicit and polarized" lacks evidence. Although there may be advantages to event-centered analysis, these need to be elaborated more clearly.
- In the dataset collection, the authors need to expand on how they selected the 59 hashtags. There are 29 hashtags for English and 15 each for simplified and traditional Chinese; however, the two Chinese sets are identical. It would be better to use a data-driven approach to select the top hashtags in each subset so that the collection better matches online behavior.
- More details are required on the labelers' backgrounds. In particular, are they part of the team working on this project? Additionally, there is always a gap when rating sentiment in a non-native language, which may have compromised the sentiment labeling of the English tweets.
- Table 5 and the description on page 6, lines 47 to 55, point towards several issues in the labeling setup. If the labeling had been properly conducted, there would be no need to prove this point. Moreover, individual differences cannot be directly attributed to gender, major, or nationality. In any case, this evaluation should have been performed on every labeler to check whether any of them holds a divergent opinion.
- Which dictionaries are used in combination with the SVM? The authors only report adjusting them manually, but they need to provide details on which kinds of terms are included and for what purpose.
- It is unclear on which data the classifiers were evaluated. Section 4.1 states that the SVM is trained on the labeled dataset. Is the reported accuracy measured on the training set? If so, this does not constitute a proper validation of the classifiers' performance. Additionally, no details are provided on the distribution of sentiment classes in the labeled dataset. The data may be imbalanced, and any resulting weakness cannot be seen until the classifiers are evaluated on a test or otherwise unseen dataset.
- In the introductory paragraphs, the authors emphasize how public opinion can shape international relations. However, in the spatial analysis section they rely solely on location. The location of a user is not necessarily their country of nationality or even residency, so it is not correct to present the results as if they were "sentiment by each country". It is also not shown that 2000 tweets are enough to yield meaningful results, as the authors claim.
- China and the United States consistently produce the most positive and the most negative tweets in both English and Chinese. This may indicate that further statistical analysis is required, particularly in terms of normalization and accounting for inherent properties of the Twitter platform.
- There is no evidence to support the claims on lines 44 and 45 on page 14.
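To make the evaluation concern above concrete (accuracy reported without a held-out set or class distribution), the following is a minimal sketch, not the authors' pipeline, of a stratified hold-out split with a majority-class baseline. The 70/20/10 class proportions and the function names `stratified_split` and `majority_baseline_accuracy` are hypothetical and chosen only for illustration; the paper does not report the actual label distribution.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling the test set per class
    so that class proportions are preserved in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one test example per class.
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(y == majority for y in test_labels) / len(test_labels)


# Hypothetical, deliberately imbalanced label set (assumption, not from the paper).
labels = ["positive"] * 700 + ["neutral"] * 200 + ["negative"] * 100

train_idx, test_idx = stratified_split(labels)
train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

baseline = majority_baseline_accuracy(train_labels, test_labels)
print(f"held-out size: {len(test_labels)}, majority baseline: {baseline:.2f}")
```

Under these assumed proportions, a classifier that always predicts the majority class already reaches 0.70 accuracy on the held-out set, so a reported accuracy is only informative when it is measured on data not used for training and compared against such a baseline.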
Reviewer #2: This article collects a Twitter dataset of opinions about China's National Day 2019. The authors invited four labelers to tag 1,000 English tweets and 1,000 Chinese tweets, determining whether each tweet is relevant and indicating its sentiment type. Based on this training set, they used an SVM to classify the remaining tweets. They then provided extensive analysis based on all of these tweets. In essence, they report that different languages (Chinese and English) and different countries may show different sentiment towards the Chinese National Day activities. In addition, they compared common words in positive and negative tweets.
I think the contribution of this article is very limited. Its most serious problem is that it makes no technical contribution. The methods used in the analysis are well-known existing methods such as text preprocessing, SVM, and visual display, so the technical depth is shallow. In addition, sentiment towards China's National Day is not a key issue that deserves dedicated study by social network researchers.