--======== Review Reports ========--
The review report from reviewer #1:
*1: Is the paper relevant to Bigdata?
[_] No
[X] Yes
*2: How innovative is the paper?
[_] 5 (Very innovative)
[_] 4 (Innovative)
[X] 3 (Marginally)
[_] 2 (Not very much)
[_] 1 (Not)
[_] 0 (Not at all)
*3: How would you rate the technical quality of the paper?
[_] 5 (Very high)
[_] 4 (High)
[_] 3 (Good)
[X] 2 (Needs improvement)
[_] 1 (Low)
[_] 0 (Very low)
*4: How is the presentation?
[_] 5 (Excellent)
[X] 4 (Good)
[_] 3 (Above average)
[_] 2 (Below average)
[_] 1 (Fair)
[_] 0 (Poor)
*5: Is the paper of interest to Bigdata users and practitioners?
[X] 3 (Yes)
[_] 2 (May be)
[_] 1 (No)
[_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
[X] 2 (High)
[_] 1 (Medium)
[_] 0 (Low)
*7: Overall recommendation
[_] 5 (Strong Accept: top quality)
[_] 4 (Accept: a regular paper)
[_] 3 (Weak Accept: could be a poster or a short paper)
[_] 2 (Weak Reject: don't like it, but won't argue to reject it)
[X] 1 (Reject: will argue to reject it)
[_] 0 (Strong Reject: hopeless)
*8: Detailed comments for the authors
This paper touches upon an interesting topic, namely the mapping of US-China disputes as captured in the Twitter stream. The authors seem to be inspired by political-science studies that seek to understand the changing dynamics of US-China relations through manual discourse analysis of news articles, and try to take advantage of big data analysis of Twitter data to derive insights on this issue.
However, the study presented in the paper is still at a very early stage, the methodology followed has several drawbacks, and the results presented are not significant. In particular:
1) Your analysis uses a sample of the Twitter stream, selected on the basis of a number of hashtags. However, many tweets relevant to the topics explored may not use these particular hashtags. Twitter users often use synonyms, acronyms, or slang, and filtering on a wider selection of keywords could produce a more representative dataset. How can you extend the number of collected tweets? Also, how do you overcome the limitations that the Twitter API places on the messages it returns in response to keyword queries (I assume you are not using the Firehose)?
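A wider keyword-based filter of the kind suggested here could be sketched as follows. The keyword groups and matching rule below are purely illustrative assumptions, not the authors' actual collection criteria:

```python
# Illustrative synonym/alias sets; the real terms would need to be
# curated for the US-China tech-dispute topic studied in the paper.
KEYWORD_GROUPS = {
    "huawei": {"huawei", "#huawei"},
    "tiktok": {"tiktok", "#tiktok", "tik tok", "bytedance"},
    "trade_war": {"trade war", "#tradewar", "tariffs"},
}

def matches_topic(text: str) -> bool:
    """Return True if the tweet text mentions any tracked keyword,
    not just the original hashtags."""
    lowered = text.lower()
    return any(
        term in lowered
        for group in KEYWORD_GROUPS.values()
        for term in group
    )

tweets = [
    "Tik Tok may be banned next month",
    "Lovely weather in Boston today",
]
relevant = [t for t in tweets if matches_topic(t)]
```

A curated synonym list like this casts a wider net than exact hashtag matching, at the cost of requiring the relevance filtering raised in point 3.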
2) Not all Twitter accounts are equal, or created equal. Therefore, to understand the dynamics of the discourse, one needs to take into account how relevant, credible, and influential an account sending relevant messages is, and where it originates from. For example, according to many studies, accounts that are very active in tweeting and retweeting are often bots. Including such accounts in your study could introduce significant biases and affect the accuracy of your conclusions. Similarly, not all accounts have the same number of followers, so your dataset generation may include tweets that very few people read. Also, many tweets are sent to circulate links to longer articles, and these articles may convey a more accurate view of a tweet's sentiment. It is not clear whether and how you deal with such cases.
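An account-level filter in the spirit of this comment might look like the sketch below. The thresholds and fields are assumptions for illustration only; serious bot detection would use a validated classifier (e.g., Botometer) rather than hand-picked cutoffs:

```python
from dataclasses import dataclass

@dataclass
class Account:
    tweets_per_day: float
    followers: int
    following: int

def looks_like_bot(acc: Account,
                   max_rate: float = 72.0,
                   min_followers: int = 10) -> bool:
    """Heuristic flag: an extremely high posting rate, or a near-empty
    audience combined with mass-following. Thresholds are illustrative."""
    if acc.tweets_per_day > max_rate:
        return True
    if acc.followers < min_followers and acc.following > 1000:
        return True
    return False

accounts = [Account(200.0, 5, 3000), Account(3.5, 800, 400)]
humans = [a for a in accounts if not looks_like_bot(a)]
```

Even a crude pre-filter like this would let the authors report how sensitive their conclusions are to the inclusion of likely-automated accounts.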
3) You are doing sentiment analysis on the contents of tweets retrieved according to your criteria. But are all these tweets relevant to your analysis? A significant number of them may not really talk about politics or business, which could affect your results. How do you deal with noise and irrelevant content?
Overall, the collection of tweets needs to be revisited so that you can be sure of the robustness of your dataset. Also, please examine Twitter's terms of use, because I am not sure you are allowed to redistribute a dataset collected from Twitter without violating them.
4) The analysis methodology is not sufficiently developed: there are no research hypotheses or theory behind this analysis, and no particular research questions are stated. Consequently, the observations presented in the paper do not reveal much, and it is not clear how this paper advances the state of the art.
========================================================
The review report from reviewer #2:
*1: Is the paper relevant to Bigdata?
[_] No
[X] Yes
*2: How innovative is the paper?
[_] 5 (Very innovative)
[_] 4 (Innovative)
[_] 3 (Marginally)
[_] 2 (Not very much)
[X] 1 (Not)
[_] 0 (Not at all)
*3: How would you rate the technical quality of the paper?
[_] 5 (Very high)
[_] 4 (High)
[_] 3 (Good)
[_] 2 (Needs improvement)
[X] 1 (Low)
[_] 0 (Very low)
*4: How is the presentation?
[_] 5 (Excellent)
[_] 4 (Good)
[_] 3 (Above average)
[_] 2 (Below average)
[X] 1 (Fair)
[_] 0 (Poor)
*5: Is the paper of interest to Bigdata users and practitioners?
[_] 3 (Yes)
[X] 2 (May be)
[_] 1 (No)
[_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
[_] 2 (High)
[X] 1 (Medium)
[_] 0 (Low)
*7: Overall recommendation
[_] 5 (Strong Accept: top quality)
[_] 4 (Accept: a regular paper)
[_] 3 (Weak Accept: could be a poster or a short paper)
[_] 2 (Weak Reject: don't like it, but won't argue to reject it)
[X] 1 (Reject: will argue to reject it)
[_] 0 (Strong Reject: hopeless)
*8: Detailed comments for the authors
The manuscript entitled “Depicting U.S.-China Disputes on Tech Giants through Social Media: An Attempt of Computational Political Communication” presents an analysis of Twitter user data covering 3 Chinese tech giants and 7 English-speaking countries. From this analysis, the authors conclude that: (1) the popularity of discussions about certain countries and companies is inconsistent and might be event-induced; and (2) the discourse around all these companies is interrelated rather than separate. Word2vec and an existing sentiment analysis model were also used in the analysis.
Although the topic is interesting, there are some major issues with the manuscript.
My main reservation about this manuscript is that this study is incomplete.
First, in the Introduction, the authors list some existing methods and claim that the approach in this manuscript is more comprehensive and real-time focused. However, I did not see any real-time analysis or models with predictive power in the manuscript, only plots of collected data as a function of time.
Second, at the end of Section III.B, when discussing Figure 2, the authors mention that “It indicates that discourse of all these companies is inter-related.” This means that word2vec does not show the power to separate the words associated with the three companies. Moreover, there are neutral words in Figure 2 that should have been filtered out, such as “were”, “me” and “did”. The authors should (1) clean the results, or do some cleaning when preprocessing the data; (2) do case studies on the resulting words; and (3) try other approaches if word2vec does not work.
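The cleaning suggested in point (1) could be done in preprocessing before training word2vec. The stopword set below is a small illustrative subset; in practice a full list such as NLTK's English stopwords would be used:

```python
# Minimal stopword filtering applied to tokenized tweets before they
# are fed to word2vec; neutral function words such as "were", "me"
# and "did" would then not appear in a plot like Figure 2.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were",
    "i", "me", "my", "you", "he", "she", "it",
    "do", "did", "does", "and", "or", "but", "of", "to", "in",
}

def clean_tokens(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

sentences = [["huawei", "was", "banned"], ["did", "tiktok", "respond"]]
cleaned = [clean_tokens(s) for s in sentences]
# the cleaned sentences would then be passed to the embedding trainer
```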
Third, when comparing Figs. 3 and 4, and Figs. 5 and 6, the authors conclude that “the rankings of the number of positive and negative tweets containing country names and company names are generally the same.” This suggests the positive and negative tweets were not actually well classified, and remain highly correlated with other common factors. More work is needed to address this issue.
Besides, the manuscript is not very innovative, as the main algorithm is an existing model (the “fuzzy rule based unsupervised sentiment analysis technique developed by Vashishtha and her colleague for analyzing social media posts”). The authors mention that they used an “adapted” version, but give no details on what was adapted or how substantial the adaptation was.
Moreover, the presentation of the manuscript is not of high quality. The axis labels in Figure 1 are too small to read, and the given link to the original figures is not valid. The same applies to Figures 2-4.
========================================================
The review report from reviewer #3:
*1: Is the paper relevant to Bigdata?
[_] No
[X] Yes
*2: How innovative is the paper?
[_] 5 (Very innovative)
[_] 4 (Innovative)
[_] 3 (Marginally)
[X] 2 (Not very much)
[_] 1 (Not)
[_] 0 (Not at all)
*3: How would you rate the technical quality of the paper?
[_] 5 (Very high)
[_] 4 (High)
[_] 3 (Good)
[_] 2 (Needs improvement)
[X] 1 (Low)
[_] 0 (Very low)
*4: How is the presentation?
[_] 5 (Excellent)
[_] 4 (Good)
[_] 3 (Above average)
[_] 2 (Below average)
[X] 1 (Fair)
[_] 0 (Poor)
*5: Is the paper of interest to Bigdata users and practitioners?
[_] 3 (Yes)
[_] 2 (May be)
[X] 1 (No)
[_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
[X] 2 (High)
[_] 1 (Medium)
[_] 0 (Low)
*7: Overall recommendation
[_] 5 (Strong Accept: top quality)
[_] 4 (Accept: a regular paper)
[_] 3 (Weak Accept: could be a poster or a short paper)
[_] 2 (Weak Reject: don't like it, but won't argue to reject it)
[X] 1 (Reject: will argue to reject it)
[_] 0 (Strong Reject: hopeless)
*8: Detailed comments for the authors
In its current state, the paper is not eligible for publication, for the reasons discussed in the following. In my opinion, the focus of this work should be sharpened, and both the abstract and the introduction should reflect it clearly.
Regardless of these issues, the scientific contribution of the paper is not significant. In the first part of the paper, the authors elaborate on the US-China political relationship and the role of the main IT and social media companies, without focusing on the paper's scientific contribution to the field of data analysis. The paper does not present a new sentiment analysis methodology or algorithm, but uses an existing technique to extract some results about the sentiment polarization of social media users. However, such results are poorly described and, in addition, the proposed charts are too small and practically illegible. Finally, no details are provided about the tests that were executed, e.g., what algorithm settings were used in the experiments?
========================================================
The review report from reviewer #4:
*1: Is the paper relevant to Bigdata?
[_] No
[X] Yes
*2: How innovative is the paper?
[_] 5 (Very innovative)
[_] 4 (Innovative)
[_] 3 (Marginally)
[X] 2 (Not very much)
[_] 1 (Not)
[_] 0 (Not at all)
*3: How would you rate the technical quality of the paper?
[_] 5 (Very high)
[_] 4 (High)
[_] 3 (Good)
[_] 2 (Needs improvement)
[X] 1 (Low)
[_] 0 (Very low)
*4: How is the presentation?
[_] 5 (Excellent)
[_] 4 (Good)
[_] 3 (Above average)
[X] 2 (Below average)
[_] 1 (Fair)
[_] 0 (Poor)
*5: Is the paper of interest to Bigdata users and practitioners?
[_] 3 (Yes)
[X] 2 (May be)
[_] 1 (No)
[_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
[X] 2 (High)
[_] 1 (Medium)
[_] 0 (Low)
*7: Overall recommendation
[_] 5 (Strong Accept: top quality)
[_] 4 (Accept: a regular paper)
[_] 3 (Weak Accept: could be a poster or a short paper)
[_] 2 (Weak Reject: don't like it, but won't argue to reject it)
[X] 1 (Reject: will argue to reject it)
[_] 0 (Strong Reject: hopeless)
*8: Detailed comments for the authors
Detailed comments:
1. The abbreviation VICS (The Verb In Context System) is used in the abstract, but it is only introduced on page 2.
2. The figures cannot be examined well. Figure 1 is a pure image; its details cannot be made out even after zooming in.
a. In Figure 1, could a country name be added as a column label for the sub-figures, so that readers do not need to count?
b. Could a row be added identifying the company, to help readers quickly find which company and country were harvested?
c. Are all sub-figures on the same scale?
d. Does left mean more negative and right more positive?