The explosion of social media in e-commerce, including service portals, websites,
social networks, and online entertainment channels, has created fertile ground for
researchers to study customers. Moreover, with the spread of the Industry 4.0 technology
revolution, machine learning is considered a very useful tool for online business
forecasting and analysis problems. Based on these two trends, the paper proposes a way to
detect the topics of interest of online customers, which can be applied to customer data
analysis, forecasting, or recommendation systems. The approach is based on analyzing
historical customer data. The goal is to analyze and classify topics of interest to
customers using a number of supervised machine learning algorithms.
77.14 95.45 61.48 84.22 83.03 96.09 67.35
comp.graphics 81.48 67.33 90.00 56.93 82.73 76.60 91.69 72.65
comp.os.ms-windows.misc 81.14 65.91 87.16 58.07 84.15 55.35 87.44 72.62
comp.sys.ibm.pc.hardware 78.62 71.25 87.73 65.45 79.10 79.99 90.16 71.84
comp.sys.mac.hardware 73.52 72.37 90.57 63.07 71.50 80.77 92.20 62.03
comp.windows.x 80.97 73.25 92.73 58.30 81.55 80.65 93.76 72.94
misc.forsale 83.36 76.25 91.14 61.14 83.26 83.12 92.59 71.63
rec.autos 79.28 75.91 93.86 59.66 82.49 78.30 94.78 70.20
rec.motorcycles 84.32 80.42 95.45 62.16 86.26 84.77 96.12 74.78
rec.sport.baseball 82.81 70.57 96.82 63.18 82.76 79.76 97.28 70.14
rec.sport.hockey 87.27 70.84 97.95 66.14 88.68 79.66 98.24 72.56
sci.crypt 84.66 65.11 94.43 61.59 86.38 76.57 95.30 65.59
sci.electronics 78.72 75.91 91.36 57.84 82.82 83.03 92.74 67.31
sci.med 82.27 63.64 93.30 61.82 84.49 75.68 94.34 70.97
sci.space 81.93 72.27 95.91 66.48 83.50 80.26 96.46 80.80
soc.religion.christian 85.80 62.00 98.07 72.95 88.18 74.95 98.33 77.73
talk.politics.guns 79.98 71.02 94.43 76.14 83.50 78.88 95.24 75.64
talk.politics.mideast 80.57 69.08 96.82 65.23 81.36 77.85 97.26 68.31
talk.politics.misc 75.64 72.16 87.61 69.66 78.96 80.25 90.12 68.83
talk.religion.misc 79.25 75.10 93.07 70.57 82.91 82.07 94.15 71.35
Average of labels 81.12 71.38 93.19 63.89 82.94 78.58 94.21 73.15
Table 4 reports the results on Semeval2017 in terms of Accuracy and F1-score. The MNB
algorithm reaches the highest accuracy on 4/4 labels. Averaged over all labels, MNB also
scores highest, followed by W2V, CNN and K-NN on Accuracy, and by W2V, K-NN and CNN on
F1-score.
Table 4: Results of Semeval2017 with Accuracy and F1-score
Labels             Accuracy                    F1-score
                   CNN    W2V    MNB    K-NN   CNN    W2V    MNB    K-NN
anger              64.04  66.18  78.67  53.47  59.69  69.58  79.71  69.56
fear               59.69  66.36  76.12  56.22  54.05  66.99  77.27  40.70
joy                65.18  72.81  78.47  60.41  55.39  75.74  79.45  68.59
sadness            62.08  65.65  78.67  55.61  61.54  71.56  80.26  61.70
Average of labels  62.75  67.75  77.98  56.43  57.66  70.97  79.17  67.94
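Per-label Accuracy and F1-score figures of the kind shown in Table 4 can be obtained by
treating each label one-vs-rest. The sketch below assumes that convention, which this
section does not state explicitly. The function names and the toy gold labels and
predictions are hypothetical and are not the paper's data.

```python
def per_label_scores(y_true, y_pred):
    """Return {label: (accuracy, f1)} in percent, treating each label one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        tn = n - tp - fp - fn
        accuracy = (tp + tn) / n
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (100 * accuracy, 100 * f1)
    return scores


def average_of_labels(scores):
    """Unweighted mean over labels, as in an 'Average of labels' row."""
    accs = [acc for acc, _ in scores.values()]
    f1s = [f1 for _, f1 in scores.values()]
    return sum(accs) / len(accs), sum(f1s) / len(f1s)


# Hypothetical gold labels and predictions (not taken from Semeval2017).
y_true = ["anger", "fear", "joy", "sadness", "joy", "anger", "fear", "sadness"]
y_pred = ["anger", "joy", "joy", "sadness", "joy", "fear", "fear", "anger"]

scores = per_label_scores(y_true, y_pred)
for label, (acc, f1) in scores.items():
    print(f"{label:10s} accuracy={acc:6.2f} f1={f1:6.2f}")
print("average:", average_of_labels(scores))
```

Because true negatives count toward one-vs-rest accuracy, a label can show high accuracy
with a low F1-score, which is one reason to read both columns together.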
Table 5 reports the results on the Sample Vietnamese dataset in terms of Accuracy and
F1-score. The MNB algorithm reaches the highest accuracy on 9/10 labels (CNN is higher on
the Thể thao (Sports) label). Averaged over all labels, MNB scores highest, followed by
CNN, W2V and K-NN on Accuracy, and by W2V, K-NN and CNN on F1-score.
Table 5: Results of Sample Vietnamese with Accuracy and F1-score
Labels                                        Accuracy                    F1-score
                                              CNN    W2V    MNB    K-NN   CNN    W2V    MNB    K-NN
Chính trị (Politics)                          67.14  72.14  77.14  64.29  26.29  71.12  75.08  66.06
Đời sống - Xã hội (Life - Society)            68.57  50.00  76.43  64.29  28.90  58.94  68.58  45.84
Giáo dục (Education)                          68.57  55.48  72.86  46.43  60.63  60.98  69.82  63.37
Khoa học - Công nghệ (Science - Technology)   59.29  59.29  67.14  36.43  43.71  38.94  54.30  53.14
Kinh doanh (Business)                         67.14  63.57  75.71  58.57  26.10  54.62  56.29  36.52
Thời sự (Current affairs)                     62.14  47.86  62.86  41.43  44.54  47.32  31.57  18.31
Văn hóa - Giải trí (Culture - Entertainment)  67.86  65.71  76.43  36.43  56.16  61.14  58.12  63.86
Pháp luật (Law)                               75.71  85.00  85.71  64.29  54.43  79.49  75.86  70.72
Thể thao (Sports)                             85.71  82.75  84.29  36.43  71.66  64.27  69.91  63.86
Sức khỏe (Health)                             75.71  70.71  85.71  59.29  66.60  66.64  78.41  60.11
Average of labels                             69.79  65.25  76.43  50.79  47.90  60.34  63.79  54.18
5. Conclusions and policy implications
5.1. Policy implications
The use of machine learning is one of the changes that will make people work differently
and will reshape business environments in the future. It also marks a major difference
between Data Science and Business Data Analytics, which links this discussion to the
previous part.
In this article, text data from social media are analyzed to study customers worldwide.
The collected text data are modeled with two approaches: use-centric and object-centric.
Text data from social media are modeled because textual information can often be noisy
and coarse. Four supervised machine learning algorithms, CNN, MNB, W2V and K-NN, are
trained in WEKA to check the effectiveness of our representation. The text data are
analyzed to find popular customer topics, which are then categorized. The results indicate
that the methodology can be used in the development of information filtering and
prediction systems. The proposed methodology can also be used to find customer interests
and applied to business problems such as page ranking, collaborative filtering, automatic
translation of documents, security applications, named entity recognition, speech
recognition, classification problems, etc.
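To make the MNB (Multinomial Naive Bayes) classifier more concrete, the following is a
minimal from-scratch sketch in Python. It is not the WEKA implementation used in the
experiments: the whitespace tokenizer, the add-one smoothing constant, the class name and
the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes over whitespace tokens."""

    def __init__(self, alpha=1.0):
        # Smoothing constant (assumed value; 1.0 is add-one / Laplace smoothing).
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.class_counts
        }
        return self

    def log_scores(self, doc):
        """Return log P(class) + sum of log P(word | class) for each class."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)
            denom = self.total_words[c] + self.alpha * vocab_size
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][word] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy corpus; the label names only mimic newsgroup-style topics.
docs = [
    "cheap phone for sale",
    "new laptop for sale",
    "team wins hockey game",
    "hockey team loses game",
]
labels = ["misc.forsale", "misc.forsale", "rec.sport.hockey", "rec.sport.hockey"]

model = MultinomialNB().fit(docs, labels)
print(model.predict("hockey team game"))
print(model.predict("laptop sale"))
```

Working in log space avoids numerical underflow when many small word probabilities are
multiplied, and the smoothing term keeps a single unseen word from forcing a class
probability to zero.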
The following steps describe how to apply machine learning in a business. First,
understand the difference between Artificial Intelligence and Machine Learning (ML). ML is
a subfield of Artificial Intelligence in which a model is trained on a large amount of
data to make predictions. ML can help automate daily human processes and support decisions
and judgments. Second, study your business processes and identify which of them can be
ML-enabled. Third, collect data and extract features, which are the keys to machine
learning; the best practice is to store all data in a database for better future analysis
and management. Fourth, find the best model: with training data in hand, run different
models and tests to find the model that performs best on that data. Fifth, verify the
accuracy of the model. Finally, the last and most important step is to measure the ROI of
the whole machine learning implementation.
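The fourth and fifth steps (trying several models and verifying accuracy) can be
sketched as a small cross-validated comparison. This is an illustrative sketch, not the
WEKA workflow used in the paper: the two toy models, the fold scheme, the function names
and the example documents are all hypothetical.

```python
from collections import Counter


class MajorityClass:
    """Baseline: always predict the most frequent training label."""

    def fit(self, docs, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, doc):
        return self.label


class NearestNeighbour:
    """1-NN using Jaccard overlap between token sets."""

    def fit(self, docs, labels):
        self.train = [(set(d.lower().split()), l) for d, l in zip(docs, labels)]
        return self

    def predict(self, doc):
        tokens = set(doc.lower().split())

        def similarity(item):
            union = tokens | item[0]
            return len(tokens & item[0]) / len(union) if union else 0.0

        return max(self.train, key=similarity)[1]


def cross_val_accuracy(make_model, docs, labels, k=3):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    fold_scores = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))
        train_docs = [d for j, d in enumerate(docs) if j not in test_idx]
        train_labels = [l for j, l in enumerate(labels) if j not in test_idx]
        model = make_model().fit(train_docs, train_labels)
        correct = sum(model.predict(docs[j]) == labels[j] for j in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / k


# Hypothetical labeled documents.
docs = [
    "phone for sale cheap",
    "hockey team wins game",
    "laptop for sale today",
    "baseball team wins game",
    "camera for sale cheap",
    "hockey team loses game",
]
labels = ["shopping", "sport", "shopping", "sport", "shopping", "sport"]

candidates = {"majority": MajorityClass, "1-nn": NearestNeighbour}
results = {name: cross_val_accuracy(cls, docs, labels) for name, cls in candidates.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

Scoring every candidate on data it was not trained on is what makes the comparison
meaningful; a trivial baseline such as the majority-class model gives a floor that any
useful model should beat.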
Machine learning algorithms are also integrated into data analysis tools. R, a
programming language developed by Ross Ihaka and Robert Gentleman in 1993, offers an
extensive catalog of statistical and graphical methods, including machine learning
algorithms, linear regression, time series, and statistical inference, to name a few.
Python, used for data analysis, interactive computing, and data visualization, inevitably
draws comparisons with other open source and commercial programming languages and tools in
wide use, such as R, MATLAB, SAS, and Stata. In recent years, Python's improved library
support (such as pandas and scikit-learn) has made it a popular choice for data analysis
tasks. Combined with its overall strength for general-purpose software engineering, it is
an excellent option as a primary language for building data applications.
5.2. Conclusion
This paper considered the problem of classifying the topics of interest of online
customers. Three datasets of labeled text content were built and introduced, one in
Vietnamese and two in English. The experimental results show that the MNB algorithm gives
the best results on text data from social media. The results of the paper could be applied
to customer data analysis, forecasting, or recommendation systems, which are problems of
concern to firms nowadays.