Knowledge Extraction from Twitter Towards Infectious Diseases in Spanish

  • Conference paper
Technologies and Innovation (CITI 2020)

Abstract

Infodemiology consists of the extraction and analysis of public health data compiled from the Internet. Among other applications, it can be used to analyse trends on social networks in order to determine the prevalence of infectious disease outbreaks in certain regions. Such data provide a better understanding of how infectious diseases spread, as well as insight into how citizens perceive the strategies carried out by public healthcare institutions. In this work, we apply Natural Language Processing techniques to determine the impact of outbreaks of infectious diseases such as Zika, Dengue and Chikungunya, using a compiled dataset of tweets written in Spanish.

Acknowledgements

This work has been supported by the Spanish National Research Agency (AEI) and the European Regional Development Fund (FEDER/ERDF) through projects KBS4FIA (TIN2016-76323-R) and LaTe4PSP (PID2019-107652RB-I00). In addition, José Antonio García-Díaz has been supported by Banco Santander and University of Murcia through the Doctorado industrial programme.

Author information

Corresponding author

Correspondence to Rafael Valencia-García.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Apolinario-Arzube, Ó., García-Díaz, J.A., Luna-Aveiga, H., Medina-Moreira, J., Valencia-García, R. (2020). Knowledge Extraction from Twitter Towards Infectious Diseases in Spanish. In: Valencia-García, R., Alcaraz-Marmol, G., Del Cioppo-Morstadt, J., Vera-Lucio, N., Bucaram-Leverone, M. (eds) Technologies and Innovation. CITI 2020. Communications in Computer and Information Science, vol 1309. Springer, Cham. https://doi.org/10.1007/978-3-030-62015-8_4

  • DOI: https://doi.org/10.1007/978-3-030-62015-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62014-1

  • Online ISBN: 978-3-030-62015-8

  • eBook Packages: Computer Science (R0)
