Data Science & DataLab
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data.
DataLab at Yonsei University pursues data-driven research under the slogan "Designing Science with Data." We apply data science technologies to data from various domains, such as publications, healthcare, and social media, and to data in various forms, including but not limited to relational data, text data, graph data, and electronic health record (EHR) data.
RESEARCH
Health Informatics
Health informatics is the application of information science and information technologies in the service of better health and better healthcare. We study, develop, and improve innovative information technologies in healthcare.
Science of Science
Science is an expanding and evolving network of ideas, scholars, and scholarly publications. Science of Science uses quantitative methods to understand the structure and dynamics of science, as well as the interactions among scientific entities.
PUBLICATIONS
Journal Articles
- Hao, J., Zhang, P., Che, C., Jin, B., & Zhu, Y. (2023). CariesFG: A fine-grained RGB image classification framework with attention mechanism for dental caries. Engineering Applications of Artificial Intelligence.
- Zhang, P., Chen, J., Che, C., Zhang, L., Jin, B., & Zhu, Y. (2023). IEA-GNN: Anchor-aware Graph Neural Network Fused with Information Entropy for Node Classification and Link Prediction. Information Sciences.
- Kim, D., Quan, L., Seo, M., Kim, K., Kim, J.W., & Zhu, Y. (2023). Interpretable machine learning-based approaches for understanding suicide risk and protective factors among South Korean females using survey and social media data. Suicide and Life-Threatening Behavior.
- Kim, D., Jung, W., Jiang, T., & Zhu, Y. (2023). An Exploratory Study of Medical Journal’s Twitter Use: Metadata, Networks, and Content Analyses. Journal of Medical Internet Research, 25:e43521.
- Lou, W., He, J., Xu, Q., Zhu, Z., Lu, Q., & Zhu, Y. (2023). Rhetorical structure parallels research topic in LIS articles: a temporal bibliometrics examination. Aslib Journal of Information Management.
- Zhu, Y., Quan, L., Chen, P.-Y., Kim, M.C., & Che, C. (2023). Predicting coauthorship using bibliographic network embedding. Journal of the Association for Information Science & Technology, 74(4), 388-401.
- Oh, H., Nam, S., & Zhu, Y. (2023). Structured Abstract Summarization of Scientific Articles: Summarization Using Full-text Section Information. Journal of the Association for Information Science & Technology, 74(2), 234-248.
- Zhu, Y., Nam, S., Quan, L., Baek, J., Jeon, H., & Tang, B. (2022). Linking Suicide and Social Determinants of Health in South Korea: An Investigation of Structural Determinants. Frontiers in Public Health, 10:1022790.
- Liu, Y., Zhong, Z., Che, C., & Zhu, Y. (2022). Recommendations with residual connections and negative sampling based on knowledge graphs. Knowledge-Based Systems, 258.
- Shan, Y., Che, C., Wei, X., Wang, X., Zhu, Y., & Jin, B. (2022). Bi-graph attention network for aspect category sentiment classification. Knowledge-Based Systems, 258.
- Nam, S., Kim, D., Jung, W., & Zhu, Y. (2022). Understanding the Research Landscape of Deep Learning in Biomedical Science: Scientometric Analysis. Journal of Medical Internet Research, 24(4):e28114.
- Kim, D., Jung, W., Nam, S., Jeon, H., Baek, J., & Zhu, Y. (2022). Understanding information behavior of South Korean Twitter users who express suicidality on Twitter. Digital Health, 8.
- Jung, W., Kim, D., Nam, S., & Zhu, Y. (2021). Suicidality detection on social media using metadata and text feature extraction and machine learning. Archives of Suicide Research.
- Wu, C., Yan, E., Zhu, Y., & Li, K. (2021). Gender imbalance in the productivity of funded projects: A study of the outputs of National Institutes of Health R01 grants. Journal of the Association for Information Science & Technology, 72(11), 1386-1399.
- Kim, M., Feng, Y., & Zhu, Y. (2021). Mapping scientific profile and knowledge diffusion of Library Hi Tech. Library Hi Tech, 39(2), 549-573.
- Zhu, Y., Kim, D., Yan, E., Kim, M. C., & Qi, G. (2021). Analyzing China’s research collaboration with the United States in high-impact and high-technology research. Quantitative Science Studies, 2(1), 363-375.
- Yan, E., Zhu, Y., & He, J. (2020). Analyzing academic mobility of US professors based on ORCID data and the Carnegie Classification. Quantitative Science Studies, 1(4), 1451-1467.
- Kim, M., Nam, S., Wang, F., & Zhu, Y. (2020). Mapping scientific landscapes in UMLS research: a scientometric review. Journal of the American Medical Informatics Association, 27(10), 1612-1624.
- Zhu, Y., Che, C., Jin, B., Zhang, N., Su, C., & Wang, F. (2020). Knowledge-driven drug repurposing using a comprehensive drug knowledge graph. Health Informatics Journal, 26(4), 2737-2750.
- Zhu, Y., Jung, W., Wang, F., & Che, C. (2020). Drug repurposing against Parkinson’s disease by text mining the scientific literature. Library Hi Tech, 38(4), 741-750.
- Zhu, Y., Yan, E., Peroni, S., & Che, C. (2020). Nine million book items and eleven million citations: a study of book-based scholarly communication using OpenCitations. Scientometrics, 122(2), 1097-1112.
- Su, C., Tong, J., Zhu, Y., Cui, P., & Wang, F. (2020). Network embedding in biomedical data science. Briefings in Bioinformatics, 21(1), 182-197.
- Zhu, Y., Olivier, E., Pathak, J., & Wang, F. (2019). Drug knowledge bases and their applications in biomedical informatics research. Briefings in Bioinformatics, 20(4), 1308-1321.
- Kim, M.H., Banerjee, S., Zhao, Y., Wang, F., Zhang, Y., Zhu, Y., DeFerio, J., Evans, L., Park, S.M., & Pathak, J. (2018). Association networks in a matched case-control design – Co-occurrence patterns of preexisting chronic medical conditions in patients with major depression versus their matched controls. Journal of Biomedical Informatics, 87, 88-95.
- Zhang, F., Yan, E., Niu, X., & Zhu, Y. (2018). Joint modeling of the association between NIH funding and its three primary outcomes: patents, publications, and citation impact. Scientometrics, 117(1), 591-602.
- Zhu, Y., Kim, M., Banerjee, S., DeFerio, J., Alexopoulos, G.S., & Pathak, J. (2018). Understanding the research landscape of major depressive disorder via literature mining: an entity-level analysis of PubMed data from 1948-2017. JAMIA Open, 1(1), 115-121.
- Yan, E. & Zhu, Y. (2018). Tracking word semantic change in biomedical literature. International Journal of Medical Informatics, 109, 76-86.
- Song, I.-Y. & Zhu, Y. (2017). Big Data and Data Science: Opportunities and Challenges of iSchools. Journal of Data and Information Science, 2(3), 1-18.
- Zhu, Y., Yan, E., & Song, I.-Y. (2017). A natural language interface to a graph-based bibliographic information retrieval system. Data & Knowledge Engineering, 111, 73-89.
- Zhu, Y., Yan, E., & Wang, F. (2017). Semantic relatedness and similarity of biomedical terms: examining the effects of recency, size, and section of biomedical publications on the performance of word2vec. BMC Medical Informatics and Decision Making, 17(1), 95.
- Zhu, Y. & Yan, E. (2017). Examining academic ranking and inequality in library and information science through faculty hiring networks. Journal of Informetrics, 11(2), 641-654.
- Yan, E. & Zhu, Y. (2017). Adding the dimension of knowledge trading to source impact assessment: Approaches, indicators, and implications. Journal of the Association for Information Science & Technology, 68(5), 1090-1104.
- Zhu, Y., Kim, M.C., & Chen, C. (2017). An investigation of the intellectual structure of opinion mining research. Information Research, 22(1), paper 739.
- Zhu, Y. & Yan, E. (2016). Searching bibliographic data using graphs: A visual graph query interface. Journal of Informetrics, 10(4), 1092-1107.
- Choi, N., Song, I.-Y., & Zhu, Y. (2016). A Model-based Method for Information Alignment: A Case Study on Educational Standards. Journal of Computing Science and Engineering, 10(3), 85-94.
- Zhu, Y., Yan, E., & Song, M. (2016). Understanding the evolving academic landscape of library and information science through faculty hiring data. Scientometrics, 108(3), 1461-1478.
- Zhu, Y., Song, M., & Yan, E. (2016). Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-based Approach. PLoS ONE, 11(5), e0156091.
- Zhu, Y., Yan, E., & Song, I.-Y. (2016). The use of a graph-based system to improve bibliographic information retrieval: System design, implementation, and evaluation. Journal of the Association for Information Science & Technology, 68(2), 480-490.
- Kim, M.C., Zhu, Y., & Chen, C. (2016). How are they different? A quantitative domain comparison of information visualization and data visualization (2000-2014). Scientometrics, 107(1), 123-165.
- Song, I.-Y. & Zhu, Y. (2015). Big data and data science: what should we teach? Expert Systems, 33(4), 364-373.
- Yan, E. & Zhu, Y. (2015). Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods. Journal of Informetrics, 9(3), 455–465.
- Zhu, Y. & Yan, E. (2015). Dynamic subfield analysis of disciplines: An examination of the trading impact and knowledge diffusion patterns of computer science. Scientometrics, 104(1), 335-359.
- Kim, H., Zhu, Y., Kim, W., & Sun, T. (2014). Dynamic faceted navigation in decision making using Semantic Web technology. Decision Support Systems, 61, 59-68.
Conference Papers
- Che, C., Zhu, M., Zhu, Y., Zhang, Q., Zhou, D., & Wang, B. (2020). A Protein Embedding Model for Drug Molecular Screening. IEEE BigComp 2020, Busan, Korea.
- Kim, J., Kim, J., & Zhu, Y. (2019). Analyzing public opinion toward the 2019 North Korea–United States summit through mining Twitter. ASIS&T 2019, Melbourne, Australia.
- Yun, J. & Zhu, Y. (2019). An analysis of physical characteristics of Joseon Dynasty books using statistical approaches. ASIS&T 2019, Melbourne, Australia.
- Kim, J., Koo, Y., & Zhu, Y. (2019). A study for categorizing relations between headword and aliases. ASIS&T 2019, Melbourne, Australia.
- Zhu, Y., Kim, M.C., & Yan, E. (2018). Evaluating interactive bibliographic information retrieval systems: A user-centered approach. ASIS&T 2018, Vancouver, Canada.
- Kim, M.H., Zhu, Y., Banerjee, S., Evans, L., Zhang, Y., Wang, F., Park, S.M., & Pathak, J. (2018). Comparing sex-specific association networks of chronic medical conditions. IEEE ICHI 2018, New York City, USA.
- Yan, E. & Zhu, Y. (2017). Word semantic change: The law of differentiation vs. the law of parallel change. ISSI 2017, Wuhan, China.
- Song, I.-Y., Zhu, Y., Ceong, H., & Thonggoom, O. (2015). Methodologies for Semi-automated Conceptual Data Modeling from Requirements. ER 2015, Stockholm, Sweden.
- Zhu, Y., Yan, E., & Song, I.-Y. (2015). Topological Analysis of Interdisciplinary Scientific Journals: Which Journals Will Be the Next Nature or Science? ACM RACS 2015, Prague, Czech Republic.
- Kim, M. C., Feng, Y., Zhu, Y., & Ping, Q. (2015). Quantitative exploration into the diffusion process of creative ideas. ASIS&T 2015, Missouri, USA.
- Zhu, Y., Jeon, D., Kim, W., Hong, J. S., Lee, M., Wen, Z., & Cai, Y. (2012). The Dynamic Generation of Refining Categories in Ontology-Based Search. JIST 2012, Nara, Japan.
Book Chapters
- Kim, M.C. & Zhu, Y. (2018). Scientometrics of Scientometrics: Mapping Historical Footprint and Emerging Technologies in Scientometrics. In Scientometrics. IntechOpen.
PROJECTS
Understanding science of collaboration and team impact using big scholarly data (2023-2026), National Research Foundation of Korea.
Transdisciplinary Study for Resilience of Abused Children: Brain, Data Science, Technology & Social Work (2022-2025), National Research Foundation of Korea.
A Study on Digital Communication Incivility: Developing Measurement Scales and an Integrated Theoretical Model to Recover Healthy Communication (2021-2027), National Research Foundation of Korea.
Analyzing social determinants of suicide in South Korea: Developing SDOH framework for suicide and identifying high risk groups (2021-2024), National Research Foundation of Korea.
Longitudinal Effects of Sexual Assault on Mental Health: Developing Smart Health Care Program Based on Integrative Theoretical Model (2019-2022), National Research Foundation of Korea.
Implementing suicidality detection using social media and national registry data (2019-2021), Samsung Medical Center.
Drug repurposing for Parkinson’s disease through text mining scientific literature (2019-2020), National Research Foundation of Korea.