Data-Driven Science of Science

By Ying Ding

The talk I attended today was hosted by our school, the School of Information Sciences (recently renamed the School of Computing and Information) at the University of Pittsburgh. It covered the field of bibliometrics and its applications in different areas using data-driven approaches. As a researcher who has worked in bibliometrics in the past (in the URAP laboratory at Middle East Technical University), I was really enthusiastic to listen to Professor Ding's talk.


At the beginning of the talk, Professor Ding introduced the current layers of bibliometrics: the macro level, the meso level, and the micro level. The macro level covers a complex network of bibliometric elements such as authors, journals, and keywords, and that network is usually used to derive fundamental laws. The meso level is about the mapping of science and the evaluation (or ranking) of people. In other words, its main question is: how can we rank scientific researchers? The micro level is concerned with teams from two aspects: behavioral and organizational. Hence, the questions to be answered at the micro level are: 1) How can a team be successful? 2) How can teams work together?

In the part “Beyond Bibliometrics”, she talked about two main concepts. The first is data-driven discovery. Entitymetrics is an application of data-driven discovery in which knowledge diffusion in a research domain is inspected. For example, you can extract diseases, medications, side effects, and so on from full texts in the medical domain using text mining tools. Another application of data-driven discovery is computational hypothesis generation. As an example, if a drug is mentioned in 80% of articles about autism but is not used to treat it, it may be useful for treating autism. The last application of bibliometrics in data-driven discovery is digital innovation. Here, she mentioned an intelligent system that wrote an innovative article. That article was accepted by a conference, and the author (the article-authoring software) was invited to host a workshop! The second concept she talked about was data-driven decision making. Some sample application areas for bibliometrics in decision making are understanding scientific careers (career moves, rising stars, and so on), understanding scientific collaboration, and understanding scientific success or innovation (quantifying patterns of scientific excellence).
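To make the hypothesis generation example above more concrete, here is a minimal Python sketch of the idea as I understood it. It is only my own illustration, not the method presented in the talk: the function name `candidate_entities`, the placeholder drug names, and the assumption that a text mining step has already produced one set of drug names per article are all hypothetical.

```python
from collections import Counter


def candidate_entities(articles, known_treatments, min_share=0.8):
    """Return entities mentioned in at least `min_share` of the articles
    that are not already listed as known treatments.

    `articles` is a list of sets, one set of extracted entity names per
    article (assumed to come from an earlier text mining step).
    """
    if not articles:
        return []
    counts = Counter()
    for entities in articles:
        # Count each entity at most once per article.
        counts.update(set(entities))
    total = len(articles)
    return sorted(
        entity
        for entity, n in counts.items()
        if n / total >= min_share and entity not in known_treatments
    )


# Toy data: placeholder names, not real drugs or real article counts.
articles = [
    {"drug_a", "drug_b"},
    {"drug_a", "drug_c"},
    {"drug_a", "drug_b"},
    {"drug_a"},
    {"drug_b", "drug_c"},
]
known = {"drug_b"}
candidates = candidate_entities(articles, known)
print(candidates)
```

A frequent co-mention like this is only a starting hypothesis; whether the drug is actually useful would still have to be tested by domain experts.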

In the second session, she explained the research conducted in her group related to the notions she mentioned in the first session. First, she elaborated on their studies in the entitymetrics domain. They use authors, journals, and references as macro-level metrics, keywords as a meso-level metric, and datasets, methods, and biomedical entities (such as genes, drugs, and diseases) as micro-level metrics. Using topic distributions, they constructed graphs and found shortest paths to see the relations between side effects of drugs, genes, and diseases. Another study they conducted in the biomedical domain was about assessing drug similarity from biological function. The main research question was: can I use this drug to treat a different disease? The motivation behind this question was that finding a new drug has been very hard in recent years. The third study she talked about was related to the diffusion of innovations. The aim was to understand the diffusion structure of an innovation using diffusion trees. Here, there are broadcasters, the people to whom you diffuse your innovation. The approach requires finding the right broadcasters, so that the only thing the researcher needs to do is diffuse the innovation to those broadcasters. Then the researchers may feel comfortable, because the broadcasters will do the remaining job for them, i.e. they will diffuse the innovation to a wider audience. The last example was about text mining of newspapers to inspect the media coverage of autism causation. They checked several newspapers, such as The New York Times, The Guardian, and The Washington Post, to construct their dataset. Using text mining tools, they derived triples (subject, object, and the relation between them) to semantically evaluate the content of news about the causes of autism.
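The shortest-path idea from the entitymetrics study can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the graph here is built from toy, unweighted co-mention edges rather than from topic distributions as in the actual study, and the helper names `build_graph` and `shortest_path` as well as the entity names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list of
    entities, or None if `goal` cannot be reached from `start`."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Toy edges: placeholder entities, not data from the study.
edges = [
    ("drug_x", "gene_1"),
    ("gene_1", "disease_y"),
    ("drug_x", "side_effect_z"),
    ("gene_2", "disease_y"),
]
graph = build_graph(edges)
path = shortest_path(graph, "drug_x", "disease_y")
print(path)
```

In this toy graph the drug reaches the disease through a shared gene, which is the kind of indirect relation the shortest paths are meant to surface.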


Without exaggeration, this was the talk that has impressed me the most so far. Professor Ding is an engaging speaker, and her enthusiasm made the talk very interesting. I like listening to people like this, who cover a wide range of information and have a vision on many different topics. No matter what their main subject is, they know how to catch the attention of their listeners with their words of wisdom!


