Week 3

Learning Objectives

Data wrangling with PostgreSQL and Python: working with joins and data aggregation.
Creating a presentation on Uzzi et al. (2013).

Journal Articles

Uzzi, B., Mukherjee, S., Stringer, M., & Jones, B. (2013). Atypical Combinations and Scientific Impact. Science, 342(6157), 468–472. 10.1126/science.1240474

Uzzi et al. (2013) claims that the best performing scientific papers are firmly based in convention but have a high degree of novelty in the most novel 10% of citation combinations. To examine this, z-scores were calculated for how likely it was that two journal citations would occur in the same paper such that a positive z-score is likely (Tetrahedron & Experientia) and a negative z-score is unlikely (Journal of the American Chemical Society & Life Science). Each paper received a median score reflecting its median conventionality/novelty, as well as a "tail-novelty" score, reflecting how novel the most novel 10% of journal combinations were. Findings show that papers with high tail novelty and high conventionality are more likely to be highly cited than papers with different combinations of high/low novelty and high/low median conventionality.
Some critques included the way highly cited papers were counted, as the top 5% of highly cited papers were deemed to be "hits," favoring papers from the natural sciences, where top-performing papers earn more citations than those from the humanities. Additionally, the Monte Carlo algorithm was only run 10 times, which may have not been sufficient to determine which jouirnal pairs are truly novel or conventional.

Things I made

I created a report on citation data pertaining to histones published between 1980 and 1990. The code used to run my analyses can be found here. Some take-aways from this report: Citation patterns follow a power law distribution where a few papers receive a very high number of citations while the vast majority of papers receive very few (< 10).

I also created a presentation on Uzzi et al. (2013), to be presented to the team at the beginning of next week. The slides for that presentation can be found here.

Other

Struggles

A good amount of time this week was spent working on counts and joins in Postgres. I was trying to get in-degree (the number of citations received by a paper) and out-degree (the number of references in a paper) counts for each node using only a single line of code. The best I got was a table with columns “doi,” “total degree,” and “total degree” again. I switched to working with Python after talking with another lab member, but I later found out this could have been accomplished with two separate count() statements followed by an outer join.

Written on June 3, 2022