Week 5
Learning Objectives
- Finish histone data co-citation frequency counts and graphs.
- Final product linked below under "Co-citation count table and rudimentary graphs"
- Read and present Leng (2021), more information provided below.
- Start on creating a citation network using the Leiden algorithm
- The Leiden algorithm is used to create connected clusters within networks, but it takes an input file with a very specific format. My databases so far have been of the form [citing_doi, cited_doi], where each doi is a string. The Leiden algorithm takes a tsv file without a header or indices as an input, and each node must be represented by an integer starting at 0. One of my challenges this week will be taking my data an manipulating it so that it's accepted by the algorithm.
Journal Articles
Leng, R. I. (2021). Diversity in citations to a single study: A citation context network analysis of how evidence from a prospective cohort study was cited. Quantitative Science Studies, 2(4), 1216–1245. 10.1162/qss_a_00154
- Leng examines citations attributed to Paul et al. (1963), a study which examined variables which could lead to the development of coronary heart disease (CHD). Notably, Paul et al. found no correlation between diet and CHD at a time when a common and popular hypothesis was that consuming high levels of dietary fat was a contributing factor in developing CHD.
- Paul et al. is considered a highly cited paper with 673 citations as of 2020 and 446 citations within 20 years of publication. However, Leng finds that people citing Paul et al. rarely did so because of this single negative result which did not support the popular diet-heart hypothesis. More commonly, a research community appeared to select a particular finding from this study that supported their own claims. As an example, the finding of a lack of connection between dietary fat consumption and CHD was picked up by those who believed sugar, not fat, was the main contributing factor behind CHD. This community used the finding that coffee consumption may be a contributing factor for CHD as well as the negative finding about dietary fat as a means of supporting their own conclusions, even though Paul et al.'s findings did not include sugar at all.
- The results of Paul et al. were utilized by a broad range of research communities in one form or another (breadth), but the findings were not foundational or crucial to any one community (depth). Some of these communities iteracted with and cited other communities, and others cited more within their own communities. Part of this could be a factor of time. Papers using Paul et al. to support their investigations into CHD and alcohol consumption peaked in popularity later than other papers, and so would have had other papers in the network available to them. Papers published soon after Paul et al. would not have had this ability, as many of these other papers would not have yet been published.
Things I made
- Presentation on Leng (2021), which can be found here.
- Co-citation count table and rudimentary graphs An image of one of my graphs is posted here. This graph was created with data where each node is part of a co-citation pair which has been co-cited more than 3 times. Unfortunately, this graph is not very legible as many of the nodes are overlapping and the edges aren’t visible in most cases.
Other
UIUC REU Seminar
This week’s REU Seminar was about networking, with Professors Marco Morales and Karrie Karahalios. Per Prof. Karahalios, strong ties in your social network are people such as your parents or siblings; your weak ties are your acquaintances. People used to think you only need strong ties to get by, but for getting a job, weak ties are more useful. You likely know all the same people as your strong ties. Not every thing is built from scratch, we get new ideas from different people.
Per Nancy Amato, the Department Head of Computer Science at UIUC: “Find what the hard part of networking is for you and practice it.” Networking in general is a fairly weak skill of mine, but the thing I think I should practice the most is keeping a conversation going.
Another take-away from the seminar was that I should practice my elevator pitches. These should be tailored to your audience and you should have one for different lengths from five seconds to thirty. Knowing your audience sounds like the trickiest bit of this: How do you know what other people don’t know?