Week 4
Learning Objectives
- Present Uzzi et al. (2013)
- Count the number of co-citation pairs in the histone data generated previously, then graph the co-citation network.
- Create a script that will generate keyword summaries of graph clusters using frequently occurring title words. These summaries would provide insight into the topics defining a research cluster.
- The team discussed research ethics and went over tricky but common ethical dilemmas in research.
Journal Articles
Pimple, K. D. (2002). Six domains of research ethics. Science and Engineering Ethics, 8(2), 191–205. 10.1007/s11948-002-0018-1
- Pimple (2002) asks three questions of any research project: "Is it true? Is it fair? Is it wise?" Truth here refers primarily to falsified data that do not truely reflect reality. Fairness relates to relationships between the researchers and others. In human-subject experiments, it can refer to the relationship between the researchers and participants in experiments. It can also refer to plagiarism: stealing work from others is unfair to the original researcher. Finally, wisdom has more to do with the impact of the research. If the research being done has a negative effect on the world, the researcher has a moral obligation to not conduct said research.
- Pimple then provides a framework for expanding upon these questions in detail, which can be used by newcomers and seasoned researchers alike. Given that research is a social endeavor, each scientist has an obligation to maintain the integrity of the research record and be a good community member.
Zigmond, M. J., & Fischer, B. A. (2002). Beyond fabrication and plagiarism: The little murders of everyday science. Science and Engineering Ethics, 8(2), 229–234. 10.1007/s11948-002-0024-3
- Expanding on Pimple (2002), Zigmond & Fischer (2002) categorize breaches in research conduct as "high crimes and misdemeanors." High crimes include falsification and fabrication of data as well as plagiarism and are generally recognized as inappropriate conduct. Misdemeanors, on the other hand, are much trickier and include not disclosing funding sources, intentionally hindering replication efforts, and instances of "honorary authorship." Another example of a misdemeanor might be hiding information behind unclear and/or bloated writing that can easily be misinterpreted. A good community member does not seek to mislead their audience and aims to be clear in their writing and transparent about their methods.
Things I made
In progress:
A script that generates co-citation counts when given a input set of citation data. One challenge of this task is finding ways of reducing actual running time. Over 3 million co-citation pairs are found in the data; my least efficient version took over 4 hours to complete and the platform I was using (Google Colab) would not let me save the data upon completion, meaning I would have to run all my analyses again if the server disconnected. I’m currently working on moving to Jupyter Notebook in VSCode, which has the benefit of running directly on my machine and can save csv files locally, meaning I only have to do the computation once. The newest version can calculate the pairs in 9 seconds using a list of tuples rather than a pandas dataframe. The next step in this task is to create a graph of these co-citations, which I’m finding difficult. I have very limited experience in coding graphs. The graphs I have made only involved a few nodes, not several hundred thousand.
In progress:
A script that when given a set of dois, returns a set of keywords reflecting common topics found within those papers. I did some research into NLP python libraries and stopword omission. Stopwords are contentless words such as “the” or “this,” which should be omitted from a list of keywords as they provide no meaningful information. I ran some tests on a set of citations I have involving microRNA, but the keywords returned included words such as “the” or “their.” Attempting to modify the set of stopwords did not work – George suggested I look into BERT as an alternative.
Other
UIUC REU Seminar
This week I attended my first UIUC Summer REU seminar, which was titled “How to do Research: Part I.” This seminar was a Q&A platform where students completing summer research at UIUC could ask questions to a panel of faculty members. This week, the panel was hosted by UIUC professors Camille Cobb, Matus Telgarsky, and Tianyin Xu. Many of the questions centered around best practices for approaching professors you’d like to work on research with, which I found personally confusing as those at the seminar already have faculty mentors as part of the REU process. Maybe they’re looking ahead for grad school opportunities?
The most important thing I took away from the panel, brought up by Prof. Xu, was that persistance is far more important than genius when it comes to research. He brought up how he failed at his first attempt in pursuing a PhD (which I also did), but how he tried again because he knew what he wanted and the types of questions he wanted to pursue. I think persistence is difficult without clear goals in mind.