The Open Science Stars series has been one of the most pleasurable aspects of working at ScienceOpen for me, as it shows the great diversity of researchers all around the world working to make science a better field to be in. For the latest instalment, we spoke with Chris Hartgerink, a PhD student at Tilburg University in the Netherlands. Chris has a strong background in open research practices, and is a prolific member of the data mining community. Here’s his story!
When did you first hear about ‘open science’? What was your first reaction, do you remember?
I first heard about Open Science in late 2012/early 2013 during my Masters. My then supervisor (Jelte Wicherts) said to me, “Let’s put all this online”, and I remember thinking this seemed so obvious but that I simply hadn’t considered it before – nor had I been taught about this during my education. This helped multiple puzzle pieces to fall into place. Since then transparent research has been central to all that I do. I also remember asking myself how to do this because it is non-trivial if you simply know nothing about it, and it has been a gradual process since then learning how to share in an easy-to-comprehend way. But it doesn’t have to be perfect from the beginning because open science is more a way of approaching science than it is a checkmark.
What has inspired your dedication to open research? What sort of things do you do on a daily basis to commit to this?
To be honest, what you call dedication is an ethical responsibility in my eyes. The old, opaque way of doing science is based on the analogue age with severely outdated standards. This is irresponsible, just like a current-day astronomer using Galileo’s antique telescope would be irresponsible. This antique telescope gives relatively imprecise measures compared to modern telescopes, so nobody would pay attention to new results based on it. I don’t think the science done with the antique telescope in the old days is invalid, I just think we have to build on the old, create the new, and then use the new. Closed research, as you might call it, is stuck in the old. I would even go so far as to say that such unnecessarily (!) closed research obfuscates science and can be deemed pseudo-science. I hardly pay attention to new research that is unverifiable.
By the way, when I say irresponsible, I mean irresponsible to others and to yourself. Our work is complex and making your work shareable and understandable to others helps others to understand what you did – including your future self. Transparent research has saved my skin repeatedly.
Your research is centered on data fabrication and fraud and the prevalence and impact of this on research. How did you get into this field, and what sort of things does this entail?
During my undergraduate years I majored in psychology and at one point I was employed as a research assistant to one of the (then unknown) largest science frauds in psychology. I was really inspired by him to go for an academic career. After he confessed to fraud, it caused somewhat of an existential crisis for my belief in science.
Open research helps verifiability in science and has helped me regain trust in science as an endeavour. But I also wanted to contribute to making science better and to help scientists better understand how they conduct science. So when I was invited to join the Meta-Research group in Tilburg to work on detecting data fabrication, I immediately knew I had found my topic, given my drive to contribute and my history with scientific misconduct (I had window-shopped for a topic before, without success).
You published a paper with us recently on best practices in research and assessment. Can you tell us what this was about?
This paper did not present new research. Rather, it was the fruit of extensive reading on ethics, misconduct, and the grey kinds of behaviours in between. We discuss these topics frequently in our research group. It is easy to discuss scientific misconduct, but it is rather difficult to discuss good conduct. During the discussions within our group, I came to the realization that the norms of responsible conduct of research in science scream transparency across the board. The paper gives readers a theoretical and practical framework for doing responsible science, and lets them learn about grey practices and some of the intricacies of data fabrication/falsification. Of course, there is much more in the paper, so if these things I mentioned interest you, definitely go read it (especially if you disagree, and feel free to review it and counter the propositions!).
How did you find the publication system at ScienceOpen? And is there anything we can do to improve it?
Honestly, you were not my first choice! My co-author recommended that I write this as a chapter for a book to be published by the American Psychological Association (APA). Regretfully, they wanted all our copyright and did not want to negotiate about these conditions. After some back-and-forth, they did not budge and I decided to pull my submission. I then thought ScienceOpen would be a good outlet, because I would not have to sign away our copyright nor would I have to pay the ridiculously high APCs or Open Access publication costs. ScienceOpen seemed like a reasonable choice, and I could use something reasonable after the unreasonable way the APA treated me for simply wanting to open up bilateral negotiations.
You recently published yet another paper on using content mining technology in psychology. How easy was it to perform this research, and what were the key findings of your study?
The technical infrastructure behind this project was overwhelming sometimes, but in the end open research pushed me to make it as reproducible as possible and made it manageable as a project. In other words: without the open element of this project it would have been total chaos. The end result isn’t a set of key findings, but data to help people, including myself, answer research questions. It is one of the largest datasets I have ever produced and was approximately a year in the making. It would be arrogant of me to think I would be able to squeeze all the possible information out of it, and honestly, I don’t want to bear that responsibility either.
How are publishers helping to aid or abet your work? Have you found differences in your treatment from different publishers at all?
Some publishers are agnostic and allow me to do my work as long as I don’t cause their infrastructure to fail (they trust I won’t until I do). Other publishers equate any content mining, which requires systematically downloading many papers, with copyright infringement and outright theft if you don’t agree to their unilateral conditions (they seem to have a systemic mistrust of anything out of their control, which I can understand given Sci-Hub but which I don’t think is reasonable). Yet other publishers encourage you to systematically download and mine their content because they want the research to be fully reused (these typically have the licenses that encourage reuse and sharing). So the behaviours publishers show with regard to content mining span the whole spectrum: from fully permissive to fully restrictive, with much in between.
Why do you think content mining is so important to research?
Scientific output has not stopped growing and I think we all feel overwhelmed by the amount of information available that we might (!) need to know. It causes this fear of missing out, where a great paper might get drowned in an ocean of irrelevant papers. But how can we deal with this? Content mining helps us parse an amount of information that is simply unfeasible for humans to read. Computers can read faster, and memorize better if we program them to do so. It is maybe a matter of time before computers can even comprehend text better than humans. But regardless, computers are a great tool whose use is being denied, or at least made very difficult, with legal threats going around. No wonder relatively few are using it and innovation is slower than it could be.
How important is open access to this, and where is the future of content mining?
Earlier I mentioned that certain publishers encourage content mining and that these typically have the licenses that encourage reuse and sharing. These are the Open Access publishers.
What is the importance of copyright law to your work? Do you ever find that copyright is actively used to prevent you from researching?
Copyright has become such a nuisance to my work that I have been spending way too much time on it. I might have even become quite knowledgeable on its history and limitations even though I am a statistician.
I understand the original goal of copyright was to promote the production of creative works by allowing the rightsholders a temporary monopoly on selling them. And that original goal makes sense. Writers need to make a living, as do musicians and many other people employed in creative industries. They too have families to take care of.
But copyright in science? It makes no sense anymore. We can publish digitally, with infinite copies at such a low cost, but researchers still agree to sign away their copyright. I understand that copyright stimulated publishers to get into the sector in the analogue age (16th-20th century), but copyright has now become a tool that allows rightsholders (i.e., publishers) to reign over science. Self-governance is one of the norms in science, but we hardly self-govern our communications.
Because of this, I decided in January 2016 that I will not sign away my copyright anymore and will publish only in an open manner. Of course I feel that others should do so too, because it will benefit us as a collective. It directly affects our ability to retake control of the knowledge commons we call science, and it directly affects how we develop science into the future, for example with content mining but also with the availability of knowledge.
How can we as a research community make sure that copyright is used in the best interests of teaching, education, and learning?
Copyright serves its purpose to stimulate creativity. But knowledge used in teaching, education, and learning does not require copyright to stimulate production. First off, humans are inherently curious. But more importantly, researchers are already getting paid to produce knowledge (also outside of universities). Communicating that knowledge is part of the job they are paid for. So copyright need not create a temporary monopoly for remuneration, because there is already remuneration in place.
I also remember we had to remove all our copyrighted learning materials from our digital learning environment several years ago, because we risked getting fined for copyright infringement. This does not serve education and learning. In the US this falls under fair use and is allowed, if I understand correctly. The European Commission is looking to make an exception for education as well. This is good news, but does not remove all problems. Books remain absurdly expensive to buy and inexpensive to produce.
So I think that the best (and maybe idealistic?) way is to contribute to open projects that benefit education, learning, and teaching. This includes contributing to OpenStax (free and open textbooks) and no longer writing books that cost $100 to buy, but also simply sharing educational materials, as some do on the Open Science Framework.
If there is one thing I would recommend to anyone, it is to think about these issues and where you stand. Determine your position (closed or open), justify your position to yourself and critically assess whether these reasons are logically sound. And yes, there are good reasons to close everything, but in my opinion these reasons are overshadowed by the good reasons to open up.
How important do you think communities like OpenCon are to advance science?
Essential. They connect people and create opportunities that otherwise would not be there. If it were not for OpenCon I probably would not have had the courage to stand up either. It has taught me that the problems in the world are not there by nature, but by our own doing. And our own doing we can adjust.
How can platforms like ScienceOpen help younger researchers develop their skills in open research? What other tools or platforms would you recommend to researchers?
My experience reviewing is limited, but my review on ScienceOpen was well received by the authors and helped us discover a mutual research interest. I think that peer reviewing frequently and in the open helps you develop your skills, because it also allows others to provide feedback and it helps you make reviews as objective as possible.
Another platform I can recommend for open research is GitHub, because version control makes your life so much simpler. Many think it is difficult, but spend just 20 minutes walking through the introduction and you’ll see the power that resides in version control and how it is key to open research. GitHub also has very easy desktop software. I helped another PhD student get started last year, just a few weeks before a hard drive crashed. I’ve never seen someone so happy to use version control, but it’s something you don’t want to use until you wish you had used it. So just get going, you’ll thank yourself later.
Do you have any advice for younger researchers looking to start a career in academia?
Don’t let anyone fool you with ill reasoning, because there is much going around.
Thanks, Chris! Great to get your thoughts on these issues, and all the best with the remainder of your PhD!