We’ve had some amazing new publications recently here at ScienceOpen, with many more in the pipeline! For us, every paper we publish is special, and we like to highlight the effort our authors put into them as much as possible. One of our newest additions comes from the field of molecular biology and genomics, a huge and rapidly advancing research domain.
The paper, titled “About the variability, quality and reproducibility of ChIP-seq data”, is of course open access, so anyone and everyone has the opportunity to read it. The new study comes from Hinrich Gronemeyer, a well-respected researcher and Research Director at the Institute of Genetics, Cellular & Molecular Biology (IGBMC) in Strasbourg-Illkirch, and his team.
Here’s the abstract in full:
The emergence of high throughput technologies with the production of Gigabyte omics datasets has led to revolutionary changes in molecular biology and functional genomics. Despite the incorporation of increasingly quantitative technologies, the field suffers from important reproducibility problems. Some causes have been identified: they include poor quality management, competition for publishing, funding and jobs, problems in experimental and statistical design of assays. The consequences are – among others – delays in the implementation of efficient and specific anti-cancer treatments, the unnecessary duplication/validation of improperly conducted studies, and the waste of public funding. Here we wish to discuss another cause of poor reproducibility, which will become increasingly important with the advent of personalized medicine: the generation of poor quality datasets from Next Generation Sequencing (NGS) technologies, specifically those that involve enrichment assays like ChIP-sequencing. Today NGS-derived applications are becoming increasingly popular, which is further supported by decreasing sequencing costs, the rapid development of novel sequencing-based technologies, and the power of genome-wide data interpretation by functional genomics and systems biology approaches. However, the complexity and sensitivity of these technologies bear the risk of introducing various types of bias. Thus, it is rather surprising that only very few quality indicators have been developed to date. The public availability of omics data in large repositories, such as GEO, is no doubt an enormously valuable source. However, by working extensively with such datasets, we realized that the lack of universal quality control indicators in publications and data repositories seriously limits the use of existing data and can contribute to irreproducibility issues. 
Here we provide examples that illustrate the problems generated by the use of poor quality datasets and propose solutions that would ultimately enhance reproducibility and encourage scientists to use existing datasets in the design and interpretation of their own research projects. Our goal is to increase awareness in the scientific community about the need to link quality assessment to datasets, and to initiate a discussion on the quality control of big data.
This research actually fits in nicely with another piece we just published by PhD student Chris Hartgerink, on “Research practices and assessment of research misconduct”. We are pleased that authors are seeing ScienceOpen as a quality venue for publishing important research of this type. We believe that reproducibility is one of the key reasons why Open Science is so important, and we welcome additional submissions from this field (and any other!).
If you would like to know more about publishing your research openly with ScienceOpen, we have more information here.