Funded by the ESRC and the British Academy.
What does it mean for research to be based on empirical evidence? And for researchers to produce, disseminate and understand data?
These questions, which have always been central to the philosophy of science, are being reformulated and reconsidered within contemporary biological and biomedical science. Several commentators have argued that the extraction of knowledge from automatically generated data may constitute a new approach to scientific method, described as ‘data-driven’. Yet there are no clear methodological guidelines for how researchers should use data available online to make new discoveries, nor for how complex datasets can be integrated through cyberinfrastructures in ways that make them intelligible to and (re)usable by researchers. To help address these concerns, this project examines the characteristics of data-driven research and its significance for future research from the perspectives of the philosophical, historical and social studies of science.
The project brings together a network of interested scholars with backgrounds in the natural sciences, the humanities and the social sciences, in order to reach a better understanding of what data-driven science consists of, how it differs from other forms of knowledge-making and what implications this has for how we understand scientific research in the digital age.
The aim is to reach an understanding of how researchers are supposed to use data available online towards new discoveries, and how complex datasets can be integrated through cyberinfrastructures in ways that make them intelligible to and (re)usable by researchers.
Specific themes in the project:
- Characterisation of the epistemology of data-intensive science: If data-driven research constitutes a distinctive mode of knowledge production, how can it be characterised, and how innovative is it with respect to existing or past scientific practices?
- Impact on scientific discovery: How do data collection and use affect the production of scientific knowledge? How should datasets be disseminated and visualised in order to stimulate discovery?
- Role of theory and ontologies: What is the role of theoretical assumptions and hypotheses within research practices that are currently referred to as data-driven, and what are the relationships more generally between data-driven and hypothesis-driven research? What is the role played by classification systems such as bio-ontologies?
- Data governance: What are the problems encountered by data curators when assembling cyberinfrastructures, and how can they be solved to favour uptake and improved research practices by experimenters? How can cyberinfrastructures be supported and maintained in the long term?
- The scale of big data: What role does scale play in shaping data-intensive science? What does it mean for this to be a ‘big science’ area?
The following key events have been or are being organised in relation to this project:
International Workshop (Exeter, 15-16 April 2010).
- Organiser: Sabina Leonelli.
- Sponsors: the British Academy and the Economic and Social Research Council (ESRC).
- Speakers and commentators: Douglas Bruce Kell (University of Manchester and BBSRC), Tony Hey (Microsoft), Richard Burian (Virginia Tech), Bruno Strasser (Yale University), Rachel Ankeny (University of Adelaide), Alberto Cambrosio (McGill University), Peter Keating (Université du Québec à Montréal), Ulrich Krohs (University of Hamburg), Miguel García-Sancho (Spanish National Research Council, Madrid), Jane Calvert (University of Edinburgh), Edna Suarez (UNAM, Mexico), Annamaria Carusi (Oxford), Werner Callebaut (KLI), John Dupré (Egenis), Maureen O’Malley (Egenis), Staffan Mueller-Wille (Egenis), Susan Kelly (Egenis), Sabina Leonelli (Egenis).
- A special issue of Studies in History and Philosophy of Biological and Biomedical Sciences, expanding on and analysing the themes that emerged from the workshop, will be published in early 2012. A draft of the editorial is available here.
- Workshop report.
- Follow-up conference 'Making Data Accessible to All' (see below).
International Workshop (Exeter, 17-18 March 2011).
- Organisers: Sabina Leonelli, Gail Davies (UCL) and Emma Frow (EGN Forum).
- Sponsor: EGN Genomics Forum.
- Participants: Speakers included Jane Calvert (University of Edinburgh), Gail Davies (UCL), Rebecca Ellis (Lancaster University), Emma Frow (University of Edinburgh), Stephen Hilgartner (Cornell University), Sabina Leonelli (University of Exeter), Kaushik Sunder Rajan (University of Chicago), Niki Vermeulen (Universität Wien), Jean-Paul Gaudillière (CNRS), Javier Lezaun, Rob Doubleday, Barbara Prainsack, Brian Salter, Jack Stilgoe (Royal Society), Matthew Kearnes.
- Outcomes: a special issue of BioSocieties on the theme 'Bigger, Faster, Better? Rhetorics and Practices of Large-Scale Research in Contemporary Bioscience', edited by Gail Davies, Emma Frow and Sabina Leonelli, is currently in preparation.
Conference (Exeter, 20-21 June 2011).
- Organisers: Sabina Leonelli and Werner Callebaut (KLI, Vienna).
- Sponsors: the Konrad Lorenz Institute for Evolution and Cognition Research (Austria) and the ESRC Centre for Genomics in Society.
- Speakers: Murray Grant (University of Exeter), Jessie Kennedy (Napier University, Edinburgh), Alberto Cambrosio (McGill University), Alison Wylie (University of Washington), Wendy Parker (Ohio University), Karen S. Baker (Long Term Ecological Research Network, University of California, San Diego), Paul Schofield (University of Cambridge), Gail Davies (UCL), Jenny Reardon (University of California, Santa Cruz), Annamaria Carusi (Oxford), John Dupré (Egenis), Rachel Ankeny (University of Adelaide), Staffan Mueller-Wille (University of Exeter), Werner Callebaut (KLI), Sabina Leonelli (Egenis).
Conference 'Making Data Accessible to All' (Exeter, July 2012).
- Organisers: Ruth Bastow (Warwick, GARNet), Sabina Leonelli, Irene Lavagi (Warwick, MASC) and Berris Charnley (Egenis).
- Sponsors: GARNet and the Economic and Social Research Council (ESRC).
- Participants: Speakers will include several plant scientists, representatives of funding bodies and publishing houses, and social scientists.
Characterisation of the epistemology of data-intensive science:
A special issue of Studies in History and Philosophy of Biological and Biomedical Sciences, edited by Sabina Leonelli and focusing on data-driven research in the biological and biomedical sciences, is available online on the journal website and will be published in 2012. The issue consists of 8 papers, including an introduction and two discussion pieces.
Leonelli, S. (under review) Classificatory Theory in Biology. Biological Theory.
Leonelli, S. (2012) Classificatory Theory in Data-Intensive Science: The Case of Open Biomedical Ontologies. International Studies in the Philosophy of Science 26(1).
Leonelli, S. (2012) 'Data-Intensive Research'. In: Dubitzky, W., Wolkenhauer, O., Cho, K-H., Yokota, H. (Eds.) Encyclopedia of Systems Biology. Springer.
Leonelli, S. (2009) On the Locality of Data and Claims About Phenomena. Philosophy of Science, 76 (5): 737-749.
Impact on scientific discovery:
Leonelli, S. Understanding Data in the Digital Age. Submitted.
Leonelli, S. (2010) Packaging Data for Re-Use: Databases in Model Organism Biology. In Howlett, P. and Morgan, M.S. (Eds.) How Well Do 'Facts' Travel? The Dissemination of Reliable Knowledge. Cambridge University Press.
Role of theory and ontologies:
Leonelli, S. (2010) Documenting the Emergence of Bio-Ontologies: Or, Why Researching Bioinformatics Requires HPSSB. History and Philosophy of the Life Sciences, 32 (1): 105-126.
Leonelli, S. (2008) Bio-Ontologies as Tools for Integration in Biology. Biological Theory, 3 (1): 8-11.
Leonelli, S. 'Bio-Ontologies'. In: Dubitzky, W., Wolkenhauer, O., Cho, K-H., Yokota, H. (Eds.) Encyclopedia of Systems Biology. Springer.
Data governance:
Leonelli, S. (2012) When Humans are the Exception: Cross-Species Databases at the Interface of Biological and Clinical Research. Social Studies of Science.
Leonelli, S. (2010) The Commodification of Knowledge Exchange: Governing the Circulation of Biological Data. In: Radder, H. (Ed.) The Commodification of Academic Research: Science and the Modern University. Pittsburgh University Press.
Bastow, R. and Leonelli, S. (2010) Viable models for database funding: A review of available paths towards long-term sustainability for cyberinfrastructure. EMBO Reports.
Leonelli, S. (2009) Centralising labels to distribute data: The regulatory role of genomic consortia. In: Atkinson, P., Glasner, P. and Lock, M. (Eds.) The Handbook for Genetics and Society: Mapping the New Genomic Era. London: Routledge, pp. 469-485.
The scale of big data:
A special issue of BioSocieties on the theme 'Bigger, Faster, Better? Rhetorics and Practices of Large-Scale Research in Contemporary Bioscience', edited by Gail Davies, Emma Frow and Sabina Leonelli, is currently in preparation. Expected publication in 2013.