Blog January 31, 2018

New Large-Scale Genomic/Health Data Effort Launched, with More to Come

A recently announced six-member consortium of pharmaceutical companies is planning to turn the UK Biobank into the world’s largest publicly accessible concentration of genetic and health data. The effort is being led by Regeneron, which will perform the work at its Tarrytown, NY facility, one of the world’s largest and most sophisticated human genetics sequencing centers. AbbVie, Alnylam, AstraZeneca, Biogen, and Pfizer have each contributed $10 million to the effort, which will sequence the exomes (the 3% of the human genome that encodes genes) of each of the 500,000 volunteers who participate in the Biobank. Their goal is to complete the sequencing effort by the end of 2019. The consortium members will have private access to the data for 6 to 12 months, after which the Biobank will make the data publicly available by 2020. The companies also plan to publish their specific findings in peer-reviewed journals and open-source sites.


The UK Biobank was initiated nearly 10 years ago and is a public database of de-identified medical records, test results, imaging data, blood and biological samples, and psychological assessments from volunteers, assembled with participation from the UK’s Medical Research Council, Wellcome Trust, Department of Health, the Welsh and Scottish governments, British Heart Foundation, Cancer Research UK, and Diabetes UK. The consortium’s project represents the first large-scale human sequencing effort linked to human medical records that is also publicly available. The Biobank includes the volunteers’ medical records from the UK National Health Service, and the volunteers are also being followed prospectively; approximately 100,000 will undergo MRI and X-ray imaging to look for physical changes over time. The consortium researchers hope that adding the sequencing data to this wealth of medical information will allow them to link genetic variation with biological and disease information, and to correlate genetic changes with physical differences or emerging illnesses as the years pass.


The new consortium builds on, and significantly shortens the timeline for, a prior collaboration between Regeneron, the UK Biobank, and GlaxoSmithKline (GSK) announced in March 2017. While GSK is not participating in the new group, that company committed the equivalent of US $54 million in late 2017 to support whole genome sequencing of the volunteers’ samples at a later date. The plan is to sequence the volunteers’ exomes now, while the cost of sequencing remains relatively high, to learn how to deal with the data at scale, and then to expand to whole genome sequencing (to examine the other 97% of each volunteer’s genetic material) in 4 to 5 years, when the cost of sequencing will likely have decreased significantly.


The U.S. National Institutes of Health has announced a similar large-scale sequencing effort, the “All of Us” study. While this effort aims to eventually include genomic data from a million or more people, the study has so far recruited only 10,000 participants, and sequencing has not yet begun. However, the NIH has announced plans to launch a large-scale recruitment effort this spring.