ELIXIR – sustainable infrastructure for biological information in Europe
Janet Thornton
Disclaimer: these are my notes. Fairly incoherent and probably not accurate.
ELIXIR is a European effort to co-ordinate bioinformatics infrastructure. It is currently a preparatory project: a 10–20 year roadmap for infrastructure development to support research.
EU-funded: 32 partners in 13 member states, with €4.5 million to define the scope and cost of the infrastructure.
Goals:
Co-ordinated data resources, integration and interoperability of data, links to data in other domains, open access to data, enhanced European competitiveness in bioscience, and addressing the need for increased funding and its co-ordination.
Very young science, and funding streams for infrastructure are not yet in place.
Stakeholders: users, experimentalists (data provision), resource providers (core and specialist), tool providers (bioinformaticians), and funders (government bodies, EMBL, EU, charities, industry).
Challenges prompting ELIXIR: data growth, the global context, a large and distributed user base, preservation and accessibility of data, impact on the biosciences, and growth of funding.
The cost of maintaining data is insignificant compared to the cost of generating it, so it makes sense to fund maintenance.
Integration is increasingly important because academic, molecular-type data is increasingly needed by medicine, agriculture, etc.
ESFRI (European Strategy Forum on Research Infrastructures) has received biology research infrastructure proposals; ELIXIR will support these.
Reports from the initial committee meetings (user-base consultation) are due now and will define the scope and remit of ELIXIR. Next comes work on international agreement on goals and costs, then looking at how to fund it.
Can’t keep everything centralised; more distribution is needed. The plan is a hub at the EBI with nodes in different member states.
Will provide: core and specialist data resources, compute centres, infrastructure for integrating tools and services, support for biological ESFRI projects, and community support and training.
DB survey: 170 DBs across the EU. Many of the core DBs are at the EBI, but they are distributed in the sense that the data providers are spread across Europe. There are also many specialist resources across the EU, all of which use the core resources as reference data. DB sizes follow a power law: most are <10 GB, but a few are huge. All offer web browser queries, and some still offer email queries. About 70% offer data downloads and about 30% offer programmatic access. 39/170 have some restrictions on data access (legal or practical). A fairly high proportion have no funding. Most cost less than 1 million euros. About 40 million euros a year is currently being spent on these DBs, and total investment to date is 308 million euros. 90% have less than 3 years of funding security. Most have fewer than 50K unique users/month, but a few have many more. Most have <5 staff and a few have many, while many have no staff at all. See Poster E41 for details.
So ELIXIR needs to co-ordinate, prioritise and stabilise funding for these resources.
Databases are relatively under control compared to other aspects: standards and ontologies, literature, other domains (medical data, biodiversity data, etc.), and integration.
Standards development doesn’t need to be centralised; it is fine for standards to come out of communities, but ELIXIR does need to encourage and publicise them (e.g. OBO).
Literature: an integrated, open access, text-based literature resource would be nice.
Compute resources? Other domains deal with much larger-scale data (e.g. CERN), but they have fewer users, and bioinformatics data is growing exponentially. NGS data can’t simply be shipped around the web. So what do we need to keep, and should it be centralised? Probably need a biodata grid like CERN’s (only more complex).
Modularise the organisation of data resources. Build a network of biocomputing resources. Catalyse development of web services and cloud computing. Move programs to the data rather than the other way round. Work with EU supercomputing centres.
User priorities: integration, format compatibility, website usability.
Short term: access (programmatic, website, web services, downloads). Develop a well-maintained catalogue.
Long term: integration of data and tools. Encourage commercial tools to adopt open standards.
Co-ordinate training.
Comments:
DB developers should have to abide by standards in order to publish or be funded by ELIXIR.
Global context is important and ELIXIR will take into account international collab models in funding approaches. Data sharing will be required.
ELIXIR is not about providing national infrastructure; that should come from per-country funding. It is only interested in pan-EU infrastructure, though national nodes would probably be well set up to also provide pan-EU functions (shared compute, etc.).
May call for proposals for nodes from EU countries, although there is as yet no actual funding and no mechanism for deciding which would be accepted, so proposals are a bit hypothetical.
Tags: ISMB2009