Prof Tim Ebbels obtained his PhD in astrophysics from the University of Cambridge and in 1998 moved into bioinformatics via postdoctoral work at Imperial College London. His group focuses on the application of bioinformatic, machine learning and chemometric techniques to post-genomic data, with a particular emphasis on computational metabolomics. He has worked on projects ranging from environmental monitoring, through molecular epidemiology, to toxicogenomics and high-performance computing infrastructures. Much work focuses on modelling of the analytical technologies used to obtain metabolomic data, but his group is also addressing problems of data integration, visualisation, network analysis, time series and metabolite annotation. He is particularly known for the ‘BATMAN’ software for analysing complex metabolic NMR spectra. Tim is an active member of the metabolomics community, having served as a Director of the international Metabolomics Society from 2012-2018 (Secretary from 2014-16). He has co-organised several international conferences (international scientific committee Metabolomics 2014-17) and is a co-founder of the London Metabolomics Network. He is a member of the OECD Metabolomics Reporting Framework, co-chaired the ECETOC Metabolomics Standards Initiative in Toxicology (MERIT) and is an editorial board member for BMC Bioinformatics and the Journal of Chemometrics. He has a strong commitment to postgraduate education, serving as Director of the MRes in Biomedical Research at Imperial College (>700 students trained), leading its Data Science stream and leading the Data Analysis short course at Imperial’s International Phenome Training Centre. He is a Fellow of the Royal Society of Chemistry.
1. When and why did you start using metabolomics in your investigations?
I got into metabolomics before the name existed! Having done a PhD in astrophysics, I was bored with my job in commerce and was looking for an academic challenge. In October 1998 I joined the group of Jeremy Nicholson and John Lindon to apply maximum entropy techniques to extract signals from NMR spectra of biofluids. I realised that extracting information from complex mixtures was a really interesting and challenging problem, to which I could apply the skills I had learnt in physics. Actually, analysing NMR spectra of biological samples is not so very different from analysing optical spectra of distant galaxies! I also found that the world of metabolomics (and omics more generally) was fascinating and contained a host of intriguing but unsolved problems. I started to learn more about machine learning and multivariate statistics and enjoyed the idea that these approaches could be applied to attack problems in biological and biomedical research. I could see that these computational approaches would only become more important in disentangling the molecular data, which were becoming ever larger and more complex, so I decided to stay in the field.
2. What have you been working on recently?
As usual I’m involved in a variety of projects both developing new methods and applying them to real problems. One of my strong interests is data integration. While most people think of multi-omics integration, in my view this topic is much broader. For example, one can think of combining untargeted datasets collected with similar assays but on different samples. How can you match metabolomic features (e.g. defined by m/z and retention time, RT) without knowing the metabolite identities? Rui Pinto from Imperial’s School of Public Health has developed some exciting network-based methods for solving this problem beyond simple “nearest neighbour” matching. Since metabolomics often combines data from multiple assays on the same samples, one can also ask how these should be combined in statistical models. We’ve worked in this area before using multi-block PLS and other methods to produce integrative predictive models. Another area which is relatively new to me is computational annotation of metabolomic datasets. This means putting an identity on an unknown peak, though usually not with 100% certainty. We are involved in a project with Rick Dunn (University of Liverpool) and Claire O’Donovan (European Bioinformatics Institute) which aims to annotate unknown compounds in the EBI’s MetaboLights database to enrich and add value to existing data. A new project starting soon will add in Prof Pieter Dorrestein and the GNPS database to aid in this effort. Other areas of current interest include development of pathway-based tools to model and integrate data. We have found that pathway-based analysis of metabolomic data is far from straightforward, but still feel that there is great potential to use these methods for data fusion as they naturally map entities from different omic domains to a common space, and are highly interpretable.
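To make the feature-matching problem above concrete, here is a minimal sketch of the simple “nearest neighbour” baseline only, not of the network-based methods mentioned. Everything in it is an assumption made for illustration: the function name `match_features`, the tolerances (10 ppm in m/z, 0.2 min in RT), the scaled-distance score and the example feature values are all hypothetical.

```python
def match_features(ref, other, ppm_tol=10.0, rt_tol=0.2):
    """Greedy one-to-one nearest-neighbour matching of (m/z, RT) features.

    ref, other: lists of (mz, rt) tuples from two untargeted datasets.
    ppm_tol, rt_tol: illustrative tolerances (assumed: ppm and minutes).
    Returns a sorted list of (ref_index, other_index) pairs.
    """
    candidates = []
    for i, (mz_a, rt_a) in enumerate(ref):
        for j, (mz_b, rt_b) in enumerate(other):
            ppm = abs(mz_a - mz_b) / mz_a * 1e6
            d_rt = abs(rt_a - rt_b)
            if ppm <= ppm_tol and d_rt <= rt_tol:
                # Scale each axis by its tolerance so the two are comparable.
                score = (ppm / ppm_tol) ** 2 + (d_rt / rt_tol) ** 2
                candidates.append((score, i, j))

    # Accept the closest pairs first; each feature is used at most once.
    candidates.sort()
    used_ref, used_other, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_ref and j not in used_other:
            pairs.append((i, j))
            used_ref.add(i)
            used_other.add(j)
    return sorted(pairs)


# Hypothetical features from two batches (values invented for the example).
batch_a = [(180.0634, 1.52), (255.2330, 7.80), (132.1019, 0.95)]
batch_b = [(132.1021, 0.98), (180.0630, 1.60), (301.1410, 4.10)]
pairs = match_features(batch_a, batch_b)
print(pairs)
```

A scheme like this treats every feature independently and ignores systematic RT drift between datasets, which is one reason more sophisticated matching methods are of interest.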
3. Integrating data from different -omic technologies can be challenging. What is the biggest obstacle that you need to overcome before you can synthesise information from multi-omic datasets?
I think perhaps the biggest challenge is actually defining what you are aiming at by integrating multi-omic data. Are you looking to find new relationships between molecules? Or are you aiming to make better predictions? Or is it a mechanistic understanding you are aiming at? Depending on these goals, the best type of analysis will change. Beyond this, of course the big challenges are things like identification of unknown metabolites, the difficulty of mapping metabolites to known pathways and the huge mismatch between different -omes in their dimensionality, scale and noise characteristics. This all assumes that the same samples have been assayed in the different omics, i.e. the samples are matched. If, however, different samples have been used in the different omics (e.g. separate biological replicates for metabolomics and transcriptomics) then you will need to take a different approach, perhaps by comparing average profiles in each condition. Overall, there are many challenges which depend on the experimental design and assay types.
4. Are there specific resources for computational metabolomics that you would recommend to beginners for statistical integration and visualisation of metabolic profiles and networks with other post-omic data?
There are some great tools out there. A good place to start could be our recently published Nature Protocols workflow for conventional statistical analysis in metabolomics [2]. This uses a set of iPython notebooks to model the metabolomic data and includes univariate and multivariate statistical methods. But there are many other tools, both open source and commercial. MetaboAnalyst, for example, is a free online environment offering a very wide range of tools for data analysis. But one has to be careful when using these, or any, tools. You must be sure you understand the method being applied and its strengths and weaknesses. This is particularly true of web-based tools which make it extremely easy to produce outputs which could be biased or inappropriate.
5. You have developed various tools for interpreting data for multi-omic experiments, like IMPaLA. Are there guidelines for researchers in the field to use such tools?
I think multi-omic modelling is still in its infancy. There are so many different types of study aims, omics data and experimental design that it is hard to come up with harmonised approaches to integrate and interpret the results. Pathway tools like IMPaLA are great, and present a common framework onto which we can map – in principle – any omic data. As yet, there are limited guidelines for use of these, though we and others are working on it. There are exciting developments in this area such as single sample pathway methods which allow any omic data to be transformed into a “pathway space” where one can then apply any type of statistical or machine learning model. This goes beyond just a list of enriched pathways, allowing researchers, for example, to compare more than two classes, compare individual sample pathway scores, or visualise networks of pathways. Many of these approaches were first developed for transcriptomics data and their use in metabolomics can be both powerful and highly problematic. We are currently investigating how they can be safely used to integrate metabolomic and other omics data – watch this space!
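As a rough illustration of what transforming data into a “pathway space” can mean, the sketch below scores each sample on each pathway by averaging the z-scores of that pathway's member metabolites. This mean z-score aggregation is a deliberately simple stand-in chosen for the example. It is not the method used by IMPaLA or by any specific published single sample tool, and the function name, metabolite values and pathway memberships are all hypothetical.

```python
from statistics import mean, stdev


def pathway_scores(data, pathways):
    """Convert a metabolite-by-sample table into pathway-by-sample scores.

    data: {metabolite: [value per sample]} (same sample order throughout).
    pathways: {pathway name: [member metabolites]} (illustrative mapping).
    Returns {pathway name: [score per sample]}.
    """
    # Step 1: z-score each metabolite across samples.
    z = {}
    for met, values in data.items():
        sd = stdev(values)
        if sd == 0:
            continue  # a constant feature carries no information; skip it
        mu = mean(values)
        z[met] = [(v - mu) / sd for v in values]

    # Step 2: average member z-scores within each pathway, per sample.
    n_samples = len(next(iter(data.values())))
    scores = {}
    for name, members in pathways.items():
        present = [m for m in members if m in z]
        if not present:
            continue  # no measured members, so no score for this pathway
        scores[name] = [
            mean([z[m][s] for m in present]) for s in range(n_samples)
        ]
    return scores


# Invented intensities for three samples; memberships are not from a database.
data = {
    "glucose": [1.0, 2.0, 3.0],
    "pyruvate": [2.0, 4.0, 6.0],
    "lactate": [3.0, 2.0, 1.0],
}
pathways = {
    "pathway_A": ["glucose", "pyruvate"],
    "pathway_B": ["pyruvate", "lactate"],
    "pathway_C": ["citrate"],  # not measured, so it is dropped
}
scores = pathway_scores(data, pathways)
print(scores)
```

The resulting pathway-by-sample table can be passed to any downstream statistical or machine learning model, which is what allows comparisons across more than two classes or between individual samples.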
6. Do you have any advice for early career members working in the field of computational metabolomics?
My primary advice for anyone starting off in computational metabolomics would be to get a thorough understanding of the analytical technology and procedures first. This is important, of course, for understanding the problems in the field, and deciding which to prioritise. But it is even more important to be able to communicate with researchers from other disciplines – especially those of “traditional” analytical chemistry and biology. Only by speaking the same language can one build productive partnerships which yield advances in this area. If you have a more computational training, it can help a lot to base yourself within a group which generates its own analytical data. This way you get to talk to people facing these problems every day and can learn a great deal from their complementary knowledge.
1. Wieder C, Frainay C, Poupin N, Rodríguez-Mier P, Vinson F, et al. (2021). Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis. PLOS Computational Biology 17(9): e1009105. https://doi.org/10.1371/journal.pcbi.1009105
2. Blaise BJ, Correia GDS, Haggart GA, et al. (2021). Statistical analysis in metabolic phenotyping. Nature Protocols 16: 4299–4326. https://doi.org/10.1038/s41596-021-00579-1
3. Khatri P, Sirota M, Butte AJ (2012). Ten years of pathway analysis: current approaches and outstanding challenges. PLOS Computational Biology 8(2): e1002375. https://doi.org/10.1371/journal.pcbi.1002375