Justin van der Hooft
Justin obtained his BSc and MSc in Molecular Sciences at Wageningen University, The Netherlands. He then completed his PhD, also in Wageningen, on setting up systematic workflows for mass spectrometry- and NMR-based metabolite annotation and identification. Next, Justin moved to Glasgow, United Kingdom, for a one-year postdoc with Prof. Alan Crozier at the University of Glasgow, where he studied the bioavailability of (-)-epicatechin in humans and rats using LC-MS combined with radiolabeling. He then moved to Glasgow Polyomics to work with Dr Karl Burgess, Prof. Mike Barrett, and other partners at Glasgow Polyomics. Justin obtained an ISSF Fellowship from the Wellcome Trust to develop and implement fragmentation approaches that enhance the metabolite annotation capabilities of high-resolution LC-MS systems, focusing on small polar metabolites in urine, beer, and bacterial extracts. He has worked on several metabolomics projects that exploit the information-rich fragmentation data generated by modern mass spectrometers to alleviate the bottleneck of metabolite annotation and identification in untargeted metabolomics. In the MS2LDA project, Justin provides biological interpretations of the data and insights that are crucial to the development of the system. Justin recently returned to his academic roots to take up a shared postdoc position between Dr Marnix Medema at the WUR Bioinformatics Group and the group of Prof. Pieter Dorrestein at UCSD, USA. This work focuses on combining workflows developed for genome and metabolome mining to aid the functional annotation of genes and the structural annotation of metabolites. Justin was very recently awarded a grant to continue this work for another three years.
During his scientific career, Justin was a member of the EMN committee for three years from its inception and served as its chair during the Metabolomics2016 meeting in Dublin, Ireland. Currently, as a member of the Board of Directors, he remains closely involved in EMN activities.
How did you end up in the field of metabolomics?
During my PhD I was tasked with setting up mass spectrometry (MS) fragmentation methods and hyphenated MS and NMR methods to study metabolites in tomato, tea, and the urine of tea drinkers. As these analytical techniques form the cornerstones of the metabolomics field, it was only a matter of time before I started reading metabolomics papers. I then had the opportunity to present a poster at the Metabolomics2009 meeting in Edmonton (Canada), and the Metabolomics2010 meeting took place in Amsterdam (The Netherlands), at the time 'on my doorstep'. Both meetings gave me opportunities to strengthen my links with the metabolomics community. It still fascinates me to see how a complex molecular mixture decomposes into its functional building blocks and how these are related to each other.
What are you working on at the moment?
During my postdoc at Glasgow Polyomics, I implemented the metabolome mining tool Molecular Networking (https://gnps.ucsd.edu/) in their analysis workflow. I also initiated and contributed to the development of MS2LDA (www.ms2lda.org), which uses topic modelling to find recurring patterns in mass spectrometry fragmentation data and thereby exposes the chemical building blocks in the data (http://www.pnas.org/content/113/48/13738).
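Topic models work on "documents" made of discrete "words", so fragmentation spectra first have to be expressed in that form. The sketch below shows one way to do this, assuming each spectrum is given as a precursor m/z plus a list of (m/z, intensity) peaks. Fragment m/z values and precursor-to-fragment neutral losses are rounded into word labels, weighted by relative intensity. The function name, the rounding to two decimals, and the 1% intensity cutoff are hypothetical illustration choices, not the MS2LDA implementation or its defaults, and the topic-modelling step itself is not shown.

```python
from collections import Counter


def spectrum_to_words(precursor_mz, peaks, decimals=2, min_rel_intensity=0.01):
    """Turn one MS/MS spectrum into a bag of fragment and neutral-loss "words".

    precursor_mz: m/z of the precursor ion.
    peaks: list of (mz, intensity) pairs.
    decimals and min_rel_intensity are illustrative assumptions, not MS2LDA defaults.
    """
    words = Counter()
    if not peaks:
        return words
    base_peak = max(intensity for _, intensity in peaks)
    if base_peak <= 0:
        return words  # nothing usable in an all-zero spectrum
    for mz, intensity in peaks:
        rel = intensity / base_peak
        if rel < min_rel_intensity:
            continue  # skip low-intensity peaks, treated here as noise
        # Fragment word: the fragment m/z rounded to a fixed number of decimals.
        words[f"frag_{mz:.{decimals}f}"] += rel
        # Loss word: precursor m/z minus fragment m/z, rounded the same way.
        loss = precursor_mz - mz
        if loss > 0:
            words[f"loss_{loss:.{decimals}f}"] += rel
    return words
```

Spectra of structurally related molecules tend to share words, such as the same neutral loss. A topic model such as LDA, applied to a collection of these bags, can then group words that frequently co-occur into patterns that may correspond to shared substructures.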
Currently, I am involved in projects that combine these metabolome mining tools to gain a better structural understanding of complex chemical mixtures. Furthermore, the overall aim of the recently awarded postdoc grant is to combine metabolome and genome mining workflows to exploit the complementary structural information these pipelines provide on specialized metabolites. Genome mining tools are increasingly well positioned to predict the biosynthetic potential of organisms from their genomes. Some information, such as the stereochemistry of the amino acids used in specialized metabolites, is very hard to obtain from mass spectrometry data but might be readily available from the gene cluster that encodes a chiral product. This project is a collaboration with the Netherlands eScience Center (www.esciencecenter.nl), which will provide crucial eScience support for handling large amounts of metabolomics and genomics data, finding the relevant links between them, and visualizing these in a neat interactive interface.
What tools for data processing and interpretation can you recommend for beginners in metabolomics?
Let me start off by saying that there are many flavors of metabolomics data processing and analysis, and that you are best off choosing those that have proven to work on similar data in your lab, or that are in active use by fellow lab members so that direct help is available. Furthermore, there are many different platforms in use these days, and it is impossible to list and comment on them all here. This review (http://onlinelibrary.wiley.com/doi/10.1002/elps.201500417/full) lists many tools that you might consider. Importantly, it also lists the requirements of each tool, such as the supported operating system, which may restrict your options. Here, I will highlight some recent processing tools that may be of interest because of their usability and their connections to annotation tools that handle LC-MS and LC-MS/MS data, currently the dominant analytical techniques in metabolomics research.
Provided you had good guidance in setting up your metabolomics experiment, you will have a nice set of LC-MS files with experimental groups and appropriate controls. This review lists many sources of (unwanted) bias and provides tips on how to correct for them (https://www.sciencedirect.com/science/article/pii/S0731708517315911). Whereas experienced users might opt for command-line processing tools, beginners will benefit from tools like PiMP (https://www.ncbi.nlm.nih.gov/pubmed/28961954). PiMP provides a user-friendly web interface that runs (XCMS-based) peak picking algorithms in the background. After converting your mass spectrometry data files into an open format, you can set up your analysis within 30 minutes by following the steps. You can also add fragmentation files, which are used to annotate peaks in your data using spectral library matches as well as in silico annotators. All results are then displayed in your browser, and you can also inspect the converted raw data to check peak shapes and perform other validation steps.
Another recently introduced workflow is GNPS-Trinity (https://bix-lab.ucsd.edu/display/Public/GNPS+data+analysis+workflow+2.0), for which a tutorial video can be found here: https://www.youtube.com/watch?v=zDcY7iuvyQY&t=4s. The Trinity workflow is embedded within the GNPS infrastructure (https://www.nature.com/articles/nbt.3597). It runs MZmine- or Optimus-based peak picking algorithms and then matches the detected features with MS/MS data to provide streamlined access to Molecular Networking. This will benefit many experiments, particularly those that contain MS1 as well as MS/MS (MS2) fragmentation data. Visualization tools (‘ili’) are also available, as explained in the tutorial video. Finally, running your analysis within the GNPS platform gives you access to up-to-date and growing spectral libraries and makes it easier to use in silico annotators, including cutting-edge tools that are still under development.
Where do you think the most future work is needed for improving metabolomics data processing?
We are witnessing an enormous growth in available tools for large-scale metabolomics analyses and in silico metabolite annotation. In untargeted metabolomics, transforming an acquired LC-MS/MS chromatogram into a metabolite table with associated fragmentation spectra still involves several challenging steps, which currently available tools handle in different ways. To really take off, tools have to become modular and facilitate re-analysis of processed data: novel tools may provide complementary information, and updated versions of existing tools might provide additional insights. Further unification of open data formats and better tool interoperability would benefit the entire community. Standardization is a very important aspect, and I am still grateful that a constructive reviewer pointed me to the Metabolomics Standards Initiative papers early in my PhD. Reviewers and editors play a vital role in ensuring that standards are used in published data sets and papers. Currently, we are working with the Metabolite Identification Task Group of the Metabolomics Society on a revamp of the metabolite identification levels to further encourage their use by the community. Finally, for metabolomics to become part of 'systems biology', data sharing should become common practice for metabolomics labs, and several dedicated repositories, such as GNPS-MassIVE and MetaboLights, are now available. With increasingly many data sets available, I foresee exciting challenges and results ahead!