1a): 73.1% of the compounds share mutual similarity indices lower than 0.4, and 97.6% share similarity values lower than 0.7.

While over 200,000 chemical structures have been attributed to plants to date, this number likely represents only a small part of the global plant metabolic repertoire.

Most of these are specialized (or secondary) metabolites that accumulate to high levels in certain plant families or species.

However, even at this level of confidence, some ambiguity remains possible. Notably, stereoisomers are not always distinguishable even with the best chromatographic separation methods, and structural determination by NMR spectroscopy must then be used.

For that reason, new criteria for reporting confidence in metabolite identification have recently been proposed, providing a more elaborate scheme for describing annotated metabolites.

As a result, metabolomics assays suffer from a relatively low discovery rate and even from false identifications, and only a few percent of the detected metabolites can be assigned a confident, unambiguous identity.

The naive, straightforward approach to feature annotation in LC–MS analysis is to match each unique mass signal against the masses of all theoretically possible and relevant metabolites.
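The naive mass-matching approach can be sketched in a few lines. This is a minimal illustration, not the method used to build or query WEIZMASS: it assumes positive-mode [M+H]+ ions only and a hypothetical 5 ppm tolerance, and the small candidate list (with monoisotopic masses computed from elemental formulas) is chosen purely for illustration.

```python
# Minimal sketch of naive m/z annotation by mass matching.
# Assumptions (not from the source): only [M+H]+ adducts, a 5 ppm tolerance,
# and a tiny hypothetical candidate list of (name, monoisotopic neutral mass in Da).
PROTON_MASS = 1.007276  # mass of a proton in Da

CANDIDATES = [
    ("glucose", 180.06339),    # C6H12O6
    ("fructose", 180.06339),   # C6H12O6, same formula as glucose
    ("caffeine", 194.08038),   # C8H10N4O2
    ("quercetin", 302.04265),  # C15H10O7
]


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate_mz(mz: float, candidates, tol_ppm: float = 5.0):
    """Return (name, ppm error) for every candidate whose [M+H]+ m/z is within tol_ppm."""
    hits = []
    for name, neutral_mass in candidates:
        theoretical_mz = neutral_mass + PROTON_MASS
        err = ppm_error(mz, theoretical_mz)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    return hits


if __name__ == "__main__":
    # Glucose and fructose both match: mass alone cannot separate isomers.
    print(annotate_mz(181.0707, CANDIDATES))
```

Because isomers such as glucose and fructose share an elemental formula, they return identical theoretical masses, which illustrates why matching on mass alone produces ambiguous annotations.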

Because about 50% of the library metabolites have similarity indices below 0.3 and over 80% fall below 0.5, the library can be considered highly structurally diverse.
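A diversity summary of this kind can be sketched as the fraction of pairwise similarity scores falling below a threshold. The source does not specify the similarity measure here, so this sketch assumes the Tanimoto coefficient on binary structural fingerprints (a common choice) and treats the reported percentages as summaries of the pairwise score distribution. The fingerprints below are toy sets of "on" bit positions, not real molecular fingerprints.

```python
from itertools import combinations

# Assumption: fingerprints are represented as sets of 'on' bit indices.
# These four toy fingerprints are hypothetical and only illustrate the calculation.
FPS = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({6, 7, 8, 9}),
    frozenset({1, 6, 10, 11}),
]


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def fraction_below(fps, threshold: float) -> float:
    """Fraction of all unordered compound pairs with similarity strictly below threshold."""
    scores = [tanimoto(x, y) for x, y in combinations(fps, 2)]
    return sum(s < threshold for s in scores) / len(scores)


if __name__ == "__main__":
    for t in (0.3, 0.5, 0.7):
        print(t, fraction_below(FPS, t))
```

A library where most pairs score low, as in the figures above, contains few near-duplicate scaffolds.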

(b) An outline of the software which automatically creates the reference library (see Methods section and Supplementary Figs 2 and 3).

Furthermore, structures never before reported in association with any living organism, as well as those reported to date in only one or a few species, are detected and identified in the studied plant extracts.

To generate a comprehensive reference mass spectral library of plant-derived metabolites, we used a set of 3,540 highly pure standard metabolites.

WEIZMASS is a unique reference metabolite spectral library developed from high-resolution MS data acquired from a structurally diverse set of 3,540 plant metabolites.
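To show how a reference spectral library is typically queried, the sketch below scores a query spectrum against each library entry with a binned cosine similarity and returns the best hit. This is a generic technique assumed for illustration, not the scoring used with WEIZMASS; the 0.01 Da bin width, the compound names and the peak lists are all hypothetical.

```python
import math
from collections import defaultdict

# A spectrum is a list of (m/z, intensity) peaks.
Spectrum = list[tuple[float, float]]

# Hypothetical toy library; names and peaks are invented for illustration.
LIBRARY: dict[str, Spectrum] = {
    "compound_A": [(85.03, 20.0), (127.04, 100.0), (145.05, 40.0)],
    "compound_B": [(91.05, 100.0), (119.05, 30.0)],
}


def binned(spectrum: Spectrum, bin_width: float = 0.01) -> dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    vec: dict[int, float] = defaultdict(float)
    for mz, intensity in spectrum:
        vec[round(mz / bin_width)] += intensity
    return vec


def cosine(a: Spectrum, b: Spectrum, bin_width: float = 0.01) -> float:
    """Cosine similarity between two binned spectra (0 if either is empty)."""
    va, vb = binned(a, bin_width), binned(b, bin_width)
    dot = sum(v * vb[k] for k, v in va.items() if k in vb)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query: Spectrum, library: dict[str, Spectrum]) -> tuple[str, float]:
    """Return the library entry with the highest cosine score against the query."""
    return max(
        ((name, cosine(query, ref)) for name, ref in library.items()),
        key=lambda hit: hit[1],
    )


if __name__ == "__main__":
    query = [(85.03, 18.0), (127.04, 100.0), (145.05, 45.0)]
    print(best_match(query, LIBRARY))
```

Fixed-width binning is the simplest option; peaks that straddle a bin boundary are missed, so tolerance-based peak pairing is often preferred in practice.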