Abstract
In this study, unsupervised and supervised classification methods were compared for the comprehensive analysis of the chemical fingerprints of 26 Phyllanthus samples from different geographical regions and species. A total of 63 compounds were identified, with structures tentatively assigned, and used to establish the fingerprints by high-performance liquid chromatography time-of-flight mass spectrometry (HPLC/TOFMS). Unsupervised and supervised pattern recognition techniques, including principal component analysis (PCA), the nearest neighbors algorithm (NN), partial least squares discriminant analysis (PLS-DA), and artificial neural network (ANN), were employed. The results showed that the Phyllanthus samples could be correctly classified according to geographical location and species by ANN and PLS-DA. Variables important for cluster discrimination were also identified by PCA. Although unsupervised and supervised pattern recognition each have their own disadvantages and scope of application, both are effective and reliable for studying the fingerprints of traditional Chinese medicines (TCM). The two approaches are complementary and can be used in combination. Our study is the first holistic comparison of supervised and unsupervised pattern recognition techniques in TCM chemical fingerprinting. Supervised and unsupervised methods showed advantages in sample classification and data mining, respectively.
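As a minimal illustration of the supervised side of this comparison, the sketch below applies a one-nearest-neighbor (NN) classifier to fingerprint vectors and scores it by leave-one-out validation. It is not the authors' implementation: the peak-area values, the group labels "A" and "B", the use of Euclidean distance, and the leave-one-out scheme are all assumptions chosen only to show the idea.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length peak-area vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(query, samples, labels):
    """Return the label of the reference sample closest to `query` (1-NN)."""
    distances = [euclidean(query, s) for s in samples]
    return labels[distances.index(min(distances))]


def leave_one_out_accuracy(samples, labels):
    """Classify each sample from all the others and return the hit rate."""
    hits = 0
    for i, (sample, label) in enumerate(zip(samples, labels)):
        rest_samples = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(sample, rest_samples, rest_labels) == label:
            hits += 1
    return hits / len(samples)


# Synthetic, illustrative data only (not from the study): rows are samples,
# columns are normalized peak areas of three hypothetical compounds.
fingerprints = [
    [0.90, 0.10, 0.20],
    [0.85, 0.15, 0.25],
    [0.80, 0.12, 0.18],
    [0.20, 0.80, 0.70],
    [0.25, 0.75, 0.65],
    [0.15, 0.85, 0.72],
]
groups = ["A", "A", "A", "B", "B", "B"]  # hypothetical origin/species labels

accuracy = leave_one_out_accuracy(fingerprints, groups)
predicted = nearest_neighbor_label([0.88, 0.11, 0.22], fingerprints, groups)
print(accuracy, predicted)
```

With real fingerprints, each row would hold the measured peak areas of the identified compounds for one sample, and scaling the variables before computing distances would likely matter.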
Caffeic acid, ellagic acid, gallic acid, luteolin, oleanolic acid, protocatechuic acid, palmitic acid, rutin, and quercetin (each 98.0 % purity) were obtained from Chengdu Herb Purify (Sichuan Province, China).