Steroids. 2014 Oct;88:83-9.

Peer Group Normalization and Urine to Blood Context in Steroid Metabolomics: the Case of CAH and Obesity

Edward Vitkin1, Amir Ben-Dor2, Michael Shmoish3, Michaela F. Hartmann4, Zohar Yakhini1,2, Stefan A. Wudy4 and Ze’ev Hochberg5

1 Faculty of Computer Science, Technion – Israel Institute of Technology, Haifa, Israel

2 Agilent Laboratories, Tel Aviv, Israel

3 The Lokey Interdisciplinary Center for Life Sciences and Engineering, Technion – Israel Institute of Technology, Haifa, Israel

4 Steroid Research and Mass Spectrometry Unit, Division of Pediatric Endocrinology & Diabetology, Center of Child and Adolescent Medicine, Justus Liebig University of Giessen, Germany

5 The Rappaport Family Faculty of Medicine, Technion – Israel Institute of Technology, Haifa, Israel

Corresponding author: Edward Vitkin

Computer Science Department, Technion Israel Institute of Technology, Haifa 32000, Israel.

Phone: +972-4-829-4951. Fax: +972-4-829-3900 Email:

Grants: This research was supported by an Agilent Technologies Foundation grant to ZH

Disclosure summary: AB-D and ZY are employed by Agilent Technologies.



Traditional interpretation of GC-MS output involved semi-quantitative estimation of individual metabolites that were conspicuously low or high, and of ratios between metabolites. Here, we apply a systems biology approach to the steroid metabolomics of a complex steroid-related disorder, using an all-inclusive analysis of the steroidal pathway in the form of a subject steroidal fingerprint and a disease signature, and we provide novel methods of normalization and visualization.

The study compares 324 normal children with 27 untreated 21-hydroxylase CAH patients, representing pure enzymatic deficiency, and with 70 children with obesity, representing complex disease. Steroid profiles were created from quantitative data generated by GC–MS analyses. A novel peer-group normalization method defined each individual subject’s control group in a multi-dimensional space of metadata parameters. Classical steroid pathway visualization was enhanced by adding urinary end-product sub-nodes and by color coding of semi-quantitative metabolic concentrations and enzymatic activities.

Unbiased automated data analysis confirmed the common knowledge for CAH – the inferred 17-hydroxyprogesterone was up-regulated and the inferred 21-hydroxylase enzyme activity was down-regulated. In childhood obesity, we observe a general decrease of both glucocorticoid and mineralocorticoid metabolites, increased androgens, up-regulation of 17,20-Lyase, 17-OHase and 11β-HSD1 activity and down-regulation of 21-OHase enzymatic activity.

Our study showed that the novel normalization and visualization techniques are useful for identifying the subject fingerprint and the disease signature in enzymatic deficiency and insufficiency, and demonstrated hypothesis generation in a complex disease such as childhood obesity.


  • Define subject fingerprint and disease signature
  • A novel peer-group normalization for steroid data
  • Novel data visualization and analysis in the context of the steroidogenesis pathway
  • Usage of proposed techniques simplifies hypothesis-generating steroid research


  • Data Normalization
  • Data Visualization
  • Obesity
  • Congenital Adrenal Hyperplasia
  • Gas-chromatography mass spectroscopy
  • Steroid metabolomics



High-throughput metabolomics studies typically consist of measurements of several tens of metabolites performed on several hundreds of subjects under various conditions. The informatics part generally follows four stages: (i) data collection, (ii) data normalization, (iii) statistical interpretation and (iv) results visualization. In our work we addressed each of the four stages, with emphasis on data normalization and results visualization, and demonstrated the utility of the proposed techniques for the study of congenital adrenal hyperplasia (CAH) due to 21-hydroxylase deficiency in children and for the study of childhood obesity.


Data Normalization

The available metabolite concentrations and enzymatic activities (variables) may depend both on case-specific pathway properties and on factors unrelated to the disease process, such as age and body composition (metadata). Normalizing for such confounding factors is non-trivial, yet essential for the study of significant dependencies. Fortunately, measurements for control subjects are not affected by case-specific factors, so they provide a reference baseline for the effect of each confounding factor. For example (Figure 1), we can estimate the effect of age on the DHEA concentration in the group of control subjects from the strength of the correlation between the age and DHEA measurements. For simplicity, we can state that if the Pearson correlation p-value between those vectors is below 0.01 (or any other threshold), then a correlation exists; otherwise it does not.
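The summary does not say which significance test produced the p-value. Assuming the standard test for Pearson's r (an assumption on our part), the statistic for n control subjects with sample correlation r between age and DHEA would be:

```latex
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}, \qquad t \sim t_{n-2} \ \text{under } H_0 : \rho = 0
```

The two-sided p-value of t is then compared with the chosen threshold (0.01 in the example).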


Figure 1 – Sample data. Contains patient id, age and DHEA concentration

We can score the general effect of age on our data by assigning 1 to each measured variable that correlates with age (like DHEA) and summing the results. In our case (56 variables), age had a score of 39 (Original article, Table 1). The same analysis can be performed for other confounding factors, such as gender or BMI. Note that once the data are normalized, we expect low or zero scores for each confounding factor in the control set of patients.
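The scoring step can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the function names and toy data are invented, and because the Python standard library has no Student t distribution, the p-value uses the Fisher z approximation as a stand-in for whatever exact test the authors used.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences.

    Constant sequences (zero variance) are not handled in this sketch.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z approximation (needs n > 3).

    Assumption: a substitute for the exact test used in the article.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def confounder_score(factor, variables, alpha=0.01):
    """Count the variables whose correlation with `factor` has p < alpha."""
    n = len(factor)
    return sum(
        1
        for values in variables.values()
        if approx_p_value(pearson_r(factor, values), n) < alpha
    )


# Toy control data, invented for illustration only.
age = [2, 4, 6, 8, 10, 12, 14, 16]
variables = {
    "rises_with_age": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.8, 8.1],
    "unrelated": [5.0, 3.1, 4.8, 3.0, 5.1, 2.9, 4.9, 3.2],
}
print(confounder_score(age, variables))
```

Running the same function on the peer-group-normalized control values would be one way to check that the score of each confounding factor has dropped.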

To normalize the measured variables for a patient (either case or control), we want to compare them with the same variables in similar control patients (the peer-group). This peer-group should be defined in terms of metadata dimensions (e.g. age, gender) and should take into account the differences in the scores calculated earlier. Thus, we define the inter-patient distance function as a weighted Euclidean distance in the metadata space, where each dimension is normalized to the range [0, 1] and then weighted by the corresponding confounding factor score (Original article, Eq. 1).
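Eq. 1 of the original article is not reproduced in this summary, so the following is only one plausible reading of the distance described above. It assumes min-max scaling for each metadata dimension and places the weight on each squared difference; the function names, the toy metadata and the BMI weight are hypothetical (39 is the age score quoted earlier).

```python
import math


def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two scaled metadata vectors.

    Assumption: each squared difference is multiplied by its weight.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Toy metadata, invented for illustration: age in years and BMI.
ages = [4.0, 8.0, 12.0]
bmis = [15.0, 20.0, 17.5]
scaled = list(zip(min_max_scale(ages), min_max_scale(bmis)))

# Confounder scores used as weights: 39 for age (from the text), 10 for BMI (invented).
weights = [39, 10]

print(weighted_distance(scaled[0], scaled[1], weights))
```

A categorical dimension such as gender could be coded as 0 or 1 before scaling, which is again an assumption rather than something the summary states.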

Once the inter-patient distance function is defined, we can find a peer-group for each subject to be normalized: the group of all control subjects that are closer than some predefined threshold. Generally, we want this threshold to be as small as possible while still capturing a sufficient number of controls for each case subject (Original article, Eq. 2). Once the subject’s peer-group is defined, we take each measured variable (e.g. DHEA) and Z-normalize its value based on the values of the peers.
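A minimal sketch of peer-group selection and Z-normalization follows. Eq. 2 is not shown in this summary, so the threshold rule here (the smallest radius that gives every subject at least `min_peers` controls) is an assumed interpretation, as is the choice to exclude a control from its own peer-group. The subjects, the `age_gap` distance and all names are illustrative stand-ins.

```python
from statistics import mean, stdev


def peer_group(subject, controls, distance, threshold):
    """Return the controls lying within `threshold` of `subject`.

    Assumption: a control subject is not counted as its own peer.
    """
    return [
        c for c in controls
        if c is not subject and distance(subject, c) <= threshold
    ]


def smallest_threshold(subjects, controls, distance, min_peers):
    """Smallest radius that gives every subject at least `min_peers` peers."""
    radii = []
    for s in subjects:
        dists = sorted(distance(s, c) for c in controls if c is not s)
        radii.append(dists[min_peers - 1])
    return max(radii)


def z_normalize(value, peer_values):
    """Z-score of `value` relative to the peer-group values."""
    return (value - mean(peer_values)) / stdev(peer_values)


def age_gap(a, b):
    """Stand-in distance: absolute age difference in years."""
    return abs(a["age"] - b["age"])


# Toy subjects, invented for illustration only.
controls = [
    {"age": 5, "dhea": 1.0}, {"age": 6, "dhea": 1.2}, {"age": 7, "dhea": 1.4},
    {"age": 12, "dhea": 3.0}, {"age": 13, "dhea": 3.4}, {"age": 14, "dhea": 3.2},
]
case = {"age": 6, "dhea": 2.2}

threshold = smallest_threshold([case], controls, age_gap, min_peers=3)
peers = peer_group(case, controls, age_gap, threshold)
z = z_normalize(case["dhea"], [p["dhea"] for p in peers])
print(threshold, len(peers), z)
```

In practice the stand-in `age_gap` would be replaced by the weighted metadata distance, and the same normalization would be repeated for every measured variable.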

We call this technique peer-group normalization. It proved effective in our case studies (Original article, Table 1).


Data and Results Visualization

An ideal visualization for steroid metabolism and related metabolomics data would be presented in the context of the steroidogenesis pathway and would incorporate all metabolomics data, including blood and urine levels of each metabolite, as well as the level of activity of each enzymatic reaction. However, a metabolic map that incorporates all the details for all chemicals is too complex for human comprehension because of the large number of participating metabolites (Figure 2a). On the other hand, the classical views usually used in medical science (Figure 2b) are too simplified and omit essential details.


Figure 2 – Steroidogenesis pathway visualization. (a) Full pathway [1]; (b) Simplified pathway [2]


We propose a visualization (which we call Urine-to-Blood visualization) that incorporates the major details while maintaining the simplicity of the classical steroidogenesis pathway view. To achieve this, for each blood steroid presented in the classical view (Figure 2b), we add all its measured urinary end-products, omitting the degradation pathway (Figure 3, Original article Fig. 1 & Fig. 2). For example, THF is one of the urinary products of cortisol degradation, so we add a THF sub-node to the cortisol node.
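One way to hold such a pathway in memory is a blood-steroid node that carries a list of urinary sub-nodes. The sketch below is illustrative only: the class and field names are invented, the score value is a placeholder, and the only mapping taken from the text is cortisol to THF.

```python
from dataclasses import dataclass, field


@dataclass
class UrinaryProduct:
    """Sub-node: a measured urinary end-product of a blood steroid."""
    name: str
    score: float = 0.0  # normalized case-vs-control score; 0 means no difference


@dataclass
class BloodSteroid:
    """Node of the classical pathway view, extended with urinary sub-nodes."""
    name: str
    score: float = 0.0
    urinary_products: list = field(default_factory=list)

    def add_product(self, name, score=0.0):
        self.urinary_products.append(UrinaryProduct(name, score))


cortisol = BloodSteroid("Cortisol")
cortisol.add_product("THF", score=-0.8)  # placeholder value, not from the article
print(cortisol)
```

The intermediate degradation steps are deliberately absent from the structure, mirroring the choice to omit the degradation pathway.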

To show the blood and urine levels of each metabolite and the activity level of each enzymatic reaction, we color-code each visualized node. The colors range from yellow for up-regulated parameters (in cases compared to controls) to blue for down-regulated ones. For example, neither the Student's t-test nor the TNoM analysis comparing our CAH case and control patients reveals any significant difference in blood cortisol concentration, so the corresponding node is colored white (Original article, Fig. 1). On the other hand, the Student's t-test identified a slight down-regulation of urinary THF concentration in the case patients, so the corresponding sub-node is light blue (Original article, Fig. 1b).
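The color scale can be sketched as a mapping from a signed score to a hex color. The exact palette and scaling used for the article's figures are not given here, so the clipping limit, the pure blue and pure yellow endpoints, and the linear interpolation through white are all assumptions.

```python
def score_to_hex(score, limit=3.0):
    """Map a signed score to a colour between blue, white and yellow.

    Assumptions: scores are clipped to [-limit, limit]; pure blue is
    #0000FF, pure yellow is #FFFF00, and a score of 0 maps to white.
    """
    s = max(-limit, min(limit, score)) / limit  # now in [-1, 1]
    if s >= 0:
        # white -> yellow: fade out the blue channel
        r, g, b = 255, 255, round(255 * (1 - s))
    else:
        # white -> blue: fade out the red and green channels
        fade = round(255 * (1 + s))
        r, g, b = fade, fade, 255
    return f"#{r:02X}{g:02X}{b:02X}"


for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(value, score_to_hex(value))
```

A slightly negative score yields a pale blue, matching the description of the light-blue THF sub-node, while a non-significant difference stays white.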



Figure 3 – Urine-to-Blood steroidogenesis pathway visualization. Typical CAH patient after normalization.


The proposed Urine-to-Blood visualization can describe not only group comparisons but also the data of a single patient compared to that patient's reference peer-group (Figure 3). A researcher can clearly see a significant down-regulation (compared to the peer-group of controls) in the activity of the 21-hydroxylase enzyme, especially for the reaction converting 17-OH Progesterone to 11-Deoxycortisol. Moreover, it is easy to identify elevated concentrations (compared to the peer-group of controls) of all urinary and blood metabolites in the upper part of the pathway (before 21-hydroxylase), as well as decreased concentrations of many urinary and blood metabolites in the lower part of the pathway (after 21-hydroxylase).

To summarize, the proposed normalization and visualization methodologies are useful both for the analysis of patient populations and for the analysis of the clinical profile of a specific patient.



  1. Kanehisa, Minoru, and Susumu Goto. “KEGG: Kyoto Encyclopedia of Genes and Genomes.” Nucleic Acids Research 28.1 (2000): 27-30.
  2. Han, Thang S., et al. “Treatment and health outcomes in adults with congenital adrenal hyperplasia.” Nature Reviews Endocrinology 10.2 (2014): 115-124.