Curr HIV Res. 2013 Jun;11(4):271-80.

A digital signal processing-based bioinformatics approach to identifying the origins of HIV-1 non B subtypes infecting US Army personnel serving abroad.

Nwankwo N.

De Montfort University, Leicester, C/O Sir David Offodum, No 15 Ben Anyaeze Street, Awada, Onitsha, Anambra State, Nigeria.



Two HIV-1 non-B isolates, 98US_MSC5007 and 98US_MSC5016, identified among US Army personnel serving abroad, are known to have originated from other nations. They are nevertheless categorized as American strains because their countries of origin are unknown. American isolates are predominantly of the B subtype. 98US_MSC5007 belongs to a Circulating Recombinant Form (CRF02_AG), while 98US_MSC5016 is of the C clade. Both subgroups are recognized to have originated from the African and Asian continents. It has become necessary to determine the countries of origin of microbes and viruses properly, because diversity and cross-subtyping have been found to militate against the design and development of vaccines and therapeutic interventions. The aim of this study is therefore to identify the countries of origin of the two American isolates found among US Army personnel serving abroad. A Digital Signal Processing-based Bioinformatics technique called the Informational Spectrum Method (ISM) was used. ISM entails translating the amino acid sequence of a protein into a numerical sequence (signal) by means of one biological parameter (an Amino Acid Scale). The signals are then processed using the Discrete Fourier Transform (DFT) to uncover the embedded biological information and present it as Informational Spectra (IS). The Spectral Position of Maximum Binding Interaction (SPMBI) is used. Several approaches, including phylogeny, have previously been employed to determine the evolutionary trends of organisms and viruses. SPMBI has previously been used to re-establish the resemblance and common origin of human and chimpanzee, and the evolutionary roadmaps of the Influenza and HIV viruses. The results show that 98US_MSC5007 shares resemblance and origin with a Nigerian isolate (92NG083), while 98US_MSC5016 does so with the Zairian isolates (ELI, MAL, and Z2/CDC-34).
These results appear to demonstrate that the American soldiers harboring these strains may have been infected by isolates from Nigeria and Zaire, respectively. This is because 98US_MSC5007 and the Nigerian isolate share an SPMBI at position 44. Additionally, 98US_MSC5016, which has its SPMBI at position 148, may have come from Zaire, as its SPMBI is close to that of the Zairian isolates at position 150. SPMBI is a demonstration of bio-functionality arising from the maximum affinity of proteins from different sources for a common protein. To help validate the findings, the experiment was repeated using an ISM-based phylogenetic technique. The outcome appears not to be in complete accord with the results obtained in this study. It is therefore recommended that the countries in which these US Army personnel were deployed be identified and that, where the findings and the locations of the personnel correlate, this novel procedure be used to identify the countries of origin of other such HIV isolates across all clades and nations.

PMID: 23931160


Additional Information: Simplified explanation for non-engineers

Soldiers, seamen, expatriates and diplomats on overseas assignments, as well as tourists, businessmen and immigrants of all nations, especially those with risk behavior, are known to be the major intercontinental transmitters of HIV/AIDS [1]. HIV isolates belonging to each continent are classified into a particular subtype, and each subtype has its own characteristics. Mixed infection with HIV isolates from several subtypes is known to produce isolates with completely different and complex features [2]. These cross-breeds (recombinants) are acknowledged to be difficult to tackle, as they are known to acquire an enhanced ability to multiply and mutate. This is reported to have militated against the design and development of drugs and vaccines [1].

Several isolates harbored by American soldiers on foreign service have been identified as having originated from other nations. Because their origins are unknown, they are classified as US strains. According to one study [2], two of these isolates, 98US_MSC5007 and 98US_MSC5016, belong to this group.

To find the origins of these isolates, a Digital Signal Processing technique of the kind employed in radar, speech detection [3], etc., is used. It was not until this technique was first applied to proteins by Veljkovic et al. [4] that it became possible to employ the procedure to uncover biological characteristics embedded in proteins and then to investigate the isolates harboring them.

To illustrate how this Digital Signal Processing-based technique, called the Informational Spectrum technique, is used to identify the origins of the two American isolates, two peptides designated Peptide1 and Peptide2, which have been used previously [5], are employed. Both are known to bind to a protein called HLA-Cw*0102 [6]. They are:

V I P M F S A L S and C A P A G F A I L

The amino acid sequences of Peptide1 and Peptide2 are first converted into numerical sequences (signals) by means of a biological parameter (an Amino Acid Scale) [7] called the Electron-Ion Interaction Potential (EIIP) [8]. They are then processed using the Informational Spectrum Method (ISM) [8] in order to uncover the embedded biological information. The three steps involved are shown below.


Amino Acid Scales are parameters that express the level of involvement of each of the 20 standard amino acids in a given interaction [7].

By means of the EIIP values (Table 1), V I P M F S A L S and C A P A G F A I L are converted into numerical sequences, which are graphically represented in Figures 1 and 2 below.
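The conversion step can be sketched in a few lines of Python. This is only an illustration: the EIIP values below are those commonly tabulated in the ISM literature and are assumed, not confirmed, to match Table 1, and the names `EIIP` and `encode` are hypothetical helpers introduced for this sketch.

```python
# EIIP value for each one-letter amino acid code, as commonly tabulated in
# the ISM literature. Assumption: these agree with Table 1 of this article.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode(peptide):
    """Translate a one-letter amino acid string into its EIIP signal."""
    return [EIIP[residue] for residue in peptide.replace(" ", "")]


peptide1 = encode("V I P M F S A L S")
peptide2 = encode("C A P A G F A I L")
print(peptide1)
print(peptide2)
```

Each peptide of nine residues becomes a list of nine numbers, which is the signal that the later steps process.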

A signal is more than a single line as shown in Figures 1 and 2. It embodies information, such as the binding properties considered here. It could equally carry information on sound, images, etc., with several frequencies (positions) and amplitudes. Signals are well understood by electrical and electronics engineers. For non-engineers, a simplified illustration is provided here.

To explain simply how this procedure is used in the analysis, we represent the signal from Peptide1 as a bunch of wires A-F encased in an insulator, P (Figure 3). Similarly, we represent the signal from Peptide2 by wires G-M, insulated by a case, N (Figure 4).

Table 2 gives the assumed results of the Discrete Fourier Transform (DFT) processing of the two signals from Peptide1 and Peptide2. Here, the diameters of the wires stand for the frequencies or positions, while the strengths of the wires represent the magnitude of interaction at those frequencies or positions.

Figures 5 and 6 plot the assumed DFT results for Peptide1 and Peptide2 (Table 2), with the diameters that represent the frequencies or positions of interaction on the x-axis and the strengths on the y-axis. These plots show the binding properties of the two peptides as embedded in the signals. These are called the Spectral Characteristics.
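Table 2 holds assumed values chosen for the wire analogy. For readers who want to see what a real DFT of the two peptide signals looks like, the following minimal sketch computes it with the Python standard library. The EIIP values are assumed to match Table 1, and `encode` and `informational_spectrum` are hypothetical helper names. ISM publications commonly report the squared amplitude; the plain amplitude is used here to match the wording above.

```python
import cmath

# EIIP values for the residues that occur in the two peptides.
# Assumption: these agree with Table 1 of this article.
EIIP = {"V": 0.0057, "I": 0.0, "P": 0.0198, "M": 0.0823, "F": 0.0946,
        "S": 0.0829, "A": 0.0373, "L": 0.0, "C": 0.0829, "G": 0.0050}


def encode(peptide):
    """Translate a one-letter amino acid string into its EIIP signal."""
    return [EIIP[residue] for residue in peptide.replace(" ", "")]


def informational_spectrum(signal):
    """Return the DFT amplitudes at positions 1..N//2 of a real signal."""
    n = len(signal)
    amplitudes = []
    for k in range(1, n // 2 + 1):
        # Direct evaluation of the k-th DFT coefficient.
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
                    for m, x in enumerate(signal))
        amplitudes.append(abs(coeff))
    return amplitudes


for name, peptide in [("Peptide1", "V I P M F S A L S"),
                      ("Peptide2", "C A P A G F A I L")]:
    spectrum = informational_spectrum(encode(peptide))
    for position, amplitude in enumerate(spectrum, start=1):
        print(name, position, round(amplitude, 4))
```

Only positions 1 to N//2 are kept because the spectrum of a real-valued signal is symmetric, so the remaining positions carry no extra information.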

From Table 2, it can be observed that at diameter (position) 2 both signals have their highest strength (maximum amplitude): 10 and 9 units, respectively.

Consequently, the point-wise multiplication of the two spectra (the Common Informational Spectrum, or Cross-Spectral Analysis) gives a maximum amplitude of 90 units at position 2. This is also shown in Figure 7. As Table 2 shows, there are six wires of various diameters; wires with diameters 3, 5 and 8 do not exist.
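The point-wise multiplication can be sketched as follows. Since Table 2 is not reproduced here, the numbers are a hypothetical stand-in: only the strengths at position 2 (10 and 9) and the absence of diameters 3, 5 and 8 come from the text, and every other value is invented for illustration. The function name `cross_spectrum` is likewise hypothetical.

```python
# Hypothetical stand-in for Table 2: diameter (position) -> strength.
# Only the strengths at position 2 (10 and 9) are stated in the text; all
# other values are invented. Diameters 3, 5 and 8 are absent, as in the text.
peptide1 = {1: 3, 2: 10, 4: 2, 6: 4, 7: 1, 9: 2}
peptide2 = {1: 2, 2: 9, 4: 3, 6: 1, 7: 2, 9: 1}


def cross_spectrum(a, b):
    """Multiply two spectra position by position (shared positions only)."""
    return {pos: a[pos] * b[pos] for pos in sorted(a.keys() & b.keys())}


common = cross_spectrum(peptide1, peptide2)
peak = max(common, key=common.get)
print(peak, common[peak])  # prints: 2 90
```

A position stands out in the product only when both spectra are strong there, which is why the common peak is read as a shared characteristic.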

According to the ISM procedure, proteins with common biological characteristics share the same consensus frequency, or point of interaction [8]. Likewise, proteins belonging to organisms or viruses of common origin have been found to share a common position of maximum amplitude (interaction), as has been shown for Influenza [8].

When the ISM technique was applied to the CD4 protein sequences of human and chimpanzee, which have previously been acknowledged to share a common origin [10], their positions of maximum amplitude were found to be the same (68) [5].

This is typical of the binding behavior that can be uncovered from the sequences of organisms and viruses using these procedures. Each peak signifies a binding interaction with a protein. Position 18 (F=0.0354), for example, is the point of interaction with HIV; a study at this position has helped determine how HIV infection progresses to AIDS [12]. Human and chimpanzee CD4 are both maximally attracted to a particular protein at position 68.

This result and others derived from additional HIV isolates and hosts are shown in Table 3.

CD4 protein sequences derived from Dancing, Pig-tailed and green monkeys, which are known to have a common origin, share the same maximum amplitude at position 101. The same was observed for the Zairian HIV-1 strains MAL, ELI and Z2, which show a common maximum amplitude at position 150. Two African isolates, Z6 (Zaire) and OYI (Gabon), have the same position of maximum interaction, 155, as the American isolate CDC-451, suggesting transatlantic transmission. The same observation is made for the Zairian isolate WMJ1 and its Cameroonian counterpart 96CM-MP535: they share a maximum point of interaction at position 152 with the American strain SC.

Ape-to-human cross-species transmission is implied by a shared maximum amplitude at position 158. This is demonstrated by a Human Immunodeficiency Virus isolate called V1850 and its simian counterpart, MB66.

Therefore, when the research question arose of how to identify the origins of the two isolates 98US_MSC5007 and 98US_MSC5016, which are designated American although their countries of origin are not yet known, this procedure was applied. The HIV gp120 protein sequences of the two isolates were analyzed using this technique. 98US_MSC5007 displayed its maximum amplitude at position 44, while 98US_MSC5016 showed its maximum amplitude at position 148.

A Nigerian isolate, 92NG083, has previously been shown to have its maximum amplitude at position 44 [5]. Based on these results, it is concluded that the American soldier on foreign service who harbored 98US_MSC5007, which shares its position of maximum amplitude with 92NG083, may have contracted it while serving in Nigeria. Similarly, the soldier with isolate 98US_MSC5016, whose maximum at position 148 lies close to that of the Zairian isolates at position 150, may have been infected in Zaire. It is recommended that the soldiers' places of assignment be identified to help corroborate these findings. The procedure can then be used to identify the origins of other isolates.
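The matching step described above amounts to comparing positions of maximum amplitude. The sketch below uses only the positions quoted in this article. The function `closest_references` and the tolerance of 2 positions are assumptions made for illustration, chosen so that 148 counts as close to 150. The article itself does not state how near two positions must be to count as a match.

```python
# Positions of maximum amplitude quoted in the text for African isolates.
REFERENCE_PEAKS = {
    "92NG083 (Nigeria)": 44,
    "MAL (Zaire)": 150,
    "ELI (Zaire)": 150,
    "Z2 (Zaire)": 150,
    "WMJ1 (Zaire)": 152,
    "96CM-MP535 (Cameroon)": 152,
    "Z6 (Zaire)": 155,
    "OYI (Gabon)": 155,
}

# Positions quoted for the two isolates of unknown origin.
QUERY_PEAKS = {"98US_MSC5007": 44, "98US_MSC5016": 148}


def closest_references(position, references, tolerance=2):
    """List reference isolates whose peak lies within `tolerance` positions,
    nearest first. The tolerance is an assumption, not a published value."""
    hits = [(abs(peak - position), name)
            for name, peak in references.items()
            if abs(peak - position) <= tolerance]
    return [name for _, name in sorted(hits)]


for isolate, position in QUERY_PEAKS.items():
    print(isolate, position, closest_references(position, REFERENCE_PEAKS))
```

With these inputs, 98US_MSC5007 matches only the Nigerian isolate, and 98US_MSC5016 matches the three Zairian isolates at position 150. A wider tolerance would also admit the isolates at position 152, so the conclusion depends on this choice.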

Because the ISM-based technique has helped re-affirm the common origin of human and chimpanzee and the transatlantic transmission between African and American isolates, and has identified the evolutionary roadmap of Influenza [8], it was applied to the two American isolates of unknown origin. The findings suggest that 98US_MSC5007 may have originated from Nigeria, while 98US_MSC5016 may be of Zairian stock.

This technique has earlier helped determine the mechanism by which HIV infection progresses to AIDS [12]. It was also used to investigate how drug resistance could be calculated computationally [11]. The procedure was further used to develop a functional biomedical device called the Computer-Aided Drug Resistance Calculator. This was achieved by calculating the resistance of the HIV protease enzyme to Amprenavir. The protein sequences used carry information on all their mutations and the Amino Acid Scales involved. The resistance was calculated as 5.86%. The data came from research carried out by Danish researchers at the Copenhagen HIV Program (CHIVP): Hoj L, et al. 2008. “In silico identification of physiochemical properties at mutating positions relevant to reduced susceptibility to amprenavir”. XVII International HIV Drug Resistance Workshop. Poster No. 113.

This discovery was submitted to IEEE EMBS Magazine in 2011 as “Computer-Aided Drug Resistance Calculator: Calculating Drug Resistance: Using Amprenavir as a Case Study”. It was later released by IEEE EMBS Magazine in 2013. It was then re-submitted to IEEE TBME, which advised that it be sent to a more specialized journal. Current HIV Research (CHIVR) later received it and commented that the data published by the Danish group and used in the study are unverified.

This invention was acknowledged by my university, De Montfort University, in a letter from the Manager, Innovation Centre. The university requested that I disclose the Intellectual Property (Technology) for joint exploitation, “with a clear view of sharing the dividend with me”. De Montfort University also acknowledged having assessed this innovation, the tool used, and my other research.

I believe that this biomedical device will someday serve mankind. It needs investors.



  1. Brown BK, Darden JM, Tovanabutra S, et al. 2005. “Biologic and genetic characterization of a panel of 60 human immunodeficiency virus type 1 isolates, representing clades A, B, C, D, CRF01_AE, and CRF02_AG, for the development and assessment of candidate vaccines”. J Virol. 79(10):6089-6101.
  2. Tovanabutra S, Brodine SK, Mascola JR. 2005. “Characterization of complete HIV type 1 genomes from non-B subtype infections in U.S. military personnel”. AIDS Res Hum Retroviruses. 21(5):424-429.
  3. Smith SW. 2002. “The Scientist and Engineer’s Guide to Digital Signal Processing”. California Technical Publishing.
  4. Veljkovic V, Cosic I, Dimitrijevic B, Lalovic D. 1985. “Is it possible to analyze DNA and protein sequence by the method of digital signal processing”. IEEE Trans Biomed Eng. 32(5):337-341.
  5. Nwankwo N. 2012. “Signal processing-based Bioinformatics methods for characterization and identification of Bio-functionalities of proteins”. PhD Thesis (submitted). De Montfort University, Leicester, United Kingdom; also available at the
  6. Walse VA, Hattotuwagama CK, Doytchinova A, et al. 2009. “Integrating in silico and in vitro analysis of peptide binding affinity to HLA-Cw*0102: a bioinformatics approach to the prediction of new epitopes”. PLoS ONE. 4(11):e8095.
  7. Tomii K, Kanehisa M. 1996. “Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins”. Protein Engineering. 9(1):27-36.
  8. Veljkovic V, Niman HL, Glisic S. 2009. “Identification of hemagglutinin structural domain and polymorphisms which may modulate swine H1N1 interactions with human receptor”. BMC Structural Biology. 9(62):1-11.
  9. Smith SW. 2002. “The Scientist and Engineer’s Guide to Digital Signal Processing”. California Technical Publishing.
  10. Almecija S, Moya-Sola S, Alba MD. 2010. “Early origin for human-like precision grasping: A comparative study of pollical distal phalanges in fossil hominins”. PLoS ONE. 5(7):11727-11737.
  11. Nwankwo N, Seker H. 2010. “A signal processing-based bioinformatics approach to assessing drug resistance: Human immunodeficiency virus as a case study”. Proc. of IEEE EMBS. 2010:1836-1839.
  12. Nwankwo N, Seker H. 2013. “HIV Progression to AIDS: Bioinformatics Approach to Determining the Mechanism of Action”. Curr HIV Res. 11(1):30-42.


Norbert Nwankwo teaches Pharmaco-informatics, Research Methods, etc. at the Dept of Clinical Pharmacy, Faculty of Pharmacy, Madonna University, Elele Campus, Rivers State, Nigeria.

