MolPharm

0026-895X/03/6306-1256-1272$20.00
Mol Pharmacol 63:1256-1272, 2003


The G-Protein-Coupled Receptors in the Human Genome Form Five Main Families. Phylogenetic Analysis, Paralogon Groups, and Fingerprints

Robert Fredriksson, Malin C. Lagerström, Lars-Gustav Lundin, and Helgi B. Schiöth

Department of Neuroscience, Uppsala University, Uppsala, Sweden (R.F., M.C.L., L.-G.L., H.B.S.); and Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden (R.F.)

Received December 23, 2002; accepted March 11, 2003


    Abstract
 
The superfamily of G-protein-coupled receptors (GPCRs) is very diverse in structure and function and its members are among the most pursued targets for drug development. We identified more than 800 human GPCR sequences and simultaneously analyzed 342 unique functional nonolfactory human GPCR sequences with phylogenetic analyses. Our results show, with high bootstrap support, five main families, named glutamate, rhodopsin, adhesion, frizzled/taste2, and secretin, forming the GRAFS classification system. The rhodopsin family is the largest and forms four main groups with 13 sub-branches. Positions of the GPCRs in chromosomal paralogons regions indicate the importance of tetraploidizations or local gene duplication events for their creation. We also searched for "fingerprint" motifs using Hidden Markov Models delineating the putative inter-relationship of the GRAFS families. We show several common structural features indicating that the human GPCRs in the GRAFS families share a common ancestor. This study represents the first overall map of the GPCRs in a single mammalian genome. Our novel approach of analyzing such large and diverse sequence sets may be useful for studies on GPCRs in other genomes and divergent protein families.


The superfamily of G-protein-coupled receptors (GPCRs) is one of the largest families of proteins in the mammalian genome (Lander et al., 2001; Venter et al., 2001). It has been estimated that more than half of all modern drugs are targeted at these receptors (Flower, 1999), and several ligands for GPCRs are found among the worldwide top-100-selling pharmaceutical products. It is also evident that drugs have so far been developed against only a very small number of the GPCRs, and the potential for drug discovery within this field is enormous.

The ligands for the GPCRs have tremendous variation: ions, organic odorants, amines, peptides, proteins, lipids, nucleotides, and even photons are able to mediate their message through these proteins. The GPCR proteins are also highly variable. There are two main requirements for a protein to be classified as a GPCR. The first requirement relates to seven sequence stretches of about 25 to 35 consecutive residues that show a relatively high degree of calculated hydrophobicity. These sequences are believed to represent seven α-helices that span the plasma membrane in a counter-clockwise manner, forming a receptor, or a recognition and connection unit, enabling an extracellular ligand to exert a specific effect into the cell. The second principal requirement is the ability of the receptor to interact with a G-protein. There is a great diversity in the functional coupling of the GPCRs; they have a number of alternative signaling pathways, interacting directly with a number of other proteins. Interaction with G-proteins has not been demonstrated for most GPCRs, in particular for those whose genes have just recently been sequenced. It may therefore be more technically correct to term this superfamily "seven transmembrane (TM) receptors", but the GPCR terminology is more established.

Several classification systems have been used to sort out this superfamily. Some systems group the receptors by how their ligand binds, and others have used both physiological and structural features. One of the most frequently used systems uses clans (or classes) A, B, C, D, E, and F, and subclans are assigned using Roman numeral nomenclature (Attwood and Findlay, 1994; Kolakowski, 1994). This A–F system is designed to cover all GPCRs, in both vertebrates and invertebrates. Some families in the A–F system do not exist in humans. Examples of this are clans D and E, which represent fungal pheromone receptors and cAMP receptors, family IV in clan A, which is composed of invertebrate opsin receptors, and clan F, which contains archaebacterial opsins. The overall classification of the GPCRs has been hampered by the large sequence differences between mammalian and invertebrate GPCRs. The GPCRs in Drosophila melanogaster show in many cases little resemblance to those in mammals (Broeck, 2001). Certain species also differ greatly in the numbers of receptor genes in different classes. Caenorhabditis elegans, a worm, has, for example, developed a remarkable number of chemosensory (olfactory) GPCRs related to the creature's specific lifestyle. Those chemosensory receptors, as well as the olfactory receptors in D. melanogaster, do not show any clear resemblance to the olfactory receptors in humans.

Gene duplication occurs both by individual duplication, which often leaves the new gene near the parent gene, and by block duplications involving chromosomal regions or entire chromosomes. Large-scale duplications, including polyploidizations, are believed to be an important mechanism of vertebrate evolution. Two rounds of large-scale duplications are thought to have occurred in early vertebrate ancestry (Lundin, 1993; Holland et al., 1994), resulting in up to four copies of each gene in mammals, which originate from a common ancestor gene in a cephalochordate. This is now known as the "2R hypothesis" or the "one-to-four model". It has led to the construction of maps that contain paralogous chromosomal regions, or paralogons (Lundin, 1993; Holland et al., 1994; Katsanis et al., 1996; Popovici et al., 2001), in vertebrates, which in combination with phylogenetic analysis can provide valuable information on gene relationships and origins.

In this study, we collected a large set of GPCR sequences in the human genome and performed multiple phylogenetic analyses. The first task was to compile a comprehensive data set with just a single copy of each gene. We wanted to avoid polymorphism, pseudogenes, duplicates (resulting from the same gene having multiple names), and other related problems. We identified more than 800 GPCRs in databases and simultaneously analyzed sequences of 342 unique functional nonolfactory human GPCRs and grouped them by phylogenetic analysis. The chromosomal localization and positioning in paralogous groups of the genes were studied to give insight into the mechanism involved in creating the receptor genes. The different families were also analyzed for common sequence motifs, and we discuss the evidence for common descent of the families.


    Materials and Methods
 
Data Retrieval. Approximately 200 GPCRs, both orphans and characterized receptors, known from the literature were downloaded from the GenBank database using the Entrez data-retrieval tool (http://www.ncbi.nlm.nih.gov/Entrez/). This data set was considered the start set, and all the genes were manually searched against the human genome database using BLASTP (Altschul et al., 1997) on the protein database. New receptors that were not already in the data set were saved and included. At least 20 of the most significant BLAST hits (sorted by E-value) for each receptor were checked to further extend the data set, yielding the first crude database. Duplicates were removed from this data set using a crude phylogenetic analysis. Thereafter, Entrez was used in keyword searches to identify orphan receptors, which are usually named GPRnnn, where nnn is a number. In our case, searches were made with nnn ranging from 1 to 150.

To extend the data set, searches were made with all receptor sequences in the data set against the human genome protein database at NCBI. All genes were screened against the first version of the database to avoid duplicates. To identify possible novel receptors, not yet annotated in the human genome database at NCBI, we searched with a diverse set of GPCR receptors at the nucleotide level using BLASTX against the Genescan data set. A P value of 0.001 was used as a threshold or a maximum of 100 BLAST hits were analyzed for each search.
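The hit-selection rule in the last sentence can be illustrated with a short Python sketch. This is only one reading of that rule (keep hits at or below the threshold, most significant first, and stop at 100); the function name, the `(hit_id, p_value)` tuple layout, and the inclusive comparison are assumptions made for illustration and are not taken from the authors' procedure.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (hit identifier, P value); hypothetical layout


def select_hits(hits: List[Hit],
                p_threshold: float = 0.001,
                max_hits: int = 100) -> List[Hit]:
    """Keep hits with P <= p_threshold, most significant first, capped at max_hits.

    Assumption: the threshold and the cap are applied together; the text
    does not say how ties or borderline values were handled.
    """
    significant = [hit for hit in hits if hit[1] <= p_threshold]
    significant.sort(key=lambda hit: hit[1])
    return significant[:max_hits]


if __name__ == "__main__":
    example = [("hitA", 1e-10), ("hitB", 0.5), ("hitC", 1e-4)]
    for hit_id, p_value in select_hits(example):
        print(hit_id, p_value)
```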

The genes were named according to the convention used in the human genome database at NCBI, although several orphan GPCRs, which recently had their ligands identified, were subsequently renamed according to recent literature. If no name was assigned to a specific sequence in the database, these were assigned GPR numbers as provided by the HUGO nomenclature committee. Sequences not present in the human genome database were given either an accepted name from the literature or the GenBank accession number. Accurate chromosomal positions were obtained from the University of California Santa Cruz "the golden gate" human genome database (http://genome.ucsc.edu), the Dec 2001 assembly. If not present in the public genome assembly, we used the chromosomal position from the Celera database (http://www.celera.com).

Alignment. Each data set was randomized 20 times with regard to sequence input order using a program called Randfasta (http://www.neuro.uu.se/medfarm/schiothSoft.html), because the input order of sequences is known to affect the resulting alignment. These 20 data sets, containing the full set of sequences but in different order, were all aligned using the Win32 version of ClustalW 1.81 (Thompson et al., 1994). The default alignment parameters were applied.
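The input-order randomization can be sketched in a few lines of Python. This is not the Randfasta program itself, whose internals are not described here; it is a minimal stand-in that parses FASTA text and returns 20 shuffled copies of the same records. The function names and the fixed seed are illustrative assumptions.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def shuffled_copies(records: List[Record],
                    n_copies: int = 20,
                    seed: int = 0) -> List[List[Record]]:
    """Return n_copies lists holding the same records in random order."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    copies = []
    for _ in range(n_copies):
        order = list(records)
        rng.shuffle(order)
        copies.append(order)
    return copies


if __name__ == "__main__":
    demo = read_fasta(">seq1\nMKLV\n>seq2\nMRLI\n>seq3\nMKII\n")
    for copy in shuffled_copies(demo, n_copies=3):
        print([name for name, _ in copy])
```

Each shuffled copy would then be written back to a file and aligned separately, so that any dependence of the alignment on input order is averaged over the 20 runs.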

Sequence Bootstrapping and Randomization. The 20 alignments were all bootstrapped 50 times using SEQBOOT from the Phylip package (Felsenstein, 1993) to obtain a total of 1000 different alignments from each dataset.
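The arithmetic of this step (20 alignments × 50 bootstrap samples = 1000 replicates) and the idea of bootstrapping an alignment can be sketched as follows. The sketch resamples alignment columns with replacement, which is the standard bootstrap that SEQBOOT performs by default; it is not SEQBOOT itself, and the dictionary representation of an alignment and the function names are assumptions.

```python
import random
from typing import Dict, List

Alignment = Dict[str, str]  # sequence name -> aligned sequence (equal lengths)


def bootstrap_alignment(alignment: Alignment, rng: random.Random) -> Alignment:
    """Resample alignment columns with replacement, keeping rows in step."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignments: List[Alignment],
                         n_per_alignment: int = 50,
                         seed: int = 0) -> List[Alignment]:
    """Draw n_per_alignment bootstrap samples from every input alignment."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng)
            for alignment in alignments
            for _ in range(n_per_alignment)]


if __name__ == "__main__":
    toy = {"seq1": "MKLV", "seq2": "MRLI"}
    replicates = bootstrap_replicates([toy] * 20, n_per_alignment=50)
    print(len(replicates))  # 20 * 50 replicates
```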

Neighbor-Joining Trees. Protein distances were calculated using Protdist from the Win32 version of the Phylip package. For the calculation, the Dayhoff PAM matrix was used. The trees were calculated on the 20 different sets of distance matrices, previously generated with Protdist, using Neighbor from the Phylip package, resulting in 20 files with 50 trees each. All trees were unrooted. Because of limitations in the Consense program (version 3.5; Felsenstein, 1993), a consensus tree for the complete rhodopsin family could not be calculated from all 1000 replicas; therefore, 300 bootstrap replicas were used. The trees were plotted using Treeview (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).

Maximum Parsimony Trees. Maximum parsimony trees were calculated from the same input files that were used for Protdist using Protpars from the Phylip package. The trees were unrooted and calculated using ordinary parsimony, and the topologies were obtained using the built-in tree search procedure. As above, consensus trees were calculated using Consense 3.5 from Phylip and trees were plotted using Treeview.

Calculating the Overall Relationship of the Main GPCR Families Using Random Selection of Genes. These calculations are based on all members from four of the main groups (secretin, frizzled, glutamate, and adhesion), together with 20 randomly selected rhodopsin receptors, selected using Randfasta. Randfasta was used to randomize the input order of the sequences 20 times. The 20 datasets were aligned and sampled using SEQBOOT (50 replicas each); 1000 parsimony trees were then calculated using Protpars, and consensus trees were calculated using Consense 3.5.

Fingerprint Analysis. For the fingerprint/motif analyses, an approach using Hidden Markov Models (HMMs) was applied as implemented in the HMMER 2.1 package (Eddy, 1998), recompiled for WIN32 using Visual C++ 6.0. From the secretin, adhesion, glutamate, rhodopsin, and frizzled families, alignments of the entire coding regions were constructed using ClustalW 1.81; from these alignments, one HMM per family was calculated using HMMbuild. The model allowed local alignments within the HMM, global alignments with respect to the query sequence, and multiple domains per sequence to hit. All HMMs were calibrated using HMMcalibrate. To define the transmembrane regions statistically described by the HMMs, the transmembrane region as described in the literature for one of the members of each family was aligned to the respective HMM using HMMsearch. The sequences used were FZD3, GRM1, GLP1, LEC1, and ADRB2. The identified TM regions from the HMMs were subsequently aligned to each other, region by region, using ClustalW 1.81, and conserved motifs were identified in the HMM alignments by manual inspection.


    Results
 
The schematic presentation of the approach used for retrieving sequences and the overall phylogenetic analysis is shown in Fig. 1. Detailed descriptions of the different steps are given under Materials and Methods. We assembled a primary data set of 802 unique GPCRs from the human genome. We believe that this data set contains most of the functional GPCRs in the human genome. The results show that the receptors cluster in five main families that we term glutamate (G, with 15 members), rhodopsin (R, 701), adhesion (A, 24), frizzled/taste2 (F, 24) (frequently abbreviated to frizzled hereafter), and secretin (S, 15), to which we apply the acronym GRAFS. Twenty-three protein sequences could not be assigned to any of the five families with appreciable bootstrap values (above 50%); these are discussed separately below under the section "other 7TM receptors". Figure 2 shows trees describing the overall relationship between the five main families of GPCRs. The bootstrap values shown in Fig. 2 separating the respective family from its closest neighbor [secretin (862), adhesion (789), glutamate (839), and frizzled (774)] are high; together with the overall topology, they give good support for each of the GRAFS families. The phylogenetic analysis shown in Fig. 2 is performed on protein sequences in which the N and C termini were deleted (see detailed comments below), whereas analysis on the full-length sequences also provided good support for five main families (data not shown). It should be noted here that the five families represent the smallest number of clusters that the phylogenetic analysis can delineate from the data set with appreciable bootstrap values and that the phylogenetic analysis does not show sufficient bootstrap support to link any of the GRAFS families together. It is possible, however, to further subdivide each family, because several bootstrap values within them show very high values. 
For example, the GABA receptors could be divided from the other receptors in the glutamate family, but because there are appreciable bootstrap values that link them within the glutamate family, we have decided to stick to this minimum number of families (i.e., five). The rhodopsin family has by far the largest number of receptors and was therefore further subdivided into four main groups and 13 branches (see below).
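As a small worked example, the bootstrap values quoted above can be read as counts out of the 1000 replicas used for the tree in Fig. 2 and compared with the 50% level that the text treats as appreciable. The sketch below assumes that reading; the function names are illustrative only.

```python
REPLICATES = 1000  # number of replicas stated for the tree in Fig. 2

# Bootstrap counts quoted in the text for each family versus its closest neighbor.
SUPPORT = {"secretin": 862, "adhesion": 789, "glutamate": 839, "frizzled": 774}


def support_percent(count: int, replicates: int = REPLICATES) -> float:
    """Convert a bootstrap count into a percentage of replicates."""
    if not 0 <= count <= replicates:
        raise ValueError("count must lie between 0 and the number of replicates")
    return 100.0 * count / replicates


def is_appreciable(count: int,
                   replicates: int = REPLICATES,
                   cutoff_percent: float = 50.0) -> bool:
    """True when support exceeds the cutoff (50% in the text)."""
    return support_percent(count, replicates) > cutoff_percent


if __name__ == "__main__":
    for family, count in SUPPORT.items():
        print(family, round(support_percent(count), 1), is_appreciable(count))
```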



Fig. 1. Flowchart describing the sequence analysis strategy used in this work. The first step was to construct a database of GPCRs in the human genome. Using the Entrez online data retrieval tool and keyword searches, we downloaded approximately 200 GPCRs known from the literature. Most GPCRs have several names, and they have also been deposited in databases under several entries. Therefore, the database was carefully checked to remove any duplicated genes throughout the process. The approximately 200 human GPCRs were considered our "seeding" set, and the sequences were manually searched against the human genome database and the NR database to extend the GPCR database. Primary phylogenetic analyses, using a small number of bootstrap replicas and no randomization of the input file, were performed on the sequences in the database to identify splice variants, polymorphism, and duplicates. The final step was to search against the human genome GeneScan database, which contains genes predicted from the genome sequence by the GeneScan algorithm, to obtain possible nonannotated genes. The large phylogenetic analyses were carried out as described under Materials and Methods. Briefly, the Fasta file containing all sequences in the database was randomized using the Randfasta program to randomize the input order of the sequences, because the input order of the sequences can influence the resulting alignment. The sequences in these files were subsequently aligned and each of the sequence files was bootstrapped using the SEQBOOT software to obtain 1000 replicas of the alignment. Neighbor joining trees were constructed using the Protdist, Neighbor, and Consense programs. From this initial tree, the rhodopsin-like GPCRs and the nonrhodopsins were identified. The rhodopsin family was analyzed as one unit using the same strategy as above; from that analysis, the olfactory receptors and the four rhodopsin groups were identified. This analysis was carried out several times using both maximum parsimony and neighbor joining methods, and the groups that were finally defined were consensus groups from all these trees. A few receptors did not show stable topology in any group and these are discussed separately under Results. The nonrhodopsin receptors were analyzed both as full-length receptors and with the N and C termini removed, as shown in Fig. 2. To investigate how the rhodopsin family is related to the nonrhodopsins, 20 rhodopsins were randomly selected and included in the calculations. These analyses were repeatedly performed using the maximum parsimony method, with the dataset randomized as above, using Protpars and Consense. The four rhodopsin groups were also analyzed using maximum parsimony in the same way as described for the nonrhodopsins, but also using neighbor joining and maximum likelihood methods as described under Materials and Methods. These trees are not presented in this work but were used to identify instabilities in the topologies and are available upon request.

 


Fig. 2. Phylogenetic relationship between the GPCRs (TMI–TMVII) in the human genome. The tree was calculated using the maximum parsimony method on 1000 replicas of the data set of terminally truncated GPCRs, as described under Materials and Methods. The position of the rhodopsin family was established by including 20 random receptors from the rhodopsin family. These branches were removed from the final figure and replaced by an arrow toward the rhodopsin family analysis in Fig. 3.

 




Fig. 3. The phylogenetic relationship between GPCRs (TMI–TMVII) in the human rhodopsin family. The tree was calculated using the maximum parsimony method on 300 replicas. The position of the olfactory cluster was established by including 17 diverse random receptors from the olfactory cluster. These branches were removed from the final figure and replaced by an arrow toward the olfactory receptor cluster.

 
The receptors in all of the main families, except the rhodopsin family, have long N termini, whereas the rhodopsin family has only a few members with this characteristic. These long N termini are especially evident for the receptors within the adhesion family, but the secretin, glutamate, and frizzled receptors have also rather long N termini that are fairly rich in Cys residues. The only significant common feature of the proteins is the seven TM stretch; from an evolutionary perspective, it could be misguided to include these diverse and long N termini in the analysis. The number of evolutionary events needed for generating long N termini is likely to be more related to, for example, the number of domains than the replacements of single amino acids used for the phylogenetic calculations. Therefore, we decided to use the truncated receptors, where we use the sequence from the start of the TMI to the end of TMVII, for the main tree presented in Fig. 2. Each of the receptors was thus manually cut to provide this data set.
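The truncation described above amounts to keeping the residues from the first residue of TMI to the last residue of TMVII. A minimal Python sketch follows; in the study the cut points were chosen manually for each receptor, so the boundary positions here are inputs rather than predictions, and the function name and the 1-based inclusive convention are assumptions.

```python
def truncate_to_tm_core(sequence: str, tm1_start: int, tm7_end: int) -> str:
    """Return the residues from the start of TMI to the end of TMVII.

    Positions are 1-based and inclusive, as residue numbers are usually
    reported. The boundaries must be supplied by the user.
    """
    if not (1 <= tm1_start <= tm7_end <= len(sequence)):
        raise ValueError("TM boundaries must satisfy 1 <= start <= end <= length")
    return sequence[tm1_start - 1:tm7_end]


if __name__ == "__main__":
    # Toy sequence: four N-terminal residues, a six-residue "core", three C-terminal residues.
    print(truncate_to_tm_core("NNNNTMCORECCC", tm1_start=5, tm7_end=10))
```

Removing the termini in this way keeps only the region shared by all families, so that domain-level differences in the N termini do not dominate the distance or parsimony calculations.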

Below we give comments to our results for each of the families. The number of receptors in each family is indicated in parentheses. At the end of each section, we list the receptor names. First, we give the sequence identification name in bold. We provide the HUGO name in parenthesis in those cases in which it is different from the name we found to be most appropriate, for various reasons, except for the chemokine receptors (found in the rhodopsin family). HUGO lists only a few chemokine receptors, and the current naming system is thus not appropriate until it is more complete. We did not add their names in parenthesis, because we would have ended up with the same name for different receptors in our lists. After the name, we list the sequence accession code followed by the chromosomal position. We want the reader to be aware that many of the receptors have multiple additional names; a list with alternative names, which can be found online (http://www.neuro.uu.se/medfarm/schiothArt.html), includes many of the names provided by ENSEMBL (http://www.ensembl.org/).

The Secretin Receptor Family (15)
The receptors in the secretin family bind rather large peptides that share high amino acid identity and most often act in a paracrine manner. The secretin family name is related to the fact that the secretin receptor was the first one to be cloned in this family. The term "secretin-like receptor" has also frequently been used in the literature for receptors in this cluster. This group basically corresponds to clan B of the A-F system. The N terminus, between ~60 and 80 amino acids long, contains conserved Cys bridges and is particularly important for binding of the ligand to these receptors. The N terminus of the vasoactive intestinal peptide receptor (VIPR) and pituitary adenylyl cyclase-activating protein (PACAP) receptors alone constitutes a functional binding site for the ligand. Members of this family are the calcitonin receptor (CALCR), the corticotropin-releasing hormone receptors (CRHRs), the glucagon receptor (GCGR), the gastric inhibitory polypeptide receptor (GIPR), the glucagon-like peptide receptors (GLPRs), the growth hormone-releasing hormone receptor (GHRHR), PACAP, the parathyroid hormone receptors (PTHR), the secretin receptor (SCTR), and VIPR. The tree has four main subgroups: the CRHRs/CALCRLs, the PTHRs, GLPRs/GCGR/GIPR and the subgroup including secretin and four other receptors. Most of these receptors, 11 of 15, belong to the HOX paralogon, 2q/12q/17q/7/(3p) (see Fig. 4):



Fig. 4. The positioning of the GPCRs in paralogon groups in the human genome. Frames indicate the paralogons (PGs) according to Lundin (1993), Holland et al. (1994), Katsanis et al. (1996), Sidow (1996), Pebusque et al. (1998), Kasahara (1999), and Holland (1999), further extended in Popovici et al. (2001). Red, 2q/12q/17q/7/(3p) [PG 10 (HOX paralogon)]; dark blue, 1p/3p/7/22q (PG 11); light blue, 3q/13q/11q14-q25/17p/19q/Xq (PG 6/7); dark green, 1/5p-q21/6p21-p25/9/15q11-q26/19p (PG 3); light green, 1p3, 2p, 8q, 6, 16q, 18 and 20q (PG 13/14); orange, 4p16.3, 5q, 10q21-26, 8p12-22/2p11-23 [PG 9 (Meta HOX)]; yellow, 1p21.1-p13.1,1q1-q44/11p/12/19q (PG 1); purple, 4q/5q/13q/X [PG 8 (ParaHOX)]; brown, 7/16p/17/22q (PG 12); black, 1q23-q44/2p22-p25/11q13.1-q23.4/14q/15q11-q26/19q/20p (PG 4).

 

CALCR, NP_001733.1, 7q21.3; CALCRL, NP_005786.1, 2q21.1-q21.3; CRHR1, NP_004373.1, 17q21.31; CRHR2, NP_001874.1, 7p14.3; GCGR, NP_000151.1, 17q25.3; GHRHR, NP_000814.1, 7p14; GIPR, NP_000155.1, 19q13.3; GLP1R, NP_002053.1, 6p21.2; GLP2R, NP_004237.1, 17p11.2; PACAP, NP_001109.1, 7p14; PTHR1, NP_000307.1, 3p21.31; PTHR2, NP_005039.1, 2q33; SCTR, NP_002971.1, 2q14.1; VIPR1, NP_004615.1, 3p22.1; VIPR2, NP_003373.1, 7q36.3

The Adhesion Receptor Family (24)
This rather new and peculiar family of GPCRs consists of receptors with GPCR-like transmembrane-spanning regions fused together with one or several functional domains with adhesion-like motifs in the N terminus, such as EGF-like repeats, mucin-like regions, and conserved cysteine-rich motifs (for an overview of the N termini in some of these receptors, see Hayflick, 2000; Harmar, 2001). The N termini are variable in length, from about 200 to 2800 amino acids long, and are often rich in glycosylation sites and proline residues, forming what has been described as mucin-like stalks. The family name "adhesion" relates to these long N termini, which contain motifs that are likely to participate in cell adhesion (McKnight and Gordon, 1998; Stacey et al., 2000). Some receptors in this family have been termed secretin-like receptors, and the latrotoxin receptors have previously been placed into clan B (Flower, 1999) or clan B2 (Harmar, 2001), but our analysis clearly shows that they belong to a distinct family of their own. The bootstrap values for the adhesion and the secretin families are also very high at 789 and 862, respectively, indicating clear distinction between the families. The analysis of the full-length proteins also indicates distinction between the secretin and adhesion families (data not shown). Although the phylogenetic analysis by Harmar (2001) does not stretch beyond "clan B" (secretin and adhesion), it basically supports our conclusion of separate clusters of secretin and adhesion receptors. Our analysis shows that several of the receptors appear in clusters of three or four: the CELSRs (EGF LAG seven-pass G-type receptors), the brain-specific angiogenesis-inhibitory receptors (BAIs), the lectomedin receptors (LECs), and the EGF-like module containing receptors (EMRs). CD97 antigen receptor (CD97) and EGF-TMVII-latrophilin-related (ETL) also group with these on a separate main branch. CD97 shares highest sequence similarity with EMR2 (56%), which is higher than the level of identity within the EMRs. The EMRs and CD97 are all positioned on 19p13, indicating that they may have arisen through several local gene duplications. The other main branch includes HE6 (TMVIILN2) and GPR56 (TMVIIXN1 or TMVIILN4) and a group of recently discovered receptors, related to GPR56 and HE6, named GPR97 and GPR110 to GPR116 (Fredriksson et al., 2002). The N termini of the receptors in this branch have varying lengths and relatively few identified functional domains compared with the other main branch of the adhesion receptors. Most of the genes of the entire adhesion family are positioned within the paralogon 1/5p-q21/6p21-p25/9/15q11-q26/19p, providing support for their common ancestry (Fig. 4):

BAI1, NP_001693.1, 8q24; BAI2, NP_001694.1, 1p35; BAI3, NP_001695.1, 6q12; CELSR1, NP_055061.1, 22q13.3; CELSR2, NP_001399.1, 1p21; CELSR3, NP_001398.1, 3p21.31; CD97, NP_001775.1, 19p13.13; EMR1, NP_001965.1, 19p13.3; EMR2, NP_038475.1, 19p13.1; EMR3, NP_115960.1, 19p13.3; ETL, NP_071442.1, 1p33-p32; GPR97, AY140959, 16q13; GPR110, AY140952, 6p12.3; GPR111, AY140953, 6p12.3; GPR112, AY140954, Xq26.3; GPR113, AY140955, 2p23.3; GPR114, AY140956, 16q13; GPR115, AY140957, 6p12.3; GPR116, AY140958, 6p12.3; HE6 (GPR64), NP_005747.1, Xp22.22; LEC1, NP_036434.1, 1p31.1; LEC2, NP_055736.1, 19p13.2; LEC3, NP_056051.1, 4q13.1; GPR56 (TMVIIXN1), NP_003263.1, 1q42-q43

The Glutamate Receptor Family (15)
This family of receptors consists of eight metabotropic glutamate receptors (GRM), two GABA receptors (e.g., GABAbR1, which has two splice variants, a and b, and GABAbR2), a single calcium-sensing receptor (CASR), and five receptors that are believed to be taste receptors (TAS1). This group basically corresponds to what has been called clan C receptors. Several other GABA receptors are found in the human genome, but these are ion channels. The ligand recognition domain in the metabotropic glutamate receptors is found in the N terminus of ~280 to 580 amino acids, and it has been proposed to share structural homology with bacterial amino acid binding proteins, such as LIVBP. The N terminus is believed to form two distinct lobes separated by a cavity in which glutamate binds, forming a so-called "Venus fly trap" where the glutamate causes the lobes to close around the ligand. The CASR also has a long cysteine-rich N terminus, but it is uncertain if it is involved in the binding of Ca2+, even though it is important for mediating the signal of Ca2+. The N terminus of the GABA receptors is long and contains the ligand-binding site but lacks the cysteine-rich domain found in the other receptors of this family. The TAS1 receptors also have a long N terminus with a series of conserved Cys residues. They are expressed in the tongue and are likely to mediate taste signals. CASR falls with the TAS1 receptors, whereas the two GABA receptors branch basally in the family. GRM2 and GRM3 share 67% sequence identity and are located in chromosomal regions 3p and 7q, respectively. GRM7 and GRM8 share 74% sequence identity and are also positioned on 3p and 7q. These regions are both part of the postulated 1p/3p/7/22q paralogon, supporting a common ancestry (Fig. 4):

CASR, NP_000379.1, 3q21.1; GABBR1, NP_001461.1, 6p21.1; GABBR2 (GPR51), NP_005449.1, 9q22.1-q22.3; GRM1, NP_000829.1, 6q24.3; GRM2, NP_000830.1, 3p21.31; GRM3, NP_000831.1, 7q21.12; GRM4, NP_000832.1, 6p21.1; GRM5, NP_000833.1, 11q21.1; GRM6, NP_000834.1, 5q35.3; GRM7, NP_000835.1, 3p21.1; GRM8, NP_000836.1, 7q31.3-q32.1; GPRC6A, NP_683766.1, 6q22.1; TAS1R1, NP_619642, 1p36.23; TAS1R2, NP_689418.1, 1p36.2; TAS1R3, XP_060177.1, 1p36.33

The Frizzled/Taste2 Receptor Family (24)
This group includes two distinct clusters, the frizzled receptors and the TAS2 receptors. We were surprised that the TAS2 receptors clustered together with the frizzled receptors with a high bootstrap value. There are no obvious similarities between the receptors in the frizzled branch and the taste branch of this receptor family. However, when we compared the TAS2 receptor consensus sequence against an HMM of the frizzled receptor branch, we found several features that may explain why these two groups of receptors cluster together, such as the consensus sequences IFL in TMII, SFLL in TMV, and SxKTL in TMVII. None of these motifs is found in the consensus sequences of the other four families. The TAS2 receptors showed no clear similarities with the TAS1 receptors in the glutamate receptor family. The TAS2 receptors clearly show seven hydrophobic regions in a hydrophobicity plot, but they have a very short N terminus that is unlikely to contain a ligand binding domain. Rather little is known about the role and function of the TAS2 receptors except that they are expressed in the tongue and palate epithelium, and it is believed that they function as bitter taste receptors. We found 13 TAS2 receptors in the human databases. Two of the receptors we found were not previously annotated or found in any database. We approached the HUGO Gene Nomenclature Committee at University College London and they confirmed that the sequences were unique and not public. The committee provided these receptors with new GPR numbers (GPR59 and GPR60). These numbers had previously been preliminarily assigned to other receptors but were never used, which explains the low GPR numbers.
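The three motifs named above (IFL, SFLL, and SxKTL, where x stands for any residue) can be looked up in a sequence with a simple pattern search. The Python sketch below is a deliberately simplified stand-in: the study identified the motifs by manual inspection of HMM alignments within specific TM regions, whereas this code scans a whole sequence with regular expressions. The function names and the test sequence are hypothetical.

```python
import re
from typing import Dict, List

# Motifs quoted in the text, keyed by the TM region in which they were reported.
MOTIFS: Dict[str, str] = {"TMII": "IFL", "TMV": "SFLL", "TMVII": "SxKTL"}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Translate a motif into a regex, treating lowercase 'x' as any residue."""
    parts = ["[A-Z]" if char == "x" else re.escape(char) for char in motif]
    return re.compile("".join(parts))


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 1-based start positions of non-overlapping motif matches."""
    pattern = motif_to_regex(motif)
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "AASAKTLGGSFLL"  # made-up sequence, not a real receptor
    for region, motif in MOTIFS.items():
        print(region, motif, find_motif(toy_sequence, motif))
```

A whole-sequence scan such as this will also report matches outside the TM region of interest, so in practice the search would be restricted to the relevant helix.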

The frizzled receptors control cell fate, proliferation, and polarity during metazoan development by mediating signals from secreted glycoproteins termed Wnt. The frizzled name was first used for a receptor cloned from Drosophila melanogaster, and the frizzled name (referring to the curled and twisted Wnt ligand) has frequently been used for this relatively recently discovered cluster of receptors. It has been shown that Wnt ligand binding to the rat F2DR can induce G-protein coupling (Slusarski et al., 1997), providing evidence that the frizzled proteins are GPCRs. This has also been supported by previous phylogenetic analyses showing some structural relationship to GPCRs (Barnes et al., 1998). The frizzled family of receptors has a 200-amino acid N terminus with conserved cysteines that are likely to participate in Wnt binding. The frizzled family consists of 10 frizzled receptors, FZD1–10, together with SMOH, which is the most divergent receptor of the family, sharing only 24% identity with FZD2 and less with the others. The topology of the tree shows four main clusters in the frizzled branch of receptors: FZD1, -2, and -7 share approximately 75% identity with each other; FZD8 and -5 share 70% identity; FZD10, -9, and -4 share ~65% identity; and finally, FZD6 and -3 share 50% amino acid identity. The identities shared by receptors from different clusters are between 20 and 40%, indicating that four parental genes from the frizzled family were formed initially and the four clusters of receptors were subsequently formed out of these. All the frizzled genes, except FZD6, -3, and -8, are located in the chromosomal regions belonging to the HOX paralogy group. In addition, the phylogeny does indicate that the frizzled family was expanded in the two genome duplications proposed to have occurred basally in the vertebrate lineage (see Introduction). This is supported by the fact that the FZD7, -1, and -2 genes are located on different paralogous chromosomes, as are FZD9 and -10. However, if this scenario is true, several genes were lost (for example, all other copies of the SMOH gene). Interestingly, all the taste2 receptors from this group are located in the 1p3/3q/7q/12p/17p paralogon, indicating that some of these genes were present early in vertebrate evolution. The fact that the genes are clustered on chromosome 7q31 and 12p13 suggests that this family expanded through several local gene duplications. It is noteworthy that two of the frizzled receptors, FZD9 and SMOH, are also located in the same paralogon:

FZD1, NP_003496.1, 7q21.13; FZD2, NP_001454.1, 17q21.31; FZD3, NP_059108.1, 8p21.1; FZD4, NP_036325.1, 11q14.2; FZD5, NP_003459.1, 2q33-q34; FZD6, NP_003497.1, 8q22.3-q23.1; FZD7, NP_003498.1, 2q33; FZD8, NP_114072.1, 10p11.21; FZD9, NP_003459.1, 7q11.23; FZD10, NP_009128.1, 12q24.33; SMOH, NP_005622.1, 7q32.1; TAS2R13, NP_076409, 12p13; TAS2R14, NP_076411.1, 12p13; TAS2R7, NP_076408.1, 12p13; TAS2R9, NP_076406.1, 12p13; TAS2R8, NP_76407.1, 12p13.2; TAS2R3, NP_058639.1, 7q31.3-q32; TAS2R10, NP_076410.1, 12p13; TAS2R5, NP_061853.1, 7q31.3-q32; TAS2R4, NP_058640.1, 7q31.3-q32; TAS2R1, NP_062545.1, 5p15; TAS2R16, NP_58641.1, 7q31.1-q31.3; GPR59, XP_069626, 7q33; GPR60, XP_090424, 7q33
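The clustering of the TAS2 genes on 7q and 12p can be checked directly from the list above. The following is a minimal sketch, not part of the original analysis: it parses the "gene, accession, locus" entries for the 13 TAS2 receptors and groups them by chromosome arm. The helper names (chromosome_arm, group_by_arm) are hypothetical and introduced only for illustration.

```python
import re
from collections import defaultdict

# TAS2 entries (gene, accession, locus) taken from the list above.
TAS2_ENTRIES = (
    "TAS2R13, NP_076409, 12p13; TAS2R14, NP_076411.1, 12p13; "
    "TAS2R7, NP_076408.1, 12p13; TAS2R9, NP_076406.1, 12p13; "
    "TAS2R8, NP_76407.1, 12p13.2; TAS2R3, NP_058639.1, 7q31.3-q32; "
    "TAS2R10, NP_076410.1, 12p13; TAS2R5, NP_061853.1, 7q31.3-q32; "
    "TAS2R4, NP_058640.1, 7q31.3-q32; TAS2R1, NP_062545.1, 5p15; "
    "TAS2R16, NP_58641.1, 7q31.1-q31.3; GPR59, XP_069626, 7q33; "
    "GPR60, XP_090424, 7q33"
)


def chromosome_arm(locus):
    """Reduce a cytogenetic locus such as '7q31.3-q32' to its arm, '7q'."""
    match = re.match(r"(\d+|X|Y)([pq])", locus)
    if match is None:
        raise ValueError(f"unrecognized locus: {locus!r}")
    return match.group(1) + match.group(2)


def group_by_arm(entries):
    """Map each chromosome arm to the genes located on it."""
    groups = defaultdict(list)
    for record in entries.split(";"):
        gene, _accession, locus = [field.strip() for field in record.split(",")]
        groups[chromosome_arm(locus)].append(gene)
    return dict(groups)


arms = group_by_arm(TAS2_ENTRIES)
for arm in sorted(arms):
    print(arm, len(arms[arm]), ", ".join(arms[arm]))
```

Grouping by arm rather than by full band is a deliberate simplification, since the bands in the list are reported at different resolutions (for example, 12p13 and 12p13.2).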

The Rhodopsin Family (241 Nonolfactory, Total of 701)
The rhodopsin family has the largest number of receptors, and the overall analysis is shown in Fig. 3 (except the olfactory cluster; see comments below). The rhodopsin family corresponds to what has previously been called either the rhodopsin-like receptors or clan A in the A-F classification system. The rhodopsin family has several characteristics, such as the NSxxNPxxY motif in TMVII and the DRY motif, or D(E)-R-Y(F), at the border between TMIII and IL2. Only a few receptors do not comply with these motifs, but these have other "fingerprint" elements that clearly link them to the rhodopsin family, apart from the phylogenetic analysis. The crystal structure of bovine rhodopsin has been revealed (Palczewski et al., 2000). Bovine rhodopsin has the highest homology to rhodopsin (RHO) in the opsin receptor group. It should be noted that bacteriorhodopsin has no sequence similarity with the GPCRs in the human genome (Josefsson, 1999). The ligands for most of the rhodopsin receptors bind within a cavity between the TM regions (Baldwin, 1994). There are, however, important exceptions to this, in particular for the glycoprotein binding receptors (LH, FSH, TSH, and LG), where the ligand-binding domain is in the N terminus. Our analysis showed four main groups. We have opted to call these main groups α, β, γ, and δ. Results for each of the groups are described below.

The α-Group of Rhodopsin Receptors (89). This group has five main branches: the prostaglandin receptor cluster, amine receptor cluster, opsin receptors cluster, melatonin receptor cluster, and MECA receptor cluster. The bootstrap values that define these branches are very high (267, 262, 290, 299, and 239 of 300, respectively); these are highlighted in bold in Fig. 3.
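For readers more used to bootstrap support expressed as a percentage, the counts above can be converted as in this small sketch. The counts and the 300 replicates come from the text; the function name support_percent is a hypothetical helper introduced here.

```python
# Bootstrap counts (out of 300 replicates) quoted in the text for the five branches.
BOOTSTRAP_REPLICATES = 300
BRANCH_SUPPORT = {
    "prostaglandin": 267,
    "amine": 262,
    "opsin": 290,
    "melatonin": 299,
    "MECA": 239,
}


def support_percent(count, replicates=BOOTSTRAP_REPLICATES):
    """Express a bootstrap count as a percentage of all replicates."""
    if not 0 <= count <= replicates:
        raise ValueError("count must lie between 0 and the number of replicates")
    return 100.0 * count / replicates


for branch, count in BRANCH_SUPPORT.items():
    print(f"{branch}: {count}/{BOOTSTRAP_REPLICATES} = {support_percent(count):.1f}%")
```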

The prostaglandin receptor cluster (15). This branch has eight prostaglandin receptors and seven orphan receptors. The prostaglandin receptors (PTGERs) are between 19 and 41% identical and share motifs in TMVII (IXDPW) and in TMI (LXXTDXXG). The PTGERs, except PTGDR and PTGER4, belong to the paralogous regions on chromosomes 1/5p-q21/6p21-p25/9/15q11-q26/19p, further supporting the likelihood that the receptors in this group share a common evolutionary origin (Fig. 4). PTGDR and PTGER4 belong to the 1q23-q44/2p22-p25/11q13.1-q23.4/14q/15q11-q26/19q/20p paralogon:

TBXA2R, NP_001051.1, 19p13.3; PTGER3, NP_000948.1, 1p31; PTGER2, NP_000947.1, 1q22.1; PTGDR, XP_051711.1, 14q22.1; PTGER4, NP_000949.1, 5p12; PTGIR, NP_000951.1, 19q13.31; PTGER1, NP_000946.1, 19p13.12; PTGFR, NP_000950.1, 1p31.1; SREB3, NP_061842.1, Xp11; GPR26, XP_061555.1, 10q26.2; SREB1(GPR27), NP_061844.1, 3p21-p14; SREB2(GPR85), NP_061843.1, 7q31; GPR61, NP_114142, 1p13.3; GPR62, NT_005975.6, 3p21.31; GPR78, NT_006307.5, 4p16.1

The amine receptor cluster (40). The biogenic amine receptor group contains serotonin receptors (HTR), dopamine receptors (DRD), muscarinic receptors (CHRM), histamine receptors (HRH), adrenergic receptors (ADR), trace amine receptors (TAR), and several orphan receptors. All the known ligands of the receptors in this group are structurally related small amine molecules with a single aromatic ring. The degree of sequence conservation varies among the different classes. The HTRs display a heterogeneous phylogenetic pattern. Two distinct subgroups can be seen, the HTR2s and HTR1B-1F. The rest of the HTRs branch separately or together with other biogenic amine receptors. These receptors are positioned near each other on chromosome 5q, suggesting early local gene duplication. The ADRs form three clusters in the phylogenetic tree, resulting in branches containing ADRA1, ADRA2, and ADRB, respectively. The three clusters could be a result of the postulated vertebrate genome duplications because the receptor genes, with a few exceptions, are positioned within the MetaHOX paralogon (Lundin, 1993; Coulier et al., 2000). This could explain why the sequence identities within the clusters are more than 45%, whereas the identities between the groups are about 25%. The TAR subgroup shares 37 to 82% sequence identity and the receptors are all positioned on chromosome 6q23, suggesting several early and late local gene duplications. This is also evident in the rat, which has 14 different TARs with high sequence identity, indicating an ongoing expansion of this gene family in mammals. Two orphan GPCRs, GPR57 and GPR58, share sequence similarities with the TARs. Several motifs, including RKAAKTLG in TMVI and FKQLHXPTN in TMI, together with the chromosomal data, strengthen their relationship to the TARs. CHRMs form the most homogeneous cluster within the amine group, sharing between 40 and 50% identity. This can be seen in the tree, with the receptors grouping together with strong bootstrap support. The DRDs appear in two clusters in the tree: DRD2, DRD3, and DRD4 on one branch, with DRD4 placed most basal, and DRD1 and DRD5 together with the β-adrenergic receptors. Identities within the dopamine clusters are 38 to 52% and 54%, respectively. The sequence identities between the clusters are ~27%, whereas ADRB1 and DRD1 are 31% identical. The serotonin receptors are the largest group, with 13 members distributed more or less over the entire amine group tree, in general sharing low sequence identity, often as low as 20%:

HTR1A, NP_000515.1, 5q11.2-q13; HTR5(HTR5A), NP_076917.1, 7q36.3; HTR7, NP_000863.1, 10q21-q24; HRH2, NP_071640.1, 5q35.2; HTR4, NP_000861.1, 5q31-q33; HTR6, NP_000862.1, 1p36-q35; ADRA1A, NP_000671.1, 8p21.2; ADRA1D, NP_000669.1, 20p13; ADRA1B, NP_000670.1, 5q33.1; ADRB1, NP_000675.1, 10q25.3; ADRB3, NP_000016.1, 8p12-p11.2; ADRB2, NP_000015.1, 5q32; DRD5, NP_000789.1, 4p16.1; DRD1, NP_000785.1, 5q35.2; HTR2B, NP_000858.1, 2q36.3-q37.1; HTR2A, NP_000612.1, 13q14-q21; HTR2C, NP_000859.1, Xq24; TAR1, AAK71236, 8q23.2; PNR, NP_003958.1, 6q23; TAR3, AAK71240, 6q23.2; TAR4, AAK71243, 6q23.2; TAR5(GPR102), NP_444508.1, 6q23.2; GPR58, NP_055441.1, 6q24; GPR57, NP_055442.1, 6q23.2; HTR1B, NP_000854.1, 6q13; HTR1D, NP_008555.1, 1p36.3-p34.3; HTR1E, NP_000856.1, 6q14-q15; HTR1F, NP_000857.1, 3p12; ADRA2B, NP_000673.1, 3p13-q13; ADRA2A, NP_000672.1, 10q25.2; ADRA2C, NP_000674.1, 4p16; DRD4, NP_000788.1, 11p15.5; DRD3, NP_000787.1, 3q13.3; DRD2, NP_000786.1, 11q23; HRH4, NP_067830.1, 18q11.2; CHRM4, NP_000732.1, 11p12-p11.2; CHRM2, NP_000730.1, 7q31-q35; CHRM1, NP_000729.1, 11q13; CHRM3, NP_000731.1, 1q43; CHRM5, NP_036257.1, 15q26
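The within-cluster and between-cluster identities quoted for the amine receptors are pairwise percent identities. As a rough illustration of how such a number can be computed from two already aligned sequences, the sketch below counts matching residues over columns where neither sequence has a gap. This denominator is one common convention and is an assumption here, as the text does not state which one was used. The function name and the short sequences are hypothetical and are not receptor data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over alignment columns without gaps.

    Both inputs must come from the same pairwise alignment and therefore
    have equal length. Columns with a gap in either sequence are skipped
    (an assumed convention; other denominators are also in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
toy_a = "ACDEFGHK"
toy_b = "ACDQFGHR"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```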

The opsin receptor cluster (9). This cluster of receptors comprises the rod visual pigment (RHO), the three cone visual pigments (OPN1SW, OPN1LW, OPN1MW), the peropsin (RRH), the encephalopsin (OPN3), the melanopsin (OPN4), and the retinal G-protein-coupled receptor (RGR). The opsins are the only GPCRs that are known to respond to light, and none of the receptors are known to bind any physical ligand. OPN1LW and OPN1MW are found in the same chromosomal position, Xq28. These two proteins are more than 96% identical, indicating, together with the fact that they are positioned near one another on Xq, that they share a recent common ancestor. Phylogenetic comparison of opsins in different species also indicates that the duplication is specific for mammals. The phylogenetic analysis divides the group into three branches: RHO/OPN1SW/OPN1LW/OPN1MW, RRH/RGR, and OPN3/OPN4. The chromosomal localization of these receptors is not consistent with any paralogy group, but it is worth noting that RGR and OPN4 are found in the same chromosomal position, 10q23:

GPR21, NP_005285.1, 9q33; GPR52, NP_005675.1, 1q24; RHO, NP_000530.1, 3q21-q24; OPN1LW, NP_064445.1, Xq28; CBP; OPN1MW, NP_000504.1, Xq28; OPN1SW, NP_001699.1, 7q31.3-q32; RRH, NP_006574.1, 4q; OPN3, NP_055137.1, 1q43; OPN4, NP_150598.1, 10q22

The melatonin receptor cluster (3). The analysis discerns two subgroups in this tree: the melatonin receptors (MTNR1A, MTNR1B) and the orphan receptor GPR50. GPR50 has an extended C-terminal end compared with the MTNRs, whereas the other regions of the receptor most closely resemble the MTNRs, especially in the third TM helix, which is almost identical. GPR50 and MTNR1A both belong to the ParaHOX paralogon (Fig. 4):

GPR50, NP_004215.1, Xq28; MTNR1A, NP_005949.1, 4q35.1; MTNR1B, NP_005950.1, 11q21-q22

The MECA receptor cluster (22). This group consists of the melanocortin receptors (MCRs), endothelial differentiation G-protein-coupled receptors (EDGRs), cannabinoid receptors (CNRs), and adenosine-binding receptors (ADORAs). Three orphan receptors also belong to this group (GPR3, -6, and -12). It is interesting to note that the receptors in this group bind structurally different ligands: melanocyte stimulating hormone (13-residue peptide, MCRs), lysophosphatidic acid (lipid, EDGRs), anandamide (arachidonylethanolamide, CNRs), and adenosine. The orphan receptors are 55% identical to each other and roughly 25% identical to the MCRs. The orphans share several motifs with the MCRs, such as PM(Y/F)X(F/L)X(C/G)SLAXADXL in TMIII, ALXY(H/Y) in TMIV, and PXIYAFR in TMVII. The CNRs share 39% identity with each other and their chromosomal positions indicate a common ancestor, because both genes are located in the paralogous group involving the positions 1p3 and 6q (Spring, 1997) (Fig. 4). GPR3 and GPR6 share the same chromosomal positions as the CNRs, which may indicate that these orphans share a common ancestor with the CNRs. The MCRs share between 39 and 56% identity and belong to the 8q/16q/18/20q paralogon, supporting the idea that they share a common ancestor (Fig. 4). The EDG receptors form clusters at chromosome 1p, 9q, and 19p, suggesting two common ancestors together with one extra gene duplication at position 19p, resulting in two EDGRs at 1p and 9q, together with four EDGRs at chromosome 19p. These genes are all positioned in the paralogy group that was first proposed by Katsanis et al. (1996) and subsequently expanded by Popovici et al. (2001): 1/5p-q21/6p21-p25/9/15q11-q26/19p (Fig. 4). All the adenosine receptors except ADORA1 are located in the paralogy group 7/16p/17/22q (Fig. 4):

ADORA3, NP_000668.1, 1p13.3; ADORA1, NP_000671.1, 8p21.2; ADORA2A, NP_000666.1, 22q11.23; ADORA2B, NP_000667.1, 17q12; GPR3, NP_005272.1, 1p35.3; GPR12, NP_005279.1, 13q12.13; GPR6, NP_005275.1, 6q21; MC2R, NP_000520.1, 18p11.2; MC1R, NP_002377.1, 16q24.3; MC3R, NP_063941.1, 20q13.31; MC4R, NP_005903.1, 18q22; MC5R, NP_005904.1, 18p11.2; EDG7, NP_036284.1, 1p22.3; EDG2, NP_001392.1, 9q31.3; EDG4, NP_004711.1, 19p12; EDG8, NP_110387.1, 19p13.2; EDG5, NP_004221.1, 19p13.2; EDG6, NP_003766.1, 19p13.3; EDG3, NP_005217.1, 9q22.1; EDG1, NP_001391.1, 1p21; CNR1, NP_001831.1, 6q15; CNR2, NP_001832.1, 1p36.11
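The motifs shared between the orphan receptors and the MCRs are written with X for any residue and (A/B) for alternative residues at one position. A minimal sketch of how this notation could be turned into a regular expression for scanning sequences is given below. The function motif_to_regex and the test fragment are hypothetical and serve only to illustrate the notation; they do not reproduce the HMM-based search used in the study.

```python
import re


def motif_to_regex(motif):
    """Translate motif notation such as 'ALXY(H/Y)' into a compiled regex.

    '(A/B)' becomes the character class '[AB]' and 'X' becomes '.',
    which matches any single residue.
    """
    def alternatives(match):
        return "[" + match.group(1).replace("/", "") + "]"

    pattern = re.sub(r"\(([A-Z/]+)\)", alternatives, motif)
    pattern = pattern.replace("X", ".")
    return re.compile(pattern)


# Motifs quoted in the text for TMIII, TMIV, and TMVII.
MECA_MOTIFS = {
    "TMIII": "PM(Y/F)X(F/L)X(C/G)SLAXADXL",
    "TMIV": "ALXY(H/Y)",
    "TMVII": "PXIYAFR",
}

# Hypothetical fragment built to satisfy the TMIII motif; not a real sequence.
fragment = "PMYAFACSLAVADLL"
tm3 = motif_to_regex(MECA_MOTIFS["TMIII"])
print(tm3.pattern, bool(tm3.search(fragment)))
```

A regular expression only tests for exact agreement with the written consensus, so it is stricter than a profile HMM, which scores partial matches.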

The β-Group of Rhodopsin Receptors (35). This group has no main branches and includes 36 receptors (Fig. 3). All the known ligands to these receptors are peptides. The group includes the hypocretin receptors (HCRTRs), the neuropeptide FF receptors (NPFFs), the tachykinin receptors (TACRs), the cholecystokinin receptors (CCKs), the neuropeptide Y receptors (NPYRs), the endothelin-related receptors (EDNR and ETBRLP1/2), gastrin-releasing peptide receptor (GRPR), the neuromedin B receptor (NMBR), the uterinbombesin receptor (BRS3), the neurotensin receptors (NTSRs), the growth hormone secretagogue receptor (GHSR), the neuromedin receptors (NMURs), the thyrotropin releasing hormone receptor (TRHR), the ghrelin receptor, arginine vasopressin receptors (AVPRs), the gonadotropin-releasing hormone receptors (GNRHRs), and the oxytocin receptor (OXTR) and orphan receptor.

The NPY5R groups with the CCK receptors rather than with the other NPY receptors. This might seem confusing, but it is consistent regardless of the method used (maximum parsimony, neighbor joining). One reason for this topology is that the NPY5R has a large third extracellular loop that is not present in the other NPYRs but is found in the CCK receptors. This feature might be the reason for this seemingly large difference between the NPY5R and the other NPY receptors. If the third extracellular loop of the NPY5R is removed, the NPY5R places on the same branch as NPY2R (data not shown). Surprisingly, the NPY2R has a higher identity to PrRP and GPR72 than to the other NPY receptors. The receptor GPR118 is 27% identical to GPR72 whereas the identity to the other receptors on that branch is below 20%. Several of these receptor clusters (i.e., NPY, NPFF, CCK, TACR) are positioned within the MetaHOX paralogon, consisting of chromosomes 4, 5q, 10q21-26, 8p12-22, and 2p11-2