Department of Neuroscience, Uppsala University, Uppsala, Sweden (R.F., M.C.L., L.-G.L., H.B.S.); and Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden (R.F.)
Received December 23, 2002; accepted March 11, 2003
| Abstract |
|---|
The ligands for the GPCRs have tremendous variation; ions, organic
odorants, amines, peptides, proteins, lipids, nucleotides, and even photons
are able to mediate their message through these proteins. The GPCR proteins
are also highly variable. There are two main requirements for a protein to be
classified as a GPCR. The first requirement relates to seven sequence
stretches of about 25 to 35 consecutive residues that show a relatively high
degree of calculated hydrophobicity. These sequences are believed to represent
seven α-helices that span the plasma membrane in a counter-clockwise
manner, forming a receptor, or a recognition and connection unit, enabling an
extracellular ligand to exert a specific effect within the cell. The second
principal requirement is the ability of the receptor to interact with a
G-protein. There is a great diversity in the functional coupling of the GPCRs;
they have a number of alternative signaling pathways, interacting directly
with a number of other proteins. Interaction with G-proteins has not been
demonstrated for most GPCRs, in particular for those whose genes have just
recently been sequenced. It may therefore be more technically correct to term
this superfamily "seven transmembrane (TM) receptors", but the
GPCR terminology is more established.
Several classification systems have been used to sort out this superfamily.
Some systems group the receptors by how their ligand binds, and others have
used both physiological and structural features. One of the most frequently
used systems assigns the receptors to clans (or classes) A, B, C, D, E, and F,
with subclans designated by Roman numerals (Attwood and Findlay, 1994;
Kolakowski, 1994). This A-F system is designed to cover all GPCRs, in both
vertebrates and invertebrates. Some families in the A-F system do not exist in humans.
Examples of this are clans D and E, which represent fungal pheromone receptors
and cAMP receptors, family IV in clan A, which is composed of invertebrate
opsin receptors, and clan F, which contains archaebacterial opsins. The
overall classification of the GPCRs has been hampered by the large sequence
differences between mammalian and invertebrate GPCRs. The GPCRs in
Drosophila melanogaster show in many cases little resemblance to
those in mammals (Broeck, 2001). The number of receptor genes in a given
class can also differ greatly between species. Caenorhabditis elegans, a
worm, has, for example, developed a remarkable number of chemosensory
(olfactory) GPCRs related to the creature's specific lifestyle. Those
chemosensory receptors, as well as the olfactory receptors in D.
melanogaster, do not show any clear resemblance to the olfactory
receptors in humans.
Gene duplication occurs both by individual duplication, which often leaves
the new gene near the parent gene, and by block duplications involving
chromosomal regions or entire chromosomes. Large-scale duplications, including
polyploidizations, are believed to be an important mechanism of vertebrate
evolution. Two rounds of large-scale duplications are thought to have occurred
in early vertebrate ancestry (Lundin, 1993; Holland et al., 1994), resulting
in up to four copies of each gene in mammals, which originate from a common
ancestor gene in a cephalochordate. This is now known as the "2R hypothesis"
or the "one-to-four model". It has led to the construction of maps that
contain paralogous chromosomal regions, or paralogons (Lundin, 1993; Holland
et al., 1994; Katsanis et al., 1996; Popovici et al., 2001), in
vertebrates, which in combination with phylogenetic analysis can provide
valuable information on gene relationships and origins.
In this study, we collected a large set of GPCR sequences from the human genome and performed multiple phylogenetic analyses. The first task was to compile a comprehensive data set with just a single copy of each gene. We wanted to avoid polymorphisms, pseudogenes, duplicates (resulting from the same gene having multiple names), and other related problems. We identified more than 800 GPCRs in databases, simultaneously analyzed the sequences of 342 unique functional nonolfactory human GPCRs, and grouped them by phylogenetic analysis. The chromosomal localization of the genes and their positioning in paralogous groups were studied to give insight into the mechanisms involved in creating the receptor genes. The different families were also analyzed for common sequence motifs, and we discuss the evidence for common descent of the families.
| Materials and Methods |
|---|
To extend the data set, searches were made with all receptor sequences in the data set against the human genome protein database at NCBI. All genes were screened against the first version of the database to avoid duplicates. To identify possible novel receptors not yet annotated in the human genome database at NCBI, we searched with a diverse set of GPCRs at the nucleotide level using BLASTX against the Genescan data set. For each search, a P value of 0.001 was used as the threshold, or a maximum of 100 BLAST hits was analyzed.
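The hit-selection rule can be sketched as follows. This is only an illustration under one reading of the rule (keep hits at or below the P value threshold, best first, and never more than 100 per search); the `Hit` record and the function name are hypothetical and do not correspond to any BLAST output format or tool used in the study.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    subject_id: str   # identifier of the matched database sequence (hypothetical field)
    p_value: float    # P value reported for the hit

def select_hits(hits, p_threshold=0.001, max_hits=100):
    """Keep hits at or below the P value threshold, best first, capped at max_hits."""
    passing = [h for h in hits if h.p_value <= p_threshold]
    passing.sort(key=lambda h: h.p_value)
    return passing[:max_hits]

# Toy input: two hits pass the threshold, one does not.
hits = [Hit("seqA", 1e-30), Hit("seqB", 0.5), Hit("seqC", 1e-4)]
selected = select_hits(hits)
print([h.subject_id for h in selected])
```

Under this reading the two limits act together: a search with many strong hits is truncated at 100, and a search with few hits stops at the threshold.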
The genes were named according to the convention used in the human genome database at NCBI, although several orphan GPCRs, which recently had their ligands identified, were subsequently renamed according to recent literature. If no name was assigned to a specific sequence in the database, it was given a GPR number as provided by the HUGO nomenclature committee. Sequences not present in the human genome database were given either an accepted name from the literature or the GenBank accession number. Accurate chromosomal positions were obtained from the University of California Santa Cruz "the golden gate" human genome database (http://genome.ucsc.edu), the Dec 2001 assembly. For genes not present in the public genome assembly, we used the chromosomal position from the Celera database (http://www.celera.com).
Alignment. Each data set was randomized 20 times with regard to
sequence input order using a program called Randfasta
(http://www.neuro.uu.se/medfarm/schiothSoft.html),
because the input order of sequences is known to affect the resulting
alignment. These 20 data sets, containing the full set of sequences but in
different order, were all aligned using the Win32 version of ClustalW 1.81
(Thompson et al., 1994). The default alignment parameters were applied.
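The order-randomization step can be illustrated with a short sketch. It is not the Randfasta program itself, whose implementation is not described here; the function names, the fixed seed, and the toy sequences are assumptions made for the example.

```python
import random

def parse_fasta(text):
    """Return a list of (header, sequence) records from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def shuffled_datasets(records, n_datasets=20, seed=0):
    """Return n_datasets copies of the records, each in an independently shuffled order."""
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    datasets = []
    for _ in range(n_datasets):
        copy = list(records)
        rng.shuffle(copy)
        datasets.append(copy)
    return datasets

example = ">r1\nMKT\n>r2\nMLS\nAAV\n>r3\nMQP\n"
records = parse_fasta(example)
datasets = shuffled_datasets(records)
```

Each of the 20 data sets holds the same sequences, so any difference between the resulting alignments reflects input order alone.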
Sequence Bootstrapping and Randomization. The 20 alignments were each
bootstrapped 50 times using SEQBOOT from the Phylip package
(Felsenstein, 1993) to obtain a total of 1000 different alignments from each
data set.
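As a rough illustration of what bootstrapping an alignment means, the sketch below resamples alignment columns with replacement, which is the standard nonparametric bootstrap. It is not SEQBOOT, and the dictionary representation of an alignment, the function names, and the toy data are assumptions for the example.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the alignment length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

def bootstrap_replicates(alignments, n_replicates=50, seed=0):
    """Produce n_replicates bootstrap samples from every input alignment."""
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    return [bootstrap_alignment(a, rng)
            for a in alignments
            for _ in range(n_replicates)]

# Toy alignment reused 20 times to stand in for the 20 reordered alignments.
toy = {"r1": "MKT-LV", "r2": "MRTALV", "r3": "MKSALI"}
replicates = bootstrap_replicates([toy] * 20)
print(len(replicates))  # 20 alignments x 50 replicates
```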
Neighbor-Joining Trees. Protein distances were calculated using
Protdist from the Win32 version of the Phylip package. For the calculation,
the Dayhoff PAM matrix was used. The trees were calculated from the 20 sets of
distance matrices previously generated with Protdist, using Neighbor from the
Phylip package, resulting in 20 files with 50 trees each. All trees were
unrooted. Because of limitations in the Consense program (version 3.5;
Felsenstein, 1993), a consensus tree for the complete rhodopsin family could
not be calculated from all 1000 bootstrap replicas; therefore, 300
bootstrap replicas were used. The trees were plotted using Treeview
(http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).
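Bootstrap values such as those reported in the Results are counts of how many replicate trees contain a given grouping. The sketch below shows that counting for unrooted trees, with each tree given as a list of bipartitions (one side of each split). It is a simplified illustration, not the Consense algorithm; the data representation and function names are assumptions.

```python
from collections import Counter

def normalise_split(side, taxa):
    """Represent a bipartition by the side that excludes the alphabetically first taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side

def split_support(trees, taxa):
    """Count in how many trees each bipartition occurs."""
    counts = Counter()
    for splits in trees:
        counts.update({normalise_split(s, taxa) for s in splits})
    return counts

taxa = {"A", "B", "C", "D", "E"}
trees = [
    [{"A", "B"}, {"D", "E"}],
    [{"C", "D", "E"}, {"D", "E"}],  # {C, D, E} is the same split as {A, B}
    [{"A", "C"}, {"D", "E"}],
]
support = split_support(trees, taxa)
print(support[frozenset({"D", "E"})], "of", len(trees))
```

Normalising each split to one canonical side is what lets the same grouping be recognised regardless of which side a tree file happens to list.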
Maximum Parsimony Trees. Maximum parsimony trees were calculated from the same input files that were used for Protdist using Protpars from the Phylip package. The trees were unrooted and calculated using ordinary parsimony, and the topologies were obtained using the built-in tree search procedure. As above, consensus trees were calculated using Consense 3.5 from Phylip and trees were plotted using Treeview.
Calculating the Overall Relationship of the Main GPCR Families Using Random Selection of Genes. These calculations are based on all members from four of the main groups (secretin, frizzled, glutamate, and adhesion), together with 20 rhodopsin receptors selected at random using Randfasta. Randfasta was also used to randomize the input order of the sequences 20 times. The 20 data sets were aligned and sampled using SEQBOOT (50 replicas each); 1000 parsimony trees were then calculated using Protpars, and consensus trees were calculated using Consense 3.5.
Fingerprint Analysis. For the fingerprint/motif analyses, an approach
using Hidden Markov Models (HMMs) was applied as implemented in the HMMER 2.1
package (Eddy, 1998),
recompiled for WIN32 using Visual C++ 6.0. From the secretin, adhesion,
glutamate, rhodopsin, and frizzled families, alignments of the entire coding
regions were constructed using ClustalW 1.81; from these alignments, one HMM
per family was calculated using HMMbuild. The model allowed local
alignments within the HMM, global alignments with respect to the query
sequence, and multiple domains per sequence to hit. All HMMs were calibrated
using HMMcalibrate. To define the transmembrane regions statistically
described by the HMMs, the transmembrane region as described in the literature
for one of the members of each family was aligned to the respective HMM using
HMMsearch. The sequences used were FZD3, GRM1, GLP1, LEC1, and ADRB2. The
identified TM regions from the HMMs were subsequently aligned to each other,
region by region, using ClustalW 1.81, and conserved motifs were identified in
the HMM alignments by manual inspection.
| Results |
|---|
Below we comment on our results for each of the families. The number of receptors in each family is indicated in parentheses. At the end of each section, we list the receptor names. First, we give the sequence identification name in bold. We provide the HUGO name in parentheses in those cases in which it is different from the name we found to be most appropriate, for various reasons, except for the chemokine receptors (found in the rhodopsin family). HUGO lists only a few chemokine receptors, and the current naming system is thus not appropriate until it is more complete. We did not add their names in parentheses, because we would have ended up with the same name for different receptors in our lists. After the name, we list the sequence accession code followed by the chromosomal position. We want the reader to be aware that many of the receptors have multiple additional names; a list of alternative names, which can be found online (http://www.neuro.uu.se/medfarm/schiothArt.html), includes many of the names provided by ENSEMBL (http://www.ensembl.org/).
The Secretin Receptor Family (15)
The receptors in the secretin family bind rather large peptides that share
high amino acid identity and most often act in a paracrine manner. The
secretin family name is related to the fact that the secretin receptor was the
first one to be cloned in this family. The term "secretin-like
receptor" has also frequently been used in the literature for receptors
in this cluster. This group basically corresponds to clan B of the A-F system.
The N terminus, between 60 and 80 amino acids long, contains conserved
Cys bridges and is particularly important for binding of the ligand to these
receptors. The N terminus of the vasoactive intestinal peptide receptor (VIPR)
and pituitary adenylyl cyclase-activating protein (PACAP) receptors alone
constitutes a functional binding site for the ligand. Members of this family
are the calcitonin receptor (CALCR), the corticotropin-releasing hormone
receptors (CRHRs), the glucagon receptor (GCGR), the gastric inhibitory
polypeptide receptor (GIPR), the glucagon-like peptide receptors (GLPRs), the
growth hormone-releasing hormone receptor (GHRHR), PACAP, the parathyroid
hormone receptors (PTHR), the secretin receptor (SCTR), and VIPR. The tree has
four main subgroups: the CRHRs/CALCRLs, the PTHRs, GLPRs/GCGR/GIPR and the
subgroup including secretin and four other receptors. Most of these receptors,
11 of 15, belong to the HOX paralogon, 2q/12q/17q/7/(3p) (see Fig. 4):
CALCR, NP_001733.1, 7q21.3; CALCRL, NP_005786.1, 2q21.1-q21.3; CRHR1, NP_004373.1, 17q21.31; CRHR2, NP_001874.1, 7p14.3; GCGR, NP_000151.1, 17q25.3; GHRHR, NP_000814.1, 7p14; GIPR, NP_000155.1, 19q13.3; GLP1R, NP_002053.1, 6p21.2; GLP2R, NP_004237.1, 17p11.2; PACAP, NP_001109.1, 7p14; PTHR1, NP_000307.1, 3p21.31; PTHR2, NP_005039.1, 2q33; SCTR, NP_002971.1, 2q14.1; VIPR1, NP_004615.1, 3p22.1; VIPR2, NP_003373.1, 7q36.3
The Adhesion Receptor Family (24)
This rather new and peculiar family of GPCRs consists of receptors with
GPCR-like transmembrane-spanning regions fused together with one or several
functional domains with adhesion-like motifs in the N terminus, such as
EGF-like repeats, mucin-like regions, and conserved cysteine-rich motifs (for
an overview of the N termini in some of these receptors, see Hayflick, 2000;
Harmar, 2001). The N termini
are variable in length, from about 200 to 2800 amino acids long, and are often
rich in glycosylation sites and proline residues, forming what has been
described as mucin-like stalks. The family name "adhesion" relates
to these long N termini, which contain motifs that are likely to participate
in cell adhesion (McKnight and Gordon, 1998; Stacey et al., 2000). Some
receptors in this family have been termed
secretin-like receptors, and the latrotoxin receptors have previously been
placed into clan B (Flower, 1999) or clan B2 (Harmar, 2001), but our
analysis clearly shows that they belong to a
distinct family of their own. The bootstrap values for the adhesion and the
secretin families are also very high at 789 and 862, respectively, indicating
clear distinction between the families. The analysis of the full-length
proteins also indicates distinction between the secretin and adhesion families
(data not shown). Although the phylogenetic analysis by Harmar (2001) does
not stretch beyond "clan B" (secretin and adhesion), it basically supports our
conclusion of separate clusters of secretin and adhesion receptors. Our
analysis shows that several of the receptors appear in clusters of three or
four: the CELSRs (EGF LAG seven-pass G-type receptors), the brain-specific
angiogenesis-inhibitory receptors (BAIs), the lectomedin receptors (LECs), and
the EGF-like module-containing receptors (EMRs). CD97 antigen receptor (CD97) and
EGF-TMVII-latrophilin-related (ETL) also group with these on a separate main
branch. CD97 shares the highest sequence similarity with EMR2 (56%), which is
higher than the level of identity within the EMRs. The EMRs and CD97 are all
positioned on 19p13, indicating that they may have arisen through several
local gene duplications. The other main branch includes HE6 (TMVIILN2) and
GPR56 (TMVIIXN1 or TMVIILN4) and a group of recently discovered receptors,
related to GPR56 and HE6, named GPR97 and GPR110 to GPR116
(Fredriksson et al., 2002).
The N termini of the receptors in this branch have varying lengths and
relatively few identified functional domains compared with the other main
branch of the adhesion receptors. Most of the genes of the entire adhesion
family are positioned within the paralogon 1/5p-q21/6p21-p25/9/15q11-q26/19p,
providing support for their common ancestry (Fig. 4):
BAI1, NP_001693.1, 8q24; BAI2, NP_001694.1, 1p35; BAI3, NP_001695.1, 6q12; CELSR1, NP_055061.1, 22q13.3; CELSR2, NP_001399.1, 1p21; CELSR3, NP_001398.1, 3p21.31; CD97, NP_001775.1, 19p13.13; EMR1, NP_001965.1, 19p13.3; EMR2, NP_038475.1, 19p13.1; EMR3, NP_115960.1, 19p13.3; ETL, NP_071442.1, 1p33-p32; GPR97, AY140959, 16q13; GPR110, AY140952, 6p12.3; GPR111, AY140953, 6p12.3; GPR112, AY140954, Xq26.3; GPR113, AY140955, 2p23.3; GPR114, AY140956, 16q13; GPR115, AY140957, 6p12.3; GPR116, AY140958, 6p12.3; HE6 (GPR64), NP_005747.1, Xp22.22; LEC1, NP_036434.1, 1p31.1; LEC2, NP_055736.1, 19p13.2; LEC3, NP_056051.1, 4q13.1; GPR56 (TMVIIXN1), NP_003263.1, 1q42-q43
The Glutamate Receptor Family (15)
This family of receptors consists of eight metabotropic glutamate receptors
(GRM), two GABA receptors (i.e., GABAbR1, which has two splice variants, a
and b, and GABAbR2), a single calcium-sensing receptor (CASR), and five
receptors that are believed to be taste receptors (TAS1). This group basically
corresponds to what has been called clan C receptors. Several other GABA
receptors are found in the human genome, but these are ion channels. The
ligand recognition domain in the metabotropic glutamate receptors is found in
the N terminus of 280 to 580 amino acids, and it has been proposed to share
structural homology with bacterial amino acid binding proteins, such as LIVBP.
The N terminus is believed to form two distinct lobes separated by a cavity in
which glutamate binds, forming a so-called "Venus fly trap" where
the glutamate causes the lobes to close around the ligand. The CASR also has a
long cysteine-rich N terminus, but it is uncertain if it is involved in the
binding of Ca²⁺, even though it is important for
mediating the signal of Ca²⁺. The N terminus of the GABA
receptors is long and contains the ligand-binding site but lacks the
cysteine-rich domain found in the other receptors of this family. The TAS1
receptors also have a long N terminus with a series of conserved Cys residues.
They are expressed in the tongue and are likely to mediate taste signals. CASR
falls with the TAS1 receptors, whereas the two GABA receptors branch basally
in the family. GRM2 and GRM3 share 67% sequence identity and are located in
chromosomal regions 3p and 7q, respectively. GRM7 and GRM8 share 74% sequence
identity and are also positioned on 3p and 7q. These regions are both part of
the postulated 1p/3p/7/22q paralogon, supporting a common ancestry (Fig. 4):
CASR, NP_000379.1, 3q21.1; GABBR1, NP_001461.1, 6p21.1; GABBR2 (GPR51), NP_005449.1, 9q22.1-q22.3; GRM1, NP_000829.1, 6q24.3; GRM2, NP_000830.1, 3p21.31; GRM3, NP_000831.1, 7q21.12; GRM4, NP_000832.1, 6p21.1; GRM5, NP_000833.1, 11q21.1; GRM6, NP_000834.1, 5q35.3; GRM7, NP_000835.1, 3p21.1; GRM8, NP_000836.1, 7q31.3-q32.1; GPRC6A, NP_683766.1, 6q22.1; TAS1R1, NP_619642, 1p36.23; TAS1R2, NP_689418.1, 1p36.2; TAS1R3, XP_060177.1, 1p36.33
The Frizzled/Taste2 Receptor Family (24)
This group includes two distinct clusters, the frizzled receptors and the
TAS2 receptors. We were surprised that the TAS2 receptors clustered together
with the frizzled receptors with a high bootstrap value. There are no obvious
similarities between the receptors in the frizzled branch and the taste branch
of this receptor family. However, when we compared the TAS2 receptor
consensus sequence against an HMM of the frizzled receptor branch, we found
several features that may explain why these two groups of receptors cluster
together, such as the consensus sequences IFL in TMII, SFLL in TMV, and SxKTL in
TMVII. None of these motifs is found in the consensus sequences of the other
four families. The TAS2 receptors showed no clear similarities with the TAS1
receptors in the glutamate receptor family. The TAS2 receptors show clearly
seven hydrophobic regions in a hydrophobicity plot but they have a very short
N terminus that is unlikely to contain a ligand binding domain. Rather little
is known about the role and function of the TAS2 receptors except that they
are expressed in the tongue and palate epithelium, and it is believed that
they function as bitter taste receptors. We found 13 TAS2 receptors in the
human databases. Two of the receptors we found were not previously annotated
or found in any database. We approached the HUGO Gene Nomenclature Committee
at University College London and they confirmed that the sequences were unique
and not public. The committee provided these receptors with new GPR numbers
(GPR59 and GPR60). These numbers had previously been preliminarily assigned to
other receptors but were never used, which explains the low GPR numbers.
The frizzled receptors control cell fate, proliferation, and polarity
during metazoan development by mediating signals from secreted glycoproteins
termed Wnt. The frizzled name was first used for a receptor cloned from D.
melanogaster, and the frizzled name (referring to the curled and twisted
Wnt ligand) has frequently been used for this relatively recently discovered
cluster of receptors. It has been shown that Wnt ligand binding to the rat
F2DR can induce G-protein coupling (Slusarski et al., 1997),
providing evidence that the frizzled proteins are GPCRs. This has also been
supported by previous phylogenetic analyses showing some structural
relationship to GPCRs (Barnes et al., 1998). The frizzled family of receptors
has a 200-amino acid N
terminus with conserved cysteines that are likely to participate in Wnt
binding. The frizzled family consists of 10 frizzled receptors, FZD1-10,
together with SMOH, which is the most divergent receptor of the family,
sharing only 24% identity with FZD2 and less with the others. The topology of
the tree shows four main clusters in the frizzled branch of receptors: FZD1,
-2, and -7 share approximately 75% identity with each other; FZD8 and -5
share 70% identity; FZD10, -9, and -4 share 65% identity; and finally, FZD6
and -3 share 50% amino acid identity. The
identities shared by receptors from different clusters are between 20 and 40%,
indicating that four parental genes from the frizzled family were formed
initially and the four clusters of receptors were subsequently formed out of
these. All the frizzled genes, except FZD6, -3, and -8, are located in the
chromosomal regions belonging to the HOX paralogy group. In addition, the
phylogeny does indicate that the frizzled family was expanded in the two
genome duplications proposed to have occurred basally in the vertebrate
lineage (see Introduction). This is supported by the fact that the FZD7, -1,
and -2 genes are located on different paralogous chromosomes, as are FZD9 and
-10. However, if this scenario is true, several genes were lost (for example,
all other copies of the SMOH gene). Interestingly, all the taste2 receptors
from this group are located in the 1p3/3q/7q/12p/17p paralogon, indicating
that some of these genes were present early in vertebrate evolution. The fact
that the genes are clustered on chromosome 7q31 and 12p13 suggests that this
family expanded through several local gene duplications. Noteworthy is that
two of the frizzled receptors, FZD9 and SMOH, are also located in the same
paralogon:
FZD1, NP_003496.1, 7q21.13; FZD2, NP_001454.1, 17q21.31; FZD3, NP_059108.1, 8p21.1; FZD4, NP_036325.1, 11q14.2; FZD5, NP_003459.1, 2q33-q34; FZD6, NP_003497.1, 8q22.3-q23.1; FZD7, NP_003498.1, 2q33; FZD8, NP_114072.1, 10p11.21; FZD9, NP_003459.1, 7q11.23; FZD10, NP_009128.1, 12q24.33; SMOH, NP_005622.1, 7q32.1; TAS2R13, NP_076409, 12p13; TAS2R14, NP_076411.1, 12p13; TAS2R7, NP_076408.1, 12p13; TAS2R9, NP_076406.1, 12p13; TAS2R8, NP_76407.1, 12p13.2; TAS2R3, NP_058639.1, 7q31.3-q32; TAS2R10, NP_076410.1, 12p13; TAS2R5, NP_061853.1, 7q31.3-q32; TAS2R4, NP_058640.1, 7q31.3-q32; TAS2R1, NP_062545.1, 5p15; TAS2R16, NP_58641.1, 7q31.1-q31.3; GPR59, XP_069626, 7q33; GPR60, XP_090424, 7q33
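The percent identities quoted for pairs of receptors can be made concrete with a small sketch. The text does not state how identity was computed, so the definition used here (identical residues divided by aligned positions where neither sequence has a gap) is an assumption, as are the function name and the toy sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Toy pair: five gap-free columns, four of them identical.
print(percent_identity("MKT-LV", "MRTALV"))
```

Other conventions divide by the full alignment length or by the shorter sequence length, which gives lower values for the same pair, so figures from different sources are comparable only when the denominator is the same.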
The Rhodopsin Family (241 Nonolfactory, Total of 701)
The rhodopsin family has the largest number of receptors, and the overall
analysis is shown in Fig. 3 (except
the olfactory cluster; see comments below). The rhodopsin family corresponds
to what has previously been called either the rhodopsin-like receptors or clan
A in the A-F classification system. The rhodopsin family has several
characteristic features, such as the NSxxNPxxY motif in TMVII and the DRY
motif, or D(E)-R-Y(F),
at the border between TMIII and IL2. Only a few receptors do not comply with
these motifs, but these have other "fingerprint" elements that
clearly link them to the rhodopsin family, apart from the phylogenetic
analysis. The crystal structure of bovine rhodopsin has been determined
(Palczewski et al., 2000).
Bovine rhodopsin has highest homology to rhodopsin (RHO) in the opsin receptor
group. It should be noted that bacteriorhodopsin has no sequence similarity
with the GPCRs in the human genome (Josefsson, 1999). The ligands
for most of the rhodopsin receptors bind within a cavity between the TM
regions (Baldwin, 1994). There
are, however, important exceptions to this, in particular for the glycoprotein
binding receptors (LH, FSH, TSH, and LG), where the ligand-binding domain is
in the N terminus. Our analysis showed four main groups. We have opted to call
these main groups α, β, γ, and δ. Results for each of
the groups are described below.
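The two rhodopsin-family motifs named above can be written as simple patterns. The sketch below scans a sequence for D(E)-R-Y(F) and NSxxNPxxY with regular expressions, treating x as any residue. It is an illustration only: it does not check that a match lies at the TMIII/IL2 border or in TMVII, the analysis in this study used HMMs and manual inspection rather than regular expressions, and the toy sequence is invented.

```python
import re

# x in the motif notation is taken to mean any single residue.
MOTIFS = {
    "D(E)-R-Y(F)": re.compile(r"[DE]R[YF]"),
    "NSxxNPxxY": re.compile(r"NS..NP..Y"),
}

def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for each motif in the sequence."""
    return {name: [m.start() for m in pattern.finditer(sequence)]
            for name, pattern in MOTIFS.items()}

toy = "AAVDRYLAIGGNSLLNPIIYAA"  # invented sequence carrying both motifs
found = find_motifs(toy)
print(found)
```

Because a three-residue pattern such as [DE]R[YF] occurs by chance in many proteins, a hit is meaningful only together with its position relative to the predicted transmembrane helices.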
The α-Group of Rhodopsin Receptors (89). This group has
five main branches: the prostaglandin receptor cluster, amine receptor
cluster, opsin receptors cluster, melatonin receptor cluster, and MECA
receptor cluster. The bootstrap values that define these branches are very
high (267, 262, 290, 299, and 239 of 300, respectively); these are highlighted
in bold in Fig. 3.
The prostaglandin receptor cluster (15). This branch has eight prostaglandin receptors and seven orphan receptors. The prostaglandin receptors (PTGERs) are between 19 and 41% identical and share motifs in TMVII (IXDPW) and in TMI (LXXTDXXG). The PTGERs, except PTGDR and PTGER4, belong to the paralogous regions on chromosomes 1/5p-q21/6p21-p25/9/15q11-q26/19p, further supporting the likelihood that the receptors in this group share a common evolutionary origin (Fig. 4). PTGDR and PTGER4 belong to the 1q23-q44/2p22-p25/11q13.1-q23.4/14q/15q11-q26/19q/20p paralogon:
TBXA2R, NP_001051.1, 19p13.3; PTGER3, NP_000948.1, 1p31; PTGER2, NP_000947.1, 1q22.1; PTGDR, XP_051711.1, 14q22.1; PTGER4, NP_000949.1, 5p12; PTGIR, NP_000951.1, 19q13.31; PTGER1, NP_000946.1, 19p13.12; PTGFR, NP_000950.1, 1p31.1; SREB3, NP_061842.1, Xp11; GPR26, XP_061555.1, 10q26.2; SREB1 (GPR27), NP_061844.1, 3p21-p14; SREB2 (GPR85), NP_061843.1, 7q31; GPR61, NP_114142, 1p13.3; GPR62, NT_005975.6, 3p21.31; GPR78, NT_006307.5, 4p16.1
The amine receptor cluster (40). The biogenic amine receptor group
contains serotonin receptors (HTR), dopamine receptors (DRD), muscarinic
receptors (CHRM), histamine receptors (HRH), adrenergic receptors (ADR), trace
amine receptors (TAR), and several orphan receptors. All the known ligands of
the receptors in this group are structurally related small amine molecules
with a single aromatic ring. The degree of sequence conservation varies among
the different classes. The HTRs display a heterogeneous phylogenetic pattern.
Two distinct subgroups can be seen, the HTR2s and HTR1B-1F. The rest of the
HTRs branch separately or together with other biogenic amine receptors. These
receptors are positioned near each other on chromosome 5q, suggesting early
local gene duplication. The ADRs form three clusters in the phylogenetic tree,
resulting in branches containing ADRA1, ADRA2, and ADRB, respectively. The
three clusters could be a result of the postulated vertebrate genome
duplications because the receptor genes, with a few exceptions, are positioned
within the MetaHOX paralogon (Lundin, 1993; Coulier et al., 2000). This
could explain why the sequence identities within the
clusters are more than 45%, whereas the identities between the groups are
about 25%. The TAR subgroup shares 37 to 82% sequence identity and the
receptors are all positioned on chromosome 6q23, suggesting several early and
late local gene duplications. This is evident also in rat, having 14 different
TARs with high sequence identity, indicating an ongoing expansion of this gene
family in mammals. Two orphan GPCRs, GPR57 and GPR58, share sequence
similarities with the TARs. Several motifs, including RKAAKTLG in TMVI and
FKQLHXPTN in TMI, together with the chromosomal data, strengthen their
relationship to the TARs. CHRMs form the most homogeneous cluster within the
amine group, sharing between 40 and 50% identity. This can be seen in the tree,
where the receptors group together with strong bootstrap support. The DRDs
appear in two clusters in the tree: DRD2, DRD3, and DRD4 on one branch,
with DRD4 most basal, and DRD1 and DRD5 together with the β-adrenergic
receptors. Identities within the dopamine clusters are 38 to 52% and 54%,
respectively. The sequence identities between the clusters are 27%,
whereas ADRAB1 and DRD1 are 31% identical. The serotonin receptors are the
largest group, with 13 members distributed more or less over the entire amine
group tree, in general sharing low sequence identity, often as low as 20%:
HTR1A, NP_000515.1, 5q11.2-q13; HTR5 (HTR5A), NP_076917.1, 7q36.3; HTR7, NP_000863.1, 10q21-q24; HRH2, NP_071640.1, 5q35.2; HTR4, NP_000861.1, 5q31-q33; HTR6, NP_000862.1, 1p36-q35; ADRA1A, NP_000671.1, 8p21.2; ADRA1D, NP_000669.1, 20p13; ADRA1B, NP_000670.1, 5q33.1; ADRB1, NP_000675.1, 10q25.3; ADRB3, NP_000016.1, 8p12-p11.2; ADRB2, NP_000015.1, 5q32; DRD5, NP_000789.1, 4p16.1; DRD1, NP_000785.1, 5q35.2; HTR2B, NP_000858.1, 2q36.3-q37.1; HTR2A, NP_000612.1, 13q14-q21; HTR2C, NP_000859.1, Xq24; TAR1, AAK71236, 8q23.2; PNR, NP_003958.1, 6q23; TAR3, AAK71240, 6q23.2; TAR4, AAK71243, 6q23.2; TAR5 (GPR102), NP_444508.1, 6q23.2; GPR58, NP_055441.1, 6q24; GPR57, NP_055442.1, 6q23.2; HTR1B, NP_000854.1, 6q13; HTR1D, NP_008555.1, 1p36.3-p34.3; HTR1E, NP_000856.1, 6q14-q15; HTR1F, NP_000857.1, 3p12; ADRA2B, NP_000673.1, 3p13-q13; ADRA2A, NP_000672.1, 10q25.2; ADRA2C, NP_000674.1, 4p16; DRD4, NP_000788.1, 11p15.5; DRD3, NP_000787.1, 3q13.3; DRD2, NP_000786.1, 11q23; HRH4, NP_067830.1, 18q11.2; CHRM4, NP_000732.1, 11p12-p11.2; CHRM2, NP_000730.1, 7q31-q35; CHRM1, NP_000729.1, 11q13; CHRM3, NP_000731.1, 1q43; CHRM5, NP_036257.1, 15q26
The opsin receptor cluster (9). This cluster of receptors comprises the rod visual pigment (RHO), the three cone visual pigments (OPN1SW, OPN1LW, OPN1MW), the peropsin (RRH), the encephalopsin (OPN3), the melanopsin (OPN4), and the retinal G-protein-coupled receptor (RGR). The opsins are the only GPCRs that are known to respond to light, and none of the receptors are known to bind any physical ligand. OPN1LW and OPN1MW are found in the same chromosomal position, Xq28. These two proteins are more than 96% identical, which, together with the fact that they are positioned near one another on Xq, indicates that they share a recent common ancestor. Phylogenetic comparison of opsins in different species also indicates that the duplication is specific for mammals. The phylogenetic analysis divides the group into three branches: RHO/OPN1SW/OPN1LW/OPN1MW, RRH/RGR, and OPN3/OPN4. The chromosomal localization of these receptors is not consistent with any paralogy group, but it is worth noting that RGR and OPN4 are found in the same chromosomal position, 10q23:
GPR21, NP_005285.1, 9q33; GPR52, NP_005675.1, 1q24; RHO, NP_000530.1, 3q21-q24; OPN1LW, NP_064445.1, Xq28; CBP; OPN1MW, NP_000504.1, Xq28; OPN1SW, NP_001699.1, 7q31.3-q32; RRH, NP_006574.1, 4q; OPN3, NP_055137.1, 1q43; OPN4, NP_150598.1, 10q22
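Identity figures such as the "more than 96% identical" quoted for OPN1LW and OPN1MW depend on how identity is counted. The text does not state the convention used, so the following is only a minimal sketch of one common choice: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical and are not taken from the analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap.

    Assumes the two strings are rows of the same pairwise or multiple
    alignment, so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy example (hypothetical sequences): 4 matches over 5 gap-free columns.
print(percent_identity("ACDE-G", "ACDFHG"))
```

Other conventions (for example, dividing by the length of the shorter sequence or by the full alignment length) give different numbers for the same pair, which matters when comparing identities across clusters.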
The melatonin receptor cluster (3). The analysis discerns two subgroups in this tree: the melatonin receptors (MTNR1A, MTNR1B) and the orphan receptor GPR50. GPR50 has an extended C-terminal end compared with the MTNRs, whereas the other regions of the receptor closely resemble those of the MTNRs, especially the third TM helix, which is almost identical. GPR50 and MTNR1A both belong to the ParaHOX paralogon (Fig. 4):
GPR50, NP_004215.1, Xq28; MTNR1A, NP_005949.1, 4q35.1; MTNR1B, NP_005950.1, 11q21-q22
The MECA receptor cluster (22). This group consists of the
melanocortin receptors (MCRs), endothelial differentiation G-protein-coupled
receptors (EDGRs), cannabinoid receptors (CNRs), and adenosine-binding
receptors (ADORAs). Three orphan receptors also belong to this group (GPR3,
GPR6, and GPR12). It is interesting to note that the receptors in this group
bind structurally different ligands: melanocyte-stimulating hormone
(13-residue peptide, MCRs), lysophosphatidic acid (lipid, EDGRs), anandamide
(arachidonylethanolamide, CNRs), and adenosine (ADORAs). The orphan receptors
are 55% identical to each other and roughly 25% identical to the MCRs. The
orphans share several motifs with the MCRs, such as
PM(Y/F)X(F/L)X(C/G)SLAXADXL in TMIII, ALXY(H/Y) in TMIV, and PXIYAFR in TMVII.
The CNRs share 39% identity with each other, and their chromosomal positions
indicate a common ancestor, because both genes are located in the paralogous
group involving the positions 1p3 and 6q (Spring, 1997) (Fig. 4). GPR3 and
GPR6 share the same chromosomal positions as the CNRs, which may indicate that
these orphans share a common ancestor with the CNRs. The MCRs share between 39
and 56% identity and belong to the 8q/16q/18/20q paralogon, supporting the
idea that they share a common ancestor (Fig. 4). The EDG receptors form
clusters at chromosomes 1p, 9q, and 19p, suggesting two common ancestors
together with one extra gene duplication at position 19p, resulting in two
EDGRs each at 1p and 9q and four EDGRs at chromosome 19p. These genes are all
positioned in the paralogy group that was first proposed by Katsanis et al.
(1996) and subsequently expanded by Popovici et al. (2001),
1/5p-q21/6p21-p25/9/15q11-q26/19p (Fig. 4). All the adenosine receptors except
ADORA1 are located in the paralogy group 7/16p/17/22q (Fig. 4):
ADORA3, NP_000668.1, 1p13.3; ADORA1, NP_000671.1, 8p21.2; ADORA2A, NP_000666.1, 22q11.23; ADORA2B, NP_000667.1, 17q12; GPR3, NP_005272.1, 1p35.3; GPR12, NP_005279.1, 13q12.13; GPR6, NP_005275.1, 6q21; MC2R, NP_000520.1, 18p11.2; MC1R, NP_002377.1, 16q24.3; MC3R, NP_063941.1, 20q13.31; MC4R, NP_005903.1, 18q22; MC5R, NP_005904.1, 18p11.2; EDG7, NP_036284.1, 1p22.3; EDG2, NP_001392.1, 9q31.3; EDG4, NP_004711.1, 19p12; EDG8, NP_110387.1, 19p13.2; EDG5, NP_004221.1, 19p13.2; EDG6, NP_003766.1, 19p13.3; EDG3, NP_005217.1, 9q22.1; EDG1, NP_001391.1, 1p21; CNR1, NP_001831.1, 6q15; CNR2, NP_001832.1, 1p36.11
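The motifs quoted for the MECA cluster use a compact notation in which a letter is a fixed residue, X is any residue, and (A/B) lists alternatives at one position. As an illustration only, and not the procedure used in the analysis, that notation can be translated into a regular expression and searched for in a protein sequence. The helper names and the toy sequence below are hypothetical.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate motif notation such as 'ALXY(H/Y)' into a regex string.

    Letters are literal residues, 'X' matches any single residue, and
    '(A/B)' matches any one of the listed residues.
    """
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)           # end of the alternatives group
            options = motif[i + 1:close].split("/")
            parts.append("[" + "".join(options) + "]")
            i = close + 1
        elif ch == "X":
            parts.append("[A-Z]")                 # any residue letter
            i += 1
        else:
            parts.append(ch)                      # fixed residue
            i += 1
    return "".join(parts)


def find_motif(motif: str, sequence: str):
    """Return the 0-based start of the first motif match, or None if absent."""
    match = re.search(motif_to_regex(motif), sequence.upper())
    return match.start() if match else None


# Toy sequence (hypothetical, not a real receptor) containing the TMIII motif.
toy = "GGPMYAFACSLAVADMLTT"
print(motif_to_regex("PM(Y/F)X(F/L)X(C/G)SLAXADXL"))
print(find_motif("PM(Y/F)X(F/L)X(C/G)SLAXADXL", toy))
```

A search of this kind only reports exact matches to the pattern. It does not confirm that a match lies in the stated transmembrane helix, which would require the alignment or a topology prediction.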
The β-Group of Rhodopsin Receptors (35). This group has
no main branches and includes 36 receptors (Fig. 3). All the
known ligands for these receptors are peptides. The group includes the
hypocretin receptors (HCRTRs), the neuropeptide FF receptors (NPFFs), the
tachykinin receptors (TACRs), the cholecystokinin receptors (CCKs), the
neuropeptide Y receptors (NPYRs), the endothelin-related receptors (EDNR and
ETBRLP1/2), the gastrin-releasing peptide receptor (GRPR), the neuromedin B
receptor (NMBR), the uterine bombesin receptor (BRS3), the neurotensin
receptors (NTSRs), the growth hormone secretagogue receptor (GHSR), the
neuromedin receptors (NMURs), the thyrotropin-releasing hormone receptor
(TRHR), the ghrelin receptor, the arginine vasopressin receptors (AVPRs), the
gonadotropin-releasing hormone receptors (GNRHRs), the oxytocin receptor
(OXTR), and orphan receptors.
The NPY5R groups with the CCK receptors rather than with the other NPY receptors. This might seem confusing, but it is consistent regardless of the method used (maximum parsimony, neighbor joining). One reason for this topology is that the NPY5R has a large third extracellular loop that is not present in the other NPYRs but is found in the CCK receptors. This feature might explain the seemingly large difference between the NPY5R and the other NPY receptors. If the third extracellular loop of the NPY5R is removed, the NPY5R is placed on the same branch as NPY2R (data not shown). Surprisingly, the NPY2R has a higher identity to PrRP and GPR72 than to the other NPY receptors. The receptor GPR118 is 27% identical to GPR72, whereas its identity to the other receptors on that branch is below 20%. Several of these receptor clusters (i.e., NPY, NPFF, CCK, TACR) are positioned within the MetaHOX paralogon, consisting of chromosomes 4, 5q, 10q21-26, 8p12-22, and 2p11-2