Complete author affiliations and disclosures are at the end of this activity.
Release Date: September 18, 2007
Since completion of the human genome sequence, considerable progress has been made in determining the genetic basis of human diseases. Understanding the genetic basis of coronary heart disease (CHD), the leading cause of mortality in developed countries, is a priority. Here we provide an update on the genetic basis of CHD, focusing mainly on the clinical manifestations rather than the risk factors, most of which are heritable and also influenced by genetic factors. The challenges faced when identifying clinically relevant genetic determinants of CHD include phenotypic and genetic heterogeneity, and gene-gene and gene-environment interactions. In addition, the etiologic spectrum includes common genetic variants with small effects, as well as rare genetic variants with large effects. Advances such as the cataloging of human genetic variation, new statistical approaches for analyzing massive amounts of genetic data, and the development of high-throughput single-nucleotide polymorphism genotyping platforms, will increase the likelihood of success in the search for genetic determinants of CHD. Such knowledge could refine cardiovascular risk stratification and facilitate the development of new therapies.
In 2003 in the US alone, there were an estimated 1.2 million cases of coronary heart disease (CHD), resulting in 479,000 deaths.[1] Although recognition and treatment of established risk factors for CHD will reduce the disease burden considerably, simultaneous efforts aimed at unraveling the genetic basis of CHD are important for the development of novel diagnostic and therapeutic methods.
Considerable progress has been made in determining the genetic basis of human diseases since the human genome has been sequenced. The genetic determinants of more than 1,600 Mendelian diseases are now known, and discovery of genomic regions and genetic polymorphisms that influence susceptibility to common 'complex' diseases is accelerating. In this Review, we discuss the challenges of elucidating the genetic basis of CHD, focusing mainly on its clinical manifestations rather than its risk factors, and summarize the studies that have yielded insights into the genetic basis of this common, complex disease. We also discuss how recent advances in this field will increase the likelihood of identifying genetic determinants of CHD.
Several challenges exist in identifying the genetic determinants of common, complex diseases such as CHD (Table 1). These include phenotypic and genetic heterogeneity, gene-gene and gene-environment interactions, and the fact that the etiologic spectrum ranges from common genetic variants with small effects to rare genetic variants with large effects. Below, we attempt to summarize the current state of knowledge about the genetic basis of CHD. A glossary of some common genetic terms used in this Review can be found in Box 1.
Twin and family studies have established that CHD aggregates in families, and family history of early-onset CHD has long been considered a risk factor for the disease.[2] Although a contentious issue, the familial clustering of CHD could be partly explained by heritable quantitative variation in known CHD risk factors. Evidence suggests, however, that family history contributes to an increased risk of CHD independently of the known risk factors.[3,4] High-risk families make up a considerable proportion of early CHD cases in the general population. In one study, families with a history of early CHD represented only 14% of the general population but accounted for 72% of early CHD cases (men aged <55 years, women aged <65 years) and 48% of CHD at all ages.[5] A history of early CHD in a first-degree relative approximately doubles the risk of CHD, although the reported relative risk ranges from 1.3 to 11.3.[4,6-10] The highest relative hazard of CHD-related death is seen in monozygotic twins when one twin dies of early-onset CHD.[4] Furthermore, sibling history of myocardial infarction seems to be a greater risk factor than parental history of early-onset CHD.[11] A proposed family risk score for CHD evaluates the ratio of observed CHD events to expected events in an individual's first-degree relatives, adjusted for age and sex at the onset of the first event.[12] A higher family risk score is associated with greater CHD risk.
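As a rough illustration of the observed-to-expected idea behind such a family risk score, the following Python sketch sums observed CHD events among first-degree relatives and divides by the number expected. The `Relative` class, the function name, and the per-relative expected probabilities are hypothetical placeholders chosen for illustration; they are not taken from the cited study, and the published score may use a different functional form.

```python
from dataclasses import dataclass


@dataclass
class Relative:
    had_chd_event: bool          # observed: did this first-degree relative have a CHD event?
    expected_probability: float  # hypothetical age- and sex-specific expected probability


def family_risk_ratio(relatives):
    """Ratio of observed to expected CHD events among first-degree relatives."""
    observed = sum(1 for r in relatives if r.had_chd_event)
    expected = sum(r.expected_probability for r in relatives)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return observed / expected


# Illustrative family: two affected relatives where 0.5 events were expected.
relatives = [
    Relative(True, 0.10),
    Relative(False, 0.05),
    Relative(True, 0.25),
    Relative(False, 0.10),
]
ratio = family_risk_ratio(relatives)
print(f"observed/expected ratio: {ratio:.1f}")
```

A ratio well above 1 indicates more events in the family than would be expected for relatives of that age and sex.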
Mendelian disorders associated with CHD, such as familial hypercholesterolemia, comprise single-gene traits that are transmitted in an autosomal dominant, recessive or X-linked manner. For example, mutations in the LDL receptor gene (LDLR), the ligand-binding domain of apolipoprotein B100 (APOB), and proprotein convertase subtilisin/kexin type 9 gene (PCSK9) result in familial hypercholesterolemia transmitted in an autosomal dominant manner. The examination of disease pathophysiology and gene function in such Mendelian disorders might increase our understanding of the etiology of complex traits.[13] Additionally, common variation in genes implicated in Mendelian disorders could be used to determine disease susceptibility in the general population. Several Mendelian disorders of lipid metabolism are associated with increased CHD risk and have yielded novel insights into the mechanisms of CHD. Investigation of the molecular basis of the rare disorder familial homozygous hypercholesterolemia led to the discovery of the pathways of LDL cholesterol metabolism and the subsequent development of statins.[14] Rare allelic variants of three candidate genes that influence HDL cholesterol metabolism (ABCA1 [ATP-binding cassette A1], APOA1 [apolipoprotein A-1], and LCAT [lecithin-cholesterol acyltransferase]) are associated with low HDL-cholesterol level syndromes but are also found in individuals from the community with low HDL cholesterol levels.[15,16]
Linkage studies are performed by using polymorphic DNA markers (Box 2 and Figure 1). Microsatellite markers (short tandem repeat DNA sequences that are dispersed throughout the human genome) are typically used in linkage studies, although single-nucleotide polymorphisms (SNPs) can also be used (Box 1). Several genome-wide linkage studies for myocardial infarction and coronary artery disease have been reported (Table 2). The largest study, the British Heart Foundation Family Heart Study, included 4,175 individuals with CHD from 1,933 families recruited throughout the UK.[17] Despite the large sample size, a statistically significant logarithm of the odds of linkage (LOD) score (i.e. ≥3) was not obtained for any of the cardiovascular end points studied. For coronary artery disease (verified by exercise stress test or angiography), the highest LOD score was 2.70 (chromosome 2 at 149 cM) in families (n = 1,698) with age at onset of 56 years or less. For myocardial infarction, an overlapping peak with a LOD score of 2.1 (at 119.3 cM) was observed in families (n = 801) with age at onset of 59 years or less. Genomic regions identified in the published linkage studies as being correlated with CHD are largely non-overlapping, suggesting genetic heterogeneity, although phenotypic heterogeneity could also have contributed to the non-replicability of results. Farrall et al. attempted to replicate a genomic locus for CHD by performing linkage analysis in two independent samples of European whites.[18] The investigators found evidence of replication for a locus on chromosome 17 (at 69 cM).
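To make the LOD threshold of 3 more concrete, the sketch below computes a two-point LOD score in the simplest textbook setting: phase-known, fully informative meioses, in which each meiosis can be scored as recombinant or non-recombinant. This is a simplifying assumption for illustration only; genome-wide scans such as those described above rely on multipoint methods and specialized software. The function names and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of the likelihood at recombination fraction theta versus theta = 0.5."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    log_l_null = meioses * math.log10(0.5)  # no linkage: theta = 0.5
    return log_l_theta - log_l_null


def max_lod(recombinants, meioses):
    """Grid search over theta = 0.001 ... 0.500 for the maximum LOD score."""
    thetas = [k / 1000 for k in range(1, 501)]
    return max((lod_score(recombinants, meioses, t), t) for t in thetas)


# Illustrative counts: 2 recombinants observed in 20 informative meioses.
best_lod, best_theta = max_lod(2, 20)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.3f}")
```

A LOD score of 3 corresponds to odds of 1,000:1 in favor of linkage at the estimated recombination fraction relative to no linkage.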
Figure 1. Different genetic markers used in linkage and association studies. (A) A short tandem repeat consists of short sequences of DNA (normally 2-5 base pairs) that are repeated numerous times. A single-nucleotide polymorphism is a single nucleotide change or variation that occurs in a DNA sequence. (B) A haplotype is the combination of alleles (for different markers) that are located close together on the same chromosome and tend to be inherited together.
Helgadottir and colleagues showed the utility of linkage analysis in identifying new genes for CHD.[19] They performed linkage analysis with 1,068 microsatellite markers and found a linkage signal (LOD 2.86) on chromosome 13 for 296 Icelandic families (713 individuals) enrolled on the basis of a history of myocardial infarction. The investigators then genotyped an additional 120 microsatellite markers in this interval in 802 cases of myocardial infarction and 837 controls, and found that a 4-SNP haplotype spanning the ALOX5AP gene (encoding arachidonate 5-lipoxygenase-activating protein) was associated with a doubled risk of myocardial infarction. A subsequent study found that ALOX5AP was associated with CHD in an English population and with stroke in Icelandic and Scottish populations.[20] Another example of a novel gene identified by linkage analysis, in a pedigree with several members affected by early-onset CHD, is MEF2 (myocyte enhancer factor 2), a transcription factor expressed in coronary artery endothelium.[21] The results of these studies have not yet translated into specific genetic tests but may point to novel drug targets; for example, an inhibitor of the ALOX5AP pathway is being investigated for clinical use.[22]
Genome-wide linkage studies for quantitative measures of atherosclerotic burden, including coronary artery calcium levels, carotid intima-media thickness and ankle-brachial index, have also been reported.[23-26] Although genomic regions with LOD scores greater than 3 have been linked to some of these traits, specific genes responsible for the linkage signals have yet to be identified.
Association studies compare allele frequencies in cases and controls to assess the contribution of genetic variants to phenotypes of interest (Box 2). In contrast to linkage studies, association studies of complex diseases localize disease-related genomic regions more precisely and have greater statistical power for detecting small gene effects.[27] A major concern, however, is the considerable proportion of associations between genetic variants and disease that are reported but not replicated.[28] The difficulty encountered in reproducing the results of genetic association studies could be attributable to several issues common to epidemiologic risk factor studies,[29-33] including faulty study design, inaccurate phenotyping, bias introduced during ascertainment and analyses, and confounding variables.[34] Issues specific to genetic association studies are briefly discussed below.
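The basic comparison of allele frequencies in cases and controls can be sketched as a 2 x 2 allelic test. The Python example below computes an odds ratio and a Pearson chi-square statistic (1 degree of freedom) from allele counts; the counts are invented for illustration, and the function name is an assumption of this sketch. Real analyses would also address genotype-level models, covariates and population structure.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Odds ratio, chi-square (1 df) and P value for a 2 x 2 table of allele counts."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each).
odds_ratio, chi2, p_value = allelic_association(600, 1400, 500, 1500)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi2:.2f}, P = {p_value:.2g}")
```

Even with a modest odds ratio, a sample of this size yields a small nominal P value, which is why the threshold for significance has to account for the number of variants tested.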
When disease-related alleles have only a small effect on the phenotype, the statistical power of association studies will be low, and such associations can therefore be difficult to reproduce. Furthermore, gene effects are context-dependent and can be modified by the presence of other genetic or environmental factors, which can vary within study populations.[35] Spurious associations can result from the presence of genetically different strata in a study sample (population stratification), when the strata differ both in disease frequency and in allele frequencies at the marker locus. Another important cause of irreproducible findings could be variation in linkage disequilibrium (Box 1).[36] The genetic marker used might be distinct from the polymorphism that affects disease but could be in linkage disequilibrium with the polymorphism (i.e. they are inherited together as a unit). Notably, the degree of linkage disequilibrium between the polymorphism and the marker can also vary among study populations. Genetic heterogeneity, wherein the disease phenotype results from multiple uncommon variations or variants with extremely low frequency[37] (as posited by the 'common disease-rare variants' hypothesis),[16,38,39] could also decrease the chances of replicating association study findings.
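The way in which linkage disequilibrium between a marker and a causal polymorphism can differ between populations can be illustrated with the standard pairwise measures D, D' and r². The sketch below computes them from haplotype and allele frequencies; the function name and the frequencies for the two populations are hypothetical values chosen only to show the contrast.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Same allele frequencies, different haplotype frequencies in two hypothetical populations.
pop1 = linkage_disequilibrium(p_ab=0.40, p_a=0.5, p_b=0.4)
pop2 = linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.4)
print("population 1: D=%.2f, D'=%.2f, r2=%.2f" % pop1)
print("population 2: D=%.2f, D'=%.2f, r2=%.2f" % pop2)
```

In this example a marker that is a good proxy for the causal variant in the first population is a poor proxy in the second, so an association detected in one sample might not be reproduced in the other.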
The results of several association studies for CHD have been validated by subsequent studies or in independent samples (Table 3). An example is the external validation of the role of ALOX5AP variants in several vascular disease phenotypes.[19,20] In another study, Ozaki et al. used two independent sample sets to validate that a functional SNP in the 5'-untranslated region of PSMA6 (proteasome subunit, alpha type 6) conferred an increased risk for myocardial infarction.[40] Connelly et al. identified the transcription factor GATA2 (GATA-binding protein 2), which regulates several endothelial-specific genes, as a novel susceptibility gene for CHD in two independent case-control samples.[41] Another example of association study 'replication' is the study by Shiffman et al.[42] The investigators genotyped 11,053 putative functional SNPs in 6,891 genes and used a three-step process to reduce the number of hypotheses tested, thus identifying variants in four genes associated with myocardial infarction (PALLD, palladin, cytoskeletal associated protein; ROS1, v-ros UR2 sarcoma virus oncogene homolog 1 (avian); TAS2R50, taste receptor, type 2, member 50; and OR13G1, olfactory receptor, family 13, subfamily G, member 1). Further investigation will be needed to assess the utility of these polymorphisms in assessing CHD risk or in identifying new targets for drug therapy.
Although considerable overlap exists among various CHD phenotypes (Table 1), the underlying pathophysiology could vary considerably. Multiple risk factors and their interactions influence plaque stability and inflammation, platelet function, and the coagulation cascade. Different combinations of these risk factors can, therefore, predispose individuals to the development of different phenotypes of CHD.[43] Heterogeneity in the mechanisms underlying these phenotypes could explain why in some cases there is no overlap between linked regions in related disease states: linked regions that seem specific for myocardial infarction are not also specific for angiographic coronary artery disease.[18] At the outset of any genetic study, it is important to accurately define the phenotype of cases and controls. Given the multiple possible presentations of CHD, careful characterization of the phenotype is especially important. Rigorous, uniform criteria for cardiovascular events such as myocardial infarction or sudden cardiac death should be specified at the outset of the study. To avoid ascertainment bias, imaging studies that provide information about both atherosclerotic burden and plaque activity should be performed in community-based cohorts rather than in patients referred for such studies. To help diminish the likelihood of bias and reduce population stratification, it is preferable that cases and control individuals are drawn from the same geographic region and matched for age, sex and race. Furthermore, concomitant with efforts to determine accurate phenotypes, further improvement is needed in the assessment and measurement of environmental factors relevant to CHD such as cigarette smoking, physical activity and dietary intake.
As shown by Helgadottir et al.[19] and by Wang et al.,[21] linkage analyses for complex diseases have the potential to identify new candidate genes that previously would have remained unsuspected on the basis of a priori knowledge of disease mechanisms. The limitations associated with linkage studies of complex diseases include low statistical power and the inability to specify precise limits on the location of the causal gene or mutation. The statistical power of linkage studies could be improved by using larger sample sizes and pedigrees, and disease susceptibility loci can be defined more precisely by using a large number of markers across the genome. For example, John et al. used 11,245 SNPs in a genomic scan of families with rheumatoid arthritis and found that high SNP density localized disease susceptibility loci more precisely than the conventional 10-cM microsatellite scan that used approximately 400 microsatellite markers.[44]
Linkage analysis in pedigrees is an unbiased approach for identifying genomic loci for quantitative disease phenotypes.[45,46] Quantitative traits could have a simpler genetic architecture than the disease phenotype and could, therefore, be easier to map. Quantitative traits related to atherosclerotic vascular disease and CHD that can be measured accurately and without bias in large population genetic studies include the following: carotid intima-media thickness, presence and quantity of coronary artery calcium, coronary artery disease on angiography, coronary atherosclerotic burden on intravascular ultrasonography, carotid or femoral artery plaque burden and characteristics on MRI, ankle-brachial index, and aortic pulse-wave velocity.
A new alternative to conventional linkage analysis is admixture mapping.[47-49] This technique can be applied in a population formed by relatively recent (e.g. 5 generations) admixture of two or more ancestral populations (e.g. African Americans who have West African and white European ancestry). For African Americans with a particular disease, genomic regions that have an unusually high proportion of ancestry from either Europeans or Africans could harbor disease susceptibility variants. An example of the use of admixture mapping relevant to CHD is a study by Zhu et al.[39] In this investigation, a genomic admixture scan in 737 African Americans with hypertension and 573 controls using 269 microsatellite markers was performed. Evidence for association on chromosomes 6q24 and 21q21 was found. Although confirmatory studies are needed, these results suggest that admixture mapping could be useful in identifying genome regions that influence complex disease susceptibility.[50]
Recent developments include attempts to improve the replicability of association studies, candidate-gene resequencing studies, and genome-wide association studies. To obtain robust results from association studies, the use of large samples (i.e. thousands of cases and controls, instead of hundreds) and stringent thresholds for statistical significance have been proposed.[51] Biologically plausible associations and risk alleles with functional effects are more likely to be 'true' associations and, hence, to yield replicable findings. The replication of SNP-disease associations in independent samples is crucial for validating results, and by genotyping 'neutral' markers throughout the genome, potentially confounding population substructures can be excluded.[52,53] Alternatively, replication can be shown within a single study by dividing the study subjects into a 'test' group and a 'validation' group,[54] with both groups independently powered to detect an association. Association mapping could also be more successful with population isolates in which genetic stratification is minimal, as shown by the success of the deCODE project in Iceland.[55]
Candidate gene resequencing studies involve sequencing an entire candidate gene in cases and controls and identifying the sequence variants that clearly differ in frequency between the two groups. These studies are labor intensive and expensive but can identify rare variants that influence complex diseases or traits. This approach was used to identify rare variants of MC4R (the melanocortin 4 receptor gene) that were associated with severe early-onset obesity.[56] Cohen et al. also successfully used this approach to identify rare nonsynonymous SNPs that influence plasma levels of HDL cholesterol[16] and LDL cholesterol[57] in the general population. Extension of this approach to a genomic scale ('genome resequencing') with a large number of cases and controls would be the most comprehensive means of identifying genetic variants underlying complex diseases. Although genome resequencing in large case-control studies is not feasible at present, it could soon become standard as the costs of sequencing continue to drop.
In the interim, a genome-wide association approach, in which variants are tested for association with a trait or disease of interest, has become possible with data from the HapMap project and high-throughput SNP-typing platforms.[58] In 2002, the HapMap project was undertaken to catalogue patterns of genetic variation as a means of identifying common genomic variants contributing to the cause of prevalent diseases.[58,59] The human genome seems to be organized into a series of haplotype blocks;[60-62] each haplotype block shows low diversity, and SNPs within a haplotype block show high linkage disequilibrium. One strategy to reduce genotyping effort in association mapping of complex diseases uses tag SNPs,[63] which correlate with much of the common variation in a genomic region and, therefore, could serve as markers of this common variation. Several studies have already reported convincing statistical evidence that links genetic polymorphisms with CHD risk factors as well as with CHD phenotypes: polymorphisms in INSIG2[64] (encoding insulin induced gene 2) and FTO[65] (fat mass and obesity associated) have been associated with obesity; polymorphisms in IFIH1[66] (interferon-induced helicase C domain-containing protein 1) and IL2RA[67] (interleukin-2 receptor alpha chain) have been associated with type 1 diabetes mellitus; polymorphisms in TCF7L2[68,69] (transcription factor 7-like 2), SLC30A8[69-72] (zinc transporter 8), a locus near CDKN2A (cyclin-dependent kinase inhibitor 2A) and CDKN2B[69,71,72] (cyclin-dependent kinase inhibitor 2B), IGF2BP2[69,71,72] (insulin-like growth factor 2 mRNA binding protein 2), and CDKAL1[69,71-73] (CDK5 regulatory subunit associated protein 1-like 1) have been associated with type 2 diabetes mellitus; and a polymorphism in a locus near CDKN2A and CDKN2B on chromosome 9p21 has been associated with CHD in several genome-wide association studies.[74-76]
Several different approaches fall under the rubric of genome-wide association studies (Figure 2). In broad terms, these approaches can be classified as 'map-based' (using uniformly spaced SNPs or tag SNPs) or 'gene-based' (using putative functional SNPs).[77] In the map-based approach, the SNPs to be genotyped could be evenly spaced, or tag SNPs could be used, with the presumption that linkage disequilibrium between a tag SNP and the causal SNP would allow the detection of the causal SNP. A collection of evenly spaced SNPs could provide sparse (e.g. 100,000 SNPs) or dense (e.g. 500,000-1,000,000 SNPs) coverage of the genome. The genotyping burden can be reduced markedly, however, by using tag SNPs across the genome. A collection of 250,000 tag SNPs, for example, would cover approximately 85% of the genome.[78] In the gene-based approach, putative functional SNPs throughout the genome are genotyped, including nonsynonymous SNPs, regulatory SNPs and SNPs in splice sites.
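The idea of using tag SNPs as proxies can be sketched with a simple greedy heuristic: repeatedly choose the SNP that captures the largest number of not-yet-tagged SNPs at or above a chosen r² threshold. This is one possible selection strategy shown for illustration, not necessarily the method used in the cited studies; the SNP labels, the pairwise r² values and the threshold of 0.8 are all hypothetical.

```python
def select_tag_snps(snps, r2, threshold=0.8):
    """Greedy tag SNP selection.

    snps: list of SNP identifiers
    r2: dict mapping frozenset({snp_x, snp_y}) to pairwise r^2 (missing pairs count as 0)
    """
    def captured_by(snp):
        # A SNP always captures itself, plus any SNP in strong LD with it.
        return {other for other in snps
                if other == snp or r2.get(frozenset((snp, other)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP capturing the most untagged SNPs (ties: first in list order).
        best = max(snps, key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical region with two groups of correlated SNPs.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp3", "snp4")): 0.30,
    frozenset(("snp4", "snp5")): 0.82,
}
tags = select_tag_snps(snps, r2)
print(tags)
```

Here two genotyped SNPs stand in for five, which is the sense in which tag SNPs reduce the genotyping burden.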
Figure 2. Strategies for genome-wide association studies using SNPs. These approaches can be classified as (A) 'map-based' (using uniformly spaced SNPs or tag SNPs) or (B) 'gene-based' (using putative functional SNPs).[77] Under the 'map-based' approach, a subset of SNPs is selected for genotyping; these can be evenly spaced SNPs or tag SNPs. SNPs in strong linkage disequilibrium are likely to be inherited together, so one can use a subset of 'tag' SNPs as proxies for the entire set. For the 'gene-based' approach, putatively functional SNPs located in regulatory regions and non-synonymous SNPs are selected to be genotyped. Abbreviation: SNP = single-nucleotide polymorphism.
The development of appropriate statistical techniques to analyze the massive amount of genetic data gained from genome-wide association studies is crucial for the identification of genes for complex diseases. As hundreds of thousands of potential statistical tests might be computed in such studies, a major challenge is correction for the multiple testing that must be performed.[79] One approach is a multistage design that reduces the number of genotyped SNPs in each stage, achieving stepwise genome-wide significance.[28,80] The Bonferroni correction is considered to be overly conservative,[81] and other approaches have been proposed, including estimation of the false discovery rate.[82,83]
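The difference between the Bonferroni correction and control of the false discovery rate can be shown with a short sketch. The example below implements the Bonferroni per-test threshold and the Benjamini-Hochberg step-up procedure, one widely used way of controlling the false discovery rate; the function names and the list of P values are illustrative assumptions and are not drawn from any of the cited studies.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P value falls under the step-up line.
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical P values from ten tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= threshold]
fdr_hits = benjamini_hochberg(p_values, q=0.05)
print("Bonferroni:", bonferroni_hits, "Benjamini-Hochberg:", fdr_hits)

# With 500,000 SNPs, the Bonferroni per-test threshold at alpha = 0.05 is 1e-7.
genome_wide = bonferroni_threshold(0.05, 500_000)
```

In this small example the false discovery rate procedure retains one more test than the Bonferroni correction, which reflects the more conservative behavior of the latter.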
We feel it is necessary to highlight three important considerations relevant to clinical practice. First, as CHD clusters in families, obtaining detailed information on a patient's family history is important. Family history has been described as a "... free, well-proven, personalized genomic tool that captures many of the genes and environmental interactions and can serve as the cornerstone for individualized disease prevention."[84] Second, although genetic testing is not part of current CHD risk stratification algorithms, it is likely that multilocus genotyping to assess CHD risk will become part of clinical practice in the future. Not unexpectedly, entrepreneurial zeal has overtaken careful scientific validation, and several companies now market gene-based tests for assessing cardiovascular risk to patients directly via the internet.[85] Third, the immediate promise of identifying genetic determinants of CHD is greater in the therapeutic arena than in refining risk prediction. This potential includes use of genetic tests to allow individualized treatment (pharmacogenetics) and to facilitate discovery of new molecular pathways of CHD and drug targets. Of interest, the results of a drug trial that used the findings of a genetic study that identified a novel therapeutic target for CHD have already been published.[22]
We have attempted to summarize the current state of knowledge about the genetic basis of CHD and the new approaches that might lead to further successes in elucidating the basis of this disease. Increased knowledge of the genetic architecture of CHD will improve risk prediction and facilitate the development of new therapies for patients with CHD. Although considerable challenges exist, advances such as high-throughput SNP genotyping platforms and newer statistical and phenotyping methods show promise for accelerating progress in this field. Well-designed studies are needed to define clinically relevant phenotypes, identify genes and define environmental contributions to CHD. Given that CHD is a clinically heterogeneous chronic disease with multiple genetic and environmental contributions, identification of causal genes for this disease requires a vigorous multidisciplinary approach that includes physician investigators and laboratory scientists, and epidemiologists and statisticians with expertise in genetics. The task is challenging, but the goals justify the effort and the expense.
The authors would like to thank Bernard J Gersh for his helpful comments.
This work was supported in part by grants RO1 HL75794 and UO1 HL81331 from the National Institutes of Health, USA.
Iftikhar J. Kullo, Division of Cardiovascular Diseases, Mayo Clinic, 200 First Street Southwest, Rochester, MN 55905, USA. Email: kullo.iftikhar@mayo.edu
As an organization accredited by the ACCME, Medscape, LLC requires everyone who is in a position to control the content of an education activity to disclose all relevant financial relationships with any commercial interest. The ACCME defines "relevant financial relationships" as financial relationships in any amount, occurring within the past 12 months, including financial relationships of a spouse or life partner, that could create a conflict of interest.
Medscape, LLC encourages Authors to identify investigational products or off-label uses of products regulated by the US Food and Drug Administration, at first mention and where appropriate in the content.
Iftikhar J. Kullo, MD
Consultant and Associate Professor of Medicine, Division of Cardiovascular Diseases, Mayo Clinic College of Medicine, Rochester, Minnesota
Disclosure: Iftikhar J. Kullo, MD, has disclosed no relevant financial relationships.
Keyue Ding, MD
Research Fellow, Division of Cardiovascular Diseases, Mayo Clinic College of Medicine, Rochester, Minnesota
Disclosure: Keyue Ding, MD, has disclosed no relevant financial relationships.
Charles P. Vega, MD
Associate Professor; Residency Director, Department of Family Medicine, University of California, Irvine
Disclosure: Charles P. Vega, MD, has disclosed that he has served as an advisor or consultant to Novartis, Inc.