AIRE-1, involved in the polyglandular autoimmune syndrome APECED, contains the SAND domain and is a probable DNA-binding transcription factor

Toby J. Gibson, Chenna Ramu and Christine Gemünd
European Molecular Biology Laboratory, Heidelberg, Germany

and

Rein Aasland
Department of Molecular Biology, University of Bergen, Norway



Mutations in the recently cloned AIRE-1 gene1,2 cause a recessive systemic disease, autoimmune polyendocrinopathy, candidiasis and ectodermal dystrophy (acronyms APECED, APS 1, or PGA I) (OMIM:240300, Ref. 3). A highly variable set of symptoms characterises APECED, frequently including autoantibodies to the adrenal gland and chronic candidal infection. Unusually for an autoimmune-associated gene, AIRE-1 is found outside the major histocompatibility complex (chromosome 6), at chromosomal location 21q22.3. The disease is uncommon except in certain populations with higher carrier frequencies, such as Finns (1:25,000), Iranian Jews (1:9,000) and Sardinians. In part because its symptoms are so variable, APECED has only recently been characterised: modern chromosomal mapping techniques have been essential in verifying genetic homogeneity4 and should continue to be useful in diagnosing APECED in populations where it is less common and hence more difficult to diagnose.

The amino acid sequence of the AIRE-1 protein was reported to contain two PHD fingers, a zinc finger-like motif found in many nuclear proteins including transcriptional coactivators and chromatin-modulating proteins of the polycomb and trithorax groups5-7. The presence of PHD fingers implies that AIRE-1 is a nuclear protein and, broadly speaking, will function in the regulation of gene expression. Genes encoding PHD finger-containing proteins are increasingly being found to be involved in genetic disease. PHD finger mutations in the ATRX gene8,9 are sufficient to cause the ATRX syndrome (X-linked α-thalassaemia and mental retardation). Furthermore, analysis of chromosomal translocation breakpoints in leukaemias has revealed a remarkable variety of gene fusions involving the MLL(HRX), AF-10, AF-17, MOZ and CBP genes, which all contain PHD fingers (e.g. Refs 10-12).

We were interested in examining the AIRE-1 sequence for additional domains. Blast2 searches of the NRDB (non-redundant database with >285,000 entries)13 with AIRE-1 residues 1-298 (excluding the PHD fingers) detected several nuclear proteins in the related nuclear phosphoprotein 41/75 and Sp100/Sp140 groups14-16. Particularly noteworthy were the variants Sp140 (LySp100B) and NucP75, each of which also contains a PHD finger14-16. The reported probabilities, with a best match to Sp100B (probability of a match by chance, P = 1.5e-4), were of borderline significance, but the presence of two regions in AIRE-1 (in addition to the shared PHD fingers) that independently detected the Sp100 proteins warranted more careful investigation.

With borderline hits of low sequence similarity, it is often helpful to undertake reciprocal searches with the top matching sequences since, if they are related, the matching signal may be expected to be above that of random scores (Bork and Gibson)17. An alignment of the Sp100 group was used to prepare profiles for searches of the NRDB with SearchWise18 or EMBL's Bioccelerator31 (Compugen Ltd., Israel19). A profile was prepared from the highly conserved N-terminal region of the alignment using ProfileWeight32 with sequence weighting and the Gonnet Pam250 matrix33. A search of the NRDB yielded AIRE-1 as the top non-self hit (expected frequency of false positives, E = 8e-2). No other matching sequences were found for this domain. The sequence alignment is shown in Fig. 1A, together with a secondary structure prediction20 suggesting that the domain is predominantly α-helical.
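
The idea behind a profile search can be sketched in a few lines of Python. This is not ProfileWeight, SearchWise or the Bioccelerator: the toy alignment, the uniform background, the pseudocount and all function names are assumptions made purely for illustration, and no sequence weighting or substitution matrix is applied. The sketch only shows how a position-specific score table built from aligned sequences can be slid along a candidate sequence to find its best-matching window.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one log-odds table per alignment column.

    Assumes a uniform background frequency and a flat pseudocount;
    both are simplifications chosen for this illustration.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        # Gap characters and unknown symbols are simply ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_segment(profile, segment):
    """Sum the column scores for an ungapped segment of profile length."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, segment))


def best_window(profile, sequence):
    """Return (score, start) of the highest-scoring ungapped window."""
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    return max(
        (score_segment(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical toy alignment, not taken from the Sp100 family.
toy_alignment = ["KWLEV", "KWIEV", "RWLEV", "KWLDV"]
profile = build_profile(toy_alignment)
score, start = best_window(profile, "AAAKWLEVGGG")
print(f"best window starts at {start} with score {score:.2f}")
```

A real profile search additionally allows gaps and converts raw scores into E-values against the score distribution of unrelated sequences, which is what makes a borderline hit interpretable.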

Reciprocal profile searches with the central conserved region of Sp100 confirmed the relationship with the nuclear phosphoproteins and also detected the Drosophila DEAF-1 transcription factor21 and its vertebrate homologue, termed suppressin22, with good statistical support (E = 6.5e-12). The AIRE-1 match was present, but weakly supported (E = 8.9e-2). While the AIRE-1 sequence is the most divergent of the set and is not well supported statistically, reciprocal detections by three different domains in independent database searches suggest that it is genuine. The colinear order of the domains in AIRE-1 and Sp140/LySp100B suggests that AIRE-1, though highly diverged, shares common ancestry with the Sp100 protein group and may therefore function similarly.
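
A little arithmetic illustrates why several weak hits to the same protein carry more weight than any one of them alone. The sketch below assumes the usual Poisson relation between an expected frequency E and the probability of at least one chance hit, P = 1 - exp(-E). It then makes a deliberately naive assumption, namely that the two profile searches are fully independent, so that their chance probabilities multiply. The two E-values are those quoted above; the combined figure is an illustration only and is not a statistic reported here.

```python
import math


def chance_probability(e_value):
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return -math.expm1(-e_value)


# E-values quoted in the text for the AIRE-1 match in the two profile searches.
e_n_terminal = 8e-2   # N-terminal domain profile
e_central = 8.9e-2    # central conserved region profile

p_n_terminal = chance_probability(e_n_terminal)
p_central = chance_probability(e_central)

# Naive assumption: the searches are independent, so the probabilities multiply.
p_both = p_n_terminal * p_central

print(f"N-terminal profile: P = {p_n_terminal:.4f}")
print(f"central profile:    P = {p_central:.4f}")
print(f"both by chance (if independent): P = {p_both:.4f}")
```

In practice the searches share a database and a protein family, so true independence is not guaranteed and the product should be read as an optimistic bound rather than a formal significance estimate.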

Database searches with a profile prepared after adding the new sequences also detected four predicted ORFs from the C. elegans genome sequencing project23 and two ESTs24 that did not correspond to known proteins. No additional matching sequences were found after further search permutations: the domain may be restricted to animal phyla, as we were unable to find any evidence for it in the yeast genome. These proteins have in common a conserved sequence of ~80 residues (Fig. 1B), for which we suggest the term SAND domain after Sp100, AIRE-1, NucP41/75 and DEAF-1/suppressin. Although SAND domain similarities have been reported before, the domain has not been characterised in detail, e.g. with regard to domain boundaries and secondary structure.

Conserved hydrophobic residues imply that the SAND domain has a globular fold, while several well-conserved positively-charged residues may be functionally important (Fig. 1B). Secondary structure prediction20 suggests that SAND has an all-β structure with approximately eight β-strands. There are three subgroups of SAND domain sequences typified by Sp100, DEAF-1 and the C. elegans ORFs (Fig. 1C). H_Est1 (a composite of two overlapping entries AA148980, AA279407) clearly belongs with the C. elegans group, yet the nematode sequences are most closely related to each other, suggesting that lineage-specific gene duplication has occurred. As expected, the AIRE-1 sequence joins the tree adjacent to the Sp100 group. The SAND domain occurs in different modular contexts, including the bromodomain25, the PHD finger and the MYND finger shared by DEAF-1/suppressin, mtg8 and nervy21,22 (Fig. 1D).
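
The kind of column-by-column inspection that picks out conserved hydrophobic and positively charged positions can be sketched as follows. Everything here is an assumption for illustration: the residue classes (hydrophobic as AVLIMFWC, positive as K and R only), the 80% conservation threshold, the function name and the toy alignment, which is hypothetical and not the SAND alignment of Fig. 1B.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed class membership
POSITIVE = set("KR")           # histidine deliberately left out


def conserved_columns(alignment, residue_class, threshold=0.8):
    """Return indices of columns where at least `threshold` of the
    sequences carry a residue from `residue_class`.

    Gap characters count as non-members.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        fraction = sum(aa in residue_class for aa in column) / len(column)
        if fraction >= threshold:
            conserved.append(col)
    return conserved


# Hypothetical toy alignment with one gap in the third column.
toy_alignment = ["LKAVR", "IRGLK", "VKSIR", "LK-MK"]

hydrophobic_cols = conserved_columns(toy_alignment, HYDROPHOBIC)
positive_cols = conserved_columns(toy_alignment, POSITIVE)
print("conserved hydrophobic columns:", hydrophobic_cols)
print("conserved positive columns:   ", positive_cols)
```

Class-based conservation of this sort tolerates substitutions such as L for I or K for R, which is why it can flag structurally or functionally important positions even when sequence identity is low.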

The SAND domain adds to the burgeoning set of domains present in modular chromatin-associated proteins. The functions of most of these domains are not at all well understood, and gaining a better understanding will be one key to understanding how chromatin is assembled and regulated. The SAND domain appears in various nuclear contexts. Sp100/Sp140 are found in recently described nuclear bodies or dots, discrete structures within the nucleus that do not yet have known functions16,17. The best clue to function is therefore supplied by the DEAF-1 DNA-binding transcription factor21.

Many small intracellular modules function in protein-protein interaction, building up higher order complexes. However, the conserved positive charges in SAND domains imply negatively charged ligands and, within the region of DEAF-1 which binds DNA, the SAND domain is the only motif which is also conserved in the homologous vertebrate suppressins. Additional positive charges are found in adjacent sequence for most of the SAND domains, well positioned for non-specific interactions with DNA phosphate groups. Thus SAND seems likely to be a DNA-binding domain22 despite the predicted all-β structure, which is rare but not unknown for a DNA-binding domain, being found for example in the transcription factors NF-κB and NFAT26,27. Such a domain combination would be quite unusual, as PHD fingers and bromodomains are not usually found together with DNA-binding domains (although the PHD finger is found in some plant homeodomain proteins28). Confirmation of SAND domain DNA-binding function in DEAF-1 would lead to the idea that all these proteins are DNA-binding transcription factors. The SAND proteins which also contain PHD fingers, such as AIRE-1, are likely to regulate gene expression through the modulation of chromatin structure.

The typical symptoms of APECED differ in the Finnish and Iranian Jewish populations (e.g. the former, but not the latter, typically show chronic candidiasis), presumably because different mutations are predominant in each population. As more APECED mutations are revealed, it will therefore be important to determine whether there is a correlation between particular symptoms, the mutation site and the domain topology of AIRE-1.




References
1 Finnish-German APECED Consortium (1997) Nature Genet. 17, 399-403
2 Nagamine, K. et al. (1997) Nature Genet. 17, 393-398
3 OMIM - http://www.ncbi.nlm.nih.gov/omim/
4 Björses, P. et al. (1996) Am. J. Hum. Genet. 59, 879-886
5 Aasland, R., Gibson, T. J. and Stewart, A. F. (1995) Trends Biochem. Sci. 20, 56-59
6 Koken, M. H., Saib, A. and de Thé, H. (1995) C. R. Acad. Sci. III 318, 733-739
7 Saha, V. et al. (1995) Proc. Natl. Acad. Sci. U. S. A. 92, 9737-9741
8 Gibbons, R. J. et al. (1997) Nature Genet. 17, 146-148
9 Villard, L. et al. (1997) Genomics 43, 149-155
10 Young, B. D. and Saha, V. (1996) Cancer Surv. 28, 225-245
11 Borrow, J. et al. (1996) Nature Genet. 14, 33-41
12 Taki, T., Sako, M., Tsuchida, M. and Hayashi, Y. (1997) Blood 89, 3945-3950
13 Blast2 - http://www.bork.embl-heidelberg.de:8080/Blast2/
14 Kadereit, S. et al. (1993) J. Biol. Chem. 268, 24432-24441
15 Dent, A. L. et al. (1996) Blood 88, 1423-1426
16 Bloch, D. B. et al. (1996) J. Biol. Chem. 271, 29198-29204
17 Bork, P. and Gibson, T. J. (1996) Methods Enzymol. 266, 162-184
18 Birney, E., Thompson, J. D. and Gibson, T. J. (1996) Nucleic Acids Res. 24, 2730-2739
19 Compugen - http://www.cgen.com/
20 Rost, B. and Sander, C. (1993) J. Mol. Biol. 232, 584-599
21 Gross, C. T. and McGinnis, W. (1996) EMBO J. 15, 1961-1970
22 LeBoeuf, R. D. et al. (1998) J. Biol. Chem. 273, 361-368
23 Wilson, R. et al. (1994) Nature 368, 32-38
24 Adams, M. D. et al. (1991) Science 252, 1651-1656
25 Jeanmougin, F. et al. (1997) Trends Biochem. Sci. 22, 151-153
26 Muller, C. W. et al. (1995) Nature 373, 311-317
27 Wolfe, S. A. et al. (1997) Nature 385, 172-176
28 Schindler, U., Beckmann, H. and Cashmore, A. (1993) Plant J. 4, 137-150
29 Thompson, J. D. et al. (1997) Nucleic Acids Res. 25, 4876-4882
30 Maidak, B. L. et al. (1997) Nucleic Acids Res. 25, 109-111
31 BIC - http://shag.embl-heidelberg.de:8000/Bic/
32 Thompson, J. D., Higgins, D. G. and Gibson, T. J. (1994) Comput. Applic. Biosci. 10, 19-30
33 Benner, S. A., Cohen, M. A. and Gonnet, G. H. (1994) Protein Engng. 7, 1323-1332


[Figure 1a] [Figure 1b] [Figure 1c] [Figure 1d]

This page was made by Rein Aasland MAY 24. 1998; Last updated MAY 25. 1998