Software for analyzing case-parent triad (trio) data with SNP haplotypes
NOTE: THIS IS A DEPRECATED VERSION. PLEASE USE THE LATEST VERSION OF HAPLIN.
Web page last updated: Nov 14, 2005
Most recent version: Haplin 2.0 BETA, updated Nov 14, 2005
Important version note, Nov. 14,
2005: Due to changes made to the latest R release (2.2.0, Oct. 06,
2005), Haplin 1.0 for R exits prematurely when run under this release.
We recommend using Haplin 2.0 beta since this has now been tested more
extensively and seems to work well. It will soon be made the official
version.
Background
HAPLIN is free software written for the purpose of analyzing
case-parent triad (trio) data. Some of the main features of HAPLIN are:
- Estimation is based on haplotypes, for instance SNP haplotypes, even though phase is not known from the genetic data
- Estimation of the relative risk (RR) associated with each haplotype, not only significance testing
- Optimal use of data from triads with missing genotypic data, for instance when a single SNP has not been typed for some individuals, or when the father has not been genotyped at all
- Optional estimation of effects of maternal haplotypes, particularly appropriate in perinatal epidemiology
The models estimated by HAPLIN are described in detail in Gjessing HK
and Lie RT (2005). Case-parent triads: Estimating single- and
double-dose effects of fetal and maternal disease gene haplotypes. In
press, Annals of Human Genetics 2005.
HAPLIN is written by Hakon K. Gjessing. The data reading and
preparation parts have been extensively upgraded and improved by
Hilde-Gunn Bruu. Rolv Terje Lie has contributed numerous useful and
insightful suggestions, and inspired the work from its beginning.
Please feel free to contact me at hakon.gjessing@fhi.no with questions
or bug reports.
Note: Although we have done our
best to avoid errors, the software is offered without any warranties.
We cannot take responsibility for any problems or damages caused by
using it.
Please: If you use Haplin in
your analyses, it will be much appreciated if you refer to Haplin
(this web page) or to the forthcoming paper.
Installation
HAPLIN is written for use with S-PLUS or R. However, it is easy to
install and requires no previous knowledge of S-PLUS or R. If you do
not have an S-PLUS license, R can be downloaded free of charge from
The R Project for Statistical Computing. HAPLIN has been most
extensively tested under S-PLUS 2000 for Windows, but should run
without problems on all reasonably new S-PLUS and R versions, for
Windows, Linux or UNIX.
NOTE (Nov. 14, 2005): The beta 2.0 version has now been tested for a
while and will soon be made official. I recommend using the beta 2.0
rather than the 1.0 version.
Download the HAPLIN source file (HAPLIN.q for S-PLUS, HAPLIN.R for R)
and save it in a suitable directory as a plain text file.
Start S-PLUS and type
source("C:/work/HAPLIN.q")
(or use whatever is the correct path). This installs Haplin.
For R, just use the file name "haplin.R" instead.
NOTE: When HAPLIN.q or HAPLIN.R (which are plain text files) is saved,
it often ends up with a .txt extension. Either save it without the
extra extension, or use the full name, for instance "HAPLIN.q.txt",
when sourcing. If HAPLIN.q has been saved to the working directory of
S-PLUS, the path is unnecessary. Note that S-PLUS and R prefer "/" as
the separator in path names.
HAPLIN will then be ready for use. In S-PLUS it will remain in place
until removed manually. In R, it will remain in place if the workspace
is saved after installation, otherwise you need to source it again next
time you start a session.
NOTE: It is recommended that HAPLIN is installed in an empty workspace
to avoid cluttering.
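The file-name and path issues described above can be handled with a small helper. This is only a sketch for R (not S-PLUS), and the function name find_haplin_source is hypothetical, not part of HAPLIN. It assumes the file was saved either under its plain name or with an extra .txt extension, and it returns a path using "/" separators.

```r
# Hypothetical helper (not part of HAPLIN): locate the downloaded source file
# even if an extra ".txt" extension was appended when saving, and return a
# path that uses forward slashes.
find_haplin_source <- function(dir, base = "HAPLIN.R") {
  # Candidate names: the plain name first, then the name with ".txt" appended
  candidates <- file.path(dir, c(base, paste0(base, ".txt")))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    stop("No HAPLIN source file found in ", dir)
  }
  # Replace any backslashes with forward slashes
  gsub("\\\\", "/", found[1])
}

# Example use (assumes the file was saved in C:/work):
# source(find_haplin_source("C:/work"))
```

If both variants exist in the directory, the plain name is preferred because it is listed first.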
To run HAPLIN, the typical command is
haplin("C:/work/data.dat")
(or whatever the path to the data file is). The data file (here
data.dat) can have any name, but should be a text file in a specific
format (see below). This command reads the data, performs the
estimation, and prints and plots the results in one run.
For the 2.0 beta version, the typical command is
haplin.beta("C:/work/data.dat")
or
haplin.beta("C:/work/data.dat", use.missing = T)
to include triads with missing data in the analysis.
For more examples of how to run HAPLIN, see the haplin help file. For
more examples of how to run the beta 2.0 version, see the haplin beta
2.0 help file. (For the time being, only haplin itself and haplin.beta
have help files.)
HAPLIN requires data to be in an ASCII file in a specific format. Each
line represents one triad. There are three columns for each locus, one
for the mother (M), one for the father (F) and one for the child (C).
The columns are placed in the following sequence (where the numbers
indicate marker):
M1 F1 C1 M2 F2 C2 ...etc.
There should be no row or column names in the file, and columns are
separated by white space.
Important: Make sure the sequence is correct; this is the only way for
HAPLIN to tell which column is which.
Within each column, the two alleles for that individual at that locus
are separated by a semicolon.
Thus, for 2 loci with 4 and 2 alleles,
respectively, the first four lines of data might look like
Note the NA that indicates missing genotype at the first marker of the
father in the third triad.
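To make the layout concrete, the following R sketch builds a small data set in the described format and reads it into a matrix with one column per individual and marker. The four data lines are invented for illustration only (2 loci with 4 and 2 alleles, and the father's first marker missing in the third triad); they are not the original example. The function parse_triads is hypothetical and is not HAPLIN's own reader, and the sketch assumes a missing genotype is written as a single NA.

```r
# Illustrative data only (not from the HAPLIN distribution).
# Column order: M1 F1 C1 M2 F2 C2; alleles within a column separated by ";".
example_lines <- c(
  "1;2 3;4 1;3 1;1 1;2 1;1",
  "2;2 1;4 2;4 1;2 2;2 2;2",
  "1;3 NA 1;2 2;2 1;2 2;1",
  "4;4 2;3 4;2 1;2 1;1 1;1"
)

# Hypothetical helper: split each line on white space and label the columns.
parse_triads <- function(lines, n_loci) {
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  # Every triad must have exactly three columns per locus
  stopifnot(all(sapply(fields, length) == 3 * n_loci))
  geno <- do.call(rbind, fields)
  colnames(geno) <- paste0(rep(c("M", "F", "C"), times = n_loci),
                           rep(seq_len(n_loci), each = 3))
  # Assumption: a missing genotype is coded as the single string "NA"
  geno[geno == "NA"] <- NA
  geno
}

triads <- parse_triads(example_lines, n_loci = 2)

# Triads (rows) with at least one missing genotype
incomplete <- which(rowSums(is.na(triads)) > 0)

# The two alleles of the mother at the first marker in the first triad
mother_alleles <- strsplit(triads[1, "M1"], ";")[[1]]
```

A check like this before running an analysis can catch a wrong column order or a wrong number of columns, which HAPLIN itself cannot detect from the file.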
For user convenience, it is also possible to use different separators
between columns and within columns. In addition, HAPLIN includes
functions for converting to and from the data format
used by the TRANSMIT program (see below). For more details, see
haplin format.
Trial run
To test that HAPLIN runs properly, you can download the trial data file
HAPLIN.trialdata.txt and run HAPLIN
with the command
haplin("HAPLIN.trialdata.txt")
The result should look something like this: HAPLIN.trialrun.txt. (This is from
S-PLUS; there may be minor differences with R.)
In addition, a plot is produced, which should look something like this:
HAPLIN.trialrun.jpg
Model and estimation
The models implemented in HAPLIN are extensions of the log-linear
models described and developed in the papers
Wilcox AJ, Weinberg CR, Lie RT (1998). Distinguishing the effects of
maternal and offspring genes through studies of "case-parent triads".
American Journal of Epidemiology, 148(9): 893-901.
Weinberg CR, Wilcox AJ, Lie RT (1998). A log-linear approach to
case-parent-triad data: assessing effects of disease genes that act
directly or through maternal effects and that may be subject to parental
imprinting. American Journal of Human Genetics, 62: 969-78.
and follow-ups to these. The basic log-linear model for case-parent
triad data allows a user to compute relative risks associated with a
variant allele, together with corresponding confidence intervals and
p-values. It also allows a similar effect estimation for maternal
alleles, i.e. to study the effect of genes of the mother that may
influence the development of the fetus. HAPLIN extends these models to
situations with multiple densely spaced SNPs (or other markers), where
phase is unknown. HAPLIN then estimates the relative risks associated
with haplotypes, not only single markers. HAPLIN is similar to the
TRANSMIT program by David Clayton (MRC Biostatistics Unit, Cambridge).
However, HAPLIN returns explicit estimates of relative risks with
confidence intervals, and optionally includes effects of maternal
genes. In addition, HAPLIN uses a parametrization that will detect (at
least with sufficient sample size) dominant or recessive deviations
from a dose-response model. For some details about parametrization,
choice of reference category and interpretation of results, see
parametrization.pdf.
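The idea of a deviation from a dose-response model can be illustrated numerically. The sketch below is not HAPLIN's parametrization (see parametrization.pdf for that). It assumes, for illustration only, a multiplicative dose-response model in which the relative risk for two copies of a haplotype equals the square of the relative risk for one copy. The function name dose_deviation is hypothetical.

```r
# Hypothetical illustration (not HAPLIN code): compare an observed double-dose
# relative risk with the value expected under a multiplicative dose-response
# model, where RR(double dose) = RR(single dose)^2.
dose_deviation <- function(rr_single, rr_double) {
  expected_double <- rr_single^2
  # Ratio of 1 means no deviation from the multiplicative model
  rr_double / expected_double
}

# Double dose exactly as expected under the multiplicative model
no_deviation <- dose_deviation(rr_single = 1.5, rr_double = 2.25)

# A second copy adds nothing beyond the first (dominant-like pattern): ratio < 1
dominant_like <- dose_deviation(rr_single = 1.5, rr_double = 1.5)

# Risk is raised only with two copies (recessive-like pattern): ratio > 1
recessive_like <- dose_deviation(rr_single = 1.0, rr_double = 2.0)
```

In practice the estimates come with confidence intervals, so a ratio different from 1 is only suggestive unless the sample size is large enough to separate the single- and double-dose effects.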