HAPLIN
Software for genetic association analyses in case-parent triads, case-control data (or combined case-parent and control-parent triads), with SNP haplotypes from candidate genes or GWAS data
Web page last updated: March 15, 2012
Most recent version: Haplin 4.1, uploaded March 15, 2012
Background
HAPLIN is free software written for the purpose of analyzing
case-parent triad (trio) data and/or case-control data. Some of the
main features of Haplin are:
- Analyses of the case-parent triad design, the case-control design, and "hybrid" designs using combinations of case-parent triads and control-parent triads.
- Optimal use of designs with missing genotypic data, for instance when a single SNP has not been typed for some individuals, when the case father has not been genotyped at all, or when the control parents are not available.
- Estimation is based on haplotypes, for instance SNP haplotypes, even though phase is not known from the genetic data.
- Estimation of the relative risk (RR) associated with each haplotype, not only significance testing.
- Optional estimation of effects of maternal haplotypes, particularly relevant in perinatal epidemiology.
- Estimation of RRs, haplotypes etc. also on the X chromosome, with models including dose-response and X-inactivation.
- Support for GWAS data and parallel processing.
The models estimated by Haplin are described in detail in Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
PDF version here.
Also available from Blackwell
What's new in this version of Haplin?
Some features high on the Wish List for Haplin
Authors
Haplin is written by Hakon K. Gjessing. Hilde-Gunn Bruu contributed to early versions of the data reading and preparation parts. Rolv Terje Lie has contributed numerous useful and insightful suggestions, and inspired the work from its beginning. Nguyen Trung Truc programmed the nice external GUI for generating Haplin syntax. Øivind Skare has done extensive testing and simulations with the more recent versions of Haplin, and added a TDT test. Astanand (Anil) Jugessur has provided very useful feedback from a user's perspective.
Please feel free to contact me at hakon.gjessing@fhi.no with questions or bug reports.
Note: Although we have done our best to avoid errors, the software is offered without any warranties.
We cannot take responsibility for any problems or damages caused by
using it.
Cite: If you use Haplin in your publications, please refer to the Annals of Human Genetics paper above. In addition, typing citation(package = "Haplin") in R will give you the most recent reference to the Haplin R-library.
Installation
Haplin is written for use with the statistical software R. However, it is easy to install and requires no previous knowledge of R. R can be downloaded free of charge from The R Project for Statistical Computing. For Windows users, a shortcut to the R installation file is found here.
Haplin is implemented as a standard R library, and should run without problems on all reasonably new R versions, for Windows, Linux or Mac.
To install Haplin in R:
Start R and type install.packages("Haplin")
Haplin will then be installed automatically over the internet from the CRAN library.
------
To start using Haplin, use the R command library(Haplin). Haplin is then loaded and ready for use.
NOTE: Every time you start a new R session you must load Haplin with the R command library(Haplin). (However, you only need to install it from CRAN once.)
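For scripts that are run repeatedly, the install-once, load-every-session pattern described above can be written so that it is safe to rerun. The following is only a minimal sketch using the base R functions requireNamespace and install.packages; it assumes an internet connection and that a CRAN mirror has been selected or configured.

```r
# Install Haplin from CRAN only if it is not already installed
if (!requireNamespace("Haplin", quietly = TRUE)) {
  install.packages("Haplin")
}

# Load Haplin; this step is needed in every new R session
library(Haplin)
```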
NOTE: To S-Plus users: Previous versions of Haplin also ran under S-Plus, but due to S-Plus's new licensing system I have decided it is not worth the trouble to maintain an S-Plus version. However, it should be easy for you to download R and run Haplin in very much the same way as you would under S-Plus.
Running Haplin
Haplin is run by the single command
haplin("C:/work/data.dat")
(or whatever the path to the data file is). The data file (data.dat) can have any name, but should be a text file in a specific format (see below). This command reads the data, performs the estimation, and prints and plots the results in a single run.
By default, Haplin excludes triads with missing data. To include these triads in the calculations, set the use.missing argument:
haplin("C:/work/data.dat", use.missing = T)
(The letter "T" is short for TRUE in R)
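A common stumbling block with the commands above is a wrong file path. The following is a minimal sketch of a wrapper that checks the path before calling haplin. The name run_haplin_checked is hypothetical and not part of Haplin; only haplin itself and its use.missing argument come from the text above. The usage line spells out TRUE rather than T, since T is an ordinary variable in R that can be overwritten, while TRUE is a reserved word.

```r
# Hypothetical helper (not part of Haplin): verify that the data file
# exists, then hand it to haplin() together with any extra arguments.
run_haplin_checked <- function(path, ...) {
  if (!file.exists(path)) {
    stop("Data file not found: ", path,
         " (current working directory: ", getwd(), ")")
  }
  # Arguments such as use.missing are passed straight through to haplin()
  Haplin::haplin(path, ...)
}

# Usage (requires the Haplin package and an existing data file):
# res <- run_haplin_checked("C:/work/data.dat", use.missing = TRUE)
```

Forward slashes in the path, as in the examples above, work in R on Windows as well.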
For more examples of how to run Haplin, see the haplin help file (in R, type ?haplin).
I have collected a few pieces of advice that may be useful if you encounter problems.
The complete reference list of help files is here
Data format
The data format is a fairly simple ASCII file, described here.
For user convenience, it is also possible to convert files from the standard ped-format to the Haplin format. See here for details.
Haplin now provides support for GWAS data, through the GenABEL data
format. A complete description of how to import GWAS data is found here.
Trial run
To test that Haplin runs properly, you can download the trial data files HAPLIN.trialdata.txt and HAPLIN.trialdata2.txt, and run Haplin with the commands
haplin("HAPLIN.trialdata.txt", use.missing = T, maternal = T)
haplin("HAPLIN.trialdata2.txt", use.missing = T, n.vars = 2, ccvar = 2, design = "cc.triad", reference = "ref.cat", response = "mult")
The results should look something like this: HAPLIN.trialrun.txt, HAPLIN.trialrun2.txt.
In addition, a plot is produced, which should look something like this: HAPLIN.trialrun.jpg, HAPLIN.trialrun2.jpg.
GUI
An easily accessible Graphical User Interface for generating Haplin syntax is now available from haplin.fhi.no, thanks to Nguyen Trung Truc. The syntax generator helps set up Haplin commands, which can be cut and pasted into your own R window. It includes many (but not all) of the features currently available in Haplin.
Model and estimation
The models implemented in Haplin are extensions of the log-linear
models described and developed in the papers
Gjessing HK, Lie RT (2006). Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics, 70: 382-396.
Wilcox AJ, Weinberg CR, Lie RT (1998). Distinguishing the effects of maternal and offspring genes through studies of "case-parent triads". American Journal of Epidemiology, 148(9): 893-901.
Weinberg CR, Wilcox AJ, Lie RT (1998). A log-linear approach to case-parent-triad data: assessing effects of disease genes that act directly or through maternal effects and that may be subject to parental imprinting. American Journal of Human Genetics, 62: 969-78
and follow-ups to these. The basic log-linear model for case-parent triad data allows a user to compute relative risks associated with a variant allele, together with corresponding confidence intervals and p-values. It also allows a similar effect estimation for maternal alleles, i.e. to study the effect of genes of the mother that may influence the development of the fetus. Haplin extends these models to situations with multiple densely spaced SNPs (or other markers), where phase is unknown. Haplin then estimates the relative risks associated with haplotypes, not only single markers. In addition, Haplin uses a parametrization that will detect (at least with sufficient sample size) dominant or recessive deviations from a dose-response model. For some details about parametrization, choice of reference category and interpretation of results, see parametrization.pdf.
The most recent Haplin version also includes the option to run on case-control data, or to combine case-parent triads with control-parent triads.
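To make the quantities mentioned in this section concrete (relative risk, confidence interval, p-value), the following is a minimal sketch of how a log-scale coefficient from a log-linear model translates into an RR with a Wald-type 95% interval. The numbers are invented for illustration and are not Haplin output. The Wald construction is an assumption made here; Haplin's own intervals and tests may be computed differently.

```r
# Illustrative numbers only (not Haplin output): a log-scale effect
# estimate and its standard error for one haplotype.
log_rr <- log(1.5)
se     <- 0.20

z <- qnorm(0.975)  # normal quantile for a two-sided 95% interval

rr      <- exp(log_rr)                       # relative risk
ci      <- exp(log_rr + c(-1, 1) * z * se)   # Wald interval, back-transformed
p_value <- 2 * pnorm(-abs(log_rr / se))      # two-sided Wald test of RR = 1

round(c(RR = rr, lower = ci[1], upper = ci[2], p = p_value), 3)
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the RR and always stays positive.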
Hakon K. Gjessing
Professor/Senior Scientist
Division of Epidemiology
Norwegian Institute of Public Health
P.O.Box 4404 Nydalen
N-0403 Oslo,
NORWAY
Email: hakon.gjessing@fhi.no