HAPLIN

Software for analyzing case-parent triad (trio) data with SNP haplotypes


NOTE: THIS IS A DEPRECATED VERSION. PLEASE USE THE LATEST VERSION OF HAPLIN

Web page last updated: May 29, 2006
Most recent version: Haplin 2.1, uploaded May 29, 2006

Background

HAPLIN is free software written for the purpose of analyzing case-parent triad (trio) data. Some of the main features of HAPLIN are:
The models estimated by HAPLIN are described in detail in Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
PDF version here.
Also available from Blackwell

What's new in this version of Haplin?

Some features high on the Wish List for Haplin

Authors

HAPLIN is written by Hakon K. Gjessing. The data reading and preparation parts have been extensively upgraded and improved by Hilde-Gunn Bruu. Rolv Terje Lie has contributed numerous useful and insightful suggestions and has inspired the work from its beginning. Please feel free to contact me at hakon.gjessing@fhi.no with questions or bug reports.

Note: Although we have done our best to avoid errors, the software is offered without any warranties. We cannot take responsibility for any problems or damages caused by using it.

Please: If you use Haplin in your analyses, we would much appreciate a reference to Haplin (this web page) or, better, to the Annals of Human Genetics paper above.

Installation

HAPLIN is written for use with S-PLUS or R. However, it is easy to install and requires no previous knowledge of S-PLUS or R. If you do not have an S-PLUS license, R can be downloaded free of charge from The R Project for Statistical Computing. For Windows users, a shortcut to the R installation file is found here. HAPLIN should run without problems on all reasonably new S-PLUS and R versions, for Windows, Linux or UNIX.

To install HAPLIN, download the appropriate file

HAPLIN 2.1 FOR S-PLUS: HAPLIN.q.txt
HAPLIN 2.1 FOR R: HAPLIN.R.txt

and save it in a suitable directory as a plain text file.

Start S-PLUS and type

source("C:/work/HAPLIN.q.txt")

(or use whatever is the correct path). This installs Haplin.
For R, just use the file name "HAPLIN.R.txt" instead.

NOTE: It is recommended that HAPLIN is installed in an empty workspace to avoid cluttering.

NOTE: When the HAPLIN files (which are plain text files) are saved, they usually end up with a .txt extension. If so, the extension must be included in the file name when sourcing, as indicated above. If HAPLIN has been saved to the working directory of S-PLUS, the path is unnecessary. Note that S-PLUS and R prefer "/" in path names.

HAPLIN will then be ready for use. In S-PLUS it will remain in place until removed manually. In R, it will remain in place if the workspace is saved after installation; otherwise, you need to source it again the next time you start a session.
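The installation steps above can be wrapped in a small convenience function. The following is a minimal sketch for R and is not part of HAPLIN: the name load_haplin is hypothetical, and it assumes that sourcing the file defines a function called haplin in the workspace.

```r
# Hypothetical helper (not part of HAPLIN): source the HAPLIN file only
# when no function named haplin is already available in the session.
load_haplin <- function(path) {
  # S-PLUS and R prefer "/" in path names, so convert any backslashes.
  path <- gsub("\\\\", "/", path)
  if (exists("haplin", mode = "function")) {
    return(invisible("already installed"))
  }
  if (!file.exists(path)) {
    stop("HAPLIN file not found: ", path)
  }
  # source() evaluates the file in the user's workspace by default.
  source(path)
  invisible("sourced")
}

# Example call (adjust the path to where the file was saved):
# load_haplin("C:/work/HAPLIN.R.txt")
```

Calling the helper at the start of each R session sources the file only when it is needed, which covers the case where the workspace was not saved after installation.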

Some (unimportant) installation details here

Running HAPLIN

Haplin is run by the single command

haplin("C:/work/data.dat")

(or whatever the path to the data file is). The data file (data.dat) can have any name, but should be a text file in a specific format (see below). This command reads data, performs the estimation and prints and plots the result in one run.

By default, Haplin excludes triads with missing data. To include these triads in the calculations, include the use.missing argument:
haplin("C:/work/data.dat", use.missing = T)
(The letter "T" is short for TRUE in S-PLUS/R)
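One caveat about the shorthand, shown here as a small R sketch: in R, T is an ordinary variable that holds TRUE by default, whereas TRUE itself is a reserved word. If T has been reassigned earlier in a session, use.missing = T no longer passes TRUE, so writing use.missing = TRUE is the safer spelling. The variable names below are illustrative only.

```r
# In R, T is an ordinary variable whose default value is TRUE.
default_ok <- identical(T, TRUE)

# Unlike TRUE, T can be reassigned, which silently changes what an
# argument such as use.missing = T would receive.
T <- 0
after_reassign <- identical(T, TRUE)

# Remove the user-defined copy so that the built-in value is visible again.
rm(T)
restored <- identical(T, TRUE)

print(c(default_ok, after_reassign, restored))
```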

For more examples of how to run HAPLIN, see the haplin help file. (For the time being, only haplin itself has a help file.)

I have collected a few pieces of advice that may be useful if you encounter problems.

Data format

HAPLIN requires data to be in an ASCII file in a specific format. Each line represents one triad. There are three columns for each locus: one for the mother (M), one for the father (F) and one for the child (C). The columns are placed in the following sequence (where the numbers indicate the marker):

M1  F1  C1  M2  F2  C2  ...etc.

There should be no row or column names in the file, and columns are separated by white space.
Important: Make sure the sequence is correct; this is the only way for HAPLIN to figure out which is which.

Within each column, the two alleles for that individual at that locus are separated by a semicolon.

Thus, for 2 loci with 4 and 2 alleles, respectively, the first four lines of data might look like

4;4 4;4 4;4 2;2 1;2 1;2
2;4 2;4 2;4 2;2 2;2 2;2
2;4 NA 2;4 2;2 2;2 2;2
2;4 2;4 2;4 2;2 2;2 2;2

Note the NA, which indicates a missing genotype at the first marker of the father in the third triad.
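Before running an analysis, it can be useful to check that a file follows the layout described above. The R sketch below is not part of HAPLIN: the function name check_triad_file and the fields it returns are hypothetical, and it assumes the default separators (white space between columns, a semicolon between alleles) and NA for missing genotypes.

```r
# Hypothetical checker (not part of HAPLIN) for the default data layout.
check_triad_file <- function(file) {
  # No header; columns separated by white space; "NA" marks a missing genotype.
  dat <- read.table(file, header = FALSE, colClasses = "character",
                    na.strings = "NA")
  if (ncol(dat) %% 3 != 0) {
    stop("Expected three columns (M, F, C) per marker, found ", ncol(dat))
  }
  n_markers <- ncol(dat) / 3
  # Label the columns M1 F1 C1 M2 F2 C2 ... in the required sequence.
  names(dat) <- paste0(c("M", "F", "C"), rep(seq_len(n_markers), each = 3))
  # Every non-missing genotype must be two alleles separated by a semicolon.
  ok <- vapply(dat,
               function(col) all(is.na(col) | grepl("^[^;]+;[^;]+$", col)),
               logical(1))
  if (!all(ok)) {
    stop("Malformed genotypes in column(s): ",
         paste(names(dat)[!ok], collapse = ", "))
  }
  list(n_triads = nrow(dat),
       n_markers = n_markers,
       n_incomplete = sum(!complete.cases(dat)))
}

# Write the four example triads to a temporary file and check them.
example_file <- tempfile(fileext = ".dat")
writeLines(c("4;4 4;4 4;4 2;2 1;2 1;2",
             "2;4 2;4 2;4 2;2 2;2 2;2",
             "2;4 NA 2;4 2;2 2;2 2;2",
             "2;4 2;4 2;4 2;2 2;2 2;2"), example_file)
summary_info <- check_triad_file(example_file)
print(summary_info)
```

The n_incomplete field counts triads with at least one missing genotype, which gives a rough idea of how many triads are affected by the choice of the use.missing argument.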

For user convenience, it is also possible to use different separators between columns and within columns. In addition, HAPLIN includes functions for converting to and from the data format used by the TRANSMIT program (see below). For more details, see haplin format.

Trial run

To test that HAPLIN runs properly, you can download the trial data file HAPLIN.trialdata.txt and run HAPLIN with the command

haplin("HAPLIN.trialdata.txt", use.missing = T, verbose = F)

The result should look something like this: HAPLIN.trialrun.txt.

(This is from R; there may be minor differences with S-PLUS. Note that some unimportant warnings about plotting parameters may sometimes appear in R. Please see a few pieces of advice for more details.)

In addition, a plot is produced, which should look something like this: HAPLIN.trialrun.jpg

Model and estimation

The models implemented in HAPLIN are extensions of the log-linear models described and developed in the papers

Wilcox AJ, Weinberg CR, Lie RT (1998). Distinguishing the effects of maternal and offspring genes through studies of "case-parent triads". American Journal of Epidemiology, 148(9): 893-901.
Weinberg CR, Wilcox AJ, Lie RT (1998). A log-linear approach to case-parent-triad data: assessing effects of disease genes that act directly or through maternal effects and that may be subject to parental imprinting. American Journal of Human Genetics, 62: 969-78.

and follow-ups to these. The basic log-linear model for case-parent triad data allows a user to compute relative risks associated with a variant allele, together with the corresponding confidence intervals and p-values. It also allows similar effect estimation for maternal alleles, i.e. a study of the effect of maternal genes that may influence the development of the fetus.

HAPLIN extends these models to situations with multiple densely spaced SNPs (or other markers) where phase is unknown. HAPLIN then estimates the relative risks associated with haplotypes, not only single markers.

HAPLIN is similar to the TRANSMIT program by David Clayton (MRC Biostatistics Unit, Cambridge). However, HAPLIN returns explicit estimates of relative risks with confidence intervals, and optionally includes effects of maternal genes. In addition, HAPLIN uses a parametrization that will detect (at least with sufficient sample size) dominant or recessive deviations from a dose-response model. For some details about parametrization, choice of reference category and interpretation of results, see parametrization.pdf.
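The statement that phase is unknown for multi-marker data can be made concrete with a short R sketch. It is illustrative only and does not reproduce HAPLIN's estimation method: the function name possible_haplotype_pairs is hypothetical, and the genotype strings follow the "allele;allele" convention of the data format described earlier. For one individual, it lists the distinct haplotype pairs that are consistent with the unphased genotypes.

```r
# Illustrative only; this is not HAPLIN's algorithm.
# genotypes: character vector with one "a;b" string per marker.
possible_haplotype_pairs <- function(genotypes) {
  alleles <- strsplit(genotypes, ";", fixed = TRUE)
  n <- length(alleles)
  first  <- vapply(alleles, `[`, character(1), 1)
  second <- vapply(alleles, `[`, character(1), 2)
  pairs <- character(0)
  # Keep the allele order fixed at the first marker and try every
  # combination of swapped/unswapped order at the remaining markers.
  for (k in 0:(2^(n - 1) - 1)) {
    swap <- c(FALSE, (k %/% 2^(seq_len(n - 1) - 1)) %% 2 == 1)
    h1 <- paste(ifelse(swap, second, first), collapse = "-")
    h2 <- paste(ifelse(swap, first, second), collapse = "-")
    # Sort so that the same pair is always written the same way.
    pairs <- c(pairs, paste(sort(c(h1, h2)), collapse = " / "))
  }
  unique(pairs)
}

# Heterozygous at both markers: two phase resolutions are possible.
double_het <- possible_haplotype_pairs(c("2;4", "1;2"))
# Homozygous at the second marker: the phase is unambiguous.
single_het <- possible_haplotype_pairs(c("2;4", "2;2"))
print(double_het)
print(single_het)
```

The number of candidate pairs doubles with each additional heterozygous marker, which is why haplotype-based analysis of several markers has to account for phase uncertainty.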


Old versions of HAPLIN


Hakon K. Gjessing
Professor, Senior Scientist
Division of Epidemiology
Norwegian Institute of Public Health
P.O.Box 4404 Nydalen
N-0403 Oslo, NORWAY
Email: hakon.gjessing@fhi.no