2. Input Description
Important Note: This page will be updated to include the download commands once the data is publicly available
2.1. Genotyping data
-
The GRLS genotyping data is available in binary PLINK file format for each of two Axiom array sets A and B. The PLINK documentation has more information about this file format. In this tutorial, the data was downloaded to the working directory where:
- The files of array A have the prefix
GRLS_Genotyping/setA/AxiomGT1.bin
- The files of array B have the prefix
GRLS_Genotyping/setB/AxiomGT1.bin
Note: These PLINK files have the gender as predicted by the Axiom genotyping analysis tools which is not always consistent with the gender in the dog records. Check the genotyping analysis notes for details.
- The files of array A have the prefix
-
There is A text file that maps between replicate IDs in the genotyping files, and their corresponding sample IDs in phenotype data files. The file includes gender information of the dogs as reported by their owners:
GRLS_Genotyping/map_id_sex.tab
.Here is a sample of the file:
Family_ID replicate_id public_id Sex GRLS grlsT459F8JJ_1 grlsT459F8JJ 1 GRLS grls9ZOSMBXX_1 grls9ZOSMBXX 2 GRLS grls9ZOSMBXX_2 grls9ZOSMBXX 2 GRLS grls9ZOSMBXX_3 grls9ZOSMBXX 2 PILOT grls1US5SPBB_1 grls1US5SPBB 1 Oldies grlsa5561a48_1 grlsa5561a48 2 Notes: 1. There are 3 possible variants of "Family_ID": PILOT, a small subset of the longitudinal GRLS population that were used as a proof of concept study. GRLS, the remainder of the GRLS cohort. And Oldies, a cohort of 200 golden retrievers from the Golden Oldies subproject. See the GRLS genotyping data description for details. 2. One sample (e.g. grls9ZOSMBXX) may have multiple duplicates in the genotyping files (e.g. grls9ZOSMBXX_1, grls9ZOSMBXX_2, grls9ZOSMBXX_3). 3. Sex is encoded where 1 is male and 2 is female.
2.2. Phenotype data
The phenotypic data of the Golden Retriever Lifetime Study is organized in modular data tables, covering 11 key subject areas: activity, behavior, dental, disease diagnoses, diet, environment, grooming, geographical locations, medications, physical exams and reproduction. You can learn more about the detailed description of each data table here
In this tutorial, the Conditions - Skin dataset will be used as an example. The phenotype table was downloaded locally as phenotypes/conditions_skin.csv
.