Forensic MPS AIMs Panel Reference Population Sets

Forensic MPS AIMs Panel Reference Population Sets to download

• The training sets provided for each forensic MPS ancestry panel have been constructed from the published SNP genotypes of 1000 Genomes (NYGC 30X sequence coverage variant calls), HGDP-CEPH, and the Simons Foundation and Estonian Genome Diversity Projects (SGDP and EGDP).

• The ‘active’ worksheet in position 1 of each ancestry marker panel’s training set file contains a suggested standard reference dataset for use in Snipper, comprising 1000 Genomes (1KG) African YRI, European CEU, South Asian GIH, East Asian CHB genotypes, HGDP-CEPH Oceanian and Native American genotypes, plus Middle East genotypes in the published data from Almarri et al., The genomic history of the Middle East, Cell, 2021 (PubMed ID 34352227). The VISAGE BT panel does not have Middle East genotypes in the reference dataset so it differentiates six population groups. The American genotypes also include 18 1KG Peruvians from Lima (PEL) with no detected admixture. Rows 2-5 of this worksheet are not read by Snipper and contain genomic data for the SNPs in each panel.

• The other worksheets compile the remaining 1000 Genomes populations into unadmixed and admixed sets; HGDP-CEPH populations not used in the reference dataset, plus 31 samples from Almarri populations whose sample sizes are too small to be useful for allele frequency calculations (9 Iraqi Arab; 4 Iraqi Kurd; 3 Jordanian; 3 Omani; 12 Syrian); 130 SGDP samples; and 402 EGDP samples. Note that some panels’ SNP data are missing from the EGDP dataset, notably tri-allelic SNP genotypes removed by PLINK processing. Additional in-house population data are included for the VISAGE Basic Tool ancestry panel.

• The Verogen ForenSeq DNA Signature Kit UAS software reports the allele from the opposite strand to the reference sequence for 18 of the 56 SNPs, marked in blue. Nucleotide inversions have been applied to the original genotype data for these 18 SNPs. Each inverted SNP is also in the Thermo Fisher Precision ID Ancestry panel, where it keeps its unmodified reference strand genotypes. The Precision ID Ancestry panel includes 55 of the 56 SNPs in the Verogen ForenSeq DNA Signature Kit; the exception is rs1919550, which is in complete LD with rs12498138 and is therefore a redundant marker.

• The FORCE AIMs panel combines the SNPs of the VISAGE Basic Tool ancestry panel and the Thermo Fisher Precision ID Ancestry Panel. The two panels overlap in part, and the SNPs unique to the VISAGE BT ancestry panel are marked in red; see Tillmar et al., The FORCE Panel: an all-in-one SNP marker set for confirming investigative genetic genealogy leads and for general forensic applications, Genes, 2021 (PubMed ID 34946917).

• Users can select and transpose sample rows from the other worksheets into the standard reference dataset to act as alternative/additional reference data (final column bears a ‘1’ label), or as test samples (final column bears a ‘0’ label). Cell A1 holds the total number of samples and must be adjusted whenever rows are added or removed. Cell C1 holds the number of populations in the training set. Test samples are marked in column B as belonging to one of the training set populations.

• For your reference, we provide summary tables for each Reference Population Set: the VISAGE Basic Tool ancestry panel table, Verogen ForenSeq DNA Signature Kit 56-AIMs table, FORCE AIMs table, and TFS Precision ID Ancestry Panel table. Click a table header to sort the table by that column, in ascending or descending order.

• You can download the Reference Population Set of your choice: the VISAGE Basic Tool ancestry panel grid; Verogen ForenSeq DNA Signature Kit 56-AIMs grid; FORCE AIMs grid; or TFS Precision ID Ancestry Panel grid. Paste in your profiles as extra rows ending in 0 (final column), and update the total number of individuals in the first (leftmost) cell accordingly. At this stage you will have a hybrid file containing both your profiles and the training set. Then go to the multiple profile classifying tool in Snipper and input the resulting file.
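
The steps for adding test profiles to a downloaded grid can be sketched in Python. This is only an illustration under stated assumptions, not part of Snipper: the grid is held in memory as a list of rows, the function name `append_test_profiles` is hypothetical, column A is assumed to hold a sample identifier, and the toy grid omits the genomic data rows (2-5) for brevity. Only the rules stated above are taken from the text: the first cell holds the total number of individuals, column B holds the population, and test rows end in 0.

```python
def append_test_profiles(grid, profiles, population_label):
    """Return a copy of `grid` with `profiles` appended as test rows.

    grid: list of rows (lists); grid[0][0] is assumed to hold the sample count.
    profiles: list of (sample_id, [genotype, ...]) tuples (hypothetical layout).
    population_label: value written to column B for every test row.
    """
    new_grid = [list(row) for row in grid]  # copy so the input is not modified
    expected_width = len(new_grid[-1])      # width of an existing sample row
    for sample_id, genotypes in profiles:
        # Final column is 0 to mark the row as a test sample.
        row = [sample_id, population_label, *genotypes, 0]
        if len(row) != expected_width:
            raise ValueError(
                f"{sample_id}: expected {expected_width} cells, got {len(row)}"
            )
        new_grid.append(row)
    # Update the total number of individuals in the first (leftmost) cell.
    new_grid[0][0] = int(new_grid[0][0]) + len(profiles)
    return new_grid


# Toy grid: row 1 carries the sample count (A1) and population count (C1);
# the remaining rows are reference samples whose final column is 1.
toy_grid = [
    [2, "", 2],
    ["S1", "AFR", "AA", "CT", 1],
    ["S2", "EUR", "AG", "CC", 1],
]

result = append_test_profiles(toy_grid, [("Q1", ["AG", "CT"])], "EUR")
```

The width check is a design choice of this sketch: a profile typed for a different panel would otherwise produce a misaligned row that is hard to spot in a spreadsheet.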
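
The strand adjustment described for the 18 ForenSeq SNPs can also be illustrated briefly. The sketch below assumes that "nucleotide inversion" means replacing each allele with its complementary base, and that a genotype is written as a short string of allele letters such as "AG"; both are assumptions, and the function names are hypothetical.

```python
# Map each base to its complement on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip_strand(genotype: str) -> str:
    """Complement every allele in a genotype string, e.g. 'AG' becomes 'TC'.

    Characters other than A, C, G, T (such as 'N') are left unchanged.
    """
    return genotype.translate(COMPLEMENT)


def flip_columns(rows, column_indices):
    """Return copies of `rows` with the given genotype columns complemented."""
    flipped = []
    for row in rows:
        new_row = list(row)
        for i in column_indices:
            new_row[i] = flip_strand(new_row[i])
        flipped.append(new_row)
    return flipped


samples = [["S1", "AG", "CC"], ["S2", "TT", "CT"]]
converted = flip_columns(samples, [1])  # complement only the first SNP column
```

Applying `flip_strand` twice returns the original genotype, which gives a quick way to check that a column has not been converted more than once.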