Population matched (PM) germline allelic variants of immunoglobulin (IG) loci: New pmIG database to better understand IG repertoire and selection processes in disease and vaccination

Abstract: At the population level, immunoglobulin (IG) loci harbor inter-individual allelic variants in the many different germline IG variable (V), diversity (D) and joining (J) genes of the IG heavy (IGH), IG kappa (IGK) and IG lambda (IGL) loci, which together form the genetic basis of the highly diverse antigen-specific B-cell receptors. These inter-individual allelic variants can be shared between human populations or be specific to one of them. The current IG databases IMGT, VBASE2 and IgPdb hold information about germline alleles, most of which are partial sequences obtained from a mixture of human (B-cell) samples; many contain sequence errors and/or acquired (non-germline) IG variations induced by somatic hypermutation (SHM) during antigen-specific B-cell responses. We systematically identified true germline alleles (without SHM) from 26 different human populations around the world, as profiled in the 1000 Genomes data. Our resource is uniquely enriched with complete IG allele sequences and their frequencies across human populations. We identified 409 IGHV, 179 IGKV, and 199 IGLV germline alleles supported by at least seven haplotypes (i.e. a minimum of four individuals), after removal of potential false positives using other genomic databases, i.e. ENSEMBL, TopMed, ExAC and ProjectMine. Remarkably, the positions of the identified variant nucleotides in the different alleles are not random (as would be the case for SHM), but show striking patterns restricted to a limited number of nucleotide positions, the same positions as found in other IG databases, suggesting evolutionary selection processes over time. The identification of these specific patterns provides extra evidence that the identified variant nucleotides are not sequencing errors, but genuine allelic variants. The diversity of germline allelic variants in the IGH and IGL loci is highest in Africans, while the IGK locus is most diverse in Europeans.
We also report on the presence of recombination signal sequences (RSS) in V pseudogenes, explaining their usage in V(D)J rearrangements. We propose that this new set of genuine germline IG sequences can serve as a new population-matched IG (pmIG) database for a better understanding of the B-cell repertoire and B-cell receptor selection processes in disease and vaccination within and between different human populations. The database is available in FASTA format via GitHub (https://github.com/InduKhatri/pmIG).
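
The support threshold quoted in the abstract (at least seven haplotypes, hence at least four individuals) follows from each diploid individual carrying two haplotypes. The arithmetic and the resulting filter can be sketched as below. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the allele labels and the toy counts are all hypothetical.

```python
import math

# Threshold stated in the abstract: an allele must be seen on >= 7 haplotypes.
MIN_HAPLOTYPES = 7


def min_individuals(haplotypes: int) -> int:
    """Fewest diploid individuals that can carry `haplotypes` copies of an allele.

    Each individual contributes at most two haplotypes, so the lower bound
    is the haplotype count divided by two, rounded up.
    """
    return math.ceil(haplotypes / 2)


def supported_alleles(haplotype_counts: dict, min_haplotypes: int = MIN_HAPLOTYPES) -> dict:
    """Keep only alleles observed on at least `min_haplotypes` haplotypes."""
    return {
        allele: count
        for allele, count in haplotype_counts.items()
        if count >= min_haplotypes
    }


# Hypothetical toy counts (not taken from the pmIG database).
toy_counts = {"allele_A": 12, "allele_B": 7, "allele_C": 3}
kept = supported_alleles(toy_counts)

print(min_individuals(MIN_HAPLOTYPES))
print(sorted(kept))
```

With the toy counts above, `allele_C` is dropped because it is seen on only three haplotypes, while `allele_B` sits exactly on the threshold and is kept.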

Indu Khatri, Magdalena A. Berkowska, Erik van den Akker, Cristina Teodosio, Marcel Reinders, and Jacques J.M. van Dongen. Population matched (PM) germline allelic variants of immunoglobulin (IG) loci: New pmIG database to better understand IG repertoire and selection processes in disease and vaccination. bioRxiv, 2020.
@article { bib:2020_pmIG_preprint,
author = { Indu Khatri and Magdalena A. Berkowska and Erik van den Akker and Cristina Teodosio and Marcel Reinders and Jacques J.M. van Dongen },
title = { Population matched (PM) germline allelic variants of immunoglobulin (IG) loci: New pmIG database to better understand IG repertoire and selection processes in disease and vaccination },
year = { 2020 },
}