Research Papers

T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics.

Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding the variable regions of TCRs and BCRs are challenging to decipher due to repetitive elements and undocumented structural variants.

Recent advances in high-resolution mass spectrometry (MS) instruments and liquid chromatography (LC)-MS/MS data analysis software have enabled novel insights into the serum and mucosal antibody repertoire. There is a lack of standardization and benchmarking of antibody repertoire proteomics (Ab-seq) in both experimental and analytical pipelines.

Knowledge of the sequence composition and dynamics of the antibody repertoire remains limited due to the complexity of antibody biology as well as Ab-seq workflow-related technological, experimental, and computational challenges that hinder the development of vaccines, antibody therapeutics, and immunodiagnostics. Newly developed strategies can improve Ab-seq workflows and technology.

Humoral immunity is divided into the cellular B cell and protein-level antibody responses. High-throughput sequencing has advanced our understanding of both these fundamental aspects of B cell immunology as well as aspects pertaining to vaccine and therapeutics biotechnology.

Although the protein-level serum and mucosal antibody repertoires make major contributions to humoral protection, the sequence composition and dynamics of antibody repertoires remain underexplored. This limits insight into important immunological and biotechnological parameters such as the number of antigen-specific antibodies, which are, for example, relevant for pathogen neutralization, microbiota regulation, severity of autoimmunity, and therapeutic efficacy.

High-resolution mass spectrometry (MS) has allowed initial insights into the antibody repertoire. We outline current challenges in MS-based sequence analysis of antibody repertoires and propose strategies for their resolution.

Immunogenomics studies have been largely limited to individuals of European ancestry, restricting the ability to identify variation in human adaptive immune responses across populations. Inclusion of a greater diversity of individuals in immunogenomics studies will substantially enhance our understanding of human immunology.

The process of recombination between variable (V), diversity (D), and joining (J) immunoglobulin (Ig) gene segments determines an individual’s naive Ig repertoire and, consequently, (auto)antigen recognition. VDJ recombination follows probabilistic rules that can be modeled statistically.

So far, it remains unknown whether VDJ recombination rules differ between individuals. If these rules differed, identical (auto)antigen-specific Ig sequences would be generated with individual-specific probabilities, signifying that the available Ig sequence space is individual specific. We devised a sensitivity-tested distance measure that enables inter-individual comparison of VDJ recombination models.
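As a rough illustration of what comparing recombination models between individuals can look like, the sketch below computes the Jensen-Shannon divergence between two V-gene usage distributions. This is one common distance between probability distributions and is an assumption here, not necessarily the sensitivity-tested measure devised in the paper. The function name and the gene usage values are hypothetical.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions.

    p and q are dicts mapping an outcome (e.g. a V gene name) to its probability.
    The result is 0 for identical distributions and 1 for non-overlapping ones.
    """
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence KL(a || m); zero-probability terms contribute 0
        return sum(
            a[k] * math.log2(a[k] / m[k])
            for k in a
            if a[k] > 0.0
        )

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical V-gene usage probabilities for two individuals (illustrative values only)
individual_a = {"IGHV1-69": 0.5, "IGHV3-23": 0.3, "IGHV4-34": 0.2}
individual_b = {"IGHV1-69": 0.2, "IGHV3-23": 0.5, "IGHV4-34": 0.3}

distance = jensen_shannon_divergence(individual_a, individual_b)
print(f"JSD between individuals: {distance:.4f} bits")
```

A full recombination model also covers D and J gene choice, deletions, and insertions, so a realistic comparison would aggregate such a distance over all model parameters.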

We discovered, accounting for several sources of noise as well as allelic variation in Ig sequencing data, that not only unrelated individuals but also human monozygotic twins and even inbred mice possess statistically distinguishable immunoglobulin recombination models. This suggests that, in addition to genetic modulation, there is also nongenetic modulation of VDJ recombination.

We demonstrate that population-wide individualized VDJ recombination can result in orders of magnitude of difference in the probability of generating (auto)antigen-specific Ig sequences.

Our findings have implications for immune receptor–based individualized medicine approaches relevant to vaccination, infection, and autoimmunity.

The interactions between antibodies, SARS-CoV-2 and immune cells contribute to the pathogenesis of COVID-19 and protective immunity. To understand the differences between antibody responses in mild versus severe cases of COVID-19, we analyzed the B cell responses in patients 1.5 months post SARS-CoV-2 infection. Severe, and not mild, infection correlated with high titers of IgG against Spike receptor binding domain (RBD) that were capable of ACE2:RBD inhibition.

B cell receptor (BCR) sequencing revealed that VH3-53 was enriched during severe infection. Of the 22 antibodies cloned from two severe donors, six exhibited potent neutralization against authentic SARS-CoV-2, and inhibited syncytia formation.

Using peptide libraries, competition ELISA and mutagenesis of RBD, we mapped the epitopes of the neutralizing antibodies (nAbs) to three different sites on the Spike. Finally, we used combinations of nAbs targeting different immune-sites to efficiently block SARS-CoV-2 infection. Analysis of 49 healthy BCR repertoires revealed that the nAbs' germline VHJH precursors comprise up to 2.7% of all VHJHs.

We demonstrate that severe COVID-19 is associated with unique BCR signatures and multi-clonal neutralizing responses that are relatively frequent in the population. Moreover, our data support the use of combination antibody therapy to prevent and treat COVID-19.

Albumin has a serum half-life of 3 weeks in humans. This feature can be used to improve the pharmacokinetics of shorter-lived biologics. For instance, an albumin-binding domain (ABD) can be used to recruit albumin.

A prerequisite for such design is that the ABD-albumin interaction does not interfere with pH-dependent binding of albumin to the human neonatal Fc receptor (FcRn), as FcRn acts as the principal regulator of the half-life of albumin. Thus, there is a need to know how ABDs act in the context of fusion partners and human FcRn.

Here, we studied the binding and transport properties of human immunoglobulin A1 (IgA1), fused to a Streptococcus protein G-derived engineered ABD, in in vitro and in vivo systems harboring human FcRn. IgA has great potential as a therapeutic protein, but its short half-life is a major drawback.

We demonstrate that ABD-fused IgA1 binds human FcRn pH-dependently and is rescued from cellular degradation in a receptor-specific manner in the presence of albumin. This occurs when ABD is fused to either the light or the heavy chain. In human FcRn transgenic mice, IgA1-ABD in complex with human albumin gave a 4-6-fold extended half-life compared to unmodified IgA1, with the light chain fusion showing the longest half-life.

When the heavy chain-fused protein was pre-incubated with an engineered human albumin with improved FcRn binding, cellular rescue and half-life were further enhanced. Our study reveals how an ABD, which does not interfere with albumin binding to human FcRn, may be used to extend the half-life of IgA.

Antibody-antigen binding relies on the specific interaction of amino acids at the paratope-epitope interface. The predictability of antibody-antigen binding is a prerequisite for de novo antibody and (neo-)epitope design. A fundamental premise for the predictability of antibody-antigen binding is the existence of paratope-epitope interaction motifs that are universally shared among antibody-antigen structures.

In a dataset of non-redundant antibody-antigen structures, we identify structural interaction motifs, which together compose a commonly shared structure-based vocabulary of paratope-epitope interactions. We show that this vocabulary enables the machine learnability of antibody-antigen binding on the paratope-epitope level using generative machine learning.

The vocabulary (1) is compact, fewer than 10^4 motifs; (2) is distinct from non-immune protein-protein interactions; and (3) mediates specific oligo- and polyreactive interactions between paratope-epitope pairs. Our work leverages combined structure- and sequence-based learning to demonstrate that machine-learning-driven predictive paratope and epitope engineering is feasible.

Celiac disease (CeD) is a common autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. The disease has high heritability. HLA is the major susceptibility factor, and the HLA effect is mediated via presentation of deamidated gluten peptides by disease-associated HLA-DQ variants to CD4+ T cells.

In addition to gluten-specific CD4+ T cells, the patients have antibodies to transglutaminase 2 (autoantigen) and deamidated gluten peptides. These disease-specific antibodies recognize defined epitopes, and they display common usage of specific heavy and light chains across patients. Interactions between T cells and B cells are likely central in the pathogenesis, but how the repertoires of naïve T and B cells relate to the pathogenic effector cells is unexplored.

To this end, we applied machine learning classification models to naïve B cell receptor (BCR) repertoires from CeD patients and healthy controls. Strikingly, we obtained a promising classification performance with an F1 score of 85%. Clusters of heavy and light chain sequences were inferred and used as features for the model, and signatures associated with the disease were then characterized.
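For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the arithmetic with hypothetical confusion-matrix counts chosen only to illustrate an F1 of about 85%. They are not the counts from the study, and the function name is an assumption.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score, the harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 17 patients correctly flagged, 3 controls wrongly flagged,
# 3 patients missed. Precision and recall are both 17/20.
example_f1 = f1_score(true_positives=17, false_positives=3, false_negatives=3)
print(f"F1 = {example_f1:.2f}")
```

Because F1 ignores true negatives, it is usually reported alongside other metrics when classes are imbalanced.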

These signatures included amino acid (AA) 3-mers with distinct bio-physicochemical characteristics and enriched V and J genes. We found that CeD-associated clusters can be identified and that common motifs can be characterized from naïve BCR repertoires. The results may indicate a genetic influence by BCR-encoding genes in CeD.
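To make the notion of amino acid 3-mers concrete, here is a minimal sketch of counting overlapping k-mers across a set of sequences. The example sequences and the function name are hypothetical, and this is not the feature extraction pipeline used in the study.

```python
from collections import Counter


def count_kmers(sequences, k=3):
    """Count all overlapping amino acid k-mers across a list of sequences."""
    counts = Counter()
    for sequence in sequences:
        # A sequence of length n yields n - k + 1 overlapping k-mers (none if n < k)
        for start in range(len(sequence) - k + 1):
            counts[sequence[start:start + k]] += 1
    return counts


# Hypothetical short amino acid sequences, for illustration only
example_sequences = ["CARDYW", "CARGYW"]
kmer_counts = count_kmers(example_sequences, k=3)
print(kmer_counts.most_common(3))
```

Counts like these can serve as input features for a classifier, or be compared between groups to find enriched motifs.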

Analysis of naïve BCRs as presented here may become an important part of assessing the risk of individuals to develop CeD. Our model demonstrates the potential of using BCR repertoires and, in particular, naïve BCR repertoires as disease susceptibility markers.

The adaptive immune system stores invaluable information about current and past immune responses and may serve as an ultrasensitive biosensor. Given the immune system’s critical role in a wide variety of disease types, this has broad implications for biomedicine.

Machine and deep learning are being leveraged to decipher how information is encoded in adaptive immune receptor repertoires to enable prediction from adaptive immune responses and fast-track vaccine, therapeutics, and diagnostics development. Recent advances include predicting the presence of immunity post-vaccination or infection, predicting the presence of disease, and designing antibody-based therapeutics.

Much is still not understood about the human adaptive immune response to SARS-CoV-2, the causative agent of COVID-19. In this paper, we demonstrate the use of machine learning to classify SARS-CoV-2 epitope-specific T-cell clonotypes in T-cell receptor (TCR) sequencing data.

We apply these models to public TCR data and show how they can be used to study T-cell longitudinal profiles in COVID-19 patients to characterize how the adaptive immune system reacts to the SARS-CoV-2 virus. Our findings confirm prior knowledge that SARS-CoV-2 reactive T-cell diversity increases over the course of disease progression.

Monitoring the T cell receptor (TCR) repertoire in health and disease can provide key insights into adaptive immune responses, but the accuracy of current TCR sequencing (TCRseq) methods is unclear. In this study, we systematically compared the results of nine commercial and academic TCRseq methods, including six rapid amplification of complementary DNA ends (RACE)-polymerase chain reaction (PCR) and three multiplex-PCR approaches, when applied to the same T cell sample.

Colonization by the microbiota causes a marked stimulation of B cells and induction of immunoglobulin, but mammals colonized with many taxa have highly complex and individualized immunoglobulin repertoires.

Here we use a simplified model of defined transient exposures to different microbial taxa in germ-free mice to deconstruct how the microbiota shapes the B cell pool and its functional responsiveness. We followed the development of the immunoglobulin repertoire in B cell populations, as well as single cells by deep sequencing.

During a pandemic, data combined with the right context and meaning can be transformed into knowledge for informing public health responses. Timely and accurate collection, reporting and sharing of data with the research community, public health practitioners, clinicians and policy makers will inform assessment of the likely impact of a pandemic to implement efficient and effective response strategies.

Polymorphisms in human immunoglobulin heavy chain variable genes and their upstream regions

Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, the knowledge of the genomic variation of the immunoglobulin loci is scarce. Here, we report 25 potential novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals.

VDJbase: an adaptive immune receptor genotype and haplotype database

VDJbase is a publicly available database that offers easy searching of data describing the complete sets of gene sequences (genotypes and haplotypes) inferred from adaptive immune receptor repertoire sequencing datasets. VDJbase is designed to act as a resource that will allow the scientific community to explore the genetic variability of the immunoglobulin (Ig) and T cell receptor (TR) gene loci.

RAbHIT is an R Haplotype Antibody Inference Tool that implements a novel algorithm to infer V(D)J haplotypes by adapting a Bayesian framework. RAbHIT offers inference of haplotype and gene deletions. It may be applied to sequences from naïve and non-naïve B-cells, sequenced by different library preparation protocols.

immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking

immuneSIM enables in silico generation of single and paired chain human and mouse B- and T-cell repertoires with user-defined tunable properties to provide the user with experimental-like (or aberrant) data to benchmark their repertoire analysis methods.

High frequency of shared clonotypes in human B cell receptor repertoires

The human genome contains approximately 20 thousand protein-coding genes (1), but the size of the collection of antigen receptors of the adaptive immune system that is generated by the recombination of gene segments with non-templated junctional additions (on B cells) is unknown, although it is certainly orders of magnitude larger.

Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data

High-throughput sequencing (HTS) of immunoglobulin (B-cell receptor, antibody) and T-cell receptor repertoires has increased dramatically since the technique was introduced in 2009 (13). This experimental approach explores the maturation of the adaptive immune system and its response to antigens, pathogens, and disease conditions in exquisite detail.

Adaptive Immune Receptor Repertoire Community recommendations for sharing immune-repertoire sequencing data

Antigen specificity is a cardinal feature of adaptive immunity that underlies immune homeostasis and control of pathogenic attack in higher vertebrates.

Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping

Analysis of antibody repertoires by high-throughput sequencing is of major importance in understanding adaptive immune responses. Our knowledge of variations in the genomic loci encoding immunoglobulin genes is incomplete, resulting in conflicting VDJ gene assignments and biased genotype and haplotype inference.