How does iReceptor Plus address privacy issues?


Praise for iReceptor Plus, and for its ability to help researchers around the world share and analyze huge, distributed immunological datasets of sequencing data from healthy and sick individuals in multiple countries, was recently published in ERCIM News of the European Research Consortium for Informatics and Mathematics.

The consortium dedicated to privacy-preserving computing aims to foster collaborative work within the European research community and increase cooperation with European industry.

The article, entitled “Handling Privacy Preservation in a Software Ecosystem for the Querying and Processing of Deep Sequencing Data”, was published by Prof. Artur Rocha and colleagues at INESC TEC, an internationally oriented, private, non-profit multidisciplinary associate laboratory that is a partner of iReceptor Plus.

The authors noted that “most of the Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) data is currently stored and curated by individual labs, using a variety of tools and technologies. iReceptor Plus aims to lower the barrier to accessing and analyzing large AIRR-seq datasets, which will make this important data more available to academia, industry and clinical partners.”

The project “will stimulate the public sharing of AIRR-seq data, while providing a mechanism for users to protect private data when required. To this end, we are developing a layered security framework across a distributed (federated) software ecosystem.”

The international iReceptor Plus consortium aims to promote human immunological data storage, integration and controlled sharing for a wide range of clinical and scientific purposes.

Rocha and colleagues noted that “AIRR sequencing technology has made it possible to sample the immune repertoire in exquisite detail but also poses substantial challenges, such as the preservation of the privacy of data subjects.

“The issue of privacy is a topic of continuous discussion within the health informatics community, especially when it comes to genetic datasets, which are subject to constraints of confidentiality, security, rights and ownership. While analyses performed on these datasets may provide crucial research evidence, both data access and their processing must be conducted in a way that does not compromise privacy.”

Overview of the Security Framework and interaction among its main components

The layered security framework gives iReceptor Plus a working authentication and authorization infrastructure with the following features: federated authentication for data consumers, compatible with multiple third-party identity providers (and identity brokers); ADC repository endpoints secured according to the permissions set by data stewards; and a dashboard where data stewards manage data consumers’ permissions for each endpoint and resource they own.

The main standard for managing authorization, they continued, “is user-managed access (UMA 2.0). UMA is an OAuth-based access management protocol for managing authorization to resources. It grants data stewards the ability to manage permissions and accessibility to their resources, and control who can access their resources… The basic workflow follows an exchange of permission tickets between the security framework and the requesting user. The process is used to identify the user, determine which dataset the user is trying to access, and finally to resolve which sets of data should be returned to the user. The UMA 2.0 authorization standard was designed specifically for protected data. However, in iReceptor Plus, protected data may live side by side with public data in the same repositories.”
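The permission-ticket exchange described in the quote can be illustrated with a small in-memory sketch. It is not the iReceptor Plus implementation: the class names, method names and the "study-1" dataset are hypothetical, the user is identified by a plain string rather than by identity claims, and no network calls or real tokens are involved. It only shows the order of the steps: a request without a token yields a ticket, the ticket is exchanged for a requesting party token (RPT) if the steward's policy allows it, and the request is then retried with that token.

```python
import secrets


class AuthorizationServer:
    """Toy stand-in for a UMA 2.0 authorization server (hypothetical, in-memory)."""

    def __init__(self):
        self.policies = {}  # resource_id -> {user: set of scopes}
        self.tickets = {}   # permission ticket -> (resource_id, scopes)
        self.rpts = {}      # requesting party token -> (resource_id, scopes)

    def grant(self, resource_id, user, scopes):
        """Record a data steward's decision to give `user` some scopes."""
        self.policies.setdefault(resource_id, {}).setdefault(user, set()).update(scopes)

    def create_ticket(self, resource_id, scopes):
        """Issue a permission ticket describing what was requested."""
        ticket = secrets.token_urlsafe(16)
        self.tickets[ticket] = (resource_id, set(scopes))
        return ticket

    def exchange_ticket(self, ticket, user):
        """Trade a ticket for an RPT if the policy allows it, else return None."""
        pending = self.tickets.pop(ticket, None)  # tickets are single use
        if pending is None:
            return None
        resource_id, scopes = pending
        allowed = self.policies.get(resource_id, {}).get(user, set())
        if not scopes <= allowed:
            return None
        rpt = secrets.token_urlsafe(16)
        self.rpts[rpt] = (resource_id, scopes)
        return rpt

    def introspect(self, rpt):
        """Return the (resource_id, scopes) an RPT covers, or None."""
        return self.rpts.get(rpt)


class Repository:
    """Toy resource server that protects datasets with the server above."""

    def __init__(self, auth_server, datasets):
        self.auth = auth_server
        self.datasets = datasets

    def get(self, resource_id, scope, rpt=None):
        grant = self.auth.introspect(rpt) if rpt else None
        if grant and grant[0] == resource_id and scope in grant[1]:
            return {"status": 200, "data": self.datasets[resource_id]}
        # No valid token: answer with a permission ticket instead of data.
        ticket = self.auth.create_ticket(resource_id, [scope])
        return {"status": 401, "ticket": ticket}


auth = AuthorizationServer()
repo = Repository(auth, {"study-1": ["seq-a", "seq-b"]})
auth.grant("study-1", "alice", ["read"])              # steward's decision

first = repo.get("study-1", "read")                   # no token yet: ticket returned
rpt = auth.exchange_ticket(first["ticket"], "alice")  # ticket traded for an RPT
second = repo.get("study-1", "read", rpt)             # retry with the RPT
```

In this sketch a user without a matching policy entry gets no RPT from the exchange, so the retry never reaches the data.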

The security dashboard lets data stewards control access at different levels of granularity through an interface modelled after the ADC data standards, giving them fine-grained control over what the security framework exposes.
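One way to picture this kind of fine-grained control is a mapping from data fields to the scope a consumer needs in order to see them. The sketch below is only an illustration under that assumption: the field names, the scope names and the `filter_record` helper are hypothetical and are not taken from the dashboard or from the ADC schema.

```python
# Hypothetical steward configuration: field name -> scope required to see it.
FIELD_SCOPES = {
    "repertoire_id": "public",
    "study_title": "public",
    "sex": "demographics",
    "age": "demographics",
    "diagnosis": "clinical",
}


def filter_record(record, granted_scopes):
    """Return only the fields whose required scope has been granted.

    Fields with no configured scope are withheld, in the spirit of
    data minimization.
    """
    allowed = set(granted_scopes) | {"public"}
    return {
        field: value
        for field, value in record.items()
        if FIELD_SCOPES.get(field) in allowed
    }


record = {
    "repertoire_id": "r-001",
    "study_title": "Example study",
    "sex": "female",
    "age": 42,
    "diagnosis": "example diagnosis",
    "internal_notes": "not mapped to any scope",
}

anonymous_view = filter_record(record, [])
demographics_view = filter_record(record, ["demographics"])
```

Here an anonymous consumer would see only the two public fields, while a consumer granted the hypothetical "demographics" scope would also see sex and age.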

They concluded that “the layered security framework builds on the privacy by design and data minimization principles to attain privacy preservation in a federated software ecosystem for the querying and processing of AIRR-seq data. If data has been previously made public, it can be accessed via standard APIs without triggering the default UMA workflow. Should access restrictions apply, data stewards can use the security framework to configure adequate permission levels according to the sensitivity of the data to be shared.”

Aggregated data “can be set to an intermediary level of permissions. A registered user could then access these features in an exploratory data analysis stage, before deciding to activate the necessary legal instruments for the sharing of potential sensitive data.”
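The tiers described in the conclusion (public data served without the UMA workflow, aggregated data at an intermediary level, and row-level data only with explicit permission) could be sketched as follows. This is a simplified illustration rather than the project's code: the `Access` levels, the `query` function and the dataset layout are assumptions, and the aggregate shown is a plain count of rows per `v_call` value.

```python
from collections import Counter
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels for a data consumer."""
    ANONYMOUS = 0
    REGISTERED = 1   # may see aggregated data only
    AUTHORIZED = 2   # steward has granted access to the rows themselves


def query(dataset, access=Access.ANONYMOUS):
    """Decide what to return for a dataset given the caller's access level."""
    if dataset["public"]:
        # Previously published data: served without any authorization step.
        return {"kind": "rows", "data": dataset["rows"]}
    if access >= Access.AUTHORIZED:
        return {"kind": "rows", "data": dataset["rows"]}
    if access >= Access.REGISTERED:
        # Intermediary level: expose counts, not individual rows.
        counts = Counter(row["v_call"] for row in dataset["rows"])
        return {"kind": "summary", "data": dict(counts)}
    return {"kind": "denied", "data": None}


restricted = {
    "public": False,
    "rows": [
        {"v_call": "IGHV1-2", "subject": "s1"},
        {"v_call": "IGHV1-2", "subject": "s2"},
        {"v_call": "IGHV3-23", "subject": "s1"},
    ],
}

summary = query(restricted, Access.REGISTERED)
full = query(restricted, Access.AUTHORIZED)
```

A registered user exploring the restricted dataset would receive only the per-gene counts, and could then decide whether the row-level data justifies a formal access request.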