Technology unit
Data Science, Data Management and Digital Solutions

Bringing the continuum of digital technologies and data science to help you achieve your project’s advanced data analytics objectives.

The huge amounts of data generated by high-throughput experiments, NGS, multi-Omics analysis, and imaging have moved the life sciences into the era of big data. They have also contributed to the emergence of data science, which makes intensive use of machine learning (ML) and artificial intelligence (AI), to deal with the complexity of the organization of biological systems.

New machine learning and artificial neural network (ANN) architectures are developed every day to address unmet needs and tackle complex problems. Our highly trained data scientists can recommend the most suitable architecture for a specific problem, choosing between state-of-the-art existing models and algorithms designed de novo.

The principal challenge in successfully harnessing these advances is to integrate and transversally analyze all these data, in order to translate them into actionable knowledge that interdisciplinary teams can refine into biological outcomes.

The analysis and leveraging of data obtained using this new systematic and integrative approach requires a proper flow and management of input information. For this reason, at BIOASTER, data are managed following the FAIR principles, using the architecture best suited to stakeholders’ needs.

We provide a full-stack set of breakthrough digital core technologies to make your data a fully integrated asset. This includes custom-designed in silico models and solutions for hypothesis generation, testing, and validation, as well as for decision-making. To this end, we blend the expertise of dedicated data scientists, software engineers, and data managers and ensure that their efforts are coordinated consistently along the whole data-value chain, with the sole objective of making your project a success.

  • Data science: advanced data analytics and integration using the latest data science and AI/ML technologies
  • Data management: applying data management fundamentals as key to the success of a project
  • Digital solutions: digitalizing R&D solutions to augment the user experience
  • Cloud computing & conformity: robust hybrid cloud infrastructure to fulfil bio-IT challenges in a constrained environment

Data science

ADVANCED DATA ANALYTICS AND INTEGRATION USING THE LATEST DATA SCIENCE AND AI/ML TECHNOLOGIES

Analyzing biological data is challenging because of the fast-growing volume of heterogeneous and complex data, the need for highly integrated analysis, and the diversity of biological questions to be addressed across a range of animal species and microorganisms.

We have shaped our data science core to respond to the wide range of outcomes that might be expected in a multi-dimensional project.
The following list describes most of our capabilities, in order of complexity:

  • we build in silico models to assist with decision-making,
  • we build both predictive and explicative models to decipher molecular signatures (identify biomarkers) and mechanisms of action (MoA), and to provide complete functional analyses, including differential analysis, pathway enrichment, and biological network inference (for example, using BTM and WGCNA analyses),
  • we extend the biological relevance using existing reference databases or those relating to other species,
  • we provide a high level of integration of multi-Omics, cytometric, and unstructured readouts (e.g. imaging) with the results of (pre)clinical studies and other real-world data (RWD),
  • we envision multi-scale analyses and event correlations by incorporating spatial (e.g. local vs. systemic) or temporal (e.g. longitudinal studies) dimensions.
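
As an illustration of the pathway enrichment mentioned in the list above, the following is a minimal sketch of an over-representation test based on the hypergeometric distribution. It is not BIOASTER's pipeline: the function names, the gene identifiers, and the toy gene sets are all hypothetical, and a real analysis would also correct for multiple testing across pathways.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, hits_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of pathway genes found among `hits_size` genes drawn
    without replacement from a universe of `universe_size` genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, hits_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def pathway_enrichment(universe, pathway, hits):
    """Return the shared genes and the enrichment p-value for one pathway."""
    universe = set(universe)
    pathway = set(pathway) & universe  # ignore genes outside the universe
    hits = set(hits) & universe
    shared = pathway & hits
    p_value = enrichment_p_value(
        len(universe), len(pathway), len(hits), len(shared)
    )
    return sorted(shared), p_value


# Toy data: hypothetical gene identifiers, for illustration only.
universe = [f"G{i}" for i in range(20)]
pathway = ["G0", "G1", "G2", "G3", "G4"]
hits = ["G0", "G1", "G2", "G10"]  # e.g. differentially expressed genes

shared, p_value = pathway_enrichment(universe, pathway, hits)
print(shared, round(p_value, 4))
```

A small p-value suggests that the pathway is over-represented among the selected genes compared with what random sampling from the universe would give.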

We generate most of these biological outcomes through the intensive use of a combination of advanced statistical, mathematical, and machine learning (ML) approaches. Our team of data scientists has demonstrated its expertise in the benchmarking and application of such approaches in both unsupervised and supervised integrative analysis.

 

To better integrate unstructured readouts, such as images, into intensive analyses, or to address new decision-making needs, such as faster or better human-mimicking, we are extending our capabilities with recent deep-learning approaches and neural network algorithms; for example, convolutional neural networks (CNNs) applied to computer vision using the TensorFlow framework.

  • Application of AI/ML: Holistic analytical approach
  • Our integrated and practical approach consists of connecting you, from the very beginning, with experts in both laboratory work and data analytics, who will help you refine your biological questions in order to identify the best technological approaches and avoid misunderstandings. The same bioinformaticians, data scientists, and experts in the appropriate fields of biology will add value through an iterative exchange of ideas to aid the interpretation of the findings.
  • In addition to having a toolbox of ready-to-use and pipelined statistical, ML, and AI approaches, we can identify, benchmark, select, assemble, or design innovative algorithms that are the most appropriate for your data type or biological question.

DATA MANAGEMENT

APPLYING DATA MANAGEMENT FUNDAMENTALS AS THE KEY TO THE SUCCESS OF A PROJECT

Efficient, robust, and accurate data analysis cannot be achieved without an appropriate level of data and metadata management, yet this is often not adequately considered in the life sciences.

Data tracking, security, and integrity throughout the data life cycle are supported by digital solutions. The portfolio of solutions that we can use in your project comprises:

  • generic platforms for data warehousing and data sharing in a big data context,
  • custom solutions for the specific characteristics of the harvested data; e.g., LIMS, eCRF, LabKey (centralized multi-study projects), and tranSMART (clinical hypothesis testing).

We guarantee you a high level of Data Governance and the application of the required standards through:

  • the collaboration of our data steward, an internal multi-business executive committee, and data managers,
  • the ontology-driven organization and description of data/metadata,
  • the application of FAIR principles (Findable, Accessible, Interoperable, Reusable).
  • We favor data integration and knowledge generation by UID, data/metadata centralization, and platform interoperability.
  • Our data management frameworks are consistent with regulatory obligations (e.g. GDPR, clinical studies) and may be supplemented with a Data Management Plan (DMP).
  • The data generated is ready for further valorization, such as through IP rights, publications, transfer, and re-use, as well as long-term preservation.
  • We are used to dealing with multiple partners and complex projects, such as
    > H2020, IMI, Horizon Europe
    > Delegated data management and centralization of the data/computational needs for multi-million-euro projects (e.g. COVID-AuRA Translate)
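
To make the ontology-driven description, UID-based integration, and FAIR principles listed above more concrete, here is a minimal sketch of a dataset metadata record with a simple completeness check. The `DatasetRecord` class, its field names, the `fair_gaps` helper, and the example values are hypothetical and only illustrate the idea; they do not describe an actual BIOASTER schema.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for one dataset."""
    title: str
    access_url: str = ""  # Accessible: where the data can be retrieved
    ontology_terms: list = field(default_factory=list)  # Interoperable
    license: str = ""  # Reusable: terms of reuse
    provenance: str = ""  # Reusable: how the data were produced
    # Findable: a unique identifier generated when the record is created
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


def fair_gaps(record):
    """List which FAIR aspects still lack metadata in this record."""
    gaps = []
    if not (record.uid and record.title):
        gaps.append("findable: missing identifier or title")
    if not record.access_url:
        gaps.append("accessible: missing access location")
    if not record.ontology_terms:
        gaps.append("interoperable: missing ontology terms")
    if not (record.license and record.provenance):
        gaps.append("reusable: missing license or provenance")
    return gaps


# Example values are placeholders, not real datasets or endpoints.
complete = DatasetRecord(
    title="Example transcriptomics study",
    access_url="https://example.org/datasets/example-study",
    ontology_terms=["example:term-1"],
    license="CC-BY-4.0",
    provenance="Generated by an example sequencing workflow",
)
draft = DatasetRecord(title="Draft dataset")

print(fair_gaps(complete))
print(fair_gaps(draft))
```

A check of this kind could run when data are registered, so that incomplete records are flagged before they reach a shared platform.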

SCIENTIFIC DIGITAL SOLUTIONS

DIGITALIZING R&D SOLUTIONS TO AUGMENT THE USER EXPERIENCE

The results generated by bioinformatic and data science analytical pipelines should be further refined and translated into final outcomes by biological experts.
Digital systems, by promoting the interoperability of functional components and integration, provide a stimulating user experience (UX) by facilitating biological interpretation through dynamic and advanced visualization tools.

Such platforms reduce data complexity, facilitate the presentation of complex concepts, offer magnified or user-context-specific knowledge representations, and provide business-focused interfaces. This allows an end user (a scientist, clinician, or patient) who is not familiar with bioinformatics, statistics, or digital matters to interact with the data and fully understand them.
We can design a broad range of applications, according to your needs:

  • decision-making/support platforms; for example, in a clinical context, using companion software,
  • hypothesis generation, testing, and/or validation; for example, in a clinical context, using tranSMART or similar tools,
  • “DataMining4DataMeaning”: mined data is used to construct and assess predictive models,
  • knowledge aggregation, supported by complex querying tools,
  • registration and integration of (meta)data to facilitate advanced data management; for example, using LabKey.

We suggest the most appropriate technological approach for your project, through:

  • customization of off-the-shelf digital solutions,
  • use of web applications, such as R Shiny apps and notebooks,
  • design, development, and deployment (DevOps, Continuous Integration, etc.) of specific applications by instantiating methods and tools at the core of our Software Engineering Platform.
  • We provide a 100% user-driven and case-oriented custom solution.
    We apply proven methods to maximize your satisfaction: together, we analyze your requirements, prepare a mock-up, provide a user manual and e-training, and design and implement a responsive platform, right up until the final go-live and user acceptance.
  • We manage all the steps of product development, from prototyping through incremental pilots to product deployment.
  • Broad range of direct and indirect applications:
    > customized,
    > point-of-care; e.g., for diagnostics,
    > device miniaturization,
    > human-mimicking.
  • Wide range of direct and indirect gains:
    > lower costs,
    > shorter time-to-results,
    > high/medium-throughput and scale-up.

Cloud computing & Conformity

ROBUST HYBRID CLOUD INFRASTRUCTURE TO FULFIL BIO-IT CHALLENGES IN A CONSTRAINED ENVIRONMENT

Big data, AI/ML, digital, and systems biology-related challenges, in the form of massive data storage, transfer, and computation, have to be addressed through the use of the latest, cutting-edge information technologies, with skilled operators, to cope with the fast-changing regulatory environment. Therefore, we complement our internal expertise and IT resources with fruitful external partnerships.

To achieve the targeted level of performance, diversity of technologies, and high availability, we complement BIOASTER’s own IT infrastructure with resources from the CC-IN2P3 mesocenter (CNRS, Lyon). This provides us with IaaS, PaaS, and SaaS services, high-throughput computing, petabyte-level storage capacity, and large bandwidth networks. We also supplement our Hybrid Cloud, which has been designed from the outset to be scalable and flexible, with resources and services from commercial GAFAM-like cloud providers.

The conformity of our information systems with security standards is a daily concern, in order to guarantee data integrity and data privacy. We use standard IT monitoring and supervision practices, embedded in certified ISO 9001 and internal ISS Policy frameworks, and assessed by self- or partner-mandated audits.
These data privacy and cybersecurity concerns are of great importance with regard to compliance with GDPR requirements. We have demonstrated such compliance in clinical study-related privacy impact assessments (PIA), performed as controller or joint controller.

  • We offer 100% virtualized, scalable, and multi-sourced resources permitting capacity-planning adjusted to your project needs and schedule.
  • Big data and AI-ready infrastructure
    > For your most demanding third-generation sequencing, metagenomics, integrated multi-Omics, and deep learning projects.
  • Transfer of ready-to-use technologies
    > Allows real-time deployment in partner ecosystems (e.g. full reinstalls based on Docker-like container technology) once the project is complete.
Dr. Lucia PAGANI
Manager of Data Science & Data Management unit