Healthcare and life science organizations are increasingly working with large-scale, multimodal datasets that include structured records, clinical notes, diagnostic images, and PDF documents.

Sharing this data for research and AI development requires rigorous de-identification that protects patient privacy without compromising the ability to extract insights across time and modalities.

In this webinar, experts from John Snow Labs and Databricks will demonstrate an end-to-end solution for automating the de-identification and tokenization of medical data with regulatory-grade accuracy. You’ll learn how to:

  • Automatically de-identify structured data, unstructured text, DICOM & JPEG images, whole-slide pathology images (SVS), and PDFs using John Snow Labs’ industry-leading software and AI models
  • Apply patient tokenization to enable linking of de-identified data across modalities and time points
  • Use Databricks to process and scale these capabilities across large, real-world datasets
  • Support HIPAA, GDPR, and other regulatory requirements for privacy-preserving research

This session is ideal for data scientists, clinical researchers, compliance teams, and healthcare IT leaders working with multimodal patient data who want to enable longitudinal, privacy-compliant research at scale.

PRESENTED BY:

Srikanth Kumar Rana
Solutions Architect
Databricks

Srikanth Kumar Rana is a seasoned Field Engineer at Databricks, bringing extensive experience in helping organizations unlock the full potential of data and AI. With a strong focus on empowering customers, Srikanth has consistently demonstrated expertise in complex deployments, driving adoption, and enabling businesses to achieve tangible outcomes on the Databricks Lakehouse platform.

Youssef Mellah, Ph.D.
Senior Data Scientist, Machine Learning Engineer
John Snow Labs

Youssef Mellah, Ph.D., is a Senior Data Scientist and Machine Learning Engineer at John Snow Labs with more than 8 years of experience in artificial intelligence, natural language processing, and deep learning. He specializes in building, training, and deploying regulatory-grade ML/DL models and large language models (LLMs) for healthcare and life sciences, including the de-identification and tokenization of multimodal medical data. Youssef has a strong track record of designing scalable, privacy-preserving AI solutions that enable compliant research and analytics across structured and unstructured data. He is passionate about advancing NLP technology, leading multidisciplinary teams, and transforming cutting-edge research into practical, real-world applications.

