Sachini Weerasekara

I’m a Ph.D. candidate at Northeastern University and an applied machine learning researcher. My research interests lie broadly at the intersection of machine learning and large-scale systems in cell biology and healthcare.

I’m fortunate to be advised by Prof. Jacqueline Isaacs and Prof. Sagar Kamarthi. My recent work uses generative AI to advance the development of personalized therapies for complex diseases such as cancer.

In my most recent collaboration, with Takeda Pharmaceuticals in Cambridge, I am building a transformer-based (LLM-style) single-cell foundation model that uses large-scale cell biology (omics) data to understand cellular identity in tumors, as well as a framework for dissecting tumor microenvironments at single-cell resolution.

Email / LinkedIn / Google Scholar / GitHub / Blog

Recent Awards

2024 LEADERS Fellowship Award from Takeda Pharmaceutical
2023 Northeastern COE Ph.D. Research Expo Award
2022 Ferretti & Yamamura Award for Research Excellence, Northeastern University

Teaching

2022 Fall, 2023 Spring, 2023 Summer I – Teaching Assistant for Machine Learning for Engineering
2021 Spring, 2022 Spring – Teaching Assistant for IE7275 Data Mining in Engineering

Research & Projects

Deep supervised and transfer learning | Classical machine learning | Reinforcement learning

STPLM: Single-Cell Language Training with Spatial Transcriptomics for Cell Identity Understanding

Sachini Weerasekara, Natasha Darras, Colles Price, Sagar Kamarthi, Jacqueline Isaacs

Under review at ICML 2025

ML techniques: Deep supervised learning, Transfer learning, Self-supervised learning, Contrastive learning

Abstract: Understanding cell identities in tissue samples from gene expression data has critical applications in cancer biology. Because the gene expression feature space is high-dimensional, recent efforts have leveraged single-cell pre-trained language models (PLMs) to solve the cell identification problem. While gene expression data can be derived from RNA-Seq or from spatial transcriptomic techniques (which yield spatially resolved gene expression data), most existing PLMs are designed for RNA-Seq data, limiting their applicability to spatial transcriptomics. In this work, we address this gap by introducing STPLM, a pre-trained language model for spatial transcriptomics that explicitly models cell-neighborhood interactions for cell identity understanding. To tackle the label scarcity and tissue heterogeneity of spatial transcriptomic data, STPLM supports few-shot settings by leveraging additional gene marker datasets to construct a unified representation of cell gene expression and markers. Experimental results show that STPLM outperforms existing RNA-Seq-based PLMs by 1.42-fold, indicating promising directions for cell identification. Broadly, STPLM takes a first step toward annotating spatial transcriptomic data, laying the foundation for deeper insights into tissue architecture and immune microenvironments and, ultimately, for improved therapeutic strategies.
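STPLM is described as explicitly modeling cell-neighborhood interactions, but the abstract does not say how neighborhoods are defined. As a generic illustration only (not STPLM's method), the sketch below assumes a cell's neighborhood is its k nearest cells in tissue coordinates and averages their expression vectors into a simple context feature. The function names, the value of k, and the toy coordinates and expression values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def spatial_knn(coords: Sequence[Tuple[float, float]], k: int) -> List[List[int]]:
    """For each cell, return the indices of its k nearest other cells (Euclidean distance)."""
    if not 0 < k < len(coords):
        raise ValueError("k must satisfy 0 < k < number of cells")
    neighbors = []
    for i, ci in enumerate(coords):
        # Brute force: distance from cell i to every other cell, ties broken by index.
        dists = sorted((math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def neighborhood_mean(expression: Sequence[Sequence[float]],
                      neighbors: List[List[int]]) -> List[List[float]]:
    """Average the expression vectors of each cell's neighbors (one simple context feature)."""
    n_genes = len(expression[0])
    return [
        [sum(expression[j][g] for j in nbrs) / len(nbrs) for g in range(n_genes)]
        for nbrs in neighbors
    ]

# Hypothetical toy data: 4 cells with (x, y) positions and 2-gene expression vectors
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
expression = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [0.0, 0.0]]
nbrs = spatial_knn(coords, k=2)
print(nbrs)
print(neighborhood_mean(expression, nbrs))
```

The brute-force search is quadratic in the number of cells; for tissue sections with many cells, a spatial index such as a k-d tree would be the usual replacement.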

Link

Dissecting tumor microenvironments at the single-cell level using generative AI and spatial transcriptomics

Sachini Weerasekara et al.

ML technique: Deep supervised learning

Abstract: Spatial transcriptomics enables in situ measurement of gene expression across millions of tissue locations, but it involves a trade-off among transcriptome depth, spatial resolution, and sample size. Although integrating hematoxylin and eosin (H&E) stains with spatial transcripts has enabled impactful work in this area, no existing approach uses this combined information to annotate cells. Here, we propose a transformer encoder-based approach that annotates cells using in situ gene expression, H&E staining, and cell neighborhood context. We then integrate these annotations into a four-way topological mining framework to dissect the tumor microenvironment. We demonstrate this on targeted and whole-transcriptome spatial platforms, improving cell classification and morphology identification for human lung and colon tumor tissues.

Link

Context-aware Patient Trajectories for Predicting Adverse Event Onsets in Critical Care

Sachini Weerasekara, Aranya Bagchi, Sagar Kamarthi, Jacqueline Isaacs

ML technique: Deep generative modeling

Abstract: This paper presents context-aware patient trajectories (CAPT), a novel approach for modeling and predicting event sequences in the intensive care unit (ICU). Prior research has used sequential representations to learn the time-varying intensity of patient events from electronic health records (EHRs). However, these approaches suffer from (i) structured event formulations that group events into broad, generalized categories, leading to imprecise, coarse-grained predictions, and (ii) fixed ontologies that fail to capture the complex, context-specific information underlying ICU patient trajectories. This work addresses these limitations by incorporating temporal contextual features into event sequence representations, capturing both general event structures and fine-grained contextual details. Furthermore, we introduce a Gaussian mixture prior distribution as an additional inductive bias to mitigate data limitations. Extensive experiments on three real-world datasets validate the approach, showing that it achieves state-of-the-art performance in predicting patient trajectories in the ICU.
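The abstract mentions a Gaussian mixture prior as an inductive bias. As a generic illustration only (not the CAPT implementation), the sketch below evaluates the log-density of a one-dimensional Gaussian mixture, log p(z) = log Σ_k π_k N(z; μ_k, σ_k²), using the log-sum-exp trick for numerical stability. Real latent-variable models typically place such a prior over multi-dimensional latent vectors; the univariate form, the function names, and the component weights, means, and standard deviations here are simplifying, hypothetical choices.

```python
import math
from typing import Sequence

def gaussian_log_pdf(z: float, mu: float, sigma: float) -> float:
    """Log-density of a univariate normal N(mu, sigma^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (z - mu) ** 2 / (2 * sigma ** 2)

def gmm_log_prior(z: float, weights: Sequence[float],
                  mus: Sequence[float], sigmas: Sequence[float]) -> float:
    """log p(z) = log sum_k pi_k N(z; mu_k, sigma_k^2), computed with log-sum-exp."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("mixture weights must sum to 1")
    terms = [math.log(w) + gaussian_log_pdf(z, m, s)
             for w, m, s in zip(weights, mus, sigmas)]
    top = max(terms)
    # Subtract the largest term before exponentiating to avoid underflow.
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Hypothetical two-component prior
weights, mus, sigmas = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.0]
print(gmm_log_prior(0.0, weights, mus, sigmas))
```

Compared with a single standard normal prior, a mixture lets the prior place mass around several modes, which is one common motivation for this kind of inductive bias.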

Link

Learning for Disassembly Task Control

ML technique: Reinforcement learning

Controlling tasks on a disassembly line is challenging because of the uncertainties associated with end-of-life products. This work proposes a deep reinforcement learning (DRL)-based control strategy for cost-efficient disassembly.

Link

Hand Posture Recognition

ML technique: Classical machine learning

Hand posture recognition is important in human-computer interaction, surveillance systems, self-driving cars, and sign-language communication for deaf and hard-of-hearing people. This work builds four classical machine learning models to classify an individual's hand postures.

Link

Dengue Prediction

ML techniques: Classical machine learning

In recent years, dengue fever has been spreading. Historically, the disease has been most prevalent in Asia and the Pacific islands; today, many of the nearly half a billion cases per year occur in Latin America. This study uses classical machine learning and neural networks to understand the relationship between climate and dengue dynamics. This understanding can inform research initiatives and resource allocation in the fight against life-threatening epidemics.

Link

Keyword Co-Occurrence Network (KCN) for Industry 4.0 for Asset Life Cycle Management (ALCM)

ML technique: NLP

ALCM strategies such as predictive maintenance are vital for economically and environmentally sustainable manufacturing. This work uses NLP to analyze the knowledge base on Industry 4.0 applications for ALCM through a keyword co-occurrence network.
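A keyword co-occurrence network links two keywords whenever they appear together in the same document and weights the edge by how often that happens. The sketch below builds such a network from per-paper keyword lists using only the Python standard library. It is a generic illustration, not this project's pipeline: the function names and sample keyword lists are hypothetical, and the study's actual preprocessing (keyword normalization, thresholds, network metrics) is not described here.

```python
from collections import Counter
from itertools import combinations

def build_kcn(keyword_lists):
    """Return edge weights {(kw_a, kw_b): count} for keywords co-occurring in a document."""
    edges = Counter()
    for keywords in keyword_lists:
        # Normalize case and de-duplicate so each pair counts at most once per document.
        unique = sorted({kw.strip().lower() for kw in keywords if kw.strip()})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges

def node_strength(edges):
    """Weighted degree of each keyword: the sum of its incident edge weights."""
    strength = Counter()
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    return strength

# Hypothetical keyword lists, one per paper
papers = [
    ["Predictive maintenance", "Industry 4.0", "IoT"],
    ["Industry 4.0", "Digital twin", "Predictive maintenance"],
    ["Digital twin", "Asset life cycle management"],
]
kcn = build_kcn(papers)
print(kcn.most_common(3))
print(node_strength(kcn).most_common(2))
```

Sorting each document's keywords before pairing keeps edges undirected, so ("a", "b") and ("b", "a") are never counted separately.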

Link

Machine Learning Operations (MLOps) Pipeline for Pneumonia Onset Prediction

ML technique: MLOps

This project develops an end-to-end MLOps framework for pneumonia onset prediction in ICU patients. The implementation uses MLflow, DVC, Docker, and GitHub Actions.

Link

Reshoring with Remote Manufacturing

The case for offshoring manufacturing operations is weakening because of rising labor costs in offshoring destinations, supply chain resilience concerns, and the resulting erosion of cost advantages. Nevertheless, returning operations to the U.S. poses challenges such as heavy capital expenditure, labor costs, and raw material shortages. This study formulates a system dynamics model to investigate whether a remote manufacturing workforce can support bringing manufacturing facilities back to the U.S.

Link