Sachini Weerasekara
I’m a Ph.D. candidate at Northeastern University and an applied machine learning researcher. My research interests lie broadly at the intersection of machine learning and large-scale systems in cell biology and healthcare.
I’m fortunate to be advised by Prof. Jacqueline Isaacs & Prof. Sagar Kamarthi. My recent work involves using generative AI to drive advancements in developing personalized therapies for complex diseases like cancer.
In my most recent collaboration with Takeda Pharmaceuticals, Cambridge, I am building a transformer-based (LLM-style) single-cell foundation model for cellular identity understanding in tumors using large-scale cell biology (omics) data, along with a framework for dissecting tumor microenvironments at single-cell resolution.
Email / LinkedIn / GoogleScholar / GitHub / Blog
Recent Awards
- 2024 LEADERS Fellowship Award from Takeda Pharmaceutical
- 2023 Northeastern COE Ph.D. Research Expo Award
- 2022 Ferretti & Yamamura Award for Research Excellence, Northeastern University

Teaching
- 2022 Fall, 2023 Spring, 2023 Summer I – Teaching Assistant for Machine Learning For Engineering
- 2021 Spring, 2022 Spring – Teaching Assistant for IE7275 Data Mining in Engineering
Research & Projects
Deep supervised and transfer learning | Classical machine learning | Reinforcement learning

STPLM: Single-Cell Language Training with Spatial Transcriptomics for Cell Identity Understanding
Sachini Weerasekara, Natasha Darras, Colles Price, Sagar Kamarthi, Jacqueline Isaacs
Under review at ICML 2025
ML techniques: Deep supervised learning, Transfer learning, Self-supervised learning, Contrastive learning
Abstract: Understanding cell identities in tissue samples from gene expression data has critical applications in cancer biology. Because the feature space is high-dimensional, recent efforts have leveraged single-cell pre-trained language models (PLMs) to solve the cell identification problem. While gene expression data can be derived from RNA-Seq or from spatial transcriptomic techniques (spatially resolved gene expression data), most existing PLMs are designed for RNA-Seq data, limiting their applicability to spatial transcriptomics. In this work, we address this gap by introducing STPLM, a pre-trained language model for spatial transcriptomics that explicitly models cell-neighborhood interactions for cell identity understanding. To tackle the challenges posed by label scarcity and tissue heterogeneity in spatial transcriptomic data, STPLM supports few-shot settings by leveraging additional gene marker datasets to construct a unified representation of cell-gene expression and markers. Experimental results show that STPLM outperforms existing RNA-Seq-based PLMs by 1.42-fold, indicating a promising direction for cell identification. Broadly, STPLM takes a first step toward annotating spatial transcriptomic data, laying the foundation for deeper insights into tissue architecture and immune microenvironments and ultimately driving improvements in therapeutic strategies.

Dissecting tumor microenvironments at the single cell level using generative AI and spatial transcriptomics
Sachini Weerasekara, et al.
ML technique: Deep supervised learning
Abstract: Spatial transcriptomics enables the in situ measurement of gene expression across millions of tissue locations, though it involves a trade-off between transcriptome depth, spatial resolution, and sample size. Although integrating Hematoxylin and Eosin (H&E) stains with spatial transcripts has enabled impactful work in this context, no existing approach annotates cells using this information. Here, we propose a transformer encoder-based approach to annotate cells using in situ gene expression, H&E staining, and cell neighborhood context. Next, we integrate these annotations into a four-way topological mining framework to dissect the tumor microenvironment. We showcase this on targeted and whole-transcriptome spatial platforms, improving cell classification and morphology identification for human lung and colon tumor tissues.

Context-aware Patient Trajectories for Predicting Adverse Event Onsets in Critical Care
Sachini Weerasekara, Aranya Bagchi, Sagar Kamarthi, Jacqueline Isaacs
ML technique: Deep generative modeling
Abstract: This paper presents context-aware patient trajectories (CAPT), a novel approach for modeling and predicting event sequences in the Intensive Care Unit (ICU). Prior research has focused on using sequential representations to learn the time-varying intensity of patient events from Electronic Health Records (EHR). However, these approaches suffer from i) structured event formulations that group events into broad, generalized categories, leading to imprecise, coarse-grained predictions, and ii) the use of fixed ontologies that fail to capture the complex, context-specific information underlying patient trajectories in the ICU. This work addresses these limitations by incorporating temporal contextual features into event sequence representations, capturing both general event structures and fine-grained contextual details. Furthermore, we introduce a Gaussian mixture prior distribution as an additional inductive bias to overcome data limitations. We validate the effectiveness of the proposed approach through extensive experiments on three real-world datasets, and results show that it achieves state-of-the-art performance in predicting patient trajectories in the ICU.
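As a rough illustration of what a Gaussian mixture prior looks like in code, the sketch below evaluates the log-density of a latent vector under a mixture of diagonal Gaussians. It is not the CAPT implementation: the function name, the diagonal-covariance choice, and the example parameters are all assumptions made for illustration.

```python
import math


def log_gaussian_mixture_density(z, weights, means, variances):
    """Log-density of vector z under a mixture of diagonal Gaussians.

    weights:   mixing proportions, one per component (should sum to 1)
    means:     one mean vector per component
    variances: one vector of per-dimension variances per component
    """
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        # log(weight) + sum of per-dimension Gaussian log-densities
        log_prob = math.log(w)
        for z_d, mu_d, var_d in zip(z, mu, var):
            log_prob += -0.5 * (math.log(2 * math.pi * var_d)
                                + (z_d - mu_d) ** 2 / var_d)
        component_logs.append(log_prob)
    # Log-sum-exp over components for numerical stability
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))


# Hypothetical two-component prior over a 2-D latent (illustrative values only)
prior_weights = [0.5, 0.5]
prior_means = [[-1.0, 0.0], [1.0, 0.0]]
prior_variances = [[1.0, 1.0], [1.0, 1.0]]
log_p = log_gaussian_mixture_density([0.0, 0.0], prior_weights,
                                     prior_means, prior_variances)
```

In a generative model, a term like this would typically enter the training objective as the prior over latent representations; how CAPT uses it is described in the paper, not here.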

Learning for Disassembly Task Control
ML technique: Reinforcement learning
Controlling tasks in a disassembly line is challenging due to uncertainties related to end-of-life products. This work proposes a deep reinforcement learning-based control strategy for cost-efficient disassembly.

Hand Posture Recognition
ML technique: Classical machine learning
Posture recognition is vital in human-computer interaction, surveillance systems, self-driving cars, communication for people who are deaf or nonverbal, and other applications. This work builds four classical machine learning models to classify an individual's hand postures.

Dengue Prediction
ML techniques: Classical machine learning, Neural networks
In recent years, dengue fever has been spreading. Historically, the disease has been most prevalent in Asia and the Pacific islands. Today, many of the nearly half a billion cases per year occur in Latin America. This study uses classical machine learning and neural networks to understand the relationship between climate and dengue dynamics. This understanding can improve research initiatives and resource allocation to help fight life-threatening pandemics.

Keyword Co-Occurrence Network (KCN) for Industry 4.0 for Asset Life Cycle Management (ALCM)
ML technique: NLP
ALCM strategies such as predictive maintenance are vital for economically and environmentally sustainable manufacturing. This work uses NLP to analyze the knowledge base on Industry 4.0 applications for ALCM.
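The core idea of a keyword co-occurrence network can be sketched in a few lines: each keyword is a node, and an edge's weight counts how many articles list both keywords. The function name and the sample keyword lists below are hypothetical and purely illustrative; they are not taken from the project's corpus or code.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(keyword_lists):
    """Return edge weights: the number of documents in which each keyword pair co-occurs."""
    edge_weights = Counter()
    for keywords in keyword_lists:
        # Normalize and deduplicate so a pair counts at most once per document
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for a, b in combinations(unique, 2):
            edge_weights[(a, b)] += 1
    return edge_weights


# Hypothetical keyword lists, one per article (illustrative only)
papers = [
    ["predictive maintenance", "Industry 4.0", "IoT"],
    ["Industry 4.0", "digital twin", "predictive maintenance"],
    ["IoT", "Industry 4.0"],
]
network = build_cooccurrence_network(papers)

# Strongest links first
for (a, b), weight in network.most_common(3):
    print(f"{a} -- {b}: {weight}")
```

Sorting each article's keywords before pairing keeps every edge in a single canonical order, so ("iot", "industry 4.0") and ("industry 4.0", "iot") are never counted as separate edges.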

Machine Learning Operations (MLOps) Pipeline for Pneumonia Onset Prediction
ML technique: MLOps
This project develops an end-to-end MLOps framework for pneumonia onset prediction in ICU patients. The implementation uses MLflow, DVC, Docker, and GitHub Actions.

Reshoring with Remote Manufacturing
The case for offshoring manufacturing operations is weakening as labor costs rise in offshoring destinations and supply chain resilience concerns grow, eroding its cost advantages. Nevertheless, returning operations to the U.S. poses challenges such as heavy capital expenditure, labor costs, and raw material shortages. This study formulates a system dynamics model to investigate whether a remote manufacturing workforce can support bringing manufacturing facilities back to the U.S.