Research

CXL-Enabled QoS-Aware Tiered Memory System

Research Assistant, Advisor: Prof. Mosharaf Chowdhury
Sep 2023-Now

Tiered memory systems have been widely adopted to provide larger memory capacity in response to the growing demands of memory-intensive workloads. Although increased memory capacity allows more applications to be deployed, existing tiered memory solutions are not built with Quality-of-Service (QoS) support. As a result, they often cannot meet service-level objectives (SLOs) when multiple applications share a tiered memory system. Specifically, applications suffer from local memory contention and memory bandwidth interference, two sources of performance unpredictability unique to tiered memory systems. Indeed, we observe that application performance drops by 43% and 70% during severe memory contention and interference. This paper presents Mercury, a QoS-aware tiered memory system that provides predictable performance for coexisting memory-intensive applications, each with different SLOs. Mercury enables per-tier page reclamation to enforce application-level resource management. It leverages a novel admission control and real-time adaptation algorithm to maximize local memory utilization while mitigating memory interference. Evaluations with real-world applications show that Mercury can provide QoS guarantees among multiple applications sharing a tiered memory system, with up to 53.4% improvement in performance.

Co-simulation Framework for the Infrastructure Nexus

Research Assistant, Advisor: Prof. Ang Chen
Mar 2024-Now

Critical infrastructures like datacenters, power grids, and water systems are interdependent, forming complex “infrastructure nexuses” that require co-optimization for efficiency, resilience, and sustainability. We present OpenInfra, a co-simulation framework that models these interdependencies by integrating domain-specific simulators for datacenters, power grids, and cooling systems, with a focus on stitching them together for end-to-end experimentation. OpenInfra enables seamless integration of diverse simulators and flexible configuration of infrastructure interactions. Our evaluation demonstrates its ability to simulate large-scale infrastructure dynamics, including 7,392 servers over 100+ hours.

CXL-Enabled Retrieval-Augmented Generation System

Research Assistant, Advisor: Prof. Mosharaf Chowdhury
Mar 2024-Now

Retrieval-Augmented Generation (RAG) is a popular technique for improving the reliability of Large Language Models (LLMs) by reducing hallucination. Implementing RAG effectively often requires repeatedly searching through large vector databases. At scale, these vector searches become a substantial computational bottleneck for RAG-enabled LLM inference. In this project, we introduce Aether, a prototype system that enhances search efficiency across massive sharded datasets through scheduling optimizations and dynamic file management. Aether is a highly adaptable system designed to scale across multi-tier memory systems, such as CXL-enabled clusters. Through comprehensive evaluations across diverse workloads and under varying memory constraints, we demonstrate that Aether significantly outperforms baseline approaches. Leveraging asynchronous I/O, our scheduling optimizations achieve an average throughput improvement of 19.7% over their synchronous counterparts. Additionally, dynamic index file management using a novel LFU+ caching policy outperforms traditional LRU by 14.5% in serving throughput.
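
To illustrate why a frequency-based policy can beat LRU when a few index shards are hot and many are touched only once, here is a minimal Python sketch. It is not Aether's implementation: the LFU+ policy is not described here, so plain LFU stands in for it, and the class names, the `hit_rate` helper, and the access trace are all hypothetical.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, load the key, evicting if needed."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # drop the least recently used key
        self.items[key] = True
        return False


class LFUCache:
    """Evicts the least frequently used key when full (plain LFU, not LFU+)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = set()
        self.freq = Counter()  # access counts persist across evictions

    def access(self, key):
        """Return True on a hit; on a miss, load the key, evicting if needed."""
        self.freq[key] += 1
        if key in self.resident:
            return True
        if len(self.resident) >= self.capacity:
            # Evict the resident key with the lowest count; ties break by key.
            victim = min(self.resident, key=lambda k: (self.freq[k], k))
            self.resident.remove(victim)
        self.resident.add(key)
        return False


def hit_rate(cache, trace):
    """Fraction of accesses in `trace` that hit in `cache`."""
    hits = sum(cache.access(key) for key in trace)
    return hits / len(trace)


# Hypothetical trace: one hot shard "h" interleaved with one-off cold shards.
trace = ["h", "h", "h", "c1", "c2", "h", "c3", "c4", "h"]
print("LRU hit rate:", hit_rate(LRUCache(2), trace))
print("LFU hit rate:", hit_rate(LFUCache(2), trace))
```

On this trace the cold shards push the hot shard out of the LRU cache, while LFU keeps it resident because its access count stays highest. Real workloads would also need aging or a similar mechanism so that stale counts do not pin old entries, which this sketch omits.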

Application of MLM-WR Algorithm in Mobile Crowd Sensing Systems

Research Assistant, Advisor: Prof. Anfeng Liu (Central South University), Prof. Neal Xiong (Sul Ross State University)
Feb 2022-Aug 2023

Mobile Crowd Sensing (MCS) is a cloud-edge-terminal collaboration model that relies on edge terminal devices, or “workers,” to sense data and build applications for cloud-hosted platforms. However, to ensure high-quality application development, recruiting truthful workers in the edge network is crucial. With the emergence of Artificial Intelligence (AI), the Internet of Things (IoT) is entering a new era, known as Artificial Intelligence of Things (AIoT). This paper proposes an AI-enabled MCS system, which includes MLM-WR, a cloud-edge-terminal collaboration data collection scheme for AIoT. MLM-WR leverages swarm intelligence to match truthful workers with sensing tasks, enabling efficient and effective data collection for AIoT applications. Matching theory is applied from two perspectives: truthful worker discovery and sensing difference discovery. To identify truthful workers, we adjust their credibility based on the deviation of their sensing data from Ground Truth Data (GTD) obtained through collaboration with an Unmanned Aerial Vehicle (UAV). In sensing difference discovery, we obtain workers’ sensing attribute reliability by calculating attribute data errors, and we incorporate absolute and relative sensing location preferences to determine workers’ sensing quality at different locations. Additionally, MLM-WR employs the Particle Swarm Optimization (PSO) algorithm to assign workers while considering sensing attribute reliability, location reliability, and recruitment cost, thus addressing the tradeoff between recruitment cost and data quality. The effectiveness of our approach is demonstrated through extensive evaluations, in which MLM-WR outperforms state-of-the-art approaches.
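
The credibility adjustment described above can be pictured with a small Python sketch. This is one possible update rule chosen for illustration, not the rule used by MLM-WR: the function name, the linear agreement score, and the `tolerance` and `rate` parameters are all assumptions.

```python
def update_credibility(credibility, reported, ground_truth, tolerance=1.0, rate=0.3):
    """Nudge a worker's credibility toward 1 when the report is close to the
    ground truth and toward 0 when it deviates (hypothetical rule)."""
    deviation = abs(reported - ground_truth)
    # Agreement is 1.0 for an exact match and falls linearly to 0.0 at `tolerance`.
    agreement = max(0.0, 1.0 - deviation / tolerance)
    # Exponential moving average of the agreement score.
    return (1.0 - rate) * credibility + rate * agreement


# A worker starting at 0.5 who matches the UAV-collected ground truth gains
# credibility; one whose report is far off loses it.
print(update_credibility(0.5, reported=20.0, ground_truth=20.0))
print(update_credibility(0.5, reported=25.0, ground_truth=20.0))
```

A moving average is used here so that a single bad reading does not erase a long record of accurate reports; other update schemes would serve the same purpose.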

Lightweight Real-time Portrait Segmentation System

Team Leader (Research Assistant), Advisor: Prof. Yixiong Liang (Central South University)
Feb 2022-May 2022

In this research, I designed a lightweight network architecture based on BiSeNet and STDC that achieves a high frame rate (186 fps) on CPU while preserving segmentation quality at resolutions up to 720p. I also achieved an mIoU of 93.9% on the Supervisely Person Dataset and deployed the model on desktop (LibTorch, Qt, C++) and Android (LibTorch, JNI, Java).
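
For reference, the mIoU metric reported above is the per-class intersection-over-union averaged over classes. The Python sketch below shows the standard definition on flat label lists. It is not the project's evaluation code: the function name and the toy masks are illustrative, and whether the reported score was aggregated per image or over the whole dataset is not stated here.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union over classes, for flat lists of integer labels."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Toy example: 1 = person, 0 = background, masks flattened to one dimension.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 1, 1, 0, 0, 0]
print(mean_iou(pred, target))
```

In the toy example each class has two correctly labeled pixels out of a union of four, so both per-class IoUs are 0.5 and so is their mean.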