Ph.D. Project in Computer Science and Artificial Intelligence | IIT Delhi - Abu Dhabi

Performance and semantics of stream processing

Computer Science and Artificial Intelligence

Supervisors

Prof. Abhilash Jindal
Prof. Kumar Madhukar (IIT Delhi)

Project Description

Stream processing engines (SPEs) are a cornerstone of modern big data infrastructure, powering the real-time analytics that turn continuous streams of inputs (credit card transactions, social media activity, sensor readings) into actionable insights (fraud detection, trending topics, predictive maintenance). Unlike traditional computations, SPEs run indefinitely, which means they must continuously adapt to fluctuating input rates, network disruptions, server faults, and changes in available hardware.

Our recent work has established clearer formal semantics for stream processing, providing a rigorous foundation for reasoning about SPE behavior. This project builds directly on that foundation on three fronts. On the performance side, we aim to leverage these semantics to improve SPE behavior during periodic checkpointing, in the presence of slow servers (stragglers), and during recovery from server failures. On the expressiveness side, we aim to extend the semantics to handle late data, retractions, and other practically important phenomena. Finally, we aim to explore how stream processing pipelines that incorporate machine learning models can automatically detect and respond to distribution drift in their input data.