Research

Research Interests:
Data-Driven Datacenters

SPAM/PHISHING Warning to Emory Students: Someone is sending emails to students from a Gmail account pretending to be me and offering them research opportunities. Please note that you will only receive legitimate email correspondence from me via my official Emory email account. If you receive external messages from other addresses pretending to be me, please DO NOT reply to the message or click on any links in it; instead, flag the message as "spam" in your inbox and follow Emory's "Spam and Phishing" reporting steps mentioned here.

Designing datacenters that are reliable, energy-efficient, and deliver high performance and high utilization is a nontrivial problem. I tackle this problem by implementing novel data-driven solutions that take advantage of the wealth of data generated in these systems to improve the way we run them.

I am particularly interested in applying machine-learning methods to extract insights from data collected at various layers in the datacenter: from server-level hardware and software to cluster-wide managers and facility monitors. Based on these insights, I design predictive models that help us anticipate the future behavior of these systems and proactively make decisions, e.g. on how to optimize fault-tolerance policies or how to divide shared resources. This is a list of my publications so far, and below is a summary of current and past research projects.

How can predictive modeling help us auto-scale servers in online transaction processing systems (e.g. online shopping companies)?

When you add an item to your Amazon online cart (or proceed to checkout, etc.), one or more database servers will process your request. The number of requests processed by these servers in online companies varies widely during the day: the difference between daytime and nighttime traffic can be an order of magnitude.

Instead of dedicating a fixed number of servers to incoming requests (and thus risking wasted resources), why don't we automatically turn on only as many servers as needed? This is called database elasticity. In this project, we used real-world data from a large online shopping company in South America to derive predictive models that help us proactively allocate servers ahead of time, based on the expected demand. Our evaluation shows that this approach outperforms state-of-the-art techniques for database provisioning while requiring almost 50% less hardware (i.e. servers). [Check our SIGMOD'18 paper; Appendix C describes the dataset in more detail]
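
To make the idea of proactive provisioning concrete, here is a minimal Python sketch. It is not the model from the SIGMOD'18 paper: it swaps in a simple seasonal-naive forecast (the average load seen at the same point in earlier periods, e.g. the same hour on previous days) and turns that forecast into a server count. The function names, the per-server capacity, and the 20% headroom are all assumptions made purely for illustration.

```python
import math


def forecast_next(history, period):
    """Seasonal-naive forecast for the next time step (illustrative only).

    Averages the load observed at the same position in each earlier
    period, e.g. the same hour on previous days when period=24.
    """
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    # The step being predicted has index len(history); walk back from it
    # in whole periods to collect the matching past observations.
    samples = history[len(history) - period::-period]
    return sum(samples) / len(samples)


def servers_needed(predicted_load, capacity_per_server,
                   headroom=0.2, min_servers=1):
    """Convert a predicted load into a server count.

    capacity_per_server and headroom are hypothetical parameters: the
    requests one server can handle per interval, and the extra fraction
    provisioned as a safety margin against forecast error.
    """
    target = predicted_load * (1 + headroom)
    return max(min_servers, math.ceil(target / capacity_per_server))


if __name__ == "__main__":
    # Toy load trace with a period of 2 (low, high, low, high, low).
    trace = [100, 1000, 100, 1200, 100]
    predicted = forecast_next(trace, period=2)
    print(predicted, servers_needed(predicted, capacity_per_server=500))
```

The key design point is that the server count is derived from the predicted load rather than the current load, so capacity can be brought online before the demand arrives. A real system would also need to account for the time it takes to start a server and migrate data to it.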

I am interested in further analyzing this rich dataset and applying machine-learning methods to uncover insights into user and system behavior at e-commerce companies. If you are interested in working on this research project, please submit this online form.

Past Research Projects

KPart: A novel technique for partitioning shared caches

Data-driven analysis & prediction of datacenter failures
