SPAM/PHISHING Warning to Emory Students:
Someone is sending emails to students from a Gmail account, pretending to be me and offering them research opportunities. Please note that you will only receive legitimate email from me through my official Emory email account. If you receive messages from other addresses claiming to be me, please DO NOT reply to the message or click on any links in it; instead, flag the message as "spam" in your inbox and follow Emory's "Spam and Phishing" reporting steps described here.
Designing datacenters that are reliable, energy-efficient, and deliver high performance and high utilization is a nontrivial problem. I tackle this problem by implementing novel data-driven solutions that take advantage of the wealth of data generated in these systems to improve the way we run them.
I am particularly interested in applying machine-learning methods to extract insights from data collected at various layers of the datacenter: from server-level hardware and software to cluster-wide managers and facility monitors. Based on these insights, I design predictive models that help us anticipate the future behavior of these systems and make proactive decisions, e.g., how to optimize fault-tolerance policies or how to divide shared resources. This is a list of my publications so far, and below is a summary of current and past research projects.
When you add an item to your Amazon online cart (or proceed to checkout, etc.), one or more database servers process your request. The number of requests these servers handle at online companies varies widely over the day: daytime traffic can be an order of magnitude higher than nighttime traffic.
Instead of dedicating a fixed number of servers to incoming requests (and risking wasted resources), why not automatically turn on only as many servers as needed? This is called database elasticity. In this project, we used real-world data from a large online shopping company in South America to derive predictive models that help us proactively allocate servers ahead of time, based on expected demand. Our evaluation shows that this approach outperforms state-of-the-art techniques for database provisioning while requiring almost 50% less hardware (i.e., servers). [Check our SIGMOD'18 paper; Appendix C describes the dataset in more detail]
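As a rough illustration of proactive provisioning, the sketch below forecasts the next hour's load with a simple seasonal-naive predictor (the load at the same hour one day earlier) and converts that forecast into the number of servers to turn on ahead of time. This is not the model from the SIGMOD'18 paper: the per-server capacity, the 20% headroom, the load numbers, and the function names `forecast_next` and `servers_needed` are all hypothetical.

```python
import math

# Hypothetical per-server capacity in requests per hour (an assumption, not a real figure).
SERVER_CAPACITY = 1000


def forecast_next(history, season=24):
    """Seasonal-naive forecast of the next interval's load.

    With hourly samples and season=24, the prediction is the load observed at the
    same hour one day earlier; if less than one season of history exists, fall back
    to the most recent observation.
    """
    if len(history) >= season:
        return history[-season]
    return history[-1]


def servers_needed(predicted_load, capacity=SERVER_CAPACITY, headroom_pct=20, min_servers=1):
    """Number of servers to turn on ahead of time for the predicted load.

    headroom_pct adds a safety margin for forecast errors; min_servers keeps at
    least one server online even when no traffic is expected.
    """
    required = math.ceil(predicted_load * (100 + headroom_pct) / (100 * capacity))
    return max(min_servers, required)


# Usage: one day of hourly load (made-up numbers) with a quiet night and a busy day.
history = [2000] * 8 + [9000] * 12 + [3000] * 4
next_load = forecast_next(history)            # same hour yesterday -> 2000
print(next_load, servers_needed(next_load))   # 2000 3
```

The point of forecasting first is that servers can be brought online before a traffic ramp arrives, rather than reacting after the existing servers are already overloaded; a real deployment would use a stronger predictor than this seasonal-naive baseline.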
How can we use machine-learning methods (e.g., clustering) to improve the way programs share hardware resources in datacenter servers? We designed and implemented a novel hierarchical clustering algorithm that uses cache-profiling data to divide the cache among co-running programs on a multicore server. Our approach improves server utilization by 25% on average over the default configuration!
- Read our paper here.
- Visit the project website here.
- Download KPart (open-sourced on Github) here.
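To make the clustering idea more concrete, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering over per-program cache miss curves, assuming Euclidean distance between curves and average linkage between clusters. It is not KPart's actual algorithm: the program names, the miss-curve values, and the choice of distance and linkage are illustrative assumptions. In a cache-partitioning setting, each resulting group would then share one cache partition; that allocation step is not shown.

```python
from itertools import combinations

# Hypothetical miss curves from cache profiling: curve[w] is a program's cache
# misses (e.g., per kilo-instruction) when it is given w cache ways. Made-up values.
MISS_CURVES = {
    "stream_a":   [10, 10, 10, 10, 10],  # streaming: barely benefits from cache
    "stream_b":   [11, 11, 11, 11, 11],
    "small_ws_a": [20, 5, 2, 2, 2],      # small working set: a few ways suffice
    "small_ws_b": [21, 6, 2, 2, 2],
}


def distance(a, b):
    """Euclidean distance between two miss curves of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(c1, c2, curves):
    """Average pairwise distance between the programs of two clusters."""
    pairs = [(p, q) for p in c1 for q in c2]
    return sum(distance(curves[p], curves[q]) for p, q in pairs) / len(pairs)


def hierarchical_cluster(curves, k):
    """Agglomerative clustering: start with one cluster per program and repeatedly
    merge the two closest clusters until only k remain."""
    clusters = [[name] for name in curves]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], curves),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


# Usage: group the four programs into two clusters.
print(hierarchical_cluster(MISS_CURVES, k=2))
# [['small_ws_a', 'small_ws_b'], ['stream_a', 'stream_b']]
```

The intuition behind grouping is that programs with similar cache behavior can share a partition, so the cache only has to be split among a few groups rather than among every individual program.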
Failures in datacenters often lead to significant waste of resources and user frustration. I am interested in understanding and predicting different types of datacenter failures. I use real-world data to analyze failure correlations and root causes; then, based on the derived observations, I design predictive models that forecast future events in these systems and take proactive actions accordingly.
- Read more details here.
- See my publications list here.