The primary visual cortex (V1) is commonly held as an exemplar for general neural behavior, with the additional benefit of responding to well-characterized stimuli. We compute the abstract simplicial complex of spikes in V1 neurons for different stimuli and study the homology of the resulting surfaces to classify neurons by their stimulus response. The strength of this technique is its independence from any coordinate system; datasets collected from different animals, using different stimuli, or even from different areas of the brain can be readily compared.
In addition to classification, we are using topology to differentiate neurons that are thought to exhibit feedback as well as feedforward information transmission.
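As a toy illustration of the idea (not the actual analysis pipeline, which uses full simplicial homology), the simplest topological invariant, Betti-0, can be read off a co-firing graph: neurons are vertices, an edge joins neurons that spike together often, and the number of connected components is the rank of H0. The rasters and threshold below are hypothetical.

```python
# Sketch only: build a co-firing graph from binary spike rasters and count
# its connected components (Betti-0) with union-find. Real analyses would
# build the full simplicial complex and compute higher homology groups.
import itertools

def cofire_edges(rasters, threshold=0.5):
    """Edge (i, j) if joint spikes exceed `threshold` of the sparser train."""
    edges = []
    for i, j in itertools.combinations(range(len(rasters)), 2):
        joint = sum(a and b for a, b in zip(rasters[i], rasters[j]))
        denom = min(sum(rasters[i]), sum(rasters[j])) or 1
        if joint / denom >= threshold:
            edges.append((i, j))
    return edges

def betti0(n, edges):
    """Number of connected components of the graph on n vertices."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return len({find(v) for v in range(n)})

# Toy rasters: neurons 0 and 1 co-fire; neuron 2 fires independently.
rasters = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
]
print(betti0(3, cofire_edges(rasters)))  # → 2 components
```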
Most systems treat each data request as an independent event. However, such
requests in a computer system are driven by programs and user behavior, and
are therefore far from random.
My thesis research focused on techniques for identifying working sets in
dynamic workloads using minimal trace data. We showed that with just block
I/O trace data, which can be collected non-intrusively by analyzing traffic
on the storage bus, we can detect groups of co-accessed data in real time.
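A minimal sketch of the kind of streaming co-access detection described above, not the deployed system: blocks requested within a short window of each other accumulate pair counts, and pairs with high counts are treated as belonging to one working set. The trace, window size, and counts are illustrative.

```python
# Stream a block I/O trace and count which block pairs appear close
# together in time; frequently co-occurring pairs suggest a working set.
from collections import Counter, deque

def coaccess_pairs(trace, window=3):
    """Count (block_a, block_b) pairs accessed within `window` requests."""
    recent = deque(maxlen=window)
    counts = Counter()
    for block in trace:
        for prev in recent:
            if prev != block:
                counts[tuple(sorted((prev, block)))] += 1
        recent.append(block)
    return counts

# Toy trace: blocks 10 and 11 are repeatedly requested together.
trace = [10, 11, 42, 10, 11, 99, 10, 11]
counts = coaccess_pairs(trace)
print(counts[(10, 11)])  # → 5, the strongest co-access signal in the trace
```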
We studied multiple
aspects of predicting data access behavior, and the identification and
exploitation of such behavior to produce predictive caches, improved access
predictors, informed data layout, and automated grouping of related data.
Group
identification can be used to proactively move data to reduce power
consumption, improve deduplication performance, and isolate faults. Systems
based on our technique have been implemented at Pure Storage and IBM.
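One of the applications listed above, an access predictor, can be sketched in its simplest form as a first-order Markov model that learns, online, the most common successor of each block. This is a generic illustration under assumed data, not the published design.

```python
# First-order Markov access predictor: observe a stream of block IDs and
# predict the most frequently seen successor of the current block.
from collections import Counter, defaultdict

class MarkovPredictor:
    def __init__(self):
        self.next_counts = defaultdict(Counter)
        self.prev = None

    def observe(self, block):
        """Record one access; update the successor counts of the last block."""
        if self.prev is not None:
            self.next_counts[self.prev][block] += 1
        self.prev = block

    def predict(self, block):
        """Most frequently observed successor of `block`, or None."""
        counts = self.next_counts.get(block)
        return counts.most_common(1)[0][0] if counts else None

p = MarkovPredictor()
for b in [1, 2, 3, 1, 2, 3, 1, 2]:   # hypothetical access stream
    p.observe(b)
print(p.predict(2))  # → 3, since 3 always followed 2 in the stream
```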
Storing data for the long term is a complex balance of economics, curation, and monitoring. This project has two components. First, most organizations consider static traces easy to collect because they are relatively easy to anonymize and impose little performance overhead during collection. We are studying how to characterize different classes of workloads using static trace data, both to optimize migration timing and to discover enough workload characteristics to choose what type of storage to build or how to group data.
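Reading a "static trace" as a per-file metadata snapshot (an assumption; the snapshot format and the 90-day "cold" cutoff below are illustrative), a first pass at workload characterization might reduce the snapshot to a few summary features that could inform tiering or migration decisions:

```python
# Derive simple workload features from a static metadata snapshot given as
# (size_bytes, age_days) pairs. Purely illustrative; not the project's model.
def snapshot_features(snapshot, cold_days=90):
    """Median file size and fraction of files not modified in `cold_days`."""
    sizes = sorted(size for size, _ in snapshot)
    median_size = sizes[len(sizes) // 2]
    cold_frac = sum(age > cold_days for _, age in snapshot) / len(snapshot)
    return {"median_size": median_size, "cold_fraction": cold_frac}

# Toy snapshot: mostly small, mostly cold files.
snapshot = [(4096, 400), (1 << 20, 10), (8192, 200), (512, 5), (2048, 365)]
print(snapshot_features(snapshot))  # → {'median_size': 4096, 'cold_fraction': 0.6}
```

A high cold fraction, for instance, would argue for migrating data toward cheaper, denser media sooner.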
The other component of this project is ensuring that we can store data over long periods of time without running out of money to migrate or otherwise maintain it. We explore economic models to calculate the optimal endowment given assumptions about reliability requirements and the rate at which storage media cost per byte decreases. Both are active projects with colleagues at UC Santa Cruz and Seagate.
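A toy version of the endowment question (assumptions only, not the project's actual model): if storing the archive costs some amount per year today and cost per byte falls at a fixed annual rate, the required endowment is the sum of a geometrically declining cost series. Interest, migration costs, and reliability terms are deliberately ignored here.

```python
# Endowment needed to fund storage whose annual cost declines at a fixed
# rate (a simplified Kryder-rate assumption), summed over a long horizon.
def endowment(annual_cost, decline_rate, horizon_years=100):
    """Sum of annual costs that shrink by `decline_rate` each year."""
    return sum(annual_cost * (1 - decline_rate) ** t
               for t in range(horizon_years))

# $10,000/year today, costs falling 20%/year: the series converges toward
# annual_cost / decline_rate = $50,000.
print(round(endowment(10_000, 0.20)))  # → 50000
```

The closed form `annual_cost / decline_rate` also makes the sensitivity obvious: halving the assumed decline rate doubles the required endowment.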
A 10 PB system with an annual loss rate of 0.001% can still expect to lose a terabyte of data every decade. Data loss is stochastic and tends to occur at the whole-device level. Yet as devices grow larger, most individual files remain fairly small, so each failure event impacts more files. By isolating faults within working sets, adding large-stripe parity, and designing erasure codes for flash devices, we are working toward understanding and improving storage reliability and availability in multi-user, multi-application systems, reducing the effect of failures on working sets of files or blocks and thus improving net productivity.
I am currently researching the intersection of storage modeling and network modeling in the brain, particularly redundancy and error correction for fault tolerance. I am also involved in a project with the Nimmerjahn lab at the Waitt Advanced Biophotonics Center at Salk, applying computational neuroscience methods to the analysis of motor neuron and glial interactions in freely moving mice.