I'm a research staff member in the theory group at IBM Research Almaden. I joined IBM Almaden after completing a PhD in Computer Science at Princeton University in 2011. My advisor was Boaz Barak.

See selected publications.
Links: Full version (arXiv)
Abstract: Alternating Minimization is a widely used and empirically successful framework for Matrix Completion and related low-rank optimization problems. We give a new algorithm based on Alternating Minimization that provably recovers an unknown low-rank matrix from a random subsample of its entries under a standard incoherence assumption, while achieving a linear convergence rate. Compared to previous work, our results reduce the provable sample complexity requirements of the Alternating Minimization approach by at least a quartic factor in the rank and the condition number of the unknown matrix. These improvements apply both when the matrix is exactly low-rank and when it is only close to low-rank in the Frobenius norm.
Underlying our work is a new robust convergence analysis of the well-known Subspace Iteration algorithm for computing the dominant singular vectors of a matrix, also known as the Power Method. This viewpoint leads to a conceptually simple understanding of Alternating Minimization that we exploit. Additionally, we contribute a new technique for controlling the coherence of intermediate solutions arising in iterative algorithms. These techniques may be of interest beyond their application here.
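The classical Subspace Iteration (Power Method) referred to above can be sketched in a few lines. This is a generic illustration of the textbook algorithm, with arbitrary parameter choices, not the analysis-specific variant from the paper:

```python
import numpy as np

def subspace_iteration(A, k, iters=100, seed=0):
    """Classical subspace iteration: approximate the top-k right
    singular vectors of A by repeatedly multiplying by A^T A and
    re-orthonormalizing via QR."""
    rng = np.random.default_rng(seed)
    # Random orthonormal starting block with k columns.
    X, _ = np.linalg.qr(rng.standard_normal((A.shape[1], k)))
    for _ in range(iters):
        Y = A.T @ (A @ X)        # one power step
        X, _ = np.linalg.qr(Y)   # re-orthonormalize
    return X
```

The convergence rate of each column is governed by the gap between the k-th and (k+1)-th singular values of A.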
Links: Full version (arXiv)
Abstract: We provide a new robust convergence analysis of the well-known subspace iteration algorithm for computing the dominant singular vectors of a matrix, also known as simultaneous iteration or the power method. Our result characterizes the convergence behavior of the algorithm when a large amount of noise is introduced after each matrix-vector multiplication. While interesting in its own right, our main motivation comes from the problem of privacy-preserving spectral analysis, where noise is added in order to achieve the privacy guarantee known as differential privacy. Our contributions here are twofold:
We give nearly tight worst-case bounds for the problem of computing a differentially private low-rank approximation in the spectral norm. Our results extend to privacy-preserving principal component analysis and apply to several variants of differential privacy that have been considered in the past. The running time of our algorithm is nearly linear in the input sparsity, leading to strong improvements in running time over previous work while almost matching existing worst-case lower bounds on the amount of error that is required by any differentially private algorithm.
Complementing our worst-case bounds, we show that the error dependence of our algorithm on the matrix dimension can be replaced by an essentially tight dependence on the coherence of the matrix. This result resolves the main problem left open by previous results in this line of work. The coherence is always bounded by the matrix dimension but is often substantially smaller, leading to significant average-case improvements over our worst-case bounds.
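The noisy regime the analysis addresses can be illustrated by a minimal sketch of subspace iteration with Gaussian noise injected after each multiplication. The noise scale `sigma` below is a free illustrative parameter; in the privacy application it would be calibrated to the differential privacy parameters, which this sketch does not attempt:

```python
import numpy as np

def noisy_subspace_iteration(A, k, iters=50, sigma=1e-6, seed=0):
    """Subspace iteration where each power step is perturbed by
    Gaussian noise, modeling the perturbation added for privacy.
    sigma is illustrative, not a privacy-calibrated value."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(iters):
        G = rng.normal(scale=sigma, size=(n, k))  # fresh noise each round
        X, _ = np.linalg.qr(A.T @ (A @ X) + G)
    return X
```

With small noise the algorithm still converges to (approximately) the dominant subspace; the analysis in the paper quantifies how much noise per round can be tolerated.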
Links: Full version (PDF, arXiv). The arXiv version is missing a plot; see the PDF instead.
Abstract: Linear sketches are powerful algorithmic tools that turn an n-dimensional input into a concise lower-dimensional representation via a linear transformation. Such sketches have seen a wide range of applications, including norm estimation over data streams, compressed sensing, and distributed computing. In almost any realistic setting, however, a linear sketch faces the possibility that its inputs are correlated with previous evaluations of the sketch. Known techniques no longer guarantee the correctness of the output in the presence of such correlations. We therefore ask: Are linear sketches inherently non-robust to adaptively chosen inputs? We give a strong affirmative answer to this question. Specifically, we show that no linear sketch approximates the Euclidean norm of its input to within an arbitrary multiplicative approximation factor on a polynomial number of adaptively chosen inputs. The result remains true even if the dimension of the sketch is d = n – o(n) and the sketch is given unbounded computation time. Our result is based on an algorithm with running time polynomial in d that adaptively finds a distribution over inputs on which the sketch is incorrect with constant probability. Our result implies several corollaries for related problems, including lp-norm estimation and compressed sensing. Notably, we resolve an open problem in compressed sensing regarding the feasibility of l2/l2-recovery guarantees in the presence of computationally bounded adversaries.
Links: Full version (arXiv)
Abstract: We consider differentially private approximate singular vector computation. Known worstcase lower bounds show that the error of any differentially private algorithm must scale polynomially with the dimension of the singular vector. We are able to replace this dependence on the dimension by a natural parameter known as the coherence of the matrix that is often observed to be significantly smaller than the dimension both theoretically and empirically. We also prove a matching lower bound showing that our guarantee is nearly optimal for every setting of the coherence parameter. Notably, we achieve our bounds by giving a robust analysis of the wellknown power iteration algorithm, which may be of independent interest. Our algorithm also leads to improvements in worstcase settings and to better lowrank approximations in the spectral norm.
Links: Full version (arXiv)
Abstract: We consider a fundamental problem in unsupervised learning: given a collection of m points in R^n, if many but not necessarily all of these points are contained in a d-dimensional subspace T, can we find it? The points contained in T are called inliers and the remaining points are outliers. This problem has received considerable attention in computer science and in statistics. Yet efficient algorithms from computer science are not robust to adversarial outliers, and the estimators from robust statistics are hard to compute in high dimensions. This is a serious and persistent issue not just in this application, but for many other problems in unsupervised learning. Are there algorithms for this problem that are both robust to outliers and efficient? We give an algorithm that finds T when it contains more than a d/n fraction of the points. Hence, for, say, d = n/2, this estimator is both easy to compute and well-behaved when there is a constant fraction of outliers. We prove that it is Small Set Expansion-hard to find T when the fraction of errors is any larger, so our estimator is an optimal compromise between efficiency and robustness. In fact, this basic problem has a surprising number of connections to other areas, including small set expansion, matroid theory, and functional analysis, that we make use of here.
Links: PDF
Abstract: This short note is a response to Frank Pasquale's article "The Emperor's New Codes: Reputation and Search Algorithms in the Finance Sector" published as part of the conference Governing Algorithms, New York University, 2013. Pasquale describes how algorithms in the finance industry have failed to deliver on the promise of greater transparency, accountability, fairness, and efficiency. We discuss some of the steps that Computer Scientists have undertaken towards addressing these concerns. We focus on three problems: achieving fairness, designing algorithms that are robust to manipulation, and understanding the inherent complexity in financial products.
Abstract: Data is only as good as the similarity metric used to compare it. The all-important notion of similarity allows us to leverage knowledge derived from prior observations to predict characteristics of new samples. In this paper we consider the problem of compiling a consistent and accurate view of similarity given multiple incomplete and noisy approximations of it. We propose a new technique called Multiple Kernel Completion (MKC), which completes the given similarity kernels as well as finds their best combination within a Support Vector Machine framework, so as to maximize the discrimination margin. We demonstrate the effectiveness of the proposed technique on datasets from the UCI Machine Learning Repository as well as on the task of heart valve disease discrimination using CW Doppler images. Our empirical results establish that MKC consistently outperforms existing data completion methods such as zero-imputation, mean-imputation, and matrix completion across datasets and training set sizes.
Links: Full version (PDF, arXiv)
Abstract: We present a new algorithm for differentially private data release, based on a simple combination of the Exponential Mechanism with the Multiplicative Weights update rule. Our MWEM algorithm achieves the best known, and nearly optimal, theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques.
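The MWEM loop is short enough to sketch: select a badly-answered query via the Exponential Mechanism, measure it with Laplace noise, and fold the measurement in with a Multiplicative Weights update. The budget split and learning rate below are simplified assumptions for illustration, not the tuned choices from the paper:

```python
import numpy as np

def mwem(true_hist, queries, T, eps, seed=0):
    """Minimal MWEM sketch. true_hist: histogram over the data universe;
    queries: list of 0/1 vectors (counting queries). Returns a synthetic
    histogram with the same total weight as true_hist."""
    rng = np.random.default_rng(seed)
    n = true_hist.sum()
    # Start from the uniform distribution scaled to n records.
    synth = np.full(len(true_hist), n / len(true_hist))
    eps_round = eps / (2 * T)  # naive split: half to selection, half to measurement
    for _ in range(T):
        # Exponential mechanism: sample a query with probability growing in its error.
        errors = np.array([abs(q @ true_hist - q @ synth) for q in queries])
        scores = np.exp(eps_round * errors / 2.0)
        i = rng.choice(len(queries), p=scores / scores.sum())
        # Laplace mechanism: noisy measurement of the chosen counting query.
        noisy = queries[i] @ true_hist + rng.laplace(scale=1.0 / eps_round)
        # Multiplicative weights update nudging synth toward the measurement.
        synth = synth * np.exp(queries[i] * (noisy - queries[i] @ synth) / (2.0 * n))
        synth = synth * (n / synth.sum())  # renormalize to n records
    return synth
```

The synthetic histogram can then answer all queries in the class, not just the T that were measured.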
Links: Full version (arXiv), Proceedings (ACM)
Abstract: Computing accurate low-rank approximations of large matrices is a fundamental data mining task. In many applications, however, the matrix contains sensitive information about individuals. In such a case, we would like to release a low-rank approximation that satisfies a strong privacy guarantee such as differential privacy. Unfortunately, to date the best known algorithm for this task that satisfies differential privacy is based on naive input perturbation or randomized response: each entry of the matrix is perturbed independently by a sufficiently large random noise variable, and a low-rank approximation is then computed on the resulting matrix.
We give the first significant improvements in accuracy over randomized response under the natural and necessary assumption that the matrix has low coherence. Our algorithm is also very efficient and finds a constant-rank approximation of an m x n matrix in time O(mn). Note that even generating the noise matrix required for randomized response already takes time O(mn).
Links: Full version (arXiv), Proceedings (SIAM), Short talk (PDF), Blog post at windowsontheory.org
Abstract: This work considers computationally efficient privacy-preserving data release. We study the task of analyzing a database containing sensitive information about individual participants. Given a set of statistical queries on the data, we want to release approximate answers to the queries while also guaranteeing differential privacy, protecting each participant's sensitive data.
Our focus is on computationally efficient data release algorithms; we seek algorithms whose running time is polynomial, or at least subexponential, in the data dimensionality. Our primary contribution is a computationally efficient reduction from differentially private data release for a class of counting queries, to learning thresholded sums of predicates from a related class.
We instantiate this general reduction with a variety of algorithms for learning thresholds. These instantiations yield several new results for differentially private data release. As two examples, taking {0,1}^d to be the data domain (of dimension d), we obtain differentially private algorithms for:
Several other instantiations yield further results for privacy-preserving data release. Of the two results highlighted above, the first learning algorithm uses techniques for representing thresholded sums of predicates as low-degree polynomial threshold functions. The second learning algorithm is based on Jackson's Harmonic Sieve algorithm [Jackson 1997]. It utilizes Fourier analysis of the database, viewed as a function mapping queries to answers.
Links: Full version (arXiv), Proceedings (ACM)
Abstract: We study fairness in classification, where individuals are classified, e.g., admitted to a university, and the goal is to prevent discrimination against individuals based on their membership in some group, while maintaining utility for the classifier (the university). The main conceptual contribution of this paper is a framework for fair classification comprising (1) a (hypothetical) task-specific metric for determining the degree to which individuals are similar with respect to the classification task at hand; (2) an algorithm for maximizing utility subject to the fairness constraint, that similar individuals are treated similarly. We also present an adaptation of our approach to achieve the complementary goal of "fair affirmative action," which guarantees statistical parity (i.e., the demographics of the set of individuals receiving any classification are the same as the demographics of the underlying population), while treating similar individuals as similarly as possible. Finally, we discuss the relationship of fairness to privacy: when fairness implies privacy, and how tools developed in the context of differential privacy may be applied to fairness.
Links: Full version (PDF), Proceedings (ACM)
Abstract: We initiate a principled study of graph densification. Given a graph G, the goal of graph densification is to come up with another graph H that has significantly more edges than G but nevertheless approximates G well with respect to some set of test functions. In this paper we focus on the case of cut and spectral approximations. As it turns out, graph densification exhibits rich connections to a set of interesting and sometimes seemingly unrelated questions in graph theory and metric embeddings. In particular, we show the following results:
Our results are mainly based on linear and semidefinite programs (and their duals) for computing the maximum weight densifier of a given graph. This also leads to efficient algorithms in the case of spectral densifiers and additive cut densifiers.
Links: PDF
Abstract: In this thesis we consider the challenges arising in the design of algorithms that interact with sensitive personal data, such as medical records, online tracking data, or financial records.
One important goal is to protect the privacy of those individuals whose personal information contributed to the data set. We consider algorithms that satisfy the strong privacy guarantee known as differential privacy. A wide range of computational tasks reduces to the setting in which a trusted database curator responds to a number of statistical queries posed by an untrusted data analyst. The basic question is how accurately and efficiently the curator can release approximate answers to the given queries while satisfying differential privacy. We make the following main contributions to differentially private data analysis:
Not all problems arising in the presence of sensitive data are a matter of privacy. In the final part of this thesis, we isolate fairness in classification as a formidable concern and thus initiate its formal study. The goal of fairness is to prevent discrimination against protected subgroups of the population in a classification system. We argue that fairness cannot be achieved by blindness to the attribute we would like to protect. Our main conceptual contribution is in asserting that fairness is achieved when similar individuals are treated similarly. Based on the goal of treating similar individuals similarly, we formalize and show how to achieve fairness in classification, given a similarity metric. We also observe that our notion of fairness can be seen as a generalization of differential privacy.
Links: Full version (arXiv)
Abstract: Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better?
• We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern.
• We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. In doing so we also give a new learning algorithm for submodular functions that improves upon recent results in a different context.
While interesting from a learning-theoretic point of view, our main applications are in privacy-preserving data analysis:
Here, our second result leads to the first algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average error. This represents significant progress on a key open problem in privacy-preserving data analysis.
Our first result, on the other hand, gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms but also most known private algorithms can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ model as a new barrier in the design of differentially private algorithms.
Links: Full version (ECCC), Talk (PDF)
Short abstract: We consider the following seemingly unrelated questions:
• Is the MaxCut problem hard on random geometric graphs of the type considered by Feige and Schechtman (2002)?
• Is the value of a mathematical relaxation for a constraint-satisfaction problem (CSP) preserved when one passes from an instance P to a random induced subformula of P?
It turns out that the answer to the first question is "no", and in fact this is intimately related to the second question. The answer to the second question is much more subtle; in contrast to the case of the objective value of the CSP, the answer strongly depends on the type of relaxation and CSP.
Links: Full version (PDF), Talk (PDF, Video)
Short abstract: We consider privacy-preserving statistical data analysis with online queries. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive online. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of counting queries that arrive online and may be adaptively chosen. Our mechanism is the first to achieve worst-case accuracy guarantees (accuracy on every input database) for a large number of online queries together with a runtime that is polynomial in the size of the data universe. The error is optimal in its dependence on the size of the database and depends only logarithmically on the number of queries being answered. The runtime is nearly linear in the size of the data universe.
Our main technical contributions are a new application of multiplicative weights techniques to the differential privacy setting, and a new privacy analysis for multiplicative weights algorithms.
Links: Full version (PDF, arXiv), Proceedings (DOI)
Short abstract: We give a connection between convex geometry and differentially private data analysis. Specifically, we study the noise complexity of differentially private mechanisms in the setting where the data is represented by a histogram and the user asks a number of linear queries non-adaptively. We show that the amount of noise necessary and sufficient to achieve differential privacy is determined by two geometric parameters of a convex body associated with the set of queries. We use this connection to give tight upper and lower bounds for random linear queries. Assuming the truth of a deep conjecture from convex geometry, known as the Hyperplane Conjecture, we can extend our results to arbitrary linear queries, giving nearly matching upper and lower bounds.
Short abstract: We show an essentially tight connection between the semidefinite programming relaxation of Unique Games and their behavior under parallel repetition. Our results generalize and help to explain Raz's refutation of the Strong Parallel Repetition Conjecture. Indeed, we show that a Unique Game is a counterexample to strong parallel repetition whenever its SDP value is significantly larger than its integral value.
See my DBLP and Google Scholar entry for more information.
Converted to PDF for compatibility; some step effects are missing. PowerPoint slides available on request.