Publications

Conference Papers


FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?

Published in Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024

Standard federated learning (FL) approaches are vulnerable to the free-rider dilemma: participating agents can contribute little to nothing yet receive a well-trained aggregated model. While prior mechanisms attempt to solve the free-rider dilemma, none have addressed the issue of truthfulness. In practice, adversarial agents can provide false information to the server in order to cheat their way out of contributing to federated training. In an effort to make free-riding-averse federated mechanisms truthful, and consequently less prone to breaking down in practice, we propose FACT. FACT is the first federated mechanism that: (1) eliminates federated free riding by using a penalty system, (2) ensures agents provide truthful information by creating a competitive environment, and (3) encourages agent participation by offering better performance than training alone. Empirically, FACT avoids free riding even when agents are untruthful, and reduces agent loss by over 4x.
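
As a rough illustration of the kind of incentive at play (not FACT's actual payment rule, which is derived in the paper), the sketch below assumes a toy utility where each agent receives the benefit of the aggregate model minus a penalty proportional to how far its realized contribution falls short of its declared one; the parameters and functional forms are illustrative assumptions only.

```python
# Toy penalty-style federated incentive: agents declare a contribution and
# are penalized for under-delivering. This is an illustrative stand-in for
# FACT's mechanism, not the mechanism itself.
from dataclasses import dataclass

@dataclass
class Agent:
    declared: float   # contribution the agent promises (e.g., number of samples)
    realized: float   # contribution the agent actually provides

def agent_utility(agent: Agent, model_benefit: float, penalty_rate: float) -> float:
    """Benefit of the aggregate model minus a penalty for under-delivery.

    `model_benefit` and `penalty_rate` are placeholder parameters; in FACT,
    utilities come from agents' losses and the derived penalty scheme.
    """
    shortfall = max(agent.declared - agent.realized, 0.0)
    return model_benefit - penalty_rate * shortfall

agents = [Agent(declared=100, realized=100),   # truthful contributor
          Agent(declared=100, realized=10)]    # attempted free rider

for i, a in enumerate(agents):
    print(f"agent {i}: utility = {agent_utility(a, model_benefit=5.0, penalty_rate=0.1):.2f}")
```

Under this toy rule, under-delivering strictly lowers utility, which is the qualitative effect a penalty system targets.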

View Paper

Large-Scale Distributed Learning via Private On-Device LSH

Published in Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS), 2023

Locality-sensitive hashing (LSH) based frameworks have been used efficiently to select weight vectors in a dense hidden layer with high cosine similarity to an input, enabling dynamic pruning. While this type of scheme has been shown to improve computational training efficiency, existing algorithms require repeated randomized projection of the full layer weights, which is impractical for computation- and memory-constrained devices. In a distributed setting, deferring LSH analysis to a centralized host is (i) slow if the device cluster is large and (ii) requires access to input data, which is forbidden in a federated context. Using a new family of hash functions, we develop one of the first private, personalized, and memory-efficient on-device LSH frameworks. Our framework enables privacy and personalization by allowing each device to generate hash tables, without the help of a central host, using device-specific hashing hyper-parameters (e.g., number of hash tables or hash length). Hash tables are generated from a compressed set of the full weights, and can be serially built and discarded if the process is memory-intensive. This allows devices to avoid maintaining (i) the fully-sized model and (ii) large numbers of hash tables in local memory for LSH analysis. We prove several statistical and sensitivity properties of our hash functions and experimentally demonstrate that our framework is competitive in training large-scale recommender networks compared to other LSH frameworks which assume unrestricted on-device capacity.
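
The sketch below illustrates, under simplifying assumptions, how a device might serially build SimHash-style hash tables from a compressed view of a layer's weights using device-specific hyper-parameters; the compression step (random column subsampling) and the hash family are assumptions for illustration, not the paper's specific constructions.

```python
# Hypothetical on-device LSH table construction: each table is built from a
# compressed slice of the layer weights, so the device never needs a full
# randomized projection of the full weight matrix in memory at once.
import numpy as np

def build_hash_tables(weights, num_tables, hash_length, compress_dim, rng):
    """Serially build `num_tables` SimHash tables over the rows of `weights`."""
    tables = []
    for _ in range(num_tables):
        # Compression (illustrative): keep a random subset of coordinates.
        cols = rng.choice(weights.shape[1], size=compress_dim, replace=False)
        w_small = weights[:, cols]

        # SimHash: signs of random projections give a binary code per neuron.
        proj = rng.standard_normal((compress_dim, hash_length))
        codes = (w_small @ proj > 0).astype(np.uint8)

        # Bucket neuron indices by their binary code.
        table = {}
        for idx, code in enumerate(map(tuple, codes)):
            table.setdefault(code, []).append(idx)
        tables.append((cols, proj, table))   # each table can be discarded after use
    return tables

rng = np.random.default_rng(0)
layer = rng.standard_normal((1024, 256))     # hypothetical dense layer
tables = build_hash_tables(layer, num_tables=4, hash_length=8,
                           compress_dim=64, rng=rng)
```

At query time, an input would be hashed with the same (cols, proj) pair and candidate neurons read from the matching bucket; because each device chooses its own number of tables and hash length and never contacts a host, the construction is personalized and private in the sense described above.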

View Paper

SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Published in The Eleventh International Conference on Learning Representations (ICLR), 2023

The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models, where the speed of synchronization depends upon the slowest client. In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. Theoretically, we prove that SWIFT matches the gold-standard iteration convergence rate O(1/√T) of parallel stochastic gradient descent for convex and non-convex smooth optimization (total iterations T). Furthermore, we provide theoretical results for IID and non-IID settings without any bounded-delay assumption for slow clients, which is required by other asynchronous decentralized FL algorithms. Although SWIFT achieves the same iteration convergence rate with respect to T as other state-of-the-art (SOTA) parallel stochastic algorithms, it converges faster with respect to run-time due to its wait-free structure. Our experimental results demonstrate that SWIFT’s run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms.
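
To make the wait-free idea concrete, here is a minimal, assumption-laden sketch (toy quadratic objectives, a ring topology, fixed mixing weights) in which each client trains at its own pace and averages with whatever neighbor models are currently posted, with no synchronization barrier; it is not the SWIFT algorithm itself.

```python
# Wait-free decentralized training sketch: clients never block on slow peers;
# they read the latest (possibly stale) neighbor models and post their own.
# The lock below only guards shared memory in this toy; it is not a barrier.
import threading
import numpy as np

NUM_CLIENTS, DIM, STEPS = 4, 10, 200
shared = [np.zeros(DIM) for _ in range(NUM_CLIENTS)]   # latest posted models
lock = threading.Lock()
rng = np.random.default_rng(0)
targets = [rng.standard_normal(DIM) for _ in range(NUM_CLIENTS)]  # toy local optima

def client(i, lr=0.1, mix=0.5):
    x = np.zeros(DIM)
    neighbors = [(i - 1) % NUM_CLIENTS, (i + 1) % NUM_CLIENTS]    # ring topology
    local_rng = np.random.default_rng(100 + i)
    for _ in range(STEPS):
        # Local stochastic-gradient step on a toy quadratic objective.
        grad = x - targets[i] + 0.01 * local_rng.standard_normal(DIM)
        x -= lr * grad
        # Wait-free communication: no barrier, no waiting for stragglers.
        with lock:
            neigh_avg = np.mean([shared[j] for j in neighbors], axis=0)
            x = (1 - mix) * x + mix * neigh_avg
            shared[i] = x.copy()

threads = [threading.Thread(target=client, args=(i,)) for i in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("spread across client models:", np.std(np.stack(shared), axis=0).mean())
```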

View Paper

Optimal nanoparticles for heat absorption and cost

Published in International Journal of Heat and Mass Transfer, 2019

As the demand for solar energy increases, so too does the demand for efficient absorption of solar energy and solar energy cost reduction. Prior research has shown that nanofluids improve the absorption efficiency of solar thermal collectors. Nanofluids are suspensions of varying-sized nanoparticles in a base fluid. This research formulates and solves linear and non-linear programs to determine two optimally efficient nanoparticle combinations. The first optimal combination maximizes the absorption of heat within a heterogeneous nanofluid, while the second minimizes the cost of another heterogeneous nanofluid. The nanoparticles analyzed for optimal efficiencies are silver, aluminum, gold, cobalt, copper, chromium, iron, manganese, nickel, palladium, titanium, vanadium, and graphite, all ranging in size from 10 nm to 150 nm. These nanoparticles form the database of particles over which the optimization is performed; multiple subsets of the database are also optimized in this research. The optimal combinations of these nanoparticles are determined numerically for varying particle concentration, container height, and temperature.
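
For intuition, the following is a small linear-program sketch in the spirit of the absorption-maximization formulation: choose volume fractions of candidate nanoparticles to maximize a linearized absorption objective subject to a total-loading limit and a cost budget. The absorption and cost coefficients, limits, and particle subset are placeholders, not values from the paper.

```python
# Illustrative nanoparticle-mix linear program (placeholder coefficients).
import numpy as np
from scipy.optimize import linprog

particles = ["silver", "gold", "copper", "graphite"]
absorption = np.array([0.92, 0.88, 0.75, 0.95])   # per unit volume fraction (hypothetical)
cost = np.array([520.0, 61000.0, 9.0, 2.0])       # $ per unit volume fraction (hypothetical)

max_total_fraction = 0.01   # cap on total nanoparticle loading
max_cost = 5.0              # budget for the mixture

# linprog minimizes, so negate the absorption objective to maximize it.
res = linprog(
    c=-absorption,
    A_ub=np.vstack([np.ones(len(particles)), cost]),
    b_ub=[max_total_fraction, max_cost],
    bounds=[(0.0, max_total_fraction)] * len(particles),
    method="highs",
)

for name, frac in zip(particles, res.x):
    print(f"{name:9s} volume fraction = {frac:.5f}")
print("maximized absorption proxy:", -res.fun)
```

The cost-minimization counterpart swaps the roles of objective and constraint (minimize cost subject to a minimum absorption level).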

View Paper

Preprints


Auction-Based Regulation for Artificial Intelligence

Published in arXiv, 2024

In an era of “moving fast and breaking things”, regulators have moved slowly to pick up the safety, bias, and legal pieces left in the wake of broken Artificial Intelligence (AI) deployment. Since AI models, such as large language models, are able to push misinformation and stoke division within our society, it is imperative for regulators to employ a framework that mitigates these dangers and ensures user safety. While there is much-warranted discussion about how to address the safety, bias, and legal woes of state-of-the-art AI models, rigorous and realistic mathematical frameworks to regulate AI safety are lacking. We take on this challenge, proposing an auction-based regulatory mechanism that provably incentivizes model-building agents (i) to deploy safer models and (ii) to participate in the regulation process. We provably guarantee, via derived Nash Equilibria, that each participating agent’s best strategy is to submit a model safer than a prescribed minimum-safety threshold. Empirical results show that our regulatory auction boosts safety and participation rates by 20% and 15% respectively, outperforming simple regulatory frameworks that merely enforce minimum safety standards. Code can be found on GitHub at https://github.com/marcobornstein/AI-Regulatory-Auctions.
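
As a loose illustration of the allocation side of such a mechanism (the paper's actual auction and payment rules live in the linked repository and are not reproduced here), the toy below grants a limited number of deployment permits to the safest submissions that clear a minimum-safety threshold; all names and numbers are hypothetical.

```python
# Toy regulatory allocation: permits go to the safest eligible submissions.
# This simplification omits the bids/payments that drive the paper's
# Nash-equilibrium guarantees.
from dataclasses import dataclass

@dataclass
class Submission:
    agent: str
    safety: float   # audited safety score in [0, 1]

def regulatory_auction(submissions, min_safety, num_permits):
    eligible = [s for s in submissions if s.safety >= min_safety]
    return sorted(eligible, key=lambda s: s.safety, reverse=True)[:num_permits]

submissions = [Submission("A", 0.62), Submission("B", 0.81), Submission("C", 0.47)]
for winner in regulatory_auction(submissions, min_safety=0.5, num_permits=2):
    print(f"permit granted to {winner.agent} (safety {winner.safety:.2f})")
```

Because permits go to the safest eligible models rather than to anything that merely clears the bar, agents have a reason to exceed the threshold, which is the qualitative property the paper formalizes.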

View Paper

Towards Realistic Mechanisms That Incentivize Federated Participation and Contribution

Published in arXiv, 2024

Edge device participation in federated learning (FL) is typically studied through the lens of device-server communication (e.g., device dropout) and assumes an undying desire from edge devices to participate in FL. As a result, current FL frameworks are flawed when implemented in realistic settings, with many encountering the free-rider dilemma. In a step to push FL towards realistic settings, we propose RealFM: the first federated mechanism that (1) realistically models device utility, (2) incentivizes data contribution and device participation, (3) provably removes the free-rider dilemma, and (4) relaxes assumptions on data homogeneity and data sharing. Compared to previous FL mechanisms, RealFM allows for a non-linear relationship between model accuracy and utility, which improves the utility gained by the server and participating devices. On real-world data, RealFM improves device and server utility, as well as data contribution, by over 3 and 4 orders of magnitude respectively, compared to baselines.
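
The snippet below sketches, with made-up functional forms, the non-linear accuracy-to-utility trade-off that such a mechanism permits: a device picks the data contribution that maximizes a non-linear function of achieved accuracy minus its marginal data cost. The accuracy curve, utility mapping, and cost are illustrative assumptions, not RealFM's modeling choices.

```python
# Hypothetical device-side trade-off: non-linear utility of accuracy vs.
# linear cost of contributing data.
import numpy as np

def accuracy(m, a_max=0.95, k=5e-4):
    """Diminishing-returns accuracy as a function of contributed samples m."""
    return a_max * (1.0 - np.exp(-k * m))

def device_utility(m, marginal_cost=1e-4, power=4.0):
    # Non-linear mapping phi(a) = a**power rewards accuracy gains near the top.
    return accuracy(m) ** power - marginal_cost * m

m_grid = np.arange(0, 20001, 50)
best_m = m_grid[np.argmax(device_utility(m_grid))]
print(f"utility-maximizing contribution: {best_m} samples "
      f"(accuracy {accuracy(best_m):.3f})")
```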

View Paper

Comfetch: Federated Learning of Large Networks on Constrained Clients via Sketching

Published in arXiv, 2023

Federated learning (FL) is a popular paradigm for private and collaborative model training on the edge. In centralized FL, the parameters of a global architecture (such as a deep neural network) are maintained and distributed by a central server/controller to clients who transmit model updates (gradients) back to the server based on local optimization. While many efforts have focused on reducing the communication complexity of gradient transmission, the vast majority of compression-based algorithms assume that each participating client is able to download and train the current and full set of parameters, which may not be a practical assumption depending on the resource constraints of smaller clients such as mobile devices. In this work, we propose a simple yet effective novel algorithm, Comfetch, which allows clients to train large networks using reduced representations of the global architecture via the count sketch, which reduces local computational and memory costs along with bi-directional communication complexity. We provide a nonconvex convergence guarantee and experimentally demonstrate that it is possible to learn large models, such as a deep convolutional network, through federated training on their sketched counterparts. The resulting global models exhibit competitive test accuracy over CIFAR10/100 classification when compared against un-compressed model training.
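
Below is a minimal count-sketch compression of a weight matrix, the data structure the abstract refers to; the shapes, single-row sketch, and the way a client would use the reconstruction are illustrative simplifications rather than Comfetch's full training procedure.

```python
# Count sketch of a layer's weights: hash each column index to a bucket with
# a random sign, so clients can hold a much smaller representation.
import numpy as np

class CountSketch:
    def __init__(self, dim, sketch_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.buckets = rng.integers(0, sketch_dim, size=dim)   # h: [dim] -> [sketch_dim]
        self.signs = rng.choice([-1.0, 1.0], size=dim)         # s: [dim] -> {-1, +1}
        self.sketch_dim = sketch_dim

    def compress(self, w):
        """Sketch each row of w from `dim` columns down to `sketch_dim`."""
        sketch = np.zeros((w.shape[0], self.sketch_dim))
        np.add.at(sketch.T, self.buckets, (w * self.signs).T)
        return sketch

    def decompress(self, sketch):
        """Unbiased (but noisy) estimate of the original entries."""
        return sketch[:, self.buckets] * self.signs

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 4096))        # hypothetical layer weights
cs = CountSketch(dim=4096, sketch_dim=512)
W_hat = cs.decompress(cs.compress(W))      # reduced representation a client could train against
print("relative reconstruction error:",
      np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```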

View Paper

Escaping From Saddle Points Using Asynchronous Coordinate Gradient Descent

Published in arXiv, 2022

Large-scale non-convex optimization problems are expensive to solve due to computational and memory costs. To reduce the costs, first-order (computationally efficient) and asynchronous-parallel (memory efficient) algorithms are necessary to minimize non-convex functions in machine learning. However, asynchronous first-order methods applied within non-convex settings run into two difficulties: (i) parallelization delays, which affect convergence by disrupting the monotonicity of first-order methods, and (ii) sub-optimal saddle points where the gradient is zero. To solve these two difficulties, we propose an asynchronous coordinate gradient descent algorithm shown to converge to local minima under a bounded delay. Our algorithm overcomes parallelization-delay issues by using a carefully constructed Hamiltonian function. We prove that our designed kinetic-energy term, incorporated within the Hamiltonian, allows our algorithm to decrease monotonically per iteration. Next, our algorithm steers iterates clear of saddle points by utilizing a perturbation sub-routine. Similar to other state-of-the-art (SOTA) algorithms, we achieve a poly-logarithmic convergence rate with respect to dimension. Unlike other SOTA algorithms, which are synchronous, our work is the first to study how parallelization delays affect the convergence rate of asynchronous first-order algorithms. We prove that our algorithm outperforms synchronous counterparts under large parallelization delays, with convergence depending sublinearly on the delays. To our knowledge, this is the first local-optima convergence result for a first-order asynchronous algorithm in non-convex settings.
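
The fragment below isolates just the perturbation idea on a toy function with a strict saddle at the origin: coordinates are updated one at a time, and when the gradient is nearly zero a small random kick moves the iterate off the saddle. Delays, the Hamiltonian construction, and the paper's convergence analysis are not modeled; everything here is an illustrative simplification.

```python
# Toy perturbed coordinate gradient descent escaping a strict saddle point.
import numpy as np

def grad(x):
    # f(x) = 0.5 * x[0]**2 - 0.25 * x[1]**2 has a strict saddle at the origin.
    return np.array([x[0], -0.5 * x[1]])

rng = np.random.default_rng(0)
x = np.array([1e-3, 0.0])          # start near the saddle
lr, eps, radius = 0.1, 1e-4, 1e-2

for t in range(300):
    g = grad(x)
    if np.linalg.norm(g) < eps:
        # Perturbation sub-routine: random kick inside a small box.
        x = x + rng.uniform(-radius, radius, size=x.shape)
        continue
    i = t % x.size                 # update one coordinate per iteration
    x[i] -= lr * g[i]

f_final = 0.5 * x[0]**2 - 0.25 * x[1]**2
print("final iterate:", x, "| escaped below the saddle value:", f_final < 0.0)
```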

View Paper