About Me
I am a Master's student in Computer Science at Stanford University, specializing in Artificial Intelligence and Systems, and a graduate of Electrical Engineering and Computer Sciences at UC Berkeley.
Previously, I spent over two years as a Machine Learning Researcher in the PALLAS Group at the Berkeley Artificial Intelligence Research Lab (BAIR), advised by Professor Kurt Keutzer.
Currently, I work as a Machine Learning Engineer at IntuigenceAI, where I fine-tune engineering models on synthetic data and design and deploy multi-agent LLM systems for industrial applications.
My research focuses on efficient deep learning, particularly for Large Language Models. I am interested in KV cache quantization, speculative decoding, and building scalable AI agent systems. I have published at venues including ACL and NeurIPS. I enjoy working at the intersection of algorithms and systems, turning cutting-edge research into practical, high-impact tools.
Research Areas
Large Language Models
Research on efficient LLM inference, including KV cache quantization and speculative decoding techniques (a brief illustrative sketch appears at the end of this section).
Efficient Deep Learning
Developing algorithms to compress large neural network models, focusing on reducing inference time and improving training efficiency.
AI Agents
Exploring AI Agents, focusing on the key components required to build them and the system-level decisions that affect their performance.
Model Compression
Research on sparsity, quantization, and new training methods to enable models that can learn more efficiently.
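To make the KV cache quantization topic above concrete, the following is a minimal illustrative sketch in Python/NumPy of symmetric per-channel quantization of a cached key tensor. The 4-bit signed scheme, tensor shapes, and function names are assumptions chosen for illustration only; this is not code from KVQuant or any published system.

import numpy as np

def quantize_per_channel(k_cache, bits=4):
    """Symmetric per-channel quantization of a key cache of shape (seq_len, head_dim)."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for signed 4-bit
    scales = np.abs(k_cache).max(axis=0) / qmax  # one scale per channel (head_dim,)
    scales = np.where(scales == 0, 1.0, scales)  # guard against all-zero channels
    q = np.clip(np.round(k_cache / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    """Recover approximate float keys for use in the attention dot product."""
    return q.astype(np.float32) * scales

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k_cache = rng.standard_normal((1024, 128)).astype(np.float32)  # (seq_len, head_dim)
    q, scales = quantize_per_channel(k_cache)
    err = np.abs(dequantize(q, scales) - k_cache).mean()
    print(f"mean absolute reconstruction error: {err:.4f}")

For simplicity the 4-bit codes here are stored in int8 arrays; practical systems pack two codes per byte and keep one scale per channel, which is the basic memory lever behind long-context KV cache compression.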
Publications
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
NeurIPS 2024 (Poster)
SPEED: Speculative Pipelined Execution for Efficient Decoding
ENLSP NeurIPS 2023 Workshop
Plume-induced delamination initiated at rift zones on Venus
Journal of Geophysical Research: Planets
Experience
Work Experience
Machine Learning Engineer and Data Scientist
IntuigenceAI (AI Agents for Industrial)
Designed and deployed multi-agent LLM systems leveraging domain-specific fine-tuned models. Built context-aware RAG pipelines and robust prompt engineering strategies for scalable agent performance. Deployed scalable inference workflows on Azure AI Studio with GPU orchestration and caching strategies.
Modeling and Data Science Intern
Span.io (Series B Startup)
Designed and implemented Python software to solve nonlinear differential equations, speeding up analytics by 75%. Simulated home appliance power consumption using Span Panel data to inform the next product iteration.
Research Experience
Machine Learning Researcher in NLP
PALLAS Group at UC Berkeley AI Research Lab (BAIR)
Built efficient LLM-based systems by contributing to Squeezed Attention, a technique to accelerate LLM inference. Collaborated on KVQuant, which enables serving LLaMA-7B with a 1M-token context on an A100 GPU. Co-authored SPEED (NeurIPS ENLSP 2023), building an architecture that accelerates generative LLM inference by 40%.
Undergraduate NLP Researcher
Sky Computing Lab at UC Berkeley
Completed an individual course of study with Prof. Joseph Gonzalez to design a project on efficient language models. Fine-tuned and prompt-tuned language models to build chatbots focused on scientific articles.
Undergraduate Researcher
Computational Infrastructure for Geodynamics, NSF, UCSD, NASA/JPL
Built and analyzed a model of Venus on supercomputers (Python, Fortran) and co-authored a paper supporting NASA's VERITAS mission, showing 80% faster plume-assisted tectonic subduction.
Education
Master of Science in Computer Science (Artificial Intelligence and Systems Specialization)
Stanford University
Bachelor of Science in Electrical Engineering and Computer Sciences
University of California, Berkeley
Skills
Awards
Third Place at SCET's Annual Collider Cup XIII
Won for the TensorZipper project, a novel AI model compression algorithm
December 2023
Anyscale's Sponsor Prize
Winner at the SkyDeck and Cal Hacks AI Hackathon
Summer 2023
Undergraduate Summer Fellowship
Two-time recipient from the Sky Computing Lab at UC Berkeley
2022, 2023
Contact
Feel free to reach out for research and project collaborations, questions, or opportunities.