About Me

I am a Master's student in Computer Science at Stanford University, specializing in Artificial Intelligence and Systems, and a graduate of Electrical Engineering and Computer Sciences at UC Berkeley.

Previously, I spent over two years as a Machine Learning Researcher in the PALLAS Group at the Berkeley Artificial Intelligence Research (BAIR) Lab, advised by Professor Kurt Keutzer.

Currently, I am working as a Machine Learning Engineer at IntuigenceAI, where I fine-tune synthetic engineering models and design and deploy multi-agent LLM systems for industrial applications.

My research focuses on efficient deep learning, particularly for large language models. I am interested in KV cache quantization, speculative decoding, and building scalable AI agent systems. I have published at venues including ACL and NeurIPS. I enjoy working at the intersection of algorithms and systems, turning cutting-edge research into practical, high-impact tools.

Research Areas

Large Language Models

Research on efficient LLM inference, including KV cache quantization and speculative decoding techniques.
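As a toy illustration of the idea behind KV cache quantization (a generic sketch, not code from any of the papers listed below): cached keys and values can be stored in low-bit integers with a per-channel scale, trading a small amount of precision for a large memory saving.

```python
import numpy as np

def quantize_kv(cache: np.ndarray, bits: int = 4):
    """Per-channel symmetric quantization of a KV cache slab.

    cache: (seq_len, head_dim) float array of cached keys or values.
    Returns integer codes plus per-channel scales for dequantization.
    """
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit
    scales = np.abs(cache).max(axis=0) / qmax    # one scale per channel
    scales = np.where(scales == 0, 1.0, scales)  # guard against all-zero channels
    codes = np.clip(np.round(cache / scales), -qmax - 1, qmax).astype(np.int8)
    return codes, scales

def dequantize_kv(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate float cache from codes and scales."""
    return codes.astype(np.float32) * scales

# Round-trip a random cache and measure the worst-case reconstruction error.
kv = np.random.randn(128, 64).astype(np.float32)
codes, scales = quantize_kv(kv, bits=4)
err = np.abs(dequantize_kv(codes, scales) - kv).max()
```

Because quantization is symmetric and per-channel, the worst-case error is bounded by half the largest channel scale; published methods refine this basic recipe with non-uniform codebooks, outlier handling, and pre-rotation.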

Efficient Deep Learning

Developing algorithms to compress large neural network models, focusing on reducing inference time and improving training efficiency.

AI Agents

Exploring AI agents, focusing on the key components required to build them and the system-level decisions that affect their performance.

Model Compression

Research on sparsity, quantization, and new training methods to enable models that can learn more efficiently.

Publications

Squeezed Attention: Accelerating Long Context Length LLM Inference

Coleman Hooper*, Sehoon Kim*, Hiva Mohammadzadeh, Monishwaran Maheswaran, June Paik, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

ACL 2025

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

NeurIPS 2024 (Poster)

SPEED: Speculative Pipelined Execution for Efficient Decoding

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

ENLSP NeurIPS 2023 Workshop

Plume-induced delamination initiated at rift zones on Venus

Andrea Adams, Dave Stegman, Hiva Mohammadzadeh, Suzanne Smrekar, Paul Tackley

Journal of Geophysical Research: Planets

Experience

Work Experience

Dec 2024 - Present

Machine Learning Engineer and Data Scientist

IntuigenceAI (AI Agents for Industrial)

Designed and deployed multi-agent LLM systems leveraging domain-specific fine-tuned models. Built context-aware RAG pipelines and robust prompt engineering strategies for scalable agent performance. Deployed scalable inference workflows on Azure AI Studio with GPU orchestration and caching strategies.

May 2022 - Sep 2022

Modeling and Data Science Intern

Span.io (Series B Startup)

Designed and implemented Python software to solve nonlinear differential equations, speeding up analytics by 75%. Simulated home appliance power consumption using Span Panel data to inform the next product iteration.

Research Experience

Feb 2023 - Aug 2025

Machine Learning Researcher in NLP

PALLAS Group at UC Berkeley AI Research Lab (BAIR)

Built efficient LLM-based systems, contributing to Squeezed Attention, a technique for accelerating long-context LLM inference. Collaborated on KVQuant, enabling serving of LLaMA-7B with a 1M-token context on a single A100 GPU. Co-authored SPEED (NeurIPS ENLSP 2023), building an architecture that accelerates generative LLM inference by 40%.

Aug 2022 - Feb 2023

Undergraduate NLP Researcher

Sky Computing Lab at UC Berkeley

Completed an individual course of study with Prof. Joseph Gonzalez to design a project on efficient language models. Fine-tuned and prompt-tuned language models to build chatbots focused on scientific articles.

Jun 2021 - Oct 2021

Undergraduate Researcher

Computational Infrastructure for Geodynamics, NSF, UCSD, NASA/JPL

Built and analyzed a model of Venus on supercomputers (Python, Fortran) and co-authored a paper supporting NASA's VERITAS mission, showing that plume-assisted tectonic subduction proceeds 80% faster.

Education

Sep 2025 - Jun 2027

Master of Science in Computer Science (Artificial Intelligence and Systems Specialization)

Stanford University

Aug 2021 - Dec 2023

Bachelor of Science: Electrical Engineering and Computer Sciences

University of California, Berkeley

Skills

Python
PyTorch
Machine Learning
C/C++
Java
SQL

Awards

Third Place at SCET's Annual Collider Cup XIII

Won for the TensorZipper project, a novel AI model compression algorithm

December 2023

AnyScale's Sponsor Prize

Winner at the Berkeley SkyDeck and Cal Hacks AI Hackathon

Summer 2023

Undergraduate Summer Fellowship

Two-time recipient from the Sky Computing Lab at UC Berkeley

2022, 2023

Contact

Feel free to reach out for research and project collaborations, questions, or opportunities.