About Me

I am a Master's student in Computer Science at Stanford University, specializing in Artificial Intelligence and Systems, and a graduate of Electrical Engineering and Computer Sciences at UC Berkeley.

Previously, I spent over two years as a Machine Learning Researcher in the PALLAS Group at the Berkeley Artificial Intelligence Research Lab (BAIR), advised by Professor Kurt Keutzer.

Currently, I work as a Machine Learning Engineer at IntuigenceAI, where I fine-tune models on synthetic engineering data and design and deploy multi-agent LLM systems for industrial applications.

My research focuses on efficient deep learning, particularly for large language models. I am interested in KV cache quantization, speculative decoding, and building scalable AI agent systems. I have published at venues including ACL and NeurIPS. I enjoy working at the intersection of algorithms and systems, turning cutting-edge research into practical, high-impact tools.

Research Areas

Large Language Models

Research on efficient LLM inference, including KV cache quantization and speculative decoding techniques.

Efficient Deep Learning

Developing algorithms to compress large neural network models, focusing on reducing inference time and improving training efficiency.

AI Agents

Exploring AI agents, focusing on the key components required to build them and the system-level decisions that affect their performance.

Model Compression

Research on sparsity, quantization, and new training methods to enable models that can learn more efficiently.

Publications

Squeezed Attention: Accelerating Long Context Length LLM Inference

Coleman Hooper*, Sehoon Kim*, Hiva Mohammadzadeh, Monishwaran Maheswaran, June Paik, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

ACL 2025

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

NeurIPS 2024 (Poster)

SPEED: Speculative Pipelined Execution for Efficient Decoding

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

ENLSP NeurIPS 2023 Workshop

Plume-induced delamination initiated at rift zones on Venus

Andrea Adams, Dave Stegman, Hiva Mohammadzadeh, Suzanne Smrekar, Paul Tackley

Journal of Geophysical Research: Planets

Experience

Work Experience

Dec 2024 - Present

Machine Learning Engineer and Data Scientist

IntuigenceAI (AI Agents for Industry)

Designed and deployed multi-agent LLM systems leveraging domain-specific fine-tuned models. Built context-aware RAG pipelines and robust prompt-engineering strategies for reliable agent performance. Deployed scalable inference workflows on Azure AI Studio with GPU orchestration and caching strategies.

May 2022 - Sep 2022

Modeling and Data Science Intern

Span.io (Series B Startup)

Designed and implemented Python software to solve nonlinear differential equations, speeding up analytics by 75%. Simulated home-appliance power consumption using Span Panel data to inform the next product iteration.

Research Experience

Feb 2023 - Aug 2025

Machine Learning Researcher in NLP

PALLAS Group at UC Berkeley AI Research Lab (BAIR)

Built efficient LLM-based systems, contributing to Squeezed Attention, a technique to accelerate long-context LLM inference. Collaborated on KVQuant, enabling serving LLaMA-7B with a 1M-token context on a single A100 GPU. Co-authored SPEED (NeurIPS ENLSP 2023), an architecture that accelerates generative LLM inference by 40%.

Aug 2022 - Feb 2023

Undergraduate NLP Researcher

Sky Computing Lab at UC Berkeley

Completed an individual course of study with Prof. Joseph Gonzalez to design a project on efficient language models. Fine-tuned and prompt-tuned language models to build chatbots focused on scientific articles.

Jun 2021 - Oct 2021

Undergraduate Researcher

Computational Infrastructure for Geodynamics, NSF, UCSD, NASA/JPL

Built and analyzed a model of Venus on supercomputers (Python, Fortran) and co-authored a paper supporting NASA's VERITAS mission, showing that plume-assisted tectonic subduction proceeds 80% faster.

Education

Sep 2025 - Jun 2027

Master of Science in Computer Science (Artificial Intelligence and Systems Specialization)

Stanford University

Aug 2021 - Dec 2023

Bachelor of Science: Electrical Engineering and Computer Sciences

University of California, Berkeley

Skills

Python
PyTorch
Machine Learning
C/C++
Java
SQL

Awards

Third Place at SCET's Annual Collider Cup XIII

Won for the TensorZipper Project - a novel AI model compression algorithm

December 2023

AnyScale's Sponsor Prize

Winner at the SkyDeck and Cal Hacks AI Hackathon

Summer 2023

Undergraduate Summer Fellowship

Two-time recipient from Sky Computing Lab at UC Berkeley

2022, 2023

Projects

Neural Verifier for Structured Table Extraction

Framework to evaluate the structural correctness of tables extracted from documents. Uses neural scoring to detect malformed or incomplete tables beyond rule-based approaches.

Stanford · Sep 2025 - Dec 2025

Self-Evolving LLM Agent with Long-Term Reasoning Memory

Continual-learning conversational AI agent with persistent reasoning memory. Integrates ReasoningBank-style memory for retrieval, tool use, and self-reflection without retraining.

Stanford · Sep 2025 - Dec 2025

DIQL: Distance-Sensitive Q-Learning

Novel offline reinforcement learning algorithm that adjusts conservatism based on dataset point proximity, extending Implicit Q-Learning with distance-sensitive adjustments.

UC Berkeley · Aug 2023 - Dec 2023

Exploration of Transformer Model Layer Interaction and Optimal Merging Strategies

Examines merging transformer models by mixing layers using model averaging and Fisher-weighted averaging. Introduces a Domain Specific Language for weaving layer configurations.

UC Berkeley · Aug 2023 - Dec 2023

Implementing Policy Gradients in Aiding Transformer Interpretability

Extends transformer interpretability research by combining reinforcement learning (PPO, policy gradients) with interpretability techniques for transformer adjustment and analysis.

UC Berkeley · Mar 2023 - May 2023

Survey of LLM Architectures, Benchmarks, and Techniques

Comprehensive survey of BERT, GPT-2, T5, LLaMA, activation functions, normalization techniques, GLUE/SuperGLUE benchmarks, and prompt tuning. Expanded into a 10-part blog series.

UC Berkeley CS199 · 2023

NumC

High-performance numerical computing library -- a simplified NumPy optimized with RISC-V assembly and C for maximum speed.

UC Berkeley CS61C · Apr 2022 - May 2022

Gitlet

A version-control system mimicking core Git features -- commits, branching, merging, and remote operations implemented from scratch in Java.

UC Berkeley CS61B · Apr 2022

Ataxx

Two-player strategy board game on a 7x7 board with an AI opponent achieving 80% win rate, complete with GUI visualization.

UC Berkeley CS61B · Mar 2022 - Apr 2022

Enigma

Software representation of the WWII German Enigma encryption machine with configurable rotors and plugboard.

UC Berkeley CS61B · Feb 2022 - Mar 2022

Scheme Interpreter

Built an interpreter for a subset of the Scheme programming language, implementing parsing, evaluation, and tail-call optimization in Python.

UC Berkeley CS61A · Nov 2021 - Dec 2021

Interactions between Subduction and Mantle Plumes on Venus

Geodynamics model examining plume-lithosphere interactions on Venus, run on STAMPEDE supercomputer. Published in Journal of Geophysical Research: Planets (Oct 2023).

UCSD / NASA JPL · Jun 2021 - Oct 2021

Contact

Feel free to reach out for research and project collaborations, questions, or opportunities.