Talks and presentations

From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction

August 14, 2023

Talk, 2023 ACS Fall, San Francisco, California

Introduces Joint Multi-Domain Pre-training (JMP), a robust supervised pre-training approach that simultaneously trains on data from multiple chemical domains. JMP achieves state-of-the-art results on key small-molecule, large-molecule, and materials datasets and offers insights into how pre-training strategies influence fine-tuning.

SmaQ: Smart Quantization for DNN Training by Exploiting Value Clustering

April 30, 2021

Talk, CS 7290: Advanced Microarchitecture (Georgia Tech), Atlanta, Georgia

Introduces the Smart Quantization (SmaQ) technique for DNN training. SmaQ is a novel quantization scheme that exploits the observed, approximately normally distributed clustering of values in DNNs to quantize neural network weights, gradients, feature maps, gradient maps, and optimizer state. SmaQ reduces memory usage during training by up to 6.7x with no loss in accuracy.
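To illustrate the core idea, here is a minimal sketch of distribution-aware quantization: if a tensor's values cluster around a normal distribution, it can be stored as just a mean, a standard deviation, and low-bit codes over a clipped z-score range. This is a hypothetical simplification for intuition, not the actual SmaQ format or bit allocation.

```python
import numpy as np

def quantize(x, bits=8, clip_sigmas=3.0):
    """Quantize a tensor assuming roughly normal value clustering.

    Stores only the mean, std, and low-bit integer codes
    (illustrative sketch, not the exact SmaQ scheme).
    """
    mu, sigma = x.mean(), x.std()
    levels = 2 ** bits - 1
    # Normalize to standard deviations, clip the tails,
    # then map the clipped range onto integer codes.
    z = np.clip((x - mu) / (sigma + 1e-12), -clip_sigmas, clip_sigmas)
    codes = np.round((z + clip_sigmas) / (2 * clip_sigmas) * levels)
    return codes.astype(np.uint8), mu, sigma

def dequantize(codes, mu, sigma, bits=8, clip_sigmas=3.0):
    """Reconstruct approximate values from codes and the stored stats."""
    levels = 2 ** bits - 1
    z = codes.astype(np.float64) / levels * (2 * clip_sigmas) - clip_sigmas
    return z * sigma + mu

# Example: weights drawn from a normal distribution, as often observed in DNNs.
x = np.random.default_rng(0).normal(0.0, 0.02, size=1000)
codes, mu, sigma = quantize(x)
x_hat = dequantize(codes, mu, sigma)
```

Because most values fall within a few standard deviations of the mean, 8-bit (or fewer) codes recover the tensor with small error relative to its spread.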

Legal Text Summarization Using Transformer Models

November 23, 2020

Talk, CS 8803-DLT: Deep Learning for Text (Georgia Tech), Atlanta, Georgia

Develops a transformer-based encoder-decoder architecture for abstractive legal text summarization. Combines PEGASUS's (Zhang et al., 2020) pre-training objective with Longformer's (Beltagy et al., 2020) dilated attention mechanism to create a model that can handle extremely long input sequences when generating summaries of legal documents. Achieves state-of-the-art summarization performance on the BIGPATENT dataset.

Paper Review: Attention is All You Need

September 16, 2020

Talk, CS 8803-DLT: Deep Learning for Text (Georgia Tech), Atlanta, Georgia

Paper review of Attention is All You Need (Vaswani et al., 2017), which introduces the Transformer, a sequence-to-sequence model that uses the attention mechanism to learn dependencies between input and output tokens. Goes in-depth into the self-attention and multi-head attention mechanisms and discusses the advantages of the Transformer over recurrent neural networks.
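The scaled dot-product attention at the heart of the Transformer can be sketched in a few lines of NumPy: attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as defined in Vaswani et al. (2017). The shapes and data below are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Example: 4 query tokens attending over 6 key/value tokens, d_k = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out, weights = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value vectors, with the 1/sqrt(d_k) scaling keeping the dot products from saturating the softmax; multi-head attention simply runs several such attentions in parallel over learned projections.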