Arjun Ashok

I am a Visiting Researcher (Full-Time) at ServiceNow Research, Montreal and a PhD student at MILA-Quebec AI Institute and CERC-AAI, Université de Montréal, advised by Irina Rish and Alexandre Drouin. At ServiceNow, I also work closely with Étienne Marcotte, Valentina Zantedeschi and Nicolas Chapados. My current research interests are in time series forecasting and decision-making, with a focus on designing scalable general-purpose models for time series prediction tasks (forecasting, interpolation, imputation, etc.).

My email address is arjun.ashok [at] servicenow [dot] com.




News

Feb '24 The full version of Lag-Llama is released with open-source model checkpoints! Check the announcement here!
Jan '24 I gave a talk on our efforts Towards General-Purpose Models for Time-Series Prediction at the Winter 2024 Montreal Time Series Meetup.
Jan '24 TACTiS-2 accepted at ICLR 2024!
Dec '23 I gave a talk on Building Foundation Models for Time Series Data at the 6th workshop on Neural Scaling Laws co-located with NeurIPS 2023.
Oct '23 TACTiS-2 is out on arXiv.
Oct '23 A preliminary version of Lag-Llama is out on arXiv.
Jan '23 One paper on out-of-distribution detection accepted to ICLR 2023. This is work in collaboration with folks at ML Collective mentored by Rosanne Liu.
Jan '23 Started as a Visiting Researcher (Full-Time) at ServiceNow Research, Montreal. Excited to continue working on problems in time series representation learning!
Aug '22 Preliminary work on self-supervised learning objectives for weather time series accepted at the AAAI 2022 Fall Symposium on Climate Change.
Jul '22 One paper on Class-Incremental Learning accepted as a full paper at ECCV 2022.
Jun '22 Started as a Research Intern at IBM Research, India. I'll be working on building self-supervised learning objectives and pre-trained models for geospatial weather time series.
Jun '22 One paper on cross-task generalization in NLP submitted to EMNLP 2022 (Update: Accepted).
Apr '22 One paper on Class-Incremental Learning accepted at the CLVISION Workshop at CVPR 2022 as a non-archival paper (Update: Accepted at ECCV 2022).
Apr '22 One reproducibility report on Self-Supervision and Few-shot Learning accepted at the ML Reproducibility Challenge 2021 (Fall Edition) and published at ReScience-C.
Oct '21 One paper on out-of-distribution generalization accepted at AAAI 2022 as a student abstract.
Jun '21 Started as a Research Assistant at IIT Hyderabad under Prof. Vineeth Balasubramanian.

Latest Papers

TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series
Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Nicolas Chapados, Alexandre Drouin
Accepted at ICLR 2024

arXiv Code OpenReview

A flexible model for multivariate probabilistic time series prediction, simplifying the training of attentional copulas, with state-of-the-art accuracy on diverse forecasting tasks, while supporting interpolation and learning from irregular data.
We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including forecasting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series.
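For intuition, here is a minimal sketch of the copula decomposition that the model builds on: marginals are modeled separately from the dependence structure among the variables. A Gaussian copula fit on synthetic data stands in for the transformer-based attentional copula, and the two-step fit only loosely mirrors the paper's two-stage training curriculum; none of this is the TACTiS-2 code itself.

```python
# Minimal sketch of the copula decomposition behind TACTiS-style models:
# model each variable's marginal separately, map observations to uniforms via the
# (empirical) CDF, and model dependence among the uniforms with a copula.
# A Gaussian copula stands in here for the transformer-based attentional copula.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)  # toy 2-variable data

# Step 1: marginals -> probability integral transform to uniforms
U = np.column_stack([stats.rankdata(X[:, j]) / (len(X) + 1) for j in range(X.shape[1])])

# Step 2: dependence -> fit a Gaussian copula on the uniforms
Z = stats.norm.ppf(U)                      # map uniforms to standard normals
corr = np.corrcoef(Z, rowvar=False)        # copula correlation estimate

# Sampling: draw from the copula, then invert the marginals (here: empirical quantiles)
z_new = rng.multivariate_normal(np.zeros(2), corr, size=1000)
u_new = stats.norm.cdf(z_new)
samples = np.column_stack([np.quantile(X[:, j], u_new[:, j]) for j in range(X.shape[1])])
print(samples.shape)  # (1000, 2) joint samples carrying the estimated dependence structure
```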
Lag-Llama: Towards Foundation Models for Time Series Forecasting
Kashif Rasul*, Arjun Ashok*, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, Irina Rish
(* Co-first authorship, equal contribution, order arbitrary)
Preprint. Preliminary work presented at the NeurIPS 2023 Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models

Paper Code Weights Demo Tweet

A foundation model for probabilistic time series forecasting with strong zero-shot and few-shot capabilities
Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches, emerging as the best general-purpose model on average. Lag-Llama serves as a strong contender to the current state-of-the-art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data.
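As a rough illustration of the "lags as covariates" idea, the sketch below builds an input token for each time step from the current value plus the series values at a handful of past lags; the lag set and shapes are illustrative assumptions, not the ones used by Lag-Llama.

```python
# Hedged sketch of "lags as covariates": each time step's input token is the current
# value plus the series values at a fixed set of past lags. The lag set below is
# illustrative only.
import numpy as np

def lag_features(series: np.ndarray, lags=(1, 2, 3, 7, 14, 28)) -> np.ndarray:
    """Return an array of shape (T - max(lags), 1 + len(lags)):
    column 0 is the value at time t, the remaining columns are the lagged covariates."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t]] + [series[t - l] for l in lags])
    return np.asarray(rows)

y = np.sin(np.arange(200) / 10.0)   # toy univariate series
tokens = lag_features(y)
print(tokens.shape)                 # (172, 7): per-step inputs for a decoder-only model
```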

Previous Work

I previously worked on problems in out-of-distribution generalization, continual learning, and few-shot learning, spanning the domains of computer vision and natural language processing.
Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer
Arjun Ashok, K J Joseph, Vineeth Balasubramanian
Accepted at ECCV 2022

Paper arXiv Project Page Code

We propose two distillation-based objectives for class-incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes, as well as enable learning of new classes.
In class-incremental learning, the model is expected to learn new classes continually while maintaining knowledge of previous classes. The challenge here lies in preserving the model's ability to effectively represent prior classes in the feature space, while adapting it to represent incoming new classes. We propose two distillation-based objectives for class incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes, as well as enable learning the new classes. In our first objective, termed cross-space clustering (CSC), we propose to use the feature space structure of the previous model to characterize directions of optimization that maximally preserve the class: directions that all instances of a specific class should collectively optimize towards, and those that they should collectively optimize away from. Apart from minimizing forgetting, this indirectly encourages the model to cluster all instances of a class in the current feature space, and gives rise to a sense of herd immunity, allowing all samples of a class to jointly prevent the model from forgetting the class. Our second objective, termed controlled transfer (CT), tackles incremental learning from the understudied perspective of inter-class transfer. CT explicitly approximates and conditions the current model on the semantic similarities between incrementally arriving classes and prior classes. This allows the model to learn classes in such a way that it maximizes positive forward transfer from similar prior classes, thus increasing plasticity, and minimizes negative backward transfer on dissimilar prior classes, thereby strengthening stability. We perform extensive experiments on two benchmark datasets, adding our method (CSCCT) on top of three prominent class-incremental learning methods. We observe consistent performance improvement across a variety of experimental settings.
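A rough sketch of the cross-space clustering idea, under one reading of the objective: the previous model's per-class feature centroids act as anchors that all current-model features of that class are pulled towards collectively. The loss below is an illustrative approximation, not the paper's exact formulation, and controlled transfer is omitted.

```python
# Illustrative sketch (not the paper's exact loss) of cross-space clustering:
# use the *previous* model's per-class feature centroids as anchors, and pull every
# current-model feature of that class toward its anchor, so samples of a class are
# optimized collectively rather than point-wise as in plain feature distillation.
import torch
import torch.nn.functional as F

def cross_space_clustering_loss(feats_new: torch.Tensor,
                                feats_old: torch.Tensor,
                                labels: torch.Tensor) -> torch.Tensor:
    """feats_new / feats_old: (B, D) features from the current and frozen previous model."""
    loss = feats_new.new_zeros(())
    for c in labels.unique():
        mask = labels == c
        anchor = feats_old[mask].mean(dim=0).detach()   # previous-space class centroid
        loss = loss + (1 - F.cosine_similarity(feats_new[mask], anchor.unsqueeze(0))).mean()
    return loss / len(labels.unique())

# toy usage with random features standing in for real backbone outputs
feats_new = torch.randn(32, 128, requires_grad=True)
feats_old = torch.randn(32, 128)
labels = torch.randint(0, 5, (32,))
print(cross_space_clustering_loss(feats_new, feats_old, labels))
```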
Extremely Simple Activation Shaping for Out-of-Distribution Detection
Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, Rosanne Liu
Accepted at ICLR 2023

arXiv Project Page Code

We develop an extremely simple, post hoc, on-the-fly, and plug-and-play activation shaping method for out-of-distribution detection.
The separation between training and deployment of machine learning models implies that not all scenarios encountered in deployment can be anticipated during training, and therefore relying solely on advancements in training has its limits. Out-of-distribution (OOD) detection is an important area that stress-tests a model's ability to handle unseen situations: Do models know when they don't know? Existing OOD detection methods either incur extra training steps, require additional data, or make nontrivial modifications to the trained network. In contrast, in this work, we propose an extremely simple, post-hoc, on-the-fly activation shaping method, ASH, where a large portion (e.g. 90%) of a sample's activation at a late layer is removed, and the rest (e.g. 10%) simplified or lightly adjusted. The shaping is applied at inference time, and does not require any statistics calculated from training data. Experiments show that such a simple treatment enhances in-distribution and out-of-distribution sample distinction so as to allow state-of-the-art OOD detection on ImageNet, and does not noticeably deteriorate the in-distribution accuracy. We release alongside the paper two calls for explanation and validation, believing in the collective power to further validate and understand the discovery.
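The simplest, pruning-only flavour of activation shaping can be sketched in a few lines. The percentile and the helper below are illustrative; the paper also studies variants that rescale or binarize the surviving activations.

```python
# Minimal sketch of the activation-shaping idea (pruning-only variant): at a late
# layer, zero out the bottom ~90% of each sample's activations at inference time and
# pass the rest through unchanged. No training-data statistics are needed.
import torch

def ash_prune(x: torch.Tensor, percentile: float = 90.0) -> torch.Tensor:
    """x: (B, D) or (B, C, H, W) activations. Zero all values below the per-sample percentile."""
    flat = x.flatten(start_dim=1)
    k = int(flat.shape[1] * (1 - percentile / 100.0))   # number of activations to keep
    thresh = flat.topk(k, dim=1).values[:, -1:]         # per-sample cutoff value
    return torch.where(flat >= thresh, flat, torch.zeros_like(flat)).view_as(x)

# applied post hoc at inference, e.g. on the penultimate features of a trained network
feats = torch.relu(torch.randn(4, 2048))
shaped = ash_prune(feats, percentile=90.0)
print((shaped == 0).float().mean())  # roughly 90% of activations removed
```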
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, ..., Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi (40 authors)
Accepted at EMNLP 2022

arXiv Dataset Code Project Page

We introduce a benchmark of 1,600+ diverse language tasks with expert-written instructions, and rigorously benchmark cross-task (unseen-task) generalization of models. We also introduce Tk-Instruct, an encoder-decoder Transformer trained to follow a variety of in-context instructions (plain-language task definitions or k-shot examples), which outperforms existing larger models on our benchmark.
How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and text composition. This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones. Furthermore, we build Tk-Instruct, a transformer model trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples). Our experiments show that Tk-Instruct outperforms existing instruction-following models such as InstructGPT by over 9% on our benchmark despite being an order of magnitude smaller. We further analyze generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances per task, and model sizes. We hope our dataset and model facilitate future progress towards more general-purpose NLP models.
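To make the input format concrete, here is a hedged sketch of how an instruction prompt can be assembled from a task definition, k in-context examples, and a new instance; the exact template strings used by the benchmark and Tk-Instruct may differ.

```python
# Hedged sketch of the instruction-following input format: a plain-language task
# definition, optionally followed by k in-context examples, then the new instance.
def build_prompt(definition: str, examples: list[tuple[str, str]], new_input: str) -> str:
    parts = [f"Definition: {definition}"]
    for i, (inp, out) in enumerate(examples, 1):
        parts.append(f"Example {i}\nInput: {inp}\nOutput: {out}")
    parts.append(f"Now complete the following example.\nInput: {new_input}\nOutput:")
    return "\n\n".join(parts)

prompt = build_prompt(
    definition="Given a sentence, label its sentiment as positive or negative.",
    examples=[("I loved this movie.", "positive")],
    new_input="The plot made no sense.",
)
print(prompt)  # this string would be fed to an encoder-decoder model such as Tk-Instruct
```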
Self-Supervised Representations of Geolocated Weather Time Series - an Evaluation and Analysis
Arjun Ashok, Devyani Lambhate, Jitendra Singh
Accepted at AAAI 2022 Climate Change Symposium

Preprint

We analyse existing self-supervised multivariate time series learning algorithms on their ability to learn representations of weather features, evaluating them on weather-driven downstream applications
Self-supervised learning (SSL) algorithms are gaining traction in various domains as a general paradigm of learning representations from data, largely outperforming supervised learning algorithms in tasks where labelled data is limited and costly to collect. In this work, we analyse existing self-supervised multivariate time series learning algorithms on their ability to learn representations of weather features, evaluating them on weather-driven downstream applications involving regression, classification and forecasting tasks. We experiment with a two-step protocol. In the first step, we employ an SSL algorithm and learn generic weather representations from multivariate weather data. Then, in the next step, we use these representations and train simple linear models for multiple downstream tasks. Through our experiments on air quality prediction tasks, we highlight the benefits of self-supervised weather representations. The benefits include improved performance across multiple tasks, the ability to generalize with limited in-task data, and a reduction in training time and carbon emissions. We highlight several areas of future work and the potential impact that such algorithms can have on real-world problems. We expect such a direction to be relevant in multiple weather-driven applications supporting climate change mitigation and adaptation efforts.
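The two-step protocol can be sketched as follows, with a placeholder encode function standing in for a frozen SSL encoder and synthetic arrays standing in for weather windows and downstream targets; only the overall shape of the pipeline matches the paper.

```python
# Sketch of the two-step evaluation protocol: (1) obtain fixed representations of
# multivariate weather windows from a pretrained encoder, (2) train a simple linear
# model on those representations for a downstream task. `encode` and the data are
# placeholders, not the SSL algorithms or weather datasets studied in the paper.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
windows = rng.normal(size=(1000, 24, 6))                     # 1000 windows, 24 steps, 6 variables
targets = windows[:, -1, 0] + 0.1 * rng.normal(size=1000)    # toy downstream regression target

def encode(x: np.ndarray) -> np.ndarray:
    """Placeholder for a frozen SSL encoder: here, simple per-variable summary statistics."""
    return np.concatenate([x.mean(axis=1), x.std(axis=1)], axis=1)

reps = encode(windows)                                       # step 1: fixed representations
X_tr, X_te, y_tr, y_te = train_test_split(reps, targets, random_state=0)
linear_head = Ridge(alpha=1.0).fit(X_tr, y_tr)               # step 2: linear probe
print(f"R^2 on held-out windows: {linear_head.score(X_te, y_te):.3f}")
```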
Learning Modular Structures That Generalize Out-Of-Distribution
Arjun Ashok, Chaitanya TD, Vineeth Balasubramanian
Accepted at AAAI 2022 Student Track

Short Version

We design two regularizers that encourage a network to preserve expert features that are reusable across domains, enabling it to extrapolate better to unseen distributions.
Out-of-distribution (O.O.D.) generalization remains a key challenge for real-world machine learning systems. We describe a method for O.O.D. generalization that, through training, encourages models to only preserve features in the network that are well reused across multiple training domains. Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network, to extract a modular sub-network that achieves better O.O.D. performance than the original network. Preliminary evaluation on two benchmark datasets corroborates the promise of our method.
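As a sketch of the kind of probabilistic differentiable binary mask involved, the snippet below uses a concrete (Gumbel-sigmoid) relaxation over a layer's units; the paper's two neuron-level regularizers are not reproduced here.

```python
# Sketch of a probabilistic, differentiable binary mask over a layer's units, of the
# kind used to extract a modular sub-network. A concrete (Gumbel-sigmoid) relaxation
# keeps the mask differentiable during training; at test time it is thresholded.
import torch

class ConcreteMask(torch.nn.Module):
    def __init__(self, num_units: int, temperature: float = 0.5):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_units))  # learnable keep-probabilities
        self.temperature = temperature

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        if self.training:                                          # relaxed Bernoulli sample
            u = torch.rand_like(self.logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log(1 - u)
            mask = torch.sigmoid((self.logits + noise) / self.temperature)
        else:                                                      # hard binary mask at test time
            mask = (torch.sigmoid(self.logits) > 0.5).float()
        return features * mask

masked = ConcreteMask(64)(torch.randn(8, 64))   # mask the 64 units of a hidden layer
print(masked.shape)
```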
Does Self-Supervision Always Improve Few-Shot Learning?
Arjun Ashok, Haswanth Aekula
Accepted at ReScience-C Journal through the Machine Learning Reproducibility Challenge (MLRC) 2021
To be presented at the Journal Showcase Poster Session at NeurIPS 2022

PDF W&B Blog Code

In contrast to prior literature, we show that the effectiveness of self-supervision in improving few-shot learning depends heavily on the architecture and image size used, and that using self-supervision to train models decreases cross-domain few-shot performance

Music

I am a Carnatic Vocalist and a student of Vidwan Bharat Sundar. I have performed Carnatic concerts in multiple venues in India, and continue to perform in Montréal.
May 2022

Jan 2019