Arjun Ashok

I am a Visiting Researcher (Full-Time) at ServiceNow Research, Montreal. Here, I work with Dr. Étienne Marcotte, Dr. Alexandre Drouin, Dr. Valentina Zantedeschi and Dr. Nicolas Chapados on efficient transformer-based architectures for time series representation learning.

My current research interests are in transfer learning, particularly in pre-training, multi-task learning, continual learning and out-of-distribution generalization. I am currently interested in working on time series data; however, I have previously worked in both computer vision and natural language processing.



Integrated B.Sc-M.Sc., Software Systems
PSG Tech, Coimbatore
2018 - 2023
Research Intern
IIT Madras
May '20 - Aug '20
Research Assistant
IIT Hyderabad
Jun '21 - May '22
Research Engineering Intern
Mar '22 - Dec '22
Research Intern
IBM Research, India
Jun '22 - Aug '22
Visiting Researcher (Full-Time)
ServiceNow Research, Montreal
Jan '23 - Present


Jan '23 ⭐ One paper on out-of-distribution detection accepted to ICLR 2023. This is work in collaboration with folks at ML Collective mentored by Rosanne Liu.
Jan '23 Started as a Visiting Researcher (Full-Time) at ServiceNow Research, Montreal. Excited to continue working on problems in time series representation learning!
Aug '22 Preliminary work on self-supervised learning objectives for weather time series accepted at the AAAI 2022 Fall Symposium on Climate Change.
Jul '22 ⭐ One paper on Class-Incremental Learning accepted as a full paper at ECCV 2022.
Jun '22 Started as a Research Intern at IBM Research, India. I'll be working on building self-supervised learning objectives and pre-trained models for geospatial weather time series.
Jun '22 ⭐ One paper on cross-task generalization in NLP submitted to EMNLP 2022 (Update: Accepted).
Apr '22 One paper on Class-Incremental Learning accepted at the CLVISION Workshop at CVPR 2022 as a non-archival paper (Update: Accepted at ECCV 2022).
Apr '22 One reproducibility report on Self-Supervision and Few-shot Learning accepted at the ML Reproducibility Challenge 2021 (Fall Edition) and published at ReScience-C.
Oct '21 One paper on out-of-distribution generalization accepted at AAAI 2022 as a student abstract.
Jun '21 Started as a Research Assistant at IIT Hyderabad. Grateful to be working under Prof. Vineeth Balasubramanian.


Conference Publications

Publications as First-Author

Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer
Arjun Ashok, K J Joseph, Vineeth Balasubramanian
Accepted at ECCV 2022

Paper arXiv Project Page Code

We propose two distillation-based objectives for class-incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes, as well as enable learning the new classes.
In class-incremental learning, the model is expected to learn new classes continually while maintaining knowledge of previous classes. The challenge here lies in preserving the model's ability to effectively represent prior classes in the feature space, while adapting it to represent incoming new classes. We propose two distillation-based objectives for class-incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes, as well as enable learning the new classes. In our first objective, termed cross-space clustering (CSC), we propose to use the feature space structure of the previous model to characterize directions of optimization that maximally preserve the class: directions that all instances of a specific class should collectively optimize towards, and those that they should collectively optimize away from. Apart from minimizing forgetting, this indirectly encourages the model to cluster all instances of a class in the current feature space, and gives rise to a sense of herd-immunity, allowing all samples of a class to jointly prevent the model from forgetting the class. Our second objective, termed controlled transfer (CT), tackles incremental learning from an understudied perspective of inter-class transfer. CT explicitly approximates and conditions the current model on the semantic similarities between incrementally arriving classes and prior classes. This allows the model to learn classes in such a way that it maximizes positive forward transfer from similar prior classes, thus increasing plasticity, and minimizes negative backward transfer on dissimilar prior classes, thereby strengthening stability. We perform extensive experiments on two benchmark datasets, adding our method (CSCCT) on top of three prominent class-incremental learning methods. We observe consistent performance improvement on a variety of experimental settings.

Publications as Co-Author

Extremely Simple Activation Shaping for Out-of-Distribution Detection
Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, Rosanne Liu
Accepted at ICLR 2023

arXiv Project Page Code

We develop an extremely simple, post hoc, on-the-fly, and plug-and-play activation shaping method for out-of-distribution detection.
The separation between training and deployment of machine learning models implies that not all scenarios encountered in deployment can be anticipated during training, and therefore relying solely on advancements in training has its limits. Out-of-distribution (OOD) detection is an important area that stress-tests a model's ability to handle unseen situations: Do models know when they don't know? Existing OOD detection methods either incur extra training steps, additional data or make nontrivial modifications to the trained network. In contrast, in this work, we propose an extremely simple, post-hoc, on-the-fly activation shaping method, ASH, where a large portion (e.g. 90%) of a sample's activation at a late layer is removed, and the rest (e.g. 10%) simplified or lightly adjusted. The shaping is applied at inference time, and does not require any statistics calculated from training data. Experiments show that such a simple treatment enhances in-distribution and out-of-distribution sample distinction so as to allow state-of-the-art OOD detection on ImageNet, and does not noticeably deteriorate the in-distribution accuracy. We release alongside the paper two calls for explanation and validation, believing the collective power to further validate and understand the discovery.
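
To make the shaping step concrete, here is a minimal NumPy sketch of the pruning idea described above, assuming a per-sample percentile threshold. The function name `ash_prune` and the choice to leave the surviving activations unchanged are illustrative assumptions, not necessarily the exact variant used in the paper.

```python
import numpy as np

def ash_prune(activations: np.ndarray, percentile: float = 90.0) -> np.ndarray:
    """Zero out all but the largest activations of each sample.

    activations: array of shape (batch, features), e.g. a late-layer
                 feature map flattened per sample.
    percentile:  share of each sample's activations to remove
                 (90 keeps roughly the top 10%).
    """
    # Per-sample threshold: the value below which `percentile` percent
    # of that sample's activations fall.
    thresholds = np.percentile(activations, percentile, axis=1, keepdims=True)
    # Keep values at or above the threshold and zero out the rest.
    return np.where(activations >= thresholds, activations, 0.0)

# Toy usage: two samples with ten activations each.
features = np.vstack([np.arange(1.0, 11.0), np.arange(10.0, 0.0, -1.0)])
shaped = ash_prune(features, percentile=90.0)
```

The threshold is computed from each sample alone at inference time, which mirrors the point that no statistics from the training data are needed.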

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, ..., Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi (40 authors)
Accepted at EMNLP 2022

arXiv Dataset Code Project Page

We introduce a benchmark of 1,600+ diverse language tasks and their expert-written instructions, and rigorously benchmark cross-task/unseen-task generalization of models. We introduce Tk-Instruct, an encoder-decoder Transformer that is trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples) which outperforms existing larger models on our benchmark.
How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and text composition. This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones. Furthermore, we build Tk-Instruct, a transformer model trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples). Our experiments show that Tk-Instruct outperforms existing instruction-following models such as InstructGPT by over 9% on our benchmark despite being an order of magnitude smaller. We further analyze generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances per task, and model sizes. We hope our dataset and model facilitate future progress towards more general-purpose NLP models.

Workshop/Symposium Publications

Self-Supervised Representations of Geolocated Weather Time Series - an Evaluation and Analysis
Arjun Ashok, Devyani Lambhate, Jitendra Singh
Accepted at AAAI 2022 Climate Change Symposium


We analyse existing self-supervised multivariate time series learning algorithms on their ability to learn representations of weather features, evaluating them on weather-driven downstream applications.
Self-supervised learning (SSL) algorithms are gaining traction in various domains as a general paradigm of learning representations from data, largely outperforming supervised learning algorithms in tasks where labelled data is limited and costly to collect. In this work, we analyse existing self-supervised multivariate time series learning algorithms on their ability to learn representations of weather features, evaluating them on weather-driven downstream applications involving regression, classification and forecasting tasks. We experiment with a two-step protocol. In the first step, we employ an SSL algorithm and learn generic weather representations from multivariate weather data. Then, in the next step, we use these representations and train simple linear models for multiple downstream tasks. Through our experiments on air quality prediction tasks, we highlight the benefits of self-supervised weather representations. The benefits include improved performance across multiple tasks, the ability to generalize with limited in-task data, and a reduction in training time and carbon emissions. We highlight several areas of future work and the potential impact that such algorithms can have on real-world problems. We expect such a direction to be relevant in multiple weather-driven applications supporting climate change mitigation and adaptation efforts.
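
The two-step protocol above can be sketched roughly as follows. This is only an illustration under loud assumptions: PCA stands in for the self-supervised algorithm (it is not one of the methods evaluated), the data is synthetic, and the helper names `fit_encoder` and `fit_linear_probe` are hypothetical.

```python
import numpy as np

def fit_encoder(X_unlabelled: np.ndarray, dim: int):
    """Step 1 (stand-in): learn a linear encoder from unlabelled data via PCA."""
    mean = X_unlabelled.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(X_unlabelled - mean, full_matrices=False)
    components = vt[:dim]
    return lambda X: (X - mean) @ components.T

def fit_linear_probe(Z: np.ndarray, y: np.ndarray):
    """Step 2: fit a linear model (with bias) on frozen representations."""
    Z1 = np.hstack([Z, np.ones((len(Z), 1))])
    weights, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return lambda Z_new: np.hstack([Z_new, np.ones((len(Z_new), 1))]) @ weights

rng = np.random.default_rng(0)
X_unlabelled = rng.normal(size=(500, 8))  # synthetic stand-in for weather features
X_task = rng.normal(size=(50, 8))         # small labelled downstream set
y_task = X_task @ rng.normal(size=8)      # synthetic regression target

encode = fit_encoder(X_unlabelled, dim=4)           # representations learned once
predict = fit_linear_probe(encode(X_task), y_task)  # cheap per-task linear model
predictions = predict(encode(X_task))
```

The encoder is fitted once on unlabelled data and then frozen, so each downstream task only needs a small linear fit on a limited labelled set.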

Learning Modular Structures That Generalize Out-Of-Distribution
Arjun Ashok, Chaitanya TD, Vineeth Balasubramanian
Accepted at AAAI 2022 Student Track

Short Version

Designed two regularizers that encourage a network to preserve expert features that are reusable across domains, enabling it to extrapolate better to unseen distributions.
Out-of-distribution (O.O.D.) generalization remains a key challenge for real-world machine learning systems. We describe a method for O.O.D. generalization that, through training, encourages models to only preserve features in the network that are well reused across multiple training domains. Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network, to extract a modular sub-network that achieves better O.O.D. performance than the original network. Preliminary evaluation on two benchmark datasets corroborates the promise of our method.

Does Self-Supervision Always Improve Few-Shot Learning?
Arjun Ashok, Haswanth Aekula
Accepted at ReScience-C Journal through the Machine Learning Reproducibility Challenge (MLRC) 2021
To be presented at the Journal Showcase Poster Session at NeurIPS 2022

PDF W&B Blog Code

In contrast to prior literature, we show that the effectiveness of self-supervision in improving few-shot learning highly depends on the architecture and image size used, and that using self-supervision to train models decreases cross-domain few-shot performance.


Fall 2021
Deep Learning for Computer Vision, NPTEL (Online Course)
Instructor: Vineeth Balasubramanian
Taken by: 6426 students


I am a Carnatic Vocalist and a student of Vidwan Bharat Sundar.

I have performed in multiple venues in India. Here is a news article that covered one of my concerts in Coimbatore.

May 2022

Jan 2019


  • Department Rank 1 among 120 students during all five academic years at PSG Tech.
  • State Rank 4 among 104,000 candidates in TNHSE examinations in 2018, 100th percentile (one among 10 students out of all candidates).
  • Institute Gold Medal and Outstanding Student Award, G.D. Matriculation Higher Secondary School in 2018 - chosen out of 140 students in the graduating batch.
  • PASCH Scholarship to attend a language summer school at Frankfurt, Germany in 2017.
    Awarded to only 80 students worldwide in 2017.