Indian Symposium on Machine Learning (IndoML – 2022)
December 15 – 17, 2022 | IIT Gandhinagar
The Third Indian Symposium on Machine Learning (IndoML) will be hosted by the Indian Institute of Technology Gandhinagar (IITGN) from 15 to 17 December 2022. The symposium aims to be a forum for discussing state-of-the-art ML research through invited talks from leading experts within India and abroad. IndoML also fosters mentoring of Indian Ph.D. and Master's students, helping them network with their peers, seek expert guidance, and develop early-stage collaborations.
IndoML aims to provide an opportunity for faculty to engage with leading research groups in the country and conduct high-quality research leading to competitive publications. It will also provide a platform for industrial partners, including startups, working in ML-related areas to showcase their products, receive reviews and feedback, and set up potential collaborations.
Confirmed Speakers

Anshumali Shrivastava
Rice University, US

Ashique KhudaBukhsh
Rochester Institute of Technology, US

Auroop R. Ganguly
Northeastern University, Boston

Chiranjib Bhattacharyya
IISc, India

Dweepobotee Brahma
Indian Institute of Technology Jodhpur, India

Kamal Choudhary
National Institute of Standards and Technology, US

Kapil Gupta
Accenture, India

Krishnamurthy Dvijotham
Google, Singapore

Mohit Iyyer
University of Massachusetts Amherst, US

Monojit Choudhury
Microsoft Turing, India

Navveen Balani
Accenture, India

Niloy Ganguly
Indian Institute of Technology Kharagpur, India

Prasenjit Dey
Merlyn Mind, India

Ram Vasudevan
University of Michigan, US

Saikat Mukherjee
Hewlett Packard Enterprise, India

Somak Aditya
Indian Institute of Technology Kharagpur, India

Srijan Kumar
Georgia Institute of Technology, US

Ujwal Gadiraju
Delft University of Technology, Netherlands
Tutorials

Arnab Sinha
Amazon, US

Abir De
Indian Institute of Technology Bombay, India

Rishabh Iyer
University of Texas, Dallas, US
Organizers
-
Program Chairs:
Anirban Dasgupta (IIT Gandhinagar)
Animesh Mukherjee (IIT Kharagpur)
-
Sponsorship Chair:
Mayank Singh (IIT Gandhinagar)
-
Tutorial Chair:
Amrith Krishna (Uniphore)
-
Publicity Chairs:
Vivek Srivastava (TCS Research Pune)
Ameena Khaleel (Google Research India)
-
Local Organizing Committee:
Anirban Dasgupta (IIT Gandhinagar)
Mayank Singh (IIT Gandhinagar)
Udit Bhatia (IIT Gandhinagar)
-
Datathon Chairs:
Bidisha Samanta (Google Research India)
Jayesh Choudhari (University of Warwick)
Somak Aditya (IIT Kharagpur)
Sandipan Sikdar (RWTH Aachen University, Germany)
-
Fellowship/Graduate Forum Chairs:
Udit Bhatia (IIT Gandhinagar)
Pawan Goyal (IIT Kharagpur)
-
Website Master:
Raviraj Sukhadiya (IIT Gandhinagar)

Monojit Choudhury
Microsoft Turing, India
Title: T for “Terrorist”, “Tropical” or “Territorial”? Teaching Ethics to Large Language Models
Abstract: Large language models and their multilingual counterparts have revolutionized the way we build Natural Language Technology, to an extent that often seems magical. At the same time, LLMs are also known to display strong social biases that they pick up from the data they are trained on. Therefore, one of the key challenges we face today, especially in industry, is building language technology that will enable and delight users while minimizing the potential harms due to the biases in LMs. In this talk, I will take a few case studies of real-world technologies – chat bots and text prediction – to illustrate the various principles and challenges of “Responsible AI” and ways to mitigate the harms without compromising the performance of the systems.
Bio: Dr. Monojit Choudhury is currently a Principal Applied Scientist at Microsoft Turing, and was previously a Principal Researcher at Microsoft Research Lab India. His research interests include various subfields of NLP and its intersections with cognition, linguistics and societal impact. He has made fundamental contributions to the processing of code-mixed language, and is also known for his work on language technology for low-resource languages. Dr. Choudhury is an adjunct faculty member at Plaksha University and IIIT Hyderabad, and has also taught courses at Ashoka University and IIT Kharagpur. He has served as area chair, senior PC member and PC member for AAAI, ACL, EMNLP, NAACL and many other NLP conferences. He also served as an associate editor of ACM TALLIP. He is the general chair of the Panini Linguistics Olympiad and the founding co-chair of the Asia Pacific Linguistics Olympiad – programs to introduce bright young students to linguistics and computational linguistics through puzzles. Dr. Choudhury holds PhD and B.Tech degrees in Computer Science and Engineering from IIT Kharagpur.

Arnab Sinha
Amazon, US
Title: How to experiment without losing customers’ trust – a practitioner’s approach
Abstract: Online controlled experiments (a.k.a. A/B tests) have become the gold standard for evaluating improvements in products. Several major internet companies, including Amazon, Google, Facebook and Microsoft, use online controlled experiments as a data-driven tool to guide product development. At Amazon, we run hundreds of experiments that impact several critical systems (e.g. the Product Detail Page). These experiments are the launch vehicles of innovative products and features. However, a negative experiment can significantly hurt customers’ trust. This talk will present a general overview of how to design and evaluate controlled experiments while ensuring customer trust, from a practitioner’s viewpoint.
Objectives: At the end of the tutorial, the students will:
1. Understand the goal, the method, and the basic theory of controlled experiment design in the context of e-commerce.
2. Understand how to design and run experiments, evaluate their results, and make decisions without compromising customer trust.
3. Be able to implement major components of A/B testing.
4. Understand the anatomy of a large scale online experimentation platform.
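One component of A/B testing named in the objectives above, evaluating experiment results, can be sketched as a two-proportion z-test on conversion counts. This is a minimal illustration only, not the method used at Amazon or taught in the tutorial; the function name and the counts in the usage lines are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference in conversion rate, B minus A.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts
    in the control (A) and treatment (B) arms.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10% conversion in control, 15% in treatment.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In practice a decision rule would also account for guardrail metrics and for peeking at results before the planned sample size is reached, which this sketch does not cover.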

Abir De
Indian Institute of Technology Bombay, India

Rishabh Iyer
University of Texas, Dallas, US
Title: Coresets and Combinatorial Optimization for Efficient and Robust Deep Learning
Abstract: In this tutorial, we review the basics of combinatorial optimization, and specifically, submodular optimization, and the closely related topic of coresets. Specifically, we will study the central problem of selecting a subset of instances, thereby enabling efficiency (compute or label efficiency) while maintaining certain desiderata (e.g., accuracy, robustness). We will study subset selection approaches for a number of applications including compute-efficient and robust deep learning, label-efficient learning for active and semi-supervised learning and meta learning, and robust learning in the presence of noise, adversaries, out-of-distribution data, rare classes, imbalance, and distribution shift.
In the first part of this tutorial, we will review the basic optimization tools and constructs we will use here, including coresets, submodularity, and information theory. Next, we will discuss the applications of subset selection to a) compute-efficient deep learning, b) active learning in realistic scenarios, and c) robust learning with adversaries, distribution shift, OOD instances, and imbalance. Along the way, we will introduce a number of new optimization problems and constructs including mixed discrete-continuous optimization, discrete bi-level optimization, and submodular information measures.
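As a rough illustration of submodular subset selection, the sketch below greedily maximizes a facility-location objective, one commonly used submodular function. It is an assumption-laden example rather than the tutorial's own material: the function name, the similarity measure, and the toy points are all hypothetical, and similarities are assumed to be non-negative.

```python
def greedy_facility_location(sim, k):
    """Greedily pick k indices S to maximize sum_i max_{j in S} sim[i][j].

    sim is an n x n matrix of non-negative similarities (assumption).
    """
    n = len(sim)
    selected = []
    # best[i] = similarity of point i to its most similar selected point so far.
    best = [0.0] * n
    for _ in range(k):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding candidate j to the current selection.
            gain = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], sim[i][best_j]) for i in range(n)]
    return selected


# Toy data: two clusters on a line; similarity decays with distance.
points = [0, 1, 2, 3, 10, 11, 12]
sim = [[1.0 / (1 + abs(a - b)) for b in points] for a in points]
print(greedy_facility_location(sim, k=2))
```

With two well-separated clusters, the greedy choice picks one representative from each cluster, which is the behavior one would hope for from a small summarizing subset.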

Kamal Choudhary
National Institute of Standards and Technology, US
Title: Machine-learning for Materials Design using Atomistic structure, Spectral, Image and Text data
Abstract: Machine learning has immense potential to aid materials design processes. Some of the major application areas include scalar and vector-valued atomistic property predictions, and microscopy image and scholarly article type classifications. In this talk, we will first discuss the Atomistic Line Graph Neural Network (ALIGNN), which can be used for molecular and solid-state property predictions. Unlike many other GNNs for atomistic property prediction, which mainly use bond distances, ALIGNN performs message passing on both bond distances and bond angles, leading to superior performance. As example applications, we will use ALIGNN for finding new superconductors and CO2 capture metal-organic-framework materials. Then, we will discuss the AtomVision library, which can be used to develop, classify and perform segmentation tasks on microscopy images using convolutional neural networks and GNNs. We will apply AtomVision to classify two-dimensional materials microscopy images into five Bravais lattice categories. Finally, we will discuss the ChemNLP library, which can be used to obtain text datasets for materials and perform text clustering and classification tasks using natural language processing methods. We will use ChemNLP to classify arXiv condensed matter physics articles into nine categories given the title and abstract information. All the above projects are part of the NIST-JARVIS infrastructure (https://jarvis.nist.gov/).
Bio: Kamal Choudhary is a research scientist in the Materials measurement laboratory at the National Institute of Standards and Technology (NIST), Maryland, USA. He received his PhD in materials science and engineering from the University of Florida in 2015 and then joined NIST. His research interests are focused on atomistic materials design using classical, quantum, and machine learning methods. In particular, he has developed the JARVIS database and tools (https://jarvis.nist.gov/) that are used by thousands of researchers all around the world. He is an associate editor for the journal Nature NPJ Computational Materials. He has published more than 70 research articles in various reputed journals and is an active member of the TMS, APS, and MRS societies.

Auroop R. Ganguly
Northeastern University, Boston
Title: Artificial intelligence with uncertainty quantification can plug gaps in climate science and inform multi sector resilience
Abstract: Global climate and earth system models (ESMs), which numerically solve partial differential equations with high performance simulations, continue to have knowledge gaps and exhibit intrinsic variability for stakeholder relevant variables and resolutions. Data-driven sciences integrated with process understanding, especially the physics or biogeochemistry that may not be fully captured within the simulations, are critical to improve model parameterizations, develop a comprehensive characterization of variability and uncertainty, and extract scientific insights from archived model simulations. Furthermore, data-driven discrete event simulations have been proposed to incorporate societal dimensions such as management of watersheds in the land component of earth system models. The first part of this presentation will rely on our work at the Sustainability and Data Sciences Laboratory (SDS Lab) and the extant literature to elucidate the role of Artificial Intelligence (AI) and high performance computing (HPC), along with falsifiability and Uncertainty Quantification (UQ), in three areas: post-processing ESM simulations with knowledge-guided AI for extracting stakeholder and policy relevant insights, embedding AI within ESMs for improving processes and parameterizations, and incorporating human and societal dimensions within ESMs. The second part of the presentation will focus on Machine Learning (ML) based downscaling in climate, even touching upon the unreasonable effectiveness of Deep Learning, with a particular focus on UQ along with evaluation and falsifiability, such that ESM simulations at lower resolutions can be credibly translated to information across local to regional scales to enable stakeholder decisions and policy. The presentation will conclude with a short discussion on making climate science actionable by relying not just on governmental or intergovernmental action but also on innovations in the private sector via large corporations and sustainable startups.

Ram Vasudevan
University of Michigan, US
Title: Bridging the Gap Between Safety and Real-Time Performance during Trajectory Optimization: Reachability-based Trajectory Design
Abstract: Autonomous systems offer the promise of providing greater safety and access. However, this positive impact will only be achieved if the underlying algorithms that control such systems can be certified to behave robustly. This talk describes a technique called Reachability-based Trajectory Design, which constructs a parameterized representation of the forward reachable set that it then uses in concert with predictions to enable real-time, certified, collision checking. This approach, which is guaranteed to generate not-at-fault behavior, is demonstrated across a variety of different real-world platforms including ground vehicles, manipulators, and walking robots.
Bio: Ram Vasudevan is an associate professor in Mechanical Engineering and the Robotics Institute at the University of Michigan. He received a BS in Electrical Engineering and Computer Sciences, an MS degree in Electrical Engineering, and a PhD in Electrical Engineering, all from the University of California, Berkeley. He is a recipient of the NSF CAREER Award, the ONR Young Investigator Award, and the 1938E Award from the University of Michigan. His work has received best paper awards at the IEEE Conference on Robotics and Automation, the ASME Dynamics Systems and Controls Conference, the IEEE International Conference on Biomedical Robotics and Biomechatronics, and the IEEE OCEANS Conference, and has been a finalist for best paper at Robotics: Science and Systems.

Somak Aditya
Indian Institute of Technology Kharagpur, India
Title: The Increasing Relevance of Representation and Reasoning in the era of Language Models
Abstract: Lifelong acquisition and reasoning with knowledge is known to play an integral part in the way humans understand the universe. Recently, Transformers and Graph Neural Networks seemed to exhibit some capabilities for complex reasoning. Pre-trained Language Models (and LLMs) also demonstrate knowledge about the world and commonsense facts to some extent. However, these deep models often fail at non-adversarial simpler examples, exhibiting a lack of logical consistency and guarantees observed in formal axiomatic systems; especially across the dimensions of numeric, spatial, temporal and causal reasoning.
To resolve this discrepancy, I have worked with the hypothesis that different types of reasoning may require different (possibly symbolic) models, and our recent work indicates the value of moving beyond the one-model-to-reason-them-all strategy currently adopted by the community. In this talk, I will first discuss the dimensions of reasoning where Language Models show such discrepancies. I will summarize interesting advancements along such dimensions in NLP-adjacent tasks such as symbolic polynomial simplification and code retrieval, where neuro-symbolic methods show non-trivial performance gains. I will conclude with possibilities in NLP tasks.
Bio: Dr. Somak Aditya is currently an Assistant Professor in CSE, IIT Kharagpur. He was a postdoctoral researcher at Microsoft Research India for nearly 2 years, and a full-time researcher at Adobe Research for 1.5 years. He completed his PhD thesis on “Knowledge and Reasoning for Image Understanding” at Arizona State University in 2018. He has a background in Knowledge Representation and Reasoning, NLP, and Machine Learning, and is experienced in conducting interdisciplinary research in the field of AI. After demonstrating the need for knowledge integration in vision and language, he has explored the same in the Marketing domain and in Natural Language Inference during his postdoctoral stints. Through four workshops in top conferences such as IJCAI, CIKM, and now CVPR, Somak Aditya has actively promoted knowledge integration (and neuro-symbolic systems) in the community. He was also a part of the organizing committee of IndoML 2021 (and 2022), an annually hosted pan-Indian symposium for Machine Learning. He has filed 2 patents and written over 20 research articles in top AI/NLP/CV conferences.

Krishnamurthy Dvijotham
Google, Singapore
Title: Reliable AI via formal verification and human-AI collaboration
Abstract: Deep learning based AI systems exhibit strong generalization and prediction capabilities. Yet, they are susceptible to failures, for example under natural or adversarial distribution shifts. I will discuss two approaches to identifying and improving this worst-case behavior: 1) Formal verification of neural networks, i.e., mathematical certificates that a neural network satisfies a desired input-output specification such as robustness to adversarial perturbations. 2) Human-AI collaboration: developing techniques that recognize when a neural network is less likely to be accurate than a human expert, and deferring decisions to a human expert in these situations.

Prasenjit Dey
Merlyn Mind, India
Title: Experiences and Insights from Building a Domain Specific AI Assistant
Abstract: In this talk, my key objective is to share our experiences in building a sophisticated AI product and to partner with communities such as IndoML to create an ecosystem that drives more such innovations. In building an AI product, a small part of the work is about actual algorithms, and a large part is about adapting the technology to user and ecosystem needs. Merlyn Mind’s core business of building an AI assistant for classrooms has been an amazing intersection of technology, regulations, user needs, and the education industry ecosystem. Each of these challenges invites a separate technology response, and together they lead to an innovative AI product which we call the Merlyn Assistant (MA). The hardware device on which the MA resides, Symphony Classroom (SC), is a full-stack edge device which has been custom optimized with a powerful CPU, GPU and NPU, and a high performance microphone array. The SC device lives within a complex environment of numerous in-classroom devices (teacher laptop, smart board, overhead projectors, microscope etc.) and applications which have to be seamlessly orchestrated to remove friction from teachers’ in-classroom workflows. For a voice computing device, the classroom is an unforgiving, noisy environment, and hence it becomes extremely challenging for typical wakeword detection or speech recognition to work. To address this, we deploy optimized algorithms and carefully designed multimodal interactions that are seamless and intuitive. Moreover, the content that the assistant brings up needs to be carefully filtered and curated to ensure it is kid friendly and the teacher has full control to take care of any inadvertent situation.
Bio: Prasenjit Dey is currently the Senior Vice President of Innovations at Merlyn Mind Inc. As part of his role at Merlyn Mind he looks at various problems that are at the intersection of AI and HCI and helps drive them to new products and services for the education industry. He particularly looks at the use of Multimodal Interactions and Large Language Models (LLMs) to improve workflow automation and assistance for teachers inside and outside the classroom. In the past he has worked at these intersections for domains such as Education, Retail, and Finance. He has led various global research strategy initiatives and research groups at IBM Research and HP Labs. Prasenjit has a background in information theory and machine learning, and received his Ph.D. from the Swiss Federal Institute of Technology (EPFL), Lausanne. He holds about 35 granted patents and has numerous peer-reviewed publications.

Mohit Iyyer
University of Massachusetts Amherst, US
Title: Challenges in building and evaluating natural language generation systems
Abstract: Recent advances in neural language modeling have opened up a variety of exciting new text generation applications. However, evaluating systems built for these tasks remains difficult. Most prior work relies on a combination of automatic metrics such as BLEU (which are often uninformative) and crowdsourced human evaluation (which are also usually uninformative, especially when conducted without careful task design). In this talk, I first focus on the task of long-form question answering, in which a machine must generate a paragraph-length answer to a question. I will go over our recent work on building models for this application and then describe the ensuing struggles to properly compare them to baselines. We identify (and propose solutions for) issues with existing evaluations, including improper aggregation of multiple metrics, missing control experiments with simple baselines, and high cognitive load placed on human evaluators. Then, I’ll discuss our recent work on improving decoding algorithms for text generation via large-scale ranking models, which increase the relevance and coherence of generated text. I will conclude by describing my group’s ongoing work on document-level literary translation, which leverages large-scale multilingual language models to translate paragraphs from novels and also comes with a host of evaluation challenges.
Bio: Mohit Iyyer is an assistant professor in computer science at the University of Massachusetts Amherst. His research focuses broadly on designing machine learning models for discourse-level language generation (e.g., for story generation and machine translation), and his group also works on tasks involving creative language understanding (e.g., modeling fictional narratives and characters). He is the recipient of best paper awards at NAACL (2016, 2018) and a best demo award at NeurIPS 2015. He received his PhD in computer science from the University of Maryland, College Park in 2017, advised by Jordan Boyd-Graber and Hal Daumé III, and spent the following year as a researcher at the Allen Institute for Artificial Intelligence.

Anshumali Shrivastava
Rice University, US
Title: Probabilistic Hash Functions and Hash Tables: A New Paradigm for Efficient AI Training and Inference
Abstract: The Neural Scaling Law informally states that an increase in model size and data automatically improves AI. However, this growth has reached a tipping point where the cost and energy associated with AI are becoming prohibitive.
This talk will demonstrate the algorithmic progress that can exponentially reduce the compute and memory cost of training and inference with neural networks. We will show how data structures, particularly randomized hash tables, can be used to design an efficient “associative memory” that reduces the number of multiplications associated with the training of neural networks. Implementation of this algorithm challenges the common knowledge prevailing in the community that specialized processors like GPUs are significantly superior to CPUs for training large neural networks. The resulting algorithm is orders of magnitude cheaper and more energy-efficient. Our careful implementations can train billion-parameter recommendation and NLP models on commodity desktop CPUs significantly faster than top-of-the-line TensorFlow alternatives on the most potent A100 GPU clusters, with the same or better accuracies. The same idea can also result in more than 50x faster and cheaper inference.
In the end, I will highlight a cache-friendly compression scheme that can compress embedding models by 10000x (100GB Embedding Table to 10MB) and still achieve the MLPerf benchmark AUC of 0.8025 on the Terabyte click-through Criteo data, getting 3x inference speedup for free.
Bio: Anshumali Shrivastava is an associate professor in the computer science department at Rice University. He is also the Founder and CEO of ThirdAI Corp, a company that is democratizing AI to commodity hardware through software innovations. His broad research interests include probabilistic algorithms for resource-frugal deep learning. In 2018, Science News named him one of the Top-10 scientists under 40 to watch. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, a machine learning research award from Amazon, and a Data Science Research Award from Adobe. He has won numerous paper awards, including Best Paper Award at NIPS 2014, MLSys 2022, and Most Reproducible Paper Award at SIGMOD 2019. His work on efficient machine learning technologies on CPUs has been covered by the popular press, including the Wall Street Journal, New York Times, TechCrunch, NDTV, Engadget, Ars Technica, etc.

Niloy Ganguly
Indian Institute of Technology Kharagpur, India
Title: Distilling pre-trained knowledge to enhance property prediction for crystalline materials
Abstract: Rapid and accurate prediction of different properties of crystalline materials is a challenging task and is of great interest to the materials science community, since it is imperative for finding new functional materials. However, there are major challenges like scarcity of tagged data, DFT error bias in existing models, and lack of interpretability and algorithmic transparency, which need to be addressed. Graph-based representations are a more natural data structure to represent relational and structural information in crystals, and hence we can explore the developments of graph neural networks (GNNs) in expressing these special chemical graphs. We will present in this talk the exact procedure followed.
Besides the scarce tagged (to a property) data, a huge amount of untagged crystal data is available, with its chemical composition and structural bonds. To leverage these untapped data, we develop a pre-trained GNN framework for crystalline materials, which remarkably improves the classification performance and removes DFT-induced bias.
Bio: Dr. Niloy Ganguly is a Professor in the Dept. of Computer Science and Engineering at IIT Kharagpur and presently a visiting professor at Leibniz University of Hannover. He is also a Fellow of the Indian Academy of Engineering. He spent 2 years as a Research Scientist at Technical University, Dresden, before joining IIT Kharagpur in 2005, and rose to the rank of Professor in 2014. He received his B.Tech. from IIT Kharagpur and his Ph.D. from IIEST, Shibpur. His research interests lie primarily in Social Computing, Machine Learning, and Network Science. He has published around 250 papers in reputed journals and conferences.

Dweepobotee Brahma
Indian Institute of Technology Jodhpur, India
Title: Econometrics and Machine Learning: Two old friends
Abstract: The talk will discuss the exploding area of intersection between econometrics and machine learning. First, I will discuss the trajectories that the two disciplines of econometrics and Machine Learning (ML) have traditionally taken and discuss their goals, settings and approaches. Second, I will outline the area of intersection between the ML literature and the econometrics literature. Here, I will discuss several streams in the literature that have emerged within this intersection. These include using Big Data in economics and policy-making, predictive modelling for socio-economic outcomes, and supervised and unsupervised learning methods used in economics. Third, I will discuss some newly developed methods in the intersection of econometrics and ML which typically improve upon the performance of methods in traditional econometrics and traditional Machine Learning. These include several causal Machine Learning models for conducting inference on average treatment effect, optimal policy estimation, counterfactual estimation and adaptive survey designs. Finally, the talk will conclude with discussions on relatively unexplored topics in the intersection of econometrics and ML.
Bio: Dr. Dweepobotee Brahma is an Assistant Professor at the Centre for Mathematical and Computational Economics in the School of AI and Data Science at IIT Jodhpur. She has a PhD in Applied Economics from Western Michigan University. She works in the intersection of Machine Learning and Causal Inference, and Development Economics. She studies topics in maternal and child health outcomes, child mortality and morbidity, malnutrition, immunization, health insurance and health financing, where she uses Machine Learning techniques to improve targeting of public policies. Dr. Brahma has recently been awarded the Google India Research Award to study child health inequities using Machine Learning.

Saikat Mukherjee
Hewlett Packard Enterprise, India
Title: Decentralized and confidential model building from distributed data
Abstract: Data has gravity and thus movement of large data is not viable economically. Data is often private and sometimes prejudicial too, if leaked. Government regulations such as GDPR and HIPAA mandate very strict rules on storage, sharing and ownership of private data. This is why, nowadays, data is mostly stored locally where it was generated, and not moved to other locations, a centralized store, or the cloud. This results in many isolated small data lakes that produce potentially biased and inefficient models at each site. Federated learning has addressed this problem by building a global model from distributed sources without sharing raw data. However, most prior work in the field of federated learning is heavily dependent on a third-party cloud or central authority to manage and orchestrate the multi-party collaborative training and merge the models’ parameters periodically.
In this talk I will describe how a machine learning model can be built from distributed data sources in a completely decentralized way without any central coordinator. I will discuss in detail the high-level architecture of the solution, how to create a decentralized control plane that can orchestrate collaborative model training, enablement of decentralized training capability in an existing model, techniques for parameter sharing and merging, and finally, preserving data confidentiality and security. I will also share some of the encouraging results that were produced by experimenting with this novel technique on clinical data such as blood transcriptomes for developing disease classifiers. Finally, I will go through the existing challenges in the area of decentralized model training.
Bio: Saikat Mukherjee is working as an expert technologist at Labs and HPC and AI Advanced Development, Hewlett Packard Enterprise. Saikat brings with him 18 years of professional experience spanning engineering and research in the areas of networking, storage and AI. His current research areas are federated learning, federated workflow and data-centric AI. He is co-author of 6 granted patents and peer-reviewed publications in international journals including Nature and IEEE Transactions on Cloud Computing.

Srijan Kumar
Georgia Institute of Technology, US
Title: Advances in Data Science for Accurate and Robust Web Safety and Integrity
Abstract: Ensuring the safety, integrity, and well-being of users, communities, and platforms on the web and social media is a critical, yet challenging task. In this talk, I will describe the machine learning methods, leveraging behavior modeling, graph analytics, and deep learning, that my group has developed to efficiently detect malicious users and bad content online. While developing models that are highly accurate is important, it is also crucial to ensure that the systems are trustworthy and robust. Thus, I will describe my group’s work on creating multi-X detection models, namely multi-platform, multi-modal, and multi-lingual, as well as innovative methods to benchmark the adversarial robustness of these methods against smart adversaries.
Bio: Srijan Kumar is an Assistant Professor at the College of Computing at Georgia Institute of Technology. He develops data science, machine learning, and AI solutions for the pressing challenges of Web Safety and Integrity. His methods have been used in production at Flipkart (India’s largest e-commerce platform), have influenced Twitter’s Birdwatch system, and are taught in graduate-level courses worldwide. He has been named to the Forbes 30 under 30 Class of 2022, selected as a Kavli Fellow and a CRA Computing Innovations Mentor, and has received several awards including the Facebook Faculty Award, Adobe Faculty Award, ACM SIGKDD Doctoral Dissertation Award runner-up 2018, Larry S. Davis Doctoral Dissertation Award 2018, and a best paper honorable mention award from the WWW conference. His research has been a part of a documentary and covered in the popular press, including CNN, The Wall Street Journal, Wired, and New York Magazine. He completed his postdoctoral training at Stanford University, received a Ph.D. in Computer Science from the University of Maryland, College Park, and a B.Tech. from the Indian Institute of Technology, Kharagpur.

Chiranjib Bhattacharyya
IISC, India
Title: Latent k-polytope: Convex Geometry meets Generative Modelling
Abstract: This talk will introduce the Latent k-Polytope (LkP), a new convex-geometry-based model for generative modelling. Data is generated from an LkP by perturbing points from a polytope with k vertices. Surprisingly, this model recovers several admixture models, such as Topic Models (TMs). We will show that the problem of inferring the vertices of an LkP from a finite number of such points can be solved efficiently if the observed data satisfies a small set of deterministic assumptions. This algorithm readily applies to TMs, thus providing a provably efficient alternative to MCMC-based approaches.

Ashique KhudaBukhsh
Rochester Institute of Technology, US
Title: Reimagining Machine Translation and Text Classification to Understand Media and Politics
Abstract: This talk is arranged in two parts.
In the first part, I will describe a new methodology that offers a fresh perspective on interpreting and understanding political and ideological biases through machine translation. Focusing on a year that saw a raging pandemic, sustained worldwide protests demanding racial justice, an election of global consequence, and a far-from-peaceful transfer of power, I will show how our methods can shed light on the deepening political divide in the US.
In the second part, I will talk about the police portrayal in mainstream US media. The thirteen-month period spanning the murder of George Floyd by police officer Derek Chauvin on May 25, 2020, the Capitol Riot on January 6, 2021, Chauvin’s conviction for the Floyd murder on April 21, 2021, and his sentencing on June 25, 2021, was momentous for policing in the United States, in part due to the sustained media attention given to police practice, conduct, and function. Using advanced natural language processing methods, I will present our key findings upon analyzing the different responses of three major media outlets (Fox News, CNN, and MSNBC) to these seminal events.
Bio: Ashique KhudaBukhsh is an assistant professor at the Golisano College of Computing and Information Sciences, Rochester Institute of Technology (RIT). His current research lies at the intersection of NLP and AI for Social Impact as applied to: (i) globally important events arising in linguistically diverse regions requiring methods to tackle practical challenges involving multilingual, noisy, social media texts; (ii) polarization in the context of the current US political crisis; and (iii) auditing AI systems and platforms for unintended harms. In addition to having his research accepted at top artificial intelligence conferences and journals, his work has also received widespread international media attention that includes coverage from the New York Times, BBC, Wired, Times of India, the Indian Express, The Independent, VentureBeat, and Digital Trends.

Navveen Balani
Accenture, India
Title: Sustainability and AI – An industry perspective
Abstract: It’s well known that Artificial Intelligence (AI) is an enabler of multiple sustainability use cases; however, few recognize that AI is creating an ever-growing environmental footprint. To develop sustainable solutions to the growing problem of AI carbon emissions, companies must reframe their approach to Green AI, moving from a narrow focus on AI model training to a holistic assessment across the full lifecycle of AI solutions, in order to understand and reduce the carbon footprint of the complete AI development and deployment process.
The session will cover:
– Introduction to the environmental impact of AI and why it’s critical to address it
– Exploring Accenture’s thought leadership and energy-efficient AI principles
– Engaging with the green software community through the Green Software Foundation (GSF).
Bio: Navveen Balani is the Chief Technologist of Technology Sustainability Innovation at Accenture. In his role, he leads various technology sustainability research and innovation initiatives and represents Accenture in the Green Software Foundation. He has around 22 years of experience in building enterprise products and services using exponential technology. He is the author of several leading technology books and actively blogs on his website at https://navveenbalani.dev/.

Kapil Gupta
Accenture, India
Title: Sustainability and AI – An industry perspective
Abstract: It’s well known that Artificial Intelligence (AI) is an enabler of multiple sustainability use cases; however, few recognize that AI is creating an ever-growing environmental footprint. To develop sustainable solutions to the growing problem of AI carbon emissions, companies must reframe their approach to Green AI, moving from a narrow focus on AI model training to a holistic assessment across the full lifecycle of AI solutions, in order to understand and reduce the carbon footprint of the complete AI development and deployment process.
The session will cover:
– Introduction to the environmental impact of AI and why it’s critical to address it
– Exploring Accenture’s thought leadership and energy-efficient AI principles
– Engaging with the green software community through the Green Software Foundation (GSF).
Bio: Kapil Gupta is the Strategic Area & Asset Manager for Green AI at Technology Sustainability Innovation, Accenture. In his role, he is responsible for shaping the roadmap and scaling of Green AI innovation. Kapil brings with him more than 23 years of work experience spanning across engineering & research teams across leading global organisations.

Ujwal Gadiraju
Delft University of Technology, Netherlands
Title: Human-Centered Artificial Intelligence – A Crowd Computing Perspective
Abstract: The unprecedented rise in the adoption of artificial intelligence techniques and automation in many contexts is concomitant with the shortcomings of such technology concerning robustness, interpretability, usability, trustworthiness, and explainability. Crowd computing offers a viable means to leverage human intelligence at scale for data creation, enrichment, and interpretation, demonstrating a great potential to improve the performance of AI systems and increase the adoption of AI in general. In this talk, I will discuss opportunities in crowd computing to propel better AI technology and argue that to make such progress, fundamental problems need to be tackled from both the computational and interactional standpoints. This talk will shed light on the research needed to help pave a future where humans can benefit by working seamlessly with AI systems.
Bio: Ujwal Gadiraju is a tenured Assistant Professor in the Software Technology Department of the EEMCS Faculty, Delft University of Technology. He is a Director of the Delft AI “Design@Scale” Lab and a member of the program management team of the TU Delft AI Labs. In addition, Ujwal co-leads a research line on Crowd Computing and Human-Centered AI at the Web Information Systems (WIS) group. He is a Distinguished Speaker of the ACM and a board member of CHI Netherlands. Before joining the WIS group, Ujwal worked at the L3S Research Center as a postdoctoral researcher from 2017 to 2020. He received a PhD degree (Dr. rer. nat.) in Computer Science with a summa cum laude recognition from the Leibniz University of Hannover, Germany, in 2017 and an MSc degree in Computer Science from TU Delft, the Netherlands, in 2012. His research interests lie at the intersection of Human-Computer Interaction (HCI), Artificial Intelligence (AI), and Information Retrieval (IR), with a special focus on Crowd Computing. Ujwal has published over 125 peer-reviewed articles, including at premier venues such as ACM CHI, ACM CSCW, ACM TOCHI, AAAI HCOMP, ACM TheWebConf, ACM SIGIR, ACM UBICOMP, ACM CIKM, ACM WSDM, ACM HT, and ACM UMAP. His work has been recognized with several honors, including a nomination for the Best Paper award at TheWebConf 2022, the Amazon Best Paper Award at AAAI HCOMP 2021, the Best Student Paper Award at AAAI HCOMP 2020, a Best Paper Award Honorable Mention at ACM CSCW 2020, and the Douglas Engelbart Best Paper Award at ACM HT 2017. Ujwal’s prior work in Crowd Computing has explored methods to improve the effectiveness of the crowdsourcing paradigm, running large-scale human-centered experiments to understand the interaction between humans and machines and the societal impact of algorithmic decision-making.
His current research focuses on creating novel methods, interfaces, systems, and tools to overcome existing challenges on our path toward building trustworthy AI systems and facilitating better reliance of humans on AI systems. For more details, see http://ujwalgadiraju.com.

Udit Bhatia
IIT Gandhinagar, India
Title: Robustness and recovery of built and natural systems subject to hydrometeorological extremes: Integrating data, dynamics, and complexity
Abstract: In this presentation, I will discuss how integrating non-linear dynamics and data with complex network representations helps us understand the robustness and recovery characteristics of built critical infrastructure systems and natural ecosystems, which can inform resilient design and near-optimal restoration strategies for such systems. My talk will include specific examples from our work on understanding the tolerance of the Indian Railways Network and the US National Airspace Airport Network, the unfolding of concurrent hazards on synthetic and regional transportation networks during the 2018 extreme precipitation events in Kerala, and generalisable restoration strategies for degraded ecological networks located across the globe. Further, I will discuss our work and opportunities in the field of physics-guided machine learning for predictive understanding of hydrological processes.
Bio: Udit Bhatia is an Assistant Professor in the Civil Engineering discipline at the Indian Institute of Technology, Gandhinagar. His research interests include resilience of built-natural systems, uncertainty assessment in hydroclimate extremes, and physics-guided data sciences. He is co-author of the book titled “Critical Infrastructures Resilience: Policy and Engineering”. For more details, please visit: https://iitgn.ac.in/faculty/civil/fac-udit

Vivek Raghavan
Chief Product Manager and Biometric Architect at UIDAI, India
Title: Creating Datasets for Public Good
Abstract: The talk will highlight recent (ongoing) efforts to collect datasets for training ML models for public good in the Indian context. Examples of data collection efforts in the Fintech, Indian Languages, and Law and Justice fields will be analyzed. The interplay of these efforts with policy and technology advances and the challenges ahead will be discussed.
Bio: Dr. Vivek Raghavan is an out-of-the-box problem solver, former serial entrepreneur and angel investor. Vivek has an M.S. and Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University and a B. Tech. from IIT Delhi.
Vivek is the Chief Product Manager and Biometric Architect at the Unique Identification Authority of India. He has been responsible for the design, implementation, and scale-out of the technology platform for Aadhaar, the world’s largest identity program, mostly as a volunteer. He joined the Aadhaar project just when the first Aadhaar was issued, and was there when the billionth Aadhaar was generated. In the past 3 years, Vivek has been exploring the use of AI in many public and governance domains. He is driving many AI initiatives at Aadhaar in the areas of biometrics and document validation. Vivek has been responsible for guiding the development of ML models at GSTN and NPCI. As Chief AI Evangelist at the EkStep foundation, Vivek has been advising the National Language Translation Mission to develop open datasets and open models for language AI technologies for Indian languages. Vivek serves as a member of the AI committee of the Supreme Court of India. Vivek has been a long-term volunteer with the ISPIRT foundation, where he is a contributor to DEPA.
Vivek also served as a volunteer CTO for Team Indus, India’s entry to the Google Lunar X-Prize, which aimed to land a spacecraft on the surface of the moon. Vivek spent 20 years in the field of Electronic Design Automation (EDA), successfully founding, running, and selling two EDA companies and being responsible for the design and development of multiple market-leading EDA products. He has held senior management positions at Magma Design Automation, Synopsys, and Avant! Corporation. Vivek has an eclectic portfolio of angel investments, including Team Indus, Gear Design Automation, eZeTap, ZipDial, HealthifyMe, and Vayavya Labs, among others.
Hotels in Gandhinagar
Hotel Name | Address |
---|---|
Hotel Middle Town (Room Tariff starts from INR 1200/-) | Pramukh Arcade, Near Infocity, Reliance Chowkdi, Kudasan, Gandhinagar 382421 Ph. 07923213887, 9408288883 Email: enquiry@hotelmiddletown.com Web: www.hotelmiddletown.com |
Hotel Pathikashram (Room Tariff starts from INR 1500/-) | Hotel Pathikashram, Near S.T. Bus stand, 098258 01782, GH-3, Gandhinagar |
Hotel Best Velly (Room Tariff starts from INR 1600/-) | Radhe Square, Nr. Reliance Chowkdi, Kudasan, Gandhinagar Ph. 9924438844, 9924072444 Email: info.hotelbestvelly@gmail.com Web: www.hotelbestvelly.com |
Hotel Hilton Inn (Room Tariff starts from INR 1800/-) | A-301to310, 3rd Floor, Pramukh Arcade, Reliance Chowdi, Kudasan, Gandhinagar Ph. 9925436221, 079-23213208 Email: hotelhilltoninn@gmail.com |
Hotel Midway Residency (Room Tariff starts from INR 1800/-) | Sarkhej – Gandhinagar Hwy, Infocity, Gandhinagar, Gujarat 382421 Ph. 079-23213866, 9979359233 Email: midwayresidency@gmail.com Web: www.hotelmidwayresidency.in |
Hotel 7 Wonder (Room Tariff starts from INR 2500/-) | 7th Floor, Ugati corporate park, Opp. Pratik Mall, Kudasan, 96879 57777, Gandhinagar- 382421 |
Hotel Resort & Spa (Room Tariff starts from INR 1999/-) | Near Sarojba Petroleum, Chiloda – Gandhinagar road, Chiloda, 07966709000, Gandhinagar |
Hotel Prominent (Room Tariff starts from INR 3000/-) | Opp Pratik Mall, B/h Ugati hights, Kudasan Por Road, Kudasan, Contact person: Mr. Ketan Patel - 7227036409 / 9998620188 |
Hotels in Ahmedabad
Hotel Name | Address |
---|---|
Hotel Hill Park Inn | 401, 4th floor, Vitthal The Mall, Nr. Engg College, Visat-Gandhinagar Highway, Motera, Ahmedabad-382424 Ph.- 079-40069797, 079-40069799, 9099979797 |
Hotel Kum Kum | Shukan mall, near Visat Petrol pump, near Deshavar hotel, Visat-Gandhinagar Highway, Ahmedabad Ph.: 079- 27700360, 9624066786, 9979359233 |
Ratna Palace | Baronet Complex, 4TH Floor, NR Sabarmati Police Station, Motera Stadium Cross Road, Sabarmati, Ahmedabad. Ph.- 079-27508218 |
Hotel Silver Cloud | Opp Gandhi Ashram Dandi Bridge, Ashram Road, Wadaj, Ahmedabad. Ph.- 079-66156470 |
The Metropole Hotel | NR RTO Circle, Subhash Bridge Corner, Subhash Bridge, Ahmedabad. Ph- 079-66153358 |
Skylon Hotel | Dev Complex, K-7 Circle, Sector-26, Gandhinagar – 382028. Gujarat, India. Mobile: +91-8306007001, +91-8866007001 Phone: 079-23288300 |
Hotel Comfort Inn Sunset | Comfort Inn Sunset, Airport Circle, Ahmedabad 382475 Tel: +91.79.2286.2200, Mobile: +91.99252.31974 E-mail: sales@cisunset.net, reservations@cisunset.net |
Venue
Building: Jibaben Patel Memorial Auditorium, Academic Block
Schedule
All times are in Indian Standard Time (GMT+5:30).
Time | Talk | Speaker |
---|---|---|
Day 1: Session I | | |
Session Chair: Anirban Dasgupta | | |
08:00 – 08:30 | Inaugural talk | Anirban Dasgupta |
08:30 – 09:10 | Distilling pre-trained knowledge to enhance property prediction for crystalline materials | Niloy Ganguly |
09:15 – 09:50 | T for “Terrorist”, “Tropical” or “Territorial”? Teaching Ethics to Large Language Models | Monojit Choudhury |
10:00 – 10:40 | Challenges in building and evaluating natural language generation systems | Mohit Iyyer |
10:40 – 11:00 | Coffee | |
11:00 – 11:40 | Reliable AI via formal verification and human-AI collaboration | Krishnamurthy Dvijotham |
Day 1: Session II | | |
Session Chair: Mayank Singh | | |
11:45 – 12:25 | Econometrics and Machine Learning: Two old friends | Dweepobotee Brahma |
12:30 – 14:30 | Lunch + Poster Session | |
14:30 – 15:10 | Reimagining Machine Translation and Text Classification to Understand Media and Politics | Ashique KhudaBukhsh |
15:15 – 15:55 | Experiences and Insights from Building a Domain Specific AI Assistant | Prasenjit Dey |
15:55 – 16:10 | Coffee | |
16:10 – 17:10 | Tutorial 2 (Part I): Coresets and Combinatorial Optimization for Efficient and Robust Deep Learning | Abir De |
17:10 – 18:10 | Tutorial 2 (Part II): Coresets and Combinatorial Optimization for Efficient and Robust Deep Learning | Rishabh Iyer |
19:00 | Business meeting (speakers & organizers) + Dinner | |
Time (IST: GMT+5:30) | Talk | Speaker |
---|---|---|
Day 2: Session I | | |
Session Chair: Niloy Ganguly | | |
08:00 – 09:10 | Advances in Data Science for Accurate and Robust Web Safety and Integrity | Srijan Kumar |
09:15 – 09:55 | Probabilistic Hash Functions and Hash Tables: A New Paradigm for Efficient AI Training and Inference | Anshumali Shrivastava |
10:00 – 10:15 | Coffee | |
10:15 – 12:00 | Tutorial 1: How to experiment without losing customers’ trust – a practitioner’s approach | Arnab Sinha |
12:00 – 13:00 | Lunch | |
13:00 – 14:00 | Student-Speaker interaction | |
Day 2: Session II | | |
Session Chair: Udit Bhatia | | |
14:00 – 14:20 | Coffee | |
14:20 – 15:00 | Artificial intelligence with uncertainty quantification can plug gaps in climate science and inform multi-sector resilience | Auroop R. Ganguly |
15:05 – 15:45 | Latent k-polytope: Convex Geometry meets Generative Modelling | Chiranjib Bhattacharyya |
15:50 – 16:30 | Sustainability and AI – An industry perspective | Navveen Balani, Kapil Gupta |
16:30 – 16:45 | Coffee | |
16:45 – 18:15 | Datathon | |
19:00 | Banquet | |
Time (IST: GMT+5:30) | Talk | Speaker |
---|---|---|
Day 3: Session I | | |
Session Chair: Pawan Goyal | | |
08:30 – 09:10 | Machine-learning for Materials Design using Atomistic structure, Spectral, Image and Text data | Kamal Choudhary |
09:15 – 09:55 | Bridging the Gap Between Safety and Real-Time Performance during Trajectory Optimization: Reachability-based Trajectory Design | Ram Vasudevan |
10:00 – 10:40 | The Increasing Relevance of Representation and Reasoning in the era of Language Models | Somak Aditya |
10:40 – 11:10 | Coffee + Snack | |
Day 3: Session II | | |
Session Chair: Animesh Mukherjee | | |
11:10 – 11:50 | Human-Centered Artificial Intelligence – A Crowd Computing Perspective | Ujwal Gadiraju |
11:55 – 12:35 | Decentralized and confidential model building from distributed data | Saikat Mukherjee |
12:40 – 13:00 | Conclusion and Vote of Thanks | Animesh Mukherjee |
13:00 – 14:30 | Lunch | |
Registration
IndoML looks forward to the participation of students (category A, Fee: INR 2500) and researchers from industry and academia (category B, Fee: INR 5000). The two categories have different registration processes, described below:
Registration for Category A (Student)
IndoML 2022 will provide travel assistance to a limited number of full-time students from degree-granting institutions. The purpose of the travel grant is to promote participation of students from a diverse set of institutions and backgrounds. Given the excellent speakers at the symposium, this is a unique opportunity for senior UG students applying to MS/PhD programs, Master’s students looking for PhD positions, and PhD students looking for postdoc opportunities. Further, PhD students who are just beginning their research formulation can expect good feedback on their thesis proposal or work in progress.
The travel grant includes 1) free accommodation, 2) food, 3) to-and-fro travel cost equivalent to the standard AC 3-tier fare. The prospective participants will have to pay the registration fee to avail themselves of the grant. However, the entire registration fee will be refunded after the conference barring GST charges.
The number of awards will be decided based on the available budget.
How to submit the application
Please submit your travel grant application that includes the applicant’s information and a statement of purpose. The form is available here.
- Applicant’s Information:
  - Email Address: Your institutional email id
  - Name: Applicant’s full name
  - Current Place of Study: Name and address of the current institution
  - Program: Bachelor’s, MS, PhD, Other (specify)
  - Number of years in the program: e.g. in the second year of a 4-year program or 2nd year of a PhD program
  - Whether participating in the Datathon@IndoML
    - Yes / No
    - If Yes, provide the email address with which you registered for the Datathon
- Statement of Purpose:
  - Applicant’s CV mentioning his/her research interests, current accomplishments, and future research plans (provide a viewable link via Google Drive / webpage / LinkedIn)
  - A short description of how the grant will help him/her with the research plans (not more than 5 sentences)
  - The total estimated travel cost (standard AC 3-tier to and fro) for attending IndoML
  - One-page proposal (optional, for MS/PhD students only, preferably a thesis proposal or work in progress) for poster presentation. Provide a viewable link via Google Drive
  - Contact information of at least one recommender (provide name, affiliation, email id)
Deadline
- Please fill out the application form before midnight, September 20th, 2022, IST.
- The decision will be made and communicated by September 30th, 2022.
- Shortlisted candidates must pay the registration fee by October 10th, 2022 to avail themselves of the scholarship.
Registration for Category B (Non-Student)
The registration for category B attendees is a three-step process. In the first step, please submit your application that includes your basic information. The form is available here. We shall process the applications on a first-come, first-served basis and share the payment link over email. The registration fee is INR 5000/- (non-refundable), which includes free meals for all the days of the conference. Once we receive the payment, your registration will be confirmed.
Deadline
- Please fill out the application form before midnight, September 28th, 2022, IST.
- We shall send the payment link by midnight, September 30th, 2022, IST.
- The registration fee must be paid by October 10th, 2022.
FAQs
- Will IndoML provide free accommodation to all attendees?
  The accommodation details are:
  - For category A (Students): Yes. IndoML will provide free hostel accommodation to all the participating students.
  - For category B (Non-Students): No. There are several nearby hotels. The list is available here.
- Will IndoML provide options for different food preferences?
  Yes. IndoML provides options for different food preferences. You will receive an email to specify your food preferences.
- I have recently submitted my PhD/MTech thesis. Am I eligible to participate?
  Yes. We look forward to your participation. If you are not currently working in industry or academia, please register under category A; otherwise, register under category B.