Indian Symposium on Machine Learning (IndoML – 2022)
December 15 – 17, 2022 | IIT Gandhinagar
The Third Indian Symposium on Machine Learning (IndoML) will be hosted by the Indian Institute of Technology Gandhinagar (IITGN) from 15 to 17 December 2022. The symposium aims to be a forum for discussing state-of-the-art ML research through invited talks from leading experts within India and abroad. IndoML fosters mentoring of Indian Ph.D./Master's students, helping them network with their peers, seek expert guidance and develop early-stage collaborations.
IndoML aims to provide an opportunity for faculty to engage with leading research groups in the country and conduct high-quality research leading to competitive publications. It will also provide a platform for industrial partners, including startups, working in ML-related areas to showcase their products, receive reviews/feedback, and set up potential collaborations.
Stay tuned: we will be updating further information soon.
Organizers
- Program Chairs: Anirban Dasgupta (IIT Gandhinagar), Animesh Mukherjee (IIT Kharagpur)
- Sponsorship Chair: Mayank Singh (IIT Gandhinagar)
- Tutorial Chair: Amrith Krishna (University of Cambridge)
- Datathon Chair: Sandipan Sikdar (RWTH Aachen University, Germany)
- Publicity Chair: Vivek Srivastava (TCS Research Pune)
- Fellowship/Graduate Forum Chair: Udit Bhatia (IIT Gandhinagar)
- Webmaster: Raviraj Sukhadiya (IIT Gandhinagar)
- Local Organizing Committee: TBD

Monojit Choudhury
Microsoft Research Lab, India
Title: Applications and Techniques for Code-mixed Text Generation
Abstract: Code-mixing refers to the mixing of more than one language in a single conversation or sentence, and is commonly observed in most multilingual societies. Yet, code-mixed text data is scarce, as code-mixing is predominantly found in informal speech. Therefore, the primary hindrance in building models for processing code-mixed text is the lack of large quantities of labeled and unlabeled datasets. Through a series of research studies, and building on decades of research in linguistics, we developed techniques that can automatically generate nearly natural code-mixed text, which in turn can help in training models and building applications for code-mixing. In this talk, I will highlight how linguistic theory-guided computational analysis of real code-mixed data helps us build such models, and how they can help in building pre-trained LMs that feed into other NLP applications.
Bio: Dr. Monojit Choudhury is a principal researcher at Microsoft Research Lab India. His research interests include various subfields of NLP and its intersections with cognition, linguistics and societal impact. He has made fundamental contributions to the processing of code-mixed language, and is also known for his work on language technology for low-resource languages. Dr. Choudhury is an adjunct faculty member at Plaksha University and IIIT Hyderabad, and has also taught courses at Ashoka University and IIT Kharagpur. He has served as area chair, senior PC member and PC member for AAAI, ACL, EMNLP, NAACL and many other NLP conferences. He also served as an associate editor of ACM TALLIP. He is the general chair of the Panini Linguistics Olympiad and the founding co-chair of the Asia Pacific Linguistics Olympiad, programs that introduce bright young students to linguistics and computational linguistics through puzzles. Dr. Choudhury holds PhD and B.Tech degrees in Computer Science and Engineering from IIT Kharagpur.

Dragomir R. Radev
Yale University, US
Title: Closing the loop in natural language interfaces to databases: Parsing, dialogue, and generation
Abstract: One of the most interesting tasks in semantic parsing is the translation of natural language sentences to database queries. As part of the Yale Spider project, researchers at Yale University have developed three new datasets and matching shared tasks, which will be covered in this session. Spider is a collection of 10,181 natural language questions and 5,693 matching database queries from 138 domains. SParC (Semantic Parsing in Context) consists of 4,298 coherent sequences of questions and the matching queries. Finally, CoSQL consists of a Wizard-of-Oz collection of 3,000 dialogues, a total of 30,000 turns, and their translations to SQL. This session will also introduce GraPPa, a pretraining approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. The researchers used GraPPa to obtain state-of-the-art performance on four popular fully supervised and weakly supervised table semantic parsing benchmarks. If time permits, I will also talk about some other recent NLP projects at Yale. This is joint work with Tao Yu, Rui Zhang, Victoria Lin, Caiming Xiong, and many others.
Bio: Dragomir Radev is the A. Bartlett Giamatti Professor of Computer Science at Yale University. He leads the Natural Language Processing (NLP) lab at Yale (http://lily.yale.edu). Dragomir’s research focuses on computational models for natural language understanding and generation, as well as their applications. More specifically, he has worked on text summarization, semantic parsing, natural language interfaces to databases, sentiment analysis, crosslingual information retrieval, question answering, educational applications, etc. Dragomir is originally from Bulgaria and he holds a PhD in Computer Science from Columbia University. He is one of the co-founders of the North American Computational Linguistics Open Competition (http://www.nacloweb.org). Dragomir Radev is Fellow of ACM, AAAI, AAAS, and ACL.

Raymond J. Mooney
University of Texas at Austin, US
Title: Deep Learning for Automating Software Documentation Maintenance
Abstract: Applying deep learning to large open-source software repositories offers the potential to develop many useful tools for aiding software development, including automated program synthesis and documentation generation. Specifically, we have developed methods that learn to automatically update existing natural language comments based on changes to the body of code they accompany. Developers frequently forget to update comments when they change code, which is detrimental to the software development cycle, causing confusion and bugs. First, we developed methods for "just-in-time" comment/code inconsistency detection, which learn to recognize when changes to code render it incompatible with its existing documentation. We then learn a model that appropriately updates a comment when it is judged to be inconsistent. Our approach learns to correlate changes across two distinct language representations, generating a sequence of edits that are applied to an existing comment to reflect source code modifications. We train and evaluate our model using a large dataset collected from commit histories of open-source Java software projects, with each example consisting of an update to a method and any concurrent edit to its corresponding comment. We compare our approach against multiple baselines using both automatic metrics and human evaluation. Results reflect the challenge of this task and show that our model outperforms many baselines with respect to detecting inconsistent comments and appropriately updating them.
Bio: Raymond J. Mooney is a Professor in the Department of Computer Science at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana/Champaign. He is an author of over 180 published research papers, primarily in the areas of machine learning and natural language processing. He was the President of the International Machine Learning Society from 2008-2011, program co-chair for AAAI 2006, general chair for HLT-EMNLP 2005, and co-chair for ICML 1990. He is a Fellow of AAAI, ACM, and ACL and the recipient of the Classic Paper award from AAAI-19 and best paper awards from AAAI-96, KDD-04, ICML-05 and ACL-07.

Sharad Goel
Harvard University, US
Title: Designing Equitable Algorithms for Criminal Justice and Beyond
Abstract: Machine learning algorithms are now used to automate routine tasks and guide high-stakes decisions, but, if not carefully designed, they can exacerbate inequities. I’ll start by describing an evaluation of automated speech recognition (ASR) tools, which power popular virtual assistants, facilitate automated closed captioning, and enable digital dictation platforms for health care. We find that five state-of-the-art ASR systems — developed by Amazon, Apple, Google, IBM, and Microsoft — exhibited substantial racial disparities, making twice as many errors for Black speakers compared to white speakers, a gap we trace back to a lack of diversity in the audio data used to train the models. I’ll then describe recent attempts to mathematically formalize fairness. I’ll argue that some of the most popular definitions, when used as a design principle, can, perversely, harm the very groups they were created to protect. I’ll conclude by describing a general, consequentialist paradigm for designing equitable algorithms that aims to mitigate the limitations of dominant approaches to building fair machine learning systems.
Bio: Sharad Goel is Professor of Public Policy at Harvard Kennedy School. His recent work spans policing practices, including statistical tests for discrimination; fair machine learning, including in automated speech recognition; and democratic governance, including swing voting, polling errors, voter fraud, and political polarization.
https://5harad.com/

Chris Potts
Stanford University, US
Title: Benchmark datasets: The essential resources on which all NLP depends
Abstract: Like many areas of AI, present-day NLP is data-driven. As a result, the available benchmark datasets are the primary factor in shaping the field itself. This has wide-ranging consequences for research, technology, and increasingly for society. How do we conceptualize different tasks, and which tasks receive the most attention from researchers? Which languages are adequately represented in our literature? Which groups benefit most from language technologies? Where will our systems deliver results that are embarrassing or worse? The answers to all these questions lie largely in the data we have for training and assessment. It is therefore in our best interests to deeply understand the datasets on which we are so dependent, and to seek out innovative new ways of collecting and validating relevant data. In this talk, I will report on a number of recent efforts to create more meaningful benchmarks for the field, and I will seek to identify persistent challenges and open questions in this area.
Bio: Christopher Potts is Professor and Chair of Linguistics and Professor (by courtesy) of Computer Science at Stanford, and a faculty member in the Stanford NLP Group and the Stanford AI Lab. His group uses computational methods to explore topics in emotion expression, context-dependent language use, systematicity and compositionality, and model interpretability. This research combines methods from linguistics, cognitive psychology, and computer science, in the service of both scientific discovery and technology development.

Dipanjan Das
Google AI, New York
Title: Trustworthy Natural Language Generation with Communicative Goals
Abstract: While recent work in natural language generation based on deep neural networks has made such systems fluent and easier to train from human-curated data, these models suffer from content "hallucination", where model-generated statements are not attributable to provided sources in communicative scenarios (e.g. summarization and responses in dialogue systems). Furthermore, evaluation of generation systems remains challenging: (1) human evaluation studies are not reproducible and there is a lack of common benchmarking for diverse tasks, and (2) popular automatic evaluation methods such as BLEU and ROUGE correlate poorly with human judgments. In this talk, I will present a body of work that attempts to solve the above problems for a variety of natural language generation tasks, and showcase a vision for future work in this area.
Bio: Dipanjan Das is a senior staff research scientist at Google Research, in the Language team, based in New York City. He leads teams of researchers distributed between New York, London, Berlin and Seattle focusing on language technologies. Dipanjan’s research focus is on natural language generation, grounded dialogue systems and model interpretability. Prior to joining Google, he completed a Ph.D. from the Language Technologies Institute, School of Computer Science at Carnegie Mellon University in 2012. In 2005, he completed a B.Tech. in Computer Science and Engineering from IIT Kharagpur. Dipanjan’s work has received paper awards at the ACL and EMNLP conferences; he serves as action editor for TACL and ARR and as senior committee member at various *ACL and machine learning venues.

Iryna Gurevych
Technische Universität Darmstadt, Germany
Title: Towards consent-driven, ethically sound NLP for peer reviews
Abstract: Peer review is the major way to determine the status and importance of research outputs in science. The explosive publication growth of the past decades puts a strain on traditional peer reviewing. Peer reviews are text, and thus make a promising target for natural language processing, from simple reviewer assistance to end-to-end review generation. However, peer reviewing data available for NLP research is scarce: existing datasets come from limited domains and are associated with a range of ethical and privacy-related challenges. What data does a single peer reviewing campaign produce? Who owns this data, and who should have a say in making it public? How can this data be redistributed and built upon? How can a research community transition to semi-open peer review, should it decide to do so?
UKP Lab (TU Darmstadt) leads the discussion on making data from the ACL community available for peer reviewing research. In this talk, I will discuss the major challenges of peer review as the data type for NLP, and present our past and ongoing work in analyzing peer reviewing data, providing secure access to sensitive review texts, and building sustainable workflows for continuous peer reviewing data collection. Our efforts contribute to consent-driven, ethically sound NLP for peer reviews in the ACL community and beyond.
Bio: Iryna Gurevych (PhD 2003, U. Duisburg-Essen, Germany) is professor of Computer Science and director of the Ubiquitous Knowledge Processing (UKP) Lab at the Technical University (TU) of Darmstadt in Germany. She joined TU Darmstadt in 2005 (tenured as full professor in 2009). Her main research interests are in machine learning for large-scale language understanding, text semantics and scientific literature mining. Iryna's work has received numerous awards, e.g. being named an ACL Fellow in 2020 and receiving the first Hessian LOEWE Distinguished Chair (2.5 million Euro) in 2021. Currently, Iryna is the SIGDAT president and the co-director of the ELLIS NLP program. She was PC co-chair of ACL 2018 and has been elected as the future president (2023) of the international Association for Computational Linguistics (ACL).

Abhijnan Chakraborty
Indian Institute of Technology Delhi, India
Title: Fair Partitioning of Public Resources: Countering Inequality in Public School Funding
Abstract: Public schools in the US offer tuition-free primary and secondary education to students, and are divided into school districts funded by the local and state governments. Although the primary source of school district revenue is public money, several studies have pointed to the inequality in funding across different school districts. In this talk, I'll focus on the spatial geometry/distribution of such inequality, i.e., how the highly-funded and lesser-funded school districts are located relative to each other. Due to the major reliance on local property taxes for school funding, we find the existing school district boundaries promoting financial segregation, with highly-funded school districts surrounded by lesser-funded districts and vice versa. To counter such issues, we formally propose the Fair Partitioning problem of dividing a given set of schools into k districts such that the spatial inequality in district-level funding is minimized. However, the Fair Partitioning problem turns out to be computationally challenging, and thus we provide a greedy approximation algorithm that offers a practical solution to Fair Partitioning, and show its effectiveness in lowering spatial inequality in school district funding across different states in the US.
Bio: Abhijnan Chakraborty is an Assistant Professor at Indian Institute of Technology (IIT) Delhi. His research interests fall under the broad theme of Computing and Society, covering the research areas of Social Computing, Information Retrieval and Fairness in Machine Learning. Prior to joining IIT Delhi, he spent two and a half years at the Max Planck Institute for Software Systems (MPI-SWS), Germany as a post-doctoral researcher. He obtained his PhD from Indian Institute of Technology (IIT) Kharagpur, where he was awarded the Google India PhD Fellowship and the Prime Minister's Fellowship for Doctoral Research. Before starting his PhD, he spent two years at Microsoft Research, working in the area of mobile systems. He has authored several papers in top-tier computer science conferences including WWW, KDD, AAAI, AAMAS, CSCW, ICWSM and MobiCom. His research has won the best paper award at ASONAM'16 and the best poster award at ECIR'19. He is one of the recipients of an internationally competitive research grant from the Data Transparency Lab to advance his research on fairness and transparency in algorithmic systems. More details about him can be found at https://www.cse.iitd.ac.in/~abhijnan/

Tim Baldwin
University of Melbourne, Australia
Title: Fairness in Natural Language Processing
Abstract: Natural language processing (NLP) has made truly impressive progress in recent years, and is being deployed in an ever-increasing range of user-facing settings. Accompanying this progress has been a growing realisation of inequities in the performance of naively-trained NLP models for users of different demographics, with minorities typically experiencing lower performance levels. In this talk, I will illustrate the nature and magnitude of the problem, and outline a number of approaches that can be used to train fairer models based on different data settings, without sacrificing overall performance levels.
Bio: Tim Baldwin is a Melbourne Laureate Professor in the School of Computing and Information Systems, The University of Melbourne, and also Director of the ARC Centre for Cognitive Computing in Medical Technologies and Vice President of the Association for Computational Linguistics. His primary research focus is on natural language processing (NLP), including social media analytics, deep learning, and computational social science.
Tim completed a BSc(CS/Maths) and BA(Linguistics/Japanese) at The University of Melbourne in 1995, and an MEng(CS) and PhD(CS) at the Tokyo Institute of Technology in 1998 and 2001, respectively. Prior to joining The University of Melbourne in 2004, he was a Senior Research Engineer at the Center for the Study of Language and Information, Stanford University (2001-2004). His research has been funded by organisations including the Australia Research Council, Google, Microsoft, Xerox, ByteDance, SEEK, NTT, and Fujitsu, and has been featured in MIT Tech Review, IEEE Spectrum, The Times, ABC News, The Age/Sydney Morning Herald, Australian Financial Review, and The Australian. He is the author of well over 400 peer-reviewed publications across diverse topics in natural language processing and AI, with over 17,000 citations and an h-index of 61 (Google Scholar), in addition to being an ARC Future Fellow, and the recipient of a number of best paper awards at top conferences.

Mohit Bansal
UNC Chapel Hill, USA
Title: Knowledgeable & Spatial-Temporal Vision+Language
Abstract: In this talk, I will present work on knowledgeable and spatial-temporal vision+language reasoning. First, we will discuss how to address the tasks of information extraction, retrieval, verification, and inference when grounded in the dynamic spatio-temporal multimodal information of videos and the dialogue-based language in them. We will look at the domain of compositional, multi-hop understanding, combining information across videos and dialogue, through diverse reasoning tasks: temporal moment retrieval (including multilingual), spatial grounding, question answering, future next-event inference, and retrieval highlighting/saliency, as well as a multi-task benchmark for video-and-language understanding evaluation to promote generalizable methods. Second, I will present work on how to enhance large-scale pretrained language models with different aspects of visual grounding and multimodality, so as to improve cross-modal image/video+language tasks as well as language understanding tasks, and how to unify and generalize vision-language pretraining via text generation.
Bio: Dr. Mohit Bansal is the John R. & Louise S. Parker Associate Professor and the Director of the MURGe-Lab (UNC-NLP Group) in the Computer Science department at University of North Carolina (UNC) Chapel Hill. Prior to this, he was a research assistant professor (3-year endowed position) at TTI-Chicago. He received his PhD from UC Berkeley in 2013 (where he was advised by Dan Klein) and his BTech from IIT Kanpur in 2008. His research expertise is in statistical natural language processing and machine learning, with a particular focus on multimodal, grounded, and embodied semantics (i.e., language with vision and speech, for robotics), human-like language generation and Q&A/dialogue, and interpretable and generalizable deep learning. He is a recipient of the 2020 IJCAI Early CAREER Spotlight, 2019 DARPA Director’s fellowship, 2019 Google Focused Research Award, 2019 Microsoft Investigator Fellowship, 2019 NSF CAREER Award, 2018 ARO Young Investigator Award (YIP), 2017 DARPA Young Faculty Award (YFA), 2017 ACL Outstanding Paper Award, 2014 ACL Best Paper Award Honorable Mention, 2018 COLING Area Chair Favorites Paper Award, and 2019 ACL Best Short Paper Nomination. His service includes Program Co-Chair for CoNLL 2019, Senior Area Chair for several ACL and EMNLP conferences, Americas Sponsorship Co-Chair for the ACL, and Associate/Action Editor for TACL, Computational Linguistics (CL), IEEE/ACM TASLP, and CSL journals. Webpages: cs.unc.edu/~mbansal, murgelab.cs.unc.edu, https://nlp.cs.unc.edu/

Thomas Vandal
NASA Earth eXchange, USA
Title: GeoNEX-ML: A Machine Learning System for Earth Observations
Abstract: Improved capabilities of earth monitoring satellites are enabling a wide range of studies on the environmental effects of climate change, often leveraging recent advancements in machine learning. At the same time, the new capabilities, including higher spatial resolution and temporal frequency, are expanding the amount of data generated at exponential rates. At the NASA Earth eXchange (NEX), we build deep learning methods that learn from cross-sensor satellite-based Earth observations to generate new datasets with efficient processing techniques. Using current-generation geostationary satellites on NEX, we present an interchangeable set of machine learning models to perform spectral adjustment, physical model emulation, LEO-GEO emulation, and optical flow. These tools are used to generate consistent virtual observations across sensors, perform atmospheric correction and cloud detection, and estimate land surface temperature and atmospheric winds. This approach aims to improve the robustness of remotely sensed data processing by learning from diverse sets of observations while enabling near real-time and on-demand capabilities.
Bio: Dr. Thomas Vandal is a research scientist at the NASA Earth eXchange (NEX) at Ames Research Center and the Bay Area Environmental Research Institute. Vandal is currently a PI in NASA's research program for geostationary satellites, developing optical flow technology to improve atmospheric wind estimation. His research interests include machine learning, computer vision, and data mining for applications to climate science and remote sensing. He received a Ph.D. in Interdisciplinary Engineering from Northeastern University in 2018 and a B.S. in mathematics from the University of Maryland, College Park in 2012.

Robert Hoehndorf
King Abdullah University of Science and Technology, Saudi Arabia
Title: Machine learning with ontologies in biomedicine
Abstract: The life sciences have invested significant resources in the development and application of semantic technologies to make research data accessible and interlinked, and to enable the integration and analysis of data. Utilizing the semantics associated with research data in data analysis approaches is often challenging. Now, novel methods are becoming available that combine symbolic methods and statistical methods in Artificial Intelligence. In my talk, I will show how to use biomedical ontologies in machine learning models, either to provide structured output that can be tested for consistency, or to provide background knowledge. I will show how these methods enable new applications of machine learning in precision health, in particular in finding disease-associated genes based on clinical phenotypes.
Bio: Robert Hoehndorf is an Associate Professor in Computer Science at King Abdullah University of Science and Technology (KAUST) in Thuwal. Prior to joining KAUST, Robert had research positions at Aberystwyth University, the University of Cambridge, the European Bioinformatics Institute, and the Max Planck Institute for Evolutionary Anthropology. His research focuses on the development and application of knowledge-based algorithms in biology and biomedicine, with main applications in understanding the molecular basis of disease.

Tavpritesh Sethi
IIIT Delhi, India
Title: What would an artificial intelligence augmented pandemic response look like?
Abstract: My talk will focus on the opportunities and challenges in creating next-generation health systems. Such systems will be augmented with machine learning and artificial intelligence, as the pandemic response has taught us. I will present some examples from our own work in which we developed real-world solutions for pandemic response, such as pathogen surveillance, resource allocation, and infodemic management, through a combination of domain understanding and AI approaches.
Bio: Dr. Tavpritesh Sethi is a physician-scientist and Associate Professor of Computational Biology at Indraprastha Institute of Information Technology Delhi, India, and a fellow of the Wellcome Trust/DBT India Alliance at All India Institute of Medical Sciences, New Delhi, India. He was a visiting faculty member at Stanford University School of Medicine from February 2017 to January 2019. He received his M.B.B.S from Government Medical College, Amritsar and his PhD from CSIR-Institute of Genomics and Integrative Biology, New Delhi, India. Dr. Sethi specializes in improving outcomes in neonatal, child and maternal health by bridging medicine and artificial intelligence. His research is focused on the development and deployment of machine-learning based solutions to enable decisions and policy on pressing healthcare questions such as antimicrobial resistance, sepsis and health inequalities in intensive care and public health settings. He has authored over 20 research articles and is a recipient of the MIT-TR35 India Innovators under 35 and the Wellcome Trust/DBT India Alliance Early Career Award. He is an editorial board member of PLOS One, Systems Medicine and the Journal of Genetics. Dr. Sethi is a member of the European Association of Systems Medicine and leads the Australasia region for the International Association of Systems and Networks Medicine (IASyM).

Ronita Bardhan
University of Cambridge, UK
Title: Deep built environment design for decoupling energy and health burdens from poverty
Abstract: As the climate heats up, the built environment is increasingly a modifiable factor implicated in health and energy (in)equality. Yet the magnitude and pathways through which built environment design parameters structurally affect the disease burden or household energy consumption remain understudied. The impacts of dysfunctional space design are most aggravated in poorer communities, where the asymmetries are profound. This talk scientifically unfolds how various data streams: (i) quantitative data from environmental/energy sensors, (ii) qualitative data on agency and use of space, and (iii) big data on performance metrics such as energy consumption, can enable understanding of the effects of building design parameters on quantifiable outcomes. It advances the innovative paradigm of data-driven design to decouple health and energy burdens from poverty. Using novel datasets from the Global South and Global North, the talk demonstrates how design can help in understanding health metrics like walkability in cities, outdoor heat stress due to climate change, and indoor environmental quality in slum transitional housing. One of the challenges of working in resource-constrained communities is the absence of data. This talk discusses how "deep" knowledge can systematically be used to generate new information and inform the policy space for a sustainable and healthy future.
Bio: Dr. Ronita Bardhan (https://www.arct.cam.ac.uk/people/dr-ronita-bardhan) is Assistant Professor of Sustainability in the Built Environment at the Department of Architecture, University of Cambridge. She is Director of the MPhil in Architecture and Urban Studies (MAUS) and leads the Sustainable Design Group (www.sdgresearch.org) at the Department of Architecture. She believes that data-driven intelligence of built environments can effectively address decarbonisation efforts towards net-zero. Her research on the sustainable built environment informs health and energy decisions in a warming climate. Bardhan uses data-driven methods that couple architectural engineering, AI and machine learning with the social sciences to provide built environment design solutions for health in resource-constrained societies. Bardhan's research informs demand-side design solutions using digital tools, which positively affect well-being, energy security, and gender equality while entailing fewer environmental risks. Bardhan works on slum rehabilitation (social) housing design in India, Ethiopia, Indonesia, South Africa and Brazil. Her impactful work on tuberculosis and poor indoor air quality in the slum rehabilitation housing of Mumbai has received traction from policymakers and several news media. She has written over 100 academic articles on the health and environmental design of the residential built environment. Ronita is part of AI for Environmental Risk, Cambridge Public Health, the Centre for Science and Policy, Cambridge Zero, Cambridge Global Challenges and Sustainability Leadership for the Built Environment (IDBE). Dr Bardhan holds the position of Director of Studies and Fellow in Architecture at Selwyn College, Cambridge. Bardhan is strongly committed to, and is an ardent advocate of, the shared vision of equality, diversity, inclusion, and belonging in all spheres of her research and teaching. She chairs the Equality and Diversity committee at the department and believes that everyone benefits from strength in difference, and that diversity is instrumental to success.

THOMAS VANDAL
NASA Earth eXchange, USA
Title: GeoNEX-ML: A Machine Learning System for Earth Observations
Abstract: Improved capabilities of earth monitoring satellites are enabling a wide range of studies on the environmental effects of climate change, often leveraging the recent advancements in machine learning. At the same time, the new capabilities, including higher spatial resolution and temporal frequency, are expanding the amount of data generated at exponential rates. At the NASA Earth eXchange (NEX), we build deep learning methods to learn from cross sensor satellite-based Earth observations for generating new datasets with efficient processing techniques. Using current generation geostationary satellites on NEX, we present an interchangeable set of machine models to perform spectral adjustment, physical model emulation, LEO-GEO emulation, and optical flow. These tools are used to generate consistent virtual observations across sensors, perform atmospheric correction and cloud detection, and estimate land surface temperature and atmospheric winds. This approach aims to improve the robustness of remotely sensed data processing by learning from diverse sets of observations while enabling near real-time and on-demand capabilities.
Bio: Dr. Thomas Vandal is a research scientist at the NASA Earth eXchange (NEX) at Ames Research Center and the Bay Area Environmental Research Institute. Vandal is currently a PI in NASA’s research program for geostationary satellites, developing optical flow technology to improve atmospheric wind estimation. His research interests include machine learning, computer vision, and data mining for applications in climate science and remote sensing. He received a Ph.D. in Interdisciplinary Engineering from Northeastern University in 2018 and a B.S. in Mathematics from the University of Maryland, College Park in 2012.

Ido Dagan
Bar-Ilan University, Israel
Title: Novel data for modeling textual information: decomposition, multi-text, and interaction
Abstract: In the last several years, deep learning methods have yielded amazing boosts in natural language processing (NLP) performance. Yet, such methods model language and information structure implicitly, via distributed representations that are hard to control and interpret, often limiting their capabilities in challenging settings. In this talk, I will propose three directions for collecting new types of data that may allow enriched modeling of textual information and its use by humans: (1) decomposing textual information into a semi-structured representation, composed of natural language question-answer pairs; (2) representing information structure across multiple texts; and (3) data of human interaction with NLP systems, particularly for dynamically summarizing and exploring textual information.
Bio: Ido Dagan is a Professor at the Department of Computer Science at Bar-Ilan University, Israel, the founder of the Natural Language Processing (NLP) Lab at Bar-Ilan, the founder and head of the nationally funded Bar-Ilan University Data Science Institute, and a Fellow of the Association for Computational Linguistics (ACL). His interests are in applied semantic processing, focusing on textual inference, natural open semantic representations, consolidation and summarization of multi-text information, and interactive text summarization and exploration. Dagan and colleagues initiated and promoted textual entailment recognition (RTE, later also known as NLI) as a generic empirical task. He was the President of the ACL in 2010 and served on its Executive Committee during 2008-2011. In that capacity, he led the establishment of the journal Transactions of the Association for Computational Linguistics, which became one of the two premier journals in NLP. Dagan received his B.A. summa cum laude and his Ph.D. (1992) in Computer Science from the Technion. He was a research fellow at the IBM Haifa Scientific Center (1991) and a Member of Technical Staff at AT&T Bell Laboratories (1992-1994). During 1998-2003 he was co-founder and CTO of FocusEngine and VP of Technology of LingoMotors, and he has been regularly consulting in the industry. His academic research has involved extensive industrial collaboration, including funds from IBM, Google, Thomson-Reuters, Bloomberg, Intel and Facebook, as well as collaboration with local companies under funded projects of the Israel Innovation Authority.

MARINKA ZITNIK
Harvard University, US
Title: Infusing Structure and Knowledge Into Biomedical AI
Abstract: Grand challenges in biology and medicine often lack annotated examples and require generalization to entirely new scenarios not seen during training. However, standard supervised learning is severely limited in scenarios such as designing novel medicines, modeling emerging pathogens, and treating rare diseases. In this talk, I present our efforts to overcome these obstacles by infusing structure and knowledge into learning algorithms. First, I outline our subgraph neural networks that disentangle distinct aspects of subgraph structure in networks. I will then present a general-purpose approach for few-shot learning on graphs. At its core is the notion of local subgraphs that transfer knowledge from one task to another, even when only a handful of labeled examples are available. This principle is theoretically justified, as we show that evidence for predictions can be found in subgraphs surrounding the targets. I will conclude with applications in drug development and precision medicine where the algorithmic predictions were validated in human cells and led to the discovery of a new class of drugs.
Bio: Marinka Zitnik (https://zitniklab.hms.harvard.edu) is an Assistant Professor at Harvard University with appointments in the Department of Biomedical Informatics, the Broad Institute of MIT and Harvard, and Harvard Data Science. Her research has recently won best paper and research awards from the International Society for Computational Biology, the Bayer Early Excellence in Science Award, an Amazon Faculty Research Award, the Rising Star Award in Electrical Engineering and Computer Science (EECS), and Next Generation Recognition in Biomedicine, making her the only young scientist with such recognition in both EECS and Biomedicine.

MICHAEL BRONSTEIN
Imperial College London, UK
Title: Neural diffusion PDEs, differential geometry, and graph neural networks
Abstract: In this talk, I will make connections between Graph Neural Networks (GNNs) and non-Euclidean diffusion equations. I will show that, drawing on methods from the domain of differential geometry, it is possible to provide a principled view of such GNN architectural choices as positional encoding and graph rewiring, as well as to explain and remedy the phenomena of over-squashing and bottlenecks.
Bio: Michael Bronstein is a professor at Imperial College London, where he holds the Chair in Machine Learning and Pattern Recognition, and is Head of Graph Learning Research at Twitter. Michael received his PhD from the Technion in 2007. He has held visiting appointments at Stanford, MIT, and Harvard, and has also been affiliated with three Institutes for Advanced Study (at TUM as a Rudolf Diesel Fellow (2017-2019), at Harvard as a Radcliffe Fellow (2017-2018), and at Princeton as a short-term scholar (2020)). Michael is the recipient of the Royal Society Wolfson Research Merit Award, the Royal Academy of Engineering Silver Medal, five ERC grants, two Google Faculty Research Awards, and two Amazon AWS ML Research Awards. He is a Member of the Academia Europaea, a Fellow of IEEE, IAPR, BCS, and ELLIS, an ACM Distinguished Speaker, and a World Economic Forum Young Scientist. In addition to his academic career, Michael is a serial entrepreneur and founder of multiple startup companies, including Novafora, Invision (acquired by Intel in 2012), Videocites, and Fabula AI (acquired by Twitter in 2019).

SAMEER SINGH
University of California, Irvine
Title: Evaluating and Testing Natural Language Processing Models
Abstract: Current evaluation of the generalization of natural language processing (NLP) systems, and much of machine learning, primarily consists of measuring accuracy on held-out instances of the dataset. Since the held-out instances are often gathered using a similar annotation process to the training data, they include the same biases that act as shortcuts for machine learning models, allowing them to achieve accurate results without requiring actual natural language understanding. Thus, held-out accuracy is often a poor proxy for measuring generalization. Further, aggregate metrics have little to say about where the problems may lie and how to address them.
In this talk, I will introduce a number of approaches we are investigating to perform a more thorough evaluation of NLP systems. I will first provide a quick overview of automated techniques for perturbing instances in the dataset that identify loopholes and shortcuts in NLP models, including semantic adversaries and universal triggers. I will then describe recent work on creating comprehensive and thorough tests and evaluation benchmarks for NLP using CheckList, which aim to directly evaluate comprehension and understanding capabilities. The talk will cover a number of NLP tasks, such as sentiment analysis, textual entailment, paraphrase detection, and question answering.
Bio: Dr. Sameer Singh is an Associate Professor of Computer Science at the University of California, Irvine (UCI) and an Allen AI Fellow at the Allen Institute for AI. He works primarily on the robustness and interpretability of machine learning algorithms, along with models that reason with text and structure for natural language processing. Sameer was a postdoctoral researcher at the University of Washington and received his PhD from the University of Massachusetts, Amherst. He has received the NSF CAREER award, was selected as a DARPA Riser, and received the UCI Distinguished Early Career Faculty award and the Hellman Faculty Fellowship. His group has received funding from the Allen Institute for AI, Amazon, NSF, DARPA, Adobe Research, the Hasso Plattner Institute, NEC, Base 11, and FICO. Sameer has published extensively at machine learning and natural language processing venues and received conference paper awards at KDD 2016, ACL 2018, EMNLP 2019, AKBC 2020, and ACL 2020. (https://sameersingh.org/)

Udit Bhatia
IIT Gandhinagar, India
Title: Robustness and recovery of built and natural systems subject to hydrometeorological extremes: Integrating data, dynamics, and complexity
Abstract: In this presentation, I will discuss how integrating non-linear dynamics and data with complex network representations helps us understand the robustness and recovery characteristics of built critical infrastructure systems and natural ecosystems, which can inform resilient design and near-optimal restoration strategies for such systems. My talk will include specific examples from our work on understanding the tolerance of the Indian Railways Network and the US National Airspace Airport Network, the unfolding of concurrent hazards on synthetic and regional transportation networks during the 2018 extreme precipitation events in Kerala, and generalisable restoration strategies for degraded ecological networks located across the globe. Further, I will discuss our work and opportunities in the field of physics-guided machine learning for a predictive understanding of hydrological processes.
Bio: Udit Bhatia is Assistant Professor in the Civil Engineering discipline at the Indian Institute of Technology Gandhinagar. His research interests include the resilience of built-natural systems, uncertainty assessment in hydroclimate extremes, and physics-guided data sciences. He is co-author of the book titled "Critical Infrastructures Resilience: Policy and Engineering". For more details, please visit: https://iitgn.ac.in/faculty/civil/fac-udit

Vivek Raghavan
Chief Product Manager and Biometric Architect at UIDAI, India
Title: Creating Datasets for Public Good
Abstract: The talk will highlight recent (ongoing) efforts to collect datasets for training ML models for public good in the Indian context. Examples of data collection efforts in the fintech, Indian languages, and law and justice fields will be analyzed. The interplay of these efforts with policy and technology advances, and the challenges ahead, will be discussed.
Bio: Dr. Vivek Raghavan is an out-of-the-box problem solver, former serial entrepreneur and angel investor. Vivek has an M.S. and Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University and a B. Tech. from IIT Delhi.
Vivek is the Chief Product Manager and Biometric Architect at the Unique Identification Authority of India. He has been responsible for the design, implementation and scale-out of the technology platform for Aadhaar, the world’s largest identity program, mostly as a volunteer. He joined the Aadhaar project just when the first Aadhaar was issued, and was there when the billionth Aadhaar was generated. In the past three years, Vivek has been exploring the use of AI in many public and governance domains. He is driving many AI initiatives at Aadhaar in the areas of biometrics and document validation. Vivek has been responsible for guiding the development of ML models at GSTN and NPCI. As Chief AI Evangelist at the EkStep Foundation, Vivek has been advising the National Language Translation Mission on developing open datasets and open models for language AI technologies for Indian languages. Vivek serves as a member of the AI committee of the Supreme Court of India. He has been a long-term volunteer with the iSPIRT Foundation, where he is a contributor to DEPA.
Vivek also served as a volunteer CTO for Team Indus, India’s entry to the Google Lunar XPRIZE, which aimed to land a spacecraft on the surface of the moon. Vivek spent 20 years in the field of Electronic Design Automation (EDA), successfully founding, running and selling two EDA companies and being responsible for the design and development of multiple market-leading EDA products. He has held senior management positions at Magma Design Automation, Synopsys and Avant! Corporation. Vivek has an eclectic portfolio of angel investments, including Team Indus, Gear Design Automation, eZeTap, ZipDial, HealthifyMe, and Vayavya Labs, among others.

SUBIMAL GHOSH
IIT Bombay, India