SWAGATA DUARI

Machine Learning Engineer | Ph.D. (Computer Science)

Gurugram, Haryana, India

About me

I am a Machine Learning Engineer specializing in NLP, with 4+ years of experience developing and deploying impactful solutions for e-commerce and the short-term vacation rental industry. My expertise spans the entire ML pipeline, from research and exploratory data analysis (EDA) to model implementation, production deployment, and ongoing maintenance. I have a proven track record of building recommendation systems, search query expansion, and predictive merchandising solutions.

My academic foundation includes a Ph.D. in Computer Science from the University of Delhi, where my thesis focused on "Complex Networks for Textual Discourse Coherence Analysis." My research interests encompass Text Analytics, Natural Language Processing (NLP), Graph Analytics, and Machine Learning.

Key skills: Python | R | MySQL | AWS | GCP | Airflow | NLP | Machine Learning | Data Science | Text Analytics

In my free time, I enjoy reading fiction and classic novels, watching culinary and travel vlogs, and weaving SF&F worlds.

For more details, please download my CV.

Publications

Duari, S., & Bhatnagar, V. (March 2019). sCAKE: Semantic Connectivity Aware Keyword Extraction. Information Sciences, 477, 100-117. DOI: 10.1016/j.ins.2018.10.034. (Paper: ScienceDirect, arXiv Preprint) (Data and Code) (SCI, Impact Factor: 8.233 (2021))

Duari, S., & Bhatnagar, V. (2020). Complex Network based Supervised Keyword Extractor. Expert Systems with Applications, 140, 112876. DOI: 10.1016/j.eswa.2019.112876. (Paper: ScienceDirect, arXiv Preprint) (Data and Code) (SCIE, Impact Factor: 8.665 (2021))

Duari, S., & Bhatnagar, V. (May 2019). Semi-automatic System for Title Construction. In: Gani, A., Das, P., Kharb, L., & Chahal, D. (Eds.), Information, Communication and Computing Technology. ICICCT 2019. Communications in Computer and Information Science, vol. 1025, pp. 216-227. Springer, Singapore. DOI: 10.1007/978-981-15-1384-8_18. (Paper: SpringerLink, arXiv Preprint) (Data and Code)

Chaturvedi, R., Dhani, J. S., Joshi, A., Khanna, A., Tomar, N., Duari, S., Khurana, A., & Bhatnagar, V. (November 2020). Divide and Conquer: From Complexity to Simplicity for Lay Summarization. In Proceedings of the First Workshop on Scholarly Document Processing (pp. 344-355). (Paper: ACL Anthology)

Duari, S., & Bhatnagar, V. (2021). FFCD: A Fast-and-Frugal Coherence Detection Method. IEEE Access, 10, 85305-85314. DOI: 10.1109/ACCESS.2021.3135048. (SCIE, Impact Factor: 3.476 (2021))

Bhatnagar, V., Duari, S., & Gupta, S. K. (2022). Quantitative Discourse Cohesion Analysis of Scientific Scholarly Texts using Multilayer Networks. IEEE Access, 10, 88538-88557. DOI: 10.1109/ACCESS.2022.3198952. (SCIE, Impact Factor: 3.476 (2021))

Projects

ACADEMIC AND RESEARCH PROJECTS
DISCOURSE COHERENCE ANALYSIS

I am currently working on a project on computational discourse coherence analysis. My objective is to analyse scientific articles in terms of their cohesion and coherence, and to quantify writing quality with respect to these properties. We have recently submitted a paper in which we explore a complex-network-based framework for modelling textual discourse.

SUPERVISED KEYWORD EXTRACTION

Used classical ML algorithms to engineer solutions for automatically extracting keywords from single documents. We transformed each text into a complex network representation and extracted node properties as features. The proposed method is designed to work independently of the domain, collection, and language of the text.
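To illustrate the general idea, the Python snippet below turns a text into a word co-occurrence network and reads off two simple node properties, degree and local clustering coefficient, as per-word features. This is a simplified sketch, not the method from the paper: the window size, the tokenizer, and the two features are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def cooccurrence_graph(text, window=2):
    """Build an undirected word co-occurrence graph as adjacency sets.

    Two distinct words are linked if they occur at most `window`
    tokens apart (window and tokenizer are illustrative assumptions).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def node_features(graph):
    """Compute per-word features: degree and local clustering coefficient."""
    features = {}
    for node, nbrs in graph.items():
        k = len(nbrs)
        # Count each edge between two neighbours exactly once (a < b).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in graph[a])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        features[node] = {"degree": k, "clustering": clustering}
    return features


graph = cooccurrence_graph("graph based keyword extraction uses graph features")
print(node_features(graph)["graph"])
```

In a supervised setting, such per-word feature vectors would be paired with gold keyword labels to train a classifier.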

UNSUPERVISED KEYWORD EXTRACTION

Engineered solutions for unsupervised, graph-based keyword extraction from single documents. We proposed a novel, parameterless, graph-based keyword extraction algorithm (sCAKE) and its language-agnostic variant (LAKE). We also proposed a context-aware graph construction method and a semantic connectivity based word scoring method.

SEMI-AUTOMATIC TITLE CONSTRUCTION

Designed a semi-automatic system for identifying and recommending keywords for inclusion in the title of a scientific manuscript. The keyword extraction phase is automatic, while title construction from the extracted keywords is manual. For extracting keywords, we induced supervised models using a graph-theoretic feature set.

REVENUE MANAGEMENT SYSTEM

Designed a Revenue Management System for Assam Power Distribution Company Limited (APDCL), Assam, India. It was developed using JSP for the front end and MySQL for the back end. The objective was to design an efficient database for storing APDCL's revenue-collection records and to build a web-based application for viewing, manipulating, and aggregating the stored information.

iRAT - A REVIEW AGGREGATION TOOL

An NLP and sentiment analysis project for classifying movie reviews. The objective was to analyse movie reviews and assign an aggregated sentiment polarity (positive, negative, or neutral) to them. It was developed using JSP; a bag-of-words model was used for document representation, and polarity dataset v1.0 was used for evaluation.
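As a rough illustration of bag-of-words polarity aggregation, the Python sketch below reduces each review to word counts, scores it against a sentiment lexicon, and combines the per-review labels by majority vote. The original tool was written in JSP, and its exact scoring is not described here; the tiny POSITIVE and NEGATIVE word sets and the tie-breaking rule are hypothetical choices for illustration only.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"great", "good", "brilliant", "enjoyable"}
NEGATIVE = {"bad", "boring", "dull", "awful"}


def review_polarity(review):
    """Label one review by comparing lexicon hits in its bag of words."""
    bag = Counter(re.findall(r"[a-z]+", review.lower()))
    score = sum(bag[w] for w in POSITIVE) - sum(bag[w] for w in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aggregate_polarity(reviews):
    """Majority vote over per-review labels; ties and empty input give neutral."""
    if not reviews:
        return "neutral"
    ranked = Counter(review_polarity(r) for r in reviews).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]


print(aggregate_polarity(["great", "good movie", "bad"]))
```

A supervised classifier trained on the labelled reviews could replace the lexicon score without changing the aggregation step.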

FAKE NEWS DETECTION

I am collaborating with Master's students (2019 batch) of the department on this fake news detection project. My role is to guide the students and brainstorm ideas with them. I work as an assistant under the guidance of my Ph.D. supervisor, Prof. Vasudha Bhatnagar.

Work Experience

ACADEMIA AND INDUSTRY

MACHINE LEARNING ENGINEER (REMOTE)

Aidaptive, powered by Jarvis ML
OCTOBER 2022 - PRESENT

Working remotely at a US-based startup focused on building an autonomous intelligence platform for digital commerce and short-term vacation rentals.

  • Implemented text similarity-based recommendation models for short-term vacation rental and e-commerce customers, increasing revenue by nearly 40%.
  • Developed contextual search query expansion for e-commerce customers. Built a named entity recognition (NER) model for detecting entities in e-commerce search queries.
  • Developed predictive merchandising for e-commerce and vacation rental management, including preparing a taxonomy for product/property categorization and predicting product/property categories using that taxonomy.
  • Developed a pipeline to generate alternate titles and summaries for property listings for vacation rental management customers.
  • Performed exploratory data analysis, R&D and proof-of-concept, implementation and testing, deployment, and maintenance for each solution (ML model).

NLP RESEARCHER (REMOTE)

Aisle-3
APRIL 2021 - SEPTEMBER 2022

Worked remotely at a UK-based AI commerce startup focused on crawling the internet and matching product data from various merchant sites to provide a single point of search for the best available deal.

  • Built a named entity recognition (NER) model for detecting entities in product titles and descriptions in the e-commerce domain. We used the NER model in several downstream tasks, including entity extraction for knowledge graph creation, cleaning of product names, and product search and retrieval.
  • Worked on NLP-enabled product matching by extracting features from textual product descriptions collected from multiple sources.
  • Worked on knowledge graph creation from product descriptions using entity extraction, entity linking, and relationship extraction.
  • Onboarded merchants onto the platform to enrich the quality and quantity of offers. Wrote code for data normalization, validation, and quality checks for onboarded merchants.

ASSISTANT PROFESSOR, COMPUTER SCIENCE

Sibsagar Commerce College, Sivasagar, Assam, India
AUGUST 2013 - MAY 2014

Courses taught: Introduction to Computers, Algorithms, Data Structures, Internet and Web Technologies, DBMS, Software Engineering

ASSISTANT PROFESSOR, COMPUTER SCIENCE

Jhanji H.N.S. College, Sivasagar, Assam, India
OCTOBER 2012 - JULY 2013

Courses taught: "Computer Skills", a compulsory paper for all 2nd-year undergraduate students.
Special Mention: Took the initiative to teach computer skills to economically weaker students from the college. Simultaneously taught three batches of HS and Bachelor-level students (maximum 10 students per batch) in 3-month courses on basic computer skills. This initiative was outside my regular academic responsibilities.

GRADUATE TRAINEE

Tata Consultancy Services
OCTOBER 2008 - MAY 2009

I was trained in J2EE. As a group project during training, our team developed a finance management system using J2EE as front-end and MySQL as back-end.

Education

ACADEMIC CAREER

Ph.D. - COMPUTER SCIENCE

UNIVERSITY OF DELHI, New Delhi
JUNE 2015 - NOVEMBER 2022

Thesis Title: Complex Networks for Textual Discourse Coherence Analysis
Supervisor: Prof. Vasudha Bhatnagar
Research Area: Text Analytics, NLP, and Computational Discourse Analysis
Relevant Coursework: Machine Learning, Special Topics in Data Mining (Graph Analytics), Text Mining

MASTER OF COMPUTER APPLICATION

NORTH EASTERN HILL UNIVERSITY, SHILLONG, MEGHALAYA
2009 - 2012

Graduated with CGPA 9/10 | Secured 2nd rank
Relevant Coursework: Algorithms, Data Mining, Data Structures, DBMS, Advanced Discrete Structures, Compiler Design

BACHELOR OF COMPUTER APPLICATION

DIBRUGARH UNIVERSITY, DIBRUGARH, ASSAM
2005 - 2008

Graduated with 80.2% | Secured 1st rank
Relevant Coursework: Data Structures, DBMS, Algorithms, Theory of Computing, Programming in C and C++

SKILLS

TECHNICAL AND NON-TECHNICAL
PROGRAMMING LANGUAGES AND SCRIPTS

Python, R, Java (Prior experience), C/C++ (Prior experience)

DATABASES

MySQL, BigQuery, Amazon RDS (Familiar), Oracle (Prior experience), PostgreSQL (Limited exposure)

CLOUD TOOLS AND PLATFORMS

AWS Lambda, EC2, Amazon SageMaker, Amazon S3, GCP, Apache Airflow

DATA SCIENCE SKILLS

Machine Learning, Natural Language Processing, Deep Learning, Data Science, Text Analytics, Data Analytics

Research and Learning

TALKS, TUTORIALS, COURSES, AND PROFESSIONAL SERVICES
TALKS AND TUTORIALS
  • Presented a contributed talk on "Language-Agnostic Keyword Extraction" at the 14th Inter-Research-Institute Student Seminar (IRISS) in Computer Science held on February 13-14, 2020 at IIT Gandhinagar, India. (pdf)
  • Presented a paper titled "Semi-automatic System for Title Construction" at the 4th International Conference on Information, Communication & Computing Technology (ICICCT-2019) held on May 11, 2019 at New Delhi, India. (pdf)
  • Conducted hands-on sessions on behalf of Network Science Lab at the Faculty Development Program on "Network Science: Foundation of Social Network Analysis" jointly organized by Teaching Learning Centre, Ramanujan College and Dept. of Computer Science during December 3-8, 2018. (Presentations and R scripts)
  • Delivered a talk and demonstration on "Supervised Learning with R" at the Workshop on R Programming organized by Department of Computer Science, University of Delhi, on February 12, 2018. (pdf) (R Scripts)
  • Conducted hands-on session on "Introduction to Text Analytics using R" at National Workshop on Machine Learning organized by Deen Dayal Upadhyaya College, New Delhi, on December 27, 2017. (pdf) (R Scripts)
  • Presented a tutorial on "Unsupervised Keyword Extraction from Single Document" at the 5th International Conference on Big Data Analytics (BDA 2017) held at IIIT Hyderabad, India during December 12-15, 2017. (pdf)
  • Conducted hands-on session on "Text Analytics using R" at a Faculty Development Program on “Big Data Analytics” organized by Geethanjali College of Engineering and Technology, Hyderabad, on November 29, 2017. (pdf) (R Scripts)
  • Conducted hands-on session on "R for Text Analytics" at a faculty development program on “Data Sciences and Machine Learning” organized by Bharati Vidyapeeth’s College of Engineering, New Delhi, on November 10, 2016. (pdf) (R Scripts)
  • Presented a paper titled "Facebook as a platform for Ecommerce: A Brief Study" in the UGC sponsored National Seminar on “Ecommerce: an Emerging Issue in the Global Business” held on April 26-27, 2013 at Sibsagar Commerce College, Sivasagar.

COURSES
  • Introduction to Data Science - Specialization offered by IBM on Coursera - Certificate Link
  • Applied Data Science - Specialization offered by IBM on Coursera - (completed courses 1-8; currently studying course 9)
  • Deep Learning Specialization - Coursera (deeplearning.ai) - (completed courses 1-3; currently studying courses 4 and 5)
  • Introduction to Data Science - IBM Cognitive Class Certificate Link
  • MHRD-GIAN course titled "Complex Networks" - organized by IIT Delhi on December 18-22, 2017.
  • MHRD-GIAN course titled "Fundamentals and Applications of the Principles of Optimization to various disciplines – Engineering, Business, Life Sciences, Social Sciences and Physical Sciences" - organized by IIT Indore on July 17-21, 2017.
  • MHRD-GIAN course titled "Graph Data Mining and Analysis" - organized by Jamia Millia Islamia, New Delhi, on December 17-23, 2015.

PROFESSIONAL SERVICES
  • Served as a reviewer for IEEE Access (2019 onwards)
  • Served as an additional reviewer for DAWAK 2016 and DAWAK 2017
  • Served as a volunteer for International Conference on Big Data Analytics (BDA 2015)
  • Served as a volunteer for National Conference on Emerging Trends and Applications in Computer Science (NCETACS 2010 and NCETACS 2011)

“If we knew what it was we were doing, it would not be called research, would it?” ― Albert Einstein