SWAGATA DUARI

Machine Learning Engineer | Ph.D. (Computer Science)

Gurugram, Haryana, India

About me

I am a Text Analytics and NLP researcher, currently working as an ML Engineer at Aidaptive, powered by Jarvis ML. I completed my Ph.D. at the Dept. of Computer Science, University of Delhi, under the supervision of Prof. Vasudha Bhatnagar. During my Ph.D., I was a researcher at the Network Science Lab led by Prof. Bhatnagar. My research interests include Text Mining, Graph Analytics, Natural Language Processing, Machine Learning, and Discourse Analysis.

Apart from research, I enjoy reading fiction and classic novels, watching culinary and travel vlogs, and writing poems and short stories.

For more details, please download my CV.

Recent Updates

November 28, 2022: Successfully defended my Ph.D. thesis titled "Complex Networks for Textual Discourse Coherence Analysis".


June 22, 2020: Presented my Pre-PhD seminar on the proposed thesis title "Complex Networks for Textual Discourse Coherence Analysis". Currently, I am in the process of drafting my thesis, which I intend to submit by the end of February 2021.


June 04, 2020: Submitted our work on discourse coherence analysis of scholarly articles for publication. This work is currently under review.

Publications

Duari, S., & Bhatnagar, V. (March, 2019). sCAKE: Semantic Connectivity Aware Keyword Extraction. Information Sciences, 477, 100-117. DOI: 10.1016/j.ins.2018.10.034. (Paper: ScienceDirect, arXiv Preprint) (Data and Code) (SCI, Impact Factor: 8.233 (2021))

Duari, S., & Bhatnagar, V. (2020). Complex Network based Supervised Keyword Extractor. Expert Systems with Applications, 140, 112876. DOI: 10.1016/j.eswa.2019.112876. (Paper: ScienceDirect, arXiv Preprint) (Data and Code) (SCIE, Impact Factor: 8.665 (2021))

Duari, S., & Bhatnagar, V. (May, 2019). Semi-automatic System for Title Construction. In Gani A., Das P., Kharb L., Chahal D. (eds) Information, Communication and Computing Technology. ICICCT 2019. Communications in Computer and Information Science, vol. 1025, pp. 216-227. Springer, Singapore. DOI: 10.1007/978-981-15-1384-8_18. (Paper: SpringerLink, arXiv Preprint) (Data and Code)

Chaturvedi, R., Dhani, J.S., Joshi, A., Khanna, A., Tomar, N., Duari, S., Khurana, A. and Bhatnagar, V. (November, 2020). Divide and Conquer: From Complexity to Simplicity for Lay Summarization. In Proceedings of the First Workshop on Scholarly Document Processing (pp. 344-355). (Paper: ACL Anthology)

Duari, S. and Bhatnagar, V. (2021). FFCD: A Fast-and-Frugal Coherence Detection Method. IEEE Access. vol. 10, pp. 85305-85314. DOI: 10.1109/ACCESS.2021.3135048. (SCIE, Impact Factor: 3.476 (2021))

Bhatnagar, V., Duari, S., and Gupta, S. K. (2022). Quantitative Discourse Cohesion Analysis of Scientific Scholarly Texts using Multilayer Networks. IEEE Access. vol. 10, pp. 88538-88557. DOI: 10.1109/ACCESS.2022.3198952. (SCIE, Impact Factor: 3.476 (2021))

Projects

ACADEMIC AND RESEARCH PROJECTS
DISCOURSE COHERENCE ANALYSIS

I am currently working on a project on computational discourse coherence analysis. My objective is to analyse scientific articles on the basis of their cohesion and coherence, and to quantify writing quality in terms of these properties. We recently submitted a paper in which we explored a complex-network-based framework for modelling textual discourse.

SUPERVISED KEYWORD EXTRACTION

Used classical ML algorithms to engineer solutions for automatically extracting keywords from single documents. We transformed the text to a complex network representation and extracted node properties as features. The proposed method works on all texts, irrespective of the domain, collection, or language.

View Project
UNSUPERVISED KEYWORD EXTRACTION

Engineered solutions for unsupervised, graph-based keyword extraction from single documents. We proposed a novel, parameterless, graph-based keyword extraction algorithm (sCAKE) and its language-agnostic variant (LAKE). We also proposed a context-aware graph construction method and a semantic connectivity based word scoring method.

View Project
SEMI-AUTOMATIC TITLE CONSTRUCTION

Designed a semi-automatic system for identifying and recommending keywords for inclusion in the title of a scientific manuscript. Here, the keyword extraction phase is automatic and title construction from the extracted keywords is manual. For extracting keywords, we induced supervised models using a graph-theoretic feature set.

View Project
REVENUE MANAGEMENT SYSTEM

Designed a Revenue Management System for Assam Power Distribution Company Limited (APDCL), Assam, India. It was developed using JSP for the front-end and MySQL for the back-end. The objective was to design an efficient database to store information regarding revenue collection of APDCL and to build a web-based application to effectively view, manipulate, and aggregate the stored information.

iRAT - A REVIEW AGGREGATION TOOL

An NLP and sentiment analysis based project for movie classification. The objective was to analyse movie reviews and assign an aggregated sentiment polarity (positive, negative, or neutral) to them. It was developed using JSP; a bag-of-words model was used for document representation, and polarity dataset v1.0 was used for evaluation.

FAKE NEWS DETECTION

I am collaborating with Master's students (2019 Batch) of the department on this fake news detection project. My role is to provide guidance and brainstorm with the students. I work as an assistant under the guidance of my PhD supervisor, Prof. Vasudha Bhatnagar.

Work Experience

ACADEMIA AND INDUSTRY

MACHINE LEARNING ENGINEER (REMOTE)

Aidaptive, powered by Jarvis ML
OCTOBER 2022 - PRESENT

I work on developing property recommendation models for short-term vacation rentals using textual similarity.

NLP RESEARCHER (REMOTE)

Aisle-3
APRIL 2021 - SEPTEMBER 2022

Highlights:

  • Worked on NLP-enabled product matching by extracting product features from textual descriptions from multiple sources. Achieved an accuracy of more than 95%.
  • Created knowledge graphs from product descriptions using entity extraction, entity linking, and relationship extraction.
  • Onboarded merchants onto the platform from multiple sources. Wrote code and performed data normalisation, data validation, and data quality checks for onboarded merchants.

ASSISTANT PROFESSOR, COMPUTER SCIENCE

Sibsagar Commerce College, Sivasagar, Assam, India
AUGUST 2013 - MAY 2014

Courses taught: Introduction to Computers, Algorithms, Data Structures, Internet and Web Technologies, DBMS, Software Engineering

ASSISTANT PROFESSOR, COMPUTER SCIENCE

Jhanji H.N.S. College, Sivasagar, Assam, India
OCTOBER 2012 - JULY 2013

Courses taught: "Computer Skills", a compulsory paper for all 2nd year undergraduate students.
Special Mention: Took the initiative to teach computer skills to economically weaker students of the college. Simultaneously taught 3 batches of HS and BA students (class size = max 10 students) three-month courses on basic computer skills. This initiative was outside my regular academic responsibilities.

GRADUATE TRAINEE

Tata Consultancy Services
OCTOBER 2008 - MAY 2009

I was trained in J2EE. As a group project during training, our team developed a finance management system using J2EE for the front-end and MySQL for the back-end.

Education

ACADEMIC CAREER

Ph.D. - COMPUTER SCIENCE

UNIVERSITY OF DELHI, New Delhi
JUNE 2015 - NOVEMBER 2022

Thesis Title: Complex Networks for Textual Discourse Coherence Analysis
Supervisor: Prof. Vasudha Bhatnagar
Research Area: Text Analytics, NLP, and Computational Discourse Analysis
Relevant Coursework: Machine Learning, Special Topics in Data Mining (Graph Analytics), Text Mining

MASTER OF COMPUTER APPLICATION

NORTH EASTERN HILL UNIVERSITY, SHILLONG, MEGHALAYA
2009 - 2012

Graduated with CGPA 9/10 | Secured 2nd rank
Relevant Coursework: Algorithms, Data Mining, Data Structures, DBMS, Advanced Discrete Structures, Compiler Design

BACHELOR OF COMPUTER APPLICATION

DIBRUGARH UNIVERSITY, DIBRUGARH, ASSAM
2005 - 2008

Graduated with 80.2% | Secured 1st rank
Relevant Coursework: Data Structures, DBMS, Algorithms, Theory of Computing, Programming in C and C++

SKILLS

TECHNICAL AND NON-TECHNICAL
PROGRAMMING LANGUAGES AND SCRIPTS

Python (Proficient), R (Proficient), Java (Proficient), C (Proficient), C++ (Prior experience), JSP (Prior experience), JavaScript (Prior experience), HTML (Prior experience), CSS (Prior experience).

DATABASES

BigQuery (Intermediate), Amazon RDS (Familiar), MySQL (Prior experience), Oracle (Prior experience)

SOFTWARE/TOOLS

LaTeX, Weka, AWS Lambda, EC2, Amazon SageMaker, Amazon S3, GCP, Apache Airflow

DATA SCIENCE SKILLS

Text Analytics, Machine Learning, Deep Learning, Data Analytics

Research and Learning

TALKS, TUTORIALS, COURSES, AND PROFESSIONAL SERVICES
TALKS AND TUTORIALS
  • Presented a contributed talk on "Language-Agnostic Keyword Extraction" at the 14th Inter-Research-Institute Student Seminar (IRISS) in Computer Science held on February 13-14, 2020 at IIT Gandhinagar, India. (pdf)
  • Presented a paper titled "Semi-automatic System for Title Construction" at the 4th International Conference on Information, Communication & Computing Technology (ICICCT-2019) held on May 11, 2019 at New Delhi, India. (pdf)
  • Conducted hands-on sessions on behalf of Network Science Lab at the Faculty Development Program on "Network Science: Foundation of Social Network Analysis" jointly organized by Teaching Learning Centre, Ramanujan College and Dept. of Computer Science during December 3-8, 2018. (Presentations and R scripts)
  • Delivered a talk and demonstration on "Supervised Learning with R" at the Workshop on R Programming organized by Department of Computer Science, University of Delhi, on February 12, 2018. (pdf) (R Scripts)
  • Conducted a hands-on session on "Introduction to Text Analytics using R" at the National Workshop on Machine Learning organized by Deen Dayal Upadhyaya College, New Delhi, on December 27, 2017. (pdf) (R Scripts)
  • Presented a tutorial on "Unsupervised Keyword Extraction from Single Document" at the 5th International Conference on Big Data Analytics (BDA 2017) held at IIIT Hyderabad, India during December 12-15, 2017. (pdf)
  • Conducted a hands-on session on "Text Analytics using R" at a Faculty Development Program on "Big Data Analytics" organized by Geethanjali College of Engineering and Technology, Hyderabad, on November 29, 2017. (pdf) (R Scripts)
  • Conducted a hands-on session on "R for Text Analytics" at a Faculty Development Program on "Data Sciences and Machine Learning" organized by Bharati Vidyapeeth's College of Engineering, New Delhi, on November 10, 2016. (pdf) (R Scripts)
  • Presented a paper titled "Facebook as a platform for Ecommerce: A Brief Study" at the UGC-sponsored National Seminar on "Ecommerce: an Emerging Issue in the Global Business" held on April 26-27, 2013 at Sibsagar Commerce College, Sivasagar.

COURSES
  • Introduction to Data Science - Specialization offered by IBM on Coursera - Certificate Link
  • Applied Data Science - Specialization offered by IBM on Coursera - (completed courses 1, 2, 3, 4, 5, 6, 7, 8 and currently studying course 9)
  • Deep learning specialization (completed courses 1, 2, 3 and currently studying courses 4 and 5) - Coursera (deeplearning.ai)
  • Introduction to Data Science - IBM Cognitive Class Certificate Link
  • MHRD-GIAN course titled "Complex Networks" - organized by IIT Delhi on December 18-22, 2017.
  • MHRD-GIAN course titled "Fundamentals and Applications of the Principles of Optimization to various disciplines – Engineering, Business, Life Sciences, Social Sciences and Physical Sciences" - organized by IIT Indore on July 17-21, 2017.
  • MHRD-GIAN course titled "Graph Data Mining and Analysis" - organized by Jamia Millia Islamia, New Delhi, on December 17-23, 2015.

PROFESSIONAL SERVICES
  • Served as a reviewer for IEEE Access (2019 onwards)
  • Served as an additional reviewer for DAWAK 2016 and DAWAK 2017
  • Served as a volunteer for International Conference on Big Data Analytics (BDA 2015)
  • Served as a volunteer for National Conference on Emerging Trends and Applications in Computer Science (NCETACS 2010 and NCETACS 2011)

“If we knew what it was we were doing, it would not be called research, would it?” ― Albert Einstein