I am a Machine Learning Engineer specializing in NLP, with 4+ years of experience developing and deploying impactful solutions for e-commerce and the short-term vacation rental industry. My expertise spans the entire ML pipeline, from research and exploratory data analysis (EDA) to model implementation, production deployment, and ongoing maintenance. I have a proven track record of building text similarity-based recommendation systems, contextual search query expansion, and predictive merchandising solutions.
My academic foundation includes a Ph.D. in Computer Science from the University of Delhi, where my thesis focused on "Complex Networks for Textual Discourse Coherence Analysis." My research interests encompass Text Analytics, Natural Language Processing (NLP), Graph Analytics, and Machine Learning.
Key skills: Python | R | MySQL | AWS | GCP | Airflow | NLP | Machine Learning | Data Science | Text Analytics
In my free time, I enjoy reading fiction and classic novels, watching culinary and travel vlogs, and weaving science fiction and fantasy (SF&F) worlds.
For more details, please download my CV.
Duari, S., & Bhatnagar, V. (March, 2019). sCAKE: Semantic Connectivity Aware Keyword Extraction. Information Sciences, 477, 100-117. DOI: 10.1016/j.ins.2018.10.034. (Paper: ScienceDirect, arXiv Preprint) (Data and Code) (SCI, Impact Factor: 8.233 (2021))
Duari, S., & Bhatnagar, V. (2020). Complex Network based Supervised Keyword Extractor. Expert Systems with Applications, 140, 112876. DOI: 10.1016/j.eswa.2019.112876. (Paper: ScienceDirect, arXiv Preprint) (Data and Code) (SCIE, Impact Factor: 8.665 (2021))
Duari, S., & Bhatnagar, V. (May, 2019). Semi-automatic System for Title Construction. In Gani A., Das P., Kharb L., Chahal D. (eds) Information, Communication and Computing Technology. ICICCT 2019. Communications in Computer and Information Science, vol. 1025, pp. 216-227. Springer, Singapore. DOI: 10.1007/978-981-15-1384-8_18. (Paper: SpringerLink, arXiv Preprint) (Data and Code)
Chaturvedi, R., Dhani, J.S., Joshi, A., Khanna, A., Tomar, N., Duari, S., Khurana, A. and Bhatnagar, V. (November, 2020). Divide and Conquer: From Complexity to Simplicity for Lay Summarization. In Proceedings of the First Workshop on Scholarly Document Processing (pp. 344-355). (Paper: ACL Anthology)
Duari, S., & Bhatnagar, V. (2021). FFCD: A Fast-and-Frugal Coherence Detection Method. IEEE Access, vol. 10, pp. 85305-85314. DOI: 10.1109/ACCESS.2021.3135048. (SCIE, Impact Factor: 3.476 (2021))
Bhatnagar, V., Duari, S., & Gupta, S. K. (2022). Quantitative Discourse Cohesion Analysis of Scientific Scholarly Texts using Multilayer Networks. IEEE Access, vol. 10, pp. 88538-88557. DOI: 10.1109/ACCESS.2022.3198952. (SCIE, Impact Factor: 3.476 (2021))
I am currently working on a project on computational discourse coherence analysis. My objective is to analyse scientific articles on the basis of their cohesion and coherence, and to quantify writing quality in terms of these properties. We have recently communicated a paper in which we explored a complex network-based framework for modelling textual discourse.
Used classical ML algorithms to engineer solutions for automatically extracting keywords from single documents. We transformed the text into a complex network representation and extracted node properties as features. The proposed method works on any text, irrespective of its domain, collection, or language.
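The idea of turning a text into a network and reading off node properties as features can be illustrated with a minimal sketch. It assumes a plain word co-occurrence graph (two words are linked when they appear within a small sliding window) and uses only degree and weighted degree as features; the window size, the helper names `build_cooccurrence_graph` and `node_features`, and the feature choice are hypothetical and are not the feature set of the published extractor.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Hypothetical helper: link words that co-occur within `window` positions.

    Returns an undirected weighted graph as {word: {neighbour: weight}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            graph[word][other] += 1
            graph[other][word] += 1
    return graph


def node_features(graph):
    """Hypothetical helper: per-word features (degree and weighted degree)."""
    return {
        word: {"degree": len(nbrs), "strength": sum(nbrs.values())}
        for word, nbrs in graph.items()
    }


tokens = "graph based keyword extraction uses graph node features".split()
features = node_features(build_cooccurrence_graph(tokens, window=2))
# 'graph' has the highest weighted degree (6) in this toy example
best = max(features, key=lambda w: features[w]["strength"])
```

In a supervised setting, each word's feature vector would be paired with a gold keyword label to train a classifier; that step, and richer network properties such as centrality or clustering measures, are left out of this sketch.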
Engineered solutions for unsupervised, graph-based keyword extraction from single documents. We proposed a novel, parameterless, graph-based keyword extraction algorithm (sCAKE) and its language-agnostic variant (LAKE). We also proposed a context-aware graph construction method and a semantic connectivity-based word scoring method.
Designed a semi-automatic system for identifying and recommending keywords for inclusion in the title of a scientific manuscript. The keyword extraction phase is automatic, while title construction from the extracted keywords is manual. For keyword extraction, we induced supervised models using a graph-theoretic feature set.
Designed a Revenue Management System for Assam Power Distribution Company Limited (APDCL), Assam, India, developed with JSP as the front end and MySQL as the back end. The objective was to design an efficient database to store information on APDCL's revenue collection, and to build a web-based application to effectively view, manipulate, and aggregate the stored information.
An NLP and sentiment analysis-based project for movie classification. The objective was to analyse movie reviews and assign an aggregated sentiment polarity (positive, negative, or neutral) to them. It was developed using JSP; a bag-of-words model was used for document representation, and polarity dataset v1.0 was used for evaluation.
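A bag-of-words representation with an aggregated polarity can be sketched as follows. The original project was built with JSP, and its classifier is not described here, so this Python version swaps in a simple lexicon-count rule: the toy word lists `POSITIVE` and `NEGATIVE`, the scoring rule, and the majority-vote aggregation across reviews are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; a real system would learn weights from labelled data
POSITIVE = {"great", "brilliant", "enjoyable", "moving"}
NEGATIVE = {"boring", "dull", "predictable", "weak"}


def bag_of_words(text):
    """Represent a review as unordered lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def review_polarity(text):
    """Label a review by (#positive - #negative) lexicon hits."""
    bow = bag_of_words(text)
    score = sum(bow[w] for w in POSITIVE) - sum(bow[w] for w in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aggregate_polarity(reviews):
    """Assume a movie takes the majority polarity across its reviews."""
    votes = Counter(review_polarity(r) for r in reviews)
    return votes.most_common(1)[0][0]


reviews = [
    "A brilliant, moving film.",
    "Great cast and an enjoyable plot.",
    "Dull and boring.",
]
verdict = aggregate_polarity(reviews)  # two positive reviews, one negative
```

Because a bag of words discards word order, phrases such as "not great" are scored like "great"; this is a known limitation of the representation rather than of the aggregation step.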
I am collaborating with Master's students (2019 batch) of the department on this fake news detection project. My role is to guide the students and brainstorm ideas with them. I work as an assistant under the guidance of my Ph.D. supervisor, Prof. Vasudha Bhatnagar.
Working remotely at a US-based startup focused on building an autonomous intelligence platform for digital commerce and short-term vacation rentals.
Worked remotely at a UK-based AI commerce startup focused on crawling the internet and matching product data from various merchant sites, providing a single point of search for the best available deal.
Courses taught: Introduction to Computers, Algorithms, Data Structures, Internet and Web Technologies, DBMS, Software Engineering
Courses taught: "Computer Skills", a compulsory paper for all second-year undergraduate students. Special mention: Took the initiative to teach computer skills to economically weaker students of the college, simultaneously teaching three batches of HS and bachelor's-level students (maximum 10 students per class) in 3-month courses on basic computer skills. This initiative was outside my regular academic responsibilities.
I was trained in J2EE. As a group project during the training, our team developed a finance management system with J2EE as the front end and MySQL as the back end.
Thesis Title: Complex Networks for Textual Discourse Coherence Analysis
Supervisor: Prof. Vasudha Bhatnagar
Research Area: Text Analytics, NLP, and Computational Discourse Analysis
Relevant Coursework: Machine Learning, Special Topics in Data Mining (Graph Analytics), Text Mining
Graduated with CGPA 9/10 | Secured 2nd rank
Relevant Coursework: Algorithms, Data Mining, Data Structures, DBMS, Advanced Discrete Structures, Compiler Design
Graduated with 80.2% | Secured 1st rank
Relevant Coursework: Data Structures, DBMS, Algorithms, Theory of Computing, Programming in C and C++
Python, R, Java (Prior experience), C/C++ (Prior experience)
MySQL, BigQuery, Amazon RDS (Familiar), Oracle (Prior experience), PostgreSQL (Limited exposure)
AWS Lambda, EC2, Amazon SageMaker, Amazon S3, GCP, Apache Airflow
Machine Learning, Natural Language Processing, Deep Learning, Data Science, Text Analytics, Data Analytics