Le Khanh Duy


Resume | LinkedIn | GitHub

I am pursuing a BS degree in Computer Science at Ho Chi Minh City University of Technology.

I was recently a Research Intern at KAIST Interaction Lab, where my work on Large Language Models in Human-AI Interaction led to a publication.

Portfolio


Natural Language Processing

CS224n: Natural Language Processing with Deep Learning

My complete implementation of assignments and projects in CS224n: Natural Language Processing with Deep Learning by Stanford (Winter, 2023).

View on GitHub

Neural Machine Translation: An NMT system that translates text from Spanish to English, using a bidirectional LSTM encoder over the source sentence and a unidirectional LSTM decoder with multiplicative attention to generate the target sentence (GitHub).
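For readers unfamiliar with multiplicative attention, here is a minimal sketch of one decoder step in plain Python, assuming the common formulation e_i = h_dec^T W h_enc_i, followed by a softmax and a weighted sum of encoder states. The function names and toy values are illustrative only; this is not the assignment's actual code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector v (length cols)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiplicative_attention(
    dec_state: Vector, enc_states: List[Vector], w: Matrix
) -> Tuple[Vector, Vector]:
    """One decoder step: e_i = dec^T W enc_i, alpha = softmax(e), context = sum_i alpha_i enc_i.

    W has shape (len(dec_state), len(enc_i)), so encoder and decoder sizes may differ,
    e.g. a bidirectional encoder feeding a unidirectional decoder.
    """
    scores = [sum(d * p for d, p in zip(dec_state, matvec(w, h))) for h in enc_states]
    alpha = softmax(scores)
    dim = len(enc_states[0])
    context = [sum(a * h[k] for a, h in zip(alpha, enc_states)) for k in range(dim)]
    return alpha, context


if __name__ == "__main__":
    # Toy values: decoder size 1, encoder size 2 (e.g. forward + backward halves).
    enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    dec = [1.0]
    W = [[1.0, 0.5]]  # hypothetical learned projection
    alpha, context = multiplicative_attention(dec, enc, W)
    print(alpha, context)
```

The bilinear W is what separates multiplicative attention from plain dot-product attention: it lets encoder and decoder states live in spaces of different size.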

Dependency Parsing: A neural transition-based dependency parser that uses a one-layer MLP to predict parser transitions (GitHub).
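A transition-based parser repeatedly applies SHIFT, LEFT-ARC, or RIGHT-ARC to a stack and buffer. The sketch below shows the arc-standard transition system such a parser drives; the one-layer MLP that would pick each transition is replaced by a hand-written transition sequence. The class name and the transition codes "S", "LA", "RA" are assumptions for illustration.

```python
from typing import List, Tuple


class PartialParse:
    """State of an arc-standard parse: a stack, a buffer, and the arcs built so far."""

    def __init__(self, sentence: List[str]):
        self.stack: List[str] = ["ROOT"]
        self.buffer: List[str] = list(sentence)
        self.dependencies: List[Tuple[str, str]] = []  # (head, dependent)

    def parse_step(self, transition: str) -> None:
        if transition == "S":  # SHIFT: move the next buffer word onto the stack
            self.stack.append(self.buffer.pop(0))
        elif transition == "LA":  # LEFT-ARC: stack[-2] becomes a dependent of stack[-1]
            dependent = self.stack.pop(-2)
            self.dependencies.append((self.stack[-1], dependent))
        elif transition == "RA":  # RIGHT-ARC: stack[-1] becomes a dependent of stack[-2]
            dependent = self.stack.pop()
            self.dependencies.append((self.stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition: {transition}")

    def is_complete(self) -> bool:
        """The parse is done when the buffer is empty and only ROOT remains."""
        return not self.buffer and len(self.stack) == 1


if __name__ == "__main__":
    parse = PartialParse(["parse", "this", "sentence", "correctly"])
    # In the real parser, an MLP over stack/buffer features would choose each step.
    for t in ["S", "S", "S", "LA", "RA", "S", "RA", "RA"]:
        parse.parse_step(t)
    print(parse.dependencies)
```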


Zone-h-esque Captcha Recognition

Run in Google Colab | View on GitHub

I contributed a dataset of 50,000 captcha images with their corresponding labels. The dataset is used to train a CNN + RNN model with a CTC loss.
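A CTC-trained model emits one class distribution per time step, including a special blank symbol, so its output must be decoded into a string. The description does not say which decoder is used; this is a minimal sketch of greedy (best-path) decoding, assuming the blank is class 0 and class k > 0 maps to `alphabet[k - 1]`.

```python
from itertools import groupby
from typing import Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks.

    frame_probs[t][k] is the probability of class k at time step t.
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [k for k, _ in groupby(best_path)]  # merge consecutive duplicates
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)


if __name__ == "__main__":
    # Per-frame argmax path a a _ a b b _ _ c: the blank keeps the two a's separate.
    path = [1, 1, 0, 1, 2, 2, 0, 0, 3]
    probs = [[0.9 if k == c else 0.1 / 3 for k in range(4)] for c in path]
    print(ctc_greedy_decode(probs, "abc"))
```

Merging repeats before removing blanks is what lets CTC output doubled characters, which are common in captchas.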




Kalapa Medical QA

View on GitHub

We developed a Vietnamese medical QA system over a corpus of 600 medical articles. It uses Retrieval-Augmented Generation (RAG) with a retriever that ensembles SimCSE_Vietnamese and BM25, topped by a keyword-search layer based on Aho-Corasick. We also fine-tuned Vietcuna, a 3B reader model, with LoRA on GPT-4-generated data. The pipeline achieved 59.6% accuracy and a score of 0.7, placing in the top 10 out of 300 teams.
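One common way to ensemble a sparse retriever (BM25) with a dense one (such as SimCSE embeddings) is to normalize both score lists and take a weighted sum. The source does not state how the ensemble is combined, so this is a sketch under assumptions: Okapi BM25 with a Lucene-style idf, `k1=1.5`, `b=0.75`, min-max normalization, and an equal weight `alpha=0.5`. The dense scores are stubbed as plain numbers rather than computed from a real embedding model.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def min_max(xs: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def hybrid_rank(sparse: List[float], dense: List[float], alpha: float = 0.5) -> List[int]:
    """Rank document indices by a weighted sum of normalized sparse and dense scores."""
    fused = [alpha * s + (1 - alpha) * d for s, d in zip(min_max(sparse), min_max(dense))]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    docs = [["fever", "cough"], ["headache", "nausea"], ["fever", "rash", "fever"]]
    sparse = bm25_scores(["fever"], docs)
    dense = [0.2, 0.1, 0.9]  # hypothetical cosine similarities from a dense encoder
    print(hybrid_rank(sparse, dense))
```

BM25 catches exact medical terms that embeddings can blur, while the dense scores recover paraphrases; fusing them is a hedge against either failure mode.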




ZaloAI Math-Solver System

View on GitHub

We developed a Vietnamese math-solver system that uses Llama-7B for Vietnamese-to-English (vi2en) translation, combined with a custom parser and a Tool-integrated Reasoning Agent (ToRA). We fine-tuned a Vi-DeBERTA baseline and experimented with various state-of-the-art math word problem models, such as MetaMath and GPT-4 with chain-of-thought (CoT) prompting. The system achieved 75% accuracy.
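In tool-integrated reasoning, the model writes a program, the program is executed, and its output becomes (or informs) the answer. The sketch below covers only that execute-and-read step plus a hypothetical parser that matches the result to multiple-choice options. The response format (a fenced `python` block), the option format `"A. 12"`, and all function names are assumptions, not the project's actual parser.

```python
import contextlib
import io
import re
from typing import List, Optional

FENCE = "`" * 3  # triple backtick, built at runtime so it does not clash with markdown


def extract_program(response: str) -> Optional[str]:
    """Return the first fenced python program in a model response, if any."""
    match = re.search(FENCE + r"python\n(.*?)" + FENCE, response, re.DOTALL)
    return match.group(1) if match else None


def run_program(code: str) -> str:
    """Execute a generated program and capture what it prints.

    WARNING: exec on model output is unsafe; a real system needs a sandbox.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def pick_choice(result: str, choices: List[str]) -> Optional[str]:
    """Map a numeric result onto options formatted like 'A. 12' (assumed format)."""
    try:
        value = float(result)
    except ValueError:
        return None
    for choice in choices:
        label, _, text = choice.partition(".")
        try:
            if abs(float(text.strip()) - value) < 1e-6:
                return label.strip()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    response = "Compute the total.\n" + FENCE + "python\nprint(3 * 4 + 3)\n" + FENCE
    program = extract_program(response)
    if program is not None:
        print(pick_choice(run_program(program), ["A. 12", "B. 15", "C. 18", "D. 20"]))
```

Delegating arithmetic to an interpreter avoids the calculation slips that pure chain-of-thought answers often make.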




DataDive: Supporting Readers’ Contextualization of Statistical Statements with Data Exploration

View on GitHub

I proposed and implemented a multi-stage pipeline for data exploration, covering text highlights, recommendations, and visualization with Large Language Models (LLMs). I grounded the LLM's visual reasoning in the data using Text-to-SQL and semantic similarity, and ranked recommendations with pairwise ranking prompting. The pipeline achieved 80.5% match accuracy, and its recommendations scored 98% relevance and 4.07/5 interestingness. The work was accepted at IUI.
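Pairwise ranking prompting asks an LLM which of two candidates is better and aggregates those verdicts into a ranking. Below is a minimal all-pairs sketch: each pair is asked in both orders to offset position bias, a consistent winner gets a point, and inconsistent verdicts split it. The LLM call is replaced by a hypothetical `prefer` callback, and candidates are assumed to be unique strings.

```python
from itertools import combinations
from typing import Callable, List, Sequence

# prefer(query, a, b) -> True if the judge prefers a over b when a is shown first.
# In a real pipeline this would wrap an LLM prompt; here it is a hypothetical stub.
Judge = Callable[[str, str, str], bool]


def pairwise_rank(query: str, candidates: Sequence[str], prefer: Judge) -> List[str]:
    """All-pairs ranking; candidates must be unique because they are used as dict keys."""
    scores = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        a_wins_first = prefer(query, a, b)       # a shown first
        a_wins_second = not prefer(query, b, a)  # b shown first
        if a_wins_first and a_wins_second:
            scores[a] += 1.0
        elif not a_wins_first and not a_wins_second:
            scores[b] += 1.0
        else:  # the verdict flipped with the order: split the point
            scores[a] += 0.5
            scores[b] += 0.5
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


if __name__ == "__main__":
    relevance = {"GDP by year": 3, "GDP by region": 2, "Population trend": 1}  # toy judge
    judge = lambda q, a, b: relevance[a] >= relevance[b]
    print(pairwise_rank("How has GDP changed?", list(relevance)[::-1], judge))
```

A judge that always favors whichever item is shown first earns every candidate the same score, so the bias cancels out instead of distorting the ranking.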




Machine Learning

CS189/289A: Introduction to Machine Learning

My complete implementation of assignments and projects in CS189/289A: Introduction to Machine Learning by UC Berkeley (Fall, 2023).

View on GitHub

Robot Localization: A Hidden Markov Model (HMM) that localizes a robot in a 2D grid world. The robot can move in four directions (up, down, left, right) and has a sensor that detects obstructions. The goal is to estimate the robot’s position from a sequence of noisy sensor measurements (GitHub).
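Position estimation in an HMM is usually done with the forward (filtering) algorithm: predict with the motion model, then reweight by the sensor likelihood. This sketch assumes a few modelling choices the description leaves open: the robot moves uniformly at random to a free neighbouring cell, the sensor reports for each direction whether it is blocked, each of the four bits is wrong independently with probability `eps`, and the first reading is taken before any move.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def is_free(grid: List[str], r: int, c: int) -> bool:
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == "."


def true_reading(grid: List[str], cell: Cell) -> Tuple[bool, ...]:
    """Which of the four directions are blocked by a wall or the grid edge."""
    r, c = cell
    return tuple(not is_free(grid, r + dr, c + dc) for dr, dc in MOVES)


def neighbors(grid: List[str], cell: Cell) -> List[Cell]:
    r, c = cell
    nbrs = [(r + dr, c + dc) for dr, dc in MOVES if is_free(grid, r + dr, c + dc)]
    return nbrs or [cell]  # a boxed-in robot stays put


def sensor_prob(grid: List[str], cell: Cell, reading: Tuple[bool, ...], eps: float) -> float:
    """P(reading | cell) when each bit is flipped independently with probability eps."""
    wrong = sum(a != b for a, b in zip(true_reading(grid, cell), reading))
    return (1 - eps) ** (4 - wrong) * eps ** wrong


def forward(grid: List[str], readings: List[Tuple[bool, ...]], eps: float = 0.1) -> Dict[Cell, float]:
    """Filtered belief over free cells after all readings."""
    cells = [(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "."]
    belief = {s: 1.0 / len(cells) for s in cells}  # uniform prior
    for t, reading in enumerate(readings):
        if t > 0:  # predict: push belief through the random-move transition model
            predicted = {s: 0.0 for s in cells}
            for s, p in belief.items():
                nbrs = neighbors(grid, s)
                for n in nbrs:
                    predicted[n] += p / len(nbrs)
            belief = predicted
        # update: weight by the sensor likelihood, then normalize
        belief = {s: p * sensor_prob(grid, s, reading, eps) for s, p in belief.items()}
        total = sum(belief.values())
        belief = {s: p / total for s, p in belief.items()}
    return belief


if __name__ == "__main__":
    grid = ["..", ".#"]  # '#' marks an obstacle
    belief = forward(grid, [(True, True, True, False), (True, False, False, True)])
    print(max(belief, key=belief.get), belief)
```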

Jack’s Car Rentals: A reinforcement learning problem in which the goal is to maximize a car rental company’s profit by deciding how many cars to move between its two locations each night. The problem is modeled as a Markov Decision Process (GitHub).
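Once the problem is written as an MDP, a standard solver such as value iteration computes the optimal values and the greedy policy. The sketch below is generic: it runs on a tiny made-up two-state MDP whose states, actions, probabilities, and rewards are hypothetical. The real problem has many more states and stochastic rental/return counts, which are omitted here.

```python
from typing import Dict, List, Tuple

# P[state][action] -> list of (probability, next_state, reward)
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Repeat Bellman optimality backups until values stop changing, then act greedily."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in actions[a]))
        for s, actions in P.items()
    }
    return V, policy


if __name__ == "__main__":
    # Hypothetical toy: moving cars costs money but makes the profitable state reachable.
    toy: MDP = {
        "few_cars": {"wait": [(1.0, "few_cars", 1.0)], "rebalance": [(1.0, "many_cars", -2.0)]},
        "many_cars": {
            "wait": [(0.8, "many_cars", 10.0), (0.2, "few_cars", 10.0)],
            "rebalance": [(1.0, "many_cars", 8.0)],
        },
    }
    print(value_iteration(toy))
```

Sutton and Barto solve the original Jack's Car Rental with policy iteration; value iteration converges to the same optimal values and is shorter to write.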


Hierarchical Stock Clustering

Run in Google Colab

I performed hierarchical clustering on S&P 500 data using agglomerative linkage algorithms and the Directed Bubble Hierarchical Tree (DBHT), and benchmarked the resulting clusters against GICS classifications. I then tailored DBHT with Ward linkage, improving its Adjusted Rand Index by 20%. Finally, I constructed a portfolio of S&P 500 stocks by applying Hierarchical Risk Parity (HRP) to these clusters.
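Benchmarking clusters against GICS relies on the Adjusted Rand Index (ARI), which scores agreement between two partitions and corrects for chance (1.0 means identical, about 0 means random). Here is a self-contained sketch of the standard contingency-table formula; scikit-learn's `adjusted_rand_score` computes the same quantity. The function name and the example labels are illustrative.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two flat clusterings of the same items."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))  # contingency table entries n_ij
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_rows = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_cols = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # expected index under random labeling
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case: both all-singletons or both one cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    gics = ["tech", "tech", "energy", "energy"]  # hypothetical sector labels
    clusters = [0, 0, 1, 2]                      # hypothetical cluster assignments
    print(adjusted_rand_index(gics, clusters))
```

Because ARI only compares which items share a group, the cluster IDs never need to match the sector names.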




Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells

View on GitHub

We integrate consistency training with a self-training curriculum. We performed extensive experiments on several white-blood-cell datasets and report state-of-the-art (SOTA) results.
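The description does not detail the redesigned method, so this sketch shows only a generic building block for combining self-training with consistency training, in the style of FixMatch. A teacher's per-pixel predictions on a weakly augmented image become hard pseudo-labels, only confident pixels are kept, and a student's predictions on a strongly augmented view are trained against them. The 0.9 threshold, the flattened-pixel representation, and the function names are assumptions; models and augmentations are omitted.

```python
import math
from typing import List, Tuple

Probs = List[float]  # class probabilities for one pixel


def pseudo_labels(teacher: List[Probs], threshold: float = 0.9) -> Tuple[List[int], List[bool]]:
    """Hard pseudo-label per pixel plus a mask that keeps only confident pixels."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in teacher]
    mask = [p[k] >= threshold for p, k in zip(teacher, labels)]
    return labels, mask


def consistency_loss(student: List[Probs], labels: List[int], mask: List[bool]) -> float:
    """Mean cross-entropy of student predictions against the confident pseudo-labels."""
    terms = [-math.log(max(p[k], 1e-12)) for p, k, keep in zip(student, labels, mask) if keep]
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    teacher = [[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]]  # weak view: 3 pixels, 2 classes
    student = [[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]      # strong view of the same pixels
    labels, mask = pseudo_labels(teacher)
    print(labels, mask, consistency_loss(student, labels, mask))
```

The confidence mask is what keeps the self-training loop from reinforcing its own mistakes on ambiguous pixels, such as cell boundaries.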




Software Engineering

Urban Waste Management System

View on GitHub

This project provides a web-based dashboard for managers of an automated waste-collection system. The system follows an MVC architecture with a custom MySQL database, and it automates route planning and workload assignment using Travelling Salesman heuristics and the Open Source Routing Machine (OSRM).
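A typical Travelling Salesman heuristic builds a route greedily and then improves it locally. This sketch assumes nearest-neighbour construction followed by 2-opt, which may differ from the heuristics the project actually uses. It works on a precomputed distance matrix; in a real deployment such a matrix could come from OSRM's table service, but that integration is not shown. Node 0 is assumed to be the depot.

```python
from typing import List

Matrix = List[List[float]]


def tour_length(tour: List[int], dist: Matrix) -> float:
    """Length of a closed tour that returns to its starting node."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbor(dist: Matrix, start: int = 0) -> List[int]:
    """Greedy construction: always drive to the closest unvisited stop."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[tour[-1]][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour: List[int], dist: Matrix) -> List[int]:
    """Reverse segments while doing so shortens the tour; tour[0] (the depot) never moves."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for k in range(i + 1, len(best)):
                candidate = best[:i] + best[i:k + 1][::-1] + best[k + 1:]
                if tour_length(candidate, dist) < tour_length(best, dist) - 1e-12:
                    best, improved = candidate, True
    return best


if __name__ == "__main__":
    r = 2 ** 0.5  # four bins on a unit square: (0,0), (1,1), (1,0), (0,1)
    dist = [[0, r, 1, 1], [r, 0, 1, 1], [1, 1, 0, r], [1, 1, r, 0]]
    route = two_opt(nearest_neighbor(dist), dist)
    print(route, tour_length(route, dist))
```

Recomputing the full tour length for each candidate is slower than the usual constant-time 2-opt delta, but it stays correct for asymmetric matrices such as road travel times.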




P2PChat

View on GitHub

I implemented a hybrid client-server/P2P chat app in which the server handles authentication and peer matching. Authenticated peers then communicate directly in real time over TCP, using multithreading, and can also transfer files to one another.
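This sketch leaves out the server-side authentication and matching and shows only the direct peer-to-peer leg: one peer listens on a TCP socket in a background thread while another connects and sends a message. The loopback address, the OS-assigned port, the one-message-per-connection protocol, and the function names are all assumptions for illustration.

```python
import socket
import threading
from typing import List


def serve_once(listener: socket.socket, received: List[str]) -> None:
    """Accept one peer connection and store the full message it sends."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:  # the sender closed its side: message complete
                break
            chunks.append(data)
        received.append(b"".join(chunks).decode("utf-8"))


def send_message(host: str, port: int, text: str) -> None:
    """Connect to a peer, send one UTF-8 message, and close the connection."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(text.encode("utf-8"))


def demo(message: str = "hello from peer A") -> List[str]:
    received: List[str] = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        listener.listen()
        port = listener.getsockname()[1]
        # Peer B accepts on a background thread so the main thread can act as peer A.
        worker = threading.Thread(target=serve_once, args=(listener, received))
        worker.start()
        send_message("127.0.0.1", port, message)
        worker.join(timeout=5)
    return received


if __name__ == "__main__":
    print(demo())
```

A chat client would keep one receiving thread per open peer connection so that reading never blocks typing; file transfer can reuse the same socket pattern with a length or filename header.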




© 2024 Le Khanh Duy. Powered by Jekyll and the Minimal Theme.