Portfolio
Natural Language Processing
CS224n: Natural Language Processing with Deep Learning
My complete implementation of assignments and projects in CS224n: Natural Language Processing with Deep Learning by Stanford (Winter, 2023).

Neural Machine Translation: An NMT system that translates text from Spanish to English, using a bidirectional LSTM encoder for the source sentence and a unidirectional LSTM decoder with multiplicative attention for the target sentence (GitHub).
Dependency Parsing: A neural transition-based dependency parsing system with a one-layer MLP (GitHub).
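As a rough illustration of multiplicative attention (a plain-Python sketch, not the assignment code), one decoder step scores each encoder state as `dec_state^T W enc_state`, applies a softmax, and averages the encoder states with those weights. The function name, the toy dimensions, and the matrix `W` are assumptions made for this example.

```python
import math

def multiplicative_attention(dec_state, enc_states, W):
    """Return (weights, context) for one decoder step.

    dec_state:  decoder hidden state, length d_dec
    enc_states: list of encoder hidden states, each of length d_enc
    W:          projection matrix with d_dec rows and d_enc columns
    """
    # Project every encoder state into the decoder space: W @ h_i
    projected = [[sum(w * h for w, h in zip(row, h_i)) for row in W]
                 for h_i in enc_states]
    # Multiplicative score: dot product of decoder state and projection
    scores = [sum(s * p for s, p in zip(dec_state, p_i)) for p_i in projected]
    # Numerically stable softmax over source positions
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of encoder states
    dim = len(enc_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, enc_states))
               for d in range(dim)]
    return weights, context

# Toy example: two source positions, d_enc = 4, d_dec = 2
enc = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
weights, context = multiplicative_attention([2.0, 0.0], enc, W)
```

Here the decoder state aligns with the first source position, so that position receives the larger weight.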
Zone-h-esque Captcha Recognition

I contributed a dataset comprising 50,000 captcha images and their corresponding labels. The dataset is used to train a model that combines a CNN and an RNN, trained with CTC loss.
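At inference time, a CTC-trained model emits one distribution per frame, and the text is recovered by collapsing repeats and dropping blanks. The sketch below shows greedy decoding only (not the CTC loss), and it assumes the blank symbol sits at index 0; the function name and alphabet are hypothetical.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Greedy CTC decoding.

    frame_probs: one probability list per frame, ordered as
                 [blank] + alphabet (assumed layout for this sketch)
    """
    # Pick the most likely class in every frame
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # Emit a character only when the class changes and is not blank
        if idx != prev and idx != blank:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)
```

A blank between two identical classes is what lets the decoder output a doubled character, such as the "aa" in "aab".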
Kalapa Medical QA

We developed a Vietnamese medical QA system over a medical corpus of 600 articles. It uses Retrieval-Augmented Generation (RAG), ensembling SimCSE_Vietnamese and BM25 retrievers with a layer of keyword search using Aho-Corasick on top. We also fine-tuned a 3B reader, Vietcuna, with LoRA on GPT-4-generated data. The pipeline achieved 59.6% accuracy and a 0.7 score, and placed in the top 10 out of 300 teams.
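One common way to ensemble a sparse and a dense retriever is to normalize both score lists and take a weighted sum. The sketch below assumes that scheme (the actual fusion rule used in the project is not stated here), implements BM25 directly, and takes the dense similarity scores as given inputs. All function names, the weight `alpha`, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def min_max(xs):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

def fuse(sparse, dense, alpha=0.5):
    """Weighted sum of normalized sparse and dense scores (assumed rule)."""
    s, d = min_max(sparse), min_max(dense)
    return [alpha * a + (1 - alpha) * c for a, c in zip(s, d)]

docs = [
    "aspirin relieves headache pain",
    "insulin controls blood sugar",
    "rest helps recovery",
]
sparse = bm25_scores("headache pain", docs)
dense = [0.9, 0.2, 0.1]  # made-up embedding similarities for illustration
fused = fuse(sparse, dense)
best_doc = docs[max(range(len(docs)), key=fused.__getitem__)]
```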
ZaloAI Math-Solver System

We developed a Vietnamese Math Solver System leveraging Llama-7B for vi2en translation plus a custom parser and a Tool-integrated Reasoning Agent (ToRA). We fine-tuned a baseline Vi-DeBERTA and experimented with various SOTA math word problem models and methods, such as MetaMath and GPT-4 with chain-of-thought (CoT) prompting. The system achieved 75% accuracy.
DataDive: Supporting Readers’ Contextualization of Statistical Statements with Data Exploration

I proposed and implemented a multi-stage pipeline for data exploration focusing on text highlights, recommendations, and visualization using Large Language Models (LLMs). I grounded the LLM's visual reasoning on data using Text-to-SQL and semantic similarity. Recommendations were ranked with pairwise ranking prompting. My pipeline achieved 80.5% match accuracy, and its recommendations scored 98% on relevance and 4.07/5 on interestingness. The work was accepted at IUI.
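Pairwise ranking can be sketched as an all-pairs tournament: every pair of candidates is compared in both orders and candidates are sorted by number of wins. In the sketch below, `prefers` is a hypothetical stand-in for the LLM call that would answer "which of these two is better?"; the all-pairs variant and the toy comparator are assumptions, not a description of the DataDive implementation.

```python
from itertools import combinations

def rank_pairwise(items, prefers):
    """Rank unique, hashable items by pairwise wins.

    prefers(a, b) returns True when a should rank above b. In a real
    pipeline this would wrap a prompt to an LLM (hypothetical here).
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        # Ask in both orders to reduce sensitivity to prompt position
        for first, second in ((a, b), (b, a)):
            if prefers(first, second):
                wins[first] += 1
            else:
                wins[second] += 1
    return sorted(items, key=lambda item: wins[item], reverse=True)

# Toy comparator: longer strings win (purely illustrative)
ranking = rank_pairwise(["bb", "a", "cccc"], lambda x, y: len(x) > len(y))
```

All-pairs comparison needs a number of calls that grows quadratically with the candidate count, so it suits short recommendation lists.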
Machine Learning
CS189/289A: Introduction to Machine Learning
My complete implementation of assignments and projects in CS189/289A: Introduction to Machine Learning by UC Berkeley (Fall, 2023).

Robot Localization: A Hidden Markov Model (HMM) to localize a robot in a 2D grid world. The robot can move in four directions (up, down, left, right) and has a sensor that can detect obstructions. The goal is to estimate the robot’s position given a sequence of noisy sensor measurements. (GitHub)
Jack’s Car Rentals: A reinforcement learning problem where the goal is to maximize the profit of a car rental company by deciding how many cars to move between two locations every night. The model used is a Markov Decision Process. (GitHub)
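For the robot localization item, one step of HMM forward filtering can be sketched as a predict step followed by a sensor update. The motion model (uniform move to a free neighboring cell) and the sensor model (a noisy count of blocked sides) are simplifying assumptions chosen for this example, and the function names are hypothetical.

```python
def neighbors(cell, free):
    """Free cells reachable from `cell` by one up/down/left/right move."""
    r, c = cell
    candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return [n for n in candidates if n in free]

def forward_step(belief, observation, free, noise=0.1):
    """One HMM filtering step over a set of free grid cells.

    belief:      dict mapping cell -> probability
    observation: number of blocked sides reported by the sensor (0 to 4)
    """
    # Predict: the robot moves uniformly to a free neighbor (assumed model)
    predicted = {cell: 0.0 for cell in free}
    for cell, p in belief.items():
        targets = neighbors(cell, free) or [cell]
        for t in targets:
            predicted[t] += p / len(targets)
    # Update: weight each cell by how well it explains the reading
    updated = {}
    for cell, p in predicted.items():
        blocked = 4 - len(neighbors(cell, free))
        likelihood = 1 - noise if blocked == observation else noise / 4
        updated[cell] = p * likelihood
    total = sum(updated.values())
    return {cell: p / total for cell, p in updated.items()}

# A 1x3 corridor: only the middle cell has exactly two blocked sides
free = {(0, 0), (0, 1), (0, 2)}
belief = {cell: 1 / 3 for cell in free}
posterior = forward_step(belief, observation=2, free=free)
```

After a single reading of two blocked sides, most of the probability mass moves to the middle cell.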
Hierarchical Stock Clustering

I performed hierarchical clustering on S&P 500 data using Hierarchical Agglomerative Linkage algorithms and Directed Bubble Hierarchical Tree (DBHT), and benchmarked them against GICS. I then tailored DBHT with Ward Linkage to improve its Adjusted Rand Index by 20%. I also constructed a portfolio from the S&P 500 by applying Hierarchical Risk Parity to the clusters produced above.
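The Adjusted Rand Index compares two partitions of the same items, for example reference sector labels against cluster assignments, and corrects for chance agreement. A minimal pure-Python sketch follows; the function name and toy labels are illustrative, and at least two items are assumed.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    # Pairs of items placed together in both partitions
    together = sum(comb(c, 2)
                   for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each partition separately
    rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = rows * cols / comb(n, 2)   # agreement expected by chance
    maximum = (rows + cols) / 2           # best achievable agreement
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both put everything in one group
    return (together - expected) / (maximum - expected)

# Same grouping under different label names gives a perfect score
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A grouping that splits every reference pair scores below zero
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```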
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells

We integrated consistency training and a self-training curriculum. We performed extensive experiments on different white-blood-cell datasets and reported SOTA results.
Software Engineering
Urban Waste Management System

This project creates a web-based dashboard for managers of an automated waste-collection system. The system follows an MVC architecture with a custom MySQL database. It automates route and workload arrangement using Travelling Salesman heuristics and Open Source Routing Machine (OSRM).
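One of the simplest Travelling Salesman heuristics is nearest neighbor: always drive to the closest unvisited stop. The sketch below assumes that heuristic (which heuristics the project used is not specified here) and uses straight-line distance as a stand-in for the road travel costs that a routing engine such as OSRM would supply. Function names and coordinates are hypothetical.

```python
import math

def nearest_neighbor_route(points, start=0):
    """Visit every point once, always moving to the closest unvisited one."""
    unvisited = set(range(len(points))) - {start}
    route = [start]
    while unvisited:
        here = points[route[-1]]
        # Euclidean distance is a placeholder for real road travel cost
        nxt = min(unvisited, key=lambda i: math.dist(here, points[i]))
        route.append(nxt)
        unvisited.remove(nxt)
    return route

def route_length(points, route):
    """Length of the closed tour, including the return to the start."""
    legs = zip(route, route[1:] + route[:1])
    return sum(math.dist(points[a], points[b]) for a, b in legs)

# Depot at index 0 and three collection points on a line
stops = [(0, 0), (5, 0), (1, 0), (2, 0)]
route = nearest_neighbor_route(stops)
length = route_length(stops, route)
```

The heuristic is fast but not optimal in general, which is why it is often paired with an improvement step such as 2-opt.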
P2PChat

I implemented a hybrid client-server and P2P chat app in which authentication and matching are done on the server. The system enables real-time P2P communication over TCP, using multithreading, between authenticated peers, together with file transfer between peers.
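The peer-to-peer part of such a design can be sketched with one peer listening on a TCP socket in a background thread while another peer connects and sends newline-delimited messages. This is a minimal loopback sketch under assumed conventions (newline framing, a single connection, no authentication), not the app's actual protocol; the function names are hypothetical.

```python
import socket
import threading

def start_listener(received):
    """Listen on a free local port and collect lines from one peer.

    Returns the background thread and the chosen port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        # Read newline-delimited messages until the peer closes the socket
        with conn, conn.makefile("r", encoding="utf-8") as reader:
            for line in reader:
                received.append(line.rstrip("\n"))
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread, port

def send_messages(port, messages):
    """Connect to a listening peer and send each message on its own line."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        for msg in messages:
            sock.sendall((msg + "\n").encode("utf-8"))

inbox = []
listener, port = start_listener(inbox)
send_messages(port, ["hello", "bye"])
listener.join(timeout=5)
```

Running the receive loop in its own thread keeps the peer free to send at the same time, which is the role multithreading plays in a chat client.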
© 2024 Le Khanh Duy. Powered by Jekyll and the Minimal Theme.