Vietnamese LLM Evaluation
Finetuning and comprehensive evaluation of Vietnamese large language models at Stanford AI Lab.
At the Stanford Trustworthy AI Research Lab, I worked on improving the capabilities of large language models for Vietnamese, a language spoken by over 100 million people.
Key contributions:
- Designed an evaluation framework that pairs a public benchmark design with privately held test content to prevent data contamination, achieving a 40% reduction in benchmark gaming
- Built a comprehensive evaluation framework covering 10 NLP tasks (including translation, summarization, Q&A, and sentiment analysis) and 31 automated metrics, processing 50,000+ test samples
- Published as co-first author at NAACL 2024 (Truong et al., 2024)
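To illustrate the general shape of a multi-task, multi-metric evaluation harness like the one described above, here is a minimal sketch in Python. It is not the project's actual code: the names `Task`, `evaluate`, `exact_match`, and `token_f1` are hypothetical, and the two metrics shown stand in for the much larger set used in the real framework.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# A metric maps (predictions, references) to a single score.
Metric = Callable[[List[str], List[str]], float]


def exact_match(preds: List[str], refs: List[str]) -> float:
    """Fraction of predictions identical to their reference (after stripping)."""
    if not refs:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(preds, refs))
    return hits / len(refs)


def _f1(pred: str, ref: str) -> float:
    """Whitespace-token F1 between one prediction and one reference."""
    p_tokens, r_tokens = pred.split(), ref.split()
    common = sum((Counter(p_tokens) & Counter(r_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(p_tokens)
    recall = common / len(r_tokens)
    return 2 * precision * recall / (precision + recall)


def token_f1(preds: List[str], refs: List[str]) -> float:
    """Mean token-level F1 over all prediction/reference pairs."""
    if not refs:
        return 0.0
    return sum(_f1(p, r) for p, r in zip(preds, refs)) / len(refs)


@dataclass
class Task:
    """Hypothetical container for one task: inputs, references, and metrics."""
    name: str
    inputs: List[str]
    references: List[str]
    metrics: Dict[str, Metric]


def evaluate(model: Callable[[str], str], tasks: List[Task]) -> Dict[str, Dict[str, float]]:
    """Run the model on every task and score it with each registered metric."""
    results: Dict[str, Dict[str, float]] = {}
    for task in tasks:
        preds = [model(x) for x in task.inputs]
        results[task.name] = {
            metric_name: fn(preds, task.references)
            for metric_name, fn in task.metrics.items()
        }
    return results
```

In a setup like this, `model` can be any callable that maps an input string to an output string, so a private test set can be scored by constructing `Task` objects locally without publishing their contents.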
Press coverage: Featured in The New York Times, Stanford HAI, and Stanford AI Lab Blog.
References
2024
- NAACL. Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2024, Mar 2024. Featured in The New York Times and Stanford HAI.