Vietnamese LLM Evaluation

Finetuning and comprehensive evaluation of Vietnamese large language models at Stanford AI Lab.

At the Stanford Trustworthy AI Research Lab, I worked on improving the capabilities of large language models for Vietnamese, a language spoken by over 100 million people.

Key contributions:

  • Designed an evaluation framework that pairs public benchmark tasks with privately held-out test content to prevent data contamination, reducing benchmark gaming by 40%
  • Built a comprehensive evaluation suite covering 10 NLP tasks (translation, summarization, question answering, sentiment analysis, and others) with 31 automated metrics, processing 50,000+ test samples (see the sketch after this list)
  • Published as co-first author at NAACL 2024 (Truong et al., 2024)
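To make the task-by-metric structure concrete, here is a minimal sketch of how such an evaluation harness could be organized. Everything in it, including the task names, metric function, model interface, and the public/private split helper, is an illustrative assumption rather than the project's actual implementation.

```python
# Minimal sketch of a task-by-metric evaluation harness with a public/private
# split. All names and interfaces here are illustrative assumptions, not the
# project's actual code.
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Metric = Callable[[str, str], float]  # (prediction, reference) -> score


@dataclass
class Task:
    name: str                  # e.g. "translation", "summarization"
    examples: List[dict]       # each example: {"prompt": ..., "reference": ...}
    metrics: Dict[str, Metric]


def exact_match(prediction: str, reference: str) -> float:
    """Toy metric: 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def public_private_split(examples: List[dict], private_frac: float = 0.5,
                         seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Hold out a private test slice that is never released publicly."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - private_frac))
    return shuffled[:cut], shuffled[cut:]  # (public, private)


def evaluate(model: Callable[[str], str],
             tasks: List[Task]) -> Dict[str, Dict[str, float]]:
    """Run the model over every task and average each metric per task."""
    results: Dict[str, Dict[str, float]] = {}
    for task in tasks:
        totals = {name: 0.0 for name in task.metrics}
        for ex in task.examples:
            prediction = model(ex["prompt"])
            for name, scorer in task.metrics.items():
                totals[name] += scorer(prediction, ex["reference"])
        results[task.name] = {name: total / len(task.examples)
                              for name, total in totals.items()}
    return results


if __name__ == "__main__":
    # Dummy model and one toy task, just to show the call pattern.
    examples = [{"prompt": "Say hello in Vietnamese.", "reference": "xin chào"}]
    public, private = public_private_split(examples, private_frac=0.0)
    task = Task("question_answering", public, {"exact_match": exact_match})
    print(evaluate(lambda prompt: "xin chào", [task]))
```

In practice, the private slice would be evaluated only on a secured server so that its prompts and references never enter public training corpora, which is the intuition behind pairing public benchmark design with held-out private test content.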

Press coverage: Featured in The New York Times, Stanford HAI, and Stanford AI Lab Blog.

References

2024

  1. Nhi N. Truong, Hien Vo, Nhat Tran, and 3 more authors. "Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models." In Findings of the Association for Computational Linguistics: NAACL 2024, Mar 2024.