Liwei Cai

NLP, linguistics, algorithms

I am currently a first-year student in the M.S. in Intelligent Information Systems program (a master's program under the School of Computer Science) at Carnegie Mellon University, and I aim to become a versatile software engineer. I am interested in natural language processing (NLP) and have both research and work experience in it. While working on NLP projects, I also developed strong coding skills in Python, Java, and C++, and a deep understanding of algorithms. At CMU, I plan to gradually broaden my focus toward general computer science and take more courses in that area.

A PDF version of my resume/CV is also available.

(academic, professional): liweicaiandrewcmuedu
(personal, casual): cai.lw123gmailcom


Mobvoi (Beijing)

NLP Intern
  • Designed, together with teammates, the architecture of a well-structured knowledge base ontology and an automated pipeline for extracting knowledge from web pages.
  • Investigated and developed the knowledge storage and query module, which represented facts as typed RDF triples stored in MySQL and translated queries into SQL with type checking.
  • Migrated the existing unannotated knowledge base to the new type-safe one, increasing the robustness of the question-answering (QA) service that it supports.
April 2018 - July 2018

Natural Language Processing Lab, UC Santa Barbara

Research Assistant
  • Proposed a generative adversarial model that can adaptively generate better negative training data according to the behavior of the knowledge graph embedding (KGE) model being trained.
  • Implemented the model in PyTorch and ran experiments on various datasets, demonstrating that it improves the Hits@10 metric of existing KGE models by up to 3%.
  • Wrote the paper "KBGAN: Adversarial Learning for Knowledge Graph Embedding" as first author; it was published at NAACL 2018. The paper and code are available online.
June 2017 - September 2017


Carnegie Mellon University

Master of Science (expected)
M.S. in Intelligent Information Systems, Language Technologies Institute, School of Computer Science
August 2018 - December 2019 (expected)

Tsinghua University

Bachelor of Engineering
Electronic Information Science and Technology, Electronic Engineering Department

GPA: 89/100 (top 20% in the department)

August 2014 - May 2018


Content-based Recommender System for Electronic Medical Records (EMRs)

Bachelor's Thesis Project, at Tsinghua University
  • Developed a two-stage content-based text recommender system, in which candidate documents are first retrieved by approximate nearest neighbor search, and then ranked by various scoring methods based on word embeddings and RNN models.
  • Conducted comprehensive experiments on an EMR dataset, showing improvements of up to 6% on multiple precision metrics over traditional vector space models.
March 2018 - June 2018

Image Captioning in Chinese

Course project for Pattern Recognition (a graduate-level course), at Tsinghua University

Code is available online.

  • Implemented and tuned recurrent neural network (RNN) models, with and without an attention mechanism, from scratch in TensorFlow. The model ranked in the top 10% of the class on a hidden test set.
  • Incorporated dropout, skip connections, a beam search width limit, and a short-sequence penalty into the model to control overfitting.
May 2017 - June 2017

Interpreter of Lisp-like Language

Course project for Functional Programming, at Tsinghua University

Code is available online.

  • Developed an interpreter for a Lisp-like mini-language in Haskell, using a modified “finally tagless” mechanism for computation and type checking, and a parser combinator library for parsing source code.
  • Supported lambda calculus as well as primitive numeric, boolean, and string operations with dynamic type checking.
December 2016 - January 2017


(Listed roughly in descending order of expertise)

Programming Languages

Python, C++, MATLAB, Java, JavaScript, SQL, HTML, Rust, Haskell

Software and Tools

PyTorch, TensorFlow, Git, MongoDB, MySQL, Bootstrap, Vue.js

Development Environment

Windows, Linux, Mac


I practice competitive programming in my spare time, mostly for fun (and also to learn algorithms). I regularly participate in contests on Codeforces (my profile) and LeetCode (my profile).

I like to play rhythm games, strategy games, and survival games on mobile devices and PCs. Games of these types are relatively scarce on game consoles and usually hard to play there, so I don't own a console.

I should live a healthier life, but I have not managed to develop an interest in the gym, sports, or outdoor activities. I hope that will change in the future...