Project 6 - Recurrent Neural Network (Due 11/20)

Objectives:

The objective of this project is to learn how to train a simple RNN for natural language processing (NLP). We start with a character-level RNN and then move on to a word-level RNN.

Data set used:

Hitchhiker's Guide to the Galaxy

Requirements:

Task 1: Study the sample code provided by Shang that includes a word2vec trainer and visualizer (cbow_gensim.py) and a character-level RNN (character_rnn.py) that trains on "Hitchhiker's Guide to the Galaxy."
Task 2: Adapt the sample code and train word embeddings on a corpus of your own choice.
Task 3: Write a word-level RNN that generates new text segments based on the selected corpus (by adapting the sample code).
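
For Task 3, the main change from a character-level model is the input pipeline: the corpus is split into words rather than characters, each word is mapped to an integer id, and the id stream is cut into input/target pairs in which the target is the input shifted by one word. The sketch below illustrates that preprocessing with the Python standard library only. It is not taken from character_rnn.py; the function names (tokenize, build_vocab, make_sequences), the regular expression, the "<unk>" token, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    # Assumed tokenization rule: runs of letters/digits/apostrophes, or a single
    # punctuation mark. Hyphens and other characters are treated as separators.
    return re.findall(r"[a-z0-9']+|[.,!?;]", text.lower())


def build_vocab(tokens, min_count=1):
    """Map each token seen at least min_count times to an integer id.

    Id 0 is reserved for the hypothetical "<unk>" token, which stands in for
    any word that is not in the vocabulary.
    """
    vocab = {"<unk>": 0}
    for word, count in Counter(tokens).most_common():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab


def make_sequences(ids, seq_len):
    """Cut the id stream into (input, target) pairs of length seq_len.

    Each target sequence is the input sequence shifted one position ahead,
    so the model learns to predict the next word at every step.
    """
    pairs = []
    for start in range(0, len(ids) - seq_len, seq_len):
        inputs = ids[start:start + seq_len]
        targets = ids[start + 1:start + seq_len + 1]
        pairs.append((inputs, targets))
    return pairs


# Illustrative text only; a real run would read the chosen corpus from a file.
text = "Don't panic. The answer is forty-two. Don't forget the towel."
tokens = tokenize(text)
vocab = build_vocab(tokens)
ids = [vocab.get(tok, 0) for tok in tokens]
pairs = make_sequences(ids, seq_len=4)
```

With a word-level vocabulary the output layer has one unit per word instead of one per character, so raising min_count to fold rare words into "<unk>" is one way to keep the model small on a larger corpus.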

Report:

Word2Vec visualization of the trained word embeddings
Samples of the generated text taken at several points during training, each labeled with how long the RNN had been trained when the sample was produced.
(For 692 students only) Read [HAN:2016] and [TextCNN:2016], which represent the state of the art in text classification but use very different approaches. Write a report of no more than one page on what makes each of these two approaches unique.
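
For the generated-text samples in the report, it helps to record the training iteration and elapsed time alongside each sample as it is produced. The sketch below shows one possible way to do that, together with temperature-based sampling from the model's output scores. It uses only the standard library; sample_index, log_sample, the temperature value, the tiny vocabulary, and the fixed placeholder logits are all assumptions for illustration, and a real run would take the logits from the RNN's output layer.

```python
import math
import random
import time


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index from softmax(logits / temperature).

    Lower temperatures favor the highest-scoring word; higher temperatures
    make the choice more random.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def log_sample(log, iteration, start_time, text):
    """Append one report entry: iteration, elapsed seconds, and the sample."""
    log.append({
        "iteration": iteration,
        "elapsed_s": time.monotonic() - start_time,
        "text": text,
    })


# Hypothetical stand-ins for a trained model's vocabulary and output scores.
vocab = ["don't", "panic", "towel", "."]
placeholder_logits = [0.1, 0.5, 0.2, 0.3]

rng = random.Random(0)  # fixed seed so a run can be repeated
start = time.monotonic()
samples = []
for iteration in (0, 1000, 2000):
    # In a real training loop, the model would be trained between samples.
    words = [vocab[sample_index(placeholder_logits, 0.8, rng)] for _ in range(5)]
    log_sample(samples, iteration, start, " ".join(words))

for entry in samples:
    print(f"iter {entry['iteration']:>5}  {entry['elapsed_s']:.3f}s  {entry['text']}")
```

Logging the elapsed time at the moment each sample is generated avoids having to reconstruct training durations afterwards when writing the report.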