Project 6 - Recurrent Neural Network (Due 11/20)
Objectives:
The objective of this project is to learn how to train a simple
recurrent neural network (RNN) for natural language processing (NLP).
We start with a character-level RNN and then move on to a word-level
RNN.
Data set used:
Hitchhiker's Guide to the Galaxy
Requirements:
- Task 1: Study the sample code provided by Shang, which includes a
word2vec trainer and visualizer (cbow_gensim.py) and a character-level
RNN (character_rnn.py) that trains on "Hitchhiker's Guide to the
Galaxy."
- Task 2: Adapt the sample code and train word embeddings on a
corpus of your own choice.
- Task 3: Write a word-level RNN that generates new text segments
based on the selected corpus (by adapting the sample code).
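For Task 3, the main change from the character-level setup is that the
model predicts the next word rather than the next character, so the
corpus has to be mapped to word indices first. The sketch below shows
one possible way to do that preprocessing. It is not taken from the
sample code: the function names (tokenize, build_vocab, make_pairs),
the regular expression, the "<unk>" token, and the placeholder text are
all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"[a-z0-9']+|[.,!?;:]", text.lower())

def build_vocab(tokens, min_count=1):
    """Map each sufficiently frequent token to an integer index.

    Index 0 is reserved for an assumed "<unk>" token that stands in for
    words below min_count.
    """
    counts = Counter(tokens)
    itos = ["<unk>"] + sorted(w for w, c in counts.items() if c >= min_count)
    stoi = {w: i for i, w in enumerate(itos)}
    return stoi, itos

def make_pairs(ids, seq_len):
    """Cut the index stream into (input, target) sequences.

    Each target is its input shifted by one position, so the RNN learns
    to predict the next word at every time step.
    """
    pairs = []
    for start in range(0, len(ids) - seq_len, seq_len):
        x = ids[start:start + seq_len]
        y = ids[start + 1:start + seq_len + 1]
        pairs.append((x, y))
    return pairs

# Placeholder text; replace with the contents of your chosen corpus.
text = "Don't panic. Always know where your towel is. Don't panic."
tokens = tokenize(text)
stoi, itos = build_vocab(tokens)
ids = [stoi.get(t, 0) for t in tokens]
pairs = make_pairs(ids, seq_len=4)
```

Compared with the character-level case, the vocabulary is much larger,
so the input one-hot vectors are usually replaced by an embedding
lookup, for example the word2vec embeddings trained in Task 2.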
Report:
- Word2Vec visualization of the trained word embeddings
- Samples of the generated text at different stages of training, each
labeled with how long the RNN had been trained when the sample was
generated.
- (For 692 students only) Read [HAN:2016] and [TextCNN:2016], which
represent the state of the art in text classification but take very
different approaches. Write a report of no more than one page on what
is distinctive about each approach.
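For the generated-text samples requested in the report, it helps to
record the iteration count and elapsed training time at the moment each
sample is drawn. The sketch below shows one possible bookkeeping
pattern. The functions train_step and sample_text are hypothetical
placeholders for your own training update and text generator, and the
iteration counts are arbitrary.

```python
import time

def train_step(iteration):
    """Hypothetical placeholder for one parameter update of the RNN."""
    pass

def sample_text(iteration):
    """Hypothetical placeholder for generating text from the current model."""
    return f"<sample generated at iteration {iteration}>"

def train_and_log(num_iters, sample_every):
    """Train for num_iters steps, sampling text every sample_every steps.

    Returns a list of (iteration, elapsed_seconds, sample) tuples.
    """
    records = []
    start = time.perf_counter()
    for it in range(1, num_iters + 1):
        train_step(it)
        if it % sample_every == 0:
            elapsed = time.perf_counter() - start
            records.append((it, elapsed, sample_text(it)))
    return records

records = train_and_log(num_iters=300, sample_every=100)
for it, elapsed, text in records:
    print(f"iter {it:>6} | {elapsed:8.2f} s | {text}")
```

The printed lines can be pasted into the report as-is, so each sample
appears next to the amount of training that produced it.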