Difficulty: beginner
Estimated Time: 20 minutes

Natural Language Processing covers a range of tasks, such as:

  • Part of speech tagging
  • Word segmentation
  • Named entity recognition
  • Machine translation
  • Question answering
  • Sentiment analysis
  • Topic segmentation and recognition
  • Natural language generation

It all starts, though, with preparing text for further processing. In this lab you will learn how to work with word embeddings.

Introduction to word embeddings

Step 1 of 4

Read embeddings

In this lab we will play with word embeddings. We will use an already prepared dataset downloaded from fastText. The vocabulary is limited to 10,000 words (for performance reasons).

Each line of the file contains a word followed by its vector of embedding values. For instance:

day 0.0320 0.0381 -0.0299 -0.0745 -0.0624 ...
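
To make the format concrete, a single line can be split into the word and its vector values like this. This is a tiny sketch using only the values shown above; real lines contain many more components.

line = "day 0.0320 0.0381 -0.0299 -0.0745 -0.0624"
parts = line.split()                    # whitespace-separated tokens
word = parts[0]                         # the first token is the word
vector = [float(v) for v in parts[1:]]  # the rest are embedding values
print(word, vector)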

Let us download the embeddings file:

curl -LO https://github.com/BasiaFusinska/katacoda-scenarios/raw/master/nlp-with-python/embeddings/assets/vectors.vec

To start working with Python, use the following command:

python

Reading the vectors is simple: go through every line and build a dictionary {word: [embeddings]}. We've prepared a function that reads the data in the data_reader module.

import data_reader
embeddings, words = data_reader.load_embeddings()
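
If you are curious what load_embeddings might do internally, here is a minimal sketch of such a reader. It is only an illustration under the assumptions above (whitespace-separated lines, the first token being the word); the actual data_reader module may differ in details.

def load_embeddings(path="vectors.vec"):
    # Build {word: [embedding values]} and keep the words in file order.
    # Note: some .vec files start with a "<count> <dimension>" header line;
    # if this one does, that line would need to be skipped as well.
    embeddings = {}
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) < 2:
                continue  # skip empty lines
            word = parts[0]
            embeddings[word] = [float(value) for value in parts[1:]]
            words.append(word)
    return embeddings, words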

The embeddings variable is a dictionary, and words is an array of words (read in the order they appear in the file). Let's print some of the embeddings to get a feel for the data.

for word in words[:10]:
    print(word, embeddings[word])
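
As a quick sanity check you can also look at the vocabulary size and the vector dimensionality. This only uses the variables loaded above; the exact dimension depends on the downloaded file.

# All vectors in the file share the same length.
dim = len(embeddings[words[0]])
print("vocabulary size:", len(words))
print("embedding dimension:", dim)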