DETAILS ABOUT http://data.wordlift.io/wl0216/entity/what_are_embeddings_

Property Value
dct:relation http://data.wordlift.io/wl0216/entity/natural_language_processing
dct:relation http://data.wordlift.io/wl0216/entity/deep_learning
dct:relation http://data.wordlift.io/wl0216/entity/machine_learning
dct:relation http://data.wordlift.io/wl0216/entity/artificial_intelligence
wordpress:content "

In natural language processing, the goal is to have machines understand human language. Unfortunately, machine learning and deep learning algorithms only work with numbers, so how can we convert the meaning of a word into a number?

This is what embeddings are for: they teach language to computers by translating meanings into mathematical vectors (series of numbers).

Word embeddings

In word embeddings, the vectors of semantically similar terms are close to each other. In other words, terms with similar meanings end up a short distance apart in a multi-dimensional vector space.

Here is a classic example: "king is to queen as man is to woman" is encoded in the vector space, just as verb tenses and countries with their capitals are encoded in a low-dimensional space that preserves the semantic relationships.
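
As a quick aside (not from the original presentation), here is a minimal sketch of that analogy using pre-trained GloVe vectors loaded through gensim; the library and model name are my own illustrative choices.

    # A minimal sketch, assuming gensim and its pre-trained model downloader are available.
    import gensim.downloader as api

    # Small pre-trained GloVe word vectors (downloaded on first use).
    model = api.load("glove-wiki-gigaword-50")

    # "king is to queen as man is to woman":
    # vector(king) - vector(man) + vector(woman) is closest to vector(queen).
    print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))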

Knowledge Graph embeddings

We can use the same technique used for words to analyze the nodes (entities) and edges (relationships) of a knowledge graph. By doing so, we encode the meaning held in the graph in a format (numerical vectors) that we can use for machine learning applications.
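
As an illustration only (WordLift's own pipeline is described in the article linked below), graph embeddings can be trained with a library such as PyKEEN; the dataset and model below are placeholders, not the setup used by WordLift.

    # A rough sketch, assuming the PyKEEN library; "Nations" is a tiny built-in benchmark graph.
    from pykeen.pipeline import pipeline

    # TransE learns one vector per entity (node) and per relation (edge type).
    result = pipeline(dataset="Nations", model="TransE", training_kwargs=dict(num_epochs=50))

    # One embedding vector per entity in the graph.
    entity_vectors = result.model.entity_representations[0]().detach().cpu().numpy()
    print(entity_vectors.shape)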

https://wordlift.io/blog/en/?post_type=post&p=18740
The article above explains how you can create graph embeddings from the Knowledge Graph that WordLift builds for you.

In the following presentation, I introduce the concept of multidimensional meanings using a song by The Notorious B.I.G., undoubtedly one of the biggest rappers of all time. The song is called "What's Beef?".

In the text of the song, there is a play on the homophones "I see you" and "ICU", the acronym for intensive care unit. Most interestingly, the word "beef" assumes a different meaning in every sentence. As we can see, meanings change based on the words that surround a term in each sentence. The idea that meanings can be derived from the analysis of the closest words was introduced by Firth, an English linguist and a leading figure in British linguistics during the 1950s.

Firth is known as the father of distributional semantics, a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data.

It is by using this exact framework (studying semantic similarities between terms inside a given context window) that we can train a machine to understand their meaning.

Cosine similarity

When we want to analyze the semantic similarity of two documents (or two queries), and we have turned these documents into mathematical vectors, we can use the cosine of the angle between their respective vectors.

The real advantage is that two similar documents might still be far apart in Euclidean distance when they use different words (or the same words with different frequencies) to express similar meanings. We might have, for example, the term 'soccer' appearing fifty times in one document and only ten times in another. Still, the two will be considered similar when we analyze the orientation of their respective vectors within the same multidimensional space.

The reason is that even if the terms used are different, as long as their meaning is similar, the orientation of their vectors will also be similar. In other words, a smaller angle between two vectors represents a higher degree of similarity.
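
To make the geometry concrete, here is a small sketch with made-up term-count vectors (not real document embeddings), showing how two documents can be far apart in Euclidean distance yet perfectly aligned in cosine similarity:

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine of the angle between two vectors: 1.0 means the same orientation.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Toy term counts over the vocabulary [soccer, match, stadium].
    doc_a = np.array([50.0, 30.0, 10.0])  # mentions "soccer" fifty times
    doc_b = np.array([10.0, 6.0, 2.0])    # mentions "soccer" ten times, in the same proportions

    print(np.linalg.norm(doc_a - doc_b))    # Euclidean distance: large (about 47)
    print(cosine_similarity(doc_a, doc_b))  # cosine similarity: 1.0 (identical orientation)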

Embeddings are one of several techniques we can use to analyze and cluster queries; a minimal clustering sketch follows below. See our web story on keyword research using AI to find out more.
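
The sketch uses sentence-transformers and scikit-learn, which are my own choices here and not necessarily the stack behind the web story:

    # Sketch: cluster search queries by the similarity of their embeddings.
    # Assumes the sentence-transformers and scikit-learn packages; the model name is illustrative.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    queries = [
        "what are embeddings",
        "word embeddings explained",
        "best running shoes",
        "trail running shoes review",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(queries, normalize_embeddings=True)

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
    for query, label in zip(queries, labels):
        print(label, query)  # semantically related queries share a cluster label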

The Power of Embeddings in SEO 🚀

Embeddings have revolutionized Search and the SEO landscape, and to help you navigate this shift, I've created a straightforward tutorial on fine-tuning embeddings tailored to your content (here is the 🐍 code on a notebook that you can play with).

But before we delve into that, let's understand the significance of embeddings and how they can help you with your SEO strategies. Here are some tasks where embeddings prove to be extremely useful:

  • Content Optimization: Analyze embeddings of top-ranking pages to identify themes.
  • Link Building: Find semantically related pages by comparing embeddings.
  • Keyword Research: Discover semantically related queries to target keywords.
  • Ranking Models: Predict page rankings for specific keywords using embeddings.
  • Site Migration: Check out the latest article by Michael King on this.
  • Dynamic 404 Pages: Learn more about it here (specifically for e-Commerce).

Train your embeddings for your next SEO task

But can we train embeddings on our own content? Absolutely! Especially when dealing with domain-specific language, it's beneficial to train your own embeddings.

Here's how:

👉 You can jump directly to the Colab here and follow the instructions in the notebook.

  • We will use the Finetuner by Jina AI.
  • A simple dataset from datadotworld.
  • Lastly, we'll visualize the embeddings using Atlas by Nomic AI.

In this tutorial, I used a craft beer dataset. Our objective is straightforward: to train our embeddings using the names and styles of the beers. To achieve this, we'll employ Finetuner by Jina AI, which allows us to tailor an existing model (in our case, bert-base-en) to craft our custom embeddings.

Since the dataset is small, the process is blazing fast, and you can immediately see the performance improvement compared with the standard model.

We "learn" embeddings by running a classification task, in our tutorial we work on classifying beer names with beer styles.

How can we use embeddings?

Now that the fine-tuning process is complete, we can do something as simple as providing a list of beers and looking for the ones that match the "Indian Pale Ale" style.

We bring all the embeddings into a DocumentArray (from the DocArray library) and use query.match() to find the beers that best match our query for "Indian Pale Ale".
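
Here is a hedged sketch of that matching step with the DocArray v1-style API; the random vectors below are stand-ins for the vectors the fine-tuned model would actually produce:

    # Sketch: match a style query against the beer embeddings with DocArray (v1-style API).
    import numpy as np
    from docarray import Document, DocumentArray

    beers = DocumentArray([Document(text="Hop Devil"), Document(text="Midnight Stout")])
    beers.embeddings = np.random.rand(len(beers), 768)  # stand-ins for the fine-tuned vectors

    query = Document(text="Indian Pale Ale", embedding=np.random.rand(768))
    query.match(beers, metric="cosine", limit=5)

    for match in query.matches:
        print(match.text, match.scores["cosine"].value)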

Visualizing Embeddings using Atlas

We can also visualize our embeddings using Atlas by Nomic AI. Atlas is a super simple library that helps us visualize the latent space behind the embeddings.

Let's have a look at the map that we have created 👉 https://wor.ai/tBb7mM and see how, for example, we can easily group all the "Ale" beers.
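
As a non-authoritative sketch of how such a map is produced (the atlas.map_embeddings entry point and field names reflect the Nomic client at the time of writing and may have changed since):

    # Sketch: push embeddings to Atlas for an interactive map (requires a Nomic login first).
    import numpy as np
    from nomic import atlas

    embeddings = np.random.rand(1000, 768)                    # stand-ins for the fine-tuned beer vectors
    metadata = [{"name": f"beer_{i}"} for i in range(1000)]   # one record per beer

    project = atlas.map_embeddings(embeddings=embeddings, data=metadata)
    print(project)  # includes the URL of the interactive map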

The Atlas map that represents our embeddings.

This method offers an effective way to review and share embeddings with your team. It's invaluable in helping everyone verify that the embeddings are being fine-tuned correctly and that semantically similar items are indeed moving closer together.

Why create your own embeddings?

  1. Customization: Pre-trained embeddings are trained on general text corpora (like Wikipedia or news articles), so they might not adequately capture the semantics of domain-specific language. If you're dealing with a specialized field, like beer brewing in my example, pre-trained embeddings might not know that "stout" and "porter" are similar because they are both dark beers. Training your own embeddings on your specific text data can capture these domain-specific relationships.
  2. Up-to-date language: Language evolves over time, and pre-trained embeddings might not include the most recent language trends or slang. By training your own embeddings, you can ensure that they are up-to-date with the latest language used in your specific domain.
  3. Reduced size: Pre-trained embeddings often include vectors for millions of words, many of which you might not need. By training your own embeddings, you can limit the vocabulary to just the words you care about, reducing the size of your embeddings.

"^^xsd:string
wordpress:customType "entity"^^xsd:string
wordpress:id "18741"^^xsd:integer
wordpress:permalink "https://wordlift.io/blog/en/entity/what-are-embeddings/"^^xsd:string
wordpress:status "publish"^^xsd:string
wordpress:sticky "false"^^xsd:boolean
wordpress:terms "wl_entity_type:Thing"^^xsd:string
wordpress:terms "category:seo"^^xsd:string
wordpress:title "What are embeddings?"^^xsd:string
wordpress:type "post"^^xsd:string
schema:alternateName "Embeddings"^^xsd:string
schema:alternateName "graph embeddings"^^xsd:string
schema:alternateName "embeddings"^^xsd:string
schema:alternateName "Graph embeddings"^^xsd:string
schema:alternateName "word embeddings"^^xsd:string
schema:description "Dive into the dynamic world of embeddings to discover their impact on search relevance and user experience."^^xsd:string
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/NzE0MDMwNjg3MTQwNTkyNjcwNA
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/Nzk4NjY0OTA0Mjc3NDk4OTMxMg
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTg1NzU4OTc0ODcxNDMzNTQ1MTI
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTkxMTE4NTM5NDIzMDk1Nzg5NDQ
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/OTY0NTI2NDg1NDE4MTQ1NjMy
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTQyNDMyMTI2MDE1MzQ2MTM3NjA
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTczMzM0MTQ0NTg3OTcyMzc2MzI
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/NjA1MTQ4OTIzNTk1MTk3NDI3Mg
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTc0NzIyOTYwMDE5NTkwNDMzNg
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTYyOTE3Mzk2ODM0NjEwNjU3Ng
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTY0MDIyNTEzMDAzODUwMjI2OA
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/MjQyMzkzNjgyODc0MDg5NzQwOA
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTc4NjQxMjc2NzgzODI0OTI5NDQ
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/NTcyNDY0NTQ1ODczMTQwMjQ5Ng
schema:image http://data.wordlift.io/wl0216/entity/what_are_embeddings_/NzU3MTIwODE5MDY3OTAzMjE5Mg
schema:mainEntityOfPage https://wordlift.io/blog/en/entity/what-are-embeddings/
schema:name "What are embeddings?"^^xsd:string
schema:url https://wordlift.io/blog/en/entity/what-are-embeddings/
rdf:type schema:Thing