DETAILS ABOUT http://data.wordlift.io/wl0216/entity/what_are_embeddings_
Property | Value |
---|---|
dct:relation | http://data.wordlift.io/wl0216/entity/natural_language_processing |
dct:relation | http://data.wordlift.io/wl0216/entity/deep_learning |
dct:relation | http://data.wordlift.io/wl0216/entity/machine_learning |
dct:relation | http://data.wordlift.io/wl0216/entity/artificial_intelligence |
wordpress:content |
"
In natural language processing, the goal is to have machines understand human language. Unfortunately, machine learning and deep learning algorithms only work with numbers, so how can we convert the meaning of a word into a number? This is what embeddings are for: teaching language to computers by translating meanings into mathematical vectors (series of numbers).

## Word embeddings

In word embeddings, the vectors of semantically similar terms are close to each other. In other words, words that have a similar meaning will lie close together in a multi-dimensional vector space.

## Knowledge Graph embeddings

We can use the same technique used for words to analyze the nodes (entities) and edges (relationships) of a knowledge graph. By doing so, we can encode the meanings in a graph in a format (numerical vectors) that we can use for machine learning applications.

In the following presentation, I introduce the concept of multidimensional meanings using a song by The Notorious B.I.G., undoubtedly one of the biggest rappers of all time. The song is called "What's Beef?". In the lyrics there is a play on the homophones "I see you" and "ICU" (the acronym for intensive care unit), and, most interestingly, the word "beef" takes on a different meaning in every sentence. As we can see, meanings change based on the other words in each sentence.

The idea that meanings can be derived from the analysis of the closest words was introduced by John Rupert Firth, an English linguist and a leading figure in British linguistics during the 1950s. Firth is known as the father of distributional semantics, a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items, based on their distributional properties in large samples of language data. It is by using this exact framework (studying semantic similarities between terms inside a given context window) that we can train a machine to understand the meaning of a word.

## Cosine similarity

When we want to analyze the semantic similarity of two documents (or two queries), and we have turned these documents into mathematical vectors, we can use the cosine of the angle between their respective vectors. The real advantage is that two similar documents might still be far apart in Euclidean distance, whether because they use different words with similar meanings or simply because one uses the same words more often. We might have, for example, the term "soccer" appear fifty times in one document and only ten times in another; the two will still be considered similar when we analyze their respective vectors within the same multidimensional space. The reason is that as long as their meaning is similar, the orientation of their vectors will also be similar. In other words, a smaller angle between two vectors represents a higher degree of similarity. (A short numeric sketch of this idea follows at the end of this overview.)

Embeddings are one of the different techniques we can use to analyze and cluster queries. See our web story on keyword research using AI to find out more.

## The Power of Embeddings in SEO 🚀

Embeddings have revolutionized Search and the SEO landscape, and to help you navigate this shift, I've created a straightforward tutorial on fine-tuning embeddings tailored to your content (here is the 🐍 code on a notebook that you can play with). But before we delve into that, let's understand the significance of embeddings and how they can help you with your SEO strategy across a wide range of tasks.
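To make the cosine similarity argument above concrete, here is a minimal numeric sketch in Python. The term-count vectors are invented for illustration (the post only gives the fifty-versus-ten counts for "soccer"): the two documents sit far apart in Euclidean distance, yet their vectors point in exactly the same direction.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy term-count vectors over the vocabulary ["soccer", "match"]:
# "soccer" appears fifty times in one document and ten in the other,
# as in the post; the second count is invented.
doc_a = np.array([50.0, 20.0])
doc_b = np.array([10.0, 4.0])

print(np.linalg.norm(doc_a - doc_b))    # Euclidean distance ≈ 43.1 (far apart)
print(cosine_similarity(doc_a, doc_b))  # cosine similarity = 1.0 (same orientation)
```

Scaling a count vector changes its length, and therefore its Euclidean distance to other vectors, but not its orientation; that is exactly why the cosine of the angle is the better similarity measure here.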
## Train your embeddings for your next SEO task

But can we train embeddings using our own content? Absolutely! Especially when dealing with domain-specific language, it's beneficial to train your own embeddings. Here's how: 👉 you can jump directly to the Colab here and follow the instructions in the notebook.
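As a preview of what the notebook does, the core fine-tuning step relies on Jina AI's Finetuner (described in the next paragraph) and looks roughly like the sketch below. The beer names, style labels, and epoch count are placeholder values, and Finetuner's call signatures have changed across releases, so treat this as an outline of the approach rather than the notebook's literal code.

```python
import finetuner
from docarray import Document, DocumentArray  # docarray < 0.30

finetuner.login()  # Finetuner runs as a managed cloud service

# Hypothetical training data: each beer name labeled with its style.
# Finetuner's convention at the time was a `finetuner_label` tag.
train_data = DocumentArray([
    Document(text="Punk IPA", tags={"finetuner_label": "India Pale Ale"}),
    Document(text="Guinness Draught", tags={"finetuner_label": "Stout"}),
    # ... one Document per beer in the dataset
])

# Fine-tune the bert-base-en backbone on the classification signal.
run = finetuner.fit(
    model="bert-base-en",
    train_data=train_data,
    epochs=5,  # illustrative value
)
print(run.status())                   # poll the cloud job
run.save_artifact("beer-embeddings")  # download the tuned model when done
```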
In this tutorial, I used a craft beer dataset. Our objective is straightforward: to train our embeddings using the names and styles of the beers. To achieve this, we'll employ Finetuner by Jina AI, which allows us to tailor an existing model (in our case, bert-base-en) to craft our custom embeddings. Since the dataset is small, the process is blazing fast, and you can immediately see the performance improvement over the standard model. We "learn" the embeddings by running a classification task: in our tutorial, we classify beer names by beer style.

## How can we use embeddings?

Now that the fine-tuning process is complete, we can do something as simple as providing a list of beers and looking for the ones that match the "India Pale Ale" style. We bring all the embeddings into a DocArray and use query.match() to find the beers that best match our query for "India Pale Ale".

## Visualizing Embeddings using Atlas

We can also visualize our embeddings using Atlas by Nomic AI. Atlas is a super simple library that helps us visualize the latent space behind the embeddings. Let's have a look at the map that we have created 👉 https://wor.ai/tBb7mM and see how, for example, we can easily group all the "Ale" beers.

(Image: the Atlas map that represents our embeddings.)

This method offers an effective way to review and share embeddings with your team. It's invaluable in helping everyone verify that the embeddings are being fine-tuned correctly and that semantically similar items are indeed moving closer together. (A minimal code sketch of these query and visualization steps follows.)
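To make those two steps concrete, here is a minimal, self-contained sketch. It assumes the pre-0.30 docarray API that query.match() belongs to, and the atlas.map_embeddings() entry point that Nomic's client exposed around the time of the post; the beer names are placeholders, and random vectors stand in for the fine-tuned embeddings so the snippet runs on its own.

```python
import numpy as np
from docarray import Document, DocumentArray  # docarray < 0.30, where query.match() lives

# Placeholder beer names; in the notebook the embeddings come from the
# fine-tuned bert-base-en model, here random vectors stand in for them.
beers = ["Punk IPA", "Stone IPA", "Guinness Draught", "Pilsner Urquell"]
rng = np.random.default_rng(42)
index = DocumentArray([Document(text=name) for name in beers])
index.embeddings = rng.normal(size=(len(beers), 768)).astype("float32")

# Embed the style query and rank the beers against it. Note that
# docarray's "cosine" metric is a distance: smaller means more similar.
query = Document(text="India Pale Ale",
                 embedding=rng.normal(size=768).astype("float32"))
query.match(index, metric="cosine", limit=3)
for match in query.matches:
    print(match.text, match.scores["cosine"].value)

# Push the same embeddings to Atlas by Nomic AI to inspect the latent
# space visually (requires a Nomic account; the token is a placeholder).
import nomic
from nomic import atlas

nomic.login("NOMIC_API_KEY")
atlas.map_embeddings(
    embeddings=np.asarray(index.embeddings),
    data=[{"name": name} for name in beers],
)
```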
## Why create your own embeddings?
|
wordpress:customType | "entity"^^xsd:string |
wordpress:id | "18741"^^xsd:integer |
wordpress:permalink | "https://wordlift.io/blog/en/entity/what-are-embeddings/"^^xsd:string |
wordpress:status | "publish"^^xsd:string |
wordpress:sticky | "false"^^xsd:boolean |
wordpress:terms | "wl_entity_type:Thing"^^xsd:string |
wordpress:terms | "category:seo"^^xsd:string |
wordpress:title | "What are embeddings?"^^xsd:string |
wordpress:type | "post"^^xsd:string |
schema:alternateName | "Embeddings"^^xsd:string |
schema:alternateName | "graph embeddings"^^xsd:string |
schema:alternateName | "embeddings"^^xsd:string |
schema:alternateName | "Graph embeddings"^^xsd:string |
schema:alternateName | "word embeddings"^^xsd:string |
schema:description | "Dive into the dynamic world of embeddings to discover their impact on search relevance and user experience."^^xsd:string |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/NzE0MDMwNjg3MTQwNTkyNjcwNA |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/Nzk4NjY0OTA0Mjc3NDk4OTMxMg |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTg1NzU4OTc0ODcxNDMzNTQ1MTI |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTkxMTE4NTM5NDIzMDk1Nzg5NDQ |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/OTY0NTI2NDg1NDE4MTQ1NjMy |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTQyNDMyMTI2MDE1MzQ2MTM3NjA |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTczMzM0MTQ0NTg3OTcyMzc2MzI |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/NjA1MTQ4OTIzNTk1MTk3NDI3Mg |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTc0NzIyOTYwMDE5NTkwNDMzNg |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTYyOTE3Mzk2ODM0NjEwNjU3Ng |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTY0MDIyNTEzMDAzODUwMjI2OA |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/MjQyMzkzNjgyODc0MDg5NzQwOA |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/LTc4NjQxMjc2NzgzODI0OTI5NDQ |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/NTcyNDY0NTQ1ODczMTQwMjQ5Ng |
schema:image | http://data.wordlift.io/wl0216/entity/what_are_embeddings_/NzU3MTIwODE5MDY3OTAzMjE5Mg |
schema:mainEntityOfPage | https://wordlift.io/blog/en/entity/what-are-embeddings/ |
schema:name | "What are embeddings?"^^xsd:string |
schema:url | https://wordlift.io/blog/en/entity/what-are-embeddings/ |
rdf:type | schema:Thing |