- 00:00Welcome, I'm Harald Sack and I'm Ann Tan and this is Knowledge Graphs.
- 00:04Lecture Number Six: Intelligent Applications with Knowledge Graphs
- 00:08and Deep Learning. In this section of the lecture, we are going
- 00:11to talk about Knowledge Graph Completion.
- 00:15Ok, let's look at the following problem.
- 00:18When can we say a knowledge graph is complete? That's of course
- 00:23a very difficult question, simply because of the open world assumption. However,
- 00:28we already know different knowledge bases, and we might see
- 00:31which of those covers more. So
- 00:35check out DBpedia and Wikidata here: we want to see whether
- 00:39sci-fi films that are in DBpedia are also labelled as such in Wikidata,
- 00:46which of course means to see whether Wikidata is as complete
- 00:49as DBpedia is or not.
- 00:51We can do this with a simple federated query; we have learned
- 00:55about federated queries already. If we look at it, we have the upper part here.
- 00:59Let me quickly switch on the laser pointer:
- 01:02we have here the part where we are talking to DBpedia,
- 01:06and here we are looking for sci-fi movies. Then we connect
- 01:10?item here to the Wikidata SERVICE endpoint, and there
- 01:16we simply look, with FILTER NOT EXISTS, for items that are not labelled as science fiction films.
- 01:23And this is exactly what we will try out on the next page.
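Spelled out in code, the kind of federated query described here might look like the following Python sketch using SPARQLWrapper. It is an illustration, not the literal query from the slide: the DBpedia modelling (dbo:Film, dbo:genre, dbr:Science_fiction) and the Wikidata properties and items (P31 instance of, Q11424 film, P136 genre, Q471839 science fiction film) are assumptions about how the data is modelled.

```python
# A hedged sketch of the federated query described above: films that DBpedia
# models as science fiction but whose Wikidata counterpart lacks a
# "science fiction film" genre statement. Property/class choices are assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd:  <http://www.wikidata.org/entity/>

SELECT DISTINCT ?film ?item WHERE {
  ?film a dbo:Film ;
        dbo:genre dbr:Science_fiction ;          # assumed DBpedia modelling
        owl:sameAs ?item .
  FILTER(CONTAINS(STR(?item), "wikidata.org"))   # keep only Wikidata links
  SERVICE <https://query.wikidata.org/sparql> {
    ?item wdt:P31 wd:Q11424 .                    # item is a film in Wikidata
    FILTER NOT EXISTS { ?item wdt:P136 wd:Q471839 . }  # but not genre: sci-fi film
  }
}
LIMIT 100
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["film"]["value"], "->", row["item"]["value"])
```

Running the outer pattern on the DBpedia endpoint keeps the sci-fi selection on the side that asserts it, while the FILTER NOT EXISTS in the SERVICE block removes every item Wikidata already labels as a science fiction film.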
- 01:26And of course we can try this query out live, before I present you our
- 01:31pre-fabricated result. So let's see what happens here.
- 01:36It takes a while, and then you see: ok, there are
- 01:40286 movies that are not labelled as sci-fi movies
- 01:46in Wikidata, and there are
- 01:49such famous movies as Under the Sun of Satan,
- 01:53and Pet Sematary, and Coraline.
- 01:56I remember Coraline. That was nice. That was a game.
- 01:59Yeah, about that cat and stuff like that. Ok,
- 02:04yeah, but that doesn't matter. Let's return to our general problem,
- 02:09the problem of completeness of knowledge graphs. We saw,
- 02:13for example, that 72,172 films were
- 02:17missing when we originally ran exactly this query. When we did it right
- 02:21now, it was another number, and I'm pretty sure when you do this
- 02:23in October, you will again find another number.
- 02:27Hopefully it will grow smaller, which means Wikidata would become
- 02:31more complete in the end. But the question is: is there a way
- 02:35we could predict what is missing, and could we complement exactly
- 02:41what is missing? And there we are in the realm of so-called Knowledge Graph
- 02:45Refinement. In Knowledge Graph Refinement we have the assumption that
- 02:51knowledge graphs cannot be complete, because a knowledge graph
- 02:57is an approximation of the real world and a work in progress. It might contain
- 03:02information about each and every entity in the universe, but
- 03:06then the attributes, for example, of these entities may not be
- 03:09there yet, as you can see with the
- 03:11tagging of movies as science fiction, right?
- 03:14So it is unlikely, particularly if we apply heuristic methods
- 03:21to knowledge graph construction, that we will be able
- 03:24to have a knowledge graph that is 100% correct, because,
- 03:29particularly in the instance of Wikidata, the data there is
- 03:36added by a lot of people, and some of the users may or may
- 03:41not be experts in the data that they are trying to edit.
- 03:47So to address those shortcomings, various methods for knowledge graph
- 03:51refinement have been proposed, such as entity resolution
- 03:57or collective reasoning, which involves probabilistic soft logic.
- 04:02And for this
- 04:05lesson, we will talk about link prediction for Knowledge Graph completion. We will
- 04:10be focusing on this approach, as well as on dealing
- 04:15with missing or erroneous values and so on.
- 04:21Ok, so we focus on Knowledge Graph completion. How does this
- 04:25differ from error detection? In Knowledge Graph completion, what
- 04:28we do is add missing knowledge to the knowledge graph. For
- 04:32example, we could add an entire triple: if it's missing that Isaac Asimov
- 04:36is a science fiction writer, so his occupation is science fiction writer,
- 04:40we could simply add it.
- 04:43Of course, it's rather easy then to distinguish this from error
- 04:46detection, where we identify wrong information in the knowledge graph.
- 04:51For example, we have to find inconsistencies: when we have
- 04:55two triples, and one is "Isaac Asimov is a human" and the other
- 04:59one is "Isaac Asimov is a novel",
- 05:03then of course we see that there is probably something wrong.
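To make the contrast concrete, a minimal sketch of this kind of error detection, with invented triples and invented disjointness axioms rather than any real ontology, could look like this:

```python
# Toy error detection: flag entities asserted to have two disjoint types.
# The triples and the disjointness axioms below are invented for illustration.
disjoint = {frozenset({"human", "novel"}), frozenset({"human", "film"})}

triples = [
    ("Isaac_Asimov", "type", "human"),
    ("Isaac_Asimov", "type", "novel"),      # the erroneous statement
    ("Isaac_Asimov", "occupation", "science_fiction_writer"),
]

# Collect the asserted types per entity.
types = {}
for s, p, o in triples:
    if p == "type":
        types.setdefault(s, set()).add(o)

# Report every disjoint pair of types asserted for the same entity.
for entity, ts in types.items():
    for a in sorted(ts):
        for b in sorted(ts):
            if a < b and frozenset({a, b}) in disjoint:
                print(f"Inconsistency: {entity} is typed both '{a}' and '{b}'")
```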
- 05:09So let's continue with Knowledge Graph Completion.
- 05:12So one approach to Knowledge Graph completion, as we already
- 05:16discussed on the previous
- 05:18slide, is to use knowledge graph embeddings, where we
- 05:22embed the semantics of the entities and relations
- 05:27in a latent space, after which we can make inferences
- 05:32by learning and operating on these representations. Such
- 05:36embedding models, however, do not make use of any rules, as you may also
- 05:42remember from our lecture, nor of loss functions for
- 05:48creating negative triples. We didn't even write
- 05:52rules there that said: ok, if this entity is a human, therefore it cannot be
- 05:58a novel. So at inference time we use
- 06:03the embeddings to predict. But this means that
- 06:07the embeddings we have are also just an approximation; therefore the
- 06:12accuracy will not be very high. So for example, suppose we have to predict
- 06:17in Wikidata which fact may be complemented: for example,
- 06:22that the movie The Matrix has the genre science fiction film.
- 06:27We don't know,
- 06:30or it has not been encoded in our knowledge graph, that
- 06:34The Matrix belongs to the genre science fiction film, and in Knowledge Graph Completion
- 06:40this is what we predict: we predict the tail
- 06:44entity.
- 06:47But that's not the only thing you can do in Knowledge Graph completion.
- 06:51Given a specific triple that we might suggest, like for example
- 06:55"Isaac Asimov is a science fiction writer", we could
- 06:58ask: is this correct or not? That would then be characterized
- 07:03as triple classification. The result of that would probably be
- 07:06"yes", and you would then have
- 07:09a probability of 95%, for example.
- 07:13What Ann just told us is: we have here, for example, a subject
- 07:17and a property, and we want to know what is potentially the
- 07:21right object. So what would be the tail of that
- 07:25triple that is missing? Tail prediction here for Isaac Asimov and
- 07:30occupation would probably result in something which gives us
- 07:33science fiction writer with a high probability, and of
- 07:36course he was also a chemist, a biochemist to be more precise,
- 07:39and so on; that would probably follow later on.
- 07:42The other way around: if the subject is unknown, we are in the
- 07:45realm of head prediction. There we have a given
- 07:49object and a given property, and we want to know who was a science
- 07:52fiction writer. Then, according to the likelihood
- 07:56and of course the possibilities of your trained knowledge graph,
- 08:00someone who might be one of the most,
- 08:03let's say, famous science fiction writers of all time might be
- 08:08suggested in first place, for example with
- 08:1291%, closely followed by Herbert George Wells, for example.
- 08:16If the middle part is missing, so the property, this is then called relation prediction:
- 08:22Isaac Asimov, and how does he relate to "science fiction writer"?
- 08:27The answer might be that we choose here the property occupation,
- 08:31because this fits best according to the structural properties
- 08:34in our graph, and that might have a high probability of
- 08:3995 percent. A special case of link prediction that we also consider is
- 08:43called entity classification, and there we want to know the
- 08:46type. So this is type prediction: we want to know what type Isaac Asimov is.
- 08:52And of course it's clear that Isaac Asimov might be a person,
- 08:55might be a human, might be a sci-fi writer, might be many things,
- 08:59so this is not necessarily a one-to-one problem. There can be,
- 09:03particularly in that case, many classes. So type prediction is a special case
- 09:08of tail prediction and of link prediction.
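All four tasks just listed can be phrased as filling, or checking, one slot of a triple with the same plausibility score. A minimal sketch, with an invented stand-in score function in place of a trained embedding model:

```python
# The four completion tasks over (head, relation, tail), using one scoring
# function. `score` is an invented stand-in lookup table; a real system
# would score triples with a trained embedding model.
ENTITIES = ["Isaac_Asimov", "H_G_Wells", "science_fiction_writer", "chemist"]
RELATIONS = ["occupation", "notable_work"]

KNOWN = {
    ("Isaac_Asimov", "occupation", "science_fiction_writer"): 0.95,
    ("H_G_Wells", "occupation", "science_fiction_writer"): 0.91,
    ("Isaac_Asimov", "occupation", "chemist"): 0.80,
}

def score(h, r, t):
    return KNOWN.get((h, r, t), 0.05)

# Triple classification: is (h, r, t) plausible?
print(score("Isaac_Asimov", "occupation", "science_fiction_writer") > 0.5)

# Tail prediction: (Isaac_Asimov, occupation, ?)
print(max(ENTITIES, key=lambda t: score("Isaac_Asimov", "occupation", t)))

# Head prediction: (?, occupation, science_fiction_writer)
print(max(ENTITIES, key=lambda h: score(h, "occupation", "science_fiction_writer")))

# Relation prediction: (Isaac_Asimov, ?, science_fiction_writer)
print(max(RELATIONS, key=lambda r: score("Isaac_Asimov", r, "science_fiction_writer")))
```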
- 09:12So here we illustrate the link prediction task with knowledge graph embeddings.
- 09:17In particular, we will use translational embeddings, an unsupervised method;
- 09:23for example, we have already discussed TransE in the previous lecture.
- 09:27In TransE, we have the embedding of the head, or the subject,
- 09:31and the embedding of the relation, or the predicate.
- 09:34And to be able to predict the tail, we just apply vector arithmetic:
- 09:41we sum up the embeddings of Isaac Asimov and occupation, and then
- 09:48we look for the nearest neighbor in the embedding space. So
- 09:52how do we do this? We can apply, for example, the cosine distance,
- 09:57and as an example here, we can say that we get the score from the
- 10:01cosine distance, or cosine similarity. We say that sci-fi writer
- 10:06is the most likely tail for the head and relation pair of Isaac
- 10:12Asimov and occupation.
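As a toy illustration of this vector arithmetic, with random stand-in vectors rather than trained TransE embeddings, tail prediction then looks like this:

```python
# Toy TransE-style tail prediction: h + r should land near t, so rank
# candidate tails by cosine similarity to (h + r). The embeddings below
# are random stand-ins, not vectors from an actual trained model.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
E = {e: rng.normal(size=dim) for e in
     ["Isaac_Asimov", "science_fiction_writer", "chemist", "novel"]}
R = {"occupation": rng.normal(size=dim)}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def predict_tail(head, relation, candidates):
    target = E[head] + R[relation]          # TransE assumption: h + r ≈ t
    return sorted(((t, cosine(target, E[t])) for t in candidates),
                  key=lambda x: x[1], reverse=True)

for tail, sim in predict_tail("Isaac_Asimov", "occupation",
                              ["science_fiction_writer", "chemist", "novel"]):
    print(f"{tail}: {sim:+.3f}")
```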
- 10:16Ok, what we have to distinguish or differentiate for link prediction
- 10:21are two different tasks that have different difficulty.
- 10:25First of all, let's have a look at so-called transductive link
- 10:29prediction. What is that? We predict links in the same
- 10:33knowledge graph that has also been used as the training data.
- 10:36So for example, we have here a knowledge graph that has been used
- 10:40in the training, and here we want to predict a link that
- 10:42is not occurring, but between nodes that have been learned.
- 10:45So the entities at training time are exactly the same entities
- 10:49as used at prediction time. Which means this cannot, or can only
- 10:54rather badly, operate on unseen graphs. Which means after a dynamic graph update,
- 11:00or for new subgraphs comprised of completely new entities, this
- 11:03model has to be retrained, otherwise it doesn't work anymore.
- 11:08The point is, in transductive link prediction, you see here the
- 11:12scores of the state-of-the-art models: they haven't made much progress recently.
- 11:18So they are quite good here, but there has not
- 11:21been much progress made. Therefore another task came up, where
- 11:26one tried to, let's say,
- 11:29deploy the power of knowledge graph embeddings in a better way,
- 11:33and that then would be inductive link prediction.
- 11:37So in inductive link prediction, the links are different
- 11:41from the training data. Which means that the entities trained on
- 11:45are not the same as the entities that we are trying to predict. Which means that
- 11:50here we can operate on unseen entities. So here, to the left,
- 11:55you have the training entities, the entities that we used
- 11:58for the training, and to the right are the entities that we did not see
- 12:02in the knowledge graph that we trained the embeddings with.
- 12:06We then distinguish between two types of inductive link prediction.
- 12:11The first one is fully inductive link prediction, and the other
- 12:15one is semi-inductive link prediction. So what is the difference
- 12:19between these two sub-types of inductive link prediction?
- 12:24Ok, let's start with fully inductive link prediction. Here the
- 12:28prediction knowledge graph is a completely new knowledge graph,
- 12:31totally disconnected from the training graph. Which means link prediction
- 12:35is performed over a completely new graph, with unseen entities only.
- 12:40So the pattern we follow here is unseen-to-unseen.
- 12:45If we go further, then we also have semi-inductive link prediction.
- 12:50In semi-inductive link prediction, the prediction knowledge
- 12:54graph is larger, because it includes the updated knowledge graph
- 12:58that has unseen entities. Which means that link prediction can
- 13:04involve both seen and unseen entities, and the patterns for prediction
- 13:10include seen-to-unseen entities and seen-to-seen entities, as well as
- 13:15both sides being unseen.
- 13:18So this, I would say,
- 13:22has more scope than fully inductive link prediction, but
- 13:27fully inductive link prediction is more complicated than this
- 13:31one, because there the predictions are more difficult, since you don't see
- 13:36either of the two entities.
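A simplified way to summarize the three settings is to classify a single test link by whether its endpoints were seen during training; the entity sets below are invented for illustration:

```python
# Classify a test link by its seen/unseen pattern. Transductive models can
# only handle seen-to-seen links; semi-inductive evaluation adds links that
# touch new entities; fully inductive evaluation uses unseen-to-unseen only.
TRAIN_ENTITIES = {"Isaac_Asimov", "science_fiction_writer", "chemist"}

def link_setting(head, tail, seen=TRAIN_ENTITIES):
    h_seen, t_seen = head in seen, tail in seen
    if h_seen and t_seen:
        return "seen-to-seen (transductive)"
    if h_seen or t_seen:
        return "seen-to-unseen (semi-inductive)"
    return "unseen-to-unseen (fully inductive)"

print(link_setting("Isaac_Asimov", "chemist"))       # seen-to-seen
print(link_setting("Isaac_Asimov", "Foundation"))    # seen-to-unseen
print(link_setting("Frank_Herbert", "Dune"))         # unseen-to-unseen
```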
- 13:40True. Ok, the last thing we want to talk about here is entity
- 13:45classification. As we have said already, this is a special type of link prediction,
- 13:50and it is about predicting a type or class for an entity,
- 13:54given some characteristics of the entity. It's a very common
- 13:57problem in machine learning, where it is simply known as classification.
- 14:01For example, you want to find out: what type is the entity
- 14:05Isaac Asimov? So for Isaac Asimov,
- 14:09this would be a tail prediction on an object which
- 14:13is supposed to be a class, so it would be entity classification.
- 14:17One way to do that is of course a supervised learning approach:
- 14:20there, type prediction can be addressed by a classification model
- 14:24based on labelled training data, and typically this is a set
- 14:28of entities in a knowledge graph which have types attached
- 14:31to them. So that's the typical scenario we are talking about,
- 14:36and there are several type prediction or entity classification
- 14:40tasks. As with any classification, there is multi-class prediction,
- 14:45where in a knowledge graph there is more than one class for prediction.
- 14:50For example, Isaac Asimov can be a sci-fi writer, but also a chemist, a biochemist, etcetera.
- 14:57And then there is single-label classification, where there
- 15:00is a restriction that only one type can be assigned per entity.
- 15:04And lastly, there is multi-label classification, wherein entities
- 15:10can have more than one type, or can be assigned to more than one class.
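As a toy sketch of the multi-label variant, assuming entity embeddings as features and scikit-learn as the classifier (both invented stand-ins, not anything from the lecture itself), each entity vector can receive several types at once:

```python
# Toy multi-label type prediction: learn one binary classifier per type
# over entity embeddings. Embeddings and labels are invented stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

rng = np.random.default_rng(42)
X = rng.normal(size=(6, 20))                  # one embedding per entity
labels = [["human", "writer"], ["human", "writer"], ["human"],
          ["film"], ["film"], ["human", "chemist"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                 # binary indicator per type

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

new_entity = rng.normal(size=(1, 20))         # embedding of an unseen entity
predicted = mlb.inverse_transform(clf.predict(new_entity))
print(predicted)                              # e.g. [('human', 'writer')]
```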
- 15:19Further applications of link prediction and knowledge graph
- 15:22completion would be, for example, identity prediction. This
- 15:25is about predicting the identity of two entities: for example,
- 15:31searching for nodes in the knowledge graph that refer to the
- 15:33same entity but are not explicitly stated or entailed to be the same.
- 15:38So this is more or less the same as entity matching, or record linkage, or
- 15:43deduplication; you might have heard about those.
- 15:46Another one, and in our times a really important problem, would
- 15:50be fact checking and validation, in which we try to predict the plausibility
- 15:54of a given fact. So this is a triple classification problem,
- 15:58of course with a specific scenario that you have in mind here.
- 16:03And of course also rather important: when you do fact checking
- 16:07and something is identified to be wrong, this is
- 16:10Knowledge Graph Correction. It means first you identify the
- 16:13wrong information, and then you complement the missing
- 16:17information, which means you cut away the wrong information and
- 16:20do knowledge graph completion again to correct the
- 16:24knowledge graph at that point.
- 16:27Ok, so you have seen many scenarios where knowledge graph embeddings
- 16:31are rather handy to complement knowledge graphs and to help with problems,
- 16:35first of all in the knowledge graphs themselves, but also to create a representation
- 16:39of these knowledge graphs that can directly be used in models
- 16:43for prediction and for classification.
- 16:46However, we live in a time where the announcement of large language
- 16:51models floods the media every day. So we had
- 16:55GPT, we had GPT-3, GPT-4; maybe in October there will
- 16:59already be a GPT-5, or whatever model.
- 17:02So it's time now that we also focus on the relation between knowledge graphs
- 17:08and language models, and exactly that is what we are doing in the next lecture.