This video belongs to the openHPI course Knowledge Graphs.
- 00:00Welcome to Knowledge Graphs. This is lecture number three, querying RDF(S)
- 00:05with SPARQL. So how can we query RDF and RDFS? This is
- 00:12the question we are dealing with here in the first part of this lecture.
- 00:16And we are now on the query level within the semantic web technology stack
- 00:20and we are talking about the query language SPARQL.
- 00:24So what is SPARQL? SPARQL is a query language for RDF(S), and it works
- 00:31based on a client-server paradigm, which means you have a SPARQL endpoint.
- 00:36That is a server; it is like a database for RDF data,
- 00:41and exactly this SPARQL endpoint can be queried via the internet
- 00:48by a client. This client accesses the server via HTTP, and above HTTP
- 00:55there is the SPARQL protocol layer, which means a SPARQL query
- 00:59is sent via HTTP to the server, the server processes the SPARQL query,
- 01:05then delivers the answer back to the client, where it is finally displayed
- 01:12within the user interface of the client.
- 01:14How this works we will see in the course of these lectures.
- 01:19SPARQL in general is much more than a query language. It
- 01:23consists first, of course, of the query language for
- 01:27graph traversal within RDF, so it is based on RDF graph traversal. This is
- 01:32the basic principle of how it works, and we will explain this later in detail.
- 01:37Then SPARQL also defines the protocol layer I was already
- 01:41talking about. So this is SPARQL via HTTP.
- 01:47And then there is of course a specific SPARQL XML output format
- 01:52specification. This is exactly the form of the output that is delivered
- 01:56back from the SPARQL endpoint, from the server back to the
- 02:00client, and the client of course has to parse this XML output format
- 02:05and do something with it.
- 02:08The current standard for SPARQL is from two thousand and
- 02:12thirteen; this is SPARQL 1.1.
- 02:15Like many other query languages for data, SPARQL is also based on
- 02:20and inspired by the SQL query language for relational databases.
- 02:25Some of you might already know SQL for relational
- 02:30databases, which is a rather simple and easy-to-learn query language.
- 02:34And this is the same with SPARQL. So SPARQL and SQL are similar
- 02:40but not identical. There are significant differences, and the most
- 02:44important difference is that SPARQL is really a query language
- 02:48for graph traversal, and how this works you will see
- 02:52later on. This is, for example, a SPARQL endpoint, a rather
- 02:57prominent endpoint that you will see more often
- 03:00in the course of the lecture. This is the endpoint of the DBpedia
- 03:05database, or DBpedia knowledge base, which has been derived from Wikipedia.
- 03:11In the first and the second excursion of this lecture you
- 03:16will learn about DBpedia and also about Wikidata, the two
- 03:21knowledge bases that we are going to query with SPARQL.
- 03:26So I told you that SPARQL is a query language for RDF
- 03:31graph traversal, so what we need now is an RDF graph.
- 03:35This is an example of an RDF graph which comprises three different
- 03:39books. At first glance it might look rather complicated,
- 03:42but I will walk you through it. So there is one book, Nineteen Eighty-Four,
- 03:45that was written by George Orwell in nineteen forty-eight.
- 03:48This is a dystopian novel, and of course it is a book.
- 03:54And then we have An Inconvenient Truth here written by Al Gore
- 03:58and this is a book about global warming, which of course is
- 04:01related to climate change. And this has been released in two thousand six.
- 04:05And of course as we all know Al Gore is a person.
- 04:09And then we have a third book, this also is an interesting novel, called
- 04:14Make Room! Make Room! by Harry Harrison. It is dystopian, and
- 04:20there is a film based on exactly that
- 04:23novel which is quite interesting; you should see it. It is called Soylent Green.
- 04:26This is a book of environmental fiction, and
- 04:31the author Harry Harrison is also a person. So we will use this
- 04:36graph later for explaining how SPARQL works.
- 04:45So let's see how SPARQL works. First, if we want to query RDF,
- 04:50we need variables that are filled with the
- 04:55results we find in the data. So we have to define so-called SPARQL variables,
- 05:00which are later bound to RDF terms during query processing.
- 05:06To distinguish the usual vocabulary that we use in
- 05:10SPARQL from the variables, all of the variables that we define
- 05:15have a question mark at the beginning. So for example
- 05:19?title, ?author and ?date
- 05:24are variables that can be used and then later on be bound, which means filled
- 05:28with RDF data.
- 05:31The query itself is formulated in the same way as in SQL,
- 05:36so you query for variables, and this is performed
- 05:41in a so-called select statement. So this is exactly like SQL:
- 05:45you have the keyword SELECT, and usually after SELECT you
- 05:51simply list all of the variables that you want to have filled.
- 05:55For example, we can see a SELECT with ?title, ?author and ?date
- 05:59and so on. So far we don't say how exactly
- 06:02they should be selected, and
- 06:05the result of that query will be something like a table.
- 06:10Usually it is a table because all of these variables are ordered,
- 06:15and then you have rows for the single results that are returned.
- 06:19So it is a table. In the heading you again see title, author and date, and
- 06:24with regard to our graph, if we formulate the rest of the query
- 06:29accordingly, this could be titles like Nineteen Eighty-Four, An Inconvenient
- 06:33Truth and Make Room! Make Room!,
- 06:36then the authors George Orwell, Al Gore and Harry Harrison,
- 06:39and the publication dates nineteen forty-eight, two thousand six and nineteen
- 06:44sixty-six. This would be the SPARQL result.
- 06:49So far it seems rather easy, doesn't it?
- 06:54What does it mean?
- 06:57To query an RDF graph, what SPARQL does is so-called graph pattern matching.
- 07:05First of all, what you formulate in SPARQL, and also the patterns
- 07:09that you formulate, closely follow the idea of the Turtle serialization.
- 07:14So SPARQL is based first on the RDF Turtle serialization and secondly
- 07:19on a process which is called basic graph pattern matching.
- 07:24What is a graph pattern? A graph pattern is simply an RDF triple
- 07:28where one or more of the triple constituents
- 07:33is a variable, so either subject, property or object, or several
- 07:38of them. So a graph pattern or triple pattern is nothing else but
- 07:41Turtle plus variables. A simple example: we want to look for authors
- 07:47and their books or, the other way around, we want to look for
- 07:51books and their corresponding authors, and for that we want
- 07:54to use a specific property that exists, for example, in the DBpedia
- 07:59ontology. There you will find dbo:author, which is associated with a book,
- 08:04and then you can, for example, look for the following triple pattern.
- 08:09This would be book as a variable with a leading question mark,
- 08:13then you simply use the URI of the property that should
- 08:16be there. This is dbo:author. And then comes the next variable,
- 08:21this is then the author.
- 08:24Its first character is also
- 08:27a question mark. And since it follows Turtle, each of these graph patterns usually
- 08:33is closed with a period as you see here.
- 08:37So this is rather easy,
- 08:39these are the variables as we already know
- 08:41and what does it do? The graph is filtered with exactly
- 08:46that kind of pattern. So this pattern goes through the entire
- 08:50graph and you see here where it matches, so where you find exactly
- 08:54dbo:author, and you have there a head and a tail
- 08:58associated with exactly that property. This is then returned
- 09:02in the resulting table, so for example Nineteen Eighty-Four
- 09:05has the author George Orwell, An Inconvenient Truth has the author Al Gore
- 09:10and Make Room! Make Room! has the author Harry Harrison.
- 09:15It's as easy as that. So this is simple graph pattern matching.
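As a sketch, the triple pattern described above could be wrapped in a complete query like the following. The prefix IRI for dbo is the one DBpedia publishes for its ontology namespace. The LIMIT line is an addition made here (the keyword is only introduced later in the lecture) to keep the result short on a public endpoint.

```sparql
# Namespace of the DBpedia ontology, abbreviated as dbo:
PREFIX dbo: <http://dbpedia.org/ontology/>

# One triple pattern: subject and object are variables, the property is fixed.
SELECT ?book ?author
WHERE {
  ?book dbo:author ?author .
}
LIMIT 10   # assumption: added only to keep the output small
```

Every triple in the graph whose property is dbo:author yields one row, with ?book bound to the subject and ?author bound to the object.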
- 09:20Of course you can then do complex pattern matching, and usually
- 09:24SPARQL graph patterns can be combined to form complex, in
- 09:29this case conjunctive, queries for RDF traversal. So this is
- 09:32the easiest thing: if you simply combine two of these graph patterns,
- 09:36they are always combined conjunctively, so with a logical AND.
- 09:41Later on you will learn that there is also a logical OR, but now
- 09:45we focus first on the conjunctive connection of these two patterns.
- 09:51For example, find the books, their authors and their corresponding literary genres.
- 09:57So what I do here: first pattern, book, dbo:author and then author as a variable
- 10:03like before. However, we want to connect exactly the same book
- 10:08with the property literary genre
- 10:12and then fill the genre variable.
- 10:15So you see here that in that graph pattern we are referring to
- 10:19the very same book. So the same book should then be associated
- 10:23with the second pattern. This is a special way of connecting the patterns
- 10:28conjunctively. So we are referring to the same subject here in the query.
- 10:33So this is already quite interesting.
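The two conjunctively combined patterns might be written as follows. This is a sketch, assuming that the property spoken as "literary genre" is dbo:literaryGenre in the DBpedia ontology; the LIMIT is again an addition for convenience.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?book ?author ?genre
WHERE {
  ?book dbo:author ?author .          # first pattern: book and its author
  ?book dbo:literaryGenre ?genre .    # second pattern: the SAME ?book and its genre
}
LIMIT 10   # assumption: added only to keep the output small
```

Because ?book appears in both patterns, a result row is produced only when one and the same resource satisfies both of them, which is the logical AND described above.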
- 10:36However, you can go way more complex, so we now try something more difficult. Given
- 10:42a specific book URI, so here we are talking about the
- 10:45book Brave New World, which is not in our graph, but of course
- 10:48we can imagine that. Find its author or authors
- 10:52and then find the birthplaces of its authors, including the
- 10:55population of each birthplace. So this is already a complex query.
- 11:00Let's have a look; it doesn't look so complicated at all. So
- 11:03we have a URI. This is first the URI
- 11:08of Brave New World, so the book we are looking for. And we want
- 11:11to have its authors, which means we connect this
- 11:14in second place with the property dbo:author, and what we want
- 11:18to have filled of course is the object, and there should be the author.
- 11:21We put a period to simply close the first graph pattern.
- 11:25Secondly, we want to know the birthplace of this author. So
- 11:30the next graph pattern starts with author, and we say, okay, author should be connected
- 11:34via birth place to the variable birthplace, which should
- 11:38be filled. After that, a period closes the second graph pattern.
- 11:43And then comes the last one, the birthplace. Of course we want
- 11:46to know its population. So we have birthplace here.
- 11:50We ask for the property population total and then fill
- 11:55the variable population.
- 11:58You see here, author refers to the same author
- 12:03in two places. So this means the object author in the first line should be the same as
- 12:07the subject author in the second line.
- 12:11And likewise the birthplace that we have as the object of
- 12:14the second line should be the same as the subject in the third line.
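The three chained patterns could look like this. It is a sketch under assumptions: the book URI is taken to be the DBpedia resource dbr:Brave_New_World, and the spoken property names are taken to be dbo:birthPlace and dbo:populationTotal.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?author ?birthplace ?population
WHERE {
  dbr:Brave_New_World dbo:author ?author .        # line 1: fixed subject, ?author as object
  ?author dbo:birthPlace ?birthplace .            # line 2: ?author is now the subject
  ?birthplace dbo:populationTotal ?population .   # line 3: ?birthplace is now the subject
}
```

The shared variables form a path through the graph: the object of one pattern becomes the subject of the next.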
- 12:19So this is already a more complex graph pattern. You will easily get
- 12:23used to that over time, so this is really a nice and easy way to
- 12:28query RDF with SPARQL and to do really sophisticated queries with it.
- 12:32And we will see this in lots of examples. So don't worry, you will learn
- 12:36how to use that. Okay, so let's have a first look at a full-fledged SPARQL query.
- 12:46We said already that it's based on Turtle, which means we need,
- 12:48of course, prefixes to make this look a bit more readable.
- 12:53The example we are following here is quite easy. We want
- 12:56to look for authors and the titles of their notable works.
- 13:01And we want to look in DBpedia for exactly that information.
- 13:05So what we have to define first here are the corresponding
- 13:09namespaces that we need.
- 13:11We always need namespaces for RDF and RDFS since we will
- 13:15often use RDF and RDFS properties. Secondly, if we are talking
- 13:20about DBpedia we might also use the DBpedia ontology namespace, which
- 13:25gives us the names of lots of the properties that are used there. So now
- 13:32we start with a select statement, which specifies exactly what
- 13:35I want to have in my output,
- 13:38and in my output I want to have the name of the author, I call
- 13:41this author name, and I want to have the title of the books.
- 13:45So this is then the title.
- 13:48Next I usually have to specify the graph to be queried, especially
- 13:53if I am at a SPARQL endpoint where many different graphs might be available.
- 13:58In case there is only one graph, then this is the so-called
- 14:02default graph, and I can also leave out that line and I don't
- 14:08need to specify the FROM clause. For the sake of completeness, of course,
- 14:13I show you here the FROM, and you will see we will need it
- 14:17in case we are querying several graphs and combining their results.
- 14:24And then comes the so-called WHERE clause.
- 14:27With the keyword WHERE I specify all the constraints that
- 14:31are put into the graph patterns that have to be matched. So
- 14:35you see here I'm looking for somebody, or some variable I refer
- 14:39to as author, and author should be of the type dbo:Writer. So this
- 14:44means this is some author.
- 14:46Next I say, okay, I don't want to have the entity, I want to have
- 14:50the name of the entity, which means the label.
- 14:52So then I connect the author variable here again with rdfs:label, which gives me
- 14:58the name of the author in human-readable form.
- 15:02Interestingly dbo, which means DBpedia ontology, also gives me
- 15:06a property that states what the notable work of
- 15:09somebody is. So I connect author with notable work and then
- 15:13look for the works in a variable called work, and then I say, okay,
- 15:19I also don't want to have exactly that entity, I want to have the title
- 15:24of that entity, so I say work should be connected via rdfs:label
- 15:30to its title, which has to be filled and queried from the graph,
- 15:33and then in the end what I have is the name of the author and the title
- 15:37of his or her notable work.
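Put together, the full query described here might read as follows. This is a reconstruction from the spoken description, not a copy of the slide: the variable name ?author_name, the property spelling dbo:notableWork and the graph IRI in the FROM clause are assumptions.

```sparql
# Namespaces for RDF, RDFS and the DBpedia ontology
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT ?author_name ?title          # the columns of the result table
FROM <http://dbpedia.org>           # graph to query (assumed IRI; optional for the default graph)
WHERE {
  ?author rdf:type dbo:Writer .           # ?author is a writer
  ?author rdfs:label ?author_name .       # human-readable name of the author
  ?author dbo:notableWork ?work .         # a notable work of that author
  ?work rdfs:label ?title .               # human-readable title of the work
}
```

The variables ?author and ?work only serve to connect the patterns; since they are not listed after SELECT, they do not appear in the output.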
- 15:42And for all SPARQL queries that we present here
- 15:46in the lecture, and that we are going to present in future lectures,
- 15:50you will always see that, no matter whether we are querying DBpedia
- 15:54as a knowledge base or whether we are querying Wikidata as
- 15:57a knowledge base, in the lower right
- 16:02part of the slide there is a link to query the
- 16:08SPARQL endpoint. If you click on that link, I show you how this works.
- 16:12So I click here on that link.
- 16:14You will be directly transported in your browser to the SPARQL endpoint
- 16:20and the query will show up immediately here in the query window.
- 16:24You see here exactly the same query, we have here the prefixes,
- 16:28the select statement, the from, the where clause and so on and so on.
- 16:32So what I do here then is simply go here and say run query,
- 16:39and you see the result. So we have here all the names in the
- 16:42first column and the titles in the second column.
- 16:47And as you see here it's interesting in that knowledge base
- 16:50of course there are labels in many different languages. So you
- 16:53have here the name of the author Abbie Hoffman written in English,
- 16:56in German, in Spanish, in French, in Italian, in Japanese, in
- 17:00Dutch and so on and so on. So you see this holds for the author name as well
- 17:05as for the title. So if you want to have, let's say,
- 17:09usable results, you should restrict them to one language.
- 17:13But we will see later on how exactly this is going to work. So
- 17:18let's continue with the lecture and with learning SPARQL.
- 17:23The next thing I'm going to teach you is how I can transform
- 17:28this output with so-called solution sequence modifiers, which
- 17:33means I can, for example, order the output.
- 17:36I can limit the output so that only a specific number of items
- 17:40will be displayed, and I can also say, yeah,
- 17:42leave out the first n rows of the result. I only want to see
- 17:47this starting, let's say, from row ten or starting from row number
- 17:51twenty and so on. So these are so-called solution sequence modifiers.
- 17:58Our example here is again quite similar: search for all authors
- 18:01and the titles of their notable works. But now
- 18:04ordered by authors in ascending order, and limit the results
- 18:08to one hundred results, starting the list at offset ten.
- 18:14So what I do is exactly the same query that we had before. And now
- 18:18what I'm going to do is the following:
- 18:23I put here some solution sequence modifiers. First I say ORDER BY,
- 18:29which means the result will be ordered. Then I have the
- 18:32choice between ascending and descending, and here it is ascending.
- 18:36And then I have to specify,
- 18:39in parentheses, the variable according to which the output should
- 18:43be ordered. And this is here the author name.
- 18:47The next keyword that you learn here is LIMIT. LIMIT tells you, okay, only display
- 18:52the number of results that I state here after LIMIT, and this is, well,
- 18:56one hundred here. And then I can also give an OFFSET, and here it is
- 19:00OFFSET ten, which means the first ten rows are skipped and the next one hundred are
- 19:05displayed. We can try this out again. You see here I have
- 19:08a small link given in the slide. So you simply download the
- 19:12slides and then click on the link. Sorry,
- 19:16now I was too fast. I really have to click on the link here,
- 19:19and you see here is the query with limit and offset, and I run the query.
- 19:25And what you see now is that
- 19:27the table stops here, most likely after one hundred entries, and
- 19:32it starts after the first ten entries, and this here should be
- 19:36Aaron Covington, and you see here the author names are ordered
- 19:39alphabetically, so Aaron McGruder with double a in the beginning
- 19:43comes before Abbie Hoffman with a b in the beginning.
- 19:47So you see this most likely is working.
- 19:52Okay so these were output or solution sequence modifiers.
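The query with the three solution sequence modifiers could be sketched like this. As before, the variable names and dbo:notableWork are assumptions reconstructed from the spoken description.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT ?author_name ?title
WHERE {
  ?author rdf:type dbo:Writer .
  ?author rdfs:label ?author_name .
  ?author dbo:notableWork ?work .
  ?work rdfs:label ?title .
}
ORDER BY ASC(?author_name)   # sort rows by author name, ascending (DESC for descending)
LIMIT 100                    # return at most 100 rows
OFFSET 10                    # skip the first 10 rows of the ordered result
```

The modifiers are written after the closing brace of the WHERE clause and are applied to the finished result: first the ordering, then the offset, then the limit.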
- 20:00The next thing we are going to do is to further, let's say, filter
- 20:05the output of the results. Sometimes we are only interested in the part
- 20:10of the output that fulfills specific requirements.
- 20:15These kinds of constraints, these filter constraints,
- 20:20can be expressed within the graph patterns by adding to a graph pattern
- 20:25a filter condition, of course with the keyword FILTER. You
- 20:29see what we want to do in the example is search for all authors
- 20:33and the titles of their notable works that have more than five hundred pages,
- 20:37and again limit the result to the first one hundred, so the
- 20:41table will not be too long. So what I do here is then of course
- 20:45I want to have the number of pages here
- 20:49of the notable work that I have selected. What I do
- 20:53here is I select another property, dbo:numberOfPages, and
- 20:58fill a variable called pages. And of course I also bring the pages
- 21:02into the output within the select statement. So I select here
- 21:07name, title and pages, and here of course I have to
- 21:10specify a graph pattern for these pages.
- 21:14And then what I do now is I add here a filter condition,
- 21:19and in the filter condition I simply say
- 21:23the pages, so this is the variable I have selected before, should be
- 21:28greater than five hundred, and I close the parenthesis and that's it.
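A sketch of the query with the filter condition follows. The variable names are assumptions, and dbo:numberOfPages is the assumed spelling of the spoken property "number of pages".

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT ?author_name ?title ?pages
WHERE {
  ?author rdf:type dbo:Writer .
  ?author rdfs:label ?author_name .
  ?author dbo:notableWork ?work .
  ?work rdfs:label ?title .
  ?work dbo:numberOfPages ?pages .   # bind the page count of the work
  FILTER (?pages > 500)              # keep only rows with more than 500 pages
}
LIMIT 100
```

The FILTER does not bind anything itself; it only discards solutions for which the condition is not true, so ?pages has to be bound by a triple pattern in the same group.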
- 21:34So let's see what exactly happens if I run that query on the SPARQL
- 21:37endpoint. You see it here,
- 21:39and what we get
- 21:42is the following: you see here now a table with author name, title and pages.
- 21:48It's a bit long, simply because pages is given here with
- 21:51a data type, a positive integer, and it starts here with Victor Hugo,
- 21:56and the book is The Hunchback of Notre-Dame, and the
- 21:59number of pages is nine hundred forty, and since we haven't restricted
- 22:03the language so far, you see here the author name in all languages,
- 22:06including Arabic, and then also the same book all over again
- 22:10in different languages. So you have here Nuestra Señora de París,
- 22:14which is in Spanish, and so on. So this is
- 22:19exactly the result we were looking for.
- 22:25Okay, two last things I want to give you on your way in this part of the lecture.
- 22:31Of course what you can do
- 22:33when you are using these filter conditions is combine several conditions
- 22:40with further restrictions, and there are so-called unary operators
- 22:44that you can use. So for example if your condition results in a boolean value,
- 22:49you can negate it. So this is the negation operator.
- 22:53If it is a numeric value, then
- 22:57of course you can turn it positive or negative with
- 23:01a plus or a minus sign.
- 23:03You can ask whether a variable really is bound, which means does
- 23:06it have a result or not, with
- 23:09the operator BOUND and then in parentheses the variable name
- 23:12that you are asking for. You can ask whether the variable's value
- 23:17is a URI, you can ask whether it is a blank node, you can
- 23:20ask whether it is a literal.
- 23:23What you further can do is you can transform, for example, an
- 23:27output into a string. So you can transform any kind of literal
- 23:31or URI into a string. And you can ask for the language,
- 23:36so you could for example find out what the language of a
- 23:39specific string is. Is it English, is it French, is it Italian? And you
- 23:42can ask for the data type, and you will get back the
- 23:46URI of the data type. So these are things you can combine within
- 23:50the filter constraints, and then you can do really interesting and sophisticated
- 23:55queries with it.
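To illustrate the syntax of some of these operators, here is a small made-up query; it is not taken from the slides. The functions isIRI, isBlank, isLiteral, BOUND, STR, LANG and DATATYPE are part of SPARQL 1.1, while the choice of properties and variable names is only an assumption for the example.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT ?title
       (STR(?work) AS ?work_string)         # turn the URI of the work into a plain string
       (LANG(?title) AS ?title_language)    # language tag of the label, e.g. "en"
       (DATATYPE(?pages) AS ?pages_type)    # URI of the data type of the page count
WHERE {
  ?work dbo:author ?author .
  ?work rdfs:label ?title .
  ?work dbo:numberOfPages ?pages .
  FILTER ( isIRI(?author) )       # the author must be a resource, not a literal
  FILTER ( !isBlank(?work) )      # negation: the work must not be a blank node
  FILTER ( isLiteral(?title) )    # the title must be a literal
  FILTER ( BOUND(?pages) )        # always true here, shown only for the syntax
}
LIMIT 20
```

All FILTER conditions in one group have to hold at the same time, so listing several of them has the same effect as joining them with a logical AND.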
- 23:58The very last query I'm going to show you in this part of the lecture
- 24:03makes use, for example, of this language operator that we
- 24:07have just talked about. Now of course we want to have all the authors and
- 24:10the titles of their books
- 24:12in English, if possible. And the other thing
- 24:17I want to do here is add, let's say, some further
- 24:20restrictions. I search for authors and their books, filter
- 24:24the results for English labels and environmental fiction books,
- 24:28and of course in the end I limit again to one hundred resulting lines.
- 24:33Let's have a look at how to do that, so we include two filter conditions.
- 24:36First you see here that we say, okay, first filter condition:
- 24:39we want to have the author name
- 24:42in English, en, and of course we also want to have the title
- 24:48in English. And the last thing is of course that the work
- 24:52should have the dct:subject
- 24:55environmental fiction books.
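This last query might be written as below. It is a sketch: dct is assumed to be the Dublin Core terms namespace, and the category resource dbc:Environmental_fiction_books is an assumption derived from the spoken name, so the exact IRI on the slide may differ.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>

SELECT ?author_name ?title
WHERE {
  ?author rdf:type dbo:Writer .
  ?author rdfs:label ?author_name .
  FILTER (LANG(?author_name) = "en")    # keep only English author labels
  ?author dbo:notableWork ?work .
  ?work rdfs:label ?title .
  FILTER (LANG(?title) = "en")          # keep only English titles
  ?work dct:subject dbc:Environmental_fiction_books .   # assumed category IRI
}
LIMIT 100
```

Replacing "en" with "fr" or "it" in both filters would return the French or Italian labels instead.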
- 24:58So if we do that and run that query on the SPARQL endpoint, let's look,
- 25:07the result is much shorter, as you see here. So there are not
- 25:10so many environmental fiction books within DBpedia. However, these
- 25:16are the English author names and the English titles, and there
- 25:19you have Paul Hawken or people like Bjorn Lomborg,
- 25:23Nicky Hager and so on, and you also see the titles
- 25:28of the results in English. You could now do exactly the same
- 25:31for French, for Italian, for any other language. Simply try it out:
- 25:36use the link we have provided, take the query and
- 25:40then modify it accordingly and play around with it. This is
- 25:44the purpose of exactly what we did so far.
- 25:48Okay, one of the, let's say, basic questions you might have right
- 25:53now is: yeah, this is all nice what you do with this DBpedia,
- 25:57but how in the world do you come up with the names
- 26:01of the properties, with the names of the classes, with the names
- 26:04of the stuff you are using?
- 26:06For that of course you have to get some knowledge about these
- 26:09databases or knowledge bases we are talking about, and therefore now
- 26:14in the lecture there will come two excursions, the first one on
- 26:17the DBpedia knowledge base and the second one on the Wikidata database, so that you
- 26:22get a bit better acquainted with exactly these resources that
- 26:26we will use heavily in the course of this lecture.