3.1 How to Query RDF(S)

This video belongs to the openHPI course Knowledge Graphs.

Duration: approximately 27 minutes

00:00Welcome to Knowledge Graphs. This is lecture number three, Querying RDF(S)
00:05with SPARQL. So how can we query RDF and RDFS? This is
00:12the question we are dealing with here in the first part of this lecture.
00:16And we are now on the query level within the semantic web technology stack
00:20and we are talking about the query language SPARQL.
00:24So what is SPARQL? SPARQL is a query language for RDF(S), and it works
00:31based on a client-server paradigm, which means you have a SPARQL endpoint.
00:36That is a server; it is like a database for RDF data,
00:41and exactly this SPARQL endpoint can be queried via the internet
00:48by a client. This client accesses the server via HTTP, and above HTTP
00:55there is the SPARQL protocol layer, which means a SPARQL query
00:59is sent via HTTP to the server. The server processes the SPARQL query and
01:05delivers the answer back to the client, where it is finally displayed
01:12within the user interface of the client.
01:14How this works we will see in the course of these lectures.
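The client side of this exchange can be sketched in a few lines of Python. This is only an illustration under assumptions: the endpoint URL `https://dbpedia.org/sparql`, the tiny test query and the helper name `build_request` are not from the lecture. The sketch builds the HTTP GET request that carries the query in the `query` parameter, but does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumed endpoint URL and a tiny illustrative query (not from the lecture).
ENDPOINT = "https://dbpedia.org/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o . } LIMIT 5"


def build_request(endpoint, query):
    """Build an HTTP GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the server for the SPARQL XML result format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


request = build_request(ENDPOINT, QUERY)

# Decoding the URL again shows that the query text survives the URL encoding.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(request.get_method(), request.full_url)
```

To actually contact the server, the request object could be passed to `urllib.request.urlopen`; that step is left out here so the sketch runs without network access.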
01:19SPARQL in general is much more than a query language. It
01:23consists first, of course, of the query language for
01:27graph traversal within RDF, so it is based on RDF graph traversal. This is
01:32the basic principle of how it works, and we will explain this later in detail.
01:37Then SPARQL also defines the protocol layer I was already
01:41referring to. So this is SPARQL via HTTP.
01:47And then there is of course a specific SPARQL XML output format
01:52specification. This is exactly the form of the output that is delivered
01:56back from the SPARQL endpoint, from the server, to the
02:00client, and the client of course has to parse this XML output
02:05and do something with it.
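What "parsing this XML output" means for a client can be sketched as follows. The sample document is hand-written for illustration (the `example.org` URI and the single result row are assumptions), and `parse_results` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL XML result format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample response; a real endpoint would return many more rows.
SAMPLE_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="title"/>
    <variable name="author"/>
  </head>
  <results>
    <result>
      <binding name="title"><literal xml:lang="en">Nineteen Eighty-Four</literal></binding>
      <binding name="author"><uri>http://example.org/George_Orwell</uri></binding>
    </result>
  </results>
</sparql>"""


def parse_results(xml_text):
    """Return the variable names and one dict per result row."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child element
            row[binding.get("name")] = value.text
        rows.append(row)
    return variables, rows


variables, rows = parse_results(SAMPLE_XML)
print(variables, rows)
```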
02:08The standard for SPARQL is from two thousand and
02:12thirteen; this is now SPARQL 1.1.
02:15And like many other query languages for data, SPARQL is
02:20inspired by the SQL query language for relational databases.
02:25Some of you might already know SQL for relational
02:30databases, which is a rather simple, easy-to-learn query language.
02:34And this is the same with SPARQL. So SPARQL and SQL are similar
02:40but not identical. There are significant differences, and the most
02:44important difference is that SPARQL is really a query language
02:48for graph traversal, and how this works you will see
02:52later on. This is, for example, a SPARQL endpoint, a rather
02:57prominent endpoint that you might see more often
03:00in the course of the lecture. This is the endpoint of the DBpedia
03:05database or DBpedia knowledge base, which has been derived from Wikipedia.
03:11In the first and the second excursion of this lecture you
03:16will learn about DBpedia and also about Wikidata, the two
03:21knowledge bases that we are going to query with SPARQL.
03:26So I told you that SPARQL is a query language for RDF
03:31graph traversal, so what we need now is an RDF graph.
03:35This is an example of an RDF graph which comprises three different
03:39books. At first glance it might look rather complicated,
03:42but I will walk you through it. So there is one book, Nineteen
03:45Eighty-Four, that was written by George Orwell in nineteen
03:48forty-eight; this is a dystopian novel, and of course it is a book.
03:54And then we have An Inconvenient Truth here, written by Al Gore.
03:58This is a book about global warming, which of course is
04:01related to climate change, and it was released in two thousand six.
04:05And of course, as we all know, Al Gore is a person.
04:09And then we have a third book, also an interesting novel, called
04:14Make Room! Make Room! by Harry Harrison. It is dystopian, and
04:20there is a film based on exactly that
04:23novel which is quite interesting; you should see it. It is called Soylent Green.
04:26This is a book of environmental fiction, and
04:31the author Harry Harrison is also a person. So we will use this
04:36graph later for explaining how SPARQL works.
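The graph from the slide can be written down as a plain list of (subject, property, object) triples. The following is a rough reconstruction from the spoken description only; every `ex:` name is made up for illustration and the real slide may use different identifiers and additional edges.

```python
# Each triple is (subject, property, object). All ex: names are hypothetical;
# the facts follow the spoken walk-through of the slide.
GRAPH = [
    ("ex:Nineteen_Eighty-Four", "rdf:type", "ex:Book"),
    ("ex:Nineteen_Eighty-Four", "ex:author", "ex:George_Orwell"),
    ("ex:Nineteen_Eighty-Four", "ex:genre", "ex:Dystopian_novel"),
    ("ex:Nineteen_Eighty-Four", "ex:date", "1948"),
    ("ex:An_Inconvenient_Truth", "rdf:type", "ex:Book"),
    ("ex:An_Inconvenient_Truth", "ex:author", "ex:Al_Gore"),
    ("ex:An_Inconvenient_Truth", "ex:subject", "ex:Global_warming"),
    ("ex:An_Inconvenient_Truth", "ex:date", "2006"),
    ("ex:Global_warming", "ex:relatedTo", "ex:Climate_change"),
    ("ex:Al_Gore", "rdf:type", "ex:Person"),
    ("ex:Make_Room_Make_Room", "rdf:type", "ex:Book"),
    ("ex:Make_Room_Make_Room", "ex:author", "ex:Harry_Harrison"),
    ("ex:Make_Room_Make_Room", "ex:genre", "ex:Dystopian_novel"),
    ("ex:Make_Room_Make_Room", "ex:subject", "ex:Environmental_fiction"),
    ("ex:Make_Room_Make_Room", "ex:date", "1966"),
    ("ex:Soylent_Green", "ex:basedOn", "ex:Make_Room_Make_Room"),
    ("ex:Harry_Harrison", "rdf:type", "ex:Person"),
]


def describe(subject, graph):
    """Return all (property, object) pairs stored for one subject."""
    return [(p, o) for s, p, o in graph if s == subject]


# All subjects that are typed as books.
books = sorted({s for s, p, o in GRAPH if p == "rdf:type" and o == "ex:Book"})
print(books)
print(describe("ex:Al_Gore", GRAPH))
```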
04:45So let's see how SPARQL works. First, if we want to query RDF,
04:50we need variables that are filled with the
04:55results that we access in the data. So we have to define so-called SPARQL variables,
05:00which are later bound to RDF terms during query processing.
05:06To distinguish the usual vocabulary that we use in
05:10SPARQL from the variables, all of the variables that we define
05:15have a question mark at the beginning. So for example
05:19?title, ?author and ?date
05:24are variables that can be used and then later on be bound, which means filled
05:28with RDF data.
05:31The query itself is formulated in exactly the same way as in SQL,
05:36so you query for variables, and this is done
05:41in a so-called SELECT statement. So this is exactly like SQL:
05:45you have the keyword SELECT, and usually after SELECT you
05:51simply list all of the variables that you want to have filled.
05:55For example, we can say SELECT ?title ?author ?date
05:59and so on. At this point we do not yet say how exactly
06:02they should be selected, and
06:05the result of that query will be something like a table.
06:10It is a table because the variables form the ordered columns
06:15and each single result that is returned forms a row.
06:19In the table heading you again see title, author and date, and
06:24for our graph, if we formulate the rest of the query
06:29accordingly, the result could be the titles Nineteen Eighty-Four, An Inconvenient
06:33Truth and Make Room! Make Room!,
06:36the authors George Orwell, Al Gore and Harry Harrison,
06:39and the publication dates nineteen forty-eight, two thousand six and nineteen
06:44sixty-six. This would be the SPARQL result.
06:49So far it seems rather easy, doesn't it?
06:54What does it mean?
06:57To query an RDF graph, what SPARQL does is so-called graph pattern matching.
07:05First of all, what you formulate in SPARQL, including the patterns
07:09that you formulate, closely follows the idea of the Turtle serialization.
07:14So SPARQL is based first on the RDF Turtle serialization and secondly
07:19on a process which is called basic graph pattern matching.
07:24What is a graph pattern? A graph pattern is simply an RDF triple
07:28where one or more of the triple constituents
07:33is a variable, so either subject, property or object, or several
07:38of them. So a graph pattern or triple pattern is nothing else but
07:41Turtle plus variables. A simple example: we want to look for authors
07:47and their books or, the other way around,
07:51books and their corresponding authors, and for that we want
07:54to use a specific property. In the DBpedia
07:59ontology you will find dbo:author, which is associated with a book,
08:04and then you can, for example, look for the following triple pattern:
08:09first ?book as a variable with a leading question mark,
08:13then you simply use the URI of the property that should
08:16be there, which is dbo:author, and then comes the next variable,
08:21which is ?author.
08:24Its first character is again
08:27a question mark. And since it follows Turtle, each of these graph patterns usually
08:33is closed with a period, as you see here.
08:37So this is rather easy;
08:39these are the variables as we already know them.
08:41And what does it do? The graph is filtered with exactly
08:46that kind of pattern. The pattern goes through the entire
08:50graph, and you see here where it matches, so where you find exactly
08:54dbo:author with a head and a tail
08:58associated with that property. This is then returned
09:02in the resulting table, so for example Nineteen Eighty-Four
09:05has the author George Orwell, An Inconvenient Truth has the author Al Gore
09:10and Make Room! Make Room! has the author Harry Harrison.
09:15It's as easy as that. So this is a simple graph pattern matching.
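How a single triple pattern is matched against a graph can be sketched in plain Python. This is a toy matcher under assumptions, not how a real SPARQL engine is implemented: the `ex:` identifiers and the helper `match_pattern` are hypothetical, and terms are simple strings where a leading question mark marks a variable.

```python
# Toy graph with hypothetical ex: identifiers; dbo:author is the property
# named in the lecture.
GRAPH = [
    ("ex:Nineteen_Eighty-Four", "dbo:author", "ex:George_Orwell"),
    ("ex:An_Inconvenient_Truth", "dbo:author", "ex:Al_Gore"),
    ("ex:Make_Room_Make_Room", "dbo:author", "ex:Harry_Harrison"),
    ("ex:Al_Gore", "rdf:type", "ex:Person"),
]


def match_pattern(pattern, graph):
    """Yield one binding dict for every triple that matches the pattern."""
    for triple in graph:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # A variable matches anything, but must stay consistent
                # if it occurs more than once in the same pattern.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break  # a fixed term must match exactly
        else:
            yield binding


# The pattern from the lecture: ?book dbo:author ?author .
results = list(match_pattern(("?book", "dbo:author", "?author"), GRAPH))
for row in results:
    print(row["?book"], row["?author"])
```

The type triple about Al Gore is skipped because its property is not dbo:author, so only the three book and author pairs remain.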
09:20Of course you can then do complex pattern matching: usually
09:24SPARQL graph patterns can be combined to form complex, in
09:29this case conjunctive, queries for RDF traversal. This is
09:32the easiest case: if you simply combine two of these graph patterns,
09:36they are always combined conjunctively, so with a logical AND.
09:41Later on you will learn that there is also a logical OR, but for now
09:45we focus on the conjunctive connection of these two patterns.
09:51For example, find the books, their authors and their corresponding literary genres.
09:57So what I do here: the first pattern is ?book dbo:author and then ?author as a variable,
10:03like before. However, we want to connect exactly the same book
10:08with the property literary genre
10:12and fill the ?genre variable.
10:15So you see here that in that graph pattern we are referring to
10:19the very same book. The same book should be associated
10:23with the second pattern. This is a special way of connecting the patterns
10:28conjunctively: we are referring to the same subject here in the query.
10:33So this is already quite interesting.
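The conjunctive combination can be sketched by letting the second pattern extend the bindings produced by the first one. Again this is a toy illustration: the `ex:` identifiers, the property name `dbo:literaryGenre` and the helpers `match_pattern` and `match_all` are assumptions, and the sample data deliberately gives one book no genre.

```python
# Toy data with hypothetical identifiers; one book has no genre triple.
GRAPH = [
    ("ex:Nineteen_Eighty-Four", "dbo:author", "ex:George_Orwell"),
    ("ex:Nineteen_Eighty-Four", "dbo:literaryGenre", "ex:Dystopian_novel"),
    ("ex:An_Inconvenient_Truth", "dbo:author", "ex:Al_Gore"),
    ("ex:Make_Room_Make_Room", "dbo:author", "ex:Harry_Harrison"),
    ("ex:Make_Room_Make_Room", "dbo:literaryGenre", "ex:Dystopian_novel"),
]


def match_pattern(pattern, graph, binding):
    """Yield extensions of an existing binding for every matching triple."""
    for triple in graph:
        extended = dict(binding)
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # An already bound variable must keep its value.
                if extended.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            yield extended


def match_all(patterns, graph):
    """Combine patterns conjunctively (logical AND) via shared variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for extended in match_pattern(pattern, graph, binding)
        ]
    return solutions


solutions = match_all(
    [("?book", "dbo:author", "?author"), ("?book", "dbo:literaryGenre", "?genre")],
    GRAPH,
)
for row in solutions:
    print(row["?book"], row["?author"], row["?genre"])
```

Because both patterns share ?book, a book only appears in the result if it has an author and a genre; the book without a genre triple drops out.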
10:36However, you can go way more complex, so we now try something more difficult. Given
10:42a specific book URI, here we are talking about the
10:45book Brave New World, which is not in our graph, but of course
10:48we can imagine that, find its author or authors
10:52and then find the birthplaces of its authors, including the
10:55population of each birthplace. So this is already a complex query.
11:00Let's have a look; it doesn't look complicated at all.
11:03First we have a URI. This is the URI
11:08of Brave New World, the book we are looking for. We want
11:11to have the authors of that, which means we connect this in
11:14second place with the property dbo:author, and what we want
11:18to have filled of course is the object, and there should be the author.
11:21We put a period to close the first graph pattern.
11:25Secondly, we want to know the birthplace of this author. So
11:30the next graph pattern starts with ?author, and we say, okay, ?author should be connected
11:34via the birthplace property to the variable ?birthplace, which should
11:38be filled, and after that comes a period to close the second graph pattern.
11:43And then comes the last one, the birthplace. Of course we want
11:46to know its population. So we have ?birthplace here,
11:50we ask for the property population total and then fill
11:55the variable ?population.
11:58You see, ?author refers to the same author
12:03in two places. This means the author as object in the first line should be the same as
12:07the author as subject in the second line.
12:11And likewise the birthplace that we have as the object of
12:14the second line should be the same as the subject in the third line.
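Which variables tie the three lines together can be made explicit with a small helper. The triple patterns below are an assumed rendering of the spoken query (the names `dbr:Brave_New_World`, `dbo:birthPlace` and `dbo:populationTotal` are guesses at what the slide shows), and `shared_variables` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed rendering of the query from the slide as three triple patterns.
PATTERNS = [
    ("dbr:Brave_New_World", "dbo:author", "?author"),
    ("?author", "dbo:birthPlace", "?birthplace"),
    ("?birthplace", "dbo:populationTotal", "?population"),
]


def shared_variables(patterns):
    """Map each variable used more than once to its (line, role) positions."""
    roles = ("subject", "property", "object")
    positions = defaultdict(list)
    for line_no, pattern in enumerate(patterns, start=1):
        for role, term in zip(roles, pattern):
            if term.startswith("?"):
                positions[term].append((line_no, role))
    return {var: pos for var, pos in positions.items() if len(pos) > 1}


joins = shared_variables(PATTERNS)
for variable, places in joins.items():
    print(variable, places)
```

The output lists ?author as object of line 1 and subject of line 2, and ?birthplace as object of line 2 and subject of line 3, which is exactly the chain described above.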
12:19So this is already a more complex graph pattern. You will easily get
12:23used to that over time; this is really a nice and easy way to
12:28query RDF with SPARQL and to do really sophisticated queries with it.
12:32And we will see this in lots of examples. So don't worry, you will learn
12:36how to use that. Okay, so let's have a first look at a full-fledged SPARQL query.
12:46We already said that it is based on Turtle, which means we need,
12:48of course, prefixes to make this a bit more readable.
12:53The example we are following here is quite easy. We want
12:56to look for authors and the titles of their notable works,
13:01and we want to look in DBpedia for exactly that information.
13:05So first we have to define the
13:09namespaces that we need.
13:11We always need the namespaces for RDF and RDFS, since we will
13:15often use RDF and RDFS properties. Secondly, if we are talking
13:20about DBpedia, we might also use the DBpedia ontology namespace, which
13:25gives us the names of lots of the properties that are used there. So now
13:32we start with a SELECT statement, which specifies exactly what
13:35I want to have in my output.
13:38In my output I want to have the name of the author, I call
13:41this author name, and I want to have the title of the books,
13:45so this is then the title.
13:48Next, I usually have to specify the graph to be queried, especially
13:53if I am at a SPARQL endpoint where many different graphs might be available.
13:58In case there is only one graph, this is the so-called
14:02default graph, and I can leave out that line; I don't
14:08need to specify the FROM clause. For the sake of completeness
14:13I show the FROM here, and you will see that we need it
14:17when we query several graphs and combine their results.
14:24And then comes the so-called WHERE clause.
14:27With the keyword WHERE I specify all the constraints that
14:31are put into the graph patterns that have to be matched. So
14:35you see here I am looking for some variable I refer
14:39to as ?author, and ?author should be of the type dbo:Writer. So this
14:44means this is some author.
14:46Next I say, okay, I do not want to have the entity, I want to have
14:50the name of the entity, which means the label.
14:52So I connect the ?author variable here again with rdfs:label, which gives me
14:58the name of the author in human-readable form.
15:02Interestingly, dbo, which means DBpedia ontology, also gives me
15:06a property that states what the notable work of
15:09somebody is. So I connect ?author with notable work and
15:13look for the works in a variable called ?work. And then I say, okay,
15:19I also don't want to have exactly that entity, I want to have the title
15:24of that entity, so I say ?work should be connected via rdfs:label
15:30to its title, which has to be filled from the graph.
15:33In the end what I have is the name of the author and the title
15:37of his or her notable work.
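The parts just described (prefixes, SELECT, FROM, WHERE) can be assembled into one query text. The sketch below only builds and prints the string; the variable names, the graph URI `http://dbpedia.org` and the property name `dbo:notableWork` are assumptions about what the slide shows, and `build_query` is a hypothetical helper.

```python
# Namespace prefixes; the dbo: namespace is the DBpedia ontology.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbo": "http://dbpedia.org/ontology/",
}

# The four graph patterns of the WHERE clause (assumed spelling).
WHERE_PATTERNS = [
    "?author rdf:type dbo:Writer .",
    "?author rdfs:label ?author_name .",
    "?author dbo:notableWork ?work .",
    "?work rdfs:label ?title .",
]


def build_query(prefixes, variables, graph_uri, patterns):
    """Assemble PREFIX lines, SELECT, FROM and WHERE into one query string."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    lines.append("SELECT " + " ".join(variables))
    lines.append(f"FROM <{graph_uri}>")
    lines.append("WHERE {")
    lines.extend("  " + pattern for pattern in patterns)
    lines.append("}")
    return "\n".join(lines)


QUERY = build_query(
    PREFIXES, ["?author_name", "?title"], "http://dbpedia.org", WHERE_PATTERNS
)
print(QUERY)
```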
15:42For all SPARQL queries that we present here
15:46in this lecture and in future lectures,
15:50no matter whether we are querying DBpedia
15:54as a knowledge base or whether we are querying Wikidata as
15:57a knowledge base, you will always see in the lower right
16:02part of the slide a link which says query
16:08SPARQL endpoint. Let me show you how this works when you click on that link.
16:12So I click here on that link.
16:14You are taken directly in your browser to the SPARQL endpoint,
16:20and the query shows up immediately here in the query window.
16:24You see here exactly the same query: we have the prefixes,
16:28the SELECT statement, the FROM, the WHERE clause and so on.
16:32What I do then is simply go here and say run query,
16:39and you see the result. We have all the names in the
16:42first column and the titles in the second column.
16:47And as you see, it is interesting that in that knowledge base
16:50there are labels in many different languages. So you
16:53have here the name of the author Abbie Hoffman written in English,
16:56in German, in Spanish, in French, in Italian, in Japanese, in
17:00Dutch and so on. This holds for the author name as well
17:05as for the title. So if you want to have, let's say,
17:09usable results, you should restrict them to one language.
17:13We will see later on how exactly this works. So
17:18let's continue with the lecture and with learning SPARQL.
17:23The next thing I am going to teach you is how to transform
17:28this output with so-called solution sequence modifiers, which
17:33means I can, for example, order the output.
17:36I can limit the output so that only a specific number of items
17:40is displayed, and I can also say, yeah,
17:42leave out the first n rows of the result; I only want to see
17:47this starting, let's say, from row ten or from row number
17:51twenty and so on. These are the so-called solution sequence modifiers.
17:58Our example here is again quite similar: search for all authors
18:01and the titles of their notable works, but now
18:04ordered by author in ascending order, and limit the results
18:08to the first one hundred results, starting the list at offset ten.
18:14So I use exactly the same query that we had before. And now
18:18what I am going to do is the following:
18:23I add some solution sequence modifiers. First I say ORDER BY,
18:29which means the result will be ordered. Then I have the
18:32choice between ascending and descending, and here it is ascending.
18:36Then I have to specify,
18:39in parentheses, the variable according to which the output should
18:43be ordered, and this is here the author name.
18:47The next keyword that you learn here is LIMIT. LIMIT says, okay, only display
18:52the number of results that I state after LIMIT, and this is
18:56one hundred here. And then I can also give an OFFSET, and here it is
19:00OFFSET ten, which means the first ten rows are skipped and rows eleven to one hundred and ten are
19:05displayed. We can try this out again; you see I have
19:08a small link given in the slide. So simply download the
19:12slides and then click on the link. Sorry,
19:16now I was too fast. I really have to click on the link here,
19:19and you see here is the query with LIMIT and OFFSET, and I run the query.
19:25And what you see now is that
19:27the table stops here, most likely after one hundred entries, and
19:32it starts after the first ten entries, and this here should be
19:36Aaron Covington. You see the author names are ordered
19:39alphabetically, so Aaron McGruder with a double a at the beginning
19:43comes before Abbie Hoffman with a b at the beginning.
19:47So you see this most likely is working.
19:52Okay, so these were solution sequence modifiers.
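What ORDER BY, OFFSET and LIMIT do to a result table can be imitated with sorting and slicing. The rows below are generated placeholders, not DBpedia data, and `apply_modifiers` is a hypothetical helper.

```python
def apply_modifiers(rows, order_key, limit, offset, descending=False):
    """Order the rows, skip the first `offset` rows, keep at most `limit` rows."""
    ordered = sorted(rows, key=lambda row: row[order_key], reverse=descending)
    return ordered[offset:offset + limit]


# 200 placeholder rows, deliberately created in reverse order.
rows = [{"author_name": f"Author {i:03d}"} for i in range(200, 0, -1)]

# Corresponds to: ORDER BY ASC(?author_name) LIMIT 100 OFFSET 10
page = apply_modifiers(rows, "author_name", limit=100, offset=10)
print(len(page), page[0]["author_name"], page[-1]["author_name"])
```

With an offset of ten the first row shown is the eleventh row of the ordered result, and with a limit of one hundred the last row shown is row one hundred and ten.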
20:00The next thing we want to do is further, let's say, filter
20:05the results. Sometimes we are only interested in the part
20:10of the output that fulfills specific requirements.
20:15These kinds of constraints, these filter constraints,
20:20can be expressed within the graph patterns by adding to a graph pattern
20:25a filter condition, of course with the keyword FILTER.
20:29What we want to do in the example is search for all authors
20:33and the titles of their notable works that have more than five hundred pages,
20:37and again limit the result to the first one hundred, so the
20:41table will not be too long. So of course
20:45I want to have the number of pages here
20:49of the notable work that I have selected. What I do
20:53is select another property, dbo:numberOfPages, and
20:58fill a variable called ?pages. Of course I also bring the pages
21:02into the output within the SELECT statement. So I select here
21:07name, title and pages, and of course I have to
21:10specify a graph pattern for these pages.
21:14And what I do now is add a filter condition,
21:19and in the filter condition I simply say
21:23the pages, so the variable I have selected before, should be
21:28greater than five hundred, and I close the parenthesis, and that's it.
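The effect of such a filter condition can be imitated on a list of result rows. The rows are placeholders with invented titles, and `filter_rows` is a hypothetical helper; the comment shows how the condition might be written in the query.

```python
# In the query this would be a pattern plus a condition, roughly:
#   ?work dbo:numberOfPages ?pages .
#   FILTER (?pages > 500)


def filter_rows(rows, condition):
    """Keep only the rows for which the filter condition is true."""
    return [row for row in rows if condition(row)]


# Placeholder rows; the titles and page counts are invented for illustration.
rows = [
    {"title": "Book A", "pages": 940},
    {"title": "Book B", "pages": 320},
    {"title": "Book C", "pages": 501},
    {"title": "Book D", "pages": 500},
]

long_books = filter_rows(rows, lambda row: row["pages"] > 500)
print([row["title"] for row in long_books])
```

A book with exactly five hundred pages is not kept, because the condition asks for strictly more than five hundred.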
21:34So let's see what exactly happens if I query that on the SPARQL
21:37endpoint. You see it here,
21:39and what we get
21:42is the following: a table with author name, title and pages.
21:48It is a bit long, simply because pages is given here with
21:51its data type, a positive integer. It starts here with Victor Hugo,
21:56the book is The Hunchback of Notre-Dame, and the
21:59number of pages is nine hundred forty. Since we haven't restricted
22:03the language so far, you see the author name in all languages,
22:06including Arabic, and then the same book all over again
22:10in different languages. So you have here Nuestra Señora de París,
22:14which is in Spanish, and so on. So it is interesting
22:19to see that this is exactly the result we were looking for.
22:25Okay, two last things I want to give you on your way in this part of the lecture.
22:31What you can do
22:33when you are using these filter conditions is combine several conditions
22:40with further restrictions, and there are so-called unary operators
22:44that you can use. For example, if your condition results in a boolean value,
22:49you can negate it; this is the negation operator.
22:53If it is a numeric value, then
22:57of course you can turn it positive or negative with
23:01a plus or a minus sign.
23:03You can ask whether a variable really is bound, which means does
23:06it have a result or not, with
23:09the operator bound, and then in parentheses comes the variable name
23:12that you are asking for. You can ask if the value of the variable
23:17is a URI, you can ask whether it is a blank node, and you can
23:20ask whether it is a literal.
23:23What you can further do is transform, for example, an
23:27output into a string. So you can transform any kind of literal
23:31or URI into a string. And you can ask for the language,
23:36so you could for example find out what the language of a
23:39specific string is: is it English, is it French, is it Italian? And you
23:42can ask for the data type, and you will get back the
23:46URI of the data type. These are things you can combine within
23:50the filter constraints, and then you can do really interesting and sophisticated
23:55queries with them.
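The tests and accessors just listed can be mimicked on a small term model. This is a simplified sketch under assumptions: the `Term` class and the function names are made up, and details of the real SPARQL functions (for example their error behaviour) are left out.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Term:
    """Simplified RDF term: kind is 'uri', 'bnode' or 'literal'."""
    kind: str
    value: str
    lang: str = ""      # language tag, only meaningful for literals
    datatype: str = ""  # data type URI, only meaningful for literals


def bound(binding, variable):
    """True if the variable received a value in this result row."""
    return variable in binding


def is_uri(term):
    return term.kind == "uri"


def is_blank(term):
    return term.kind == "bnode"


def is_literal(term):
    return term.kind == "literal"


def as_string(term):
    """Plain string form of a URI or literal."""
    return term.value


def lang(term):
    """Language tag of a literal, or the empty string if there is none."""
    return term.lang


def datatype(term):
    """Data type URI stored with a literal."""
    return term.datatype


# One illustrative result row.
row = {
    "?author": Term("uri", "http://dbpedia.org/resource/Abbie_Hoffman"),
    "?author_name": Term("literal", "Abbie Hoffman", lang="en"),
    "?pages": Term("literal", "940", datatype=XSD + "positiveInteger"),
}

print(bound(row, "?title"), is_uri(row["?author"]), lang(row["?author_name"]))
print(datatype(row["?pages"]), not is_literal(row["?author"]))
```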
23:58The very last query I am going to show you in this part of the lecture
24:03makes use of this language operator that we
24:07have just talked about. We want to have all the authors and
24:10the titles of their books
24:12in English. And the other thing
24:17I want to do here is add, let's say, a further
24:20restriction. I search for authors and their books, filter
24:24the results for English labels and environmental fiction books,
24:28and in the end I again limit to one hundred result rows.
24:33Let's have a look at how to do that. We include two filter conditions.
24:36First you see here the first filter condition:
24:39we want the author name
24:42to be English, en, and of course we also want the title
24:48to be English. And the last thing is that the work should
24:52have, as its dct:subject, the category
24:55environmental fiction books.
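The two language conditions can be imitated on rows whose labels carry a language tag. The names and titles below are placeholders, labels are modelled as (text, language) pairs, and `english_only` is a hypothetical helper.

```python
# In the query the two conditions would look roughly like:
#   FILTER (lang(?author_name) = "en")
#   FILTER (lang(?title) = "en")

# Placeholder rows; every label is a (text, language tag) pair.
rows = [
    {"author_name": ("Author X", "en"), "title": ("Title X", "en")},
    {"author_name": ("Author X", "en"), "title": ("Titel X", "de")},
    {"author_name": ("Autor X", "es"), "title": ("Title X", "en")},
    {"author_name": ("Author Y", "en"), "title": ("Title Y", "en")},
]


def english_only(rows):
    """Keep rows where both the author name and the title are tagged 'en'."""
    return [
        row for row in rows
        if row["author_name"][1] == "en" and row["title"][1] == "en"
    ]


english_rows = english_only(rows)
print([(row["author_name"][0], row["title"][0]) for row in english_rows])
```

Replacing "en" by "fr" or "it" would give the French or Italian labels in the same way.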
24:58So if we do that and query the SPARQL endpoint, let's look,
25:07the result is much shorter, as you see here. There are not
25:10so many environmental fiction books within DBpedia. However, these
25:16are the English author names and the English titles, and there
25:19you have Paul Hawken or people like Bjorn Lomborg,
25:23Nicky Hager and so on, and you also see the titles
25:28in English. You could now do exactly the same
25:31for French, for Italian, for any other language. Simply try it out:
25:36use the link we have provided, take the query,
25:40modify it accordingly and play around with it. This is
25:44the purpose of what we did so far.
25:48Okay, one of the, let's say, basic questions you might have right
25:53now is: yeah, this is all nice what you do with DBpedia,
25:57but how in the world do you come up with the names
26:01of the properties, with the names of the classes, with the names
26:04of the stuff you are using?
26:06For that, of course, you have to get some knowledge about the
26:09databases or knowledge bases we are talking about. Therefore
26:14this lecture will now continue with two excursions, the first one on
26:17the DBpedia knowledge base and the second one on the Wikidata knowledge base, so that you
26:22get a bit better acquainted with exactly these resources that
26:26we will use heavily in the course of this lecture.