About: https://www.dajobe.org/blog/2010/12/05/rasqal-0921-and-sparql-11-query-aggregation/


An Entity of Type : rss:item, within Data Space : demo.openlinksw.com associated with source document(s)

Attributes	Values
type	rss:item
Creator	dajobe
described by	proxy:entity/http/journal.dajobe.org/journal/comments.rdf
Date	2010-12-06T01:24:00Z
rss:title	Rasqal 0.9.21 and SPARQL 1.1 Query aggregation
rss:link	https://www.dajobe.org/blog/2010/12/05/rasqal-0921-and-sparql-11-query-aggregation/
rss:description	Rasqal 0.9.21 was just released on Saturday 2010-12-04 (announcement), containing the following new features:

- Updated to handle aggregate expression execution as defined by the SPARQL 1.1 Query W3C Working Draft of 14 October 2010.
- Executes grouping of results: `GROUP BY`.
- Executes aggregate expressions: `AVG`, `COUNT`, `GROUP_CONCAT`, `MAX`, `MIN`, `SAMPLE`, `SUM`.
- Executes filtering of aggregate expressions: `HAVING`.
- Parses new syntax: `BINDINGS`, `isNUMERIC()`, `MINUS`, sub-`SELECT` and `SERVICE`.
- The syntax format for parsing data graphs at URIs can be explicitly declared.
- The `roqet` utility can execute queries over the SPARQL HTTP Protocol and operate over data from stdin.
- Added several new APIs.
- Fixed Issue: #0000388

See the Rasqal 0.9.21 Release Notes for the full details of the changes.

I'd like to emphasize a couple of the changes to the `roqet(1)` utility program. You can now use it to query over data from standard input, i.e. use it as a filter, but only if you are querying over one graph. You can also specify the format of the data graphs, either on standard input or from URIs, if the format can't be determined or guessed from the MIME type, name or bytes. Finally, `roqet(1)` can execute remote queries at a SPARQL Protocol HTTP service, sometimes called a "SPARQL endpoint".

The new support for SPARQL 1.1 Query aggregate queries (and other features) led me to send comments to the SPARQL working group about the latest SPARQL 1.1 Query working draft, based on the implementation experience. The comments (below) were based on the implementation I previously outlined at the end of October 2010 in "Writing an RDF query engine. Twice".

The implementation work to create the new features was substantial but relatively simple to describe: new rowsources were added for each of the aggregation operations, and each was included in the execution plan when the parsed query structure indicated its presence.
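One way to picture the rowsource design is as a chain of pull-based iterators. Each stage wraps the one below it and is added to the plan only when the parsed query contains the matching clause. The Python sketch below is an illustration under stated assumptions, not Rasqal's C API. The function names, the `ParsedQuery` fields and the row representation (a dict from variable name to value) are all hypothetical. It uses simple non-aggregate stages (filter, projection, limit); aggregation rowsources would slot into `build_plan` the same way.

```python
# Hypothetical sketch, not Rasqal's API: rowsources as chained iterators,
# with each optional stage included only if the parsed query asks for it.
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]  # one solution: variable name -> value


def table_rowsource(rows: List[Row]) -> Iterator[Row]:
    """Leaf rowsource: yields pre-computed rows (e.g. from graph matching)."""
    yield from rows


def filter_rowsource(inner: Iterator[Row],
                     predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Passes on only the rows for which the predicate is true."""
    for row in inner:
        if predicate(row):
            yield row


def project_rowsource(inner: Iterator[Row], variables: List[str]) -> Iterator[Row]:
    """Keeps only the selected variables; unbound ones become None."""
    for row in inner:
        yield {v: row.get(v) for v in variables}


def limit_rowsource(inner: Iterator[Row], limit: int) -> Iterator[Row]:
    """Stops pulling from the inner rowsource after `limit` rows."""
    yield from islice(inner, limit)


@dataclass
class ParsedQuery:
    """Hypothetical parse result; None means the clause was absent."""
    filter: Optional[Callable[[Row], bool]] = None
    projection: Optional[List[str]] = None
    limit: Optional[int] = None


def build_plan(query: ParsedQuery, rows: List[Row]) -> Iterator[Row]:
    """Wrap the leaf in one rowsource per clause present in the query."""
    plan: Iterator[Row] = table_rowsource(rows)
    if query.filter is not None:
        plan = filter_rowsource(plan, query.filter)
    if query.projection is not None:
        plan = project_rowsource(plan, query.projection)
    if query.limit is not None:
        plan = limit_rowsource(plan, query.limit)
    return plan


if __name__ == "__main__":
    data = [{"s": "a", "n": 1}, {"s": "b", "n": 5}, {"s": "c", "n": 9}]
    q = ParsedQuery(filter=lambda r: r["n"] > 2, projection=["s"])
    print(list(build_plan(q, data)))  # [{'s': 'b'}, {'s': 'c'}]
```

Because each stage only pulls from its inner iterator, a stage can be unit-tested on its own by feeding it a `table_rowsource`. That mirrors the standalone testing of rowsources mentioned in the post.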
There was some additional glue code that needed to be added to allow rows to indicate their membership of a group. A simple integer group ID was sufficient. The ID value has no semantics: checking for a change of ID is enough to know that a group started or ended.

I also introduced an internal variable to bind the result of each SELECT aggregate expression after grouping. These are named `$$aggID$$`, which is an illegal variable name in SPARQL. I then used these variables to replace the aggregate expressions in the SELECT and HAVING expressions, and used the normal evaluation mechanisms. As I understand it, the SPARQL WG is now considering adding a way to name these explicitly in the GROUP statement. That is a happy coincidence, since I had implemented it without knowing that.

To prepare this I thought about the approach a lot and developed a couple of diagrams for the grouping and aggregation rowsources. They might help to explain how the rowsources work, and how they can be implemented and tested as standalone unit modules, which they were.

(Diagram: Rasqal Group By Rowsource)

As always, the above isn't quite how it is implemented. When `GROUP BY` is missing but an aggregate expression is used, there is an implicit group, and there is no group-by node for it. Instead, when grouping is requested, the Rasqal rowsource class synthesizes one group around the incoming rows.

(Diagram: Rasqal Aggregation Rowsource)

This shows the guts of the aggregate expression evaluation. Internal variables are introduced and substituted into the `SELECT` and `HAVING` expressions, and are then created as the aggregate expressions are executed over the groups.

The rest of this post is my detailed set of comments on this draft of SPARQL 1.1 Query, as posted to the SPARQL WG comments list but turned into HTML markup here.
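The group-ID and internal-variable mechanisms described in the implementation notes above can be illustrated with a small Python sketch. It is not Rasqal's C implementation. The function names, the tuple-based expression encoding and the concrete `$$agg0$$`, `$$agg1$$` names are assumptions; the post only says internal names have the form `$$aggID$$`. Only `COUNT`, `SUM`, `MIN` and `MAX` are covered. Unbound values are skipped, which also gives the `COUNT(*)` versus `COUNT(?x)` difference.

```python
# Hypothetical sketch, not Rasqal's C code: integer group IDs mark group
# boundaries, and aggregate calls are replaced by internal variables
# ("$$agg0$$", "$$agg1$$", ...) so SELECT and HAVING use normal evaluation.
from typing import Dict, Iterator, List, Optional, Tuple

Row = Dict[str, object]
# Expressions: ("var", name) | ("const", value) | ("agg", func, var or None)
#              | (op, lhs, rhs) with op in OPS
Expr = tuple
OPS = {">": lambda a, b: a > b, "=": lambda a, b: a == b, "+": lambda a, b: a + b}


def group_rows(rows: List[Row], keys: Optional[List[str]]) -> Iterator[Tuple[int, Row]]:
    """Tag rows with a group ID, emitting each group's rows together.
    keys=None is the implicit single group used when GROUP BY is absent."""
    if keys is None:
        for row in rows:
            yield 0, row
        return
    ids: Dict[tuple, int] = {}
    buckets: List[List[Row]] = []
    for row in rows:
        key = tuple(row.get(v) for v in keys)
        if key not in ids:
            ids[key] = len(buckets)
            buckets.append([])
        buckets[ids[key]].append(row)
    for gid, bucket in enumerate(buckets):
        for row in bucket:
            yield gid, row


def rewrite(expr: Expr, aggs: List[Expr]) -> Expr:
    """Swap each aggregate call for an internal variable reference; identical
    calls (e.g. COUNT(*) in both SELECT and HAVING) share one variable."""
    if expr[0] == "agg":
        if expr not in aggs:
            aggs.append(expr)
        return ("var", f"$$agg{aggs.index(expr)}$$")
    if expr[0] in ("var", "const"):
        return expr
    return (expr[0],) + tuple(rewrite(e, aggs) for e in expr[1:])


def run_aggregate(func: str, arg: Optional[str], group: List[Row]) -> object:
    if arg is None:  # COUNT(*): every row of the group counts
        return len(group)
    values = [r[arg] for r in group if r.get(arg) is not None]  # skip unbound
    if func == "COUNT":
        return len(values)
    if func == "SUM":
        return sum(values)
    if func in ("MIN", "MAX"):
        return (min if func == "MIN" else max)(values) if values else None
    raise ValueError(f"unsupported aggregate {func}")


def aggregate_groups(tagged: Iterator[Tuple[int, Row]], keys: List[str],
                     aggs: List[Expr], implicit: bool) -> Iterator[Row]:
    """One output row per group; a change of group ID ends the current group."""
    def finish(group: List[Row]) -> Row:
        out: Row = {k: group[0].get(k) for k in keys} if group else {}
        for i, (_, func, arg) in enumerate(aggs):
            out[f"$$agg{i}$$"] = run_aggregate(func, arg, group)
        return out

    current, group = None, []
    for gid, row in tagged:
        if gid != current and group:
            yield finish(group)
            group = []
        current = gid
        group.append(row)
    if group:
        yield finish(group)
    elif implicit:  # no input rows: the implicit group still gives one row
        yield finish([])


def evaluate(expr: Expr, row: Row) -> object:
    if expr[0] == "var":
        return row.get(expr[1])
    if expr[0] == "const":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], row), evaluate(expr[2], row))


def aggregate_query(rows: List[Row], group_keys: Optional[List[str]],
                    select: List[Tuple[str, Expr]],
                    having: Optional[Expr] = None) -> Iterator[Row]:
    aggs: List[Expr] = []
    select = [(name, rewrite(e, aggs)) for name, e in select]
    having = rewrite(having, aggs) if having is not None else None
    tagged = group_rows(rows, group_keys)
    for row in aggregate_groups(tagged, group_keys or [], aggs, group_keys is None):
        if having is None or evaluate(having, row):
            yield {name: evaluate(e, row) for name, e in select}
```

For example, take the rows `{p: a, x: 1}`, `{p: b, x: 2}`, `{p: a, x: 3}` and `{p: a}`, grouped by `p`. With `HAVING (COUNT(*) > 1)`, this yields a single row for `p = a` with `COUNT(*) = 3` and `COUNT(?x) = 2`. The boundary check only compares consecutive IDs, so `group_rows` must emit each group's rows contiguously.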
Dave Beckett comments on SPARQL 1.1 Query Language W3C WD 2010-10-14

These are my personal comments (not speaking for any past or current employer) on: SPARQL 1.1 Query Language W3C Working Draft 14 October 2010

My comments are based on the work I did to add some SPARQL 1.1 query and update support to my Rasqal RDF query library (engine and API) in version 0.9.21, just released on 2010-12-04 as announced. Some background to my work is given in a blog post: "Writing an RDF query engine. Twice".

I. General comments

I felt the specification introduced more optional features bundled together, and it was not entirely clear what the combination of those features would do. For example, the syntax allows a query with no aggregate expression but with a `GROUP BY` and a `HAVING`, and the main document doesn't say whether that is allowed or what it means. I found it hard to assemble all the pieces from the mathematical explanations into something I could code.

The grammar in the spec has several terms that are not described in the query document. After asking, I found that these belong to federated query (`BINDINGS`) or update (LOAD, ...). This is not pointed out or linked clearly, although the documents are mentioned in the status section. Please make this clearer.

I decided to concentrate on the new Aggregates feature, since I had already implemented `SELECT` expressions, and to leave Subqueries and Negation until later.

Property paths should be in the list of new features in the status section at the start of the document.

"SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs" is rather a long title; what do 'Uniform' or 'HTTP' add? SOAP is dead.
I suggest "SPARQL 1.1 RDF Graph Management Protocol", or the same with "RDF Dataset" in place of "RDF Graph".

This draft has many additions: property paths (a new query language), update (a data management language) and federated query (remote query support). As I understand it, ~30 additional keywords for functions and operators are also being added beyond this draft. I see this as a major change to SPARQL 1.0, more of a SPARQL 2. You should consider renaming it.

II. Aggregates

I found the math in the aggregation and grouping sections rather hard to understand, so I also looked at what MySQL and SQLite did. I then drew my own diagram based on the data flow (see SPARQL 1.1 Query Execution Sequence). That made it easier for me to see the individual components or stages, which roughly correspond to SPARQL algebra terms.

I had to write several of my own tests, using my guesses at what the answers should be. Aggregate expressions have many pieces: grouping, aggregate expressions, distinct, having, and counting (`COUNT(*)` vs `COUNT(expr)`). There need to be several tests with good coverage of them.

I felt aggregate functions can be broken down into these parts (following my diagram above):

- selecting of the aggregate function value
- grouping of results - optional; explicit, or implicit when an aggregate function is present
- execution of aggregate functions - optional; with some special cases
- filtering of group results with HAVING - optional

Since they are all optional, it is probably worth explaining what it means when they are absent. An example is GROUP BY + HAVING with no aggregate expression, as mentioned above.

III. Bindings is a new syntax

`BINDINGS` essentially gives a new way to write down a variable bindings result set. The federated query spec discusses using it for `SERVICE`, but neither the grammar nor the specifications restrict it to that. I previously asked about `BINDINGS` in the query grammar (rBindingsClause) on 2010-10-15, and this comment extends that one.
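The observation that `BINDINGS` is just a way of writing down a result set can be made concrete. In the SPARQL algebra, an empty group pattern yields exactly one empty solution. Joining that single solution with an inline table of bindings returns the table unchanged. The minimal Python sketch below assumes a representation in which a BINDINGS block is a list of variable names plus rows of values. Using `None` for an unbound value is an assumption of the sketch, and all function names are hypothetical.

```python
# Hypothetical sketch: a BINDINGS block as an inline solution sequence,
# joined with the solutions of the WHERE pattern as in the SPARQL algebra.
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def bindings_to_rows(variables: Sequence[str],
                     values: Sequence[Sequence[Optional[object]]]) -> List[Row]:
    """Turn BINDINGS ?v1 ?v2 { (a b) ... } into rows; None means unbound."""
    rows = []
    for tup in values:
        if len(tup) != len(variables):
            raise ValueError("each BINDINGS row needs one value per variable")
        rows.append({v: x for v, x in zip(variables, tup) if x is not None})
    return rows


def compatible(a: Row, b: Row) -> bool:
    """Two solutions are compatible if shared variables have equal values."""
    return all(b[k] == v for k, v in a.items() if k in b)


def join(left: List[Row], right: List[Row]) -> List[Row]:
    """Merge every compatible pair of solutions (algebra Join)."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


if __name__ == "__main__":
    empty_pattern = [{}]  # an empty group pattern matches one empty solution
    table = bindings_to_rows(["var1", "var2"],
                             [("var1-value1", "var2-value1"),
                              ("var1-value2", "var2-value2")])
    print(join(empty_pattern, table) == table)  # True
```

Under this reading, nothing is matched against the data at all. The query's answer is simply the BINDINGS table.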
So as I read it, this is a valid 'query' which does no real execution but just returns a set:

    SELECT *
    WHERE
    BINDINGS ?var1 ?var2
    {
      ( "var1-value1" "var2-value1" )
      ( "var1-value2" "var2-value2" )
    }

or, if you really must, you can leave out the `WHERE`:

    SELECT *
    BINDINGS
    {
      ( "var1-value1" "var2-value1" )
      ( "var1-value2" "var2-value2" )
    }

My question is whether this is correct. Please also clarify the intended use in the spec, including whether or not it is intended for use with `SERVICE` only.

IV. Section-by-section comments

Section: Status of this Document

This should mention property paths as new, since that is a major addition after SPARQL 1.0.

Please link to the documents in the status section; they are just text.

Sections 1-8

Skipped; they are the same as SPARQL 1.0, I hope.

9 Property Paths

I am unlikely ever to implement any of this; it's a second query language inside SPARQL. How many systems implemented this before the SPARQL 1.1 work was started?

10 Aggregates

I took all the examples in this section and turned them into test cases where possible.

10.2

The explanation of errors and ListEvalE is rather opaque. It is still not clear to me what is done with errors in GROUP BY, HAVING and arguments to aggregate expressions. Some are skipped; some are ignored and return NULL. Examples and tests will enable checking this, but the spec needs to be clearer.

The definitions of Group and Aggregation were hard for me to understand. The input to Aggregation is called a 'scalar', but it actually means a set of key:value pairs, which was confusing. It is also not clear whether those form a set or an ordered set of parameters. This is only used today for the 'separator' with GROUP_CONCAT.

10.2.1 HAVING

What happens when there is an expression error? What variables and expressions can be used here, and what is their scope?

10.2.2 Set Functions

Another confusing section. I mostly ignored it and did what SQL does. As far as I can tell, none of the functions ever use 'err'.
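The 'scalar' argument of Aggregation discussed under 10.2 can be pictured as a small mapping of named options passed alongside a group's values. Today only `GROUP_CONCAT`'s separator uses it. The Python sketch below is a hypothetical illustration, not the draft's definition. It makes several assumptions:

- The default separator is a single space, as in the final SPARQL 1.1 Recommendation.
- Unknown keys are rejected.
- Unbound values are skipped. The draft leaves error handling unclear, as noted above.
- Values are stringified with `str()`, which glosses over RDF term handling.

```python
# Hypothetical sketch: GROUP_CONCAT over one group's values, with the
# aggregate's 'scalar' arguments passed as a key:value mapping.
from typing import Iterable, Mapping, Optional


def group_concat(values: Iterable[Optional[object]],
                 scalars: Optional[Mapping[str, str]] = None) -> str:
    """Join the bound values with the 'separator' scalar (default: one space)."""
    options = dict(scalars or {})  # copy so the caller's mapping is untouched
    separator = options.pop("separator", " ")
    if options:  # 'separator' is the only scalar argument in use today
        raise ValueError(f"unknown scalar arguments: {sorted(options)}")
    # Assumption of this sketch: unbound (None) values are skipped.
    return separator.join(str(v) for v in values if v is not None)


if __name__ == "__main__":
    print(group_concat(["a", "b", None, "c"], {"separator": ", "}))  # a, b, c
    print(group_concat(["x", "y"]))  # x y
```

Using a mapping makes the key:value pairs unordered. That is one possible answer to the set versus ordered-set question above, but it is a choice of this sketch, not something the draft states.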
10.2.3 Mapping from Abstract Syntax to Algebra

The scalarvals argument is used here; I think this is called 'scalar' earlier.

Un-numbered Section after 10.2.3: Joining Aggregate Values

I never figured out what this was trying to define, but my code executes the example.

11. Subqueries

(Ignored in my current work.)

12 RDF Dataset

(Same as SPARQL 1.0, I assume, so no comments.)

13 Basic Federated Query

Yes, please merge in the text here.

14 Solution Sequences and Modifiers

(Aside: this is one of those SPARQL parts where everything mentioned is optional. Otherwise this section is unchanged from SPARQL 1.0; I mention it only as an indicator of a trend.)

15. Query Forms

No comments.

16. Testing Values

16.3 Operator Mapping

Is it worth noting the new operators in SPARQL 1.1?

Operators: implemented isNUMERIC()

16.4 Operators Definitions

My current state of implementation of the expressions new to SPARQL 1.1:

- 16.4.16 IF - implemented
- 16.4.17 IN - implemented
- 16.4.18 NOT IN - implemented
- 16.4.19 IRI - implemented
- 16.4.20 URI - implemented
- 16.4.21 BNODE - implemented
- 16.4.22 STRDT - implemented
- 16.4.23 STRLANG - implemented

No comments on the above.

16.4.24 NOT EXISTS and EXISTS

I am lumping these together with sub-`SELECT` to implement. My concern here is that the syntax gets super-complex, since all the graph pattern syntax can now appear inside any expression syntax.

There is a filter operator "exists" that ... Does this imply these can only appear in `FILTER` expressions? Please clarify.

17 Definition of SPARQL

I looked at 17.2.3 for aggregate queries, and it was more helpful than the math earlier. The pseudo code in Step 4 is rather unclear. Is that an example implementation or the required one?

17.6 Extending SPARQL Basic Graph Matching

Ignored.

18 SPARQL Grammar

Clearly this is not complete; there are lots of notes to update it.

19 Conformance

If property paths are not removed, please add a conformance level that includes SPARQL 1.1 without property paths.
Does SPARQL 1.1 Query require implementation of the dependent specs, federated query and update? It looks to me as if the protocol may also be a dependency.
is rdf:_5 of	nodeID://b873958
is topic of	http://journal.dajobe.org/journal/comments.rdf
