SPARQL City exists to address the analytic needs produced by modern data sources, types, and storage systems.
We live in a world with more data, in more flavors, stored in nontraditional databases, and a greater need than ever before to make sense of the data. There are great tools targeted to programmers that, given sufficient skill and computer time, can be used to solve virtually any problem.
What was lacking was a simple way for people to query the data, whether to ask traditional business intelligence questions or to use the newer metaphor of graph analysis, which finds relationships across datasets, often across time.
To fill this gap, a group of very talented individuals created a new standard under the W3C, the World Wide Web Consortium. The team drew its members from the largest internet companies, computer companies, and scholars worldwide who have made a career of making data accessible, flexible, and understandable. The work took many years to complete. Creating an improved SQL is no simple task, because it must be robust enough to be the basis for products 40 years from now.
The answer is SPARQL.
SPARQL City’s role has been to take this standard and apply scalable, high-performance capabilities to this language.
SPARQL City also targets the data where it lives and in the shape it has today, rather than requiring customers to migrate immediately to an infrastructure they may not need for years. SPARQL City supports both the current infrastructure and the technologies that may exist in 20+ years.
SPARQL City is the developer and home for the commercial and core open-source product: SPARQLverse.
SPARQLverse is a standards-based analytic engine for performing common and sophisticated business analytics on NoSQL data. It sits on top of your existing NoSQL data systems, such as MongoDB or Cassandra, and allows you to perform analytics, especially graph analytics, using a query language that is much like SQL but designed for these systems.
SPARQLverse is designed to grow with your needs. Getting started is as simple as downloading the open-source Linux executables or the source itself. As your needs grow, you can add a Gold or Platinum subscription to gain access to extended support, training, and private events.
Beyond the single-server version, you can grow your system up to several hundred servers to provide greater performance, concurrency, and data size.
SPARQLverse has been purpose-built for the highest possible performance, so that workloads can run in one tenth the time on one tenth the hardware and power consumption footprint in your datacenter or cloud.
Most important, SPARQLverse is designed for use directly by the business analyst or data scientist, rather than requiring complex programming skills.
The basis for the feature set is the W3C (World Wide Web Consortium) standard known as SPARQL 1.1. At a glance, this means:
- A next generation of SQL that adds graph analytics to the well-understood SQL model.
- Release from being locked to a rigid schema.
- Integration with modern NoSQL databases and data formats.
- Language extensions proven to be needed for modern analytics.
The result is a SQL-like language that uses the expected concepts of SELECTING, JOINING, GROUPING, AGGREGATING and SORTING. It also has PATH primitives specifically for graph analysis.
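To make the idea of a PATH primitive concrete, here is a minimal Python sketch of what a transitive path does. In SPARQL 1.1, a property path such as `friend+` follows a predicate for one or more hops. The sketch assumes the data is a list of (subject, predicate, object) tuples; the node names and the `reachable` helper are hypothetical and are not part of SPARQLverse.

```python
from collections import deque

# Hypothetical sample data: each tuple is (subject, predicate, object).
EDGES = [
    ("a", "friend", "b"),
    ("b", "friend", "c"),
    ("c", "friend", "a"),
    ("d", "friend", "a"),
    ("a", "worksWith", "e"),
]


def reachable(triples, start, predicate):
    """Return every node reachable from `start` by one or more `predicate` hops."""
    # Build an adjacency list that keeps only the edges with the wanted predicate.
    adjacency = {}
    for s, p, o in triples:
        if p == predicate:
            adjacency.setdefault(s, []).append(o)

    seen = set()
    queue = deque(adjacency.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited, so cycles cannot loop forever
        seen.add(node)
        queue.extend(adjacency.get(node, []))
    return seen


friends_of_a = reachable(EDGES, "a", "friend")
friends_of_d = reachable(EDGES, "d", "friend")
print(sorted(friends_of_a))
print(sorted(friends_of_d))
```

A breadth-first search with a visited set is used so that cycles in the graph terminate. A query engine expresses the same traversal declaratively, without the analyst writing the loop.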
The SPARQL concept is accomplished without requiring a rigid schema. Fluid data often means a fluid schema. Often the schema is implied by the nature of the data. In other cases, the schema is actually described as part of the data, allowing datasets from various sources to be directly joined and analyzed without complex ETL or schema re-design.
SPARQL City has taken this a step further by developing a tight, seamless integration with existing data systems, where much of the world’s data already resides. In addition to integrating with existing NoSQL platforms like MongoDB and Cassandra, SPARQL City allows you to join in other data from RDF/SPARQL sources over the internet, or even from JSON or W3C “Turtle” files.
SPARQL City has extended the language with the addition of “Window Functions,” similar to those in the SQL:2003 specification. These allow for a much greater degree of analytic capability, again without programming skills.
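As an illustration of what a window function computes, the following Python sketch produces a running total per seller, which in SQL would be written roughly as `SUM(amount) OVER (PARTITION BY seller ORDER BY day)`. The SPARQLverse syntax for window functions is not shown here, so this only mimics the behavior; the sample rows and the `running_total` helper are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical ticket-sales rows, invented for this illustration.
SALES = [
    {"seller": "s1", "day": 1, "amount": 100},
    {"seller": "s2", "day": 1, "amount": 50},
    {"seller": "s1", "day": 2, "amount": 30},
    {"seller": "s2", "day": 3, "amount": 20},
    {"seller": "s1", "day": 3, "amount": 70},
]


def running_total(rows, partition_key, order_key, value_key):
    """Add a cumulative sum of `value_key` within each partition, in order."""
    # Sort by partition first, then by the ordering column inside each partition.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    result = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in group:
            total += row[value_key]
            # Every input row is kept; the window value is added alongside it.
            result.append({**row, "running_total": total})
    return result


windowed = running_total(SALES, "seller", "day", "amount")
for row in windowed:
    print(row)
```

Unlike a GROUP BY aggregate, which collapses each group to one row, the window version returns one output row per input row.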
SPARQLverse enables full ACID, which allows concurrent users of the system to be working with a logical view of the data as it existed when they started their session. Thus, data that has been added or deleted since the session began does not affect the session’s results. This feature can be extremely useful for consistent analytics that are “noise-free” from the data in flux.
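The session behavior described above can be pictured with a toy Python sketch of snapshot reads. It is not a description of SPARQLverse internals, which are not covered here, and it illustrates only the consistent view rather than full ACID. The `VersionedStore` and `Session` classes are hypothetical.

```python
class VersionedStore:
    """Toy append-only store: every change is logged with a version number."""

    def __init__(self):
        self._version = 0
        self._log = []  # entries are (version, operation, item)

    def add(self, item):
        self._version += 1
        self._log.append((self._version, "add", item))

    def delete(self, item):
        self._version += 1
        self._log.append((self._version, "delete", item))

    def begin_session(self):
        # A session remembers the version that was current when it started.
        return Session(self, self._version)

    def view_at(self, version):
        """Replay the log up to `version` and return the items alive then."""
        live = set()
        for entry_version, operation, item in self._log:
            if entry_version > version:
                break  # later changes are invisible to this view
            if operation == "add":
                live.add(item)
            else:
                live.discard(item)
        return live


class Session:
    def __init__(self, store, version):
        self._store = store
        self._version = version

    def read(self):
        return self._store.view_at(self._version)


store = VersionedStore()
store.add("sale-1")
session = store.begin_session()  # the snapshot is taken here
store.add("sale-2")              # added after the session began
store.delete("sale-1")           # deleted after the session began

print(session.read())                # still sees only sale-1
print(store.begin_session().read())  # a new session sees only sale-2
```

Replaying a log is the simplest way to show the idea; a production engine would use a far more efficient versioning scheme.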
Because SPARQL creates a standard way of querying a broad range of database systems, products are being developed and released to support this single new standard.
The SPARQL standard makes it possible to create applications that previously could only be dreamed of, or that required dedicated programming teams because the data query system was incomplete.
The products range from complete application development environments, to visualization tools, to teaching materials, to newer hardware platforms that aim to build more elements of SPARQL directly into the silicon for even greater performance at lower cost.
SPARQL is eliminating a “Tower of Babel” that first showed itself in the 1970s and again in the 2000s.
In the 1970s each computer manufacturer had its own database system with its own query language. Query languages that could be used on multiple platforms existed but could not reach sustainability for several reasons. The one that did, SQL, redefined how data was to be queried.
In the 2000s there was again a “Tower of Babel,” partly from the commercial-only database products and partly from the open-source products. Again, the ability to transfer an application from one system to another degraded to a point similar to the 1970s. While SQL began as a single approach to the world, its evolution has been harmed by feature bloat, making each vendor’s solution less and less like those of other vendors.
Here is where SPARQL’s genesis and path forward provide a huge benefit over other approaches. SPARQL was created as a world standard and implemented largely in open source.
The very term “NoSQL Database” has undergone an enormous transition over the past 10 years, from meaning a typical simple key-value store to covering the many different flavors of database other than traditional relational. It now includes document stores, columnar hybrids, graph stores, and object stores. At one time “NoSQL” meant SQL-free databases, but this has evolved to “Not Only SQL.” See Wikipedia’s definition of NoSQL.
The strength of SPARQL City is that we provide a single consistent view on the data regardless of how it is stored.
How do we do this?
SPARQL and the associated RDF data model are essentially a “key-value” or “document-key-value” model.
In the RDF/SPARQL world, we use the term “triple.” The entire dataset, no matter how complex, can be well represented by a set of triples. The elements of a triple are a “Subject,” “Predicate,” and “Object.” A unique aspect of the RDF world, probably from its W3C roots, is that a subject and predicate are URIs. URIs look like URLs but do not have to point to a World Wide Web address. Instead, they are simply guaranteed to be unique. SPARQL City also softens this restriction, allowing simpler, non-URI-shaped terms.
What is a Triple:
Any complex document, even if in relational or other NoSQL form, can be represented by and considered a group of these triples, which are linked together to form a graph. Thus, person17 above and person456 are two subject nodes in the graph, linked by friendship, because person456 is the object of person17’s friend attribute.
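The person17 and person456 relationship can be sketched in a few lines of Python, assuming each triple is stored as a plain (subject, predicate, object) tuple. The `name` values and the `match` helper are hypothetical additions for illustration, and the simple non-URI terms follow the relaxed style mentioned above.

```python
# Hypothetical sample data; only person17, person456, and the "friend"
# predicate come from the text. The names are invented.
TRIPLES = [
    ("person17", "name", "Alice"),
    ("person17", "friend", "person456"),
    ("person456", "name", "Bob"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        subject, predicate, obj = triple
        if s is not None and subject != s:
            continue
        if p is not None and predicate != p:
            continue
        if o is not None and obj != o:
            continue
        yield triple


# Who are person17's friends? Fix subject and predicate, leave object open.
friends = [obj for _, _, obj in match(TRIPLES, s="person17", p="friend")]
print(friends)

# Because person456 is also a subject, the graph can be followed one step
# further to read that node's own attributes.
friend_names = [
    name
    for friend in friends
    for _, _, name in match(TRIPLES, s=friend, p="name")
]
print(friend_names)
```

The second lookup works only because the object of one triple is the subject of another, which is exactly what links individual triples into a graph.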
What is a Graph:
The following graph is a linked assembly of triples. In visual representation, the predicates appear as named arrows. For a more complete discussion, the “Comparing SQL and NoSQL” section of the documentation can be very helpful.
This particular graph describes a simple dataset of ticket sales for concerts, plays, and sporting events. It includes the relationships between the event type, the listing of the ticket by a selling agent, the buyer, seller, and venue. This exact dataset is included in the SPARQLverse download, and there are sample queries in the “Playground” menu of the GUI Command Console:
Now that we have a uniform way of describing a complex universe, we can query the universe using SPARQLverse.
Easily, quickly, and in a standard manner.