Semantic Web Architecture 1

iPlant's Semantic Web Services platform uses an innovative, integrated architecture.

At its core, iPlant's Semantic Web Services uses document-centric, RESTful interfaces based on an industry-standard client-server architecture.

Figure: iPlant Semantic Web Services Architecture 1

There are four primary "actors":

  • Providers of data and services, such as web sites;
  • Clients, or consumers of data and services, such as you and me using a web browser (providers may also be clients);
  • Ontologies and ontology servers: ontologies are the public terms used to describe data and services;
  • A Discovery Server: a semantically aware search engine that satisfies search requests based on its knowledge of data and service types and properties.

A walk-through of the actors' transactions is detailed in Architecture 2.
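To make the division of labor between these actors concrete, here is a minimal Python sketch of a Discovery Server that matches a client's request against offerings registered by providers. The `Offering` and `DiscoveryServer` classes and the `ex:` terms are hypothetical illustrations, not iPlant's API, and matching here is by exact term equality only; a real semantic discovery server would also reason over the ontology (for example, over subclasses).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offering:
    """What a Provider advertises (hypothetical structure)."""
    provider: str     # e.g. a web site or institution
    service: str      # the service resource itself
    input_type: str   # ontology term describing what the service consumes
    output_type: str  # ontology term describing what the service produces


@dataclass
class DiscoveryServer:
    """Toy discovery server: exact matching on ontology terms only."""
    offerings: list[Offering] = field(default_factory=list)

    def register(self, offering: Offering) -> None:
        # A Provider publishes an offering.
        self.offerings.append(offering)

    def search(self, input_type: str, output_type: str) -> list[Offering]:
        # A Client asks for any service that maps input_type to output_type.
        return [o for o in self.offerings
                if o.input_type == input_type and o.output_type == output_type]


server = DiscoveryServer()
server.register(Offering("ex:ProviderA", "ex:geneLookup", "ex:GeneId", "ex:GeneRecord"))
server.register(Offering("ex:ProviderB", "ex:tickerLookup", "ex:TickerSymbol", "ex:StockQuote"))
print([o.service for o in server.search("ex:GeneId", "ex:GeneRecord")])  # ['ex:geneLookup']
```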

The challenge is to allow varied actors to express their offerings and requests in a manner that is flexible, expressive, unambiguous, and responsive to the changing environment of the web. As outlined in our publication (Gessler et al. 2009), this requires a three-tiered layering of syntax, semantics, and logic.

Three Tiers of Syntax, Semantics, and Logic

The lowest layer, a common syntax, is the easiest to achieve. With it, one can write a single code base to parse (read in) and serialize (write out) content (though in practice numerous code bases usually exist). Virtually all languages establish a syntax: English, Java, PHP, etc. For the semantic web, the W3C-recommended syntax is RDF/XML, and we use RDF/XML. But in many cases RDF/XML can be opaque for human use, especially in a web environment where developers may not yet be familiar with semantic web technologies. In general, we hide its use except where it is architecturally necessary to expose it (see Architecture 2). To ease this "impedance mismatch" between developer familiarity and syntax technologies, we use JSON (JavaScript Object Notation) in a separate, easily accessible web-based Application Programming Interface. JSON alone is not semantically enabled, but it is broadly known. Thus our HTTP API supports JSON for the developer and transforms its semantically poor constructs into semantically rich RDF/XML, without the developer ever needing to know the low-level encoding. Of course, developers already familiar with RDF/XML can get full exposure to it and will find its use straightforward and consistently supported.
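As a rough illustration of the JSON-to-RDF/XML transformation, the sketch below turns a flat JSON object into an rdf:Description using only the Python standard library. It assumes a hypothetical convention in which the `@id` key names the subject resource, every other key becomes a property in a placeholder `http://example.org/terms#` namespace, and string values starting with `http` are treated as links to other resources. The actual iPlant mapping is not described here, so treat this as a sketch of the idea only.

```python
import json
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/terms#"  # placeholder vocabulary namespace (hypothetical)

ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)


def json_to_rdfxml(text: str) -> str:
    """Convert a flat JSON object into an RDF/XML rdf:Description (toy mapping)."""
    data = json.loads(text)
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": data["@id"]})
    for key, value in data.items():
        if key == "@id":
            continue  # already used as the subject URI
        prop = ET.SubElement(desc, f"{{{EX}}}{key}")
        if isinstance(value, str) and value.startswith("http"):
            prop.set(f"{{{RDF}}}resource", value)  # link to another resource
        else:
            prop.text = str(value)                 # plain literal value
    return ET.tostring(root, encoding="unicode")


print(json_to_rdfxml('{"@id": "http://example.org/gene/PHYB", '
                     '"symbol": "PHYB", "organism": "http://example.org/taxon/3702"}'))
```

The developer only ever writes the JSON string; the RDF/XML is produced behind the API boundary.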

Computers cannot discern meaning from syntax alone. A program that does division, Div(a,b), may accept Div(1,2) or Div(2,1); both are syntactically correct, but the meanings of '1/2' and '2/1' are very different. Normally, a person would read about Div(a,b) and would be responsible for calling it correctly. But what about on the web? What about when there are millions of services and millions of types of data, all coming online, changing, and evolving in an unorchestrated environment? How can we find and engage the service we want with the appropriate data without reading millions of documents ahead of time? We approach this with a two-part solution.
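The Div example can be made concrete. In the sketch below (all names hypothetical), a plain positional `div` cannot tell which argument plays which role, whereas wrapping each value in a type that names its role (`Numerator`, `Denominator`) makes the call order irrelevant. On the web, those roles would be carried by shared ontology terms rather than by local Python classes.

```python
from dataclasses import dataclass


def div(a: float, b: float) -> float:
    # Syntax alone: any two numbers are accepted, and nothing records which is which.
    return a / b


@dataclass(frozen=True)
class Numerator:
    value: float


@dataclass(frozen=True)
class Denominator:
    value: float


def semantic_div(*args) -> float:
    """Divide using argument roles taken from their types, not their positions."""
    nums = [a for a in args if isinstance(a, Numerator)]
    dens = [a for a in args if isinstance(a, Denominator)]
    if len(args) != 2 or len(nums) != 1 or len(dens) != 1:
        raise TypeError("expected exactly one Numerator and one Denominator")
    return nums[0].value / dens[0].value


print(div(1, 2), div(2, 1))                          # 0.5 2.0
print(semantic_div(Numerator(1), Denominator(2)),
      semantic_div(Denominator(2), Numerator(1)))    # 0.5 0.5
```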

The first part is a canonical, stable protocol. A protocol allows web sites to express that they are a web Resource; that they are provided by some Provider (authority, institution, web site, etc.); and that they map some input (called a Subject) into some output (called an Object). The protocol is very dull: it does not say anything about the specifics of the offering; it just allows any offering to be expressed in a manner whereby a computer can discern what is doing the mapping and what is being mapped. Because the protocol is dull, it is entirely independent of the meaning of the service or data itself. This allows the protocol to be universal, thereby creating a stable base for any semantic web service. Perhaps calling the protocol "very dull" is a little too self-deprecating; one neat thing it does do is project the basic RDF (Resource Description Framework) universal data model of Subject -> Predicate -> Object onto semantic web services themselves: essentially, the protocol is an expression that "some resource does some mapping of some thing to some other thing" at the level of service description. The protocol also sets the same canonical framework for description, discovery, invocation, and response. This is innovative, since the way something is described usually does not provide enough information for how to query for it, and certainly not to invoke it or handle its response in a universal manner. For more on the protocol see Architecture 2 and
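A minimal sketch of the protocol's shape follows. It assumes hypothetical predicate names (`ex:providedBy`, `ex:hasInput`, `ex:hasOutput`), since the actual protocol vocabulary is described elsewhere: a service description reduces to Subject -> Predicate -> Object triples, and the same triple pattern can then serve as a discovery query.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class ServiceDescription:
    resource: str  # the web Resource doing the mapping
    provider: str  # authority, institution, or web site offering it
    subject: str   # what is mapped (input)
    obj: str       # what it is mapped to (output)

    def to_triples(self) -> list[Triple]:
        # Hypothetical predicate names; only the Subject -> Predicate -> Object shape matters here.
        return [
            (self.resource, "ex:providedBy", self.provider),
            (self.resource, "ex:hasInput", self.subject),
            (self.resource, "ex:hasOutput", self.obj),
        ]


def query(triples, s=None, p=None, o=None) -> list[Triple]:
    """Match a triple pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


seed_lookup = ServiceDescription("ex:seedStockLookup", "ex:SeedBank",
                                 "ex:SeedStockId", "ex:SeedStockRecord")
graph = seed_lookup.to_triples()
# Discovery: which resources produce a SeedStockRecord?
print([t[0] for t in query(graph, p="ex:hasOutput", o="ex:SeedStockRecord")])  # ['ex:seedStockLookup']
```

Note that nothing in `ServiceDescription` or `query` knows what a seed stock is; the domain meaning lives entirely in the terms, which is what keeps the protocol universal.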

The second part is a shared, public semantic. So what about the non-dull parts? How do I express Div(Numerator,Denominator) versus Div(Denominator,Numerator)? Or that my web service is a stock look-up service, be those Wall St. stocks or seed stocks? How can I do this in a way that is extensible yet not ad hoc, evolvable yet not fragile, flexible yet robust? The key is to match the stability of the protocol with the variability of non-reserved vocabularies that allow anyone to put terms on the web, yet embed those terms in a strict, transparent semantic. So you can say anything you want, but you express those statements within a public semantic that everyone knows how to interpret. The public semantic does not tell you what to say, but it does give you the rules under which to say it. Additionally, the public semantic enables you to extend the terms, concepts, and properties of others in a manner that preserves the semantics of that extension; for example, to say that your term is a subclass of another term. Reuse and repurposing, whether via aggregation, composition, or extension, are important properties of scalable systems, and we use them heavily. We combine two developments of the last 10 to 15 years to address a shared, public semantic: the development of RDF, RDFS, XSD, and especially OWL for a well-defined, domain-independent semantic; and the development of biological ontologies for a corpus of domain-specific terms and their relations. Neither is sufficient in the absence of the other.
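Extension by subclassing is what lets a client recognize a term it has never seen before. The sketch below computes the reflexive, transitive closure that rdfs:subClassOf implies, over a small hypothetical hierarchy stored as a Python dict; the term names are illustrative only.

```python
def superclasses(term: str, subclass_of: dict[str, set[str]]) -> set[str]:
    """All terms that `term` is a subclass of, including itself
    (rdfs:subClassOf is reflexive and transitive)."""
    found = {term}
    stack = [term]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in found:  # also guards against cycles
                found.add(parent)
                stack.append(parent)
    return found


def is_subclass(sub: str, sup: str, subclass_of: dict[str, set[str]]) -> bool:
    return sup in superclasses(sub, subclass_of)


# Hypothetical hierarchy: a new local term extends a shared ontology term.
hierarchy = {
    "my:SeedStockId": {"onto:StockIdentifier"},
    "onto:StockIdentifier": {"onto:Identifier"},
}
print(is_subclass("my:SeedStockId", "onto:Identifier", hierarchy))  # True
print(is_subclass("onto:Identifier", "my:SeedStockId", hierarchy))  # False
```

A service that accepts any `onto:Identifier` can thus be matched to data typed with the newer `my:SeedStockId`, without either party having coordinated in advance.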

Semantics is not enough; for data and service integration we need a computable logic. The take-home point is that merely defining terms statically, even under a well-defined semantic, is insufficient. Computers need to be able to take terms and services that they have never previously "seen" and assess whether they are suitable for the task at hand. For this, they need a web-based, computable logic. The W3C-recommended Web Ontology Language sublanguage OWL DL (now OWL 2 DL and the OWL 2 profiles) constrains the expressivity of RDF and OWL to that of a description logic, a decidable fragment of first-order logic. OWL DL has a number of nice properties, the kind you want if your task is to reason over arbitrary web resources:

  • Completeness: no truth can hide; many truths may be implied, but all can be uncovered and made explicit by an algorithm;
  • Validity and soundness: every inference follows validly from the premises, so no falsehood can ever be "proven" true; if one accepts the axiomatic premises, one must accept the conclusions;
  • Decidability: for any statement (theorem) expressible in the system, a finite-resource algorithm can determine whether or not it is entailed; no such question can remain forever undecided;
  • Satisfiability: one can determine, generically and universally, whether a concept (a class) is unsatisfiable, that is, whether no individual can ever be a member of it;
  • Consistency: one can determine whether contradictions exist anywhere in a knowledge base;
  • Monotonicity: no statement in the future can ever disprove anything proven today.
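Two of these properties can be illustrated on a deliberately tiny fragment. The sketch below assumes an ontology containing only named classes, subClassOf axioms, and pairwise disjointWith axioms; for that restricted fragment, a class is unsatisfiable exactly when two of its (reflexive, transitive) superclasses are declared disjoint. Production OWL DL reasoners use tableau or similar algorithms over far richer constructs, so this is only a toy, and the class names are hypothetical.

```python
def ancestors(cls: str, subclass_of: dict[str, set[str]]) -> set[str]:
    """cls together with every class it is (transitively) a subclass of."""
    found = {cls}
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_unsatisfiable(cls: str, subclass_of: dict[str, set[str]],
                     disjoint: set[frozenset[str]]) -> bool:
    """True if no individual can ever be a member of cls.

    Valid only for this toy fragment: named classes, subClassOf, disjointWith.
    """
    sups = ancestors(cls, subclass_of)
    return any(frozenset((a, b)) in disjoint for a in sups for b in sups)


subclass_of = {
    "ex:PlantAnimal": {"ex:Plant", "ex:Animal"},
    "ex:Arabidopsis": {"ex:Plant"},
}
disjoint = {frozenset(("ex:Plant", "ex:Animal"))}
print(is_unsatisfiable("ex:PlantAnimal", subclass_of, disjoint))  # True
print(is_unsatisfiable("ex:Arabidopsis", subclass_of, disjoint))  # False
```

Because axioms are only ever added, extending `subclass_of` or `disjoint` can turn a satisfiable class unsatisfiable but never the reverse, which mirrors monotonicity.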

If all this sounds too good to be true (or too far-reaching to be believable), here are some tidbits to put things into context:

  • Much of mathematics has all of the above properties except decidability, and it must trade off completeness against consistency (with deference to Gödel). So we consider DLs (Description Logics) to be less expressive (more constrained) than general mathematics;
  • If a statement is found that breaks any one of the above properties (such as how 270-degree triangles in Riemannian, e.g. spherical, geometry break the Euclidean "proof" that the angles of every triangle sum to exactly 180 degrees), such a statement does not break the system in just one part. It necessarily contradicts an axiom (in the triangle case, Euclid's parallel postulate), so either the statement is inadmissible because it violates an axiom, or the axiom is inappropriate and the entire system built on it collapses;
  • A contradiction somewhere can "prove" a falsehood anywhere: inconsistency breaks the system globally, so systems are only as strong as their axioms are valid;
  • The complexity of OWL DL reasoning is NExpTime-complete (non-deterministic exponential time; read: hard), so although algorithms operating on such systems are theoretically guaranteed to terminate, "finite time" can still be a long time. (In practice, worst-case scenarios tend to be pathological rather than typical, so real-world results are encouraging);
  • Much of the world cannot be modeled in a first-order description logic, so certain, even common, logical constructs are inexpressible.
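The claim that a contradiction somewhere can "prove" a falsehood anywhere can be checked mechanically. The sketch below uses propositional logic rather than a description logic (a simplifying substitution; both share classical semantics) and defines entailment by truth-table enumeration: a knowledge base entails a query when the query holds in every model of the knowledge base. An inconsistent knowledge base has no models, so it vacuously entails everything; adding formulas can only shrink the set of models, which is also why monotonicity holds.

```python
from itertools import product
from typing import Callable, Iterator

Model = dict[str, bool]
Formula = Callable[[Model], bool]


def atom(name: str) -> Formula:
    return lambda m: m[name]


def neg(f: Formula) -> Formula:
    return lambda m: not f(m)


def models(atoms: list[str]) -> Iterator[Model]:
    """Every truth assignment to the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def entails(kb: list[Formula], query: Formula, atoms: list[str]) -> bool:
    """kb |= query iff query is true in every model that satisfies all of kb."""
    return all(query(m) for m in models(atoms) if all(f(m) for f in kb))


p, q = atom("p"), atom("q")
atoms = ["p", "q"]
print(entails([p], q, atoms))               # False: p says nothing about q
print(entails([p, neg(p)], q, atoms))       # True: no models, so anything follows
print(entails([p, neg(p)], neg(q), atoms))  # True: even the opposite conclusion
```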

iPlant Semantic Web Services uses automated reasoners over OWL DL. In doing so, we layer a computable logic on top of a shared, public semantic and a common syntax.

Where to go from here: