Semantic e-Science: From Microformats to Models
A platform has been developed to transform semi-structured ASCII data into a representation based on the eXtensible Markup Language (XML). A subsequent transformation allows the XML-based representation to be rendered in the Resource Description Framework (RDF). Editorial metadata, expressed as external annotations (via XML Pointer Language), also survives this transformation process (e.g., Lumb et al., http://dx.doi.org/10.1016/j.cageo.2008.03.009). Because the XML-to-RDF transformation uses XSLT (eXtensible Stylesheet Language Transformations), semantic microformats ultimately encode the scientific data (Lumb & Aldridge, http://dx.doi.org/10.1109/HPCS.2006.26). In building the relationship-centric representation in RDF, a Semantic Model of the scientific data is extracted. The systematic enhancement in the expressivity and richness of the scientific data results in representations of knowledge that are readily understood and manipulated by intelligent software agents. Thus scientists are able to draw upon various resources within and beyond their discipline for use in their scientific applications. Since the resulting Semantic Models are independent conceptualizations of the science itself, representing scientific knowledge in this way, and interacting with it, can stimulate insight from different perspectives. For illustration, in the Global Geodynamics Project (GGP), the introduction of GGP microformats enables a Semantic Model for the GGP that can be semantically queried (e.g., via SPARQL, http://www.w3.org/TR/rdf-sparql-query). Although the present implementation uses the Open Source Redland RDF Libraries (http://librdf.org/), the approach is generalizable to other platforms and to projects other than the GGP (e.g., Baker et al., Informatics and the 2007-2008 Electronic Geophysical Year, Eos Trans. Am. Geophys. Un., 89(48), 485-486, 2008).
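The two-stage transformation described above (semi-structured ASCII to XML, then XML to subject-predicate-object statements) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the ASCII layout is a made-up "key : value" header followed by data rows and is not the actual GGP file format, the http://example.org/ggp/ namespace and property names are placeholders, and plain Python traversal stands in for the XSLT stylesheets and the Redland libraries used by the platform. The match function is only a simple triple-pattern filter, a rough stand-in for a SPARQL query.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured ASCII input (NOT the real GGP format):
# "key : value" header lines followed by whitespace-separated data rows.
ASCII_DATA = """\
Station : Example
Instrument : Gravimeter
20080101 000000 1.2345
20080101 000100 1.2350
"""

BASE = "http://example.org/ggp/"  # placeholder namespace (assumption)


def ascii_to_xml(text):
    """Stage 1: wrap header fields and data rows in XML elements."""
    root = ET.Element("dataset")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            # Header line: split once on the first colon.
            key, value = (part.strip() for part in line.split(":", 1))
            ET.SubElement(root, "meta", name=key).text = value
        else:
            # Data row: date, time, observed value.
            date, time, value = line.split()
            ET.SubElement(root, "sample", date=date, time=time).text = value
    return root


def xml_to_triples(root):
    """Stage 2: turn the XML tree into (subject, predicate, object) tuples."""
    subject = BASE + "dataset/1"
    triples = []
    for meta in root.findall("meta"):
        triples.append((subject, BASE + meta.get("name").lower(), meta.text))
    for i, sample in enumerate(root.findall("sample")):
        node = f"{subject}/sample/{i}"
        triples.append((subject, BASE + "hasSample", node))
        triples.append((node, BASE + "date", sample.get("date")))
        triples.append((node, BASE + "time", sample.get("time")))
        triples.append((node, BASE + "value", sample.text))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    xml_root = ascii_to_xml(ASCII_DATA)
    statements = xml_to_triples(xml_root)
    # List every observed value, whichever sample it belongs to.
    for s, _, o in match(statements, p=BASE + "value"):
        print(s, o)
```

The intermediate XML tree is kept as a separate stage so that annotations could be attached to it before the relationship-centric statements are generated; a production pipeline would presumably use an RDF library and a real query engine in place of the tuple list and the match filter.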
Semantic e-Science in Space Physics - A Case Study
Several search and retrieval systems for space physics data are currently under development in NASA's heliophysics data environment. We present a case study of two such systems, and describe our efforts in implementing an ontology to aid in data discovery. In doing so we highlight the various aspects of knowledge representation and show how they led to our ontology design, creation, and implementation. We discuss advantages that scientific reasoning allows, as well as difficulties encountered in current tools and standards. Finally, we present a space physics research project conducted with and without e-Science and contrast the two approaches.
Semantics of interdisciplinary data and information integration
We have developed an application of semantic web methods and technologies to address the integration of interdisciplinary earth-science datasets. The specific use case addresses seeking and using atmospheric chemistry and volcano geochemistry datasets. We have developed an integration framework based on semantic descriptions (ontologies) of the linking relations between the application domains. In doing this, we have extensively leveraged existing ontology frameworks such as SWEET, VSTO and GEON, and have included extensions of them when needed. We present the components of this application, including the ontologies, the registration of datasets with ontologies at several levels of granularity, the data sources, and application results from the use case. We will also present the current and near-future capabilities we are developing. This work arises from the Semantically-Enabled Science Data Integration (SESDI) project, which is a NASA/ESTO/ACCESS-funded project performed in part by Rensselaer Polytechnic Institute, the High Altitude Observatory at the National Center for Atmospheric Research (NCAR), McGuinness Associates, NASA/JPL and Virginia Polytechnic University.
The Heirs of Hilbert's Sixth Problem
In an address to the International Congress of Mathematicians in 1900, David Hilbert posed twenty-three challenge problems. In this paper, we discuss how research in ontologies provides the axiomatic theories that are solutions to the Sixth Problem (the mathematical treatment of the axioms of physics and other sciences). We also show how the criteria for characterizing solutions to the Sixth Problem apply to the design and evaluation of ontologies for scientific theories. Finally, we consider the suite of ontologies that will be necessary to support geosciences.
A Foundational Approach to Designing Geoscience Ontologies
E-science systems are increasingly deploying ontologies to aid online geoscience research. Geoscience ontologies are typically constructed independently by isolated individuals or groups who tend to follow few design principles. This limits the usability of the ontologies due to conceptualizations that are vague, conflicting, or narrow. Advances in foundational ontologies and formal engineering approaches offer promising solutions, but these advanced techniques have had limited application in the geosciences. This paper develops a design approach for geoscience ontologies by extending aspects of the DOLCE foundational ontology and the OntoClean method. Geoscience examples will be presented to demonstrate the feasibility of the approach.
Knowledge provenance in science data pipelines; languages, tools and artifacts - what can we now answer?
We present our work to date on applying knowledge provenance representations (Proof Markup Language; PML) and tools (Inference Web; IW and ProbeIt!) within data pipelines for solar physics instruments operated at the Mauna Loa Solar Observatory in Hawaii. We now have experience with our initial infrastructure artifacts (PML files), which early users can search, query, or browse to answer specific use cases that could not be answered before. To model other parts of the data pipeline, and to model the pipeline in greater detail, we have used additional tools such as workflow-driven ontologies (WDoIt! from the University of Texas at El Paso). We present our current implementation, use of tools, and impact on language representation - in particular the emerging need to introduce domain and instrument concepts (ontologies) into the provenance artifacts.
The Semantic Provenance Capture in Data Ingest Systems (SPCDIS) project is an NSF/OCI/SDCI-funded effort involving the Tetherless World Constellation at Rensselaer Polytechnic Institute, the High Altitude Observatory at NCAR, McGuinness Associates and the University of Michigan. Additional team members: Leo Salayandia, Aida Gandara, Jiao Tao, Honglei Zeng.