It's All About the Data: Workflow Systems and Weather
Digital data is fueling new advances in the computational sciences, particularly in geospatial research, as environmental sensing grows more practical through reduced technology costs, broader network coverage, and better instruments. e-Science research (i.e., cyberinfrastructure research) has responded to data-intensive computing with tools, systems, and frameworks that support computationally oriented activities such as modeling, analysis, and data mining. Workflow systems support the execution of sequences of tasks on behalf of a scientist. These systems, such as Taverna, Apache ODE, and Kepler, when built as part of a larger cyberinfrastructure framework, give the scientist tools to construct task graphs of execution sequences, often through a visual interface for connecting task boxes with arcs representing control flow or data flow. Unlike business-processing workflows, scientific workflows expose a high degree of detail and control during configuration and execution. Data-driven science imposes unique needs on workflow frameworks. Our research focuses on two issues. The first is support for workflow-driven analysis over all kinds of data sets, including real-time streaming data and locally owned and hosted data. The second is the essential role metadata/provenance collection plays in data-driven science: for discovery, for determining quality, for reproducibility, and for long-term preservation. The research has been conducted over the past six years in the context of cyberinfrastructure for mesoscale weather research carried out as part of the Linked Environments for Atmospheric Discovery (LEAD) project. LEAD has pioneered new approaches for integrating complex weather data, assimilation, modeling, mining, and cyberinfrastructure systems. Workflow systems have the potential to generate huge volumes of data.
Without some form of automated metadata capture, either metadata description becomes a largely manual task that is difficult if not impossible under high-volume conditions, or the searchability and manageability of the resulting data products is disappointingly low. The provenance of a data product is a record of its lineage: a trace of the execution history that produced the product. The provenance of a forecast model result, for example, captures information about the executable version of the model, configuration parameters, input data products, execution environment, and owner. Provenance enables data to be properly attributed and captures critical parameters about the model run so that the quality of the result can be ascertained. Proper provenance is essential to reproducible scientific computing results. Workflow languages used in science discovery are complete programming languages and in theory can support any logic expressible by a programming language. The execution environments supporting the workflow engines, on the other hand, are subject to constraints on physical resources, and hence in practice the workflow task graphs used in science utilize relatively few of the cataloged workflow patterns. It is important to note that these workflows are executed on demand, and are executed once. Into this context is introduced the need for science discovery that is responsive to real-time information. If we can use simple programming models and abstractions to make scientific discovery involving real-time data accessible to specialists who share and utilize data across scientific domains, we bring science one step closer to solving the largest of human problems.
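To make the provenance fields named above concrete, the following is a minimal sketch of a provenance record capturing model version, configuration parameters, input products, execution environment, and owner, with a lineage walk over the input links. The class and field names are illustrative assumptions, not the LEAD schema or any particular workflow system's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative provenance record; field names are assumptions, not the LEAD schema.
@dataclass
class ProvenanceRecord:
    product_id: str    # the derived data product being described
    model_version: str # executable version of the model that produced it
    parameters: dict   # configuration parameters of the run
    inputs: list       # IDs of input data products (the direct lineage)
    environment: str   # description of the execution host
    owner: str         # who ran the workflow
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def lineage(records, product_id):
    """Walk input links backward to recover a product's full ancestry."""
    by_id = {r.product_id: r for r in records}
    ancestors, stack = [], list(by_id[product_id].inputs)
    while stack:
        pid = stack.pop()
        ancestors.append(pid)
        if pid in by_id:
            stack.extend(by_id[pid].inputs)
    return ancestors
```

A catalog of such records supports exactly the uses the abstract names: attribution (owner), quality assessment (parameters and environment), and reproducibility (re-running the recorded model version on the recorded inputs).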
Modeling Guru: Knowledge Base for NASA Modelers
Modeling Guru is an on-line knowledge-sharing resource for anyone involved with or interested in NASA's
scientific models or High End Computing (HEC) systems. Developed and maintained by NASA's Software
Integration and Visualization Office (SIVO) and the NASA Center for Computational Sciences (NCCS), Modeling
Guru's combined forums and knowledge base for research and collaboration is becoming a repository for the
accumulated expertise of NASA's scientific modeling and HEC communities. All NASA modelers and
associates are encouraged to participate and provide knowledge about the models and systems so that other
users may benefit from their experience. Modeling Guru is divided into a hierarchy of communities, each with
its own set of forums and knowledge base documents. Current modeling communities include those for space
science, land and atmospheric dynamics, atmospheric chemistry, and oceanography. In addition, there are
communities focused on NCCS systems, HEC tools and libraries, and programming and scripting languages.
Anyone may view most of the content on Modeling Guru (available at http://modelingguru.nasa.gov/), but you
must log in to post messages and subscribe to community postings. The site offers a full range of "Web 2.0"
features, including discussion forums, "wiki" document generation, document uploading, RSS feeds, search
tools, blogs, email notification, and "breadcrumb" links. A discussion (a.k.a. forum "thread") is used to post
comments, solicit feedback, or ask questions. If a thread is marked as a question, SIVO will monitor it and
normally respond within a day. Discussions can include embedded images, tables, and formatting through the
use of the Rich Text Editor. Also, the user can add "Tags" to their thread to facilitate later searches. The
"knowledge base" is comprised of documents that are used to capture and share expertise with others. The
default "wiki" document lets users edit within the browser so others can easily collaborate on the same
document, even allowing the author to select those who may edit and approve the document. To maintain
knowledge integrity, all documents are moderated before they are visible to the public. Modeling Guru, running
on Clearspace by Jive Software, has been an active resource to the NASA modeling and HEC communities for
more than a year and currently has more than 100 active users. SIVO will soon install live instant messaging
support, as well as a user-customizable homepage with social-networking features. In addition, SIVO plans to
implement a large dataset/file storage capability so that users can quickly and easily exchange datasets and
files with one another. Continued active community participation combined with periodic software updates and
improved features will ensure that Modeling Guru remains a vibrant, effective, easy-to-use tool for the NASA modeling and HEC communities.
SWIFTER - Space Weather Informatics, Forecasting, and Technology through Enabling Research and Virtual Organizations
SWIFTER will build a virtual organization (VO) to enable collaboration among research, military, and commercial communities to find new ways to understand, characterize, and forecast space weather to meet the needs of our technology-based society. In this paper we discuss how knowledge is shared in organizations and how a virtual organization can be formed. A key element of a "virtual" organization is that it is a fluid collection of members that share some means of communicating relevant information among some of its members. The members also share evolving ideas (such as analyses, new technologies, and predictive trending). Concepts can be refined or discarded more quickly as the power of the network is brought to bear early and often. Space weather, the changes in the near-Earth space environment, is important to a wide range of users as well as the public. The public is interested in a variety of phenomena including meteors, solar flares, the aurora, noctilucent clouds, and climate change. Industry focus tends to be on more concrete problems such as ground-induced currents in power lines, communications with aircraft on transpolar routes, and geolocation (i.e., the use of GPS systems to determine positions precisely). Other government-oriented users serve specialized communities who may be more or less unaware of the research and development upon which the forecasts or nowcasts rely for accuracy. The basic research community may likewise be more or less unaware of the details of the applications, or potential applications, of their research. The problem, then, is that each of these constituencies may share elements in common, but there is no umbrella organization that ties them together, nor is there likely to be such an organization. Our goal in this paper is to outline a scheme for a virtual organization, delineate the functions of that VO, and illustrate how it might be formed.
We also will assess the barriers to knowledge transfer that must be overcome.
Talkoot Portals: Discover, Tag, Share, and Reuse Collaborative Science Workflows
A small but growing number of scientists are beginning to harness Web 2.0 technologies, such as wikis, blogs, and social tagging, as a transformative way of doing science. These technologies provide researchers with easy mechanisms to critique, suggest, and share ideas, data, and algorithms. At the same time, large suites of algorithms for science analysis are being made available as remotely invokable Web Services, which can be chained together to create analysis workflows. This provides the research community an unprecedented opportunity to collaborate by sharing workflows with one another, reproducing and analyzing research results, and leveraging colleagues' expertise to expedite the process of scientific discovery. However, wikis and similar technologies are limited to text, static images, and hyperlinks, providing little support for collaborative data analysis. A team of information technology and Earth science researchers from multiple institutions has come together to improve community collaboration in science analysis by developing a customizable "software appliance" to build collaborative portals for Earth Science services and analysis workflows. The critical requirement is that researchers (not just information technologists) be able to build collaborative sites around service workflows within a few hours. We envision online communities coming together, much like a Finnish "talkoot" (a barn raising), to build a shared research space. Talkoot extends a freely available, open source content management framework with a series of modules specific to Earth Science for registering, creating, managing, discovering, tagging, and sharing Earth Science web services and workflows for science data processing, analysis, and visualization. Users will be able to author a "science story" in shareable web notebooks, including plots or animations, backed up by an executable workflow that directly reproduces the science analysis.
New services and workflows of interest will be discoverable using tag search, and advertised using "service casts" and "interest casts" (Atom feeds). Multiple science workflow systems will be plugged into the system, with initial support for UAH's Mining Workflow Composer and the open-source ActiveBPEL engine, and JPL's SciFlo engine and the VizFlow visual programming interface. With the ability to share and execute analysis workflows, Talkoot portals can be used to do collaborative science in addition to communicating ideas and results. They will be useful for different science domains, mission teams, research projects, and organizations. Thus, Talkoot will help to solve the "sociological" problem of bringing together disparate groups of researchers, and the technical problem of advertising, discovering, developing, documenting, and maintaining inter-agency science workflows. The presentation will discuss the goals of and barriers to Science 2.0 and the social web technologies employed in the Talkoot software appliance (e.g., CMS, social tagging, personal presence, advertising by feeds), illustrate the resulting collaborative capabilities, and show early prototypes of the web interfaces (e.g., embedded workflows).
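As a concrete illustration of the tag-based discovery described above, here is a minimal sketch of a service registry with tag search. The class and method names are hypothetical, not part of the Talkoot implementation.

```python
from collections import defaultdict

# Minimal tag-indexed registry; names are illustrative, not the Talkoot API.
class ServiceRegistry:
    def __init__(self):
        self._services = {}              # service name -> metadata dict
        self._by_tag = defaultdict(set)  # tag -> set of service names

    def register(self, name, url, tags):
        """Register a web service (or workflow) under a set of tags."""
        self._services[name] = {"url": url, "tags": set(tags)}
        for tag in tags:
            self._by_tag[tag].add(name)

    def search(self, *tags):
        """Return names of services carrying ALL the given tags."""
        if not tags:
            return []
        hits = set.intersection(*(self._by_tag[t] for t in tags))
        return sorted(hits)
```

A "service cast" would then amount to serializing newly registered entries as items in an Atom feed, so other portals can subscribe to services of interest.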
A Weather Analysis System for the Baja California Peninsula: Tropical Cyclone Season of 2008
General characteristics of tropical weather systems are documented on a real-time basis. This study covers the warm season of 2008, from May through November, and includes observations from satellite imagery as well as reports from a rain-gauge network. During this season, the basin had 16 tropical storms, and three of them made landfall on the Baja California peninsula in northwestern Mexico. Tropical storm Julio developed in August, and tropical storm Lowell made landfall in mid-September. Norbert, in early October, was the most intense hurricane of the season, with strong winds and heavy rainfall that caused significant damage to the infrastructure of the southern peninsula. The next day, the system moved over the mainland, causing major flooding in Sinaloa, Sonora, and Chihuahua. At the request of the Baja California government, a meteorological perspective on the structure, intensity, and motion of Hurricane Norbert was presented. This consisted of high-resolution satellite imagery used to explain the spatial and temporal patterns of convection. This material provided an integral analysis of Norbert's behavior during its approach and passage over land, and it was one element used by emergency managers to determine the extent of the affected areas.
Post-Launch Assessment of Performance of the NOAA-19 Advanced Microwave Sounding Unit-A
The Advanced Microwave Sounding Unit-A (AMSU-A) on the NOAA-19 satellite was successfully launched on 6 February 2009. NOAA-19 is the fifth in a series of five Polar-orbiting Operational Environmental Satellites (POES) with AMSU-A that provide imaging and sounding capabilities. As it orbits the Earth, NOAA-19 will collect data about the Earth's surface and atmosphere that are vital inputs to NOAA's weather forecasts. AMSU-A is a new generation of total-power microwave radiometers that have been flown on the NOAA-15 through NOAA-18 and METOP-A satellites since May 1998. AMSU-A is composed of two separate units. AMSU-A2 provides channels 1 and 2 at 23.8 and 31.4 GHz. AMSU-A1 furnishes 12 channels in the 50.3 to 57.3 GHz oxygen band, which are used for temperature sounding from the surface to about 50 km (i.e., from 1000 to 1 millibar), plus channel 15 at 89 GHz. Channels 1-3 and 15, which have weighting functions peaking near the surface, aid temperature sounding by providing information to correct for the effects of surface emissivity, atmospheric liquid water, and total precipitable water vapor. Channels 1 and 2 also provide information on precipitation, sea ice, and snow cover. Before launch, each AMSU-A was tested and calibrated by the instrument contractor, Northrop Grumman (formerly Aerojet). These pre-launch calibration data are analyzed at NOAA to derive the calibration parameters used in the operational calibration software to produce the AMSU-A Level 1B data sets. A systematic post-launch calibration and validation of instrument performance was conducted with on-orbit data. The long-term trends of the housekeeping sensors and of the radiometric counts from the cold-space and warm-target views are continuously monitored.
Scan-by-scan examination of the radiometric calibration counts is employed to confirm normal functioning of the instrument and to detect anomalous events, such as lunar contamination (LC) of the cold-space radiometric counts, which is detected, flagged, and corrected by a dedicated algorithm. The effect of lunar contamination on the Earth-scene brightness temperatures will be demonstrated. It is also desirable to calibrate the instrument against a natural Earth target to evaluate its performance. The Amazon rainforest and the tropical ocean in the region 20°S–20°N are chosen as test targets. NOAA-19 AMSU-A measurements over these targets will be obtained and compared to the NOAA-18 data. The results will be presented and discussed. The establishment of land and ocean calibration targets is important for the calibration and validation of space-borne microwave instruments.
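The cold-space and warm-target counts mentioned above feed a standard two-point radiometric calibration, and lunar contamination shows up as anomalous cold-space counts. The sketch below shows the textbook linear calibration form and a simple outlier flag; NOAA's operational algorithms add a radiometer nonlinearity correction and a far more careful LC detection and correction, so all function names, thresholds, and numbers here are illustrative assumptions only.

```python
def calibrate_scene(c_scene, c_cold, c_warm, t_cold, t_warm):
    """Two-point linear calibration: map scene counts to brightness
    temperature using the cold-space and warm-load reference views.
    (The operational AMSU-A algorithm adds a nonlinearity term.)"""
    gain = (t_warm - t_cold) / (c_warm - c_cold)  # kelvin per count
    return t_cold + gain * (c_scene - c_cold)

def flag_lunar_contamination(cold_counts, threshold=3.0):
    """Flag scans whose cold-space counts deviate anomalously from the
    median, as happens when the Moon enters the cold-space view.
    A simple median-absolute-deviation test; the operational LC
    algorithm is more sophisticated."""
    ordered = sorted(cold_counts)
    n = len(ordered)
    median = ordered[n // 2]
    mad = sorted(abs(c - median) for c in cold_counts)[n // 2]
    scale = max(mad, 1e-9)  # guard against a zero MAD
    return [abs(c - median) / scale > threshold for c in cold_counts]
```

In this scheme, a flagged scan's cold count would be replaced (e.g., interpolated from neighboring uncontaminated scans) before the two-point calibration is applied, which is the spirit of the detect-flag-correct sequence the abstract describes.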