Metadata Management in Scientific Computing

Complex scientific codes and the datasets they generate are in need of a sophisticated categorization environment that allows the community to store, search, and enhance metadata in an open, dynamic system. Currently, data is often presented in a read-only format, distilled and curated by a select group of researchers. We envision a more open and dynamic system, where authors can publish their data in a writeable format, allowing users to annotate the datasets with their own comments and data. This would enable the scientific community to collaborate on a higher level than before, where researchers could for example annotate a published dataset with their citations. Such a system would require a complete set of permissions to ensure that any individual's data cannot be altered by others unless they specifically allow it. For this reason datasets and codes are generally presented read-only, to protect the author's data; however, this also prevents the type of social revolutions that the private sector has seen with Facebook and Twitter. In this paper, we present an alternative method of publishing codes and datasets, based on Fluidinfo, which is an openly writeable and social metadata engine. We will use the specific example of the Einstein Toolkit, a part of the Cactus Framework, to illustrate how the code's metadata may be published in writeable form via Fluidinfo.