John Deck, Berkeley Natural History Museums, UC Berkeley, principal of Biocode, LLC
Neil Davies, Director, Gump South Pacific Research Station, UC Berkeley
The Biocode Field Information Management System (Biocode FIMS, https://code.google.com/p/biocode-fims/) takes spreadsheet data generated in the field, validates it, aligns it with global metadata standards, assigns unique identifiers, and publishes it. Publication can optionally be private; the public version can be referenced by other applications such as collections management systems, GenBank, data harvesters, and laboratory information management systems. Biocode FIMS builds on two principles of good data linking: creating persistent identifiers at the point where samples are created, and aligning data with standardized vocabularies and ontologies. The following diagram describes the typical workflow:
The technical details: Biocode FIMS is an open source application that uses XML configuration files to define validation rules and to structure logical relationships between sets of columns. A client-server interface, built on Java and REST services, controls all interactions. Upon upload, data is stored in an Apache Fuseki TDB triplestore (http://jena.apache.org/documentation/serving_data/), with identifier keys and dataset metadata stored in a MySQL database (http://dev.mysql.com/). The Biocode FIMS identifier solution is based on EZID (http://n2t.net/ezid), from the California Digital Library.
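To make the role of the XML configuration concrete, the following Python sketch shows one way validation rules could be read from XML and applied to spreadsheet rows. It is illustrative only: Biocode FIMS itself is written in Java, and the element names, attribute names, rule types, and sample values used here are assumptions made for this example, not the project's actual configuration schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: the element names, attribute names, and rule
# types are invented for illustration and are not the real Biocode FIMS schema.
CONFIG_XML = """
<validation>
  <rule type="required" column="materialSampleID"/>
  <rule type="required" column="phylum"/>
  <rule type="inList" column="phylum">
    <value>Arthropoda</value>
    <value>Mollusca</value>
  </rule>
  <rule type="numericRange" column="decimalLatitude" min="-90" max="90"/>
</validation>
"""


def load_rules(xml_text):
    """Parse the XML into a list of plain dictionaries, one per rule."""
    rules = []
    for node in ET.fromstring(xml_text).findall("rule"):
        rule = dict(node.attrib)
        rule["values"] = [v.text.strip() for v in node.findall("value")]
        rules.append(rule)
    return rules


def validate_rows(rows, rules):
    """Return a list of (row_number, message) tuples; empty means valid."""
    errors = []
    for number, row in enumerate(rows, start=1):
        for rule in rules:
            column = rule["column"]
            value = (row.get(column) or "").strip()
            if rule["type"] == "required":
                if not value:
                    errors.append((number, f"{column} is required"))
            elif not value:
                continue  # the remaining rule types only apply to non-empty cells
            elif rule["type"] == "inList":
                if value not in rule["values"]:
                    errors.append((number, f"{column} has unknown value '{value}'"))
            elif rule["type"] == "numericRange":
                try:
                    numeric = float(value)
                except ValueError:
                    errors.append((number, f"{column} is not a number"))
                    continue
                if not float(rule["min"]) <= numeric <= float(rule["max"]):
                    errors.append((number, f"{column} is out of range"))
    return errors


# Made-up spreadsheet rows: the first is valid, the second breaks three rules.
rows = [
    {"materialSampleID": "MBIO1", "phylum": "Arthropoda", "decimalLatitude": "-17.5"},
    {"materialSampleID": "", "phylum": "Insecta", "decimalLatitude": "95"},
]
for row_number, message in validate_rows(rows, load_rules(CONFIG_XML)):
    print(f"row {row_number}: {message}")
```

Keeping the rules in a configuration file rather than in code means a project could change which columns are required, or which values are accepted, without modifying the validator itself.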
Biocode Field Information Management System (Biocode FIMS): Connecting Field Data to the Laboratory and the World
Biocode Field Information Management System (Biocode FIMS):
Connecting Field Data to the Laboratory and the World
John Deck, Information Services and Technology/Berkeley Natural History Museums
Neil Davies, Gump South Pacific Research Station
Capturing critical data elements at the source
The Biocode Field Information Management System (Biocode FIMS) takes spreadsheet data generated in the field, validates it, aligns it with global metadata standards, assigns unique identifiers, and publishes a private or public version. That version can be referenced by other applications such as collections management systems, GenBank, and data harvesters, with a particular emphasis on Laboratory Information Management System integration. Biocode FIMS builds on two keys to good data linking: persistent identifiers and alignment with standardized vocabularies and ontologies.
More Information
Information for interested users and developers is at: http://code.google.com/p/biocode-fims
Field Data Collection
[Figure labels: Collecting terrestrial invertebrates as part of the Moorea Biocode Project; Insect specimen]
[Figure: ontology diagram. Key (relationship types): subclass of; has specified output; has specified input; instance of; derives from. Classes shown: BCO:material sampling process, BCO:identification process, BCO:material sample, BCO:taxonomic name, OBI:sequencing assay, OBI:sequence data, rdfs:Class. Other labels: Biocode Sampling, Tissue sampling, DNA extraction, Tissue sample, DNA molecules, Identification using key, Identification using BLAST, Sequencing, TaxonID A, TaxonID B, GenBank sequence B.]
Alignment with standardized vocabularies and ontologies
Biocode FIMS links spreadsheet fields to standardized vocabularies such as Darwin Core (DwC), which describes events and specimens, and the Minimum Information about any (x) Sequence (MIxS), which describes genomic data. We are also working with the OBO Foundry and the Ontology for Biomedical Investigations (OBI) to describe the logical relationships of sample-based biological data in a new project called the Biological Collections Ontology (BCO) (https://code.google.com/p/bco/).
[Figure caption: A diagram showing how information is classified using the Biological Collections Ontology]
Biocode FIMS Design
The following chart shows how information is generated and organized logically in the Biocode FIMS database.
[Chart labels: Spreadsheet Templates; Identifier Keys by Project; Validation; Convert to RDF Triples; Map Spreadsheet to Standards; Upload; Query sets of spreadsheets (graphs); Inferencing. Stages: Setup; Data submission; Query; Return Data]
Persistent Identifiers for Samples
Assigning persistent identifiers to samples as they are isolated from nature or sub-sampled from other material is a critical component of Biocode FIMS. Because these events usually happen in the field, the identifier solution must work in the field while also ensuring that the identifier itself can resolve for years to come. To meet this challenge, we worked with the California Digital Library to develop Biocode Commons Identifiers (http://biscicol.org/bcid/), an identifier solution based on EZID (http://n2t.net/ezid). These identifiers look much like digital object identifiers (DOIs) but are built on the Archival Resource Key (ARK) model: http://n2t.net/ark:/21547/R2
Technical Details
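On the identifier side, an ARK-based identifier such as the one quoted under Persistent Identifiers for Samples has a regular shape: a resolver address, the `ark:` label, a Name Assigning Authority Number (NAAN), and a name. The Python sketch below splits a URL into those parts. It is a minimal illustration of the general ARK layout, not code from Biocode FIMS or EZID; the function name `split_ark` and the returned dictionary keys are made up for this example.

```python
from urllib.parse import urlparse


def split_ark(url):
    """Split a resolver URL carrying an ARK into resolver, NAAN, and name."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    if not path.startswith("ark:"):
        raise ValueError(f"not an ARK: {url}")
    # Tolerate a missing slash after the 'ark:' label.
    naan, _, name = path[len("ark:"):].lstrip("/").partition("/")
    return {
        "resolver": f"{parsed.scheme}://{parsed.netloc}",
        "naan": naan,   # Name Assigning Authority Number
        "name": name,   # everything after the NAAN
    }


# The identifier string is copied from the text above as-is.
print(split_ark("http://n2t.net/ark:/21547/R2"))
```

Because the resolver address is separate from the `ark:/NAAN/name` portion, the identifier itself can stay stable even if the service that resolves it changes.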
Uses an XML configuration file to define validation rules for spreadsheets, how fields are logically related, and project codes to aid in assigning identifiers.
Stores spreadsheet data in a Fuseki TDB triplestore.
REST service framework integration with Biocode Commons Identifiers.
UI available as a command-line tool and a Geneious plugin.
Coded in Java.
Code is open source and available under the Berkeley Software Distribution (BSD) license at http://code.google.com/p/biocode-fims
Laboratory Information Management System Integration
Biocode FIMS is partnering with Biomatters, makers of the Geneious software for analyzing field samples in the laboratory using sequencing technologies. Biocode FIMS data and tools are integrated via a customized Geneious plugin.
National Science Foundation
Support from:
Collaborative Research: BiSciCol Tracker: Towards a tagging and tracking infrastructure for biodiversity science collections (DBI-0956426)
Research Coordination Network for the Genomic Standards Consortium (DBI-0840989)
The National Evolutionary Synthesis Center (NESCent), NSF #EF-0905606
Developed in conjunction with faculty and staff affiliated with the Berkeley Natural History Museums, UC Berkeley
Development of the first version of Biocode FIMS supported by the Gordon and Betty Moore Foundation
John Deck is a programmer affiliated with Information Services and Technology and Berkeley Natural History Museums. Contact: jdeck@berkeley.edu
Neil Davies is executive director of the UC Berkeley Gump Station, in Moorea, French Polynesia. Contact: ndavies@moorea.berkeley.edu