
Contents

Program

Abstracts of Presentations

B. S. Manjunath, Center for Bio-Image Informatics, University of California, Santa Barbara (UCSB), USA

Christophe Ambroise, Laboratoire Statistique et Génome, Centre National de la Recherche Scientifique (CNRS), France

Cláudia Bauzer Medeiros, Institute of Computing, University of Campinas (UNICAMP), SP, Brazil

João Eduardo Ferreira, Computer Science Department, Institute of Mathematics and Statistics (IME), University of São Paulo, Brazil

Marta L. Queirós Mattoso (jointly with Jonas Dias and Kary Ocaña), Alberto Luiz Coimbra Institute for Graduate Studies and Research in Engineering (COPPE), Federal University of Rio de Janeiro (UFRJ), Brazil

Susanna-Assunta Sansone, PhD, Principal Investigator, Team Leader, University of Oxford, Oxford e-Research Centre, Oxford, UK

Thom H. Dunning, Jr., National Center for Supercomputing Applications, Institute for Advanced Computing Applications and Technologies, and Department of Chemistry, University of Illinois at Urbana-Champaign

Yan Xu, Microsoft Research, USA

Program
Monday, October 22
8:30   Registration
9:00   Opening
9:30   Presentation FAPESP
10:30  Break
11:00  Talk: M. Mattoso
13:00  Lunch
14:30  Talk: C. B. Medeiros
15:30  Talk: C. Ambroise
16:30  Break
17:00  Talk: C. Ambroise

Tuesday, October 23
9:30   Talk: C. Ambroise
10:30  Break
11:00  Talk: C. Ambroise
12:00  Talk: M. Mattoso
13:00  Lunch
14:30  Talk: M. Mattoso
15:30  Talk: B. S. Manjunath
16:30  Break
17:00  Talk: B. S. Manjunath

Wednesday, October 24
9:30   Talk: T. Dunning
10:30  Break
11:00  Talk: T. Dunning
12:00  Talk: T. Dunning
13:00  Lunch
14:30  Talk: B. S. Manjunath
15:30  Posters (students)
16:30  Break
17:00  Posters (students)

Thursday, October 25
9:30   Talk: J. E. Ferreira
10:30  Break
11:00  Talk: S. Sansone
12:00  Talk: S. Sansone
13:00  Lunch
14:30  Talk: C. B. Medeiros
15:30  Talk: Graduate Programs
16:30  Break

Friday, October 26
9:30   Talk: Y. Xu
10:30  Break
11:00  Talk: Y. Xu
13:00  Lunch

B. S. Manjunath, Center for Bio-Image Informatics, University of California, Santa Barbara (UCSB), USA


Introduction to Bio-Image Informatics. An introduction to the topic; fundamental issues in image and video segmentation and tracking, with examples drawn from recent research. (Lecture time: 2 hours)

Introduction to Bisque: Cyber-Infrastructure for Bio-Image Informatics. A high-level introduction to the open-source Bisque image database platform for managing, processing, indexing and searching bio-images. (Lecture time: 1 hour)
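For readers new to the area, the sketch below illustrates the kind of segmentation problem the first lecture introduces, using scikit-image's bundled sample image. It is a toy example only, assuming nothing about Bisque or the speaker's own methods.

```python
# Illustrative only: global Otsu thresholding plus connected-component
# labelling on scikit-image's sample "coins" image. Not Bisque, not the
# lecturer's method; just the flavour of a basic segmentation step.
from skimage import data, filters, measure

image = data.coins()                       # example grayscale image
threshold = filters.threshold_otsu(image)  # global Otsu threshold
mask = image > threshold                   # binary foreground mask
labels = measure.label(mask)               # connected components = "objects"
print(f"found {labels.max()} candidate objects")
```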

Christophe Ambroise, Laboratoire Statistique et Génome, Centre National de la Recherche Scientifique (CNRS), France
Statistical Models for Biological Network Inference. Gaussian Graphical Models provide a convenient framework for representing dependencies between variables. In this framework, a set of variables is represented by an undirected graph, where vertices correspond to variables and an edge connects two vertices if the corresponding pair of variables is dependent conditional on the remaining ones. Recently, this tool has attracted considerable interest for the discovery of biological networks through l1-penalization of the model likelihood. In this lecture, we introduce various ways of inferring sparse coexpression networks from either steady-state or time-course transcriptomic data. We will focus on inference from samples collected in different experimental conditions and therefore not identically distributed. (Lecture time: 2 x 2 hours)
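As a concrete, hedged illustration of the l1-penalized approach mentioned above (often called the graphical lasso), the sketch below uses scikit-learn's GraphicalLasso on synthetic data; it is not the lecturer's software, and the penalty value is arbitrary.

```python
# Minimal sketch of sparse network inference with an l1-penalized Gaussian
# graphical model (graphical lasso). Synthetic data; alpha chosen arbitrarily.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n_samples, n_genes = 60, 10
X = rng.normal(size=(n_samples, n_genes))   # stand-in for expression data

model = GraphicalLasso(alpha=0.2).fit(X)    # alpha = l1 penalty strength
precision = model.precision_                # estimated inverse covariance
# An edge i-j is present when the corresponding off-diagonal entry of the
# precision matrix is non-zero (conditional dependence given the rest).
edges = np.argwhere(np.triu(np.abs(precision) > 1e-4, k=1))
print(edges)
```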

Cláudia Bauzer Medeiros, Institute of Computing, University of Campinas (UNICAMP), SP, Brazil
The Era of eScience: building the ark during the data deluge. Scientists from all domains (from the mathematical to the social sciences) are collecting enormous amounts of data. These data are captured from a variety of devices (from those aboard satellites to microsensors in embedded systems), but are also produced by experiments, or even social networks. This has originated the so-called "data deluge", sometimes referred to as a "data tsunami", in recognition that a large amount of these data will never be seen or directly managed by humans. eScience has emerged as a branch of science characterized by joint research between computer scientists and scientists from other domains to leverage and accelerate research in those domains, helping scientists to analyze, filter, manipulate, visualize and interpret their data, while at the same time supporting cooperative work. This talk is geared towards discussing a few major trends in eScience research, from a data-centric perspective, with examples from several scientific domains. (Lecture time: 1 hour)

Coping with Digital Preservation: preserving the present to help the future. We daily generate an enormous amount of data - for instance, during bank transactions, phone calls, credit card operations and others. Moreover, there are countless kinds of data linked to us - X-ray images, security videos in stores and banks, radar-triggered photos in streets, and so on. All this information is stored, frequently for several years, and maintained by third parties, given its economic and/or social value. What are we doing, however, with another very valuable kind of data set - the data generated by our research? Our work involves complex models and computational simulations whose intermediate and final results need to be stored. We may archive the most relevant files, but many more data sets are lost, sometimes for lack of adequate procedures, or time, or even appropriate hardware to record the data. This phenomenon is repeated in any context that involves experimental activities, e.g., in biology, chemistry, physics, sociology, anthropology, and so on. Even when all data and models involved in an experiment are recorded, there are other challenges to meet. For instance, how can we ensure that we will be able to retrieve the desired information in the future? And how can we share and disseminate the results of our work? These and other issues are at the origin of digital preservation concerns, which are geared towards investigating new methods, models, algorithms and mechanisms to support data organization, archival and retrieval for long-term accessibility, while at the same time considering issues of quality, reliability and durability. Preservation research can also be applied to corporate or business data, but the problems involved (and their solutions) are not the same. This talk will discuss some of the challenges faced by research in the preservation of experimental research data. (Lecture time: 1 hour)

João Eduardo Ferreira, Computer Science Department, Institute of Mathematics and Statistics (IME), University of São Paulo, Brazil
Transaction Processing for e-Science Applications. The management of molecular and clinical data in e-Science applications has introduced new requirements for database storage and transaction processing systems. Two well-known phrases summarize the e-Science scenario: the first is "Science is becoming data-intensive and collaborative", and the second is "Researchers from numerous disciplines need to work together to attack complex problems; openly sharing data will pave the way for researchers to communicate and collaborate more effectively". These phrases were written by Ed Seidel, acting assistant director for the NSF Mathematical and Physical Sciences directorate. This e-Science scenario shows that we are in a data-deluge age in which transaction processing under a collaborative research perspective is an important computer science challenge. More concretely, in typical e-Science laboratory routines, transaction processing is used in many tests that are performed concurrently and supervised by researchers. New tests are defined frequently, so researchers have to be guided to execute the right task at the appropriate time. Incompatibilities between previous processes and new data requirements make the integration and analysis of available knowledge very difficult. This problem is compounded by the process of scientific knowledge discovery, which requires frequent process updates, collaborative interactions among researchers, and refinement of scientific hypotheses. This scenario requires appropriate transaction processing in order to avoid manual data-handling approaches that quickly become very expensive or simply infeasible. In this talk, we provide a historical perspective and present the main recent challenges and solutions in transaction processing for e-Science applications. (Lecture time: 1 hour)
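As a minimal, hypothetical sketch of why transactional guarantees matter in this setting (not the speaker's system; the schema is invented for illustration), the example below records a test result and updates the experiment's status atomically, so a failure cannot leave the two tables inconsistent.

```python
# Hypothetical e-Science bookkeeping: result insertion and status update
# must both happen or neither (atomicity). Uses Python's built-in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE result (exp_id INTEGER, value REAL)")
conn.execute("INSERT INTO experiment (id, status) VALUES (1, 'running')")
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on any error
        conn.execute("INSERT INTO result (exp_id, value) VALUES (1, 0.42)")
        conn.execute("UPDATE experiment SET status = 'done' WHERE id = 1")
except sqlite3.Error:
    # on failure neither the result row nor the status change is visible
    pass

print(conn.execute("SELECT status FROM experiment WHERE id = 1").fetchone())
# -> ('done',) if the transaction committed, ('running',) if it rolled back
```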


Marta L. Queirós Mattoso (jointly with Jonas Dias and Kary Ocaña), Alberto Luiz Coimbra Institute for Graduate Studies and Research in Engineering (COPPE), Federal University of Rio de Janeiro (UFRJ), Brazil
Exploring Provenance Data in High Performance Scientific Computing. Large-scale scientific computations are often organized as a composition of many computational tasks linked through data flow. After the completion of a computational scientific experiment, a scientist has to analyze its outcome, for instance by checking inputs and outputs along the computational tasks that are part of the experiment. This analysis can be automated using provenance management systems that describe, for instance, the production and consumption relationships between data artifacts, such as files, and the computational tasks that compose the scientific application. Due to their exploratory nature, large-scale experiments often involve iterations that evaluate a large space of parameter combinations. In this case, scientists need to analyze partial results during execution and dynamically intervene in the next steps of the simulation. Features such as user steering of workflows to track, evaluate and adapt the execution need to be designed to support iterative methods. In this course we define basic concepts of scientific workflows and provenance data. We will show examples of scientific workflows in the bioinformatics domain. We briefly describe how the provenance of many-task scientific computations is specified and coordinated by current workflow systems on large clusters and clouds. We discuss challenges in gathering, storing and querying provenance in high performance computing environments. We also show how provenance can enable useful runtime queries that correlate computational resource usage, scientific parameters, and data set derivation. (Lecture time: 2 x 2 hours)
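To make the idea of querying provenance concrete, here is a minimal, hypothetical sketch: a toy relational provenance store (task and produced-file relations, invented for this example and not tied to any particular workflow system) queried to find which output files were derived from runs with a given parameter value.

```python
# Hypothetical provenance query: which output files came from tasks that
# used a parameter value above a threshold? Schema invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT, param_value REAL);
CREATE TABLE produced (task_id INTEGER, file TEXT);
INSERT INTO task VALUES (1, 'align', 0.9), (2, 'align', 0.5);
INSERT INTO produced VALUES (1, 'out_a.bam'), (2, 'out_b.bam');
""")

rows = db.execute("""
    SELECT t.name, p.file
    FROM task t JOIN produced p ON p.task_id = t.id
    WHERE t.param_value > 0.8
""").fetchall()
print(rows)   # -> [('align', 'out_a.bam')]
```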


Susanna-Assunta Sansone, PhD, Principal Investigator, Team Leader, University of Oxford, Oxford e-Research Centre, Oxford, UK
The Buzz Around Reproducible Bioscience Data: the policies, the communities and the standards. Increased availability of the bioscience data generated is fuelling increased consumption, and a cascade of derived datasets that accelerate the cycle of discovery. But the successful integration of heterogeneous data from multiple providers and scientific domains is already a major challenge within academia and industry. Even when datasets are publicly available, published results are often not reusable due to incomplete description of the experimental details. In the last decade, several data preservation, management and sharing policies and plans have emerged in response to increased funding for high-throughput approaches in genomics and functional genomics bioscience [1]. A growing number of community-based initiatives have developed minimum reporting guidelines, terminologies and formats (referred to generally as community standards) [2] to structure and curate datasets, enabling data annotation to varying degrees; other efforts work to maximize the interoperability among these standards [e.g. 3, 4]. Researchers and bioinformaticians in both academic and commercial bioscience, along with funding agencies and publishers, embrace the concept that standards are pivotal to enriching the annotation of the entities of interest (e.g., genes, metabolites) and the experimental steps (e.g., provenance of study materials, technology and measurement types), to ensure that shared investigations are comprehensible and (in principle) reproducible. But despite all these efforts, in practice data sharing is challenging [5]. Vast swathes of bioscience data still remain locked in esoteric formats, are described using ad hoc or proprietary terminology [e.g. 6], or lack sufficient contextual information; many tools do not implement standards even where these exist; and the sheer number of domain-specific reporting standards in some areas, together with their incompleteness or absence in others, poses further major challenges. My presentation will provide a snapshot of the current situation. I will highlight a number of stories, the social engineering side and also key challenges, enriched by my experience over the last decade of working with a variety of stakeholders, including bioscience researchers, bioinformaticians, developers in the public and private sectors, standards-developing communities, as well as funders and publishers.
(Lecture time: 1 hour)

References
1. Field D*, Sansone SA*, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K, Kolar P, Kolker E, Maxon M, Millard S, Mugabushaka AM, Perrin N, Remacle JE, Remington K, Rocca-Serra P, Taylor CF, Thorley M, Tiwari B, Wilbanks J: Megascience. 'Omics data sharing. Science 326(5950):234-236 (2009)
2. List of standards at BioSharing: www.biosharing.org
3. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ; OBI Consortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 25(11):1251-1255 (2007)
4. Taylor CF*, Field D*, Sansone SA*, Aerts J, Apweiler R, Ashburner M, Ball CA, Binz PA, Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch EW, Fiehn O, Fostel J, Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy NW, Hermjakob H, Julian RK Jr, Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper M, Le Novère N, et al.: Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol 26(8):889-896 (2008)
5. Sansone SA and Rocca-Serra P: On the evolving portfolio of community standards and data sharing policies: turning challenges into new opportunities. GigaScience 1:10 (2012)

6. Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M, Cantor M, Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T, Wilson J, Lynch N, Wise J, Dix I: Empowering industrial research with shared biomedical vocabularies. Drug Discov Today 16(21-22):940-947 (2011)

The Reality From the Buzz: how to deliver reproducible bioscience data. In this unsettled status quo - presented in my first talk - how can we enable bioscience researchers to make use of existing community standards and maximize data sharing and the subsequent reuse of richly annotated experimental information? A successful example is provided by the Investigation/Study/Assay (ISA) [1] open-source, metadata-tracking framework developed and supported by the growing ISA Commons community [2]. The ISA framework includes both a general-purpose file format and a software suite to tackle the harmonization of the structure of bioscience experimental metadata (e.g., provenance of study materials, technology and measurement types, sample-to-data relationships) by enabling compliance with community standards. This example illustrates how the synergy between research and service groups in academia (e.g., at Harvard [3] and at the European Bioinformatics Institute [4]) and in industry (e.g., at the Novartis Institutes for BioMedical Research and at Janssen Pharmaceuticals, a company of Johnson & Johnson), across a variety of life science domains, is pivotal to building a network of data collection, curation, and sharing solutions that progressively enable the invisible use of standards. I will present the rationale behind the collaborative development and the evolution of this exemplar ecosystem of data curation and sharing solutions, built on the common ISA framework. I will also provide high-level examples of how this is used to collect, curate and manage heterogeneous experimental metadata in an increasingly diverse set of domains including environmental health, environmental genomics, metabolomics, (meta)genomics, proteomics, stem cell discovery, systems biology, transcriptomics, toxicogenomics, etc. I will also discuss the lessons learned by my team, our collaborators and the growing user community about the usability of the community standards, and provide an update on the next steps to develop user-friendly visualization functionalities and use semantic web approaches to make existing knowledge available for linking, querying, and reasoning. (Lecture time: 1 hour)
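As a purely illustrative sketch of the kind of tab-delimited, sample-to-data metadata the ISA framework structures (the column names below are chosen for this example and should not be read as the ISA-Tab specification), a small study table can be processed as follows.

```python
# Illustrative only: reading a tiny tab-delimited sample-to-data table of the
# general kind the ISA framework standardizes. Columns are example names.
import csv, io

study_tsv = (
    "Sample Name\tOrganism\tAssay Name\tRaw Data File\n"
    "sample_1\tHomo sapiens\tassay_1\ts1_run1.fastq\n"
    "sample_2\tHomo sapiens\tassay_2\ts2_run1.fastq\n"
)

for row in csv.DictReader(io.StringIO(study_tsv), delimiter="\t"):
    print(row["Sample Name"], "->", row["Raw Data File"])
```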

References
1. Rocca-Serra P, Brandizi M, Maguire E, Sklyar N, Taylor C, Begley K, Field D, Harris S, Hide W, Hofmann O, Neumann S, Sterk P, Tong W, Sansone SA: ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics 26(18):2354-2356 (2010); isa-tools.org
2. Sansone SA*, Rocca-Serra P*, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo CT, Forster MJ, Gaudet P, Gilbert J, Goble C, Griffin JL, Jacob D, et al.: Toward interoperable bioscience data. Nat Genet 44(2):121-126 (2012); isacommons.org
3. Ho Sui SJ, Begley K, Reilly D, Chapman B, McGovern R, Rocca-Serra P, Maguire E, Altschuler GM, Hansen TA, Sompallae R, Krivtsov A, Shivdasani RA, Armstrong SA, Culhane AC, Correll M, Sansone SA, Hofmann O, Hide W: The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons. Nucleic Acids Res 40(Database issue):D984-D991 (2012); discovery.hsci.harvard.edu
4. Haug K, Salek R, Conesa P, Hastings J, de Matos P, Rijnbeek M, Mahendraker T, Williams M, Neumann S, Rocca-Serra P, Maguire E, Gonzalez-Beltran A, Sansone SA, Griffin J, Steinbeck C: MetaboLights: an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Res (in review); www.ebi.ac.uk/metabolights


Thom H. Dunning, Jr., National Center for Supercomputing Applications, Institute for Advanced Computing Applications and Technologies, and Department of Chemistry, University of Illinois at Urbana-Champaign
Scientific Computing in Science and Engineering. Computational modeling and simulation is among the most significant developments in the practice of scientific inquiry in the 20th century. Modeling and simulation are now contributors to essentially all scientific and engineering research programs and are finding increasing use in a broad range of industrial applications. The use of computing technology is now spreading to the observational sciences, which are being revolutionized by the advent of powerful new sensors that can detect and measure a wide range of physical, chemical and biological phenomena. Massive digital detectors in a new generation of telescopes have turned astronomy into a digital science. Sensor arrays for characterizing ecologies and new sequencing instruments for genomics research are revolutionizing the biological sciences. This lecture will discuss the elements of computational modeling and simulation as well as the emerging area of data-driven science, and will discuss the impact of these new approaches in a few fields, while also drawing on the lecturer's experiences in chemistry. (Lecture time: 1 hour)

Technology Trends and the Future of High Performance Computing. Computing technologies are undergoing a dramatic transition. Because of physical limitations, the computational power of a single microprocessor core, the basis of all computing systems from laptops to supercomputers, has stopped increasing. Dual-core systems were introduced in 2005, quad-core chips in 2007, and eight-core chips are now available from many vendors. This trend will continue into the future, with the number of cores on a chip continuing to increase. In fact, the use of innovative computing technologies based on many-core chips, e.g., NVIDIA GPUs, is now being seriously explored in many areas of scientific computing. This technology shift presents a challenge for computational science and engineering: the only significant performance increases in the future will come through the increased exploitation of parallelism. Although these technologies promise to bring petascale computers into researchers' institutions, and even their laboratories, computers built on these technologies have significant implications for the design of the next generation of science and engineering applications. This lecture will provide an overview of the directions in computing technologies and describe the challenges associated with exploiting these new technologies in computational science and engineering. (Lecture time: 1 hour)

Blue Waters: overview of a sustained petascale computing system. A new generation of supercomputers, petascale computers, is providing scientists and engineers with the ability to simulate a broad range of natural and engineered systems with unprecedented fidelity. Just as important in this increasingly data-rich world, these new computers allow researchers to manage and analyze unprecedented quantities of data, seeking connections, patterns and knowledge. The impact of this new computing capability will be profound, affecting science, engineering and society. The National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign is deploying a computing system that can sustain one quadrillion calculations per second on a broad range of science and engineering applications as well as manage and analyze petabytes of data. This computer, Blue Waters, has been configured to enable it to solve the most compute-, memory- and data-intensive problems in science and engineering. It will have tens of thousands of chips (CPUs and GPUs), petabytes of memory, tens of petabytes of disk storage, and hundreds of petabytes of archival storage. The presentation will describe Blue Waters and illustrate the role that it will play in a few illustrative areas of research. (Lecture time: 1 hour)
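To make the point about parallelism tangible, here is a toy sketch using Python's multiprocessing module, chosen only for brevity; it illustrates spreading independent work across cores, not the programming models actually used on systems like Blue Waters (MPI, OpenMP, CUDA and similar).

```python
# Toy illustration: with single-core speeds flat, throughput comes from
# running independent tasks on multiple cores.
from multiprocessing import Pool

def simulate(seed: int) -> float:
    """Stand-in for one independent simulation task."""
    x = 0.0
    for i in range(100_000):
        x += ((seed + i) % 7) * 1e-6
    return x

if __name__ == "__main__":
    with Pool() as pool:                       # one worker per available core
        results = pool.map(simulate, range(8)) # run 8 tasks in parallel
    print(sum(results))
```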


Yan Xu, Microsoft Research, USA


Open Data for Open Science. Part 1: Tools for data scientists. An introduction to some of the most cutting-edge Microsoft technologies that help scientists discover, access, consume, and share scientific data. Part 2: Demos of data tools from Microsoft. Demonstrations of how to create solutions using the tools presented in Part 1, with real-world scenarios and data. Attendees may bring their Windows PCs to follow the demos and create data visualization samples with their own environmental research data in WorldWide Telescope (http://www.worldwidetelescope.org), sharing the results on Layerscape (http://www.layerscape.org).

