Expanding roles in a library-based bioinformatics service program: a case study.

INTRODUCTION

Rapid advances in molecular technologies during the past 2 decades have generated an unprecedented quantity of biomedical data. An arsenal of bioinformatics databases and software tools has been, and continues to be, created to assist researchers in analyzing, manipulating, and interpreting these data. As reported in Nucleic Acids Research, the number of online molecular biology databases has increased from 58 in 1996 to 1,512 in 2012, a 26-fold increase [1]. Data analysis tools have increased over the same period at an even higher rate [2]. Understandably, investigators are focused on their own research, and they and their teams find it exceptionally difficult to keep up with the latest research analysis tools, even in their own specialized fields. Furthermore, the interdisciplinary nature of bioinformatics compounds this problem by increasing the complexity and expanding the knowledge base needed to take advantage of the capabilities of these resources. To complete their work, university researchers require access to bioinformatics tools and support in identifying and using these resources.

The health sciences library is a logical provider of bioinformatics resources and services in that it exists to meet the multidisciplinary information needs of all members of the university's research community. Instruction, consulting services, and licensing support are provided by libraries to faculty, students, and staff at no charge. No other academic unit has this same institutional mission.

The first library-based bioinformatics service program was launched in 1995 by the University of Washington Health Sciences Library. This program pioneered the use of a bioinformatics specialist with a doctoral degree in molecular and cellular biology to staff the program. Provided services included consultation, education, and training on bioinformatics tools; access to networked biological information systems; and development of a web portal to bioinformatics resources [3]. Over the next decade, similar programs were instituted at approximately twenty additional academic health sciences libraries [4-6]. In many cases, responsibility for the centralized licensing of commercial database suites and/or software analysis tools for researchers was also assumed by the library.

This paper offers a case study of the implementation and growth of the Bioinformatics Service Program at the University of Southern California's (USC's) Norris Medical Library (NML), and the library's actions to reassess its users' bioinformatics needs, secure funding, and expand the program to meet the growing and evolving needs of the university's research community.

ESTABLISHING THE PROGRAM

Although USC's NML had been providing individual training and workshops in the areas of molecular biology and genetics since 1990, the growing complexity of researcher needs and resources made it clear that the services provided by the NML librarian needed to be carried out by an individual with a strong background in the biosciences. In 2005, the library hired an individual with a doctoral degree (PhD) in life science to fill a vacant reference librarian position, and the Bioinformatics Service Program was established. Similar to other existing health sciences library bioinformatics programs, the primary focus of the new program at USC was on providing a consulting service, training workshops, and a web portal for bioinformatics resources [7].

To promote its services and develop a client base for the library-based program, the bioinformatics specialist compiled an email list consisting of over 1,000 faculty members, postdocs, and staff researchers who are active in biomedical and life sciences research. The list, including updates, has been routinely used to disseminate announcements on new services and training workshops. Presentations were given at several departments or schools by invitation from researchers or faculty who had utilized the bioinformatics program. These presentations consisted of either overviews of the service and software collection or training workshops on specific research topics.

In 2008, in response to the growing demand for access to high-quality data analysis tools, the library purchased single seats for 3 commercial resources and provided free access to them for USC users. Commercial bioinformatics software programs are relatively easy to use and have reliable user support, and many contain high-quality, human-curated, and proprietary-knowledge content. The programs selected to address the major user needs were: Partek Genomics Suite for microarray data analysis, Ingenuity Pathway Analysis (IPA) for protein interaction network and pathway analysis, and Vector NTI Advance for molecular sequence analysis. In addition to their functions and scientific merits, these specific software tools were selected on the basis of feedback from USC users who attended vendor demonstrations set up by the bioinformatics specialist. During the first 2 years following the software purchase, more than 400 users registered for access to the software. Also in 2008, the bioinformatics specialist began to perform in-depth data analysis for researchers and, as a result, coauthored papers with more than a dozen faculty members. These collaborations served to further increase awareness of and the reputation of the program.

By 2009, the demand for bioinformatics services could no longer be met by one bioinformatics specialist. The volume of questions grew while the requests became more diverse and complex. Many requests for consultations, training, and data analysis assistance had to be delayed or turned away due to time constraints. At the same time, access to the small number of USC-licensed commercial software programs became increasingly competitive due to the limited number of licensed "seats."

REASSESSING NEEDS FOR BIOINFORMATICS SUPPORT

The existing service program was no longer sufficient to meet the needs of the USC research community, and it was imperative that additional funding be found to sustain and grow the program. To better understand what users needed and assess the usefulness of existing service offerings, a survey was sent to over 1,000 biomedical researchers through the program's mailing list in May 2010 (Appendix A, online only). A total of 254 researchers completed the survey, with faculty members as the largest percentage of respondents (38%), followed by graduate students (34%) and postdoctoral scholars (19%). Survey results showed that workshop training and in-person consultations were considered the most useful services provided by the program, although all services were highly rated (Table 1). All 3 commercial tools were rated by the majority (>60%) as "extremely useful" or "potentially useful," with IPA as the top rated tool (Table 2). At least 20%-40% of users had difficulty accessing the software tools due to limited availability (Table 3).

Strong interest was expressed in obtaining support for various types of data analysis. Most services were seen as desirable by users. All but one of the analysis needs listed were ranked as very useful or extremely useful by a majority of respondents (Figure 1).

The survey results confirmed an overwhelming need for bioinformatics support at USC and clearly showed that the greatest unmet need was for high-throughput data analysis support (related categories marked in Figure 1). Results also revealed that 35% of responding researchers had data they could not analyze because they lacked access to the appropriate software tools, and that 58% had data they could not analyze because they lacked sufficient training in using the appropriate tools. Respondents also remarked on the staffing shortage in NML's bioinformatics service and on the need to increase the size and scope of the licensed software collection.

RESOLVING FUNDING ISSUES

For the first four years, the bioinformatics program was funded wholly by NML. By reallocating personnel costs through staff reorganization, the personnel cost for one bioinformatics specialist was covered. The library budget also was used to make a modest investment in three bioinformatics software licenses for the university. This proved to be a critically important factor for later funding requests: statistics collected during this time were later used to demonstrate high usage and demand for the tools and the service as a whole. In addition, prominent faculty members and researchers who depended on the program's resources during this period provided strong letters of support and advocated for the program when broader funding from the university was sought.

Two primary sources for additional support were targeted: (1) the USC Libraries, which has a separate budget from the health sciences libraries, and (2) USC's Office of Research. Because the bioinformatics program has an interdisciplinary focus and its user community encompasses disciplines throughout the university, it made sense for the USC Libraries to contribute funding. The Office of Research has at its core mission the charge to grow research and provide support to grant-seeking faculty. The essential role of bioinformatics in research is underscored by the fact that using the proper software tool can save months of unnecessary, or flawed, analytical work. Consulting with a bioinformatics specialist can easily make the difference between whether or not researchers receive funding for their projects, publish their findings in top-tier journals, or make discoveries that will have a significant impact on patients' lives. The Office of Research viewed the NML Bioinformatics Service Program as a useful partner in furthering its own mission. After four years of experience demonstrating how the program could significantly benefit the university community and persistent advocacy efforts by the library director and bioinformatics specialist, both of these institutional entities eventually decided to provide a significant level of financial support. A summary of program expenses and sources of funding is provided in Table 4.

ENHANCING PROGRAM SERVICES

In 2010, using funds provided by the university library, an additional bioinformatics specialist with a dual master's degree in bioinformatics and biochemistry was hired. Additional seats were purchased for the three software programs already licensed for the program, and licenses for five additional commercial resources were purchased using the funds made available from the university's Office of Research. The expanded software collection, currently consisting of eight resources (Table 5, online only), and the increased number of seats allowed the NML Bioinformatics Service Program to better address the constellation of data analysis needs of its users, as the total number of registered software users tripled in the following two years (Figure 2, available only online).

Prior to 2009 the NML Bioinformatics Service Program held in-house workshops featuring multiple resources suitable for a specific research topic in order to help users identify the most appropriate tools to use for their research. Although useful, these workshops often lacked detailed instructions for utilizing the presented resources. Realizing that more thorough instruction was vital for getting researchers started in using a new tool, the bioinformatics specialists sought a way to incorporate in-depth, resource-oriented training into service offerings. Starting in 2009, outside trainers with extensive expertise, such as software field application specialists or program developers, were invited to conduct on-site or webinar training sessions. The training costs were included in the annual fee negotiated for licensing the software. These sessions elaborated on the key functionalities and latest features of a software tool and provided detailed instructions on using the tool to perform specific analyses. To enhance the user experience, live demonstrations and hands-on practice sessions were included whenever possible. The invited trainer workshops have appealed to a large audience over the past two years (Figure 3). By leveraging outside expertise for user training, a considerable amount of the bioinformatics specialists' time and effort has been saved.

In addition to live training, the bioinformatics specialists developed online information portals for key software programs and in-house workshops to promote self-training. These portals included general information, training information (schedules, presentation slides, handouts, and recordings), links to online tutorials and frequently asked questions (FAQ) pages, and contact information for technical support. A total of 15 subject guides have been developed since 2010, including 12 guides on software tools and 2 on training workshops. One guide serves as a "multimedia classroom" to host all workshop sessions that have been previously recorded for later viewing [8]. These guides receive approximately 800-1,000 visits per month.

RESPONDING TO NEXT GENERATION SEQUENCING

Evolving biotechnologies markedly shifted the areas in which bioinformatics support was needed. One noticeable change was the adoption of next generation sequencing (NGS) in biomedical research, which measures genome-wide biomolecular changes at single-nucleotide resolution [9-12]. Unlike previous-generation techniques, NGS generates enormous and extremely complex datasets, requiring greater bioinformatics expertise as well as substantial computational power [13].

While workshops and consultations help elucidate the principles of NGS data analysis, users remain unable to perform data analysis without access to appropriate software or adequate high-performance computers. A short survey sent to the program's mailing list in 2011 showed that 53% (n=38) of respondents reported a concern about "lack of expertise and software tools." Notably, 46% of respondents reported they expected to have NGS data in the next 6-12 months.

To address this emerging need, the NML Bioinformatics Service Program sought a combined software and hardware solution for NGS data analysis. After reviewing several commercially available NGS data analysis programs on the basis of their functionality, performance, and usability, two software suites were selected: the proprietary Partek Flow product and the open-access Galaxy suite. Both software suites integrate multiple tools for analyzing various NGS data and have implemented a user-friendly interface to facilitate the use of these command-line tools by biologists.

Computing resources suitable for performing NGS analysis were then investigated. Institutional bioinformatics cores have implemented powerful computer workstations [14, 15], computer clusters [16-18], and cloud-based NGS solutions [19]. Computer clusters and cloud-based NGS solutions provide ample computational power, whereas computer workstations have only moderate computational power; however, workstations may be more flexible and accessible. When preinstalled with bioinformatics tools, workstations can serve as walk-in or remote-access stations for various bioinformatics applications in addition to performing NGS analysis [14, 20]. After weighing the advantages and disadvantages and consulting extensively with USC's High-Performance Computing and Communication (HPCC) [21], the library purchased two computer workstations for its Bioinformatics Computation and Consulting Center and five computer nodes in HPCC (configured into a custom cluster named HPCC-NML) [22, 23].

Once the software and hardware were in place, a student intern with shell scripting and high-performance computing experience was recruited to assist with software installation and configuration. The bioinformatics specialists then tested the implementation extensively using datasets provided by USC researchers. From their establishment in October 2012 through March 2013, the workstations and HPCC-NML averaged more than 400 hours of use per week on a 24/7 basis; most usage was remote. Usage is growing quickly as users produce more data.

FACILITATING RESEARCH COLLABORATION

Bioinformatics has become an indispensable component of increasingly interdisciplinary biomedical research. Typically, bench-top researchers rely on personal communication to identify bioinformatics researchers for possible collaborations. As biomedical researchers generate more genomic data with increasing complexity, their bioinformatics needs become more diverse and specific, and the traditional method of identifying and establishing bioinformatics collaborations is no longer effective. This situation provides an opportunity for bioinformatics service programs to serve as a conduit for identifying campus collaborators.

Through years of daily consultations and collaborative data analysis projects, the library-based bioinformatics program has gathered considerable awareness of research projects and expertise at USC. Bioinformatics clients have been referred to other service providers on campus when their expertise was relevant. With firsthand information on both bioinformatics needs and offerings at USC, the two library-based bioinformatics specialists were in a natural position to promote on-campus collaborations. To exploit this role, the NML Bioinformatics Service Program sponsored half-day, campus-wide collaboration symposia, which were held to facilitate exploration of potential collaborations.

The first symposium, "Navigating an Ocean of 'Omics Data with Bioinformatics/Biostatistics Collaborations," was held in 2011 to promote collaboration between university labs. Fourteen high-profile speakers--including biomedical researchers, computational biologists, and biostatisticians--were invited to present 30-minute talks. The focus of the talks differed between the bioinformatics clients and providers: biomedical researchers introduced successful collaborative experiences, whereas computational biologists and biostatisticians showcased cutting-edge methods developed to analyze real-life data. The collaboration symposium was attended by 183 registrants, 73 (40%) of whom were faculty members.

"Resources for Next-Generation Sequencing" was held in 2012, focusing on raising awareness of various bioinformatics services and promoting lab-central service collaborations. Representatives of 5 major NGS service providers at USC--including sequencing core facilities, bioinformatics and biostatistics services, and the USC Clinical and Translational Science Institute--were invited to provide an overview of their services. Each presenter introduced the missions, types of services, and charging models (if any) for their services and discussed the concerns that affect NGS experimental design and data analysis. The 2012 NGS symposium was attended by more than 120 registrants, 43 (36%) of whom were faculty members.

Statistics plays an important role in experimental design and data analysis. In 2012, the NML Bioinformatics Service Program supplemented its range of services by providing access to free statistical consulting for users with general statistical questions. Although USC's Information and Technology Service had been offering this service on the university park campus for some time, no site on the health sciences campus had been designated for holding these consultations. Once the PhD consultant began to offer biweekly sessions in the NML Bioinformatics Computation and Consulting Center, the service became heavily used with an average of nine users and more than seven hours of consultation each week.

LESSONS LEARNED

Based on our experience, we believe that it is critically important to staff library-based bioinformatics service programs with individuals who possess a strong science background at the graduate degree level and, preferably, practical research experience. The library-based bioinformatics specialist must have the ability to communicate effectively in the language of the researcher.

Conducting a comprehensive needs assessment has proved to be an effective method for the bioinformatics program to gauge the needs of potential users. Results of the needs assessment revealed the problems faced by researchers and helped to define the services and resources that the program would use to address these problems, such as increasing the focus on high-throughput data (including NGS) analysis.

Commercially available bioinformatics software and analytic tools are expensive to license. In some library-based programs, fees are charged for access to licensed bioinformatics resources. To date, NML has elected not to pursue a fee-based model. In the view of the library, fees create barriers for many researchers and small labs that do not have the same ability as well-funded labs and grant-supported faculty to pay for access to resources. The library takes the position that these costs should be part of the institutional infrastructure [24]. As with other resources (journals, books, databases) provided by the library to support the educational, research, and clinical needs of its users, the bioinformatics tools should be freely available to all including graduate students, postdocs, and others who might be unable to access them otherwise.

As with traditional library resources, bioinformatics tools require personnel to ensure the resources are acquired through site licenses, promoted, and used effectively. For researchers to select the most appropriate tool and apply it to their data analysis, it is essential that they also receive appropriate educational support. The library is the only unit in the university setting with the mission to provide each of these critical service roles. However, to be a strong partner in furthering these research goals of the university, the library must also have the institution's commitment to provide additional financial support.

Establishing a bioinformatics service program does not ensure its usage. To be successful, library-based programs must put substantial effort into outreach activities [3, 4, 25]. Conducting regular consultations, attending in-house research presentations, and organizing campus-wide events around library-based bioinformatics resources have created excellent opportunities for the bioinformatics specialists to interact with users.

Although a library-based bioinformatics service is only one of many bioinformatics support providers in the institution, its unique service-oriented mission places it in an important position of having extensive knowledge of both the bioinformatics needs and the bioinformatics resources available at the university. This extensive knowledge allows the library-based bioinformatics staff to effectively promote intra-institute collaboration by matching individuals with similar research interests and connecting researchers with services and tools dispersed throughout campus. The popularity of the two symposia organized by the NML Bioinformatics Service Program to encourage collaboration serves as confirmation of the value that the research community places on identifying on-campus collaborators.

Library-based bioinformatics service programs require substantial commitments on the part of the library and the institution. Despite the required efforts and the major commitment of needed resources, it is our belief that the benefits far outweigh the costs of such a program. Continual feedback from USC researchers suggests that the NML bioinformatics program is one of the most significant contributions the library has made to the work of the research community at USC. Researchers with access to appropriate bioinformatics resources and training on how to use them effectively are in a position to significantly shorten the data analysis cycle, work with a much wider range of data, and increase their competitiveness for grant applications.

Our ultimate goal is to shape the NML-based bioinformatics program into an indispensable component of the USC research community. To meet this challenge, the library's bioinformatics specialists will continue to support the university's researchers in infusing bioinformatics tools and solutions into their daily routine to promote research efficiency, sharpen research focus, and polish research hypotheses.

DOI: http://dx.doi.org/10.3163/1536-5050.101.4.012

APPENDIX

Survey

1. Status:

* Faculty

* Postdoc

* Graduate student

* Lab/tech personnel

* Other, please specify--

2. School: (check more than one if you have a joint appointment)

* Keck School of Medicine

* School of Pharmacy

* School of Dentistry

* College of Letters, Arts, and Sciences

* Viterbi School of Engineering

* Other, please specify--

3. Identity:

Name--

Email address--

4. General assessment of data analysis needs

Please indicate how important the following data analysis areas are for your research needs:

                                          Not                       Extremely
                                          useful   2    Useful  4   useful
Statistical analysis of high-throughput   ()       ()   ()      ()  ()
data (e.g., microarray data)

Optional comments:--

DNA/protein sequence manipulation         ()       ()   ()      ()  ()
and analysis

Optional comments:--

SNP, genetic variation and genomewide     ()       ()   ()      ()  ()
association data analysis

Optional comments:--

Integrated searches of literature and     ()       ()   ()      ()  ()
high-throughput data

Optional comments:--

Functional analysis of high/low-          ()       ()   ()      ()  ()
throughput data

Optional comments:--

Signaling and metabolic pathway           ()       ()   ()      ()  ()
analysis

Optional comments:--

Transcription factor and gene             ()       ()   ()      ()  ()
regulatory sequence analysis

Optional comments:--


5. Additional data analysis needs not listed above:

1--

2--

3--

4--

5--

Assessment of existing bioinformatics software licensed by USC

Please review the purpose and capabilities of the following three commercial software suites/products that have been licensed by the Norris Medical Library and indicate their usefulness for your research efforts.

6.1. Ingenuity Pathways System

What is it?

A literature and knowledge-based software designed for comprehensive functional analysis and interpretation of various 'omics data at the systems biology level.

What is it used for?

* Analysis of data derived from gene expression and SNP microarrays, metabolomics and proteomics experiments, and smaller scale experiments that generate gene lists

* Identify signaling and metabolic pathways, molecular networks, and biological processes that are most significantly perturbed in a dataset of interest

* Gain biological insight into cell physiology and metabolism from metabolite data

* Analyze toxicity and safety of candidate compounds to better understand pharmacological response, drug mechanism of action, and mechanism of toxicity

* Identify the most promising and relevant biomarker candidates within experimental datasets

* Transform customized networks and pathways into publication-quality pathway graphics

Not useful for    Potentially useful   Extremely useful to   Have not used but
my research       to my research       my research           would be useful

()                ()                   ()                    ()


7. Currently, there is 1 concurrent user seat available for Ingenuity Pathways System. Have you had difficulty scheduling a time for accessing the software?

* No

* Yes, occasionally

* Yes, frequently

* Not applicable ... it is not useful for me

* Other, please specify--

8. Optional comments on Ingenuity Pathways System:--

9.2. Partek Genomic Suite

What is it?

A software program designed for statistical analysis of high-throughput array and next-generation sequencing data.

What is it used for?

* Expression microarray data analysis for custom and major array platforms

* Exon-level expression data analysis for the detection of alternative splice expression patterns

* Copy number analysis for identifying regions of LOH amplification/deletion

* ChIP-Seq and RNA-Seq analysis for next gen sequencing data

* Promoter tiling array data analysis for ChIP-on-Chip application

* Genotyping and SNP association data analysis

* MicroRNA array analysis with integration of mRNA data

Not useful for    Potentially useful   Extremely useful to   Have not used but
my research       to my research       my research           would be useful

()                ()                   ()                    ()


10. Currently, there is 1 concurrent user seat available for Partek Genomic Suite. Have you had difficulty scheduling a time for accessing the software?

* No

* Yes, occasionally

* Yes, frequently

* Not applicable ... it is not useful for me

* Other, please specify--

11. Optional comments on Partek Genomic Suite:--

12.3. Vector NTI Advance 11 Suite

What is it?

A suite of applications for routine nucleotide/protein sequence analysis, manipulation, and management.

What is it used for?

* Perform common sequence analysis and manipulation tasks

* Generate recombinant cloning strategies and protocols

* Design and analyze PCR primers

* Assemble DNA sequences

* Annotate DNA/protein functions

* Perform multiple sequence alignments on proteins and nucleic acids

* Store and manage databases of molecules, enzymes, citations, BLAST results, etc.

* Create publication quality graphics

Not useful for    Potentially useful   Extremely useful to   Have not used but
my research       to my research       my research           would be useful

()                ()                   ()                    ()


13. Currently, there is 1 concurrent user seat available for Vector NTI Advance 11. Have you had difficulty scheduling a time for accessing the software?

* No

* Yes, occasionally

* Yes, frequently

* Not applicable ... it is not useful for me

* Other, please specify--

14. Optional comments on Vector NTI Advance 11 Suite:--

Assessment of potential software suites for licensing

Please review the purpose and capabilities of the following five commercial software products and indicate their usefulness for your research efforts.

15.1. BIOBASE SUITE

What is it?

A suite of bioinformatics software and databases for in-depth functional analysis of various high/low-throughput data. It includes: BKL Proteome Database, BKL TRANSFAC Professional Databases, TRANSPATH Professional Database, Explain Analysis System, and Human Gene Mutation Database.

What is it used for?

BKL Proteome

* Multi-facet functional analysis of molecular and genomic data for multiple model organisms at systems biology level

* Integrated analysis of associations among key biomedical concepts such as disease-biomarkers, drug-proteins

* Search existing comprehensive knowledge on proteins

TRANSFAC Pro

* Search the most comprehensive collection of annotated eukaryotic gene-regulation data

* Search known and predicted factor binding sites and composite elements

* Analyze DNA sequences for transcription factor binding sites using Match and Patch, matrix, and pattern-based search tools

* Find in-vivo binding sequences from ChIP-on-chip and ChIP-seq experiments

* Pathologically relevant mutations in transcription factors or their binding sites

* Visualize gene-regulatory networks

* Search for miRNAs and their target sequences

* Search for extensive structural and functional information on transcription factors including our transcription factor classification

TRANSPATH Pro

* Search mammalian protein-protein interactions and signaling/metabolic pathways

* Analyze array data to build and visualize potential pathways using PathwayBuilder and ArrayAnalyzer

Explain Analysis System

* Integrated functional analysis of high-throughput data from microarray, proteomic, and ChIP-chip experiments in over 200 model organisms

* Systematically create experimentally testable hypotheses for gene transcription regulation and signaling networks

* Map putative transcription factor (TF) binding sites on promoters in focus

* Find TFs responsible for common regulation of differentially expressed genes

* Identify key molecules upstream of TFs that might be responsible for the coordinated regulation of the suggested TFs

Human Gene Mutation Database Pro

* Search annotated information on 93,000 inherited disease-relevant mutations and polymorphisms in over 3,500 human genes

* Search for specific genes, disease states, mutations, and literature references utilizing a wide range of options

* Obtain gene/mutation summaries containing disease mutation and clinical/laboratory phenotype data along with expert annotated mutation sequence information

Not useful for    Potentially useful   Extremely useful    Have not used but
my research       to my research       to my research      would be useful

()                ()                   ()                  ()


16. Optional comments on BIOBASE:--

17.2. GenomatixSUITE PE

What is it?

A suite of bioinformatics software and databases for multilayer in-depth functional analysis of various high/low-throughput data. It includes: BiblioSphere PE, GEMS Launcher, ElDorado/Gene2Promoter, MatInspector, and MatBase.

What is it used for?

BiblioSphere PE

* Extract and analyze gene relationships from literature databases and genome-wide promoter analysis

* Search the world's largest database of biological networks created from millions of individually modeled relationships between genes, proteins, complexes, cells, and tissues

* Integrate knowledge from literature, genomic, interaction, pathway, and network databases.

GEMS Launcher

* In-depth analysis of transcriptional regulation with integrated high-quality databases and gold standard algorithms

ElDorado

* Genome visualization of curated sequence features for 21 model organisms focusing on the primary transcript structure, alternative promoter regions, alternative splicing, repetitive sequences, S/MARs, and potential regulatory effects of SNPs

* Comparative genomics analysis of orthologous genes to identify corresponding promoter regions for the analysis of phylogenetically conserved promoter elements

Gene2Promoter

* Retrieve and analyze promoter sequences of all genes annotated in the available genomes

* Generate the chromosomal location and a short description of the corresponding gene together with the promoters and alternative transcripts for a given list of genes

* Comparative promoter analysis

MatInspector

* Transcription factor analysis that utilizes a large library of matrix descriptions for transcription factor binding sites to locate matches in DNA sequences

* Search for transcription factor binding sites from given genes or sequences

MatBase

* Search for comprehensive information on transcription factors, including target genes, binding sites sequence and binding domain, weight matrices and matrix families, as well as literature references and expert-curated TF/gene interactions

Notes:

1. Like BIOBASE's TRANSFAC, the Genomatix suite is another key resource for gene regulation studies that focus on transcription factors. While there is some overlap, the two software suites differ substantially in database content coverage and in many analytical features.

Not useful for    Potentially useful   Extremely useful    Have not used but
my research       to my research       to my research      would be useful

()                ()                   ()                  ()


18. Optional comments on GenomatixSUITE PE:--

19.3. NextBio

What is it?

An ontology-based semantic framework that provides a unified interface for integrated searches of both literature and a spectrum of pre-analyzed publicly available high throughput data.

What is it used for?

Literature-Centric Module

* Efficient and sophisticated biomedical literature search of PubMed abstracts, PMC full-text collection, clinical trials information, and news

* Exclusive full-text search of Elsevier journals @ ScienceDirect, along with on-the-fly NLP-driven text analysis and interactive filtering

Gene-Centric Module

* Search pre-analyzed public microarray data of RNA expression, miRNA expression, as well as protein/siRNA assays

* Import and correlate user's own microarray data to NextBio's pre-analyzed and curated public microarray data

* Rich functional analysis of high-throughput results for pathways, Gene Ontologies, TF binding sites and protein families

Sequence-Centric Module

* Import, management, analysis, and correlation of user data produced by chip and next-generation sequencing technologies with a spectrum of pre-analyzed and curated public data, including SNP genotyping, GWAS, protein-DNA binding (ChIP-seq and ChIP-chip), histone/DNA methylation (Methyl-seq, ChIP-chip), and DNA copy number (Array CGH, CNV sequencing and array)

* Innovative genome browser brings an integrated visualization of global meta-analysis of sequence-centric data

Notes:

1. NextBio's orthogonal data integration makes it a unique bioinformatics resource that allows users to integrate and analyze orthogonal datasets at the level of DNA, RNA, and proteins.

Not useful for    Potentially useful   Extremely useful    Have not used but
my research       to my research       to my research      would be useful

()                ()                   ()                  ()


20. Optional comments on NextBio:

21.4. GeneSpring GX

What is it?

Software designed for visualization and analysis of expression and genomic structural variation data.

What is it used for?

* Expression microarray data analysis for custom and major array platforms

* Exon-level expression data analysis for the detection of alternative splice expression patterns

* Copy number analysis for identifying regions of LOH amplification/deletion

* ChIP-Seq and RNA-Seq analysis for next gen sequencing data

* Promoter tiling array data analysis for ChIP-on-Chip application

* Genotyping and SNP association data analysis

* Interactive Genome Browser allows users to visually integrate heterogeneous data by simultaneously importing data tracks from multiple experiments and permitting overlay of data and annotation tracks

* Built-in dynamic and flexible pathway and network analysis tools enable literature-based biological contextualization of microarray results

Notes:

1. While Partek GS and GeneSpring GX are similar applications overall, there are substantial differences in their approaches and algorithms, as well as features, usability, strengths, and weaknesses.

2. Providing multiple software solutions to very complicated tasks such as microarray data analysis is a prudent approach that has been adopted by many universities and industrial settings.

Not useful for    Potentially useful   Extremely useful
my research       to my research       to my research

()                ()                   ()


22. Optional comments on GeneSpring GX:--

23.5. Lasergene Suite

What is it?

A software suite for DNA and protein sequence analysis, contig assembly, and sequence project management.

What is it used for?

* Primer design, virtual cloning, visualization

* Sequence assembly and analysis, SNP discovery

* Pair-wise or multiple DNA or protein sequence alignment

* Protein structure analysis and prediction

* Sophisticated DNA sequence analysis including gene discovery, regulatory elements, and pattern identification

* Create publication quality graphics and gene reports

Notes:

1. While there is significant overlap between Vector NTI and Lasergene, the two software programs have substantial differences in several analytical features.

2. Lasergene's newly added SNP discovery function is unique and may help USC researchers who have started to get involved in genetic variation studies.

3. Just like Vector NTI, the license is perpetual.

Not useful for    Potentially useful   Extremely useful
my research       to my research       to my research

()                ()                   ()


24. Optional comments on Lasergene Suite:--

25. Are there other commercial bioinformatics software programs/tools that you think the university should consider site-licensing?

1--

2--

3--

4--

5--

Assessment of Norris Medical Library's Bioinformatics Service Program

26. Please rate the usefulness of the following services currently provided by the bioinformatics specialist at the Norris Medical Library.

                                          Not                           Extremely
                                          useful   2    Useful   4      useful

A. Consulting service to help select      ()       ()   ()       ()     ()
appropriate data analysis tools

Optional comments:--

B. One-on-one training on software and    ()       ()   ()       ()     ()
data analysis tools

Optional comments:--

C. Bioinformatics workshop training       ()       ()   ()       ()     ()

Optional comments:--

D. Web-based bioinformatics               ()       ()   ()       ()     ()
application user guides

Optional comments:--


27. Would you be interested in a fee-based service where certain data analysis tasks could be delivered back to you for a fee?

* Yes

* No

If so, what type of data analysis would be most helpful for you?

28. Do you have data that you have not been able to analyze because you lack access to the appropriate software?

* Yes

* No

Optional comment:--

29. Do you have data that you have not been able to analyze because you feel you lack sufficient training to use the analysis software/tool that is required?

* Yes

* No

Optional comment:--

30. Do you currently have your own license to any bioinformatics software?

* Yes

* No

Please list them:--

31. Do you have any other general comments you wish to add about bioinformatics support at USC?--

Table 3
Norris Medical Library licensed commercial resources and their
primary applications

Software             Main applications                  License type    License     Needs category
                                                                        number

Partek Genomics      Statistical analysis and           Concurrent      2           High-throughput data
  Suite                visualization tool for                                         statistical analysis
                       microarray and next generation
                       sequencing (NGS) data
Golden Helix SVS 7   Statistical analysis and           Concurrent      1           High-throughput data
                       visualization tool for SNP/                                    statistical analysis
                       CNV/GWAS data analysis
BIOBASE              Literature-based transcription     Site license    Unlimited   Functional analysis
                       regulation annotation and
                       analysis tool for gene lists
Ingenuity Pathway    Literature-based pathway,          Concurrent      3           Functional analysis
  Analysis             network, and functional
                       analysis tool for gene lists
Nextbio              Web-based discovery platform       Site license    Unlimited   Functional analysis,
                       for complex biological,                                        data search & mining
                       clinical literature, and data
                       mining
Oncomine             Web-based database for             Named users *   12          Data search & mining
                       examining gene activities in a
                       particular cancer or across
                       multiple cancer types
Genevestigator       Multi-organism pre-analyzed        Named users *   10          Data search & mining
                       microarray database and gene
                       expression meta-analysis tool
Vector NTI Advance   Comprehensive software package     Concurrent      2           Sequence manipulation
                       for DNA/protein sequence
                       analysis and manipulation

* Named users: the license seats are assigned to specific users for a
given time period. The license seats can be
reassigned on a quarterly basis (Oncomine)
or on demand (Genevestigator).

Figure 2
Cumulative annual registered users of Norris Medical Library
licensed commercial bioinformatics resources

                     2008   2009   2010   2011   2012

Genevestigator                            17     19
Golden Helix SVS 7                        7      18
Oncomine                                  40     53
BIOBASE                                   154    226
Nextbio                                   230    278
Vector NTI Advance          33     67     79     88
Black                61     78     101    137    165
Slides               102    243    243    308    401

Note: Table made from bar graph.


ACKNOWLEDGMENTS

The authors thank Eileen Eandi for her very useful comments and manuscript editing.

REFERENCES

[1.] Fernandez-Suarez XM, Galperin MY. The 2013 nucleic acids research database issue and the online molecular biology database collection. Nucleic Acids Res. 2013 Nov; 41(D1): D1-7.

[2.] Brazas MD, Yim D, Yeung W, Ouellette BF. A decade of web server updates at the bioinformatics links directory: 2003-2012. Nucleic Acids Res. 2012 Jul; 40(W1): W3-W12.

[3.] Yarfitz S, Ketchell DS. A library-based bioinformatics services program. Bull Med Lib Assoc. 2000 Jan; 88(1): 36-48.

[4.] Chattopadhyay A, Tannery NH, Silverman DAL, Bergen P, Epstein BA. Design and implementation of a library-based information service in molecular biology and genetics at the University of Pittsburgh. J Med Lib Assoc. 2006 Jul; 94(3): 307-13, E192.

[5.] Minie M, Bowers S, Tarczy-Hornoch P, Roberts E, James RA, Rambo N, Fuller S. The University of Washington Health Sciences Library BioCommons: An evolving Northwest biomedical research information support infrastructure. J Med Lib Assoc. 2006 Jul; 94(3): 321-9.

[6.] Osterbur DL, Alpi K, Canevari C, Corley PM, Devare M, Gaedeke N, Jacobs DK, Kirlew P, Ohles JA, Vaughan KTL, Wang L, Wu Y, Geer RC. Vignettes: diverse library staff offering diverse bioinformatics services. J Med Lib Assoc. 2006 Jul; 94(3): 306, E188-91.

[7.] Geer RC, Rein DC. Introduction: building the role of medical libraries in bioinformatics. J Med Lib Assoc. 2006 Jul; 94(3): 284-5.

[8.] Norris Medical Library. Bioinformatics support for USC faculty, students, and researchers [Internet]. Los Angeles, CA: University of Southern California [cited 2 May 2013]. <http://www.usc.edu/bioinformatics>.

[9.] Morin RD, O'Connor MD, Griffith M, Kuchenbauer F, Delaney A, Prabhu AL, Zhao Y, McDonald H, Zeng T, Hirst M, Eaves CJ, Marra MA. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res. 2008 Apr; 18(4): 610-21.

[10.] Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008 Jul; 5(7): 621-8.

[11.] Park PJ. Epigenetics meets next-generation sequencing. Epigenetics. 2008 Nov; 3(6): 318-21.

[12.] Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 Jan; 10(1): 57-63.

[13.] Schadt EE, Linderman MD, Sorenson J, Lee L, Nolan GP. Computational solutions to large-scale data management and analysis. Nat Rev Genet. 2010 Sep; 11(9): 647-57.

[14.] NIH Library, National Institutes of Health. Bioinformatics support program [Internet]. Bethesda, MD: The Library [updated 26 Jun 2012; cited 12 Feb 2013]. <http://nihlibrary.nih.gov/services/bioinformatics/Pages/default.aspx>.

[15.] University of Delaware. Center for Bioinformatics & Computational Biology [Internet]. Newark, DE: The University [cited 12 Feb 2013]. <http://bioinformatics.udel.edu/home/>.

[16.] Camerlengo T, Ozer HG, Onti-Srinivasan R, Yan P, Huang T, Parvin J, Huang K. From sequencer to supercomputer: an automatic pipeline for managing and processing next generation sequencing data. AMIA Summits Transl Sci Proc. 2012 Mar; 2012: 1-10.

[17.] NYU Center for Health Informatics and Bioinformatics. Services [Internet]. New York, NY: New York University Langone Medical Center [cited 12 Feb 2013]. <http://www.nyuinformatics.org/services/>.

[18.] Stowers Institute. Computational biology [Internet]. Kansas City, MO: The Institute [cited 12 Feb 2013]. <http://research.stowers.org/compbio/>.

[19.] Wall DP, Kudtarkar P, Fusaro VA, Pivovarov R, Patil P, Tonellato PJ. Cloud computing for comparative genomics. BMC Bioinformatics. 2010 May; 11: 259.

[20.] Bernard Becker Medical Library. The Research Pod [Internet]. St. Louis, MO: Washington University School of Medicine in St. Louis [cited 12 Feb 2013]. <https://becker.wustl.edu/services/research-pod/>.

[21.] Information Technology Services, University of Southern California. HPCC: Center for High-Performance Computing and Communications [Internet]. Los Angeles, CA: The University [cited 2 May 2013]. <http://hpcc.usc.edu>.

[22.] Norris Medical Library. Bioinformatics computing @ NML Bioinformatics Computation and Consulting Center [Internet]. Los Angeles, CA: University of Southern California [cited 2 May 2013]. <http://www.norris.usc.libguides.com/nml-bioinfo/>.

[23.] Norris Medical Library. HPCC-NML bioinformatics computing resource [Internet]. Los Angeles, CA: University of Southern California [cited 2 May 2013]. <http://norris.usc.libguides.com/HPCC-NML>.

[24.] Anderson NR, Lee ES, Brockenbrough JS, Minie ME, Fuller S, Brinkley J, Tarczy-Hornoch P. Issues in biomedical research data management and analysis: needs and barriers. J Am Med Inform Assoc. 2007 Jul-Aug; 14(4): 478-88.

[25.] Geer RC. Broad issues to consider for library involvement in bioinformatics. J Med Lib Assoc. 2006 Jul; 94(3): 286-98, E152-5.

Received February 2013; accepted June 2013

Meng Li, MS; Yi-Bu Chen, PhD; William A. Clintworth, MLibr

AUTHORS' AFFILIATIONS

Meng Li, MS, mengli2@usc.edu, Bioinformatics Specialist; Yi-Bu Chen, PhD, yibuchen@usc.edu, Bioinformatics Service Program Coordinator; William A. Clintworth, MLibr, wclintwo@usc.edu, Associate Dean, Health Sciences Libraries, and Director; Norris Medical Library, University of Southern California, 2003 Zonal Avenue, Los Angeles, CA 90089-9130

Table 1
Assessment of current Norris Medical Library (NML)
Bioinformatics Service Program offerings (n=254)

Assessment of NML's              Not          Somewhat     Useful       Very         Extremely
bioinformatics service           useful       useful                    useful       useful

Consulting service to help
select appropriate data          3   (1%)     7   (3%)     64  (27%)    52  (22%)    107 (46%)
analysis tools

One-on-one training on
software and data analysis       5   (2%)     6   (3%)     60  (26%)    64  (27%)    96  (42%)
tools

Bioinformatics workshop          4   (2%)     7   (3%)     47  (21%)    56  (25%)    113 (50%)
training

Web-based bioinformatics         4   (2%)     10  (4%)     45  (20%)    72  (32%)    95  (42%)
application user guides
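
The counts in Table 1 can be collapsed into a single share per service: the proportion of responses in the two highest categories (very useful plus extremely useful). The following is a minimal sketch rather than the authors' analysis. It assumes that each row's denominator is the number of respondents who rated that service (the row total) rather than n=254, and the names `TABLE_1` and `top_two_share` are hypothetical.

```python
# Counts transcribed from Table 1, in scale order:
# not useful, somewhat useful, useful, very useful, extremely useful.
TABLE_1 = {
    "Consulting service": [3, 7, 64, 52, 107],
    "One-on-one training": [5, 6, 60, 64, 96],
    "Workshop training": [4, 7, 47, 56, 113],
    "Web-based user guides": [4, 10, 45, 72, 95],
}


def top_two_share(counts):
    """Percent of a row's responses in the two highest categories."""
    total = sum(counts)  # assumption: the denominator is the row total
    return 100 * (counts[3] + counts[4]) / total


shares = {name: top_two_share(counts) for name, counts in TABLE_1.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1f}% rated very or extremely useful")
```

Under that assumption, roughly two-thirds to three-quarters of the responses for each service fall in the top two categories.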

Table 2
Assessment of current NML bioinformatics software (n=254)

Assessment of existing
bioinformatics software      Not useful for      Potentially useful  Extremely useful    Have not used but
  licensed by USC            my research         for my research     for my research     would be useful

Ingenuity Pathway Analysis   11  (5%)            44  (18%)           124 (51%)           64  (26%)
Partek Genomics Suite        21  (9%)            61  (25%)           89  (37%)           71  (29%)
Vector NTI Advance           19  (8%)            54  (23%)           97  (41%)           69  (29%)

Table 3
Assessment of difficulty accessing NML bioinformatics software (n=254)

Do you have difficulty
accessing these software     No             Yes,           Yes,           Not            Other
 when needed?                               occasionally   frequently     applicable

Ingenuity Pathway Analysis   80  (34%)      75  (32%)      23  (10%)      32  (14%)      25  (11%)
Partek Genomics Suite        88  (38%)      51  (22%)      18  (8%)       43  (19%)      30  (13%)
Vector NTI Advance           108 (48%)      41  (18%)      12  (5%)       30  (13%)      33  (15%)

Table 4
NML Bioinformatics Service Program budget summary

                                    Amount     Source of funds              Percent     Ongoing commitment?
                                                                            of funding

Annual recurring costs
Personnel                           $221,292   Health Sciences Libraries    55%         Yes
  Bioinformatics specialist I                  University Libraries         45%         Yes
  Bioinformatics specialist II
Bioinformatics licensed software    $125,115   Health Sciences Libraries    22%         Yes
  8 analysis software suites                   University of Southern       78%         Past 3 years and
                                               California (USC) Office of                 likely Yes for future
                                               Research

One-time costs
Hardware for NGS analysis           $23,864    Health Sciences Libraries    2%          Not applicable
  2 computer workstations and                  USC Office of Research       48%         Not applicable
  nodes on High-Performance
  Computing and
  Communications (HPCC)
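
The percentages in Table 4 can be turned into approximate dollar contributions per funding source. The sketch below is illustrative only: the names `BUDGET` and `allocate` are hypothetical, the dollar figures are derived arithmetic rather than values reported by the authors, and the shares are used exactly as printed. Because the hardware shares as printed total 50% rather than 100%, the sketch flags that row instead of guessing at a correction.

```python
# Amounts and funding shares (in percent) transcribed from Table 4.
BUDGET = {
    "Personnel": (
        221_292,
        {"Health Sciences Libraries": 55, "University Libraries": 45},
    ),
    "Licensed software": (
        125_115,
        {"Health Sciences Libraries": 22, "USC Office of Research": 78},
    ),
    "NGS hardware": (
        23_864,
        {"Health Sciences Libraries": 2, "USC Office of Research": 48},
    ),
}


def allocate(amount, shares):
    """Return (dollars per source, whether the shares total 100%)."""
    dollars = {src: round(amount * pct / 100, 2) for src, pct in shares.items()}
    return dollars, sum(shares.values()) == 100


report = {item: allocate(amount, shares) for item, (amount, shares) in BUDGET.items()}
for item, (dollars, complete) in report.items():
    flag = "" if complete else "  (shares as printed do not total 100%)"
    print(f"{item}: {dollars}{flag}")
```

Keeping the completeness check separate from the arithmetic means a transcription problem in the source table surfaces as a flag rather than as a silently wrong dollar figure.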

Figure 3
Norris Medical Library bioinformatics workshop attendance

Year         Invited and sponsored     Internally developed
                 workshops                workshops

2007               17                       120
2008               216                      95
2009               25                       106
2010               148                      246
2011 *             936                      198
2012               409                      98

* The dramatic increase in on-site training attendance in 2011
is due to the licensing of 5 additional commercial tools.

Note: Table made from bar graph.
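
The two attendance series in Figure 3 can be combined into yearly totals to make the 2011 peak explicit. This is a minimal sketch over the values transcribed above; the names `ATTENDANCE`, `totals`, and `peak_year` are hypothetical, and the totals are derived here rather than reported by the authors.

```python
# Attendance transcribed from Figure 3:
# year -> (invited and sponsored workshops, internally developed workshops).
ATTENDANCE = {
    2007: (17, 120),
    2008: (216, 95),
    2009: (25, 106),
    2010: (148, 246),
    2011: (936, 198),
    2012: (409, 98),
}

# Total attendance per year across both workshop types.
totals = {year: sponsored + internal for year, (sponsored, internal) in ATTENDANCE.items()}
peak_year = max(totals, key=totals.get)

for year, total in totals.items():
    sponsored = ATTENDANCE[year][0]
    share = 100 * sponsored / total
    print(f"{year}: {total} attendees, {share:.0f}% at invited/sponsored workshops")
print("Peak year:", peak_year)
```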
COPYRIGHT 2013 Medical Library Association
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2013 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Li, Meng; Chen, Yi-Bu; Clintworth, William A.
Publication:Journal of the Medical Library Association
Article Type:Report
Geographic Code:1U9CA
Date:Oct 1, 2013