Facilitating sponsor requests: a relational database model for project and effort reporting.
Department research administrators (DA) coordinate communication between principal investigators requesting extramural funding and the funding agencies that provide it. When a sponsor decides to fund an investigator-initiated application, the principal investigator is asked to address issues such as (a) current and pending support for the investigators, (b) the institution's review and approval processes for sponsored research, and (c) the ability of the researchers identified in the application to complete the research program, given their other commitments. Sponsor requests are typically forwarded to the DA, who has the responsibility for compiling the information and obtaining institutional endorsement before submitting it to the sponsor.
Sponsor requests can overwhelm even experienced DAs. To facilitate sponsor requests, data must often be compiled from multi-project research programs and individual research projects that are administered in outside departments. Institutional information systems often do not serve the needs of department-level administrators, and many can neither anticipate renewals or extensions of active projects, nor record changes in effort commitments by investigators. Due to the shared responsibility between institutions and departments for ensuring compliance, DAs may not have all of the information to address sponsor concerns. Landen and McCallister (2002) reinforce the DA's share in research compliance, thus reinforcing the need for a department-based database system. "Compliance, both financial and non-financial, is the growth industry in research administration, with the responsibility for compliance being integrated into the jobs of central and departmental administrators." (p. 18)
Baker and Wohlpart (1998) summarized information from a survey of college and university chief research officers conducted in July 1996 and concluded that "Overall, the most popular grant management systems are manual records and files and modified computer database systems, with over half of the responding institutions indicating these types of systems." (p. 38) They further concluded that of 30 survey respondents from public and private R1 research institutions (those that offer the full range of baccalaureate programs, award 50 or more doctoral degrees annually, and receive $40 million or more in federal support), "the most popular grant management system used by R1 institutions is a homegrown computer database, with 60 percent of the 30 respondents." As these homegrown systems have likely been designed for their respective central research offices, one could infer that the institutional reports provided to the DA would be inadequate for addressing sponsor requests.
The DA is positioned to obtain other support for current and pending projects as well as changes in effort commitments. When an individual project is conducted as part of a larger multi-project program, DAs are familiar with their supported investigator's award portion. While electronic spreadsheets and word processors, together with institutionally available data, can provide tools for DAs, relational database tools that maintain project-related information about investigators would be more useful. This paper describes the characteristics of a relational database, explains the benefits of a relational database model, and provides a blueprint for designing a relational database model that can be employed to enhance data retention and reporting capability.
Characteristics of a Relational Database
Relational databases consist of at least two separate, but related sets of data. These separate sets of data are typically referred to as Tables. Frye explained, "The most basic object in a database is the table, where you store your data. You might be tempted to jam every type of data you want to store in a single table, but that's hardly ever the right way to design tables in a database." (2001, p.32) Tables are containers for storing similar types of data, and consist of rows and columns. Tables are defined by the fields or data types they contain. Fields store individual cells of data at the intersection of each row and column inside a database table. Each field of data should be assessed for its appropriateness to the table prior to establishing relationships between key fields in the individual tables. A database record includes all data fields along the same horizontal row within the database table.
As Frye referred to above, understanding relational databases requires overcoming paradigmatic processes of storing all data components in the same place. The relational database concept does not suggest that it is more practical to retrieve similar data from more than one place, but that tables of data can be simultaneously accessible when each shares a related field of data. Although this paper focuses on the creation of a sponsored project and investigator effort reporting database, changing from any spreadsheet or word processing data retention mechanism to a relational database model requires an understanding of how data collections are related to one another.
Relationship Determination: Program or Project?
The first rational step toward creating relationships between sponsored project data is the characterization of research programs as separate entities from individual research projects. These entities should be stored in separate database tables and connected or joined by related fields of data. The field of data on which relationships between database tables are based is commonly referred to as a table's key. Each database table should be created to store information about only one type of object, and should possess at least one key field of data from which relationships to other tables can be based.
An individual research project (RP) may be conducted over several years. Since each year of a RP would have a different award period, it is logical to enter an individual record for each year of the RP into the same database table. A separate database table would store a record to represent the RP as a whole. These two tables would share a related key field of data such as the principal investigator's name or the grant number.
Multi-project programs (MP) provide conclusive justification for using a relational database model. MP's generally consist of two or more RPs. Like a RP, each project component of a MP may be conducted over several years. Individual records for each year of every component project of a MP should be stored in the same database table with individual records from RPs. The term [Projects] refers to this database table in the text and figures that follow. A separate record representing the MP as a whole would be stored in the same database table with the record representing the RP as a whole. The term [Programs] is used in the following text and figures to refer to this database table.
To establish the relationship between [Programs] and [Projects], a key field of related data should be identified in both. The sponsor grant number is an excellent choice for the key field, as it could be shared by each RP record within a MP record. This will also allow the database user to simultaneously enter and retrieve some or all of a program's records based on an identical field of data. Database construction should consider that a new grant application in its pre-award stage may not possess a sponsor grant number. This obstacle is easily overcome by inserting temporary assignment numbers into the [Programs] and its related [Projects] records. These temporary numbers can be replaced with the actual grant assignment number when it becomes available.
Grant Applications and Awards: Data Collection and Retention
Information pertaining to sponsored programs or projects would be maintained when there is current or planned participation of an investigator within a DA's scope of responsibility. Decisions about other types of data that will be collected and retained in the database should be made prior to constructing a database. These decisions will determine the amount of time and effort that will be required to maintain the database. Databases should be designed to simplify the process of entering and retrieving data, allowing the DA to quickly populate the underlying database tables, while cataloging database records by investigator, application type, or potential funding category. DAs should assess the award data already maintained at the institution and consider the capability of the database design to retain data for awards made from outside their institution when the information may be fragmentary. Since an award for a program or project could meet several conditions, the database design should be able to anticipate all of them. This will prevent database design modifications otherwise necessary should an unanticipated condition arise.
The following list provides several conditions applicable to a particular source of funds:
1. Individual Research Grant--awarded to administrator's institution and department
2. Individual Research Grant--awarded to administrator's institution but outside department
3. Individual Research Grant--subcontracted to administrator's department from outside institution
4. Multi-Project Program--awarded to administrator's institution and department
5. Multi-Project Program--awarded to administrator's institution but outside department
6. Individual Component of Multi-Project Grant--awarded to administrator's institution and department.
7. Individual Component of Multi-Project Grant--awarded to administrator's institution but outside department.
From the list above, DAs would have immediate access to detailed information about items 1, 3, 4, and 6. Detailed award information pertaining to items 2, 5, and 7 are likely not as accessible since the applications were not initiated in the administrator's department. Although the occurrence of certain award categories within a DA's area of responsibility might be infrequent, decisions to exclude any relative information will restrict the capability of the database to produce accurate reports.
Thus far, this paper has described a DA's need for a database, provided characteristics of a relational database, explained segregation of data and creation of relationships between tables of data within a database, and has provided some tips on assessing award types to avoid common pitfalls encountered in database design and data maintenance. The remainder of this paper will be devoted to describing a relational database model for sponsored projects and effort reporting. The model described is useful for the management of sponsored project application and award data from most any federal or nonfederal agencies. The fields within certain database tables in this model are structured to mirror award assignment data from the Information for Management, Planning, Analysis, and Coordination (IMPAC) and Computer Retrieval of Information on Scientific Projects (CRISP) systems to identify various aspects of extramural activities of the Department of Health and Human Services (DHHS).
Database Modeling for Sponsor Reporting
As is the case with most sponsors, institutes at the National Institutes of Health (NIH) commonly request information about the financial support and effort commitments of investigators. The format requested for this correspondence is the NIH-formatted "Other Support" page. Figure 1 demonstrates a responsive NIH-formatted "Other Support" page for Dr. John Doe. Dr. Doe serves as Principal Investigator on two individual research projects. He also participates as a Co-Investigator on a multi-project P01 grant from the National Heart, Lung and Blood Institute.
[FIGURE 1 OMITTED]
Although the NIH formatted "Other Support" page (Figure 1) could be produced with an electronic spreadsheet or word processor, it is unlikely that an institution would warehouse the data in the displayed format. Since Dr. Doe is not the Principal Investigator of the multi-project grant (P01 HL003456), information about this MP is more likely to be stored by the institution under the name of Dr. J. Moss (the Program Director of P01 HL003456). If an institution chose to store program and project related data in the Figure 1 format, redundant storage of information regarding the multi-project program (P01 HL003456) would be necessary for all participating investigators. According to Jennings, "Because relational databases eliminate most duplicate information, they minimize data storage and application memory requirements." (1997, p. 118)
[FIGURE 1 OMITTED]
As Figure 1 suggests, a relationship exists between Dr. Doe and the sponsored program (P01 HL003456), although he is not the Program Director of the MP. This relationship is modeled in relational database format in Figure 2.
[FIGURE 2 OMITTED]
Figure 2 demonstrates two tables within a relational database. The [Programs] table holds data fields pertaining to the MPs and the [Projects] table holds data fields of data pertaining to the RPs. The table relationship is demonstrated by a connecting line between the (ProgID) field in [Programs] and [Projects]. The tables share the same unique identifier, namely the Program Identifier (ProgID) field. The "I" seen just above the connecting line closest to [Programs] indicates that only one record containing the same (ProgID) can exist in the [Programs] table. The "[infinity]" seen just above the connecting line closest to [Projects] indicates that many records bearing the same (ProgID) can exist in the [Projects] table. In database terminology, this relationship is referred to a one-to-many relationship.
The one-to-many relationship between [Programs] and [Projects] eliminates the need for redundant entries of program-specific information. A single database form can be used to enter data into both tables simultaneously. Likewise, a single database query can simultaneously retrieve data from both database tables to produce a database report. Although detailed procedures for the creation of database forms, queries, and reports are beyond the scope of this paper, production of a report similar to Figure 1 will require a working knowledge of each. The primary objective is to understand that relationships between database tables are what provide functionality to the database and the database user.
Relationships among four tables in a sponsored project database are graphically demonstrated in Figure 3. Two data tables, [Investigator] and [Effort], have been added to the database relationships originally demonstrated in Figure 2. [Investigator] stores the name (PIName) and unique number (PINumber) for investigators associated with current or pending applications and awards. A one-to-many relationship is established between the (PINumber) field of [Investigator] and the (PINumber) fields of both [Programs] and [Projects]. The one-to-many relationship allows a single entry of the investigator name (PIName) in [Investigator]. This model provides functionality for the database to append a unique number (PINumber) representing each participating investigator, into [Programs] and [Projects]. Similarly, a relationship has been created between the (ProjID) field of both [Projects] and [Effort]. This relationship enables multiple effort records to be stored in [Effort] and linked to their related record in [Projects]. This linking is performed via the project's unique project identifier (ProjID), independently of the participant's role on the program or project.
[FIGURE 3 OMITTED]
These relationships are modeled after the real-world fact that an Investigator could participate as (a) the Program Director of an MP, (b) the Principal Investigator or Co-Investigator of an RP, or (c) the Principal Investigator, Core Leader, or Co-Investigator of an RP being conducted within one or more years of an MP. The underlying datasets for [Programs] and [Projects] are demonstrated respectively in Figure 4 and Figure 5.
[FIGURES 4-5 OMITTED]
As demonstrated in Figures 4 and 5, the names of the Principal Investigator or Program Director are not found in the dataset of [Programs] or [Projects]; however, this information is available via the referenced (PINumber) field. Because relationships exist between the (PINumber) of [Investigator], [Programs], and [Projects], information about any MP or RP could be entered or retrieved simultaneously with information about participants, participant effort, and role of the participant on the RP. The underlying dataset of [Investigator] is demonstrated in Figure 6, and the underlying dataset of [Effort] is demonstrated in Figure 7.
[FIGURES 6-7 OMITTED]
As demonstrated in Figure 6, the investigator's name is stored only once in [Investigators]. The corresponding (PINumber) is assigned automatically as the investigator's name is added to the [Investigator] table. The one-to-many relationship between [Investigator] and [Projects] requires the investigator's record to exist in [Investigator] before the investigator's (PINumber) can be added to [Projects].
Figure 7 displays the records contained in [Effort]. The one-to-many relationship between the (ProjID) in [Projects] and [Effort] requires the project's record to exist in [Projects] before a corresponding [Effort] record can be added. The relationship allows multiple investigator effort records to be stored in [Effort]. When multiple investigator effort records are added to [Effort] for the same [Project], the same project identifier (ProjID) is appended to each related effort record.
Although this paper does not provide detailed instruction on writing database queries, it should be understood that database queries are the engines for simultaneous retrieval of data from multiple database tables. For the purposes of this exercise, we define the query as a simultaneous request for multiple fields of data from each of four database tables--[Programs], [Projects], [Investigator], and [Effort].
Table 1 provides the database tables, fields, and specific conditions required in a query to produce the NIH formatted "Other Support" page originally demonstrated in Figure 1.
Extrapolated Benefit vs. Time Commitment
Although this database model was designed for its ability to produce the NIH formatted Other Support, the relationships defined in it can provide substantial additional benefits to the DA. The model demonstrated in this paper can be quickly deployed and provides substantial capability for the minimal effort required to maintain it. With only slight revisions to the query outlined in Table 1, reports could be generated to forecast future years of support for an investigator. The (StartDt) and (EndDt) fields found in [Projects] could be added to a query. By specifying date range criteria consistent with an NIH award year, DAs can forecast award levels for all NIH funding they administer. These same fields could be used in queries to forecast progress report due dates, or upcoming project expirations. Querying the database by the (Status) field can provide detailed listings for active, pending, or future committed awards for all investigators within a DA's area of responsibility. Specifying criteria in the (investigator) field can provide similar reports for a single investigator. Adding fields to the [Programs] or [Projects] tables to store institutional review board items such as animal care or human subject approval dates and protocol numbers would enable DAs to track protocol expirations for active projects. The possibilities are limited only by the amount of effort a DA wishes to commit to the database.
Relational databases are more effective than electronic spreadsheets or word processors for maintaining and reporting sponsored project and investigator effort information. DAs possess unique familiarity with their investigator's effort and award information. The process by which DAs must merge their knowledge with institutionally provided data is cumbersome. The DA equipped with a relational database similar to the one described in this paper is better equipped to fulfill sponsor requests. Automation of the NIH formatted "Other Support" page is only a single example of what can be accomplished with relational databases. It is my hope that this paper will encourage other administrators to implement relational database solutions that can expedite sponsor deliverables and automate other routine data warehousing needs they may have.
Table 1--Queried Tables, Fields, and Conditions Table Field Condition [Effort] (ParticipantName) Where = "Doe, John J." [Effort] (EffortPercent) None [Effort] (ProjectRole) None [Programs] (ProgAgency) None [Projects] (AppType) None [Programs] (ProgActCode) None [Programs] (ProgAdmOrg) None [Programs] (ProgSerialNumber) None [Projects] (ProjYr) None [Programs] (ProgStart) None [Programs] (ProgEnd) None [Investigator] (PINumber) Query without Display [Investigator] (PIName) Where [Investigator]![PINumber] = [Programs]![PINumber] [Projects] (Status) Where = "Active" or "Pending" [Projects] (DirCost) None [Projects] (MajorGoals) None
Landen, M. & McCallister, M. (2002). Evolution or Revolution? Examining Change in Research Administration. The Journal of Research Administration, (33)3, 17-20.
Baker, J.G. & Wohlpart, A. (1998). Research Administration in Colleges and Universities: Characteristics and Resources. Research Management Review, (10) 1, 33-55.
Frye, C. (2001). Microsoft[R] Access Version 2002 Plain & Simple. Redmond, WA: Microsoft Press.
National Institutes of Health, (2003). Activity Codes, Organization Codes, and Definitions Used in Extramural Programs; IMPAC (Information for Management, Planning, Analysis, and Coordination) A Computer-Based Information System of the Extramural Programs of NIH/DHHS.
CRISP (Computer Retrieval of Information on Scientific Projects) Version 184.108.40.206; A Biomedical database system containing information on research projects and programs supported by the Department of Health and Human Services. <http://crisp.cit.nih.gov/>
Jennings, R. (1997). Que[R] Using Access 97. Indianapolis, IN: Macmillan Computer Publishing USA.
Mark A. Hughes
Vanderbilt University Medical Center
|Printer friendly Cite/link Email Feedback|
|Title Annotation:||Case Study|
|Author:||Hughes, Mark A.|
|Publication:||Journal of Research Administration|
|Date:||Jul 1, 2004|
|Previous Article:||Conflict and convergence: the ethics review of action research.|
|Next Article:||Ask an expert: tips and tools of the trade.|