
Does foreign aid work? Efforts to evaluate U.S. foreign assistance.

November 19, 2012


Congress's recent focus on reducing federal spending raises questions about the relative efficiency and effectiveness of all federal programs. In this context, evaluation of foreign assistance programs is of growing interest to many Members of Congress as they scrutinize the Administration's international affairs budget request and debate foreign aid spending priorities. Policymakers, taxpayers, and aid recipients alike want to know what impact, if any, foreign aid dollars are having, and whether foreign aid programs are achieving their intended objectives.

In most cases, the success or failure of U.S. foreign aid programs is not entirely clear, in part because historically, most aid programs have not been evaluated for the purpose of determining their actual impact. The purpose and methodologies of foreign aid evaluation have varied over the decades, responding to political and fiscal circumstances. Aid evaluation practices and policies have variously focused on meeting program management needs, building institutional learning, accounting for resources, informing policymakers, and building local oversight and project design capacity. Challenges to meaningful aid evaluation have varied as well, but several are recurring. Persistent challenges to effective evaluation include unclear aid objectives, funding and personnel constraints, emphasis on accountability for funds, methodological challenges, compressed timelines, country ownership and donor coordination commitments, security, and agency and personnel incentives. As a result of these challenges, aid agencies do not undertake rigorous evaluation for all foreign aid activities.

The U.S. government agencies managing foreign assistance each have their own distinct evaluation policies; these policies have come into closer alignment in the last two years than in the past. The Obama Administration's Quadrennial Diplomacy and Development Review (QDDR) resulted in, among other things, a stated commitment to plan foreign aid budgets "based not on dollars spent, but on outcomes achieved." This focus on evaluating the impact of foreign assistance reflects an international trend. USAID put this idea into practice by introducing a new evaluation policy in January 2011. The State Department, which began to manage a growing portion of foreign assistance over the past decade, followed suit with a similar policy in February 2012. The Millennium Challenge Corporation, notable for its demanding but little-tested approach to evaluation, also recently revised its policy. While differing in several respects, including their support for impact evaluation, the policies reflect a common emphasis on evaluation planning as a part of initial program design, transparency and accessibility of evaluation findings, and the application of data to inform future project design and allocation decisions. Aspects of the three evaluation policies are compared in Appendix A.

Though recent evaluation reform efforts have been agency-driven, Congress has considerable influence over their impact. Legislators may mandate a particular approach to evaluation directly through legislation (e.g., H.R. 3159, S. 3310), or can support or undermine Administration policies by controlling the appropriations necessary to implement the policies. Furthermore, Congress will largely determine how, or if, any actionable information resulting from the new approach to evaluations will influence the nation's foreign assistance policy priorities.

Does Aid Work? A Brief Summary
Impact and Performance Evaluations
History of U.S. Foreign Assistance Evaluation
Evaluation Challenges
Applying Evaluation Findings to Policy
Current Agency Evaluation Policies
Issues for Congress

Appendix A. Select Aspects of Current USAID, State Department, and MCC Evaluation Policies

Author Contact Information


Congress's strong focus on reducing federal spending raises questions about the relative efficiency and effectiveness of all federal programs, and foreign assistance is a subject often raised in broad budget debates. Foreign assistance evaluation is one aspect of a government-wide effort to link program effectiveness to budgeting decisions. It is also an element of broader foreign aid reforms implemented in recent years. The 2010 Quadrennial Diplomacy and Development Review (QDDR), the basis of many recent aid policy initiatives, called for the State Department and the U.S. Agency for International Development (USAID) to plan foreign aid budgets and programs "based not on dollars spent, but on outcomes achieved," and for USAID to become "the world leader in monitoring and evaluation." (1) Rigorous evaluation is also a cornerstone of the Millennium Challenge Corporation (MCC), established in 2004 to promote a new model of development assistance. (2) According to USAID Administrator Rajiv Shah, global development policies and practices are experiencing a "transformation based on absolute demand for results." (3) That demand comes, in part, from some Members of Congress as they scrutinize the Administration's international affairs budget request and consider foreign aid spending priorities. (4) It also comes from aid beneficiaries and American taxpayers who want to know what impact, if any, foreign aid dollars are having and whether foreign aid programs are achieving their intended objectives.

The current emphasis on evaluation is not new. The importance, purpose and methodologies of foreign aid evaluation have varied over the decades since USAID was established in 1961, responding to political and fiscal circumstances, as well as evolving development theories. There are a number of reasons that this issue has gained prominence in recent years. For one, foreign aid funding levels have increased over the past decade while evaluations have decreased, raising questions about the knowledge basis for aid policy. (5) Analysts have noted that after decades of aid agencies spending billions of dollars on assistance programs, very little is known about the impact of these programs. (6) Some wonder how policymakers can develop effective foreign aid strategies without a clear understanding of how and why prior assistance has succeeded or failed.

This report focuses primarily on U.S. bilateral assistance, and less on the work of multilateral aid entities, such as the World Bank, to which the United States contributes. While a wide range of federal agencies provide foreign assistance in some form, (7) this report focuses on the three agencies that have primary policy authority and implementation responsibility for U.S. foreign assistance--USAID, the State Department, and the Millennium Challenge Corporation (MCC). It discusses past efforts to improve aid evaluation, as well as ongoing issues that make evaluation challenging in the foreign assistance context. The report also provides an overview of the current evaluation policies of the primary implementing agencies, and discusses related issues for Congress, including recent legislation.

Program Evaluation Government-Wide

Program evaluation is an important issue throughout the U.S. government, and foreign assistance evaluation is just one part of a broader effort by the federal government to improve accountability and program performance through stronger evaluation processes. With the Government Performance and Results Act (GPRA) of 1993, Congress established unprecedented statutory requirements regarding the establishment of goals, performance measurement indicators, and submission of related plans and reports to Congress for its potential use in policy development and program oversight. The GPRA Modernization Act of 2010 updated the original law, requiring more frequent plan updates and on-line posting of data. (8) The agency-specific evaluation plans discussed in this report are intended to comply with and build upon this government-wide effort. Most recently, in a May 18, 2012, memorandum, the Office of Management and Budget (OMB) directed all federal agencies to demonstrate the use of evidence from rigorous evaluation throughout their FY2014 budget submissions. (9) While OMB has emphasized use of evidence in prior years, this memorandum appears to take the issue to a more formal level, and suggests that evaluation data may be closely linked to budget approval in future fiscal years.

Does Aid Work? A Brief Summary

To know whether aid is successful, one must understand its purpose. The Foreign Assistance Act (FAA) of 1961 (P.L. 87-195), as amended, is the authorizing legislation for most modern foreign aid programs. The FAA declared that
   the principal objective of the foreign policy of the United States
   is the encouragement and sustained support of the people of
   developing countries in their efforts to acquire the knowledge and
   resources essential to development, and to build the economic,
   political, and social institutions that will improve the quality
   of their lives. (10)

The original legislation lists five principal goals for foreign aid: (1) the alleviation of the worst physical manifestations of poverty among the world's poor majority; (2) the promotion of conditions enabling developing countries to achieve self-sustaining economic growth and equitable distribution of benefits; (3) the encouragement of development processes in which individual civil and economic rights are respected and enhanced; (4) the integration of the developing countries into an open and equitable international economic system; and (5) the promotion of good governance through combating corruption and improving transparency and accountability. (11) Amending legislation over the years added dozens of new, though often overlapping, aid objectives. For example, "the suppression of the illicit manufacturing of and trafficking in narcotic and psychotropic drugs" was added in 1971, (12) "to alleviate human suffering caused by natural and manmade disasters" was added in 1975, (13) and "to enhance the antiterrorism skills of friendly countries by providing training and equipment" and "to strengthen the bilateral ties of the United States with friendly governments by offering concrete [antiterrorism] assistance" (14) were added in 1983. In short, U.S. foreign aid is intended to be a tool for fighting poverty, enhancing bilateral relationships, and/or protecting U.S. security and commercial interests.

In this broad view, some instances of specific development assistance projects and programs are widely viewed as successful. The largest aid program of the last century, the Marshall Plan (1948-1952), for example, is acclaimed as a key factor in the post-World War II reconstruction of European states that have gone on to become major strategic and trade partners of the United States. In the late 1960s and 1970s, aid associated with the "green revolution" was credited with greatly improving agricultural productivity and addressing hunger and malnutrition in parts of Asia, and global health programs were credited with virtually eradicating smallpox. Korea, Taiwan, and Botswana are often cited as aid success stories as a result of remarkable economic progress following significant aid infusions. More recently, unquestionable progress in battling public health crises, such as HIV/AIDS, across the globe can be largely attributed to massive foreign assistance programs, both bilateral and multilateral. Even in these instances, however, close analysis often reveals many caveats.

In other specific instances foreign aid programs and projects have been considered to be conspicuously unsuccessful, or even harmful to intended beneficiaries. Critics of foreign assistance cite decades of aid to corrupt governments in Africa, which enriched corrupt leaders and did little to improve the lives of the poor. (15) In Latin America, U.S. aid to anti-communist rebels and regimes during the Cold War was associated with brutal violence and believed by many to have damaged U.S. credibility as a champion of democracy. Numerous examples exist of hospitals, schools, and other facilities that were built with donor funds and left to rot, unused in developing countries that did not have the resources or will to maintain them. In some instances, critics assert that foreign aid may do more harm than good, by reducing government accountability, fueling corruption, damaging export competitiveness, creating dependence, and undermining incentives for adequate taxation. (16)

The most notable successes and conspicuous failures of foreign aid give fodder to both aid advocates and detractors, but in all likelihood represent just a small segment of assistance activities. In most cases, clear evidence of the success or failure of U.S. assistance programs is lacking, both at the program level and in aggregate. One reason for this is that aid provided for development objectives is often conflated with aid provided for political and security purposes. Another reason is that, historically, most foreign assistance programs have not been evaluated for the purpose of determining their impact, either at the time or retrospectively. Furthermore, evaluation practices are not consistent enough to allow for the use of project-level data as the basis for broader, strategic evaluations. According to one 2009 review of monitoring and evaluation across U.S. foreign assistance implementing agencies, evaluation of foreign assistance programs "is uneven across agencies, rarely assesses impact, lacks sufficient rigor, and does not produce the necessary analysis to inform strategic decision making." (17)

Impact and Performance Evaluations

The Department of State, USAID, and other U.S. agencies implementing foreign assistance programs have long evaluated the performance of their own personnel and contractors in meeting discrete objectives. Depending on the nature of the project or program, staff and contractors might monitor the miles of road built, number of police officers trained, or changes in the use of fertilizers by farmers. These results can be compared to the initial program goals and expectations to determine whether the project or contract has been performed successfully. This type of oversight is called performance monitoring, and if the resulting data are analyzed in an effort to explain how and why a program meets or fails to meet strategic objectives, this is called performance evaluation. Performance monitoring and evaluation are widely viewed as essential aspects of oversight, and performance evaluations represent the vast majority of foreign aid evaluation to date. Financial audits by agency Inspectors General, which examine whether funds are being used as intended, are also a common form of evaluation, particularly at the State Department.

Performance evaluation and financial audits play an important part in project management but do little to answer questions about foreign aid effectiveness. Addressing this question, some argue, requires impact evaluations. Impact evaluations can take many forms, but their common element is that they use a defined counterfactual, or control group, and baseline data to measure change that can be attributed to an aid intervention. (18) Impact evaluations look not at the output of an activity, but rather at its impact on a development objective. For example, while a performance evaluation of an education program may look at the number of textbooks provided and teachers trained, an impact evaluation may determine how or if literacy or math skills had improved for the target group as compared to a similar group that did not receive the textbooks or teacher training. A performance evaluation of an HIV prevention project may report the number of public awareness events held or condoms distributed, while an impact evaluation of the same program would monitor changes in the HIV/AIDS infection rate of the targeted population. An impact evaluation of a police training program would look at the program's impact on civil order and public safety rather than simply report how many officers were trained or the value of equipment supplied. Randomized controlled trials, in which beneficiaries are randomly selected from a prequalified group and compared before and after the program to those not selected, are widely viewed as best practice for impact evaluation, but less rigorous methods are used as well.
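
The logic of comparing a treated group with a comparison group, before and after an intervention, can be illustrated with a short calculation. The Python sketch below is illustrative only: the function names, the pool of schools, and the literacy scores are hypothetical and are not drawn from any actual evaluation. It assumes a simple difference-in-differences estimate, which is one of several ways an evaluator might use a comparison group and baseline data.

```python
import random
from statistics import mean


def randomize(units, n_treated, seed=0):
    """Randomly split a prequalified pool into treatment and comparison groups."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)
    return shuffled[:n_treated], shuffled[n_treated:]


def difference_in_differences(treat_before, treat_after, comp_before, comp_after):
    """Change in the treated group minus change in the comparison group."""
    treated_change = mean(treat_after) - mean(treat_before)
    comparison_change = mean(comp_after) - mean(comp_before)
    return treated_change - comparison_change


# Hypothetical pool of six schools; three are randomly chosen for the program.
treated_schools, comparison_schools = randomize(["A", "B", "C", "D", "E", "F"], 3)

# Hypothetical average literacy scores (0-100), baseline and follow-up.
treat_before, treat_after = [40.0, 50.0, 60.0], [55.0, 65.0, 75.0]
comp_before, comp_after = [42.0, 48.0, 60.0], [47.0, 53.0, 65.0]

naive_gain = mean(treat_after) - mean(treat_before)  # ignores the counterfactual
impact = difference_in_differences(treat_before, treat_after, comp_before, comp_after)
print(naive_gain, impact)  # 15.0 10.0
```

In this made-up case, a before-and-after comparison of the treated group alone would credit the program with a 15-point gain, while the comparison group suggests that 5 of those points would have occurred without the program.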

Impact evaluations can be key to determining whether a foreign assistance program "works." However, impact evaluations are generally far more complex and resource-intensive than performance evaluations. Agencies implementing foreign assistance must balance the potential knowledge to be gained from impact evaluation with the additional resources necessary to carry out such evaluations. As a result, while the potential learning benefits of impact evaluation have long been recognized by aid officials, the use of rigorous impact evaluation has been, and continues to be, very limited. More typically, agencies aim for evaluation practices that are, as one expert has put it, "cost-effectively rigorous," and, at minimum, "independent, transparent, and consistent, thus persuasive." (19)

History of U.S. Foreign Assistance Evaluation

The practice of foreign assistance evaluation has changed over time to reflect evolving, or some might say cyclical, attitudes about the purpose and relative importance of evaluation. (20) This is evident both in the United States and internationally. Aid evaluation practices and policies have variously focused on different evaluation objectives, including meeting program management needs, institutional learning, accountability for resources, informing policymakers, and building local oversight and project design capacity.

The history of U.S. foreign assistance evaluation begins with USAID, which implemented the vast majority of U.S. foreign assistance prior to the last decade. In its early years, USAID was primarily involved in large capital and infrastructure projects, for which evaluations focused on financial and economic rates of return were appropriate. However, the agency soon shifted focus towards smaller and more diverse projects to address basic human needs, and found that the rate of return evaluation model was no longer sufficient. (21) The agency established its first Office of Evaluation in 1968, and used a Logical Framework (LogFrame) model as its primary system for monitoring and evaluation. The LogFrame approach, subsequently adopted by many international development agencies, employed a matrix to identify project goals, purposes, results, and activities, with corresponding indicators, verification methods, and important assumptions. Baseline data were to be used for each indicator, and results were reported at quarterly points during the life of a project. However, these data were not analyzed to look for competing explanations of the results or unintended consequences of activities.

While the LogFrame approach established USAID as a thought leader with respect to evaluation policy, in practice, evaluations varied significantly from project to project. A 1970 evaluation handbook included a diagram of the "ideal" program evaluation design, which resembles a randomized controlled trial, but notes that "there are a great many reasons why it may not be possible to reach the ideal." (22) Reviews of foreign assistance evaluation over decades revealed shortcomings. For one, the system had become decentralized over time, suited to meeting the information needs of project managers in the field but not to contributing to broader learning or policy making. A 1982 report by the General Accounting Office (now the Government Accountability Office, GAO) found that "AID staff does not apply lessons learned in the development of new projects," and that "lessons learned are neither systematically nor comprehensively identified or recorded by those who are directly involved." (23) In response to the GAO report's recommendation that USAID build an "information analysis capability," the agency created the Center for Development Information and Evaluation (CDIE) in 1983, with a mandate to "foster the use of development information in support of AID's assistance efforts." (24) CDIE carried out meta-evaluations to reveal broader trends in aid impact, provided information and training on evaluation best practices to mission staff, and made a wide range of evaluation reports accessible to implementers in the field. Aid officials suggest that CDIE's evaluation work played a significant role in shaping USAID strategies and priorities in many sectors over decades.

An internal USAID review in 1988 found that CDIE had greatly increased the use of aid evaluation information by implementers, but also identified a need to improve the quality and timeliness of evaluation reports. (26) While the evaluation policy at the time still called for rigorous, statistical methods of evaluation, it was found that this approach was never actually widely used at USAID because the required skills, time, and expense made implementation difficult. (27) As one internal review noted, "statistical rigor in evaluation methods was deemphasized in favor of 'reasonably' valid evidence about project performance." (28) Guidance to missions encouraged the use of low-cost and timely qualitative evaluation methodologies, including the use of key informant interviews, focus group discussions, community meetings, and informal surveys. (29)

Testing Family Planning Project Design in Thailand, 1979

Many evaluations are designed to answer specific questions about project design. One example is the Family Planning Health and Hygiene Project, a 1979 independent evaluation of USAID support for the government of Thailand's family planning policy. Implemented by the American Public Health Association, the evaluation used a baseline survey and experimental design to test the hypothesis that contraception services would be more cost effective and acceptable to communities if combined with basic health services rather than implemented in isolation. Obtaining the appropriate information to inform resource allocation was a primary objective of the evaluation. According to the report, "the evaluation was implemented with sufficient precision and adherence to experimental requirements to provide information on which to make management decisions about the best use of resources." Evaluators found that the hypothesis was not supported by the evidence. Adding basic health services doubled the cost of programs but was not associated with increased contraceptive use. As a result, the evaluators recommended that future decisions about family planning and basic health services programs be considered without any assumption that a linkage between the two would increase the acceptance of contraception use. (25)

In the early 1990s, accountability for funds became a primary focus of aid evaluation. After a 1990 GAO review concluded that USAID evaluation practices made it difficult or impossible to account for use of aid funds, (30) attention turned to tracking where aid money was going, not measuring what it was accomplishing. At the same time, USAID was facing increasing budgetary pressure and increasing congressional and public concern about what was being achieved through foreign assistance. (31) In response, USAID carried out an Evaluation Initiative from 1990 to 1992, greatly expanding the staff and budget of CDIE and making significant investments in rigorous evaluation designs and innovative methods to evaluate sector-wide results. (32) However, by the mid-1990s the priorities changed once again. A 1993 agency reorganization led to the 1994 elimination of an Office of Evaluation within CDIE, a reduction of overall CDIE staff, (33) and a new emphasis on "rapid appraisal techniques," which guidance documents describe as a compromise between slow, costly, and credible formal evaluation methods and cheap, quick, informal methods (focus group, etc.) that may be less reliable. (34)

In 1995, USAID replaced the requirement to conduct mid-term and final evaluations of all projects with a policy calling for evaluation only when necessary to address a specific management question. (35) The rationale was that the required evaluations had become pro forma, as GAO reviews had suggested, and that fewer, more comprehensive evaluations would be a better use of time and resources. As a result, the number of completed evaluations dropped from 425 in 1993 to an estimated 138 in 1999, (36) but the depth and scope of new evaluations reportedly did not change. (37) One study suggests that inconsistent guidance on evaluation in these years allowed many already overburdened mission staff to ignore agency-wide requirements, but noted that the Global Health, Africa, and Europe & Eurasia bureaus, which had their own evaluation procedures, continued to carry out quality evaluation work. (38)

Foreign assistance levels grew rapidly starting in 2003 to support military activities in Afghanistan and Iraq, as well as the President's Emergency Plan for AIDS Relief (PEPFAR) and the creation in 2004 of the Millennium Challenge Corporation (MCC). Accountability to Congress became a major evaluation priority. In 2005, inspired by remarks made by House Foreign Operations Appropriations Subcommittee Chairman Jim Kolbe regarding the importance of being able to clearly demonstrate results of aid expenditures, USAID Administrator Andrew Natsios sought to revitalize evaluation within the agency. He sent a cable to all mission directors calling for the inclusion of evaluation plans, and higher quality evaluations, in all program designs; designated monitoring and evaluation officers at each post; and set aside funding for evaluations and incentives for employees who do evaluations; among other things. (39)

In 2006, in further pursuit of accountability, as well as a desire to rationalize the bilateral assistance efforts of multiple U.S. agencies, Secretary of State Condoleezza Rice created the Office of the Director of Foreign Assistance (F Bureau) at the State Department. In addition to consolidating many USAID and State policy and planning functions for foreign assistance, the F Bureau established an extensive set of standard performance indicators "to measure both what is being accomplished with U.S. Government foreign assistance funds and the collective impact of foreign and host-government efforts to advance country development." (42) Prior to this initiative, the State Department, which traditionally had managed a much smaller aid portfolio than USAID, is said to have made a de facto decision not to evaluate its assistance programs on a systematic basis. (43) As a result, the data collected through the "F process," which remains in place today, allow for a marked improvement in aid transparency, demonstrating comprehensively where and for what purpose aid funds are allocated by State and USAID as of FY2006. (44) However, the demands of F process reporting were believed by some to have interfered with more results-oriented evaluation work at USAID, and a 2008 assessment of State's evaluation capacity found that several bureaus, including those that manage State's security assistance programs, still had little or no evaluation capacity. (45)

Primary School Deworming in Kenya (1997-2001) (40)

One well-known example of an impact evaluation that yielded useful information looked at a World Bank-supported project in Kenya that treated children for intestinal worms, a prevalent affliction that results in listlessness, diarrhea, abdominal pain, and anemia. The stated development objective was to increase the number of children completing their primary education. In collaboration with the local health ministry, NGO implementers treated 30,000 children in 75 schools with a drug that cost $3.27 annually per child, using baseline data and a random phase-in approach that allowed for a controlled comparison. The evaluation found that the de-worming resulted in a 25% reduction in absenteeism, or 10-15 more days of school attendance per child per year. This case is also an example of the value of consistent methodology and the use of sector- or region-wide evaluation that looks at results beyond the project level. Similar evaluation methods were used for other interventions (providing free uniforms, textbooks, and/or meals) with the same goal and in the same region, allowing evaluators to do a comparative analysis and determine that the de-worming intervention was the most effective of these interventions in increasing school participation. (41)
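
The figures reported above permit a rough cost-effectiveness calculation. The Python sketch below is a back-of-the-envelope illustration and is not part of the cited evaluation: it assumes that the $3.27 annual drug cost is the only cost counted and that each treated child gains 10 to 15 days of attendance per year, as stated. The function and variable names are hypothetical.

```python
def cost_per_additional_day(annual_cost_per_child, extra_days_per_child):
    """Dollars spent per additional day of school attendance."""
    return annual_cost_per_child / extra_days_per_child


ANNUAL_COST_PER_CHILD = 3.27   # dollars, as reported above
CHILDREN_TREATED = 30_000      # as reported above

# The text gives a range of 10-15 extra days, so compute both bounds.
low = cost_per_additional_day(ANNUAL_COST_PER_CHILD, 15)
high = cost_per_additional_day(ANNUAL_COST_PER_CHILD, 10)
annual_drug_cost = ANNUAL_COST_PER_CHILD * CHILDREN_TREATED

print(f"${low:.2f} to ${high:.2f} per additional school day")
print(f"${annual_drug_cost:,.0f} per year for all treated children")
```

Under these assumptions, the drug cost works out to roughly 22 to 33 cents per additional day of attendance. Applying the same calculation to the other interventions, using their own costs and attendance gains, is the kind of comparison that consistent evaluation methods make possible.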

The structural reforms of the F Bureau came at a time of heightened congressional scrutiny of foreign aid. In 2004, Congress established the Helping to Enhance the Livelihood of People (HELP) Around the Globe Commission, through a provision in P.L. 108-199, to independently review foreign assistance policy decisions, delivery challenges, methodology, and measurement of results. After nearly two years of work, the HELP Commission released its report in late 2007. On the subject of evaluation, the report noted that "everyone to whom members of the Commission spoke about monitoring and evaluation expressed concern about the inadequacy of the existing process" and concluded that "unless our government better evaluates projects based on the outcomes they achieve, it will not improve the effectiveness of taxpayer dollars." (46) The commission recommended creation of a unified foreign assistance policy, budgeting, and evaluation system within State, quite similar to the F process, which was established before the report was released. Other HELP Commission recommendations included ensuring that evaluation strategies use control groups and randomization as much as possible; considering new evaluation methods, such as the use of professional associations or accreditation agencies; and building, in collaboration with other donors, the capacities of recipient governments to provide reliable baseline data. (47)

At the same time the F Bureau was established, and the HELP Commission was active, the international donor community began to prioritize aid effectiveness, sparking renewed interest in rigorous impact evaluation (see the "A Global Perspective on Aid Evaluation" text box below). Some aid professionals viewed the F process as an opportunity to build a cross-agency aid evaluation practice focused on impact, and were disappointed that the common indicators used by the F Bureau, while an improvement with respect to comparability, measured outputs rather than impact. Furthermore, the use of more rigorous evaluation methodologies was not a focus of the reform. These issues were revisited by the Obama Administration when it embarked in 2009 on a Quadrennial Diplomacy and Development Review (QDDR) to examine how State and USAID could be better prepared for current and future challenges. As a result of that review, the Administration committed itself in December 2010 to several principles of foreign assistance effectiveness, including "focusing on outcomes and impact rather than inputs and outputs, and ensuring that the best available evidence informs program design and execution." (48) The QDDR became the basis of many recent and ongoing changes at State and USAID, including the creation of a new Office of Learning, Evaluation and Research at USAID and a new USAID evaluation policy, which took effect in January 2011. State followed suit and adopted an evaluation policy similar to that of USAID in February 2012. These policies are discussed later in this report.

The Millennium Challenge Corporation is a relative newcomer to foreign assistance, and has a very limited evaluation history. Nevertheless, since its establishment in 2004, MCC has been regarded by many as a leader in aid evaluation, largely as a result of its demanding evaluation policy. MCC provides funding and technical assistance to support five-year development plans, called "compacts," created and submitted by partner countries. Since its inception, MCC policy has required that every project in a compact be evaluated by independent evaluators, using pre-intervention baseline data. MCC has also put a stronger emphasis on impact evaluation than State and USAID; of the 25 MCC impact evaluation plans (not completed evaluations) made publicly available, 11 employ a rigorous randomized controlled trial methodology rarely used by other aid agencies. (49) MCC to date has released five evaluations, all related to specific farmer training activities, and has not completed any final compact evaluations. A GAO report on the first two completed MCC compacts suggests that significant changes were made to the original evaluation plans, raising questions about whether the agency's practices will reflect its policy over the long term. (50)

MCC's First Impact Evaluations

MCC released its first set of independent impact evaluations on October 23, 2012. (51) While the evaluations all look at farmer training activities, and reflect a small portion of MCC compacts in the respective countries (Armenia, Ghana, El Salvador, Honduras, and Nicaragua), they were much anticipated in the development community as harbingers of the success or failure of MCC's evidence-based approach to evaluation. The evaluation results were mixed. MCC reports meeting or exceeding output and outcome targets for most of the evaluated activities, but not seeing measurable changes in household incomes, which was the intended impact. The reports also describe some problems with evaluation design and implementation. Many development experts praised MCC's transparency about both the successes and shortcomings of its programs, and apparent commitment to continuous improvement. (52) The evaluation reports were published in full on MCC's website, along with MCC analysis of lessons learned (e.g., phased implementation doesn't work well on a tight schedule, as delays undermine the entire evaluation model) and questions raised (e.g., should the assumption that increased farm income leads to increased household income be reconsidered?). According to at least one development professional, this first set of evaluations is a "game changer" that has set a new standard for development agencies. (53)

Evaluation Challenges

The current evaluation emphasis on measuring impact and broader learning about what works is not new; as discussed above, it was the basis of USAID evaluation policy in the 1970s and at various times since. Nevertheless, a 2009 meta-evaluation of U.S. foreign aid programs indicated that rigorous impact evaluation--the kind that could determine with credibility whether a specific aid intervention or broader sector strategy worked to produce a specific development outcome--was rarely attempted. Of the 296 evaluations reviewed, only 9% reported on a comparison group and only one used an experimental design involving randomized assignment, the method most likely to produce accurate data. (54) A 2005 review of USAID evaluations (focused on democracy and governance programs) found that "as a group, they lacked information that is critical to demonstrating the results of USAID projects, let alone whether the projects were the real cause of whatever change the evaluation reported." (55) This gap between evaluation goals and actual practices has been documented repeatedly over the history of U.S. foreign assistance; so too have the challenges that make it difficult for implementers to achieve ideal evaluation practices in the field. Some of these challenges are discussed below.

Mixed Objectives. The U.S. foreign assistance program has dozens of official objectives written into statute, and many aid programs are designed to meet multiple objectives. Often there are both strategic objectives and development objectives attached to an aid intervention, which may or may not be acknowledged in budget and planning documents. For example, assistance to Uzbekistan may be requested and appropriated for specific agriculture sector activities, but may be motivated primarily by a desire to secure U.S. overflight privileges for military aircraft bringing troops and supplies to Afghanistan. An evaluation of the agricultural impact may be of no use to policymakers who are more interested in the strategic goal, nor to aid professionals who are unlikely to view any lessons learned in these circumstances as applicable to agricultural development projects in a less politically affected environment. Another example is the Food for Peace program, which provides U.S. agricultural commodities to countries facing food insecurity. One objective of the program is to feed hungry people, but long-standing requirements that most of the food be provided by U.S. agribusiness and be shipped by U.S.-flagged vessels make clear that supporting the U.S. agriculture and shipping industries is a program objective as well, and a potentially conflicting one. Studies have shown that the buy and ship America provisions, as they are known, may lessen the hunger-alleviation impact of food aid by up to 40%. (57)

OTI Consolidation in Colombia, 2007-2011

A 2011 evaluation of USAID's Office of Transition Initiatives (OTI) Integrated Governance Response Program (IGRP) in Colombia demonstrates the difficulty in quantifying the success of certain types of foreign aid. The IGRP was intended to strengthen the government of Colombia's credibility and legitimacy in communities once controlled by rebels, a process known as "consolidation." When the Colombian military reestablished control over a community, OTI provided funds and technical assistance to support rapid-response community-based projects, such as school rehabilitation, and small income-generation programs, such as providing agricultural inputs, designed to increase citizen confidence in, and cooperation with, the government. The loosely defined objectives and ex-post approach to evaluation, however, made it difficult to determine the program's effectiveness. As the evaluation report notes, without a defined endpoint for the consolidation process or concrete indicators for what constitutes success, the evaluation is "necessarily impressionistic in nature." While a more rigorous evaluation methodology would be possible with better planning (for example, using a pre-intervention survey as a baseline to measure changing attitudes), it may not be practical. Rapid response was a key element of the OTI approach, which focused on citizens seeing an immediate and beneficial impact of government control, and delay for the sake of rigorous evaluation design could have undermined that strategy. Evaluators used literature reviews, interviews, and site visits to find that the program was a success because it "nurtured a mindset" among both Colombians and Americans working on consolidation that is valuable in achieving policy objectives in conflict zones. (56)

Despite the political and diplomatic considerations that arguably underlie the majority of foreign aid, strategic evaluations that examine those objectives are rare (or at least not publicly available). This may be understandable, as such evaluations would often be politically and diplomatically sensitive. Nevertheless, evaluation that focuses only on the development or humanitarian impact of a particular program or project, when broader strategic objectives are drivers of the aid, may largely miss the point.

Funding and Personnel Constraints. The more rigorous and extensive an evaluation, the costlier it tends to be, both in funds and staff time. Impact evaluations are particularly costly and require specially trained implementers. Absent a directive from agency leadership, aid implementers are unlikely to make resources available for evaluation at the expense of other program components. As one internal USAID review explained, "since USAID's development professionals have limited staff, limited budget, and copious priorities, unfortunately, due to lack of training on the crucial role of evaluation in the development process, most have chosen to eliminate evaluation from their programs." (58) Competitive contracting plays a role as well. At a time when most program implementation is contracted out, and cost is a key factor in winning contract bids, some argue that there is little incentive to invest in the up-front costs, such as baseline surveys, of a well designed evaluation plan in the absence of an enforced requirement. (59) As a result, ad hoc evaluations of limited scope and learning value--as one report describes it, the "do the best you can in three weeks" approach--often prevail by default. (60) "It is rare," according to one report, "that the resources provided for an evaluation are sufficient to develop and apply more rigorous research methods that would produce valid empirical evidence regarding outcomes and attributable impact." (61) Sometimes the limited resource is personnel, rather than funding. Reviews of assistance evaluation repeatedly cite lack of trained evaluation personnel as a problem.

Emphasis on Accountability of Funds. Aid evaluations in recent years have primarily focused on accountability of funds because that is what stakeholders, including Congress, generally ask about. Concerned about corruption and waste, bound by allocation limits, and required by law to report on various aspects of aid administration, implementing agencies have developed monitoring, evaluation, and data collection practices that are geared toward tracking where funds go and what they have purchased rather than the impact of funds on development or strategic objectives. For example, the F Bureau's Foreign Assistance Framework, launched in 2006, was created largely to address the information demands of stakeholders, who wanted more data on how aid funds are being spent. It worked, to the extent that it is now easier to find information on how much aid is being spent in a given year on counterterrorism activities in Kenya, for example, or on agricultural growth programs in Guatemala. (62) But little if any of the resulting data addresses the impact of aid programs. If stakeholders had instead expressed sustained interest in aid impact, the so-called "F process" may have taken a different form.

Methodological Challenges. In the complex environment in which many aid projects are carried out, it can be challenging to employ high quality evaluation methods. U.S. agency policies allow for a variety of evaluation methods (see Appendix A), acknowledging that the most rigorous methods are not always practical. Sometimes it is impossible to identify a comparable control group for an impact evaluation, or unethical to exclude people from a humanitarian intervention for the purpose of comparison. Sometimes the goals are intangible and cannot be accurately documented through metrics. For example, it may be much harder to measure the impact of programs such as the Middle East Partnership Initiative, designed to strengthen relationships, than to measure more concrete objectives, such as reducing malaria prevalence. This may be one reason why reviews have found that global health assistance has a stronger evaluation history than other aid sectors; (63) disease prevalence and mortality rates lend themselves to quantification better than military personnel attitudes towards human rights or the strength of civil society. Rigorous methodology can also limit program flexibility, as making program changes midcourse, in response to changed circumstances or early results, can compromise the evaluation design. Even MCC, with its emphasis on rigorous evaluation, has chosen to use less rigorous qualitative methods for certain projects that do not, in the agency's opinion, lend themselves to quantitative evaluation. (64)

Even when metrics and baselines are well established, it can still be very difficult to attribute impact to a specific U.S. aid intervention when such programs are often carried out in the context of a broader trade, investment, political, and multi-donor environment. (65) Also, some aid professionals see broader drawbacks to rigorous impact evaluation methods. Some assert that the use of randomized control groups, which generally require the use of independent evaluators, limits the participation of affected individuals and communities in project design. They argue that community participation in project planning and evaluation, which can lead to greater buy-in and local capacity building, is more valuable in the development context than high-quality evaluation findings. (66) Others counter that more participatory methodologies are often weakened by bias, and that it is unwise and even unethical to replicate programs, which may profoundly affect participants, without having properly evaluated them. (67)

Compressed Timelines. While development assistance, in particular, is recognized as a long-term endeavor, aid strategies can be trumped by political pressures, which can influence evaluation. In 2001, a USAID survey report stated that "the pattern found was that evaluation work responds to the more immediate pressures of the day." (68) Policymakers facing relatively short budget and election cycles do not always allow adequate time for programs to demonstrate their potential impact. Such pressures have only increased over the last decade, particularly in the politically charged environments of Iraq, Afghanistan, and Pakistan. As a Senate Foreign Relations Committee report on aid to Afghanistan explains, "the U.S. Government has strived for quick results to demonstrate to Afghans and Americans alike that we are making progress. Indeed, the constant demand for immediate results prevented the implementation of programs that could have met long-term goals and would now be bearing fruit." (69)

The type of evaluation necessary to determine whether aid has real impact is both hard to do and of limited use in a short-term context. Timelines are particularly restrictive for MCC, which originally intended to complete evaluations during the compact implementation period. This goal, which reflects broad support for limited timeframes on foreign assistance, was found not to be feasible during implementation of MCC's first compacts in Cape Verde and Honduras. (70) Baseline data and evaluation models can be rendered worthless if program timelines change. For example, an MCC evaluation of a farmer training program in Armenia found that the planned impact evaluation model--a phased roll-out--was compromised by a delay in implementing one component of the program and the five-year compact timeline. (71)

Sector Evaluation Example: Trade Capacity Building

Many analysts have suggested that cross-country evaluations of aid for a specific sector may be more useful for shaping policy than the more common individual project evaluations. One example of this approach is an evaluation commissioned by USAID to look at the impact of 256 U.S. trade capacity building (TCB) assistance projects in 78 countries from 2002 to 2006. The United States obligated about $5 billion during this period for TCB activities, through several federal agencies, including assistance to help developing countries strengthen their public institutions and policies related to trade, as well as programs to make private industries more knowledgeable about and competitive in global markets. The evaluation was designed after the fact, making a randomized controlled trial unfeasible, and had to account for variations in reporting across projects. Much of the report highlights anecdotal examples of issues that could not be analyzed systematically as a result of inconsistent data collection methodologies across projects. However, using regression analysis, evaluators found a relationship suggesting that each additional $1 invested in U.S. aid (from all agencies) for TCB is associated with a $53 increase in the value of recipient country exports two years later. For TCB aid specifically managed by USAID, the relationship was $1 invested for $42 in increased exports. No similar association was found between TCB assistance and recipient country imports or foreign direct investment. While this evaluation's methodology was not sufficient to demonstrate actual aid impact or causation, its findings may be useful to policymakers both in demonstrating a correlation between TCB aid and export growth and in forming the basis of a discussion about the comparative advantages of various U.S. agencies in managing TCB aid. (72)

Country Ownership and Donor Coordination. The United States and other aid donor countries have made pledges in recent years to both coordinate their efforts and increase recipient country control, or "ownership," over the planning of aid projects and the management of aid funds. The QDDR also promotes these objectives. (73) Country ownership is believed by many to increase the odds that positive results will be sustained over time both by ensuring aid projects are consistent with recipient priorities and by helping to build the budget and project management capacity of recipient country governments and non-governmental organizations (NGOs) that administer the assistance. Donor coordination of assistance efforts is supposed to promote efficiency, ease administrative burdens on aid recipients, and avoid duplication, among other things. USAID, as part of its ongoing procurement reform process, aims to channel 30% of aid directly to governments and local organizations in developing countries by 2015. However, greater country ownership, and the pooled funds that may result from donor coordination, generally means diminished donor control, and a lesser ability to evaluate how U.S. funds contributed to a particular outcome. Accountability concerns often greatly overshadow the learning aspects of evaluation in such a context, as Congress has expressed concern about the heightened potential for corruption and mismanagement when funds flow directly to recipient country institutions.

Security. Over the past decade, a significant percentage of foreign aid has been allocated to countries where security concerns have presented major obstacles to implementing, monitoring and evaluating foreign aid. A 2012 evaluation of a USAID agricultural development program in rural Pakistan, for example, states "the operating environment for development projects has been especially testing in recent years in the presence of an insurgency and frequent targeted killings and kidnappings." (74) Development staff in Afghanistan and Iraq have not always been able to safely visit project sites to verify that a structure has been built or supplies delivered, much less be out on the streets conducting the types of surveys that certain evaluations would normally call for. A 2011 USAID Inspector General report noted that more than half of performance audits in Iraq indicated security concerns. In the most insecure environments, monitoring and evaluation of aid programs have often fallen by the wayside. Even in less hostile environments, security concerns can undermine evaluation quality. For example, a 2011 evaluation of Office of Transition Initiatives governance activities in Colombia noted that "security considerations limited to some degree the evaluation team's freedom to interview community members in project sites at will. This fact made it difficult to be certain that field research did not suffer from a form of sampling bias." (75) While security challenges may weigh against the use of aid in certain regions, the most insecure places are sometimes where the U.S. foreign policy interests are greatest, and policymakers must consider whether the risk of being unable to evaluate even the performance of an aid intervention is worth taking for other reasons.

Agency and Personal Incentives. Given discretion in the use and conduct of evaluations, observers have noted the inclination of foreign assistance officials to avoid formal evaluation for fear of drawing attention to the shortcomings of the programs on which they work. While agency staff are clearly interested in learning about program results, many are reportedly defensive about evaluation, concerned that evaluations identifying poor program results may have personal career implications, such as loss of control over a project, damage to professional reputation, budget cuts, or other potential career repercussions. (76) As explained by one USAID direct-hire in response to a survey, "if you don't ask [about results], you don't fail, and your budget isn't cut." (77) That same study revealed that staff felt more pressure to produce success stories than to produce balanced and rigorous evaluations, and that "professional staff do not see any Agency-wide incentive to advance learning through evaluations." (78) Few observers consider risk taking and accepting failure as a necessary component of learning to be hallmarks of USAID or State Department culture. MCC's institutional attitude toward adverse results may be tested in the coming year, as its first evaluations are made public.

Applying Evaluation Findings to Policy

A consistent theme in past reviews of foreign aid evaluation practices is that even when quality evaluation takes place, the resulting information and analysis are often not considered and applied beyond the immediate project management team. Evaluations are rarely designed or used to inform policy. Lack of faith in the quality of the evaluation, irregular dissemination practices, and resistance to criticism may all contribute to this problem, as does lack of time on the part of aid implementers and policymakers alike to read and digest evaluation reports. A survey of U.S. aid agencies found that "bureaucratic incentives do not support rigorous evaluation or use of findings," "evaluation reports are often too long or technical to be accessible to policymakers and agency leaders with limited time," and learning that takes place, if any, is "largely confined to the immediate operational unit that commissioned the evaluation." (79) The shift in recent decades towards the use of contractors and implementing partners for most project implementation, and most project evaluation, may also impact the learning process. As one report notes, "partner organizations are learning from the experience, but USAID is not," and most evaluation work does not circulate beyond the partner. (80)

The lack of a "learning culture," as some describe it, has been a perennial criticism that agencies appear to have been largely unsuccessful addressing in the past, though the prominent "lessons learned" sections in the first batch of MCC evaluations may set a new standard. Some assert that outside pressure, such as a legislative mandate, may be necessary. Congress expressed some interest in this issue with the Initiating Foreign Assistance Reform Act of 2009 (H.R. 2139 in the 111th Congress), which called for "a process for applying the lessons learned and results from evaluation activities, including the use and results of impact evaluation research, into future budgeting, planning, programming, design and implementation of such United States foreign assistance programs." No such requirements were enacted in the 111th Congress, but the May 2012 memorandum from OMB, calling on all agencies to use evaluation data in their FY2014 budget submissions, may have similar impact. (81)

The learning aspect of evaluation relies heavily on agency culture, which may be shaped more by leadership than policy. The effective application of evaluation information depends also on the details of implementation, such as evaluation questions being based on the information needs of policymakers and program managers, and information being presented in a format and to a scale that is useful. Policymakers, for example, may be much better able to make actionable use of a meta-evaluation of microfinance programs, presented in a short report highlighting key findings, than a whole database of detailed analysis of single projects, the results of which may or may not be more broadly applicable. Experts have pointed out that individual project evaluations, even when well done, do not roll up nicely into a document showing what works and what does not. They contend that for maximum learning, an effort must be made at the cross-agency or even whole-of-government level to develop evaluation meta-data that is responsive not only to the needs of a project manager interested in the impact of a particular activity, but also to agency leadership and policymakers who want to know, more broadly, what foreign assistance is most effective. This view has been reflected in legislation introduced in recent years, including the Foreign Assistance Revitalization and Accountability Act of 2009 (S. 1524 in the 111th Congress), which called for the creation of a Council on Research and Evaluation of Foreign Policy to do cross-agency evaluation of aid programs.

As important as evaluation can be to improving aid effectiveness, not every aid project has broad learning potential. Knowing which potential evaluations could have the greatest policy implications may be key to maximizing evaluation resources. Many USAID projects, for example, are designed as small-scale demonstrations, with no intention that they be scaled up or replicated elsewhere. In other situations, an approach may have already been well proven. In such instances, a basic performance evaluation for accountability may be appropriate, but rigorous evaluation may be a poor use of resources. A 2012 USAID "Decision Tree for Selecting the Evaluation Design" asks staff to first consider whether an evaluation is needed, and decline to evaluate if the timing is not right, if there are no unanswered questions for the evaluation to address, or if there is no demand from stakeholders. (82)

Current Agency Evaluation Policies

The primary U.S. government agencies managing foreign assistance each have their own distinct evaluation policies, but these policies have come into closer alignment in the last two years. The Quadrennial Diplomacy and Development Review (QDDR) report of December 2010 stated the intent that USAID would reclaim its leadership role with respect to evaluation and learning, and referenced a new USAID evaluation policy in the works to reflect the growing demand for results data and attempt to address some persistent evaluation challenges. That policy took effect in January 2011. The State Department followed suit in February 2012 with a new evaluation policy that is similar in many respects to the USAID policy, and MCC updated its policy in May 2012.

Appendix A compares key provisions of the current evaluation policies of USAID, State, and MCC.

The new State and USAID policies share much in common, balancing the costs and expected gains from evaluation. For example, both require performance evaluations of all larger-than-average projects and experimental/pilot projects, but not all projects. Both also include a target allocation of funds for program evaluation: 3% for USAID and 3%-5% for State. The policies share an emphasis on accessibility of information, with provisions to promote consistent and timely dissemination of evaluation reports. In their introductory language, both policies emphasize the learning benefits of evaluation, in addition to accountability. The USAID policy is notably more detailed than State's on many of the issues. The USAID policy establishes required features for evaluation reports, and specifies that evaluation questions be identified in the design phase of projects, issues which the State policy does not address. USAID states that most evaluations will be conducted by third party contractors or grantees, to promote independence, while State's policy does not explicitly mention use of independent evaluators. State's evaluation reporting requirements also focus on internal dissemination, while USAID requires public availability. According to State officials, however, many of these issues are fleshed out in subsequent internal guidance documents and the State and USAID policies, in practice, differ only on the use of impact evaluation. USAID's policy calls for impact evaluation whenever feasible, while the State policy sets a clear expectation that impact evaluation will be rare. (83)

MCC's evaluation policy shares many elements of the State and USAID policies, but goes farther in many respects. MCC requires independent evaluations of all compact projects, using indicators and baselines established prior to project implementation. It may be, however, that first-hand experience with the challenges of evaluation is bringing MCC policy and practice closer to that of USAID over time. MCC's 2012 policy revision adopts definitions from USAID's 2011 evaluation policy and includes a new section on institutional learning. The update also appears to move closer to the USAID model with respect to impact evaluation, calling for impact evaluations "when their costs are warranted," whereas the previous iteration referred to independent impact evaluations as an "integral part" of MCC's focus on results. (84) The MCC policy still appears to have the strongest enforcement mechanism among the three agency policies, conditioning the release of quarterly disbursements on substantial compliance with the policy. USAID's policy, in contrast, calls only for occasional compliance audits, and State's policy does not address compliance at all.

While some experts have called for greater uniformity of evaluation practices across agencies to allow for comparative analysis, others view the differences in State, USAID, and MCC evaluation policies as reflecting the different experience, scope of work, and priorities of the agencies. USAID, with the largest and most diverse assistance portfolio among the agencies, and numerous small projects, may require a more flexible approach to evaluation than MCC, which is narrowly focused on economic growth and recipient government ownership. At State, foreign assistance is just one part of a broader portfolio (including diplomatic activities), potentially impacting what type and scope of evaluation is useful or possible.

These current evaluation policies represent a step towards improving knowledge of foreign assistance measures of effectiveness at the program or project level, and increasing transparency of the evaluation process. They do not, however, attempt to establish a systemic approach to aid evaluation that would make country-wide, sector-wide, or cross-agency evaluation of aid more feasible. They look similar to earlier initiatives to improve aid evaluation. Many aspects of the new USAID policy, for example, are strikingly similar to the required actions called for in the 2005 cable to USAID missions (e.g., evaluation planning as part of all program designs, designated evaluation officers at each post, and set-aside evaluation funds). It is too early to know whether this new initiative will have more real or lasting impact than its predecessors. The State Department policy has only recently taken effect. MCC just released its first five project evaluation reports in October 2012, (85) and has yet to produce a compact evaluation. USAID, a year into implementation of its policy, reports that insufficient time has passed to document any changes in evaluation quality, as no evaluations have gone from start to finish under the new requirements. However, the quantity of USAID evaluations has increased notably, from 89 in 2010 to 295 in 2011, (86) and the agency aims to complete 250 "high quality" evaluations by January 2013.

A Global Perspective on Aid Evaluation

U.S. foreign assistance evaluation efforts have evolved in the context of a global movement by public and private aid donors to improve aid effectiveness, with improved evaluation practices as one of many strategies. Representatives of aid donor countries meet regularly under the auspices of the OECD Development Assistance Committee (DAC) to discuss evaluation practices, among other things, as a means of implementing the aid effectiveness agenda laid out in the 2005 Paris Declaration on Aid Effectiveness and the 2008 Accra Agenda for Action. A 2010 OECD/DAC survey and report on evaluation in the development agencies of major donor countries highlighted several issues that are common to U.S.-specific aid evaluation. (87) The report found a heavy reliance on measuring outputs, but also a trend toward measuring aid impact and larger strategic questions of development effectiveness. It identified new emphasis on dissemination of evaluation findings, and found that while bilateral aid agencies on average allocated 0.1% of their development assistance budget to evaluation, lack of human resources--people qualified to do rigorous impact evaluations, evaluations of direct budget support, or requiring specific language skills, in particular--presented a bigger obstacle to evaluation goals than did financial constraints.

Non-governmental organizations have focused on evaluation in recent years, as well. In 2004, an Evaluation Gap Working Group was convened by the Center for Global Development with support from the Bill & Melinda Gates Foundation and the William and Flora Hewlett Foundation. The Working Group focused on why rigorous impact evaluations of development assistance were so rare. The resulting report, "When Will We Ever Learn?," is a key resource for this report. The group made two recommendations: (1) that donors invest more in their own evaluation capacity, and (2) that an independent institution be created to evaluate aid. (88) The offshoot of the latter recommendation is the International Initiative for Impact Evaluation (3ie), established in 2009, with a mission to use impact evaluations, specifically, to generate high quality evidence for use in shaping effective development policies. 3ie both funds evaluations and produces extensive materials on evaluation methods, implementation practices, and application to policy, as a means to improve evaluators' technical capacity. USAID and MCC are official partners of 3ie, as are many other official aid agencies, private foundations, and non-profit organizations such as the Hewlett and Gates foundations and Save the Children.

Issues for Congress

While recent momentum on foreign aid evaluation reform has originated within the Administration, Congress may have significant influence on this process. Not only can Congress mandate or promote a certain approach to evaluation directly through legislation, as has been proposed, it can modulate Administration policies by controlling the appropriations necessary to implement the policies. Congress may also influence how, or if, the information resulting from evaluations will impact foreign assistance policy priorities. These issues are discussed in greater detail below.

Reform Authorization Legislation. There is at least one proposal in the 112th Congress that focuses specifically on foreign aid evaluation. The Foreign Aid Transparency and Accountability Act of 2012 (H.R. 3159; S. 3310) seeks to evaluate the performance of U.S. foreign assistance programs and improve program effectiveness by requiring the President to establish guidelines on measurable goals, performance metrics, and monitoring and evaluation plans for foreign assistance programs that can be applied on a uniform basis across implementing agencies, both U.S. and multilateral. The legislation also calls for the creation of a website, within two years of enactment, that would make detailed, program-level information on foreign assistance, including country strategies, budget documents, budget justifications, actual expenditures, and program reports and evaluations available to the public. The bill's requirements are similar in many respects to the F Process, but would extend the requirements across the various federal and multilateral agencies that administer aid programs. The benefit of such broad uniformity, arguably, is that it could enable policymakers, the public, and other stakeholders to better compare the activities of various agencies and get a more comprehensive picture of total U.S. foreign assistance. A potential drawback is the effort and expense required to impose such uniformity on agencies with different objectives, management structures, and information technology systems. The legislation is focused on transparency and accountability rather than effectiveness, and does not promote the use of impact evaluation. If performance evaluation continues to comprise the vast majority of aid evaluations, such a cross-agency requirement may provide comparable information on aid management from agency to agency, but is not likely to facilitate comparative analysis of what aid works best.

Appropriations for Enhanced Evaluation. Increasing the number and quality of foreign aid evaluations, while potentially cost effective in the long run, requires an investment of resources. For the most part, evaluation costs are integrated into program accounts at the various implementing agency budgets and are not scrutinized specifically by Congress. However, USAID, in conjunction with its new policy, started in the FY2012 budget request to identify resource needs for centralized evaluation and learning through a "Learning, Evaluation and Research" (LER) line item. LER is one of the seven focus areas of the USAID Forward reform agenda, and is intended both to enhance USAID's ability to conduct rigorous evaluations and to apply the knowledge gained through evaluation to improve future assistance strategies and design. The Administration requested $19.7 million for this purpose, through the Development Assistance appropriations account, for FY2012. Congress provided $12.26 million. For FY2013, USAID requested $26.67 million, to expand the number of priority evaluations it can carry out, improve staff training, and support evaluation collaborations with international partners. The ultimate funding level established by Congress, together with any related legislative directives, may play a role in determining the extent of the Administration's efforts to strengthen evaluation practice.

Impact of Evidence Based Approach on Congressional Priorities. Congress has long exerted control over foreign assistance not only through appropriated funds and restrictions, but also by directing foreign assistance funds to certain sectors, countries, or even specific projects through bill or report language. For example, the committee reports accompanying the FY2013 House and Senate State-Foreign Operations appropriation proposals (H.Rept. 112-494; S.Rept. 112-172), like most of their predecessors, provide specific funding levels for microfinance, basic education, water and sanitation, women's leadership training, people-to-people reconciliation programs in the Middle East, and other sectors of particular interest to Members of Congress. Should credible information about the relative effectiveness of these programs be made available as a result of improved evaluation practices, Congress can weigh the importance of the data, among other drivers, in establishing aid priorities. Some congressional directives on aid are less likely than others to be affected by evaluation results. The availability of actionable evaluation data may not result in a maximization of aid effectiveness, but may allow Congress to make more deliberate trade-offs between effectiveness and other objectives.


The primary U.S. agencies charged with implementing foreign assistance have made significant steps in the last two years to address ongoing deficiencies in evaluation practices that make it difficult to judge whether foreign assistance is achieving its various objectives. There is widespread agreement, reflected in new policies, on the need for consistent performance evaluation of aid programs. The value of rigorous impact evaluation is broadly recognized as well, though the agencies differ in their capabilities and aspirations in this respect. Past policies and evaluation reform efforts, however, have been similarly focused but not sustained in the face of persistent challenges, many of which remain today. Other reforms, such as the establishment of centralized evaluation processes or the creation of an independent evaluation entity, have been proposed in legislation yet not addressed in agency policies. Growing emphasis in Congress and the Administration on results-based budgeting, as well as movement within the international aid donor community toward more rigorous aid evaluation practices, may provide the context for future change. The 113th Congress will have multiple opportunities to influence how U.S. foreign assistance is evaluated through legislative proposals, appropriations, and oversight activities.
Appendix A. Select Aspects of Current USAID, State Department, and MCC Evaluation Policies

                        USAID                    State

Effective       January 2011             February 15, 2012
Responsible     PPL/LER responsible      F and RM Bureaus
  Personnel     for system               monitor and report on
                implementation, while    evaluations plans.
                missions and             Each Bureau should
                functional bureaus       identify a senior
                responsible for          staffer to serve as
                conducting               evaluation point of
                evaluations. All         contact.
                Bureaus and operating
                units must designate
                an evaluation point of
Evaluation      Operating units must     All programs/projects/
  Requirement   conduct at least one     activities greater
                performance evaluation   than or equal to the
                of each project that     median size (generally
                equals or exceeds        using dollar value as
                average project size.    the measure) for the
                                         Bureau must be
                Projects involving an    evaluated at least
                untested hypothesis or   once in their lifetime
                new approach, and that   or every five years,
                are anticipated to       whichever is less.
                expand in scale or
                scope, will undergo an   All pilot programs
                impact evaluation, if    must be evaluated once
                feasible.                every five years.

                All evaluations will     Each Bureau must
                share certain basic      evaluate 2 to 4
                features, including      projects/programs/
                a full description of    activities in
                methodology;             FY2012-FY2013, with
                standardized recording   this requirement
                and maintenance of       extending to all posts
                records from             in FY2013-FY2014
                evaluation; evaluation   period.
                findings based on
                facts, evidence,
                and data; and an
                explanation of the
                limitations of the
                Key evaluation
                questions will be
                identified during the
                design phase of every
Evaluation      Emphasis on quality      Bureau's discretion,
  Type          evaluation methods       based on context but
                and favoring             the policy establishes
                random assignment/       an expectation that
                experimental methods     the "great majority"
                for impact evaluations   of evaluations will
                when feasible.           be performance
                                         evaluations because
                                         impact evaluations are
                                         more time consuming,
                                         costly, and often
                                         difficult to
                                         successfully design
                                         for State programs,
                                         projects and
Evaluator       Policy states that       Suggests that
  Type          most evaluations will    evaluators should be
                be conducted by third    free from and pressure
                party contractors or     and/or bureaucratic
                grantees managed by      interference, but does
                USAID, but evaluation    not explicitly call
                teams may be composed    for the use of
                primarily of USAID       outside evaluators.
                staff, led by an
                outside expert, when
                it is determined that
                this will facilitate
Funding         Recommends an average    Program managers
  Requirement   3% of program budgets    should identify
                be dedicated             resources of up to
                specifically to          3-5% for evaluation
                external evaluation,     activities.
                distinct from

                Resources for
                evaluation should be
                concentrated on large
                projects and those
                that are innovative or
                pilot approaches.
Reporting       Public availability of   Bureaus and posts
  Requirement   evaluation reports and   must electronically
                summaries, within 3      transmit final
                months of completion,    evaluation reports
                on the Development       as cables and post
                Experience               reports on their
                Clearinghouse website.   OpenNet or ClassNet
Compliance      PPL/LER will organize    No reference
  Enforcement   occasional external      to compliance
                technical audits of      enforcement.
                operating unit
                compliance with the


                        MCC

Effective       May 1, 2012
Responsible     Primary lead is MCA
  Personnel     (host country entity)
                M&E, with input from
                MCC M&E.

Evaluation      All Compacts and
  Requirement   Threshold Agreements
                include monitoring
                and evaluation plans,
                which identify the
                evaluations to be
                conducted for each
                project, the key
                evaluation questions
                and methodologies, and
                the data collection
                strategies that will
                be used.

                Final evaluations are
                required for all
                projects in a Compact
                upon completion or
                termination; mid-term
                evaluations are

                Selected indicators
                must have baselines
                established prior to
                the start of the

Evaluation      Impact evaluations
  Type          performed "when their
                costs are warranted
                by the expected
                accountability and

Evaluator       Independent
  Type          evaluators required
                for final evaluations
                of Compacts.

                Mid-term compact
                evaluations and final
                threshold program
                evaluations can be
                done independently or
                by MCC/MCA staff.

Funding         Does not specify a
  Requirement   portion of funds that
                should be used for

Reporting       MCAs must post their
  Requirement   approved Compact M&E
                plans on their
                website. MCC and MCAs
                must "regularly"
                publish results
                information on their
Compliance      Substantial compliance
  Enforcement   required for approval
                of quarterly
                requested by recipient

Source: Policy for Monitoring and Evaluation of Compacts and
Threshold Programs, MCC, May 1, 2012; Department of State
Evaluation Policy, Bureau of Resource Management, February 23,
2012; Evaluation: Learning from Experience, USAID Evaluation
Policy, January 2011.

Notes: PPL/LER = USAID Office of Learning, Evaluation and
Research; F Bureau = Office of Foreign Assistance Resources;
RM = State Department Bureau of Resource Management; MCA = the
Millennium Challenge Account implementing entity in each compact
country; M&E = monitoring and evaluation. The information in the
table refers only to what is in the actual evaluation policy
document of each agency, as cited above. Information available
outside of these documents, which may provide greater details
about aspects of the policies, is not reflected here.

Author Contact Information

Marian Leonardo Lawson

Analyst in Foreign Assistance, 7-4475

(1) U.S. Department of State, Quadrennial Diplomacy and Development Review, 2010, Leading Through Civilian Power, p. 103.

(2) For more information about the MCC model, see CRS Report RL32427, Millennium Challenge Corporation, by Curt Tarnoff.

(3) Statement of USAID Administrator Rajiv Shah to The Cable, as reported in The Cable, June 13, 2012.

(4) While not often discussing evaluation policy per se, some Members appear to be influenced in their policy decisions by their sense of what aid is working and what is not. For example, when introducing her subcommittee's FY2013 proposal at full-committee mark-up on May 17, 2012, House State-Foreign Operations Appropriations Subcommittee Chairwoman Kay Granger remarked that the legislation "only supports programs that work." Senator Lindsey Graham of the Senate State-Foreign Operations Appropriations Subcommittee, explaining the sharp reduction in aid for Iraq in the Senate's FY2013 proposal at a May 22, 2012, mark-up, said "there's no point in throwing good money after bad."

(5) For historic information on foreign aid spending, see CRS Report R40213, Foreign Aid: An Introduction to U.S. Programs and Policy, by Curt Tarnoff and Marian Leonardo Lawson.

(6) When Will We Ever Learn?: Improving Lives Through Impact Evaluation, Report of the Evaluation Gap Working Group, Center for Global Development, May 2006, p. 1.

(7) According to U.S. Overseas Loans and Grants, 21 U.S. Government agencies reported disbursing foreign assistance in FY2010. See

(8) For more on current GPRA requirements, see CRS Report R42379, Changes to the Government Performance and Results Act (GPRA): Overview of the New Framework of Products and Processes, by Clinton T. Brass.

(9) Use of Evidence and Evaluation in the FY2014 Budget, Memorandum to the Heads of Executive Departments and Agencies, Jeffrey D. Zients, Acting Director, Office of Management and Budget, May 18, 2012.

(10) Foreign Assistance Act of 1961 (P.L. 87-195), §101(a).

(11) Ibid.

(12) FAA, as amended, §481(a)(1)(C).

(13) FAA, as amended, §491(a).

(14) FAA, as amended, §572 (1) and (2).

(15) Several examples of this are discussed in, Economic Gangsters: Corruption, Violence and the Poverty of Nations, by Raymond Fisman and Edward Miguel, Princeton University Press, 2008.

(16) See Dambisa Moyo, Dead Aid: Why Aid is Not Working and How There Is a Better Way for Africa, Farrar, Straus and Giroux, New York, 2009, p. 48.

(17) Beyond Success Stories: Monitoring and Evaluation For Foreign Assistance Results, Evaluator Views of Current Practice and Recommendations for Change, by Richard Blue, Cynthia Clapp-Wincek and Holly Benner, May 2009, p. ii.

(18) For a thorough, yet non-technical, discussion of the use of impact/attribution evaluation, see "An introduction to the use of randomized control trials to evaluate development interventions," by Howard White, International Initiative for Impact Evaluation, Working Paper 9, February 2011.

(19) Clemens, Michael. "Impact Evaluation in Aid: What For? How Rigorous?" Presentation at the Overseas Development Institute, July 3, 2012, video recording available at 1426372/.

(20) Trends in Development Evaluation Theory, Policies and Practices, USAID, 17 August 2009, p. 4.

(21) The USAID Evaluation System: Past Performance and Future Direction, Bureau for Program and Policy Coordination, USAID, September 1990, p. 9.

(22) Evaluation Handbook, Office of Program Evaluation, USAID, November 1970, p. 40.

(23) Experience--A Potential Tool for Improving U.S. Assistance Abroad, U.S. Government Accountability Office, GAO-ID-82-36, June 15, 1982, p. i (summary).

(24) The History of CDIE, CDIEHI ST.017/SESmith;JREriksson/10-17-94, p. 4; available through the Development Experience Clearinghouse on the USAID website.

(25) The Community-Based Family Planning Services Family Planning Health and Hygiene Project, prepared by Bruce Carlson, MSPH, and Malcolm Potts, M.D. under the auspices of The American Public Health Association, USAID, 1979, pp. 5, 7.

(26) Ibid.

(27) The A.I.D. Evaluation System: Past Performance and Future Directions, Bureau for Program and Policy Coordination, Agency for International Development, September 1990, p. 10.

(28) Ibid., p. 11.

(29) Ibid., p. 11.

(30) Accountability and Control Over Foreign Assistance, GAO/T-NSIAD-90-25, March 29, 1990, pp. 6, 11. The review found that military assistance managed by State and the Department of Defense was also inadequately monitored and accounted for.

(31) The History of CDIE, p.6; The A.I.D. Evaluation System, p. 11.

(32) Ibid., pp. 6-7.

(33) Ibid., p. 8.

(34) The Role of Evaluation in USAID, Performance Monitoring and Evaluation TIPS, USAID CDIE, 1997, Number 11, p. 3.

(35) Beyond Success Stories, p.7; Evaluation of Recent USAID Evaluation Experience, Cynthia Clapp-Wincek and Richard Blue, Working Paper No. 320, U.S. Agency for International Development, Center for Development Information and Evaluation, June 2001, p. 31.

(36) Evaluation of Recent USAID Evaluation Experience, p. 5. The report authors note that while some of the declining numbers can be attributed to missions not submitting their evaluations to the Development Experience Clearinghouse, as policy required, making the specific numbers unreliable, the trend of decline is unmistakable.

(37) Evaluation of Recent USAID Evaluation Experience, p. 12.

(38) The Evaluation of USAID's Evaluation Function: Recommendations for Reinvigorating the Evaluation Culture Within the Agency, Janice M. Weber, Bureau for Program and Policy Coordination, USAID, September 2004, pp. 5, 10.

(39) Actions Required to Implement the Initiative to Revitalize Evaluation in the Agency, UNCLAS STATE 127594, July 8, 2005.

(40) For an overview of this evaluation, as well as links to related studies, see evaluation/primary-school-deworming-kenya.

(41) Roetman, Eric. A Can of Worms? Implications of Rigorous Impact Evaluations for Development Agencies, International Initiative for Impact Evaluations, Working Paper 11, March 2011, p. 5.

(42) See It was originally expected by many that the F Bureau would eventually track all foreign assistance provided by U.S. agencies, not just State and USAID. As of 2012, some MCC data has been added to the Bureau's public database (, but there does not appear to be momentum toward any expansion of F Bureau authority.

(43) Beyond Success Stories, p. 14. The State Department traditionally has used a variety of resources for monitoring its foreign assistance programs, including Mission and Bureau Strategic Plans, annual performance and accountability reports, and Office of Inspector General and Government Accountability Office reports, but had no systematic evaluation process (Department of State Program Evaluation Plan, FY2007-2012 Department of State and USAID Strategic Plan, Bureau of Resource Management, May 2007, Appendix II).

(44) The data is publicly available at

(45) Beyond Success Stories, p. 8.

(46) Beyond Foreign Assistance: The HELP Commission Report on Foreign Assistance Reform, The United States Commission on Helping to Enhance the Livelihood of People (HELP) Around the Globe Commission, December 7, 2007, p. 15.

(47) HELP Report, p. 99.

(48) QDDR, p. 110.

(49) See

(50) Millennium Challenge Corporation: Compacts in Cape Verde and Honduras Achieved Reduced Target, GAO-11-728, pp. 32-38.

(51) MCC's statement on the release, which summarizes the findings, is available at release/statement-102312-evaluations.

(52) Statements of various leaders in the development community with respect to the MCC evaluations are available at guidance-for-future-programs/.

(53) See comments of William Savedoff from the Center for Global Development at 2012/11/the-biggest-experiment-in-evaluation-mcc-and-systematic-learning.php.

(54) Trends in Development Evaluation Theory, Policies and Practices, USAID, 17 August 2009, p. 46.

(55) Trends in International Development Evaluation Theory, Policies and Practices; USAID, 17 August 2009, p. 13. The report was prepared for USAID by Molly Hageboeck of Management Systems International.

(56) All information in this text box is based on USAID/OTI's Integrated Governance Response Program in Colombia, A Final Evaluation, produced for USAID by Caroline Hartzell, Robert Lamb, Phillip McLean and Johanna Mendelson Forman, April 2011. Direct quotes, in order of appearance, are from pages 20 and 13.

(57) The Developmental Effectiveness of Untied Aid, OECD, p.1, available at 41537529.pdf.

(58) An Evaluation of USAID's Evaluation Function, p. 5.

(59) Beyond Success Stories, p. 16.

(60) Ibid.

(61) Ibid.

(62) Foreign aid data from FY2006-FY2012 estimates, sorted by recipient country, year, agency (only State, USAID and MCC), appropriations account, and objective is readily available through the "Foreign Assistance Dashboard" at

(63) Beyond Success Stories, p. 9.

(64) Millennium Challenge Corporation: Compacts in Cape Verde and Honduras Achieved Reduced Target, GAO-11-728, p. 33.

(65) The QDDR states that "we know that in many cases the outcome-level results are not solely attributable to U.S. government investments and activities; we will focus on outcome-level progress in locations and subsectors where the U.S. government is concentrating support." (QDDR 2010, p. 104).

(66) A Can of Worms, p. 8.; Beyond Success Stories, p. 17.

(67) Improving Lives Through Impact Evaluation, p. 15.

(68) Evaluation of Recent USAID Evaluation Experience, p. 26.

(69) S.Prt. 112-21, Evaluating U.S. Foreign Assistance to Afghanistan, June 8, 2011, p. 14.

(70) Millennium Challenge Corporation: Compacts in Cape Verde and Honduras Achieved Reduced Target, GAO-11-728, p. 33.

(71) Measuring Results of the Armenia Farmer Training Investment, October 23, 2012, p.4, available at

(72) From Aid to Trade: Delivering Results. A Cross-Country Evaluation of USAID Trade Capacity Building, prepared for USAID by Molly Hageboeck of Management Systems International, November 24, 2010; Executive Summary.

(73) Leading Through Civilian Power, U.S. Department of State, Quadrennial Diplomacy and Development Review, 2010, p. 95.

(74) United States Assistance to Balochistan Border Areas: Evaluation Report, Prepared by Management Systems International for USAID, January 16, 2012, p. vi.

(75) USAID/OTI's Integrated Governance Response Program in Colombia, Final Evaluation, prepared by Caroline Hartzell et al., April 2011, p. 7.

(76) Evaluation of Recent USAID Evaluation Experience, p. 22.

(77) Ibid., p. 24.

(78) Ibid., pp. 26-27.

(79) Beyond Success Stories, p. iv.

(80) Evaluation of Recent USAID Evaluation Experience, p. 27.

(81) This memo is discussed in the text box on page 2. See Use of Evidence and Evaluation in the FY2014 Budget, Memorandum to the Heads of Executive Departments and Agencies, Jeffrey D. Zients, Acting Director, Office of Management and Budget, May 18, 2012.

(82) Decision Tree for Selecting the Evaluation Design, USAID, June 2012, p. 1, available on USAID's Development Experience Clearinghouse website.

(83) Author's communication with State officials via e-mail, October 10, 2012.

(84) Policy for Monitoring and Evaluation of Compacts and Threshold Programs, MCC, May 1, 2012, p.18; Policy for Monitoring and Evaluation of Compacts and Threshold Programs, MCC, May 12, 2009, p. 17.

(85) See

(86) USAID Evaluation Policy: Year One, First Annual Report and Plan for 2012 and 2013, p. 2.

(87) Evaluation in Development Agencies, Better Aid, OECD Publishing, 2010, available at 9789264094857-en.

(88) When Will We Ever Learn?: Improving Lives Through Impact Evaluation, Report of the Evaluation Gap Working Group, Center for Global Development, May 2006.
COPYRIGHT 2012 Congressional Research Service (CRS) Reports and Issue Briefs
Author:Lawson, Marian Leonardo
Publication:Congressional Research Service (CRS) Reports and Issue Briefs
Date:Nov 1, 2012