Accuracy of Petroleum Supply Data.
Petroleum supply data for 1999 collected by the Petroleum Division (PD) of the Energy Information Administration (EIA) progressed in accuracy from good, to better, to best as initial estimates were replaced by interim and then final values. These data were presented in a series of PD publications: the Weekly Petroleum Status Report (WPSR), the Winter Fuels Report (WFR), the Petroleum Supply Monthly (PSM), and the Petroleum Supply Annual (PSA). Weekly estimates in the WPSR and WFR were the first values available.
Figure FE1 illustrates the progress in the accuracy from the weekly estimates to the interim monthly values to the final petroleum supply values. The monthly-from-weekly (MFW) data are the least accurate but "good." The PSM data are more accurate or "better" and the PSA data are the most accurate or "best." Although the comparison of 1999 MFW and PSM values to PSA values shows 1999 initial and interim data to be less accurate than 1998, these results may be a combination of less accurate initial and interim reporting and an outstanding effort by EIA to resolve reporting discrepancies for the PSA. For 1999, 66 petroleum supply data series were analyzed to determine how close the PSM values were to the final PSA values. For these series, 32 out of the 66 were within 1 percent of the PSA values in terms of mean absolute percent error as compared to 40 in 1998. Sixty-one petroleum supply data series were analyzed to see how close the MFW estimates were to the final PSA values. For these 61 series, 23 were within 2 percent of the PSA values in terms of mean absolute percent error and, of those, 9 were within 1 percent, compared to 21 and 8, respectively, for 1998.
[Figure FE1 ILLUSTRATION OMITTED]
Two major factors contribute to the PSM values being more accurate than the MFW estimates: (1) more time elapses between the close of the reference period and the publication date of the PSM; and (2) some MFW values are estimates, whereas many PSM respondents extract their actual data from automated accounting systems. The greater length of time allows more in-depth review of the data by the respondents and EIA. Within 2 months of the close of a reference month, interim values are published in the PSM. The weekly data are available more quickly. The WPSR is available electronically 5 days after and in hardcopy 7 days after the close of the reference week (excluding holiday weeks). Propane data are available electronically and in the WPSR. About 5 months after the end of the reference year, final monthly values, reflecting any resubmissions, are published in the PSA. Historically, the weekly publications (WPSR and WFR) and the monthly publication (PSM) provided volumes of crude oil and petroleum products data at relatively increasing levels of accuracy. This article provides petroleum analysts with a measure of the degree to which, on average, estimates and interim values vary from their final values.
The Petroleum Supply Reporting System
The 15 surveys in the Petroleum Supply Reporting System (PSRS) track the supply and disposition of crude oil, petroleum products, and natural gas liquids in the United States. To maintain a database with historically accurate observations and current estimates from the petroleum industry, EIA administers three survey series: weekly, monthly, and annual.
The PSRS is organized into two data collection subsystems, the Weekly Petroleum Supply Reporting System (WPSRS) and the Monthly Petroleum Supply Reporting System (MPSRS). The WPSRS processes data from the five weekly surveys. In addition, the Form EIA-807, "Propane Telephone Survey," collects data weekly from October through March. The MPSRS includes eight monthly surveys, one annual survey, and the Form EIA-807 monthly data, which are collected from April through September.
Figure FE2 displays the petroleum supply and distribution system and indicates the points at which petroleum supply data are collected. Both weekly and monthly surveys are administered at five key points along the petroleum production and supply path: (1) refineries, (2) bulk terminals, (3) product pipelines, (4) crude oil stock holders, and (5) importers of crude oil and products.
[Figure FE2 ILLUSTRATION OMITTED]
Annual U.S. refinery capacity data are collected on the Form EIA-820, "Annual Refinery Report." These data were collected and published in Volume 1 of the PSA for 1999.
The Weekly Petroleum Supply Reporting System
The WPSRS contains the data collected from the five weekly surveys. Each weekly survey is distributed to a sample of the corresponding monthly survey's universe. In Figure FE2, the icons represent the target population of the monthly and weekly surveys of the PSRS. For example, the target population for the survey Forms EIA-801 and EIA-811 is bulk terminal stocks. Thus, the respondents to the Form EIA-801 are a sample of the respondents who report on Form EIA-811. For the weekly surveys, EIA aims for a minimum 90-percent multi-attribute-cutoff sample from the respondents to the corresponding monthly survey. In choosing the sample for each product, companies are ranked in descending order by volume. Respondents are chosen in order, down the list until the sample includes those companies contributing at least 90 percent of a variable's total volume. For example, for distillate fuel oil stocks, the weekly sample includes those respondents whose combined volumes of stocks for distillate fuel oil from refineries, bulk terminals, and pipelines constitute at least 90 percent of the total volume of distillate fuel oil stocks as reported in the corresponding monthly surveys.
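The cutoff selection described above can be sketched in a few lines of Python. This is a minimal illustration for a single variable, not EIA's production procedure: the function name `cutoff_sample`, the company labels, and the volumes are all hypothetical, and the multi-attribute aspect (meeting the 90-percent target for several variables at once) is not modeled.

```python
def cutoff_sample(volumes, coverage=0.90):
    """Return company names, largest first, whose combined volume
    reaches at least `coverage` of the total for one variable."""
    total = sum(volumes.values())
    if total <= 0:
        return []
    # Rank companies in descending order by reported volume.
    ranked = sorted(volumes.items(), key=lambda item: item[1], reverse=True)
    selected = []
    running = 0.0
    for company, volume in ranked:
        if running >= coverage * total:
            break  # coverage target already met
        selected.append(company)
        running += volume
    return selected


# Hypothetical monthly distillate stocks (thousand barrels) by company.
monthly_stocks = {"A": 500, "B": 300, "C": 120, "D": 50, "E": 30}
print(cutoff_sample(monthly_stocks))  # ['A', 'B', 'C'] (92 percent coverage)
```

In this made-up case three of the five companies are enough to cover 92 percent of the volume, which is why a cutoff sample can be much smaller than the monthly universe.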
With these surveys, EIA can provide timely, relatively accurate snapshots of the U.S. petroleum industry every week. The weekly surveys collect information on the supply and disposition of selected petroleum products and crude oil. The reference period for each weekly survey begins at 7:01 a.m. each Friday and ends at 7:00 a.m. the following Friday. Respondents report their data via telephone, facsimile, or EIA's electronic data collection software package, the Personal Computer Electronic Data Reporting Option (PEDRO). All respondents must submit their data by 5:00 p.m. on the Monday following the end of the reference period. During 2 working days, quality control procedures are executed. Cell values determined to be unusual or inconsistent with other cell values are flagged. The validity of the value of each flagged cell is investigated. Some flagged values are verified by the respondent to be correct; other flagged cells are corrected; and the remaining flagged values are referred to as unresolved. Nonrespondent and unresolved flagged data are imputed using an exponentially smoothed mean of the respondents' historical data.
Within 5 days of the close of the reference week, data are made available to the public on the EIA's internet web site (http://www.eia.doe.gov) and within 7 days in hardcopy (through the WPSR). Except when holidays delay data processing schedules, values for the weekly variables, with the exception of propane, are available via the internet at 9:00 a.m. on the Wednesday following the close of the reference week. Propane data are available via the internet at 4:00 p.m. on the same Wednesday. The hardcopy WPSR is distributed on the Friday morning following the close of the reference week.
The Monthly Petroleum Supply Reporting System
The reference period for the monthly surveys starts on the first day of the month at 12:01 a.m. and ends on the last day of the month at midnight. Except for the Form EIA-819M, the deadline for filing monthly surveys is the 20th calendar day following the end of the report month. Data collection for the Form EIA-819M begins on the seventh working day of the month. Form EIA-819M data are solicited by telephone or received by facsimile or electronic mail. Data for the other monthly surveys are reported via telephone, facsimile, or PEDRO.
During the period of data editing, either the respondent or EIA staff may identify an error. If the respondent discovers an error, the EIA representative for a particular survey is notified and the value is corrected. If EIA's edits diagnose an unusual value, an EIA representative will determine if the value is correct or incorrect by calling the company and/or reviewing historical data.
Within 60 days of the close of the reference month, all of the interim monthly data are published in the PSM and on the internet. In addition to the internet, beginning in March 1996, monthly data became available on EIA's CD-ROM called the Energy InfoDisc, which is released quarterly. Throughout the year, EIA accepts data revisions of monthly data. If a revision is made after the PSM has been published, it is referred to as a resubmission. Resubmissions for earlier months are published in Appendix C of the PSM and are reflected in the PSA.
Beginning with the February 1994 PSM, Table H1, "Petroleum Supply Summary" was included to show early estimates of monthly data. The current-month values in Table H1 are preliminary estimates based on weekly submissions. These monthly-from-weekly estimates become available in the WPSR and on the internet on the Wednesday following the first Friday of each month.
Within 5 months of the end of the calendar year, the final monthly values for the previous year are published in the PSA. These values reflect all PSM resubmissions and other data corrections. The values contained in the PSA are EIA's most accurate measures of petroleum supply industry activity.
Factors Affecting Data Accuracy
Maintaining an accurate database is a major goal of EIA. The quality of the data drives the quality of all qualitative and quantitative analyses conducted using these data. Accuracy and timeliness are primary attributes of high quality data. Accuracy of survey data is measured as the closeness of the published values to the true values (i.e., those values that would be obtained if the target population had been correctly surveyed and all the data had been precisely recorded).
Respondents to the monthly surveys have more time to file than the weekly respondents, enabling them to collect, review, and revise their data more carefully than the weekly respondents. Additionally, EIA has more time to edit the monthly data. Also, some weekly respondents report estimates while many monthly respondents extract actual data from accounting systems. Thus, the monthly data are more accurate.
Some sources of error, such as nonresponse, are not totally preventable. Other errors, such as sampling errors, are unique to a particular type of survey. Sampling error occurs, for example, when the group of sampled respondents is dissimilar to the full population. Within the PSRS, only weekly surveys, the Form EIA-819M, "Monthly Oxygenate Telephone Report," and the Form EIA-807, "Propane Telephone Survey," are at risk of having sampling errors. However, all surveys in the PSRS are at risk for nonsampling errors, such as: (1) insufficient coverage of respondents (the survey frame does not include all members of the target population); (2) nonresponse; (3) response error; and (4) internal processing errors such as incorrect data entry. A detailed discussion of factors influencing data accuracy and how they are minimized in the PSRS follows.
Samples and Sampling Error
A sample is a subset of a universe, where the universe identifies the members of a target population. The weekly surveys are administered to samples of the monthly populations to reduce respondent burden and to expedite the turnaround of data from survey respondents to the public. As with any sample, the values obtained differ from those that would be obtained if the full universe had been surveyed. Sampling error is the difference between a sample estimate and a population value.
There are five samples, one for each weekly petroleum supply survey, in the WPSRS. For these surveys, the sampling error is minimized by using a minimum 90-percent multi-attribute-cutoff sample from the corresponding monthly survey's frame. At the end of each month, updates are made to the samples and survey frames if a 90-percent coverage was not obtained.
For the weekly surveys, better coverage will most likely reduce sampling error. As shown in Table FE1, 1999 coverage was comparable to 1998. All but one of the 21 product and supply type combinations had coverage of 90 percent or above in 1999. For 13 of the 21 combinations, 1999 coverage increased from 1998. Tabulations were done before rounding of the coverage values. The largest percentage increase from 1998 to 1999 was for residual fuel oil imports, from 91 to 94 percent. Jet fuel imports display the largest percentage decrease from 1998 to 1999, from 97 to 71 percent, because of noncompliance of a large respondent.
Table FE1. Average Coverage for Weekly Surveys, 1999 and 1998 (Percent of Final Monthly Volumes Included in Monthly-from-Weekly Sample)
                                        Stocks
                         Refinery      Bulk Terminal      Pipeline
Product                 1999  1998      1999  1998       1999  1998
Total Motor Gasoline      97    98        93    92         97    97
Jet Fuel                  98    98        92    91        100    99
Distillate Fuel Oil       97    97        90    88         99    98
Residual Fuel Oil         97    96        90    90         --    --
Crude Oil                 96    96        --    --         --    --

                        Production        Imports
Product                 1999  1998      1999  1998
Total Motor Gasoline      99    99        98    98
Jet Fuel                  99    99        71    97
Distillate Fuel Oil       97    97        93    93
Residual Fuel Oil         95    95        94    91
Crude Oil                 --    --        94    94
-- = Not Applicable.
Source: Energy Information Administration, Petroleum Supply Reporting System.
Unlike sampling errors, nonsampling errors can affect all survey data, even those from a census survey. There are two categories of nonsampling errors, random and systematic. With random error, on average, and over time, values will be overestimated by the same amount they are underestimated. Therefore, over time, random errors do not bias the data, but they will give an inaccurate portrayal at any point in time. On the other hand, systematic error is a source of bias in the data, since these patterns of errors are made repeatedly. The following is a discussion of how the four most frequently occurring types of nonsampling error are minimized within the PSRS.
The list of all companies identified as members of the target population is called a frame. If members of the target population are not included in the frame, there is an undercount of the aggregate data. To diminish the chance of undercounting, the PSRS frames are continually updated. New companies are identified through continual review of petroleum industry periodicals, newspaper articles, and correspondence from respondents. During the frames update, each frame is scrutinized to assure completeness.
Maintaining a Low Nonresponse
Survey respondents are required by law to report to EIA (see Explanatory Note 6 of the PSM for a description of action for chronic nonresponse). The 1999 response rates for the weekly surveys and their corresponding monthly surveys are enumerated in Table FE2. All but one of the 1999 response rates for each of the EIA weekly and monthly surveys decreased from 1998. The response rate for the monthly refinery survey increased from 96.3 percent in 1998 to 97.1 percent in 1999. Budget cuts at respondent companies had a negative effect on response rates. Company mergers and changes in company reporting systems have also contributed to lower response rates.
Table FE2. Average Response Rates for Monthly and Weekly Surveys, 1999
Respondents to Monthly Surveys
                     Average          Average Number
Survey Site          Universe Size    of Respondents    Percent(1)
Refinery                  248              241             97.1
Bulk Terminal             292              278             95.2
Pipeline                   81               78             96.0
Crude Oil Stocks          170              164             96.4

Respondents to Weekly Surveys
                     Average Weekly   Average Number
Survey Site          Sample Size      of Respondents    Percent(2)
Refinery                  181              169             93.7
Bulk Terminal              70               63             89.7
Pipeline                   44               40             92.3
Crude Oil Stocks           77               72             93.8
(1) The average response rates for monthly surveys are calculated by summing the individual monthly response rates and dividing by 12.
(2) The average response rates for weekly surveys are calculated by summing the individual weekly response rates and dividing by 52.
Note: Percents are calculated before rounding.
Source: Energy Information Administration, Petroleum Supply Reporting System.
To mitigate the effect of nonresponse, imputed values are calculated for all nonreported values except monthly imports. Weekly imputed values are the exponentially smoothed mean of that respondent's historical values for that variable. Monthly imputed values are the previous month's value for the particular respondent and variable. Imports, however, fluctuate a great deal from one reference period to another, and respondents frequently have no imports of a particular product. As a result, zero is the value imputed for nonreported import cells on the monthly survey. In addition, the monthly imports are collected and published at a much greater level of detail than the weekly imports, which makes imputation impractical.
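The three imputation rules can be illustrated with a short Python sketch. The article does not state the smoothing constant or the exact form of the exponentially smoothed mean, so the simple exponential smoothing recursion and the value `alpha=0.5` below are assumptions, and the function names are hypothetical.

```python
def exponentially_smoothed_mean(history, alpha=0.5):
    """Simple exponential smoothing over a respondent's past values.

    `alpha` is an assumed smoothing constant; the source does not give one.
    """
    if not history:
        raise ValueError("history must contain at least one value")
    level = history[0]
    for value in history[1:]:
        # Blend the newest observation with the running smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level


def impute_weekly(history, alpha=0.5):
    """Weekly rule: exponentially smoothed mean of the respondent's history."""
    return exponentially_smoothed_mean(history, alpha)


def impute_monthly(previous_month_value, is_import=False):
    """Monthly rule: carry forward last month's value, except for
    imports, where zero is used."""
    return 0.0 if is_import else previous_month_value


# Hypothetical weekly history for one respondent and one variable.
print(impute_weekly([100.0, 120.0, 110.0]))    # 110.0
print(impute_monthly(250.0))                   # 250.0
print(impute_monthly(250.0, is_import=True))   # 0.0
```

A larger `alpha` would make the weekly imputed value track the most recent reports more closely; a smaller one would lean on older history.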
Reducing Response Error
Improvements to the PSRS system are continuously being made to reduce response error. To satisfy customer needs and meet the particular requirements of some respondents, computerized spreadsheets that resemble the actual survey forms have been developed, and are available for respondent reporting. Another improvement has been the increased participation in the PEDRO system, which permits all weekly and monthly survey data except the Form EIA-819M and Form EIA-807 to be submitted to EIA electronically. A respondent entering values via PEDRO may execute edit routines prior to transmission of the survey responses. These routines include consistency and outlier (extreme value) checks of the data. Unusual or nonreported cells are flagged and, prior to transmission of the data, a representative of the company is able to review and verify or correct data in the flagged cells.
Even with sophisticated edit checks, response error (the difference between the reported value and the actual value) remains the most likely cause of data inaccuracy. The weekly surveys are more susceptible to response error since some of their values are estimates. Many monthly respondents extract their actual data from accounting systems, so their reported values are generally more accurate.
Maintaining accurate accounting records, however, does not ensure against response error. For example, numbers can be transposed within the correct cell; an otherwise correct value may be entered in the wrong cell; a respondent may misinterpret the intent of a question; or the wrong units may be used.
The terms, layout, and definitions on all survey forms are periodically reviewed for completeness, clarity, and consistency across surveys. At regular intervals, survey intent, as well as what data are collected, is subject to industry and government review. To the extent possible, industry changes in terminology and practice are incorporated into the PSRS on an ongoing basis.
Each of the variables included in these analyses is of current and historical interest. Of the 66 variables for which both PSM and PSA values were published, only 61 of them were published weekly throughout 1999. For each variable, six measures of accuracy were calculated to compare the differences between the MFW and PSM values relative to the PSA values.
* Error is the difference between the estimate or interim value and the final value for a given month. For inputs, production, stock change, imports, exports, and product supplied, values are expressed in units of thousands of barrels per day. For stocks, values are expressed in units of thousands of barrels.
MFW Error = MFW Volume - PSA Volume
PSM Error = PSM Volume - PSA Volume
* Percent Error is the error for a given month divided by the final value for a given month, and multiplied by 100.
MFW Percent Error = MFW Error / PSA Volume x 100
PSM Percent Error = PSM Error / PSA Volume x 100
* Mean absolute error is the weighted average over the 12 months of the year of the absolute values of the errors for each month. The mean absolute error measures the average magnitude of the revisions that took place over a year. Outliers increase the mean absolute error. The number of days in the month is used for weighting all product categories except stocks. Stocks are weighted equally for each of the 12 months.
* Mean absolute percent error is the weighted average over the 12 months of the year of the absolute values of the percent errors. It provides a measure of the average magnitude of the revisions relative to final values. The mean absolute percent error has an inverse relationship with data accuracy; i.e., the smaller the mean absolute percent error, the closer the interim data are to the final data; conversely, the larger the mean absolute percent error, the greater the difference between the interim value and the final value. Outliers inflate the mean absolute percent error.
* Range is the difference between the smallest and largest percent errors. The range shows the dispersion of the percent differences between interim and final values.
* Median of the percent errors is the point at which half the values are higher and half are lower. Unlike the mean, the median is not affected by an outlier. In these analyses, each distribution has 12 observations. The median is the average of the sixth and seventh ordered observations.
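The six measures defined above can be computed together, as in the following Python sketch. The function and variable names are hypothetical, the sample values are invented, and the handling of a zero final value (percent error taken as zero when the error is also zero) is an assumption made so that series such as SPR imports do not divide by zero.

```python
from statistics import median

# Days per month in 1999, used to weight flow series (not stocks).
DAYS_IN_MONTH_1999 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def percent_error(estimate, final):
    """Error divided by the final value, times 100."""
    error = estimate - final
    if final == 0:
        if error == 0:
            return 0.0  # assumption: no revision on a zero series
        raise ZeroDivisionError("percent error undefined for zero final value")
    return 100.0 * error / final


def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def accuracy_measures(interim, final, weights=None):
    """Compare MFW or PSM values with final PSA values.

    Pass day counts as `weights` for flow series; leave as None
    for stocks, which are weighted equally.
    """
    if len(interim) != len(final):
        raise ValueError("interim and final must have the same length")
    if weights is None:
        weights = [1.0] * len(final)
    errors = [i - f for i, f in zip(interim, final)]
    pct = [percent_error(i, f) for i, f in zip(interim, final)]
    return {
        "errors": errors,
        "percent_errors": pct,
        "mean_absolute_error": weighted_mean([abs(e) for e in errors], weights),
        "mean_absolute_percent_error": weighted_mean([abs(p) for p in pct], weights),
        "range": max(pct) - min(pct),
        "median_percent_error": median(pct),
    }


# Hypothetical three-month flow series (thousand barrels per day).
psm = [1510.0, 1495.0, 1530.0]
psa = [1500.0, 1500.0, 1520.0]
print(accuracy_measures(psm, psa, weights=DAYS_IN_MONTH_1999[:3]))
```

With 12 monthly observations, `statistics.median` returns the average of the sixth and seventh ordered values, matching the definition in the text.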
The average final absolute volumes and the mean absolute percent error for MFW estimates and PSM interim values for 1999 and 1998 are presented in Table FE3. The average final absolute volumes are presented to give the reader an idea of the magnitude of these volumes. Variables with very small volumes are prone to larger percent changes because a modest volume change is being compared to a small final volume. The mean absolute error and the size of the volumes involved must both be included in the interpretation of data accuracy.
Table FE3. Summary Statistics for Differences Between Interim and Final Data, 1999 and 1998
                                                    PSA Average Absolute   Monthly-from-Weekly   PSM Mean Absolute
                                                          Volumes          Mean Absolute         Percent Error
                                                                           Percent Error
Variable                                                 1999       1998          1999    1998          1999    1998
Crude Oil Production (thousand barrels/day)             5,881      6,252   (*)    1.54    2.81          1.33    1.43
Refinery Operations
  Refinery Crude Oil Inputs (thousand barrels/day)    14,804     14,889   (*)    0.76    0.51   (*)    0.16    0.36
  Operating Utilization Rate (percent)                     93         96   (*)    1.51    0.61   (*)    0.27    0.33
Production (thousand barrels/day)
  Total Production                                     19,215     19,170            --      --   (*)    0.40    0.48
  Refinery Production                                  16,990     17,030   (*)    1.45    1.43   (*)    0.39    0.46
  Finished Motor Gasoline                               8,111      8,082   (*)    1.77    0.87   (*)    0.49    0.54
  Reformulated Motor Gasoline                           2,564      2,483   (*)    1.80    1.52   (*)    0.62    0.72
  Oxygenated Motor Gasoline                               673        667   (**)  14.87   15.66          3.87    6.82
  Other Motor Gasoline                                  4,874      4,932          2.42    2.26   (*)    0.72    0.88
  Jet Fuel                                              1,565      1,526   (*)    0.84    1.28   (*)    0.23    0.47
  Distillate Fuel Oil                                   3,399      3,424   (*)    1.11    1.76   (*)    0.31    0.31
  Low Sulfur Distillate Fuel Oil                        2,307      2,230   (*)    1.31    1.92   (*)    0.48    0.53
  High Sulfur Distillate Fuel Oil                       1,092      1,194          3.41    3.06   (*)    0.74    0.71
  Residual Fuel Oil                                       698        762          4.00    3.70   (*)    0.41    0.64
  Other Products                                        5,441      5,376            --      --   (*)    1.07    0.88
  Propane                                               1,097      1,063            --      --   (*)    0.82    0.89
  Other Products Refinery Production                    3,392      3,427          8.65    7.62   (*)    0.63    0.84
Stocks (thousand barrels)
  Total Stocks                                      1,612,511  1,632,759   (*)    0.58    0.74   (*)    0.40    0.15
  Total Stocks, excl. SPR                           1,039,897  1,068,193   (*)    0.86    1.07   (*)    0.62    0.23
  Total Crude Stocks                                  893,900    895,328   (*)    0.31    0.69   (*)    0.38    0.20
  Crude Oil Stocks, excl. SPR                         321,286    330,762   (*)    0.77    1.72          1.02    0.55
  SPR Stocks                                          572,614    564,566   (*)    0.12    0.12   (*)    0.00    0.00
  Refined Products Stocks                             718,611    737,431   (*)    1.07    1.76   (*)    0.54    0.12
  Total Motor Gasoline Stocks                         212,696    214,782   (*)    1.60    1.13   (*)    0.80    0.21
  Reformulated Motor Gasoline Stocks                   42,986     44,089          3.60    1.65          2.21    0.74
  Oxygenated Motor Gasoline Stocks                      1,329      1,028   (**)  23.72   27.21          8.68    0.19
  Other Motor Gasoline Stocks                         123,913    124,574   (*)    1.99    2.29   (*)    0.91    0.25
  Jet Fuel Stocks                                      44,915     43,829          2.36    2.24          2.24    0.20
  Distillate Fuel Oil Stocks                          135,555    140,800   (*)    1.50    2.16          1.03    0.47
  Low Sulfur Distillate Fuel Oil Stocks                70,407     69,430          2.37    2.34   (*)    0.90    0.99
  High Sulfur Distillate Fuel Oil Stocks               65,147     71,369   (*)    1.77    2.53          1.22    0.41
  Residual Fuel Oil Stocks                             40,789     40,483          3.41    2.06          1.13    0.59
  Other Products Stocks                               284,657    297,539          3.24    2.24   (*)    0.38    0.21
  Propane Stocks                                       49,631     56,227   (*)    1.99    2.79   (*)    0.59    0.53
  Fuel Ethanol Stocks                                   4,397      3,278          5.85   13.72          1.97    8.69
  Methyl Tertiary Butyl Ether Stocks                    8,567      8,941          3.96    5.35          2.44    0.86
Stock Change (thousand barrels/day)
  Total Stock Change                                      628        492   (**)  88.31  178.96   (**)  47.65   41.10
  Crude Stock Change                                      274        379   (**)  90.69  135.90   (**)  49.09   66.12
  Refined Products Stock Change                           547        405   (**) 210.62  162.62   (**)  32.63   17.46
Imports (thousand barrels/day)
  Total Imports                                        10,852     10,708          3.52    3.65          2.72    3.03
  Total Crude Imports                                   8,722      8,706          2.40    2.92          1.62    1.80
  Crude Oil Imports, excl. SPR                          8,730      8,706          2.45    2.92          1.65    1.80
  SPR Imports                                               0          0   (*)    0.00    0.00   (*)    0.00    0.00
  Refined Products Imports                              2,122      2,002   (**)  11.04    9.25          7.18    8.46
  Finished Motor Gasoline Imports                         382        311   (**)  12.73    9.57          6.23    4.33
  Reformulated Motor Gasoline Imports                     190        179   (**)  13.04   14.98          2.98    7.59
  Oxygenated Motor Gasoline Imports                         0          0   (*)    0.00    0.00   (*)    0.00    0.00
  Other Motor Gasoline Imports                            191        132   (**)  14.95    9.82          9.41    7.57
  Jet Fuel Imports                                        128        124   (**)  26.50   37.74          4.87   35.02
  Distillate Fuel Oil Imports                             250        210   (**)  18.71    6.27   (**)  14.59    7.28
  Low Sulfur Distillate Fuel Oil Imports                  141        119   (**)  24.18   18.84   (**)  21.24    6.13
  High Sulfur Distillate Fuel Oil Imports                 110         91   (**)  17.95   21.79          6.84   10.59
  Residual Fuel Oil Imports                               237        275   (**)  16.64   19.68          8.68   25.64
  Other Products Imports                                1,125      1,082   (**)  12.59    6.55          7.31    2.64
  Propane Imports                                         122        138            --      --          5.49    0.20
Exports (thousand barrels/day)
  Total Exports                                           940        945   (**)  10.94   12.71   (*)    0.00    1.35
  Crude Oil Exports                                       118        110   (**)  49.80   54.75   (*)    0.00    0.00
  Refined Products Exports                                822        835   (**)  11.36   10.23   (*)    0.00    1.62
Total Net Imports (thousand barrels/day)                9,912      9,764          4.05    4.06          2.97    3.18
Products Supplied (thousand barrels/day)
  Total Products Supplied                              19,519     18,917          2.16    1.29   (*)    0.84    1.24
  Finished Motor Gasoline Supplied                      8,431      8,253          1.98    1.10   (*)    0.85    0.79
  Jet Fuel Supplied                                     1,673      1,622          2.40    4.13          1.22    3.10
  Distillate Fuel Oil Supplied                          3,572      3,461          2.80    1.86          1.63    0.84
  Residual Fuel Oil Supplied                              830        887          6.88    9.01          2.43    7.80
  Other Products Supplied                               5,014      4,693          6.82    2.63          1.48    0.99
  Propane Supplied                                      1,246      1,120            --      --   (*)    0.98    1.30
-- = Not Applicable.
(*) = For MFW values, mean absolute percent error less than or equal to 2; for PSM values, mean absolute percent error less than or equal to 1.
(**) = Mean absolute percent error greater than or equal to 10.
SPR = Strategic Petroleum Reserve
Notes:
* Error is the difference between Monthly-from-Weekly estimates or interim monthly data published in the Petroleum Supply Monthly and the final value as published in the Petroleum Supply Annual. Percent error is the error multiplied by 100 and divided by the final published value. Mean absolute error is the weighted average of the absolute errors. Mean absolute percent error is the weighted average of the absolute percent errors. The number of days in the month is used for weighting all product categories except stocks. Stocks are weighted equally for each of the 12 months.
* Totals may not equal sum of components due to independent rounding.
Source: Energy Information Administration, Petroleum Supply Reporting System.
The 1999 MFW mean absolute percent errors which were within 2 percent of their respective PSA values (23 of the 61 MFW series), and the 1999 PSM mean absolute percent errors which were within 1 percent of their PSA values (32 of the 66 PSM series), are distinguished by a single asterisk. Mean absolute percent errors that were greater than 10 percent are marked by a double asterisk. There were 18 such MFW series and 5 PSM series, compared to 14 and 6, respectively, for 1998.
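The asterisk rules can be expressed as a small Python function. This is an illustrative sketch using the thresholds given in the Table FE3 footnotes (2 percent for MFW, 1 percent for PSM, and 10 percent for the double asterisk); the function name `accuracy_flag` is hypothetical.

```python
def accuracy_flag(mape, series):
    """Return the Table FE3 marker for a mean absolute percent error.

    series: "MFW" (monthly-from-weekly) or "PSM" (interim monthly).
    """
    cutoffs = {"MFW": 2.0, "PSM": 1.0}
    if series not in cutoffs:
        raise ValueError("series must be 'MFW' or 'PSM'")
    if mape >= 10.0:
        return "(**)"  # large revision relative to the final value
    if mape <= cutoffs[series]:
        return "(*)"   # close to the final PSA value
    return ""


# 1999 values for crude oil production and oxygenated motor gasoline
# production, taken from Table FE3.
print(accuracy_flag(1.54, "MFW"))   # (*)
print(accuracy_flag(1.33, "PSM"))   # (empty string: above the 1 percent cutoff)
print(accuracy_flag(14.87, "MFW"))  # (**)
```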
For 1999, 7 of the 11 weekly production series increased in mean absolute percent error from 1998. Twelve of the 14 production series have a single asterisk in the PSM column, indicating a mean absolute percent error of 1 percent or less relative to the PSA. Additionally, 12 of the 14 PSM production series in 1999 show a decrease in mean absolute percent error from 1998. Weekly fuel ethanol supply and disposition data are not available; therefore, the weekly oxygenated motor gasoline field production is based on the latest available monthly value.
The single asterisks in Table FE3 by the stock series show that, as in prior years, the stock values for both MFW estimates and PSM interim values are very close to the final PSA values. A major exception is the double asterisk shown by the MFW percent error for oxygenated motor gasoline stocks. The increase is related to the average absolute volume. Fuel ethanol and methyl tertiary butyl ether stocks are not collected weekly, but are collected on the Form EIA-819M, "Monthly Oxygenate Telephone Report." The survey provides production data and preliminary stock data from a sample of respondents reporting on the monthly surveys and from the universe of oxygenate producers. These data are displayed in Appendix D of the PSM. Interim data are collected later on the monthly surveys and published in the PSM. Thirteen of the 19 weekly stock series decreased in mean absolute percent error from 1998. For the monthly stock series, all but two increased in mean absolute percent error from 1998.
Stock change is the difference between stocks at the beginning of the month and stocks at the end of the month. Since the monthly change in stock levels is small compared to the stock levels themselves, a large percent error in stock change can occur even when the percent errors in stock levels are small.
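A small numeric sketch makes the point concrete. The stock levels below are hypothetical, and stock change is taken here as end-of-month minus beginning-of-month stocks, which is an assumed sign convention.

```python
def percent_error(estimate, final):
    """Percent error of an estimate relative to the final value."""
    return 100.0 * (estimate - final) / final


# Hypothetical stock levels in million barrels (not actual EIA data).
final_begin, final_end = 300.0, 302.0   # final (PSA-style) values
est_begin, est_end = 300.0, 303.0       # preliminary estimates

# The end-of-month stock level is off by only about a third of a percent.
level_error = percent_error(est_end, final_end)

# Stock change (assumed: end minus beginning) is small relative to the levels.
final_change = final_end - final_begin   # 2.0
est_change = est_end - est_begin         # 3.0
change_error = percent_error(est_change, final_change)

print(round(level_error, 2), round(change_error, 2))  # prints: 0.33 50.0
```

The same 1-million-barrel discrepancy is a negligible error on the stock level but a 50 percent error on the stock change.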
Crude oil stock change is one of the components in the calculation of unaccounted for crude oil (calculated disposition minus calculated supply of crude oil). For both the MFW and the PSM numbers, the volume of the unaccounted for crude oil may be increased by a combination of factors including an understatement of imports, an overstatement of exports, an understatement of crude oil production, an understatement of stock withdrawals, and an overstatement of crude oil inputs. The overstatement of crude oil inputs can be caused by injections along crude oil pipelines of natural gas liquids. When refiners receive this mixture, they process it as crude oil. As seen in Table FE3, the production, imports, and refinery inputs of crude oil have a small mean absolute percent error relative to crude oil stock change.
For petroleum products, stock change is a component in the calculation of product supplied (representing the consumption of petroleum products). Unlike the other variables, stock change values can be negative. Stock change thus has an added dimension by which to evaluate accuracy; this is the correctness of the direction of the change. Table FE4 provides a measure of accuracy of the direction of MFW and PSM stock change values for 1999 and 1998. Four out of the six stock change values for 1999 decreased the number of months that differed from the direction of the PSA values. All of the PSM stock change values were of the same direction as the PSA values.
Table FE4. Number of Months In Which the Direction of Non-Final Stock Change Values Differed From PSA
                                   Number of Months
                                     1999    1998
Total Stock Change
  MFW and PSA Values                    1       3
  PSM and PSA Values                    0       1
Crude Stock Change
  MFW and PSA Values                    2       2
  PSM and PSA Values                    0       1
Refined Products Stock Change
  MFW and PSA Values                    0       1
  PSM and PSA Values                    0       0
Source: Energy Information Administration, Petroleum Supply Reporting System.
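The counts in Table FE4 amount to comparing the sign of each non-final monthly stock change with the sign of the final value. A minimal sketch follows; the function name and the monthly values are hypothetical, and treating a zero change as its own direction is an assumption.

```python
def months_with_direction_mismatch(nonfinal_changes, final_changes):
    """Count months where the non-final stock change has a different
    direction (sign) than the final stock change."""

    def sign(x):
        # Returns 1 for a stock build, -1 for a draw, 0 for no change.
        return (x > 0) - (x < 0)

    return sum(
        1
        for nonfinal, final in zip(nonfinal_changes, final_changes)
        if sign(nonfinal) != sign(final)
    )


# Hypothetical monthly stock changes (not actual EIA data).
mfw_changes = [1.0, -2.0, 3.0, -0.5]
psa_changes = [0.5, 1.0, 2.0, -1.0]
print(months_with_direction_mismatch(mfw_changes, psa_changes))  # prints: 1
```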
For imports, one reason for the large mean absolute percent errors in the MFW values is that shipments do not always arrive during the week in which they were expected. This has a greater impact when the end of the month occurs in the middle of the week. Six of the 15 MFW import series in Table FE3 showed an increase in mean absolute percent error from 1998 to 1999 compared to last year's increase of ten series from 1997 to 1998. For the PSM, six of the 16 import series increased in mean absolute percent error compared to last year's increase of 11 import series.
With the exception of refinery receipts in the Virgin Islands, EIA does not collect export data. They are gathered by the U.S. Customs Service on a monthly basis and are compiled by the U.S. Bureau of the Census. They are received by EIA on a monthly basis approximately 7 weeks after the close of the reporting month. The weekly estimates for exports are projections based on past monthly data. Because the export data are highly variable, it is difficult to obtain estimates of comparable quality to domestic estimates.
Products supplied is the calculation of field production, plus refinery production, plus imports, plus unaccounted for crude oil, minus stock change, minus crude oil losses, minus refinery inputs, minus exports. Therefore, the accuracy of products supplied is affected by the individual components.
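The calculation described above can be written out directly. The sketch below uses hypothetical parameter names and volumes; it only illustrates how an error in any single component carries through to products supplied.

```python
def products_supplied(
    field_production,
    refinery_production,
    imports,
    unaccounted_for_crude_oil,
    stock_change,
    crude_oil_losses,
    refinery_inputs,
    exports,
):
    """Products supplied as the sum and difference of its components."""
    return (
        field_production
        + refinery_production
        + imports
        + unaccounted_for_crude_oil
        - stock_change
        - crude_oil_losses
        - refinery_inputs
        - exports
    )


# Hypothetical volumes (not actual EIA data).
components = dict(
    field_production=10.0,
    refinery_production=90.0,
    imports=20.0,
    unaccounted_for_crude_oil=1.0,
    stock_change=2.0,
    crude_oil_losses=0.5,
    refinery_inputs=5.0,
    exports=3.0,
)
baseline = products_supplied(**components)

# An underestimate of imports by 2.0 lowers products supplied by the same 2.0.
understated = products_supplied(**{**components, "imports": 18.0})
print(baseline, understated)  # prints: 110.5 108.5
```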
Box and Whisker Plots
Example 1 in the shaded box titled "Structure of Box and Whisker Plots," is a simplified illustration of the box and whisker plots that follow. The box and whisker plots map the 5-year trends in historical accuracy of weekly estimates and monthly interim values. The details provided by the box and whisker plots include: historical trends, the range of monthly percent errors, direction of the error (i.e., overestimation or underestimation), and the identification of unusual values.
Each box and whisker plot is placed on a graph, where the horizontal axis represents the year and the vertical axis represents the percent error. The center horizontal line for all the box and whisker plots is zero percent error. For each variable studied, a pair of charts, each containing five box and whisker plots (one for each year, from 1995 through 1999), are presented side-by-side; the chart on the left contains the percent errors for the MFW estimates, and the chart on the right contains the percent errors for the PSM values. To facilitate the comparison of MFW percent errors and the PSM percent errors, the plots have the same scale.
The position of the box along the y-axis denotes whether the MFW or PSM values are predominantly overestimates or underestimates of the PSA values. For example, if the majority of the MFW values were overestimates, more than half of the box would be above the zero percent error line.
Crude Oil Production and Crude Oil Inputs
Crude oil production data are not collected through any of EIA's surveys. EIA's Dallas Field Office assembles data collected from State agencies responsible for measuring crude oil production. Based on historical trends and data reported on Form EIA-182, "Domestic Crude Oil First Purchase Report," EIA estimates weekly and monthly production. Final estimates based on revised Form EIA-182 data, State government agencies, and U.S. Department of Interior, Minerals Management Service data, are published in the PSA. Figure FE3 presents errors of MFW and PSM values relative to PSA values for crude oil production and crude oil inputs. Compared to the 1998 distribution of MFW percent errors for crude oil production, the 1999 MFW values were closer to the final PSA values. The smaller range indicates that the MFW estimates are getting back on track, similar to the years prior to 1998. Additionally, the range (3.96) of the 1999 PSM percent errors, from -1.16 to 2.80 percent, was smaller than the range (6.58) for 1998.
[Figure FE3 ILLUSTRATION OMITTED]
For refinery crude oil inputs, the range (1.95) of the 1999 MFW percent errors was the largest for the 5 years studied but was smaller than that of any other MFW plot analyzed for 1999. February 1999 (1.24) had the largest percent error over the 60-month period. The outlier in March (-0.71) was the only MFW value to underestimate the PSA value and was due to company misreporting. Most of the 1999 PSM refinery crude oil inputs were extremely close to their final values except for the outliers in January, February, and April due to respondent reporting problems.
As expected, PSM interim values for production of each of the four major petroleum products were superior to their comparable MFW estimates. Figures FE4 and FE5 contain the box and whisker plots for motor gasoline and distillate fuel oil production, and residual fuel oil and jet fuel production, respectively.
[Figures FE4-FE5 ILLUSTRATION OMITTED]
The 1999 MFW motor gasoline production percent errors, displayed in Figure FE4, had the largest range (5.81) over the 5-year period. February 1999 (3.62) had the largest percent error over the 60 months studied. The 1999 PSM percent errors for motor gasoline production were within 1.49 percent and displayed a tight distribution about the median of -0.14 percent.
The range (4.59) of the 1999 MFW percent errors for distillate fuel oil production was the largest over the 5-year period, ranging from -1.10 to 3.49 percent. As in prior years, the distribution of the 1999 PSM percent errors was tightly grouped around the median. There was one outlier in April (-0.38). Distillate fuel oil production percent errors had a smaller range than any other PSM plot analyzed for 1999.
The box and whisker plots for residual fuel oil production and jet fuel production are shown in Figure FE5. The range of the 1999 MFW percent errors for residual fuel oil production was similar to the prior years but more of the 1999 MFW estimates overestimated the final PSA values. Most of the 1999 PSM percent errors were tightly distributed about the median except for the outliers in February (2.75) and July (-0.54).
In contrast to prior years, most of the 1999 MFW jet fuel production estimates underestimated the final values. The only negative median (-0.42) occurred in 1999. The 1999 PSM percent errors for jet fuel production were within 0.59 percent even though there were three outliers occurring in January, February, and November. In general, the outliers for product production resulted from computer problems at reporting companies, and disruptions caused by company mergers.
Figures FE6, FE7, and FE8 show the yearly distribution of percent errors for stocks of crude oil, motor gasoline, distillate fuel oil, residual fuel oil, jet fuel, and propane. Figure FE6 shows the box and whisker plots for crude oil stocks and motor gasoline stocks. The 1999 MFW percent errors for crude oil stocks had the smallest median (-0.03) over the 5-year period. Outliers occurred in May, July, and December due to company misreporting. All but one of the 1999 PSM interim values underestimated the final PSA values. The 1999 range (2.97) was the largest over the 5-year period and March 1999 (-2.60) had the largest absolute percent error over the past 60 months.
[Figure FE6-FE8 ILLUSTRATION OMITTED]
Similar to 1998, all but one of the 1999 MFW estimates for motor gasoline stocks were underestimates. The 1999 range (5.40) was the largest over the 5-year period and November (-5.08) had the largest absolute percent error over the 60 months studied. Similarly, the 1999 PSM interim values underestimated the final PSA values. Over the 5-year period, 1999 had the largest range (2.05) and over the past 60 months, November 1999 (-1.72) had the largest absolute percent error.
Figure FE7 shows box and whisker plots for distillate and residual fuel oil stocks. The 1999 range (7.25) of MFW percent errors for distillate fuel oil stocks was the largest over the 5-year period and November 1999 (-5.34) was the largest absolute percent error over the 60 months studied. Two outliers in November and December were due to company misreporting. Similarly, the 1999 range (4.59) for the PSM percent errors was the largest over the past 5 years and the largest percent error over the past 60 months occurred in January 1999 (3.51). This outlier was due to respondent reporting problems.
Residual fuel oil stocks typically have larger percent errors than other stock series. Similar to prior years, most of the 1999 MFW values were underestimates. The 1999 median (-3.74) was the largest absolute percent error over the 5-year period and September 1999 (-6.08) had the largest absolute percent error over the past 60 months. The 1999 range (4.79) of PSM percent errors was the largest over the 5-year period and August 1999 (-4.20) had the largest absolute percent error over the 60 months studied.
The box and whisker plots for jet fuel stocks and propane stocks are shown in Figure FE8. Similar to prior years, most of the 1999 MFW estimates for jet fuel stocks underestimated the final PSA values. May 1999 (-5.59) had the largest absolute percent error over the 60-month period. In contrast to prior years, the 1999 range (5.09) of PSM percent errors was the largest, ranging from -4.03 to 1.06 percent. July 1999 (-4.03) had the largest absolute percent error over the 60 months studied. Similar to the MFW estimates, most of the PSM interim values for jet fuel stocks underestimated the final values.
The 1999 MFW percent errors for propane stocks were distributed consistently about the median of -0.40 percent. One outlier in February 1999 (-6.03) was due to company misreporting. As in prior years, the 1999 PSM interim values were close to their final PSA values. The 1999 range (2.78) of percent errors was the largest in the 5-year period, ranging from -1.47 to 1.31 percent.
Figures FE9, FE10, and FE11 show the yearly distributions of percent errors for the imports of crude oil and four products: motor gasoline, distillate fuel oil, residual fuel oil, and jet fuel. Because of the irregularity of imports for crude oil and petroleum products, the magnitude and range of percent errors for both the MFW and the PSM imports numbers can be expected to be much larger and wider than for production and stocks.
[Figures FE9-FE11 ILLUSTRATION OMITTED]
Figure FE9 shows that the majority of the 1999 MFW estimates of crude oil imports underestimated the final PSA values. One outlier in April (-8.01) was due to company misreporting. All but one of the PSM interim values underestimated the final PSA values.
The distributions of percent errors of the MFW estimates and PSM interim values for 1995 through 1999 of motor gasoline and distillate fuel oil imports are shown in Figure FE10. The 1999 MFW median (-7.27) for motor gasoline imports was the largest absolute percent error over the past 5 years. Similar to 1998, most of the 1999 PSM interim values for motor gasoline imports were underestimates.
As in prior years, most of the 1999 MFW estimates for distillate fuel oil imports were underestimated. The 1999 median of -17.02 percent was the largest absolute percent error over the 5-year period. All but one of the 1999 PSM interim values underestimated the final PSA values. The 1999 range (27.31) was the largest for the 5 years analyzed. November 1999 (-27.31) had the largest absolute percent error over the past 60 months.
Figure FE11 shows the box and whisker plots for residual fuel oil imports and jet fuel imports. For residual fuel oil imports, the 1999 ranges of the MFW and PSM percent errors were the largest over the 5-year period and were also larger than those of any other MFW and PSM plots analyzed for 1999, at 83.61 and 40.82 percent, respectively. In addition, the MFW percent error for October 1999 (60.11) was the largest over the 60 months studied; this outlier was due to company misclassification of products.
The 1999 range (83.50) of MFW percent errors for jet fuel imports was the largest over the 5-year period, ranging from -35.81 to 47.69 percent. In contrast to 1998, there were not as many resubmissions of PSM interim values in 1999 and the median (-2.00) was closer to zero.
In summary, similar to previous years, the interim PSM data were closer in value to the final PSA volumes than the MFW estimates. This is largely a result of the longer time period provided to process the monthly data and of monthly respondents' use of their accounting systems to report actual data.
In 1999, 32 of 66 PSM interim values were within 1 percent (mean absolute percent error) of the final values; 23 of 61 MFW estimates were within 2 percent (mean absolute percent error) of the final values; and 9 of those 23 were within 1 percent. As in previous years, the accuracy of 1999 preliminary and interim values varied by product and by petroleum supply type. As a group, stocks continued to have the most accurate MFW estimates and PSM interim values.
The good coverage for weekly surveys across petroleum supply type and product combinations has contributed to the accuracy of weekly estimates. In 1999, for 20 of the 21 categories, coverage was 90 percent or above. The decreases in response rates from 1998 for the weekly and monthly surveys were the result of budget cuts at the respondent companies, company mergers, and new company accounting systems that initially made reporting difficult. These factors may have contributed to a decline in the accuracy of these data.
To successfully maintain and improve the accuracy of these data, the PD is participating in several initiatives including the expansion and diligence of the nonresponse follow-up team; the growth of customer outreach by developing industry brochures and improving the petroleum information retrieval on the EIA web site, including many new user-friendly information retrieval options; increased efforts to ensure compliance with reporting requirements; the initiation of a total survey design project that will research forms design by identifying problem areas, conducting expert reviews, and performing concept testing with industry; reviewing the standard name and address file and master frame file; and researching and improving process flow. The PD is also looking at other government agencies and private industry for best practices in the field of data collection and processing systems with the goal of developing a new and improved system that will upgrade and unify legacy systems by incorporating state-of-the-art technology. Other efforts to improve accuracy include continuously assessing and improving PEDRO, the electronic data collection method, and continuation of efforts to improve survey methodology, graphical data validation, and the automated data retrieval system, Survey Information System (SIS). The results of these efforts should enable the PD to continue to provide accurate weekly and monthly data estimates.
Structure of Box and Whisker Plots
All box and whisker plots discussed in this article are the visual presentation of a variable's distribution of 12 values of percent errors for either MFW or PSM values relative to PSA values for a given year. In general, box and whisker plots group data, ordered from smallest to largest, into four areas of equal frequency, quartiles, and show the range and dispersion of data within the quartiles. Sometimes the values of quartiles must be interpolated, i.e., if there are two values that meet the criteria of a quartile, then the average of the two must be taken. Presented below is a discussion of components of box and whisker plots and how they apply to the 12-value distribution illustrated in Example 1: -35, -20, -11, -9, 0, 0, 0, 0, 4.5, 5.5, 15, and 20.
[Example 1 ILLUSTRATION OMITTED]
* First Quartile
Twenty-five percent of the values are equal to or below the first quartile. In Example 1, the first quartile is the average of the third and fourth ordered observations, i.e., (-11+(-9))/2=-10. The first quartile demarcates the lower boundary of the box.
* Second Quartile
The second quartile is the median, and it intersects the box. Fifty percent of the observations are equal to or below the median; in our example, the values of these six observations are: 0, 0, -9, -11, -20, and -35. Also, for this example, the median, 0, is the average of the sixth and seventh values, i.e., (0+0)/2. The plot provides the value of the median (the second quartile) as well as information on how the median compares in magnitude to the rest of the observations. Outliers distort the magnitude of the mean, whereas a median is not distorted since it is the actual value that falls in the middle of the distribution. Since outliers have occurred in the distributions of values of PSRS variables, a median is preferred to a mean when assessing accuracy.
* Third Quartile
Seventy-five percent of the observations (9 in this case) have values equal to or below the third quartile. In Example 1, the third quartile is 5, i.e., (4.5+5.5)/2. The third quartile demarcates the upper boundary of the box.
The box contains half of all the values. In Example 1, as well as in each box found in Figures FE3-FE11, a minimum of six values are contained within the box. The interquartile range is the length of the box, the difference between the first and third quartiles. The interquartile range for Example 1 is 15, i.e., 5-(-10).
Each whisker extends out from the box, one from the first quartile and the other from the third quartile, to the most extreme value that still falls within 1.5 times the interquartile range. In Example 1, a whisker extends from the third quartile, 5, to 20, which is the maximum value and is within 1.5 interquartile ranges of 5 (as it is less than 5+(1.5*15)=27.5). Also in Example 1, the lower whisker extends from the first quartile, -10, to -20, which is the lowest value of the distribution within 1.5 interquartile ranges of the first quartile.
* Fourth Quartile
The fourth quartile is the maximum value of the distribution. In Example 1, the fourth quartile, 20, also demarcates the upper value of the top whisker as it is within 1.5 interquartile ranges of the third quartile.
An outlier, identified as an asterisk, is an observation that is more than 1.5 interquartile ranges greater than the third quartile, or more than 1.5 interquartile ranges less than the first quartile. In Example 1, there is one outlier, -35. It is less than the lower whisker's threshold value, which is -32.5 (-10-(1.5*15)). The importance of the occurrence of an outlier depends on the distribution of the variable. If the interquartile range is very tight and the outlier is in close proximity, then there is little concern about the occurrence of that outlier. (See Figure FE3, MFW vs PSA of Crude Oil Production for 1997.)
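The components described in this box can be reproduced for Example 1 with a short Python sketch. The function name is hypothetical, and the quartiles are taken as medians of the lower and upper halves of the ordered data, which matches the averaging used in Example 1 for 12 values; the handling of an odd number of values (dropping the middle observation from both halves) is an assumption, since every distribution in this article has 12 values.

```python
from statistics import median


def box_and_whisker(values, k=1.5):
    """Quartiles, whisker ends, and outliers for one box and whisker plot."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: for odd n, the middle observation is left out of both halves.
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]

    q1 = median(lower_half)
    q2 = median(ordered)
    q3 = median(upper_half)
    iqr = q3 - q1

    # Threshold values: 1.5 interquartile ranges beyond the box.
    lower_threshold = q1 - k * iqr
    upper_threshold = q3 + k * iqr

    inside = [v for v in ordered if lower_threshold <= v <= upper_threshold]
    outliers = [v for v in ordered if v < lower_threshold or v > upper_threshold]

    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


example_1 = [-35, -20, -11, -9, 0, 0, 0, 0, 4.5, 5.5, 15, 20]
result = box_and_whisker(example_1)
print(result["q1"], result["median"], result["q3"], result["iqr"])  # -10.0 0.0 5.0 15.0
print(result["lower_whisker"], result["upper_whisker"], result["outliers"])  # -20 20 [-35]
```

The output agrees with the values worked out above: quartiles of -10, 0, and 5, an interquartile range of 15, whiskers ending at -20 and 20, and a single outlier at -35.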
Author: Heppner, Tammy G.; French, Carol L.
Publication: Petroleum Supply Monthly
Date: Oct 1, 2000