What Are Hospital Cost Reports?

Every Medicare-certified hospital is required to submit an annual financial report to the Centers for Medicare & Medicaid Services (CMS). These financial reports are a bit like tax returns: they include basic information on the facility (its name, address, and so on) and the services it provided, but the bulk of the cost report is devoted to information on the hospital's finances. This financial information includes a huge range of detail on revenues, expenses, profits, assets, liabilities, wages, etc.

What Do "2552-10" And "2552-96" Refer To?

CMS requires that hospitals and many other types of health care providers fill out and submit annual financial reports, and different reports are used by different types of providers. The "2552" refers to the forms that hospitals are required to fill out. Skilled nursing facilities fill out form "2540," home health agencies fill out form "1728," and so on.

CMS has made several major updates to the forms hospitals fill out, and the "96" and "10" indicate the version the hospital filled out and submitted. One of those updates to the hospital forms occurred in 1996, and another in 2010. Some new worksheets were added in 2010, and many worksheets were overhauled. From 1996 to 2010, hospitals had to fill out form "2552-96," and from 2010 on hospitals have had to fill out form "2552-10."

How Do I Get Detailed Documentation On Each Variable In The Cost Reports?

The most detailed documentation is the full set of instructions that CMS provides to hospitals on how to fill out the cost reports. These instructions are part of the CMS "Provider Reimbursement Manual." The Manual is made freely available online by CMS, but its intended audience is finance professionals working in the health care industry, and it can be overwhelming at first for researchers. The Manual comes in two parts: the first part (Publication 15-1) consists of 31 chapters on accounting topics spanning multiple provider types, and the second part (Publication 15-2) consists of 44 chapters, each relating to a specific type of provider and set of forms.

For researchers interested in the hospital cost reports, the key chapters are Part 2, Chapter 36 ("Hospital and Healthcare Complex, Form CMS-2552-96"), and Part 2, Chapter 40 ("Hospital and Hospital Health Care Complex Cost Report, Form CMS-2552-10") (see FAQ "What Do '2552-10' And '2552-96' Refer To?").

Each of those two chapters includes a detailed set of instructions for hospitals, and a full set of the forms (in PDF format) that hospitals fill out.

Can I Get A Data Dictionary For The Hospital Cost Reports?

Data sets downloaded from the RAND Hospital Data website are flat files, similar to the data sets that researchers are used to working with. We provide a spreadsheet data dictionary with a description of each variable in those flat files; some of those variables are taken directly from the cost reports, and others are processed and added by RAND. We also provide spreadsheets (one for 2552-10, and another for 2552-96) showing where each raw variable comes from and how it fits into the cost report forms.

If you download and work with the raw cost report data provided by CMS, those data are not structured as flat files, and the Provider Reimbursement Manual (see ‘How Do I Get Detailed Documentation On Each Variable In The Cost Reports?’) is the closest thing to a data dictionary.

CMS provides "rollup" SAS data sets, which are flat files with each observation representing a cost report—CMS also provides a spreadsheet record layout that can be helpful in identifying and interpreting fields in the cost reports.

What Do The Different Time Periods ("Hospital Fiscal Year," "Calendar Year," And "Federal Fiscal Year") Mean?

Each hospital can select its own cost reporting period, which is also referred to as the hospital's fiscal year. In most cases, those fiscal years run either from October 1 through September 30 (which aligns with the federal government's fiscal year), or January 1 through December 31, or July 1 through June 30. But, hospitals can and do have other cost reporting periods. And, sometimes, hospitals will change their cost reporting period in the middle of a year.

When CMS distributes hospital cost report data, it groups cost reports based on the federal fiscal year in which the hospital cost reporting period begins. For example, if a hospital's cost reporting period begins on December 1, 2016, that day falls in federal fiscal year 2017 (October 1, 2016 through September 30, 2017), and so that cost report would be distributed with the "2017" data by CMS.
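
As a minimal sketch of that grouping rule, the function below maps a reporting-period begin date to a federal fiscal year. The function name is hypothetical and is not part of any CMS or RAND tool; the only rule it encodes is that the federal fiscal year runs from October 1 through September 30.

```python
from datetime import date


def federal_fiscal_year(begin_date: date) -> int:
    """Return the federal fiscal year (Oct 1 through Sep 30) containing begin_date."""
    # Dates in October, November, or December belong to the next fiscal year.
    return begin_date.year + 1 if begin_date.month >= 10 else begin_date.year


# The example from the text: a period beginning December 1, 2016.
print(federal_fiscal_year(date(2016, 12, 1)))  # 2017
```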

For research purposes, it is useful to have the cost report data processed so that each value represents a time period that is consistent across hospitals. In preparing the "calendar year" data sets, and the "federal fiscal year" data sets, RAND uses an allocation method so that each year represents the same time period for all hospitals in the data.
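
The details of RAND's allocation method are not described in this FAQ. Purely as an illustration of the general idea, the sketch below assumes a simple day-weighted proration, in which a cost report's values would be split across calendar years in proportion to the number of days of the reporting period that fall in each year. The function name and the proration rule are assumptions, not RAND's documented method.

```python
from datetime import date


def calendar_year_shares(begin: date, end: date) -> dict:
    """Fraction of a reporting period's days (inclusive) that fall in each calendar year.

    Assumption: simple day-weighted proration, used here only for illustration.
    """
    total_days = (end - begin).days + 1
    shares = {}
    for year in range(begin.year, end.year + 1):
        start = max(begin, date(year, 1, 1))
        stop = min(end, date(year, 12, 31))
        shares[year] = ((stop - start).days + 1) / total_days
    return shares


# A July 1 through June 30 reporting period straddles two calendar years.
shares = calendar_year_shares(date(2015, 7, 1), date(2016, 6, 30))
for year, share in shares.items():
    print(year, round(share, 3))
```

Under this assumption, a flow variable such as inpatient days would be multiplied by each share and added to the corresponding calendar year's total.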

The "hospital fiscal year" data sets follow the CMS method of assigning cost reports to years.

Do the Data Sets Available from RAND Hospital Data Have Every Field?

No, RAND Hospital Data only includes a small, but useful, subset of HCRIS data. The full HCRIS 2552-10 data set includes an overwhelming number of fields: around 38 thousand separate alphanumeric fields and 327 thousand separate numeric fields (this count treats "sublines" as separate fields).

If there is a field that you need for your analysis and that is not included in RAND Hospital Data, please contact us and let us know the worksheet, column, and line(s) that interest you. In future vintages, we will accommodate as many of these requests and upgrades as possible.

What Does "Level Of Aggregation" Mean?

If you download a data set with "Hospital" as the level of aggregation, you will get a data set where each record represents a single hospital in a single year. If you download a data set with "County," "Market," "State," or "National" as the level of aggregation, each record represents the aggregate for that geographic area in a single year. For some variables, such as number of beds or number of inpatient stays, the aggregate is the simple sum of the hospital-level values. For other variables, such as occupancy, the aggregate value is calculated based on aggregate sums (e.g., aggregate occupancy equals the sum of inpatient days divided by the sum of bed-days available).
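
The difference between the two kinds of aggregate can be sketched as follows. The field names and the two hospital records are made up for illustration; they are not RAND variable names or real data.

```python
# Two made-up hospitals in the same county (illustrative values only).
hospitals = [
    {"beds": 100, "inpatient_days": 25_000, "bed_days_available": 36_500},
    {"beds": 20, "inpatient_days": 2_000, "bed_days_available": 7_300},
]


def aggregate(records):
    """Aggregate hospital records for one geographic area in one year."""
    total_days = sum(r["inpatient_days"] for r in records)
    total_available = sum(r["bed_days_available"] for r in records)
    return {
        # Count variables: simple sum of the hospital-level values.
        "beds": sum(r["beds"] for r in records),
        # Ratio variables: ratio of the sums, not the average of the ratios.
        "occupancy": total_days / total_available,
    }


county = aggregate(hospitals)
print(county["beds"], round(county["occupancy"], 3))
```

Taking the ratio of sums weights each hospital by its size, so the small, half-empty hospital in this example pulls aggregate occupancy down far less than it would in an unweighted average of the two hospital-level occupancy rates.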

What Does "Market" Mean?

“Market” refers to core-based statistical areas (CBSAs), which are defined by the U.S. Census Bureau. CBSAs “consist of the county or counties or equivalent entities associated with at least one core (urbanized area or urban cluster) of at least 10,000 population, plus adjacent counties having a high degree of social and economic integration with the core as measured through commuting ties with the counties associated with the core.” CBSAs include metropolitan areas and micropolitan areas. When processing and creating RAND Hospital Data, all rural counties within a state (i.e., those counties that are not included in a metropolitan area or a micropolitan area) are grouped into a single market; those rural markets are coded as “XX999” where “XX” is the 2-character postal abbreviation for the state.
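
The coding rule for rural markets can be sketched as a small function. The function name and its arguments are hypothetical; the sketch assumes that a county outside any metropolitan or micropolitan area simply has no CBSA code.

```python
from typing import Optional


def market_code(state_abbr: str, cbsa_code: Optional[str]) -> str:
    """Return the market identifier for a county.

    Assumption: cbsa_code is None (or empty) for counties that are not part of
    any metropolitan or micropolitan area.
    """
    if cbsa_code:
        return cbsa_code
    # Rural counties in a state share one market coded as "XX999".
    return state_abbr.upper() + "999"


print(market_code("MT", None))  # MT999
```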

What Does "Vintage" Mean?

RAND periodically updates the raw HCRIS source data and the code used to process and prepare the data sets. Vintage represents the version or, more specifically, the date on which RAND created the data sets.

Different vintages of the same data set will differ for several reasons:

  • in the more-recent vintage, the processed data sets will use more-recent raw data with additional hospital-years;
  • in the more-recent vintage, some cost reports will have been audited, revised, and resubmitted, and some data values will differ as a result;
  • the more-recent vintage may include additional variables that were not included in the earlier vintage.

What Does "Data Errors Corrected" Mean?

Premium subscribers, when they are selecting data to download, can choose between data with errors corrected versus without errors corrected. (Registered users only have access to data with errors corrected.) To correct data errors, RAND applies an algorithm that identifies numeric values that fall far outside the normal range of variation ("outliers"), and replaces them with interpolated values. In general, the data are allowed a very wide range of variation before being corrected, and the degree of variation is adjusted based on the degree of observed variation within a given hospital over time (so that hospitals that typically exhibit wider-than-normal variation are given more latitude), and the typical degree of variation for a given variable.

For each hospital and for each numeric variable, the allowable range of values is calculated following these eight steps:

  1. First, for each hospital we calculate the minimum value reported over all the years, the maximum value, the 25th percentile, the 50th percentile (median), and the 75th percentile.
  2. Second, for each hospital we calculate three measures of variability over time: the interquartile range (i.e., the difference between the 75th percentile and the 25th percentile, which is always nonnegative), the difference between the maximum value and the median ("max-to-median," which is always nonnegative), and the difference between the minimum value and the median ("min-to-median," which is always nonpositive).
  3. Third, for each hospital we calculate two ratios: the ratio of the max-to-median over the interquartile range ("max-to-IQR ratio"), and the ratio of the min-to-median over the interquartile range ("min-to-IQR ratio," which takes on negative values).
  4. Fourth, among all hospitals we calculate the 5th percentile and 50th percentile (median) of the min-to-IQR ratios, and the 50th percentile (median) and 95th percentile of the max-to-IQR ratio.
  5. Fifth, for each hospital we calculate a lower bound ratio as the median min-to-IQR ratio plus 4 times the difference between the 5th percentile min-to-IQR ratio and the median min-to-IQR ratio.
  6. Sixth, for each hospital we calculate an upper bound ratio as the median max-to-IQR ratio plus 4 times the difference between the 95th percentile max-to-IQR ratio and the median max-to-IQR ratio.
  7. Seventh, for each hospital we calculate the lower bound as the median (from the first step) plus the product of the IQR (from the second step) and the lower bound ratio (from the fifth step); because the lower bound ratio is negative, the lower bound falls below the median.
  8. Eighth, we calculate the upper bound as the median (from the first step) plus the product of the IQR (from the second step) and the upper bound ratio (from the sixth step).
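
The eight steps can be sketched for a single variable as follows. This is an illustration under stated assumptions, not RAND's production code: the percentile helper uses linear interpolation between order statistics, hospitals whose interquartile range is zero are skipped because their ratios are undefined, and the (negative) lower bound ratio is added to the median so that the lower bound lands below it. All names and sample values are made up.

```python
def percentile(values, p):
    """p-th percentile with linear interpolation (an assumed convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def allowable_ranges(series_by_hospital):
    """Return {hospital: (lower_bound, upper_bound)} for one numeric variable."""
    stats = {}
    for hospital, values in series_by_hospital.items():
        # Steps 1-3: per-hospital percentiles, variability measures, and ratios.
        q25, median, q75 = (percentile(values, p) for p in (25, 50, 75))
        iqr = q75 - q25
        if iqr == 0:
            continue  # assumption: ratios are undefined, so skip this hospital
        stats[hospital] = {
            "median": median,
            "iqr": iqr,
            "max_ratio": (max(values) - median) / iqr,  # nonnegative
            "min_ratio": (min(values) - median) / iqr,  # nonpositive
        }
    if not stats:
        return {}
    # Step 4: percentiles of the ratios across all hospitals.
    min_ratios = [s["min_ratio"] for s in stats.values()]
    max_ratios = [s["max_ratio"] for s in stats.values()]
    p5_min, p50_min = percentile(min_ratios, 5), percentile(min_ratios, 50)
    p50_max, p95_max = percentile(max_ratios, 50), percentile(max_ratios, 95)
    # Steps 5-6: bound ratios, widened by a factor of 4.
    lower_ratio = p50_min + 4 * (p5_min - p50_min)  # negative
    upper_ratio = p50_max + 4 * (p95_max - p50_max)  # positive
    # Steps 7-8: per-hospital bounds scaled by that hospital's own IQR.
    return {
        h: (s["median"] + s["iqr"] * lower_ratio, s["median"] + s["iqr"] * upper_ratio)
        for h, s in stats.items()
    }


# Made-up series (one value per year) for three hospitals.
series = {
    "A": [100, 102, 98, 101, 99],
    "B": [50, 55, 45, 52, 48],
    "C": [200, 210, 190, 205, 1000],
}
for hospital, (lower, upper) in allowable_ranges(series).items():
    print(hospital, round(lower, 2), round(upper, 2))
```

With only three hospitals the cross-hospital percentiles are dominated by a single extreme series, so the bounds in this toy example are very wide; with thousands of hospitals the 5th and 95th percentiles are far more stable.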

In the data sets with data errors corrected, values that exceed the upper bound or fall below the lower bound are replaced with linearly interpolated values. (If the values to be replaced are the first or the last in the time series, then they are replaced with linearly extrapolated values.) The interpolation and extrapolation are based only on values that fall within the upper and lower bounds (i.e., if two consecutive data points fall outside those bounds, then they will both be replaced).
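
The replacement step can be sketched as follows. The sketch assumes one value per year at evenly spaced intervals, and it assumes that extrapolation at either end of the series uses the two nearest in-range values; the FAQ does not specify these details, and the function name is hypothetical.

```python
def correct_series(values, lower, upper):
    """Replace out-of-range values using only the in-range values.

    Assumptions: evenly spaced observations, and extrapolation from the two
    nearest in-range points at the ends of the series.
    """
    in_range = [i for i, v in enumerate(values) if lower <= v <= upper]
    if len(in_range) < 2:
        return list(values)  # not enough in-range points to fit a line
    corrected = list(values)
    for i, v in enumerate(values):
        if lower <= v <= upper:
            continue
        before = [j for j in in_range if j < i]
        after = [j for j in in_range if j > i]
        if before and after:
            a, b = before[-1], after[0]  # interpolate between neighbors
        elif after:
            a, b = after[0], after[1]  # extrapolate backward from the start
        else:
            a, b = before[-2], before[-1]  # extrapolate forward from the end
        slope = (values[b] - values[a]) / (b - a)
        corrected[i] = values[a] + slope * (i - a)
    return corrected


# Two consecutive outliers are both replaced, using only in-range neighbors.
print(correct_series([100, 900, 950, 130], lower=0, upper=500))
```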

What Do The Raw Healthcare Cost Report Information System (HCRIS) Data Look Like?

The raw HCRIS data are stored in a relational database that includes three file types: the report record, numeric values, and alphanumeric values.

The report record includes a report record number (rpt_rec_num), which is the key that uniquely identifies a single cost report and that links to the numeric values file and to the alphanumeric values file. The report record includes the provider number of the hospital that submitted the report (prvdr_num), the beginning and end dates of the cost reporting period (fy_bgn_dt and fy_end_dt), the status of the cost report (rpt_stus_cd), the identity of the fiscal intermediary that processed the cost report, and a handful of other fields. In the 2015 HCRIS data released by CMS on July 15, 2017, there were 6216 report records, and the first 5 of those records are shown here:

Example 1

The numeric values include the report record number (rpt_rec_num), the worksheet, line and column, and the numeric value reported by the hospital. In the 2015 HCRIS data released by CMS on July 15, 2017, there were 19.5 million numeric values, and the first 5 of those values are shown here (note that the report record number for these 5 values corresponds to the first report record shown above):

Example 2

The alphanumeric values also include the report record number (rpt_rec_num), the worksheet, line and column, and the alphanumeric value. In the 2015 HCRIS data released by CMS on July 15, 2017, there were 3.6 million alphanumeric values, and the first 5 of those values are shown here (note that the report record number for these 5 values corresponds to the first report record shown above):

Example 3
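
To show how the report record links to the values files, the sketch below builds a small flat file from made-up records. Only rpt_rec_num, prvdr_num, fy_bgn_dt, and fy_end_dt are field names taken from the description above; the keys used for worksheet, line, column, and value, the field map, and all of the data values are assumptions for illustration.

```python
# Made-up records for illustration only; these are not real HCRIS values.
report_records = [
    {"rpt_rec_num": 1001, "prvdr_num": "000001",
     "fy_bgn_dt": "2014-10-01", "fy_end_dt": "2015-09-30"},
    {"rpt_rec_num": 1002, "prvdr_num": "000002",
     "fy_bgn_dt": "2015-01-01", "fy_end_dt": "2015-12-31"},
]
numeric_values = [
    {"rpt_rec_num": 1001, "worksheet": "S-3 Part I", "line": 14, "column": 2, "value": 250.0},
    {"rpt_rec_num": 1001, "worksheet": "S-3 Part I", "line": 14, "column": 3, "value": 12.0},
    {"rpt_rec_num": 1002, "worksheet": "S-3 Part I", "line": 14, "column": 2, "value": 80.0},
]

# Hypothetical mapping from flat-file variable names to cost report cells.
FIELD_MAP = {"beds": ("S-3 Part I", 14, 2)}


def flatten(reports, values, field_map):
    """Return one flat row per cost report, joined on rpt_rec_num."""
    cells = {
        (v["rpt_rec_num"], v["worksheet"], v["line"], v["column"]): v["value"]
        for v in values
    }
    rows = []
    for report in reports:
        row = dict(report)
        for name, (worksheet, line, column) in field_map.items():
            # Cells that a hospital did not report come back as None.
            row[name] = cells.get((report["rpt_rec_num"], worksheet, line, column))
        rows.append(row)
    return rows


for row in flatten(report_records, numeric_values, FIELD_MAP):
    print(row["prvdr_num"], row["fy_bgn_dt"], row["beds"])
```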

How Do I Know If The Cost Report Data Are Complete For A Given Year Or Time Period?

Users need to be aware that there are lags and checks in data processing that will affect the completeness of the cost report data, particularly for more-recent years. Hospitals have up to five months after the end of their cost reporting period to submit their data to CMS, and the Medicare Administrative Contractor then checks and applies edits to the data, which takes some time. CMS then updates the raw HCRIS data quarterly, and RAND downloads and processes the raw data.

As of October 12, 2017, the number of cost report records available for download from the CMS website was as follows:

Beginning date of hospital cost report  Number of cost report records available  Completeness
October 1, 2010 – September 30, 2011    6150                                     Complete
October 1, 2011 – September 30, 2012    6226                                     Complete
October 1, 2012 – September 30, 2013    6247                                     Complete
October 1, 2013 – September 30, 2014    6248                                     Complete
October 1, 2014 – September 30, 2015    6235                                     Substantially complete
October 1, 2015 – September 30, 2016    3757                                     Partially complete
October 1, 2016 – September 30, 2017    19                                       Incomplete

When CMS releases the next quarterly update of the raw HCRIS data in early 2018, many additional cost reports will be available for periods beginning between October 1, 2015 and September 30, 2016. But researchers should not assume that all of those records will have been added by then.

In general, RAND only distributes processed data that are substantially complete (e.g., the processed data from the 2017_08_31 vintage do not include any data for 2016), but there will still be some drop-off in aggregate totals in the most-recent year of data due to reporting delays.

Are The Cost Report Data Accurate?

The accuracy of the cost report data depends partly on the efforts of the hospitals that create and submit the data, and partly on the auditing and edits by the Medicare Administrative Contractors (MACs). In general, fields that are related more directly to Medicare reimbursement are reported more accurately by hospitals, and reviewed more closely by the MACs. For example, data in the cost reports on the number of Medicare discharges and Medicare revenues are likely to be more accurate than data on Medicaid discharges and revenues.

RAND has developed an algorithm for detecting and correcting errors in the cost report data values. That algorithm is designed to detect values that are extremely unusual relative to other values reported by the same hospital, with the threshold set for each field depending on the normal degree of within-hospital variability.

Premium subscribers can choose either to download data sets with this data error correction applied, or not.

Where Can I Find More Information On Medicare Cost Reports, And Examples Of How They Have Been Used?

  • The BlueCross BlueShield Association (BCBSA) released a series of commentaries and case studies on Medicare Cost Reporting Forms. These reports are not publicly available, but if they can be obtained from BCBSA or another source they are extremely helpful.
  • The National Bureau of Economic Research (NBER) provides HCRIS data and documentation, including the relational databases (in original text format, as well as SAS and Stata formats), and processed flat files containing select variables.
  • The Sheps Center at the University of North Carolina has published several studies that use Medicare hospital cost reports to measure the financial performance of rural hospitals.
  • CMS answers a set of Frequently Asked Questions relating to Medicare cost reports.
  • In 2001, Nancy Kane and Stephen Magnus discussed limitations of the Medicare cost reports, some of which were addressed in the update from 2552-96 to 2552-10.
  • In June, 2004, the Medicare Payment Advisory Commission (MedPAC) compared several sources of data on hospital financial performance, including Medicare cost reports and audited financial statements.
  • In 2012, a team of researchers at the University of North Carolina compared selected items from Medicare cost reports with "gold standard" audited financial statements.

How Do the RAND Hospital Data Differ from CMS "Rollup" Files and NBER Cost Report Files?

CMS provides rollup cost report files in SAS format. These files are referred to as "rollups" because within a single cost report from a single hospital they sum (roll up) into one variable the values reported on separate sublines. The rollup files from CMS are similar to the RAND Hospital Data in that they are flat files containing selected fields from the Medicare hospital cost reports, and the values on sublines are rolled up.
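
The rollup idea can be sketched as follows. The key structure (worksheet, line, subline, column) and the values are made up for illustration and do not reflect the actual layout of the CMS files.

```python
from collections import defaultdict


def roll_up(cells):
    """Sum subline values into their parent line within a single cost report.

    Assumption: cells maps (worksheet, line, subline, column) to a numeric
    value, with subline 0 standing for the main line itself.
    """
    totals = defaultdict(float)
    for (worksheet, line, subline, column), value in cells.items():
        totals[(worksheet, line, column)] += value
    return dict(totals)


# Made-up values: one line reported on a main line plus two sublines.
cells = {
    ("S-3 Part I", 14, 0, 2): 200.0,
    ("S-3 Part I", 14, 1, 2): 30.0,
    ("S-3 Part I", 14, 2, 2): 20.0,
}
print(roll_up(cells))  # {('S-3 Part I', 14, 2): 250.0}
```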

The CMS rollups differ from RAND Hospital Data in the following ways:

  • The rollup files are only available for hospital fiscal years (i.e., calendar year files are not available).
  • The rollup files do not include any processed variables (such as operating margins, or occupancy).
  • The variable names in the rollup files are non-intuitive (e.g., the variable "beds" in RAND Hospital Data corresponds to "S3_1_C2_14" in the rollup files; the rollup variable name reflects the fact that the values come from Worksheet S-3, Part 1, column 2, line 14).
  • The variable names are inconsistent across 2552-96 and 2552-10 (because variable names reflect worksheets, columns, and lines).
  • The rollup files are available from CMS only in SAS format.
  • There is no error correction algorithm applied.
  • There are no geographic summary files available.

NBER provides Medicare cost report files for researchers as flat files, available in SAS, Stata, or CSV format. NBER offers the CMS rollup files, plus "select variables" data sets that are created by NBER and that only include a subset of frequently used fields. NBER has also created custom data sets containing only a handful of variables for specific purposes, such as calculating cost-to-charge ratios or calculating Medicare add-on payments for medical education. But, like the CMS rollup files, the NBER files do not contain any preprocessed variables, the variable names are non-intuitive, the files are only available for hospital fiscal years (not calendar years), there is no error correction algorithm, and there are no geographic summary files available.
