Step 1: Business Understanding
1. There are wide discrepancies in charges and payments between institutions
a. Larger hospitals charge more and receive higher payments
b. Urban hospitals charge more, but do not receive higher payments
2. Are the variations due to excessive charging or lower payments?
a. Excess Charge = Charge/Payment
b. Cost-to-charge ratio (CCR) = Payment/Charge
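As a quick check of the two definitions above, here is a minimal sketch in R. The charge and payment figures are made up for illustration and are not taken from the IPPS file. It shows that the two measures are reciprocals of each other:

```r
# Hypothetical values for a single row (not from the IPPS data)
charge  <- 30000   # average covered charges
payment <- 7500    # average total payments

excess_charge <- charge / payment   # Excess Charge = Charge/Payment
ccr           <- payment / charge   # Cost-to-charge ratio = Payment/Charge

excess_charge         # 4
ccr                   # 0.25
excess_charge * ccr   # 1, because each measure is the reciprocal of the other
```

A high Excess Charge therefore always corresponds to a low CCR, so the two columns carry the same information on different scales.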
Step 2: Data Understanding
1. IPPS Data
a. Medicare Provider Utilization and Payment Data: Inpatient
i. Total Discharges
ii. Average Covered Charges
iii. Average Total Payments
2. Census Data (because of the size of this file, it has been limited to NY only; the file is a text CSV, so it will have to be opened in Excel first)
a. 2010 ZCTA to Metropolitan and Micropolitan Statistical Areas Relationship File
i. Zipcode
ii. CBSA
Step 3: Data Preparation

1. Filter the IPPS file to include only NY
2. Add the CBSA from the Census Data file to the IPPS Data file
a. Use VLOOKUP
b. Copy the CBSA column and Paste Special as values only
3. Remove #N/A values
a. Use Find/Replace
4. Insert a new column
5. In the new column, use the IF function to recategorize the hospital geography
a. If the hospital has an identified CBSA, recategorize that hospital as urban
b. If the hospital does not have a CBSA, recategorize that hospital as rural
6. Copy the Geography column and Paste Special as values only
7. Calculate Excess Charge = Charge/Payment
8. Calculate Cost-to-charge ratio (CCR) = Payment/Charge
9. Copy the Excess Charge and CCR columns and Paste Special as values only
10. Save the file as a .csv
11. Also save a version of the file as a .xlsx
12. In the .xlsx version, click in any cell and format the data as a table (HOME – “Format as Table”)
13. In the .xlsx version, name the table (DESIGN – “Table Name” – enter “DRG”)
14. Save
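For comparison, the same preparation steps can be sketched in R. This is only an illustration under stated assumptions: the two small data frames stand in for the IPPS and Census files, and every column name, ZIP code, and CBSA value in them is hypothetical. The left join with merge() plays the role of VLOOKUP, and unmatched rows receive NA in place of #N/A.

```r
# Hypothetical stand-ins for the two files (names and values are assumptions)
ipps <- data.frame(
  Provider.State = c("NY", "NY", "NJ", "NY"),
  Provider.Zip   = c("10001", "13326", "07030", "12983"),
  Charge         = c(40000, 18000, 35000, 12000),
  Payment        = c(10000, 9000, 7000, 8000),
  stringsAsFactors = FALSE
)
census <- data.frame(
  Zipcode = c("10001", "13326"),
  CBSA    = c("11111", "22222"),   # made-up CBSA codes
  stringsAsFactors = FALSE
)

# Keep NY rows only
ny <- ipps[ipps$Provider.State == "NY", ]

# Left join on ZIP code; rows without a match keep NA in the CBSA column
ny <- merge(ny, census, by.x = "Provider.Zip", by.y = "Zipcode", all.x = TRUE)

# Urban if a CBSA was found, otherwise rural
ny$Geo <- ifelse(is.na(ny$CBSA), "Rural", "Urban")

# Derived measures, using the definitions from Step 1
ny$ExcessCharge <- ny$Charge / ny$Payment
ny$CCR          <- ny$Payment / ny$Charge

ny
```

Because the result is an ordinary data frame, there is no need for the Paste Special step; the computed columns already hold values rather than formulas.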
Step 4: Modeling
1. Create a PIVOT TABLE of the count of hospitals for each geographic region (INSERT – PivotTable). REMEMBER: click the checkbox “Add this data to the Data Model”
2. Create a PIVOT TABLE to calculate the following for each geographic region:
a. Average Total discharges
b. Average Covered charges
c. Average Total Payments
d. Average Medicare Payments
e. Average Excess charges
f. Average Cost-to-charge ratio (CCR)
3. Use COUNTIF to count the number of rural and urban hospitals (compare these results to what is provided in a PIVOT TABLE)
=COUNTIF(DRG[Geo],"Urban")
=COUNTIF(DRG[Geo],"Rural")
4. Use SUMPRODUCT to count the number of rural and urban hospitals that have a cost-to-charge ratio greater than or equal to 0.5 and those less than 0.5 (How should we normalize these results? Calculate the proportion!).
=SUMPRODUCT((DRG[Geo]="Urban")*(DRG[CCR]<0.5))
=SUMPRODUCT((DRG[Geo]="Urban")*(DRG[CCR]>=0.5))
=SUMPRODUCT((DRG[Geo]="Rural")*(DRG[CCR]<0.5))
=SUMPRODUCT((DRG[Geo]="Rural")*(DRG[CCR]>=0.5))
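The normalization asked about in item 4 can be illustrated with a short R sketch. The data frame below is hypothetical and simply reuses the Geo and CCR column names from the formulas above. Dividing each count by the total for its geography gives proportions that can be compared even though there are far fewer rural than urban rows.

```r
# Hypothetical table standing in for the Excel table "DRG"
drg <- data.frame(
  Geo = c("Urban", "Urban", "Urban", "Urban", "Rural", "Rural"),
  CCR = c(0.20, 0.30, 0.60, 0.45, 0.55, 0.40)
)

# Same logic as one SUMPRODUCT formula: multiply two logical tests and sum
sum((drg$Geo == "Urban") * (drg$CCR < 0.5))   # 3

# All four counts at once as a cross-tabulation
drg$CCRGroup <- ifelse(drg$CCR >= 0.5, "CCR >= 0.5", "CCR < 0.5")
counts <- table(drg$Geo, drg$CCRGroup)
counts

# Normalize within each geography: every row of proportions sums to 1
props <- prop.table(counts, margin = 1)
props
```

In this made-up example, 75% of urban rows and 50% of rural rows fall below a CCR of 0.5, a comparison the raw counts (3 versus 1) would overstate.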
5. Create a PIVOT TABLE of the count of each MS-DRG
6. Create graphs to depict the above information (INSERT – CHARTS)
7. Open R
8. Open R commander
a. Type the following into R:
library(Rcmdr)
9. Import the data into R Commander using the following script:
dataset <- read.csv(file.choose())
Locate the IPPS csv data file and click “OK”
10. Activate the dataset in R commander
a. Click <No active dataset> and find “dataset”
b. Confirm the number of rows and columns as compared to the original dataset
11. Obtain a summary of the following numeric data (Statistics – Summaries – Numeric Summaries – Hold down Ctrl and click the variable names shown below – Click OK):
a. Average Covered charges
b. Average Total Payments
c. Average Medicare Payments
d. Excess charges
e. Cost-to-charge ratio (CCR)
12. Create two graphs of the “Plot of means” to compare Average Covered Charges, Average Total Payments, Excess Charge, and CCR by geographic location
13. Use a two-sample t-test to determine if there are significant differences in the following data between rural and urban hospitals:
a. Count of hospitals
b. Total discharges
c. Covered charges
d. Total Payments
e. Medicare Payments
f. Excess charges
g. Cost-to-charge ratio (CCR)
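The two-sample t-test in item 13 can also be run from the R console. The sketch below uses made-up CCR values and assumed column names (Geo, CCR); with the real data, the imported dataset and its actual column names would be used instead. By default, t.test() runs Welch's version of the test, which does not assume equal variances in the two groups.

```r
# Hypothetical CCR values; replace with the columns of the imported dataset
dataset <- data.frame(
  Geo = rep(c("Rural", "Urban"), each = 5),
  CCR = c(0.52, 0.48, 0.61, 0.55, 0.50,
          0.28, 0.35, 0.31, 0.40, 0.26)
)

# Welch two-sample t-test of CCR by geography (two-sided by default)
result <- t.test(CCR ~ Geo, data = dataset)

result$estimate   # mean CCR in each group
result$p.value    # compare with the chosen significance level, e.g. 0.05
result$conf.int   # confidence interval for the difference in means
```

The same call can be repeated for each numeric variable in the list by swapping the name on the left-hand side of the formula.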
Step 5: Evaluation
1. Summarize the findings
a. Are there confounding variables that we should have considered in our analysis?
i. Hint: Frequency of MS-DRG codes for each geographic location
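One way to follow the hint is to cross-tabulate MS-DRG codes against geography. The sketch below is a minimal illustration: the rows, the column names (Geo, DRG), and the mix of codes are all hypothetical. If rural and urban hospitals treat a different mix of DRGs, differences in charges and payments may reflect case mix rather than geography.

```r
# Hypothetical rows; codes and column names are for illustration only
dataset <- data.frame(
  Geo = c("Urban", "Urban", "Urban", "Urban", "Rural", "Rural", "Rural"),
  DRG = c("470", "470", "291", "871", "291", "291", "871")
)

# Count of each MS-DRG within each geography
drg_by_geo <- table(dataset$DRG, dataset$Geo)
drg_by_geo

# Column proportions: the share of each DRG within rural and within urban rows
round(prop.table(drg_by_geo, margin = 2), 2)
```

Large differences between the two columns of proportions would suggest comparing rural and urban hospitals within the same MS-DRG rather than across all discharges.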
Step 6: Deployment
1. How would these findings be relevant to your organization and what might your organization do with this sort of information?