AIMS: This study validated enterprise data warehouse (EDW) data for a cohort of hospitalized patients with a primary diagnosis of diabetic ketoacidosis (DKA). METHODS: The cohort comprised 247 patients with 319 admissions for DKA (ICD-9 codes 250.12, 250.13, or 250.xx with biochemical criteria for DKA) to Northwestern Memorial Hospital from 1/1/2010 to 9/1/2013. Validation was performed by electronic medical record (EMR) review of 10% of admissions (N=32). Diabetes type classification (Type 1 vs. Type 2) and DKA clinical status were compared between the EMR review and the EDW data. RESULTS: Diabetes type was incorrectly classified in 5 of 32 (16%) admissions and could not be determined in a further 5 of 32 (16%). On EMR review, DKA was not present in 11 of 32 (34%) admissions; by biochemical criteria, DKA was not present in 15 of 32 (47%) admissions. CONCLUSIONS: The EDW data for this cohort contained substantial errors in diabetes type classification and DKA diagnosis. Some discrepancies can be addressed by refining the EDW query code, while others, related to diabetes classification and DKA diagnosis, cannot be corrected without improving clinical coding accuracy, consistency of medical record documentation, or EMR design. These results support the need for comprehensive validation of data for complex clinical populations obtained through data repositories such as the EDW.
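The proportions reported above can be rechecked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code: the counts are taken directly from the abstract, while the names `TOTAL_ADMISSIONS`, `REVIEWED`, `findings`, and `percent` are hypothetical and introduced only for illustration.

```python
# Counts as reported in the abstract; all identifiers are illustrative assumptions.
TOTAL_ADMISSIONS = 319   # DKA admissions in the EDW cohort
REVIEWED = 32            # admissions checked by EMR review

findings = {
    "diabetes type misclassified": 5,
    "diabetes type indeterminable": 5,
    "DKA absent on EMR review": 11,
    "DKA absent by biochemical criteria": 15,
}


def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Share of the cohort that was sampled for validation.
sample_share = percent(REVIEWED, TOTAL_ADMISSIONS)

# Discrepancy rate for each finding, relative to the reviewed sample.
rates = {label: percent(count, REVIEWED) for label, count in findings.items()}

if __name__ == "__main__":
    print(f"Reviewed {REVIEWED} of {TOTAL_ADMISSIONS} admissions ({sample_share}%)")
    for label, rate in rates.items():
        print(f"{label}: {findings[label]} of {REVIEWED} ({rate}%)")
```

All rates use the 32 reviewed admissions as the denominator, so they describe the validation sample rather than the full 319-admission cohort.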