Accurate identification of primary and recurrent cancer diagnoses is critically
important to clinical researchers. Traditional identification methods and electronic
diagnosis codes have significant limitations. To overcome these limitations and
further the science of clinical cancer research, Kaiser Permanente Southern California
researchers have developed a SAS-based coding, extraction, and nomenclature tool
(SCENT). SCENT uses natural language processing to identify and extract information
from the text of electronic pathology reports. Because SAS statistical software is
widely used in clinical research settings, SCENT should be accessible to many researchers.
To assess the accuracy of SCENT, researchers conducted a validation study using
pathology reports of randomly selected breast and prostate cancer patients. The
tool identified 97 percent (111/115) of confirmed cancer diagnoses
and produced only three false positives (3/792). Additional information about SCENT
is available in a peer-reviewed article published in the Journal of the American Medical Informatics Association (JAMIA).
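The validation figures above are raw counts, so it can help to see how they convert to percentages. The short sketch below is illustrative only and is not part of SCENT. It treats 111/115 as the share of confirmed diagnoses the tool identified, which is how the text describes it. The helper name `as_percent` is hypothetical. The text does not say what the 792 in the false-positive figure counts, so the sketch reports that value only as a raw proportion.

```python
def as_percent(numerator: int, denominator: int, digits: int = 1) -> float:
    """Return numerator/denominator as a percentage rounded to `digits` places.

    Hypothetical helper for illustration; not part of SCENT.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, digits)


# Counts reported in the SCENT validation study.
identified, confirmed = 111, 115
false_pos, fp_denominator = 3, 792  # what the 792 counts is not stated in the source

identified_pct = as_percent(identified, confirmed)       # 96.5, i.e. about 97 percent
false_pos_pct = as_percent(false_pos, fp_denominator, 2)  # raw proportion only

print(f"Identified {identified}/{confirmed} confirmed diagnoses = {identified_pct}%")
print(f"False positives {false_pos}/{fp_denominator} = {false_pos_pct}%")
```

Rounded to the nearest whole number, 111/115 (96.5 percent) gives the 97 percent quoted above.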
SCENT Program
Licensed under the Apache License, Version 2.0. See the notice embedded in the source code for additional details.
Running SCENT requires a licensed copy of SAS software, for which the licensee is solely responsible.
DOCUMENTATION
Slides presented at the 2012 HMORN conference in Seattle, WA
Link to online JAMIA publication and manuscript detailing SCENT’s methodology and validation.
PROGRAM
The Clinical Concept Dictionary, in both Excel and SAS formats, packaged in a Zip file.