Why early detection of cancer is important
A Regional Coordinating Centre (RCC) and survey teams were set up for the baseline survey, consisting of 40 full-time staff members with medical qualifications and fieldwork experience in the three studied districts. All participants were monitored indefinitely for cancer occurrence through linkages with local cancer registries and health insurance databases. Exposure data were collected through survey questionnaires and physical measurements.
In addition, blood samples and other biological samples were collected. A total of , individuals have been recruited and the average follow-up time is 8. Using samples from the Taizhou Longitudinal Study (TZL), we set out to develop a classification model that could identify cancer non-invasively, before cancer symptoms appear and before conventional cancer diagnosis.
Written informed consent was obtained from all study participants prior to inclusion in the TZL. For healthy samples, it was required that the individual was not diagnosed with cancer for the duration of the monitoring period (minimum of 5 years). For pre-diagnosis samples, it was required that a positive diagnosis of lung, liver, stomach, esophageal, or colorectal cancer was made within 4 years of the initial blood draw.
For all samples, after processing and sequencing, it was required that at least , unique mapped DNA molecules were observed in the sequencing data, as lower amounts indicate a low-quality sample. Additional case and control patients were incorporated into the study due to their availability.
A total of plasma samples were collected from the TZL cohort and local Taizhou hospital biobanks for inclusion in this study.
In the TZL cohort, a total of initially healthy study participants were later diagnosed with colorectal, esophageal, liver, lung, or stomach cancer within 4 years. For the training set, post-diagnosis cancer samples, 93 pre-diagnosis samples, and healthy samples matched by sex, year age group, collection date, and unique mapped read count were selected; these samples were used for model building, and model parameters were fixed prior to any analysis of test set samples.
For the matched leave-out test set, post-diagnosis, 98 pre-diagnosis samples, and matched healthy samples were selected. For the leave-out test set, clinical outcomes were concealed from the classifier until calls were made. Blood samples were collected from , individuals from to and separated into plasma as part of the Taizhou Longitudinal Study (TZL). Patient age at the time of blood sample collection, patient sex, cancer diagnosis date, and cancer tissue of origin were cataloged for each sample (Supplementary Data 2).
Briefly, the bisulfite-converted DNA was dephosphorylated and ligated to a universal adapter with a unique molecular identifier (UMI). Following purification, a second PCR added sample-specific barcodes and full-length Illumina sequencing adapters. Initial data analysis was performed as part of the standard Singlera methylation sequencing processing pipeline.
Reads were demultiplexed using the Illumina bcl2fastq software v2. The trimmed reads were then aligned to the bisulfite-converted human reference genome version hg19 using Bismark v0.
This left a matched set of pre-diagnosis samples, post-diagnosis samples, and healthy samples, as well as healthy and cancer tissue samples. For each remaining sample, aligned reads were assigned to each of the target regions covered by the PanSeer assay based on mapped genomic position. The average methylation fraction (AMF) was computed for each target region by summing the number of observed cytosines at all covered CpG sites and dividing by the total sequencing depth at all covered CpG sites in each region, per the formula: AMF = (sum over covered CpG sites of observed cytosines) / (sum over covered CpG sites of sequencing depth).
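The AMF computation described above can be sketched in a few lines of Python. This is a minimal illustration and not the code from the study's pipeline: the function name, the tuple layout, and the example counts are all assumptions made for the sketch.

```python
from collections import defaultdict


def average_methylation_fraction(sites):
    """Compute the average methylation fraction (AMF) per target region.

    `sites` is an iterable of (region_id, observed_cytosines, depth) tuples,
    one per CpG site (hypothetical layout). Sites with zero depth are treated
    as not covered and are skipped. Regions with no covered site are omitted.
    """
    cytosines = defaultdict(int)
    depth_total = defaultdict(int)
    for region_id, observed_c, depth in sites:
        if depth <= 0:
            continue  # site not covered in this sample
        if not 0 <= observed_c <= depth:
            raise ValueError("observed cytosines must lie between 0 and depth")
        cytosines[region_id] += observed_c
        depth_total[region_id] += depth
    return {region: cytosines[region] / depth_total[region] for region in depth_total}


# Made-up counts for two regions in one sample.
example_sites = [
    ("region_A", 3, 10),
    ("region_A", 1, 10),
    ("region_B", 0, 5),
    ("region_B", 0, 0),  # uncovered site, ignored
]
amf = average_methylation_fraction(example_sites)
```

Because counts are pooled across sites before dividing, deeply covered CpG sites contribute more to the region's AMF than shallow ones, which matches the ratio-of-sums definition in the text.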
A graphical example of this computation is provided in Supplementary Fig. This process resulted in a matrix with rows (one for each target region) and columns (one for each sample); these AMF values have been provided as Supplementary Data 5. In order to ensure that target regions utilized for cell-free DNA analysis showed aberrant methylation in cancer tissue, we utilized the set of cancer tissue samples and 40 healthy tissue samples, including 8 primary lymphocyte samples, obtained from Biochain.
For each of the target regions, we performed a t-test with Benjamini-Hochberg multiple testing correction comparing the AMF values for cancer tissue samples to the AMF values for healthy tissue samples for each tissue type. These resultant genomic regions corresponded to 10, CpG sites that could differentiate the Biochain healthy lymphocyte and healthy tissue samples from the Biochain cancer tissue samples (Supplementary Fig.).
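To make the multiple testing step concrete, the following is a minimal sketch of the Benjamini-Hochberg adjustment applied to a list of per-region p-values. It assumes the p-values have already been produced by the t-tests; the example p-values and the 0.05 threshold are illustrative assumptions, not values taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per target region.
p_values = [0.001, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)

# Assumed false discovery rate threshold, for illustration only.
significant = [q <= 0.05 for q in q_values]
```

In this toy input only the first region passes the threshold after adjustment, even though three raw p-values are below 0.05, which is the effect the correction is meant to have.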
In order to additionally confirm that the chosen sites truly represented a pan-cancer signature, we verified that PanSeer target regions showed differential methylation between healthy and cancer tissue for other tissue types in the publicly available TCGA dataset 28 (Supplementary Fig.). In order to classify each plasma sample as being derived from a healthy patient or cancer patient, a logistic regression (LR) classifier was constructed using the training set samples; in order to avoid overfitting, a cross-validation approach was utilized. The detailed procedure used to build the classifier is described below, and code with inline pseudocode in the comments is provided in Supplementary Note 1:
The AMF matrix was first subset to contain only the columns corresponding to training set samples; leave-out test set samples were not considered until model parameters were finalized. In order to ensure that genomic regions considered by the model were covered across all samples, any regions for which any sample did not contain at least one read were dropped from consideration in the model i.
A target array with length equal to the number of training set samples was defined. If a training set sample was healthy, its corresponding target array value was set to 0, while if a training set sample was cancerous, its corresponding target array value was set to 1.
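The two preparation steps above (dropping regions that are not covered in every sample, and building the 0/1 target array) can be sketched as follows. The data layout, the label strings, and the function names are assumptions made for this illustration; the study's own code is in its Supplementary Note 1.

```python
def drop_uncovered_regions(read_counts):
    """Keep only regions where every sample has at least one read.

    `read_counts` maps region_id -> list of read counts, one per sample
    (hypothetical layout).
    """
    return [
        region
        for region, counts in read_counts.items()
        if all(count >= 1 for count in counts)
    ]


def build_target_array(labels):
    """Encode healthy samples as 0 and cancer samples as 1."""
    encoding = {"healthy": 0, "cancer": 1}
    try:
        return [encoding[label] for label in labels]
    except KeyError as exc:
        raise ValueError(f"unexpected label: {exc.args[0]!r}") from None


# Made-up read counts for three regions across three training samples.
read_counts = {
    "region_A": [12, 7, 3],
    "region_B": [5, 0, 9],  # one sample has no reads, so the region is dropped
    "region_C": [1, 1, 1],
}
kept_regions = drop_uncovered_regions(read_counts)
target = build_target_array(["healthy", "cancer", "healthy"])
```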
This splitting process was repeated times, leading to different model-building and model validation sets (all subsets of the training set). This cross-validation procedure allows estimation of model parameters using only half of the training set and validation of model parameters using the other half of the training set. This resulted in 1, different LR equations (one for each model-building set), which were used as an ensemble.
These equations were then used to compute LR scores for each corresponding model validation set; this resulted in a matrix of LR scores, with each training set sample having multiple scores from all iterations where it was part of the model validation set. A final LR score was computed for each training set sample by averaging all scores computed when a sample was part of the model validation set.
A cutoff of 0. Model accuracy was computed for the training set using these average scores and cutoffs (Table 1). This ensemble model consisting of the average result of equations was then locked down prior to any analysis of the leave-out test set. For all leave-out test set samples, LR scores were computed using all equations, and a final score was computed by averaging these individual LR scores.
The final score was compared to the training set cutoff, and model accuracy was computed for the leave-out test set using these average scores (Table 1). By utilizing this ensemble classifier approach, the risk of overfitting is greatly reduced, as it can be ensured that training set and test set performance is similar. The full equations for the ensemble logistic regression classifier, including coefficients, intercepts, and cutoffs, are presented in Supplementary Data 3, and source code has been included in Supplementary Note 1.
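The repeated half-split procedure and the score averaging described above can be sketched as below. This is only an illustration under stated assumptions: the logistic regression is fitted with plain gradient descent in pure Python rather than with whatever library the authors used, the number of splits (20), the learning rate, the epoch count, and the single-feature toy data are all arbitrary, and the function names are hypothetical.

```python
import math
import random


def predict(weights, intercept, x):
    """Logistic regression score for one sample (a list of feature values)."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=200):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n, n_features = len(xs), len(xs[0])
    weights, intercept = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = predict(weights, intercept, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        intercept -= lr * grad_b / n
    return weights, intercept


def ensemble_cross_validate(xs, ys, n_splits=20, seed=0):
    """Repeat random half splits; fit on one half, score the other half.

    Returns the list of fitted models (the ensemble) and, per training
    sample, the average of its validation scores (None if never validated).
    """
    rng = random.Random(seed)
    n = len(xs)
    models, scores = [], [[] for _ in range(n)]
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        build, valid = idx[: n // 2], idx[n // 2:]
        model = fit_logistic([xs[i] for i in build], [ys[i] for i in build])
        models.append(model)
        for i in valid:
            scores[i].append(predict(*model, xs[i]))
    averaged = [sum(s) / len(s) if s else None for s in scores]
    return models, averaged


def ensemble_score(models, x):
    """Score a new (test set) sample as the mean score over all models."""
    return sum(predict(w, b, x) for w, b in models) / len(models)


# Toy training data: one AMF-like feature, low in healthy, high in cancer.
xs = [[0.05], [0.10], [0.08], [0.12], [0.60], [0.70], [0.65], [0.75]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models, averaged = ensemble_cross_validate(xs, ys)
```

A test set sample is never used in fitting: it is only passed to `ensemble_score` after the list of models is fixed, which mirrors the lock-down step in the text.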
In addition, this same procedure was repeated using a Linear Discriminant Analysis classifier instead of Logistic Regression in order to demonstrate that results are independent of the chosen machine learning method; source code and results are presented in Supplementary Note 2.
To further demonstrate the robustness of this approach, we additionally determined that the logistic regression score was unaffected by either the number of missing values in a sample or its bisulfite conversion rate (Supplementary Figs.). We also repeated logistic regression modeling with two additional constraints to confirm that observed methylation signals in pre-diagnosis and post-diagnosis patients were cancer-derived; even with a minimum read depth requirement (Supplementary Note 5) or using stricter marker selection criteria based on cancer tissue (Supplementary Note 6), model performance remained high.
Means and standard deviations or medians and range were utilized to summarize continuous variables, while whole numbers and percentages were utilized to summarize categorical variables. Accuracy metrics were computed for each sample set and subset based on sample covariates.
Binomial confidence intervals for sensitivity and specificity were calculated using the Clopper-Pearson method. To determine if any sample covariates impacted assay performance, the Kruskal-Wallis H-test was utilized to compare model scores for each category (healthy, pre-diagnosis, and post-diagnosis) across each analyzed covariate (Supplementary Figs.). In order to measure the analytical limit of detection of the PanSeer assay, a set of spike-in samples consisting of a mixture of cancer cell line DNA and healthy plasma was constructed; it was determined whether each spike-in level could be separated from baseline healthy plasma.
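For readers unfamiliar with the Clopper-Pearson method mentioned above, here is a minimal pure-Python sketch of the exact binomial interval. It finds each bound by bisection on the binomial tail probability instead of calling a beta quantile function; the function names and the example counts are assumptions for illustration and are not taken from the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _upper_bound(k, n, alpha):
    """Largest p still compatible with observing k or fewer successes."""
    if k == n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha / 2:
            lo = mid  # lower tail still too probable: the bound lies higher
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    # By symmetry, the lower bound for k successes is one minus the upper
    # bound for n - k failures.
    lower = 1.0 - _upper_bound(n - k, n, alpha)
    upper = _upper_bound(k, n, alpha)
    return lower, upper


# Hypothetical example: 5 of 10 samples called correctly.
lower, upper = clopper_pearson(5, 10)
```

The interval is conservative by construction, so it is a natural choice for sensitivity and specificity estimates based on modest sample counts.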
The DNA was then purified and concentrated using Ampure beads. Plasma from multiple healthy individuals was pooled together to use as a baseline in the limit of detection study. The PanSeer assay was then run on the spike-in samples. In order to evaluate the limit of detection analytically, due to the variation in methylation levels across the genome, four baseline samples were chosen as training samples to determine the level of observable background methylation in healthy plasma across each genomic region.
For each individual genomic region, a cutoff value was determined using these four baseline training samples; this cutoff was set at three standard deviations above the mean methylation value observed in the baseline samples. Detailed code for the limit of detection analysis is provided in Supplementary Note 4.
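The per-region cutoff rule (mean plus three standard deviations of the baseline samples) can be sketched as follows. The sketch assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and the region names and methylation values are made up. How flagged regions are combined into a per-sample call is described in the study's Supplementary Note 4 and is not reproduced here.

```python
from statistics import mean, stdev


def region_cutoffs(baseline):
    """Per-region cutoff: baseline mean plus three standard deviations.

    `baseline` maps region_id -> methylation values observed in the
    baseline healthy plasma samples (hypothetical layout). The sample
    standard deviation is an assumption of this sketch.
    """
    return {
        region: mean(values) + 3 * stdev(values)
        for region, values in baseline.items()
    }


def regions_above_cutoff(sample, cutoffs):
    """List regions whose methylation in `sample` exceeds the cutoff."""
    return [
        region
        for region, value in sample.items()
        if region in cutoffs and value > cutoffs[region]
    ]


# Made-up methylation values from four baseline samples.
baseline = {
    "region_A": [0.01, 0.02, 0.01, 0.02],
    "region_B": [0.10, 0.12, 0.11, 0.09],
}
cutoffs = region_cutoffs(baseline)

# Made-up spike-in sample.
spike_in = {"region_A": 0.05, "region_B": 0.12}
flagged = regions_above_cutoff(spike_in, cutoffs)
```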
Further information on research design is available in the Nature Research Reporting Summary linked to this article. Full genetic sequencing data was not included in the informed consent, hence only the methylation status at each genomic position has been released.
If cancer is detected at an early stage, before symptoms even appear, it is easier to treat and there is a better chance of survival. NHS England has introduced a new bowel cancer screening test for over 4 million people that is easier to use than the previous test.
In trials using FIT we saw more men, people from ethnic minority backgrounds and people in more deprived areas take up the offer of screening for bowel cancer. With up to a third of a million more people expected to self-administer the FIT test, it will increase the number of early-stage bowel cancers that are detected. For more information on bowel screening, please visit the screening page.
It sets out a delivery plan to ensure the NHS in England has the right numbers of skilled staff to provide high quality care and services to cancer patients at each stage in their care. Lung cancer is frequently diagnosed at a later stage than other cancers, because there are often no signs at an early stage. The programme aims to improve earlier diagnosis of lung cancer, at a stage when it is much more treatable.
Early diagnosis is key to our survival efforts: it means an increased range of treatment options, improved long-term survival and improved quality of life.
Screening in both healthy and high-risk populations offers the opportunity to detect cancer early, with an increased opportunity for treatment with curative intent. Screening for cervical cancer and colorectal (colon) cancer can prevent cancer by finding early lesions so they can be treated or removed before they become cancerous.
Screening for cervical, colorectal, breast, and lung cancers helps find these diseases at an early stage, when treatment works best. Communities can prevent and control cancer when they have the right partners, plans, and solutions.