
A synthetic dataset of 15,000 "patients" with Community Acquired Pneumonia (CAP)

Dataset
Version: 1.0.0
CAP is common, has variable outcomes and a complex management pathway. Hospital-based decision support algorithms would be highly valuable. This is a diverse and realistic synthetic dataset of 15,000 “CAP patients” to facilitate algorithm development.

Summary

Citation:
A synthetic dataset of 15,000 "patients" with Community Acquired Pneumonia (CAP)

Documentation

Description:
Community Acquired Pneumonia (CAP) is the leading cause of infectious death and the third leading cause of death globally. Disease severity and outcomes are highly variable and depend on host factors (such as age, smoking history, frailty and comorbidities), microbial factors (the causative organism) and the treatments given. Clinical decision pathways are complex and, despite guidelines, there is significant national variability both in guideline adherence and in patient outcomes. For clinicians treating pneumonia in the hospital setting, care of these patients can be challenging. Key decisions include the type of antibiotics (oral or intravenous), the appropriate place of care (home, hospital or intensive care) and when it is appropriate to stop antibiotics. Decision support tools to help inform clinical management would be highly valuable to the clinical community.
This dataset is synthetic, formed from statistical modelling using real patient data, and represents a population with significant diversity in patient demography, socio-economic status, CAP severity, treatments and outcomes. It can be used to develop code for deployment on real data, to train data analysts and to increase familiarity with this disease and its management.
PIONEER geography: The West Midlands (WM) has a population of 5.9 million and includes a diverse ethnic and socio-economic mix.
EHR: UHB is one of the largest NHS Trusts in England, providing direct acute services and specialist care across four hospital sites, with 2.2 million patient episodes per year, 2,750 beds and an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary and secondary care record (Your Care Connected) and a patient portal “My Health”. This synthetic dataset has been modelled to reflect data collected from this EHR.
Scope: A synthetic dataset which has been statistically modelled on all hospitalised patients admitted to UHB with Community Acquired Pneumonia. The dataset includes highly granular patient demographics and co-morbidities taken from ICD-10 and SNOMED-CT codes. It also includes serial, structured data on the process of care, including timings, admissions, escalation of care to ITU, discharge outcomes, physiology readings (heart rate, blood pressure, AVPU score and others), blood results, and drug prescribing and administration.
Available supplementary data: Matched synthetic controls; ambulance data; OMOP data; real patient CAP data.
Available supplementary support: Analytics, model build, validation and refinement; A.I.; data partner support for the ETL (extract, transform and load) process; clinical expertise; patient and end-user access; purchaser access; regulatory requirements; data-driven trials; “fast screen” services.
Is Part Of:
NOT APPLICABLE

Coverage

Spatial:
United Kingdom, England, West Midlands
Typical Age Range:
18-110
Follow Up:
0 - 6 MONTHS
Physical Sample Availability:
NOT AVAILABLE
Pathway:
Data is representative of the multi-ethnic population of the West Midlands (42% non-white). Data includes all patients admitted during the study period, with National Data Opt-Outs applied, and is therefore representative of admissions to secondary care. Data focuses on the in-patient hospital stay during the acute episode but can be supplemented on request to include previous and subsequent hospital contacts (including outpatient appointments) and ambulance, 111 and 999 data.

Provenance

Origin

Purposes:
STUDY
Sources:
OTHER
Collection Situations:
OTHER

Temporal

Accrual Periodicity:
STATIC
Distribution Release Date:
2021-12-09
Start Date:
2018-01-01
End Date:
2021-06-08
Time Lag:
OTHER

Accessibility

Access

Access Service:
Trusted Research Environments (TREs) are built using Microsoft Azure services and hosted in the UK to provide research teams with a safe, secure and agile environment in which users can quickly analyse, interpret and form an enriched view of primary care information through a range of integrated datasets. Health data collated from multiple sources is ingested into a secure data lake, from which subsets of data are made available to research teams on approval of a data request. Once a request is approved, a customer-specific TRE is made available with a standard set of analytical tools from Microsoft, including Azure Databricks, Azure Machine Learning, Azure SQL and Azure Synapse (for large-scale data warehouses). Specific tools can be provided at an additional cost over the standard platform data access charge, and the PIONEER team will work with you to determine your exact needs. Access to the TRE is managed using virtual desktop technology to provide a safe and secure end-user experience. This design allows PIONEER to create TREs rapidly to meet customer requirements.
Access Request Cost:
www.pioneerdatahub.co.uk/data/data-services-costs/
Delivery Lead Time:
1-2 MONTHS
Jurisdictions:
GB-ENG
Data Controller:
University Hospitals Birmingham NHS Foundation Trust
Data Processor:
NOT APPLICABLE

Usage

Data Use Limitations:
GENERAL RESEARCH USE
Data Use Requirements:
PROJECT SPECIFIC RESTRICTIONS
Resource Creators:
  • Data is representative of the multi-ethnicity population within the West Midlands (42% non white). Data includes all patients admitted during this timeframe, with National data Opt Outs applied, and therefore is representative of admissions to secondary care. Data focuses on in-patient stay in hospital during the acute episode but can be supplemented on request to include previous and subsequent hospital contacts (including outpatient appointments) and ambulance, 111, 999 data.

Format and Standards

Vocabulary Encoding Schemes:
  • ICD10
  • SNOMED CT
Conforms To:
LOCAL
Languages:
en
Formats:
CSV
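Because the data is delivered as CSV, a first sanity check on an extract could compare each row against the bounds stated in this record (Typical Age Range 18-110; Start Date 2018-01-01; End Date 2021-06-08). The following is a minimal sketch only: the column names `age` and `admission_date`, the ISO date format and the sample rows are all assumptions made for illustration, not the dataset's documented schema.

```python
import csv
import io
from datetime import date

# Bounds copied from this record's Coverage and Temporal fields.
MIN_AGE, MAX_AGE = 18, 110
START_DATE, END_DATE = date(2018, 1, 1), date(2021, 6, 8)


def check_rows(csv_file, age_col="age", admit_col="admission_date"):
    """Return (line_number, problem) pairs for rows outside the documented bounds.

    The default column names are hypothetical; the real extract's
    headers and date format may differ.
    """
    problems = []
    reader = csv.DictReader(csv_file)
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        age = int(row[age_col])
        admitted = date.fromisoformat(row[admit_col])
        if not MIN_AGE <= age <= MAX_AGE:
            problems.append((line_number, f"age {age} outside {MIN_AGE}-{MAX_AGE}"))
        if not START_DATE <= admitted <= END_DATE:
            problems.append((line_number, f"admission {admitted} outside study window"))
    return problems


# Made-up rows for demonstration; these are not values from the dataset.
sample = io.StringIO(
    "age,admission_date\n"
    "67,2019-03-14\n"
    "17,2020-01-02\n"
    "80,2021-07-01\n"
)
problems = check_rows(sample)
for line_number, problem in problems:
    print(line_number, problem)
```

Against a real extract, the `io.StringIO` sample would be replaced by an opened file, and the column names adjusted to match the headers actually supplied.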

Enrichment and Linkage

Derivations:
Not Available

Observations

Statistical Population:
Events
Population Description:
15,000 synthetic admissions between 01/01/2018 and 08/06/2021
Population Size:
15000
Measured Property:
Count
Observation Date:
2021-09-12