N3C Education

Mission

The mission of the N3C Education Enclave is to provide educators and learners a space to develop and practice the skills needed to analyze real-world data (RWD), that is, data collected outside of clinical trials, such as medical records, insurance claims, patient surveys, or census or community datasets. The Education Enclave provides simulated (also known as ‘synthetic’ or notional) datasets to learn on, a series of training tutorials, the Researcher’s Guide to the N3C (a virtual textbook of the concepts and skills needed to study RWD), and access to many of the shared resources available to the broader N3C community.

Since the Educational Enclave does not include any real patient data, only simulated data, there are no restrictions on recording or sharing screen views, making it a rich venue for training programs, courses, and workshops.

Overview

Real-world datasets such as electronic health records (EHRs) provide important information needed for advancing and transforming health care. Translational studies (studies that translate lessons and evidence learned from real-world health data into new treatments and improved clinical practice and public health care) require multidisciplinary knowledge and skills. In addition to standard research skills, conducting high-quality, rigorous translational projects requires an understanding of medical vocabularies, data models, data engineering, statistics, health equity, and public health. Real-world datasets are also observational and contain significant “messiness”, so Good Algorithmic Practices are needed to appropriately plan and conduct studies, and to interpret and communicate results to inform and improve clinical care and public health. Hands-on training using real-world data and tools is essential; however, most databases and analysis tools require special installation or infrastructure support, cannot be shared or recorded (even for classes) in order to protect patient privacy, or lack the realism or suitability needed for teaching.

To address these challenges, the National Center for Advancing Translational Sciences (NCATS), which hosts the N3C, worked with Tufts Medical Center, which generously agreed to make over 500,000 simulated patients available for educational use. The Tufts synthetic data has gone through extensive testing to mitigate any concerns about privacy, including an expert determination from an independent entity that specializes in privacy risk.

This data contains common elements, including conditions, devices, drugs, measurements, observations, procedures, and visits, that have been preliminarily verified to be highly concordant with the original EHR data across a number of domains and applications. However, this is simulated data, created for educational purposes.

The Educational Enclave also provides access to other notional data in the form of SynPuf and Synthea tables. Each of these data resources is formatted in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), an “open community data standard, designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence.” OMOP formatting is key to illustrating how data from different health systems can be harmonized, increasing our ability to produce large, cross-institutional, broadly representative data sets that enable increasingly powerful studies.
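As a minimal sketch of why a common data model matters, the example below builds two OMOP-style tables in an in-memory SQLite database and counts distinct patients with a given condition. The table and column names (person, condition_occurrence, person_id, condition_concept_id, condition_start_date) follow the OMOP CDM; the rows are invented for illustration and do not come from any enclave dataset, the concept ID 999999 is a placeholder, and the helper function name is hypothetical. Because every OMOP-formatted source shares these tables, the same query could in principle run unchanged against each of them.

```python
import sqlite3

# Toy, hand-made rows for illustration only (not from any N3C dataset).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
INSERT INTO person VALUES (1, 1950), (2, 1980), (3, 1975);
INSERT INTO condition_occurrence VALUES
    (10, 1, 201826, '2020-03-01'),
    (11, 2, 201826, '2021-07-15'),
    (12, 2, 201826, '2022-01-09'),
    (13, 3, 999999, '2021-02-02');  -- 999999 is a placeholder concept ID
""")


def count_patients_with_condition(conn, concept_id):
    """Count distinct persons with at least one record of the given concept."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT person_id) "
        "FROM condition_occurrence "
        "WHERE condition_concept_id = ?",
        (concept_id,),
    ).fetchone()
    return row[0]


# 201826 is used here as the standard concept for type 2 diabetes mellitus.
patients = count_patients_with_condition(conn, 201826)
print(patients)
```

Person 2 has two matching records but is counted once, which is the usual reason to count distinct person_id values rather than rows.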

As in all N3C enclaves, resources such as code templates (prewritten sets of commonly used programming code) and concept sets (prewritten sets of commonly used medical codes) are sharable, allowing instructors and learners to develop material during training that can be shared and used in research projects. Users of the Education Enclave also have access to the training and support materials developed for the other N3C enclaves.

Available Data

N3C ingests and harmonizes data from multiple sources for use in multiple enclaves. These data ‘streams’ primarily include Electronic Health Records (EHRs) that are then harmonized to the OMOP Common Data Model.

The following streams are available in the Educational Enclave:

  • Tufts Synthetic Data: High-quality simulated/synthetic EHR data for learning purposes. (OMOP format.)
  • Notional Patient Data: Other simulated/publicly available EHR data for learning purposes. (OMOP format.)
  • External Datasets: Publicly available data (e.g., U.S. Census and regional data) for use alongside EHR data. For details on the currently available datasets, and to request ingestion of additional ones, see the External Datasets page. (Various formats.)
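To illustrate what using external data "alongside EHR data" can look like, here is a small sketch that links toy patient rows to a toy regional table by ZIP code. The person.location_id and location.zip fields follow the OMOP CDM; the external table, its median_income column, the ZIP values, and the attach_external helper are all hypothetical and are not taken from any enclave dataset.

```python
# Toy rows only: invented people, locations, and ZIP codes.
person = [
    {"person_id": 1, "location_id": 100},
    {"person_id": 2, "location_id": 101},
    {"person_id": 3, "location_id": 100},
]
location = {
    100: {"zip": "00001"},
    101: {"zip": "00002"},
}
# Hypothetical external (e.g., census-style) table keyed by ZIP code.
external_by_zip = {
    "00001": {"median_income": 52000},
}


def attach_external(person_rows, location, external_by_zip):
    """Left-join each person to the external table through their ZIP code."""
    out = []
    for p in person_rows:
        zip_code = location[p["location_id"]]["zip"]
        ext = external_by_zip.get(zip_code)  # None when there is no match
        out.append({
            "person_id": p["person_id"],
            "zip": zip_code,
            "median_income": ext["median_income"] if ext else None,
        })
    return out


linked = attach_external(person, location, external_by_zip)
for row in linked:
    print(row)
```

Keeping unmatched patients with a missing value, rather than dropping them, makes gaps in external coverage visible during analysis.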

Using the N3C Educational Enclave

While the N3C Education Enclave does not contain real patient data, the registration and access procedures are the same as for other N3C enclaves. This familiarizes users with the regulatory processes involved and enables educators and learners to move on to real-world data research in the future.

Individuals wishing to use the N3C Educational Enclave must complete the following steps:

  1. Confirm there is a current Institutional Agreement for your institution. Check the Signatories List at https://covid.clinicalcohort.org/tenant-duas/ to confirm that your institution has a signed institutional agreement with NCATS.
    • Note that the Educational Enclave uses the standard Institutional Data Use Agreement (DUA) form; however, this DUA form does not provide access to any synthetic or patient data; it ensures users will abide by appropriate use of the platform and resources provided. Accessing data requires a separate form called the Data Use Request (see below), which confirms that the individual has the appropriate training and approvals to view the specific data requested.
    • If your institution does not currently have a signed institutional agreement on file, you can download the DUA form and provide it to your institution’s signing official (often someone from the business or legal office). The signing official completes the form and then e-mails it to NCATS at NCATSPartnerships@mail.nih.gov. Once submitted, it usually takes 1-2 weeks for NCATS to review, countersign, and add your institution to the Signatories List.
      • While the form is being processed, you can work on the training requirements and review resources described in the steps below.
    • If you do not have an institutional affiliation, you can sign and submit a Citizen Scientist agreement certifying that you agree to the terms of service.
  2. Complete required NIH Information Security Training. Complete the NIH Information Security and Management Refresher course at https://irtsectraining.nih.gov/publicUser. You only need to complete the refresher course, which consists of 5 modules (2024 Information Security, Insider Threats, Privacy Awareness, Records Management, and Emergency Preparedness) and takes 60-90 minutes. After the 5th module, click the Print Certificate button and save a copy of your certificate. You must complete this training before submitting your individual registration (below), and annually thereafter to maintain access.
  3. Submit your individual registration to access the enclave and its training resources. Once your institution is on the Signatories List, register for the N3C at https://n3c.cd2h.org/registration. After registration, your account will be reviewed for coverage by an Institutional Agreement; when this is complete, you will receive a “Welcome to N3C” email instructing you to log in at https://unite.nih.gov/.
    • To register, you will need your institutional email, a smartphone or tablet with a two-factor authentication application (e.g., DUO, Google Authenticator, Microsoft Authenticator), and an ORCID to ensure appropriate author attributions. If you do not have an existing ORCID, you can register at https://info.orcid.org/what-is-orcid/.
  4. Complete Human Subjects Research Protection Training. Even though the Education Enclave contains no human subjects data, completion of this training within the last 3 years is required as part of general N3C requirements. This training is frequently offered through https://citiprogram.org; consult your local Human Research Protection Program (HRPP) for your institution's specific requirements.
  5. Join or submit a Data Use Request (DUR) to access the synthetic datasets. Once you are oriented and ready to begin a course or other learning activity, you can request access to the relevant synthetic data set by clicking on (a) the Join a DUR button to locate an existing DUR, or (b) the Create a New DUR button to submit a new request. The Data Use Request is a standard form used to govern access to data resources.
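The time-based requirements in the steps above can be tracked with a short checklist script. This is only a sketch under stated assumptions: the Applicant class and missing_prerequisites function are hypothetical, "annually" is approximated as 365 days and "within the last 3 years" as 3 x 365 days, and the Citizen Scientist route for users without an institutional affiliation is not modeled. The enclave's own review process is authoritative.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Applicant:
    """Hypothetical record of one person's progress through the steps."""
    institution_on_signatories_list: bool
    security_training_date: Optional[date]
    human_subjects_training_date: Optional[date]


def missing_prerequisites(a: Applicant, today: date) -> List[str]:
    """Return the prerequisites that are absent or out of date."""
    missing = []
    if not a.institution_on_signatories_list:
        missing.append("institutional agreement")
    # Assumption: "annually" is treated as within the last 365 days.
    if (a.security_training_date is None
            or today - a.security_training_date > timedelta(days=365)):
        missing.append("NIH information security training")
    # Assumption: "within the last 3 years" is treated as 3 * 365 days.
    if (a.human_subjects_training_date is None
            or today - a.human_subjects_training_date > timedelta(days=3 * 365)):
        missing.append("human subjects research protection training")
    return missing


# Example with invented dates.
learner = Applicant(
    institution_on_signatories_list=True,
    security_training_date=date(2024, 1, 15),
    human_subjects_training_date=date(2023, 9, 1),
)
todo = missing_prerequisites(learner, today=date(2025, 6, 1))
print(todo)
```

In this example the security training is more than a year old, so it is the only item reported as needing renewal.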

If you have questions about any of these steps, please visit the N3C Support Desk at https://covid.clinicalcohort.org/support/.

Once registered, spend some time exploring the Educational Enclave's many resources. From the Home page, we recommend clicking on the Training Portal button and reviewing the Orientation videos and the Researcher’s Guide to the N3C, a virtual textbook of the concepts and skills needed to analyze real-world data. The Educational Enclave also provides a wealth of code templates and tutorials to help you learn how to design and develop a translational project.

Data Use Request (DUR) Submission Guidance

Individuals, instructors, and training program leads can create DURs to request use of one or more of the synthetic data sets for their classes or educational projects. Each request must provide the following for the project, course, or program:

  • Title and abstract describing the educational goals and focal areas of the project. This information will be posted publicly on N3C dashboards and websites.
  • Rationale, which will be evaluated by the Education Enclave’s Data Access Committee (DAC) for educational focus. This information will remain private within the DAC.
  • Attestation and Agreement to the N3C Clinical Data Use Agreement, the N3C Download Policy, and the N3C Code of Conduct.

Data Use Requests for the Educational Enclave will generally be accepted for health-related, real-world data educational purposes. Educational Enclave DURs are valid for 1 year and renewable for continued access to approved project workspaces.

Special Considerations

While most N3C-governed resources prohibit taking screenshots, recording video, or sharing screen views of row-level data with anyone who lacks the same level of data access, these restrictions do not apply to resources governed by the Education Enclave. Users are free to take screenshots, record videos, and screen-share data from the Tufts Synthetic Data, Notional Patient Data, and External Datasets data streams for educational purposes, and to share them inside and outside of the Education Enclave and N3C platform.

Contacts

Shawn O’Neil, N3C Training Coordinator, shawn@tislab.org