Skip to content

Working with frames and series

In this section, you will work with frames and series in the sandbox.

Start the sandbox🔗

In the terminal, type

opensafely exec ehrql:v0 sandbox example-data

and press Enter.

A screenshot of VS Code, showing the sandbox

Notice that the command prompt, or the set of characters before the cursor, has changed to >>>. For the remainder of this section, when you see >>>, you should type the code that follows into the sandbox and press Enter.

Work with patient data🔗

Import the patients table🔗

>>> from ehrql.tables.beta.core import patients

Work with a patient frame🔗

The patients table is a patient frame; it has one row per patient. Notice that all values in the patient_id column are unique.

>>> patients
patient_id        | date_of_birth     | sex               | date_of_death
------------------+-------------------+-------------------+------------------
0                 | 1973-07-01        | female            | 2015-09-14
1                 | 1948-03-01        | male              | None
2                 | 2003-04-01        | male              | None
3                 | 2007-06-01        | female            | None
4                 | 1938-10-01        | male              | 2018-05-23
5                 | 1994-04-01        | female            | None
6                 | 1953-05-01        | male              | None
7                 | 1992-08-01        | female            | None
8                 | 1931-10-01        | female            | 2017-11-10
9                 | 1979-04-01        | male              | None

Work with a patient series🔗

The patients.date_of_birth column is a patient series; it has one row per patient.

>>> patients.date_of_birth
0 | 1973-07-01
1 | 1948-03-01
2 | 2003-04-01
3 | 2007-06-01
4 | 1938-10-01
5 | 1994-04-01
6 | 1953-05-01
7 | 1992-08-01
8 | 1931-10-01
9 | 1979-04-01

Work with event data🔗

Import the medications table🔗

>>> from ehrql.tables.beta.core import medications

Work with an event frame🔗

The medications table is an event frame; it has many rows per patient. Notice that some values in the patient_id column are not unique.

>>> medications
patient_id        | row_id            | date              | dmd_code
------------------+-------------------+-------------------+------------------
0                 | 0                 | 2014-01-11        | 39113611000001102
1                 | 1                 | 2015-08-06        | 39113611000001102
1                 | 2                 | 2018-09-21        | 39113311000001107
1                 | 3                 | 2020-05-17        | 22777311000001105
3                 | 4                 | 2022-11-09        | 22777311000001105
4                 | 5                 | 2017-05-11        | 39113611000001102
5                 | 6                 | 2017-07-11        | 3484711000001105
5                 | 7                 | 2019-07-06        | 39113611000001102
7                 | 8                 | 2021-01-27        | 3484711000001105
9                 | 9                 | 2015-03-14        | 3484711000001105

Work with an event series🔗

The medications.date column is an event series; it has many rows per patient.

>>> medications.date
0 | 0 | 2014-01-11
1 | 1 | 2015-08-06
1 | 2 | 2018-09-21
1 | 3 | 2020-05-17
3 | 4 | 2022-11-09
4 | 5 | 2017-05-11
5 | 6 | 2017-07-11
5 | 7 | 2019-07-06
7 | 8 | 2021-01-27
9 | 9 | 2015-03-14

Transform input data into output data🔗

ehrQL tables, such as patients and medications, contain input data. As a researcher, your task is to transform them into a dataset. A dataset has one row per patient and contains output data.

Let's look at how we might transform input data into output data.

Transform an event frame into a patient frame🔗

To transform an event frame into a patient frame:

  1. Sort the event frame
  2. Select either the first row or the last row in the event frame
>>> medications.sort_by(medications.date).first_for_patient()
patient_id        | date              | dmd_code
------------------+-------------------+------------------
0                 | 2014-01-11        | 39113611000001102
1                 | 2015-08-06        | 39113611000001102
3                 | 2022-11-09        | 22777311000001105
4                 | 2017-05-11        | 39113611000001102
5                 | 2017-07-11        | 3484711000001105
7                 | 2021-01-27        | 3484711000001105
9                 | 2015-03-14        | 3484711000001105

Transform an event frame into another event frame🔗

To transform an event frame into another event frame, filter rows that match or do not match a condition.

Rows that match 100mcg/dose Salbutamol:

>>> medications.where(medications.dmd_code == "39113611000001102")
patient_id        | row_id            | date              | dmd_code
------------------+-------------------+-------------------+------------------
0                 | 0                 | 2014-01-11        | 39113611000001102
1                 | 1                 | 2015-08-06        | 39113611000001102
4                 | 5                 | 2017-05-11        | 39113611000001102
5                 | 7                 | 2019-07-06        | 39113611000001102

Rows that do not match 100mcg/dose Salbutamol:

>>> medications.except_where(medications.dmd_code == "39113611000001102")
patient_id        | row_id            | date              | dmd_code
------------------+-------------------+-------------------+------------------
1                 | 2                 | 2018-09-21        | 39113311000001107
1                 | 3                 | 2020-05-17        | 22777311000001105
3                 | 4                 | 2022-11-09        | 22777311000001105
5                 | 6                 | 2017-07-11        | 3484711000001105
7                 | 8                 | 2021-01-27        | 3484711000001105
9                 | 9                 | 2015-03-14        | 3484711000001105

Extract a series of years from a series of dates🔗

To extract a series of years from a series of dates, append .year to the series of dates.

>>> patients.date_of_birth.year
0 | 1973
1 | 1948
2 | 2003
3 | 2007
4 | 1938
5 | 1994
6 | 1953
7 | 1992
8 | 1931
9 | 1979

Add one or more years to a series of dates🔗

To add one or more years to a series of dates, use the years function.

>>> from ehrql import years
>>> patients.date_of_birth + years(1)
0 | 1974-07-01
1 | 1949-03-01
2 | 2004-04-01
3 | 2008-06-01
4 | 1939-10-01
5 | 1995-04-01
6 | 1954-05-01
7 | 1993-08-01
8 | 1932-10-01
9 | 1980-04-01