
Analysis of ET data within MR-Eye 2.0

Derived from: https://www.axonlab.org/hcph-sops/data-management/eyetrack-qc/

Author: Yiwei Jia

This notebook focuses on the analysis of eye-tracking data within the MR-Eye 2.0 project, where visual stimuli are presented at the center of the screen. The analysis includes examining eye-tracking signals, identifying regions of interest (ROIs), and creating eye-tracking (ET) masks based on specific criteria—in this case, selecting areas corresponding to eye rotation displacements within 1/3 voxel size. Furthermore, we investigate eye movement characteristics extracted from the ET data and compare them across four subjects to assess and standardize the data quality.
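To make the 1/3-voxel criterion concrete, here is a back-of-the-envelope conversion from a linear displacement on the eyeball surface to a rotation angle. The eyeball radius and voxel size below are illustrative assumptions, not project parameters:

import numpy as np

# Illustrative values only (assumptions, not measured project parameters)
eyeball_radius_mm = 12.0  # typical adult eyeball radius
voxel_size_mm = 1.0       # assumed isotropic voxel size

# A rotation of theta displaces a point on the eyeball surface by ~r*sin(theta)
max_displacement_mm = voxel_size_mm / 3
max_rotation_deg = np.degrees(np.arcsin(max_displacement_mm / eyeball_radius_mm))
print(f"Maximum tolerated eye rotation: {max_rotation_deg:.2f} deg")  # ~1.59 deg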

For demonstration purposes, this notebook illustrates only the method for selecting the region but does not include functions for saving the masks to a local device. Readers are encouraged to implement their own save functions as needed.
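As a starting point, here is a minimal save sketch, assuming the mask is a boolean NumPy array (the filename below is hypothetical):

import numpy as np

def save_mask(mask, out_path):
    # np.save preserves dtype and shape; np.load restores the array as-is
    np.save(out_path, np.asarray(mask, dtype=bool))

# Hypothetical usage; Preserve_mask is produced in Step 5 below
# save_mask(Preserve_mask, "sub001_T1_et_mask.npy")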

# Derived from:
# https://www.axonlab.org/hcph-sops/data-management/eyetrack-qc
# Load the autoreload extension
%load_ext autoreload
# Set autoreload to update the modules every time before executing a new line of code
%autoreload 2

%matplotlib inline
from pathlib import Path
import json
import ppjson
from importlib import reload  # For debugging purposes

import numpy as np
import pandas as pd

import eyetrackingrun as et

from IPython.display import HTML
from matplotlib import animation
from matplotlib.animation import FuncAnimation, PillowWriter
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import copy
from write_bids_yiwei import EyeTrackingRun, write_bids, write_bids_from_df

Step 1: Inspecting Eye-Tracking Data

In this step, we’ll begin by parsing the metadata of our eye-tracking data. This will help us understand the structure of the data, including the recording parameters, subject details, and any relevant events or labels. Parsing the metadata provides insight into the conditions and settings during data collection, which is essential for accurate analysis.

# Specify the subject and modality you want to inspect

subject_idx = 1
T_idx = 1

if T_idx == 1:
    mode = 'T1'
else:
    mode = 'T2'


BIDS_PATH = Path("./data/")  # file within a subdirectory
FILE_NAME = f"sub00{subject_idx}_T{T_idx}"    

# session = "001" #can be a string to be defined when required
tsv_name = f"{FILE_NAME}.tsv.gz"

recording_file = BIDS_PATH / tsv_name
print(f'recording_file: {recording_file}')

recording = pd.read_csv(
    recording_file,
#     sep=r"\s+",
    sep="\t",
    na_values="n/a",
)
recording_file: data/sub001_T1.tsv.gz

Printing the number of samples verifies that the data loaded correctly and gives a first sense of the recording's size; with recording.head() you can additionally inspect the organization, key columns, and format of the data.

print(f'The length of the recording: {len(recording)}')
The length of the recording: 754932

Step 2: Understanding Metadata

The metadata provides crucial information about the setup and parameters used during eye-tracking data collection. By examining the metadata, we can gain insights into the recording conditions, data quality, and calibration settings.

metadata = json.loads((
    recording_file.parent
    / recording_file.name.replace(".tsv.gz", ".json")
).read_text())

metadata
    {'AverageCalibrationError': [0.5],
     'CalibrationCount': 1,
     'CalibrationLog': [[1047679,
       '>>>>>>> CALIBRATION (HV5,P-CR) FOR RIGHT: <<<<<<<<<'],
      [1047679, 'Calibration points:'],
      [1047680, '-10.3, -50.9         0,      0'],
      [1047680, '-8.9, -71.9         0,  -2457'],
      [1047680, '-9.3, -29.2         0,   2457'],
      [1047680, '-41.2, -50.5     -3474,      0'],
      [1047680, '24.2, -49.8      3474,      0'],
      [1047680, 'eye check box: (L,R,T,B)\n\t  -48    31   -76   -25'],
      [1047680, 'href cal range: (L,R,T,B)\n\t-5211  5211 -3686  3686'],
      [1047680,
       'Cal coeff:(X=a+bx+cy+dxx+eyy,Y=f+gx+goaly+ixx+jyy)\n   5.882e-05  107.02  1.2434 -0.18295 -0.27969 \n   4.4115e-05 -0.84952  115.02 -0.085003 -0.090966'],
      [1047680, 'Prenormalize: offx, offy = -10.326 -50.929'],
      [1047680, 'Gains: cx:102.441 lx:116.794 rx:88.586'],
      [1047680, 'Gains: cy:109.181 ty:118.022 by:110.039'],
      [1047680, 'Resolution (upd) at screen center: X=2.6, Y=2.4'],
      [1047680, 'Gain Change Proportion: X: 0.318 Y: 0.073'],
      [1047680, 'Gain Ratio (Gy/Gx) = 1.066'],
      [1047680, 'Cross-Gain Ratios: X=0.011, Y=0.008'],
      [1047680, 'PCR gain ratio(x,y) = 2.478, 2.261'],
      [1047680, 'CR gain match(x,y) = 1.020, 1.020'],
      [1047680, 'Slip rotation correction OFF'],
      [1047682, 'CALIBRATION HV5 R RIGHT   GOOD'],
      [1058027,
       'VALIDATION HV5 R RIGHT GOOD ERROR 0.50 avg. 0.79 max  OFFSET 0.43 deg. -11.0,-3.5 pix.']],
     'CalibrationResultOffset': [0.43, [-11.0, -3.5]],
     'CalibrationResultOffsetUnits': ['deg', 'pixels'],
     'CalibrationResultQuality': ['GOOD'],
     'CalibrationType': 'HV5',
     'Columns': ['eye1_x_coordinate',
      'eye1_y_coordinate',
      'eye1_pupil_size',
      'eye1_pupil_x_coordinate',
      'eye1_pupil_y_coordinate',
      'eye1_href_x_coordinate',
      'eye1_href_y_coordinate',
      'fast_raw_x_velocity',
      'screen_ppdeg_x_coordinate',
      'screen_ppdeg_y_coordinate',
      'timestamp',
      'eye1_fixation',
      'eye1_saccade',
      'eye1_blink'],
     'CornealReflectionThreshold': [179],
     'EyeTrackingMethod': 'P-CR',
     'LoggedMessages': [[1066471,
       'NO Reply is disabled for function eyelink_cal_result'],
      [1084633, 'RECCFG CR 1000 2 0 R'],
      [1084633, 'ELCLCFG TOWER'],
      [1084633, 'ELCL_PCR_PARAM 5 3.0']],
     'MaximalCalibrationError': [0.79],
     'PupilFitMethod': 'centroid',
     'PupilFitMethodNumberOfParameters': 3,
     'PupilThreshold': [68],
     'RecordedEye': 'right',
     'SamplingFrequency': 1000,
     'ScreenAOIDefinition': ['square', [0, 800, 0, 600]],
     'StartTime': None,
     'StopTime': None,
     'ValidationErrors': [['eye1', 0.45, [-12.1, 1.8]],
      ['eye1', 0.21, [-5.6, 0.7]],
      ['eye1', 0.41, [-11.1, 1.8]],
      ['eye1', 0.5, [-12.7, -4.4]],
      ['eye1', 0.79, [-10.8, -18.3]]],
     'ValidationPosition': [['eye1', [400, 300]],
      ['eye1', [400, 51]],
      ['eye1', [400, 549]],
      ['eye1', [48, 300]],
      ['eye1', [752, 300]]]}

For instance:

  • RecordedEye: specifies which eye was tracked. Here, the data correspond to the right eye.
  • SamplingFrequency: the sampling rate is 1000 Hz (1 kHz), i.e., the data capture 1000 samples per second.
  • Calibration details: the calibration method used (HV5), the calibration quality (GOOD), and the associated errors.

Understanding these parameters helps us ensure data reliability and assess the quality of eye-tracking measurements before analysis.
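A compact way to pull out the fields discussed above, using the keys exactly as they appear in the metadata shown earlier:

# Summarize the acquisition parameters that matter most for quality control
for key in ("RecordedEye", "SamplingFrequency", "CalibrationType",
            "CalibrationResultQuality", "AverageCalibrationError",
            "MaximalCalibrationError"):
    print(f"{key}: {metadata.get(key)}")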

# Prepare the time axis based on the sampling frequency

t_axis = (
    recording.timestamp.values - recording.timestamp[0]
) / metadata["SamplingFrequency"]
print(f"The end of the timestamp: {t_axis[-1]}")
The end of the timestamp: 754.931
# Rename the columns to make them more readable
recording = recording.rename(
    columns={
        "eye1_pupil_size": "pupil_size",
        "eye1_fixation": "fixation",
        "eye1_saccade": "saccade",
        "eye1_blink": "blink",
        "eye1_x_coordinate": "x_coordinate",
        "eye1_y_coordinate": "y_coordinate",
    }
)

metadata["Columns"] = recording.columns.tolist()
metadata["Columns"]
['Unnamed: 0',
 'x_coordinate',
 'y_coordinate',
 'pupil_size',
 'eye1_pupil_x_coordinate',
 'eye1_pupil_y_coordinate',
 'eye1_href_x_coordinate',
 'eye1_href_y_coordinate',
 'fast_raw_x_velocity',
 'screen_ppdeg_x_coordinate',
 'screen_ppdeg_y_coordinate',
 'timestamp',
 'fixation',
 'saccade',
 'blink']

Step 3: Inspecting Signals, Event Masks and Correlation

In this step we first visualize all the signals in the recording to examine their overall shapes (trend, variability, periodicity) and the scale of their values (range and units).

In addition, eye-tracking data include event masks, which label segments of the data corresponding to specific events such as blinks, saccades, or fixations. These masks are typically binary arrays or categorical labels indicating the presence of a given event at each timestamp.
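Because the masks are binary (0/1 per sample), the mean of a mask equals the fraction of recording time spent in that state, which makes for a quick sanity check before plotting:

# Fraction of samples labeled with each event
for event in ("blink", "saccade", "fixation"):
    print(f"{event}: {recording[event].mean():.1%} of samples")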

check_all = True
if check_all:
    fig, axes = plt.subplots(6, 1, figsize=(10, 36), sharex=False)

     # Plot x coordinate
    axes[0].plot(t_axis, recording["x_coordinate"].values, label='X Coordinate', color='blue')
    axes[0].set_title("X coordinate")
    axes[0].legend()
    axes[0].set_xlabel("time [s]")

    # Plot y coordinate
    axes[1].plot(t_axis, recording["y_coordinate"].values, label='Y Coordinate', color='green')
    axes[1].set_title("Y coordinate")
    axes[1].legend()
    axes[1].set_xlabel("time [s]")

    # Plot pupil x coordinate
    axes[2].plot(t_axis, recording["eye1_pupil_x_coordinate"].values, label='Pupil X Coordinate', color='blue')
    axes[2].set_title("Pupil X")
    axes[2].legend()
    axes[2].set_xlabel("time [s]")

    # Plot pupil y coordinate
    axes[3].plot(t_axis, recording["eye1_pupil_y_coordinate"].values, label='Pupil Y Coordinate', color='green')
    axes[3].set_title("Pupil Y")
    axes[3].legend()
    axes[3].set_xlabel("time [s]")

    # Plot href x coordinate
    axes[4].plot(t_axis, recording["eye1_href_x_coordinate"].values, label='Href X Coordinate', color='orange')
    axes[4].set_title("Href X")
    axes[4].legend()
    axes[4].set_xlabel("time [s]")

    # Plot screen ppdeg x coordinate
    axes[5].plot(t_axis, recording["screen_ppdeg_x_coordinate"].values, label='Screen PPdeg X Coordinate', color='red')
    axes[5].set_title("Screen X (ppdeg)")
    axes[5].legend()
    axes[5].set_xlabel("time [s]")

    plt.ylabel(" ");

png

Here we check the pupil area

check_pupil = True
if check_pupil:
    fig = plt.figure(figsize=(16, 2))
    plt.plot(
        t_axis,
        recording["pupil_size"].values,
    )

    plt.xlabel("time [s]")
    plt.ylabel("pupil area [a.u.]");

    fig = plt.figure(figsize=(16, 2))
    plt.plot(
        t_axis,
        recording["pupil_size"].values,
    )
    plt.title("Zoom In - Pupil Area")
    plt.xlabel("time [s]")
    plt.ylabel("pupil area [a.u.]")
    plt.xlim((100, 150))

png

png

Check the eye events:

  • Blink

  • Saccade

  • Fixation

We first check the mask of each event and then inspect its correlation with the signals examined above.

# Check blinking
check_blink = True
if check_blink:
    fig = plt.figure(figsize=(16, 2))
    plt.plot(
        t_axis,
        recording["blink"].values,
    )
    plt.title("Blink Event")
    plt.xlabel("time [s]")
    plt.ylabel("eyes closed");

    fig = plt.figure(figsize=(16, 2))
    plt.plot(
        t_axis,
        recording["blink"].values * 10000,
    )
    plt.xlabel("time [s]")
    plt.ylabel("eyes closed")
    plt.title("Zoom in - Blink Event")
    plt.xlim((100, 150))

png

png

plot_pupil_blink = True
if plot_pupil_blink:
    fig = plt.figure(figsize=(16, 2))

    plt.plot(
        t_axis,
        recording["pupil_size"].values,
    )

    plt.plot(
        t_axis,
        recording["blink"].values * 5000,
    )
    # screen_coor = recording["screen_ppdeg_x_coordinate"].values
    # plt.plot(
    #     t_axis,
    #     (screen_coor-np.min(screen_coor))/(np.max(screen_coor)-np.min(screen_coor)) * 7000
    # )

    plt.xlabel("time [s]")
    plt.ylabel("pupil area [a.u.]")
    plt.xlim((260, 290))
    plt.title('Pupil area vs. blinking')

png

Check saccading

check_saccading = True
if check_saccading:
    fig = plt.figure(figsize=(16, 2))
    plt.plot(
        t_axis,
        recording["saccade"].values,
    )
    plt.title("Saccade Event")
    plt.xlabel("time [s]")
    plt.ylabel("eyes closed");

    fig = plt.figure(figsize=(16, 2))
    plt.plot(
        t_axis,
        recording["saccade"].values * 10000,
    )
    plt.xlabel("time [s]")
    plt.ylabel("eyes closed")
    plt.title("Zoom in - Saccade Event")
    plt.xlim((100, 150))

png

png

plot_pupil_saccade = True

if plot_pupil_saccade:
    fig = plt.figure(figsize=(16, 2))

    plt.plot(
        t_axis,
        recording["x_coordinate"].values,
    )

    plt.plot(
        t_axis,
        recording["saccade"].values * 500,
    )

    plt.xlabel("time [s]")
    plt.ylabel("x coord [pixel]")
    plt.xlim((206, 210))
    plt.ylim((300, 600))
    plt.title('x coordinates vs. saccading')

png

Check fixation

plot_fixation = True
xlim = (206, 210)
ylim = (200, 600)

if plot_fixation:
    # Fixation
    fig = plt.figure(figsize=(16, 2))
    plt.plot(
        t_axis,
        recording["x_coordinate"].values,
    )

    plt.plot(
        t_axis,
        recording["fixation"].values * 500,
    )
    plt.xlim(xlim)
    plt.ylim(ylim)
    plt.title('X coordinate of gazing with fixation')

    fig = plt.figure(figsize=(16, 2))
    plt.plot(
        t_axis,
        recording["y_coordinate"].values,
    )

    plt.plot(
        t_axis,
        recording["fixation"].values * 500,
    )
    plt.xlim(xlim)
    plt.ylim(ylim)
    plt.title('Y coordinate of gazing with fixation')

png

png

Did you find correlations between the x/y coordinates, the pupil area, and the saccade or fixation events? The snippet below offers one way to check quantitatively.
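As a rough quantitative check (pandas' corr drops NaNs pairwise), we can correlate each binary event mask with the continuous signals:

# Pearson correlation between continuous signals and binary event masks
signals = ["x_coordinate", "y_coordinate", "pupil_size"]
events = ["blink", "saccade", "fixation"]
print(recording[signals + events].corr().loc[signals, events])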

Step 4: Data Cleaning

  • Extract the signals we need

  • Eliminate the blinking and non-fixation area of ET data.

  • Design some criteria for preserving the useful data
    • Visual angle
    • Heat maps
    • Fixation algorithm (literature review)

4.1: Extract the signals we need

In this exercise, we are focusing on a subset of the eye-tracking (ET) data. Specifically, we will process the first portion of the ET data, which corresponds to the duration of the sequence T1_LIBRE.

We extract the first portion of the data with a length matching T1_LIBRE and inspect the extracted subset to understand its structure and characteristics, such as gaze coordinates and event markers.

# Here is the given information: time duration (in sec) according to the prior information.

if  subject_idx == 1:
    T1_LIBRE = 374.558
elif subject_idx == 2:
    T1_LIBRE = 374.380
elif subject_idx == 3:
    T1_LIBRE = 334.237
else:
    T1_LIBRE = 374.565



print(f"The length of T1_LIBRE Subject{subject_idx} should be: {T1_LIBRE}")
The length of T1_LIBRE Subject1 should be: 374.558
# Size of screen
size = (
    metadata["ScreenAOIDefinition"][1][1],
    metadata["ScreenAOIDefinition"][1][3],
)

# For easier processing, we store only the (x, y) coordinates in coor_data
# and keep the full recording in coor_recording.
coor_data = recording[["x_coordinate", "y_coordinate"]]
print(f"Extract X Y coordinates from recording: {len(coor_data)}")

coor_recording = recording


coor_data.head()
coor_recording.head()
Extract X Y coordinates from recording: 754932
# start_margin determines where to begin extracting the useful portion of the eye-tracking (ET) data.
# In our case, it is set as 0.

start_margin = int(0.0*metadata['SamplingFrequency']) 

print(f'mode: {mode}')
print(f"set start margin as {start_margin}")


T1_LIBRE_sample = T1_LIBRE*metadata['SamplingFrequency']



t_axis_xy = (
    coor_data.index 
) / metadata["SamplingFrequency"]

# x coordinate
fig = plt.figure(figsize=(16, 2))

plt.plot(
    t_axis_xy,
    coor_data["x_coordinate"].values,
)


plt.axvline(x=start_margin/metadata["SamplingFrequency"], color='r', linestyle='--', label='LIBRE starts')
plt.axvline(x=(T1_LIBRE_sample+start_margin)/metadata["SamplingFrequency"], color='b', linestyle='--', label='LIBRE ends')


plt.title("x_coordinate along time")
plt.xlabel("time [s]")
plt.ylabel("x coordinate [px]")
plt.legend()

# y coordinate
fig = plt.figure(figsize=(16, 2))

plt.plot(
    t_axis_xy,
    coor_data["y_coordinate"].values,
)

plt.axvline(x=start_margin/metadata["SamplingFrequency"], color='r', linestyle='--', label='LIBRE starts')
plt.axvline(x=(T1_LIBRE_sample+start_margin)/metadata["SamplingFrequency"], color='b', linestyle='--', label='LIBRE ends')



plt.title("y_coordinate along time")
plt.xlabel("time [s]")
plt.ylabel("y coordinate [px]")
plt.legend()

# Use .copy() so the .loc assignments below modify these frames, not a view of the original
coor_data_LIBRE = coor_data[start_margin:int(T1_LIBRE_sample)].copy()
coor_recording_LIBRE = coor_recording[start_margin:int(T1_LIBRE_sample)].copy()

coor_data_LIBRE_raw = copy.deepcopy(coor_data_LIBRE)
print(len(coor_data_LIBRE))
mode: T1
set start margin as 0
374558

png

png

4.2 Eliminating blinking

Based on the blink mask explored earlier, we now process the data to handle timestamps where blinks occurred. During a blink, gaze data is unreliable or missing, so we mark these periods by assigning NaN (Not a Number) to the gaze coordinates.

use_eliminate_blink = True

def eliminate_blink(coor_data, coor_recording, seq_name):
    # Instead of filtering invalid samples out, assign NaN to them
    # so that the timing information is preserved.
    print(f'Samples to be eliminated: {len(coor_data[coor_recording.blink > 0])}')
    coor_data.loc[coor_recording.blink > 0, ['x_coordinate', 'y_coordinate']] = np.nan
    coor_recording.loc[coor_recording.blink > 0, ['x_coordinate', 'y_coordinate']] = np.nan
    print(f'After eliminating blinking: len of coor_data_{seq_name} {len(coor_data)} '
          f'should be equal to len of coor_recording {len(coor_recording)}')
    return coor_data, coor_recording


if use_eliminate_blink:
    coor_data_LIBRE,  coor_recording_LIBRE = eliminate_blink(coor_data_LIBRE, coor_recording_LIBRE, 'LIBRE')

plot_heatmap = False
if plot_heatmap:
    import plot
    plot.plot_heatmap_coordinate(coor_data_LIBRE, density=False, screen_size=size,
                                 title='LIBRE: The gaze from the beginning to the end')
Samples to be eliminated: 9513
After eliminating blinking: len of coor_data_LIBRE 374558 should be equal to len of coor_recording 374558


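Blink labels often miss contaminated samples immediately before and after the detected interval, because the pupil signal distorts while the eyelid is moving. A common refinement, not applied in this notebook, is to dilate the blink mask by a small pad; a minimal sketch, assuming a 100 ms pad at the 1 kHz sampling rate:

pad_ms = 100  # assumed pad on each side of a blink
pad = int(pad_ms * metadata["SamplingFrequency"] / 1000)

# A centered rolling max dilates the binary mask in both directions
dilated_blink = (
    coor_recording_LIBRE["blink"]
    .rolling(window=2 * pad + 1, center=True, min_periods=1)
    .max()
)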

4.3 Preserving fixation

Similarly, we will preserve the timestamps where fixation events are detected.

use_preserve_fixation = True

def preserve_fixation(coor_data, coor_recording, seq_name):
    # Keep only fixation samples; assign NaN elsewhere to preserve the timing information.
    print(f'Samples to be eliminated: {len(coor_data[coor_recording.fixation < 1])}')
    coor_data.loc[coor_recording.fixation < 1, ['x_coordinate', 'y_coordinate']] = np.nan
    coor_recording.loc[coor_recording.fixation < 1, ['x_coordinate', 'y_coordinate']] = np.nan

    return coor_data, coor_recording


if use_preserve_fixation:
    coor_data_LIBRE,  coor_recording_LIBRE = preserve_fixation(coor_data_LIBRE, coor_recording_LIBRE, 'LIBRE')

plot_heatmap = False
if plot_heatmap:
    import plot
    plot.plot_heatmap_coordinate(coor_data_LIBRE, density=False, screen_size=size, 
                                 title='LIBRE: The gaze from the beginning to the end')
Samples to be eliminated: 21482



4.4 Plotting the gaze points

We visualize the gaze points before and after the data cleaning.

from matplotlib.font_manager import FontProperties
title_font = FontProperties(family='Times New Roman', size=20, weight='bold')
axis_font = FontProperties(family='Times New Roman', size=20)
# ============================================================
# Raw gaze coordinates (before filtering)
X_coord = coor_data_LIBRE_raw['x_coordinate']
Y_coord = coor_data_LIBRE_raw['y_coordinate']
fig, ax= plt.subplots(figsize=(8, 6))
# Plot the data, flipping X coordinates and using dots as markers
plt.scatter(X_coord, Y_coord, s=50, c='#00468b', alpha=0.1, edgecolors='#00468b', linewidth=0.5)
plt.xlim((0, 800))
plt.ylim((0, 600))
# Set plot title
plt.title('LIBRE: Before filtering', fontproperties=title_font)

for label in plt.gca().get_xticklabels():
    label.set_fontproperties(axis_font)

for label in plt.gca().get_yticklabels():
    label.set_fontproperties(axis_font)
# Reverse the direction of the Y-axis
plt.gca().invert_yaxis()
plt.gca().invert_xaxis()

# ============================================================
# Cleaned gaze coordinates (after filtering)
X_coord = coor_data_LIBRE['x_coordinate']
Y_coord = coor_data_LIBRE['y_coordinate']
fig, ax= plt.subplots(figsize=(8, 6))
# Plot the data, flipping X coordinates and using dots as markers
plt.scatter(X_coord, Y_coord, s=50, c='#00468b', alpha=0.1, edgecolors='#00468b', linewidth=0.5)
plt.xlim((0, 800))
plt.ylim((0, 600))
# Set plot title
plt.title('LIBRE: After Filtering', fontproperties=title_font)

for label in plt.gca().get_xticklabels():
    label.set_fontproperties(axis_font)

for label in plt.gca().get_yticklabels():
    label.set_fontproperties(axis_font)
# Reverse the direction of the Y-axis
plt.gca().invert_yaxis()
plt.gca().invert_xaxis()
# ============================================================

png

png

4.5 Analyzing Gaze Point Distribution Before and After Filtering

After filtering out blink-related timestamps by assigning NaN to gaze coordinates, it's important to inspect how this operation has affected the distribution of gaze points along each dimension (e.g., x_coordinate and y_coordinate).
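A quick numeric complement to the density plots below is to count how many samples the cleaning set to NaN:

n_total = len(coor_data_LIBRE)
n_nan = int(coor_data_LIBRE["x_coordinate"].isna().sum())
print(f"NaN samples after cleaning: {n_nan} / {n_total} ({n_nan / n_total:.1%})")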

import seaborn as sns
import matplotlib.pyplot as plt

# Set up the figure
fig, axes = plt.subplots(2, 1, figsize=(16, 10))

# Plot KDE for x coordinate
sns.kdeplot(coor_data_LIBRE_raw['x_coordinate'].dropna(), ax=axes[0], color='blue', fill=True, label='raw_x_coordinate')
sns.kdeplot(coor_data_LIBRE['x_coordinate'].dropna(), ax=axes[0], color='orange', fill=True, label='filtered_x_coordinate')
axes[0].set_title("X Coordinate: Raw vs. Filtered")
axes[0].set_xlabel("X Coordinate (px)")
axes[0].set_ylabel("Density")
axes[0].set_xlim((250,550))
axes[0].legend()

# Plot KDE for y coordinate
sns.kdeplot(coor_data_LIBRE_raw["y_coordinate"].dropna(), ax=axes[1], color='teal', fill=True, label='raw_y_coordinate')
sns.kdeplot(coor_data_LIBRE["y_coordinate"].dropna(), ax=axes[1], color='coral', fill=True, label='filtered_y_coordinate')
axes[1].set_title("Y Coordinate: Raw vs. Filtered")
axes[1].set_xlabel("Y Coordinate (px)")
axes[1].set_ylabel("Density")
axes[1].set_xlim((250,350))
axes[1].legend()

# Add a common title
fig.suptitle("Distribution of Eye-Tracking Data Along X Dimension", fontsize=16)

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

png

Step 5: Calculating the Mask and Statistics

You can calculate the mask based on your use case.

In the following step, I present my method for generating the mask, following the project introduced in the course.
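The actual implementation of cal_mask lives in the mask_gen module and is not reproduced here. As a rough sketch of the underlying idea (my reading, not the module's code), the mask keeps samples whose displacement from the median gaze position stays inside the chosen boundaries:

# Sketch only: keep samples whose displacement from the median gaze position
# falls within [lower, upper] on both axes (the real logic lives in mask_gen)
def sketch_mask(coor_data, x_upper, x_lower, y_upper, y_lower):
    dx = coor_data["x_coordinate"] - coor_data["x_coordinate"].median()
    dy = coor_data["y_coordinate"] - coor_data["y_coordinate"].median()
    return (dx >= x_lower) & (dx <= x_upper) & (dy >= y_lower) & (dy <= y_upper)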

from mask_gen import cal_mask, plot_x_y_coord, filter_XY_coord
def cal_stats(coor_data):
    stats_dict = {}
    X_coord = coor_data["x_coordinate"].values
    Y_coord = coor_data["y_coordinate"].values

    stats_dict['x_mean'] = np.nanmean(X_coord)
    stats_dict['x_median'] = np.nanmedian(X_coord)      # Median, ignoring NaNs
    stats_dict['x_std'] = np.nanstd(X_coord)        # Standard deviation
    stats_dict['x_min'] = np.nanmin(X_coord)      # Minimum value
    stats_dict['x_max'] = np.nanmax(X_coord)      # Maximum value
    # Calculate 25th, 50th (median), and 75th percentiles while ignoring NaN values
    stats_dict['x_q25'] = np.nanpercentile(X_coord, 25)  # 25th percentile
    stats_dict['x_q75'] = np.nanpercentile(X_coord, 75)  # 75th percentile

    stats_dict['y_mean'] = np.nanmean(Y_coord)
    stats_dict['y_median'] = np.nanmedian(Y_coord)      # Median, ignoring NaNs
    stats_dict['y_std'] = np.nanstd(Y_coord)        # Standard deviation
    stats_dict['y_min'] = np.nanmin(Y_coord)      # Minimum value
    stats_dict['y_max'] = np.nanmax(Y_coord)      # Maximum value
    stats_dict['y_q25'] = np.nanpercentile(Y_coord, 25)  # 25th percentile
    stats_dict['y_q75'] = np.nanpercentile(Y_coord, 75)  # 75th percentile

    return stats_dict
coor_data_LIBRE_1 = copy.deepcopy(coor_data_LIBRE)
stats_dict = cal_stats(coor_data_LIBRE_1)
print(stats_dict)

# You can play with the upper and lower boundary to filter the data as you need
stand_x_upper = stats_dict['x_q75'] - stats_dict['x_median']
stand_x_lower = stats_dict['x_q25'] - stats_dict['x_median']
stand_y_upper = stats_dict['y_q75'] - stats_dict['y_median']
stand_y_lower = stats_dict['y_q25'] - stats_dict['y_median']
print(f'stand_x_upper={stand_x_upper} stand_x_lower={stand_x_lower} stand_y_upper={stand_y_upper} stand_y_lower={stand_y_lower}')

Disp_dict_LIBRE = cal_mask(coor_data_LIBRE_1, stand_x_upper, stand_x_lower, 
                           stand_y_upper, stand_y_lower)
plot_x_y_coord(Disp_dict_LIBRE, metadata, duration=None, start_sample=0, seq_name='LIBRE')
filtered_coor_data_LIBRE, Preserve_mask, Discard_mask = filter_XY_coord(coor_data_LIBRE_1, Disp_dict_LIBRE, None)
{'x_mean': 397.06954140148815, 'x_median': 395.5, 'x_std': 13.955737401446955, 'x_min': 353.3999938964844, 'x_max': 609.7999877929688, 'x_q25': 388.2999877929688, 'x_q75': 405.2999877929688, 'y_mean': 305.2710657401416, 'y_median': 305.29998779296875, 'y_std': 9.855362153300605, 'y_min': 225.8000030517578, 'y_max': 460.7000122070313, 'y_q25': 301.0, 'y_q75': 309.0}
stand_x_upper=9.799987792968807 stand_x_lower=-7.200012207031193 stand_y_upper=3.70001220703125 stand_y_lower=-4.29998779296875
X_coord [        nan         nan         nan ... 398.8999939 398.5
 398.6000061]
Y_coord [         nan          nan          nan ... 302.1000061  301.79998779
 300.1000061 ]
After cleaning NaNs, eliminating data affected by blinking, and preserving fixation:
The length of X coordinate data: 374558
The length of Y coordinate data: 374558

png

png

png

coor_data_LIBRE_1 = copy.deepcopy(coor_data_LIBRE)
stats_dict = cal_stats(coor_data_LIBRE_1)
print(stats_dict)

# You can play with the upper and lower boundary to filter the data as you need
stand_x_upper = 40
stand_x_lower = -40
stand_y_upper = 40
stand_y_lower = -40
print(f'stand_x_upper={stand_x_upper} stand_x_lower={stand_x_lower} stand_y_upper={stand_y_upper} stand_y_lower={stand_y_lower}')

Disp_dict_LIBRE = cal_mask(coor_data_LIBRE_1, stand_x_upper, stand_x_lower, 
                           stand_y_upper, stand_y_lower)
plot_x_y_coord(Disp_dict_LIBRE, metadata, duration=None, start_sample=0, seq_name='LIBRE')
filtered_coor_data_LIBRE, Preserve_mask, Discard_mask = filter_XY_coord(coor_data_LIBRE_1, Disp_dict_LIBRE, None)
{'x_mean': 397.06954140148815, 'x_median': 395.5, 'x_std': 13.955737401446955, 'x_min': 353.3999938964844, 'x_max': 609.7999877929688, 'x_q25': 388.2999877929688, 'x_q75': 405.2999877929688, 'y_mean': 305.2710657401416, 'y_median': 305.29998779296875, 'y_std': 9.855362153300605, 'y_min': 225.8000030517578, 'y_max': 460.7000122070313, 'y_q25': 301.0, 'y_q75': 309.0}
stand_x_upper=40 stand_x_lower=-40 stand_y_upper=40 stand_y_lower=-40
X_coord [        nan         nan         nan ... 398.8999939 398.5
 398.6000061]
Y_coord [         nan          nan          nan ... 302.1000061  301.79998779
 300.1000061 ]
After cleaning NaNs, eliminating data affected by blinking, and preserving fixation:
The length of X coordinate data: 374558
The length of Y coordinate data: 374558

png

png

png


def visualization_func(fig_title, filtered_coor_data_LIBRE):
    from matplotlib.font_manager import FontProperties
    title_font = FontProperties(family='Times New Roman', size=20, weight='bold')
    axis_font = FontProperties(family='Times New Roman', size=20)

    fig, ax = plt.subplots(figsize=(8, 6))
    plt.title(fig_title, fontproperties=title_font)

    # Raw gaze coordinates (before filtering)
    X_coord_raw = coor_data_LIBRE_raw['x_coordinate']
    Y_coord_raw = coor_data_LIBRE_raw['y_coordinate']
    plt.scatter(X_coord_raw, Y_coord_raw, s=50, c='#728FCE', alpha=0.1, edgecolors='#728FCE', linewidth=0.5)

    # Cleaned gaze coordinates (after filtering)
    X_coord = filtered_coor_data_LIBRE['x_coordinate']
    Y_coord = filtered_coor_data_LIBRE['y_coordinate']
    plt.scatter(X_coord, Y_coord, s=50, c='#f4d03f', alpha=0.1, edgecolors='#f4d03f', linewidth=0.5)

    plt.tick_params(axis='x', labelsize=14)
    plt.tick_params(axis='y', labelsize=14)
    plt.xlim((0, 800))
    plt.ylim((0, 600))
    for label in plt.gca().get_xticklabels():
        label.set_fontproperties(axis_font)
    for label in plt.gca().get_yticklabels():
        label.set_fontproperties(axis_font)
    # Reverse both axes to match the screen orientation
    plt.gca().invert_yaxis()
    plt.gca().invert_xaxis()


# Visualization of filtered coor data LIBRE
visualization_func(fig_title='Before vs. After Filtering',
                   filtered_coor_data_LIBRE=filtered_coor_data_LIBRE)

png

Step 6: Investigating Eye-Movement Event Statistics Across Subjects

Eye movement events such as blinks, saccades, and fixations provide valuable information about gaze behavior and data quality. By analyzing these events across different subjects, we can assess the consistency and reliability of the recorded data.

6.1 Statistics calculation

# Specify the subject and modality you want to inspect

T_idx = 1
recording_list = []
for subject_idx in range(1,5):
    print(f'Load recording for subject {subject_idx}...')
    if T_idx == 1:
        mode = 'T1'
    else:
        mode = 'T2'


    BIDS_PATH = Path("./data/")  # file within a subdirectory
    FILE_NAME = f"sub00{subject_idx}_T{T_idx}"    


    tsv_name = f"{FILE_NAME}.tsv.gz"

    recording_file = BIDS_PATH / tsv_name
    print(f'recording_file: {recording_file}')

    recording = pd.read_csv(
        recording_file,
    #     sep=r"\s+",
        sep="\t",
        na_values="n/a",
    )

    # Duration setting according to the MRI sequence info
    if  subject_idx == 1:
        T1_LIBRE = 374.558
        T2_LIBRE = 650.185
    elif subject_idx == 2:
        T1_LIBRE = 374.380
        T2_LIBRE = 650.185
    elif subject_idx == 3:
        T1_LIBRE = 334.237
        T2_LIBRE = 650.1925 
    else:
        T1_LIBRE = 374.565
        T2_LIBRE = 650.1875

    if T_idx == 1:
        print(f"The length of T1_LIBRE Subject{subject_idx} should be: {T1_LIBRE}")
    else:
        print(f"The length of T2_LIBRE Subject{subject_idx} should be: {T2_LIBRE}")

    coor_recording = recording
    start_margin = 0
    print(f'mode: {mode}')
    # Note: metadata still holds the subject 1 sidecar loaded earlier; we assume
    # all four recordings share the same 1000 Hz sampling frequency.
    if mode == 'T1':
        T1_LIBRE_sample = T1_LIBRE * metadata['SamplingFrequency']
    else:
        T2_LIBRE_sample = T2_LIBRE * metadata['SamplingFrequency']


    if mode == 'T1':
        coor_recording_LIBRE = coor_recording[start_margin:int(T1_LIBRE_sample)]   
    else:
        coor_recording_LIBRE = coor_recording[start_margin:int(T2_LIBRE_sample)] 
    coor_recording_LIBRE = coor_recording_LIBRE.rename(
        columns={'eye1_saccade': 'saccade', 'eye1_fixation': 'fixation', 'eye1_blink': 'blink'}
    )
    recording_list.append(coor_recording_LIBRE)
Load recording for subject 1...
recording_file: data/sub001_T1.tsv.gz
The length of T1_LIBRE Subject1 should be: 374.558
mode: T1
Load recording for subject 2...
recording_file: data/sub002_T1.tsv.gz
The length of T1_LIBRE Subject2 should be: 374.38
mode: T1
Load recording for subject 3...
recording_file: data/sub003_T1.tsv.gz
The length of T1_LIBRE Subject3 should be: 334.237
mode: T1
Load recording for subject 4...
recording_file: data/sub004_T1.tsv.gz
The length of T1_LIBRE Subject4 should be: 374.565
mode: T1
len(recording_list)
4
# Identify start and end indices of each event
def identify_start_end_event(df, events):
    event_dict = {}
    for event in events:
        event_change = event+'_change'

        event_starts = df.index[df[event_change] == 1].tolist()
        event_ends = df.index[df[event_change] == -1].tolist()

        # If the last row is part of an event, add it as an end
        if df[event].iloc[-1] == 1:
            event_ends.append(df.index[-1])

        # Calculate number of events
        num_events = len(event_starts)

        # Calculate duration for each event
        event_durations = []
        for start, end in zip(event_starts, event_ends):
            # Duration in samples; at the 1 kHz sampling rate this equals milliseconds.
            # With a timestamp column you could instead use:
            # duration = df['timestamp'].iloc[end] - df['timestamp'].iloc[start]
            duration = end - start + 1
            event_durations.append(duration)

        # Display results
        num_event_str = 'num_'+event
        event_dict[num_event_str] = num_events
        event_duration_str = event+'_durations'
        event_dict[event_duration_str] = event_durations

    return event_dict
events = ['saccade', 'fixation', 'blink']
subject_event_dict = []
for idx in range(4):
    df = copy.deepcopy(recording_list[idx])

    # Detect the start (+1) and end (-1) of each event by differencing its mask
    for event in events:
        df[f'{event}_shift'] = df[event].shift(1, fill_value=0)
        df[f'{event}_change'] = df[event] - df[f'{event}_shift']

    event_dict = identify_start_end_event(df, events)
    subject_event_dict.append(event_dict)
def cal_event_stat(event_duration):

    event_stat_dict={}
    event_stat_dict['mean_duration'] = np.mean(event_duration)
    event_stat_dict['median_duration'] = np.median(event_duration)
    event_stat_dict['std_duration'] = np.std(event_duration)
    event_stat_dict['min_duration'] = np.min(event_duration)
    event_stat_dict['max_duration'] = np.max(event_duration)
    event_stat_dict['total_duration'] = sum(event_duration)
    event_stat_dict['times'] = len(event_duration)

    return event_stat_dict

def violin_plot(subject_event_dict, event, color='skyblue'):
    sub_num = len(subject_event_dict)
    event_duration_list=[]
    for idx in range(sub_num):
        event_durations_str = f'{event}_durations'
        event_duration_list.append(subject_event_dict[idx][event_durations_str])

    df = pd.DataFrame({
        'Duration': [duration for sublist in event_duration_list for duration in sublist],
        'Subject': [f'Subject {i+1}' for i, sublist in enumerate(event_duration_list) for _ in sublist]
    })

    plt.figure(figsize=(6, 5))
    sns.violinplot(data=df, x='Subject', y='Duration', color=color)
    plt.title(f"Violin Plot of {event} Durations Across Subjects")
    plt.ylabel("Duration (ms)")
    plt.xlabel("Subject")
    plt.show()
subject_blink_stat = []
subject_saccade_stat = []
subject_fixation_stat = []
for idx in range(4):
    event_dict = subject_event_dict[idx]

    blink_stat = cal_event_stat(event_dict['blink_durations'])
    subject_blink_stat.append(blink_stat)

    saccade_stat = cal_event_stat(event_dict['saccade_durations'])
    subject_saccade_stat.append(saccade_stat)

    fixation_stat = cal_event_stat(event_dict['fixation_durations'])
    subject_fixation_stat.append(fixation_stat)
subject_fixation_stat
[{'mean_duration': 485.3278463648834,
  'median_duration': 247.0,
  'std_duration': 623.9966482850247,
  'min_duration': 22,
  'max_duration': 4157,
  'total_duration': 353804,
  'times': 729},
 {'mean_duration': 672.0169811320754,
  'median_duration': 491.5,
  'std_duration': 594.362456671281,
  'min_duration': 17,
  'max_duration': 5003,
  'total_duration': 356169,
  'times': 530},
 {'mean_duration': 417.9804195804196,
  'median_duration': 246.0,
  'std_duration': 466.94418247439074,
  'min_duration': 24,
  'max_duration': 3348,
  'total_duration': 298856,
  'times': 715},
 {'mean_duration': 128.8170276325616,
  'median_duration': 92.0,
  'std_duration': 117.27843003164334,
  'min_duration': 9,
  'max_duration': 1012,
  'total_duration': 172486,
  'times': 1339}]
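
For a side-by-side view, the per-subject dictionaries collect naturally into a DataFrame (one row per subject, one column per statistic):

fixation_df = pd.DataFrame(
    subject_fixation_stat,
    index=[f"Subject {i + 1}" for i in range(len(subject_fixation_stat))],
)
fixation_df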

A violin plot is a useful tool for visualizing event statistics.

A violin plot combines the features of a box plot and a density plot, making it particularly useful for visualizing the distribution of data. It provides a comprehensive view of both the summary statistics and the underlying data distribution, which is highly beneficial when analyzing eye movement event statistics across different subjects.

violin_plot(subject_event_dict, event='blink', color='skyblue')
violin_plot(subject_event_dict, event='saccade', color='coral')
violin_plot(subject_event_dict, event='fixation', color='teal')

png

png

png

6.2 Statistical Tests

Statistical tests such as the Mann-Whitney U test, the t-test, and the Kolmogorov-Smirnov test are used to determine whether differences between groups or conditions are significant or merely due to random variation. Without statistical tests, comparisons would rely on subjective observation or visual inspection, which are prone to bias; statistical tests provide quantifiable, reproducible evidence to support conclusions.

  • The Mann-Whitney U test is a non-parametric statistical test used to determine whether there is a significant difference between the distributions of two independent samples. It is particularly useful when comparing groups where the data may not meet the assumptions required by parametric tests like the t-test.

  • The t-test requires normally distributed data and equal variances, assumptions that may not hold for eye-movement statistics; violating them can lead to misleading results.

  • The Kolmogorov-Smirnov test compares entire distributions but lacks the sensitivity of the Mann-Whitney U test for detecting location shifts (differences in medians).

Eye-tracking event statistics (e.g., blink counts, fixation durations) may not follow normal distributions due to inter-subject variability or artifacts. The Mann-Whitney U test allows robust comparison of these statistics between groups or conditions without requiring strict assumptions about the data.

def mann_whitney_u_stats(subject_event_dict, event):
    from scipy.stats import mannwhitneyu

    sub_num = len(subject_event_dict)
    event_duration_list = []
    for idx in range(sub_num):
        event_duration_list.append(subject_event_dict[idx][f'{event}_durations'])

    # Pairwise two-sided tests between all subject pairs
    print(f"Pairwise Mann-Whitney U test results for {event} durations:")
    for i in range(sub_num):
        for j in range(i + 1, sub_num):
            stat, p_value = mannwhitneyu(event_duration_list[i], event_duration_list[j], alternative='two-sided')
            print(f"Comparison between subjects {i+1} and subjects {j+1}: U = {stat}, p = {p_value:.4f}")
mann_whitney_u_stats(subject_event_dict, 'saccade')
Pairwise Mann-Whitney U test results for saccade durations:
Comparison between subjects 1 and subjects 2: U = 126157.0, p = 0.0000
Comparison between subjects 1 and subjects 3: U = 162432.0, p = 0.0000
Comparison between subjects 1 and subjects 4: U = 154110.5, p = 0.0000
Comparison between subjects 2 and subjects 3: U = 165390.0, p = 0.0002
Comparison between subjects 2 and subjects 4: U = 158977.5, p = 0.0000
Comparison between subjects 3 and subjects 4: U = 267041.5, p = 0.0000
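
With hundreds of events per subject, even small differences in duration reach p < 0.05, so it can be informative to report an effect size alongside the test. The rank-biserial correlation follows directly from U; this is an addition for context, not part of the original analysis:

# Rank-biserial effect size for the Mann-Whitney U test: r = 1 - 2U / (n1 * n2)
# r ranges from -1 to 1; values near 0 indicate heavily overlapping distributions
def rank_biserial(u_stat, n1, n2):
    return 1 - 2 * u_stat / (n1 * n2)

# Example usage with the subject 1 vs. 2 saccade comparison above (U = 126157.0);
# n1 and n2 are the respective event counts:
# r = rank_biserial(126157.0, n1=len(durations_1), n2=len(durations_2))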