Gymnasium and RL¶

GymTEPEnv wraps the simulator in a Gymnasium-compatible environment.

1. Create the environment¶

import numpy as np
from tep_studio import GymTEPEnv

env = GymTEPEnv(control_interval=0.01, horizon=24.0)

2. Reset¶

obs, info = env.reset(seed=123)

print(obs.shape)
print(info["shutdown_status"])

Expected observation shape:

(41,)

3. Step¶

action = np.array([
    63.053, 53.98, 24.644, 61.302, 22.21, 40.064,
    38.10, 46.534, 47.446, 41.106, 18.114, 50.0,
])

next_obs, reward, terminated, truncated, step_info = env.step(action)

print(reward)
print(terminated, truncated)
print(step_info["constraint_margins"])

The environment uses the Gymnasium five-return step signature:

obs, reward, terminated, truncated, info = env.step(action)

Action and observation spaces¶

The action space is:

env.action_space

It is a 12-dimensional Box from 0 to 100.

The observation space is:

env.observation_space

It is a 41-dimensional Box.

Termination and truncation¶

The environment uses:

terminated=True when the process shuts down;
truncated=True when the horizon is reached and the process has not already terminated.

This distinction is important for RL algorithms. A process shutdown is an endogenous process event. A horizon cutoff is an experiment-design choice.

Default reward¶

The built-in reward is only a demonstration reward. It penalizes reactor pressure limit violation and action deviation from 50 percent.

For a real control or RL study, define your own reward and document:

economic objective or tracking objective;
safety constraints;
shutdown handling;
disturbance distribution;
initial-condition distribution;
baseline controllers;
train, validation, and test scenarios.