Why two stages instead of one big network
A single regressor fits the average regime well and the tails badly. Routing first turns a hard regression problem into three easier ones, and the routing is interpretable - operators can see which regime each forecast came from, which makes the system trustworthy in a way an opaque end-to-end network is not.