- Queue size: the maximum number of requests that can wait in the queue before a new GPU is turned on. The default is 1.
- Max number of GPUs: the maximum number of GPUs that can be running at the same time. The default is 1.
- Min number of GPUs: the minimum number of GPUs that stay turned on at all times. The default is 0. If this value is greater than 0, new users logging in to your application won’t experience a cold start, but billing will be higher.
- Idle time: the number of seconds a GPU stays idle before it is turned off. The default is 30 seconds, so a GPU waits 30 seconds after it finishes processing its last request before turning off. A longer idle time means your users experience fewer cold starts, but billing will be higher.
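To make the interplay of these four settings concrete, here is a minimal Python sketch of the scaling decisions they describe. It is not the platform's actual scheduler: the class `AutoscalingConfig`, the functions `target_gpu_count` and `should_turn_off`, and the exact trigger rules (scale up by one GPU once the queue holds more than `queue_size` requests, always start one GPU for the first request) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class AutoscalingConfig:
    # Hypothetical container for the settings above, using their documented defaults.
    queue_size: int = 1          # requests allowed to wait before another GPU starts
    max_gpus: int = 1            # upper bound on concurrently running GPUs
    min_gpus: int = 0            # GPUs kept warm even with no traffic
    idle_timeout_s: float = 30.0  # idle seconds before a GPU is turned off


def target_gpu_count(queued: int, active: int, config: AutoscalingConfig) -> int:
    """Return how many GPUs should be running (assumed scale-up rule)."""
    target = max(active, config.min_gpus)
    if queued > 0 and target == 0:
        # Cold start: with no GPU running, one must start to serve the first request.
        target = 1
    elif queued > config.queue_size:
        # Assumption: add one GPU when the queue exceeds the allowed size.
        target += 1
    return min(target, config.max_gpus)


def should_turn_off(idle_seconds: float, active: int, config: AutoscalingConfig) -> bool:
    """A GPU idle for at least idle_timeout_s shuts down, unless that would go below min_gpus."""
    return idle_seconds >= config.idle_timeout_s and active > config.min_gpus


if __name__ == "__main__":
    cfg = AutoscalingConfig(max_gpus=3)
    print(target_gpu_count(queued=2, active=1, config=cfg))  # 2: queue exceeded, scale up
    print(should_turn_off(idle_seconds=45, active=1, config=AutoscalingConfig(min_gpus=1)))  # False: min kept warm
```

With the defaults (`max_gpus=1`), a busy queue can never start a second GPU, which is why raising the max number of GPUs matters for bursty traffic, while raising the min number of GPUs or the idle time trades higher billing for fewer cold starts.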