Large Language Model-Guided SARSA Algorithm for Dynamic Task Scheduling in Cloud Computing


1: Start
2: Input: the set of tasks $T = \{t_1, t_2, t_3, t_4, \dots, t_m\}$
3: Output: the task scheduling policies $\Pi_P = \{\Pi_{P_1}, \Pi_{P_2}, \Pi_{P_3}, \Pi_{P_4}, \Pi_{P_5}, \dots, \Pi_{P_p}\}$
4: Initialize $Q(s, a)$ arbitrarily for all $s \in S$, $a \in A(s)$, and set $Q(\text{terminal state}, \cdot) = 0$
5: Initialize the LLM heuristic Q buffer $D(G(p_i))$, $i = 1, 2, \dots, n$
6: For each episode, perform
7:    Training phase of LLM_SARSA
8:    For every task $t_i \in T$ in the training task set, perform
9:       Initialize state S and action A
10:      Choose action A from state S using the ε-greedy policy derived from Q
11:      For each step of the episode, perform
12:         Take action A, observe reward R and next state S'
13:         Choose action A' from state S' using the ε-greedy policy derived from Q
14:         $Q(S, A) \leftarrow Q(S, A) + \alpha\,[R + \gamma Q(S', A') - Q(S, A)]$
15:         Update $S \leftarrow S'$, $A \leftarrow A'$
16:         Compute the LLM heuristic value $D(G(p)) = D(G(p), Q(S, A), R, S')$
17:         Update $Q(S, A)$ with the LLM heuristic $D(G(p))$
18:         $Q^*(S, A) = Q(S, A) + \alpha\,[R + \gamma Q(S', A') - Q(S, A)] + D(G(p))$
19:         Employ the L2 loss to approximate the $Q^*$ value
20:         $L2(Q^*(S, A)) = \mathrm{Error}_{S, A, Q^*(S, A), D(G(p))}\,[(Q^{**}(S, A) - Q^*(S, A))^2]$
21:         $Q^{**}(S, A) = L2(Q^*(S, A)) + Q^*(S, A) + \alpha\,[R + \gamma Q^*(S', A') - Q^*(S, A)] + D(G(p))$
22:      End For (episode steps), until S is terminal
23:   End For
24:   Testing phase of LLM_SARSA
25:   For every task $t_i \in T$ in the testing task set, perform
26:      Initialize state S and action A
27:      Choose action A from state S using the ε-greedy policy derived from Q
28:      For each step of the episode, perform
29:         Execute action A from state S using the updated heuristic value and L2 loss value
30:         $Q^{**}(S, A) = L2(Q^*(S, A)) + Q^*(S, A) + \alpha\,[R + \gamma Q^*(S', A') - Q^*(S, A)] + D(G(p))$
31:      End For (episode steps), until S is terminal
32:   End For
33: End For
34: Output $\Pi_P = \{\Pi_{P_1}, \Pi_{P_2}, \Pi_{P_3}, \Pi_{P_4}, \Pi_{P_5}, \dots, \Pi_{P_p}\}$
35: Stop
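A single heuristic-augmented update can be worked through by hand. The numbers below are purely illustrative and are not taken from the source: $Q(S, A) = 0.5$, $Q(S', A') = 0.6$, $R = 1$, $\alpha = 0.1$, $\gamma = 0.9$, and a heuristic value $D(G(p)) = 0.05$.

```latex
\begin{aligned}
\delta &= R + \gamma\,Q(S', A') - Q(S, A) = 1 + 0.9 \times 0.6 - 0.5 = 1.04\\
Q(S, A) + \alpha\,\delta &= 0.5 + 0.1 \times 1.04 = 0.604\\
Q^*(S, A) &= 0.604 + D(G(p)) = 0.604 + 0.05 = 0.654
\end{aligned}
```

Plain SARSA would stop at 0.604; the LLM heuristic term shifts the estimate by exactly $D(G(p))$.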
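The Python sketch below illustrates the training loop's heuristic-augmented SARSA update (the $Q^*$ step) on a toy scheduling problem. Everything concrete in it is an assumption and is not the authors' implementation. The task lengths, VM speeds, makespan-based reward, and the hypothetical `llm_heuristic` stub are all invented for illustration. The stub stands in for $D(G(p))$ and does not call any LLM. In addition, one episode schedules the whole task list, the L2-loss refinement ($Q^{**}$) is omitted, and the testing phase is reduced to greedy action selection on the learned Q table.

```python
import random
from collections import defaultdict

# Hypothetical toy instance: task lengths and VM speeds are illustrative only.
TASKS = [4.0, 2.0, 6.0, 3.0]
VM_SPEEDS = [1.0, 2.0]


def start_state():
    # State = (index of next task to place, current finish time of each VM).
    return (0, tuple(0.0 for _ in VM_SPEEDS))


def is_terminal(state, tasks):
    return state[0] == len(tasks)


def step(state, action, tasks):
    """Place task state[0] on VM `action`; reward = -(increase in makespan)."""
    i, loads = state
    new_loads = list(loads)
    new_loads[action] += tasks[i] / VM_SPEEDS[action]
    reward = -(max(new_loads) - max(loads))
    return (i + 1, tuple(new_loads)), reward


def llm_heuristic(state, action, reward, next_state):
    """Hypothetical placeholder for D(G(p)); no LLM is queried. It returns a
    small bonus when the chosen VM was the least loaded one."""
    loads = state[1]
    return 0.01 if loads[action] == min(loads) else 0.0


def epsilon_greedy(Q, state, eps, rng):
    # Explore with probability eps, otherwise pick the action with highest Q.
    if rng.random() < eps:
        return rng.randrange(len(VM_SPEEDS))
    return max(range(len(VM_SPEEDS)), key=lambda a: Q[(state, a)])


def heuristic_sarsa_update(q_sa, reward, q_next, d, alpha, gamma):
    """Q*(S,A) = Q(S,A) + alpha * (R + gamma * Q(S',A') - Q(S,A)) + D(G(p))."""
    return q_sa + alpha * (reward + gamma * q_next - q_sa) + d


def train(tasks, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # zero init; terminal states use Q = 0 below
    for _ in range(episodes):
        s = start_state()
        a = epsilon_greedy(Q, s, eps, rng)
        while not is_terminal(s, tasks):
            s2, r = step(s, a, tasks)
            d = llm_heuristic(s, a, r, s2)
            if is_terminal(s2, tasks):
                a2, q_next = None, 0.0  # Q(terminal, .) = 0
            else:
                a2 = epsilon_greedy(Q, s2, eps, rng)  # on-policy next action
                q_next = Q[(s2, a2)]
            Q[(s, a)] = heuristic_sarsa_update(Q[(s, a)], r, q_next, d, alpha, gamma)
            s, a = s2, a2
    return Q


def schedule(Q, tasks):
    """Testing phase (simplified): act greedily on the learned Q, no updates."""
    rng = random.Random(0)
    s, plan = start_state(), []
    while not is_terminal(s, tasks):
        a = epsilon_greedy(Q, s, 0.0, rng)
        plan.append(a)
        s, _ = step(s, a, tasks)
    return plan, max(s[1])


if __name__ == "__main__":
    Q = train(TASKS)
    plan, makespan = schedule(Q, TASKS)
    print("VM chosen per task:", plan, "makespan:", makespan)
```

Running the script prints the VM index chosen for each task and the resulting makespan. The exact plan depends on the seed and the hyperparameters. Because the heuristic is simply added to the SARSA estimate, any function of $(S, A, R, S')$ can be dropped into `llm_heuristic` without changing the rest of the loop.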




Bhargavi Krishnamurthy www.mdpi.com