Input: the set of tasks
Output: task scheduling policies
Initialize Q(S, A) arbitrarily
Initialize the LLM heuristic Q buffer
For each episode, perform
    Training phase of LLM_SARSA
    For every task in the training task set, perform
        Initialize state S and action A
        Choose action A from state S using the policy derived from Q (e.g., ε-greedy)
        For each step of the episode, perform
            Take action A, observe reward R and next state S′
            Choose action A′ from state S′ using the policy derived from Q (e.g., ε-greedy)
            Update Q(S, A) ← Q(S, A) + α[R + γ Q(S′, A′) − Q(S, A)]
            Compute the LLM heuristic value for (S, A)
            Update Q(S, A) with the LLM heuristic value
            Employ the L2 loss to approximate the Q value:
                L2 = (R + γ Q(S′, A′) − Q(S, A))²
            S ← S′; A ← A′
        End For (repeat until S is terminal)
    End For
    Testing phase of LLM_SARSA
    For every task in the testing task set, perform
        Initialize state S and action A
        Choose action A from state S using the policy derived from Q
        For each step of the episode, perform
            Execute the action from state S using the updated Q values (with the LLM heuristic and L2 loss)
        End For (repeat until S is terminal)
    End For
End For
Output the task scheduling policies
Stop
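A minimal Python sketch of the training loop above, assuming a tabular Q function, an ε-greedy action selection, and a generic environment interface (env.reset / env.step). The hyperparameter names (alpha, gamma, eta, epsilon), the llm_heuristic placeholder, and the additive way the heuristic is folded into Q are illustrative assumptions, not the paper's exact formulation:

import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    # Choose an action from `state` using the policy derived from Q (ε-greedy).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def llm_heuristic(state, action):
    # Placeholder for the LLM heuristic value of (S, A). In the paper this
    # would come from querying a large language model; returning 0.0 here
    # keeps the sketch runnable without one.
    return 0.0

def train_llm_sarsa(env, actions, episodes=100,
                    alpha=0.1, gamma=0.9, eta=0.05, epsilon=0.1):
    Q = defaultdict(float)      # tabular Q(S, A), default 0.0
    losses = []                 # per-step L2 losses, kept for monitoring
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(Q, state, actions, epsilon)
        done = False
        while not done:         # for each step of the episode
            # Assumed environment interface: step returns (next state, reward, done)
            next_state, reward, done = env.step(action)
            next_action = epsilon_greedy(Q, next_state, actions, epsilon)
            # Standard SARSA temporal-difference update
            td_target = reward + gamma * Q[(next_state, next_action)]
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])
            # Blend in the LLM heuristic value (assumed additive correction)
            Q[(state, action)] += eta * llm_heuristic(state, action)
            # L2 loss between the TD target and the current estimate
            losses.append((td_target - Q[(state, action)]) ** 2)
            state, action = next_state, next_action
    return Q, losses

In a full implementation, llm_heuristic would prompt the language model with a description of the task and scheduling state and parse a scalar score from its reply; the testing phase then simply follows the greedy policy with respect to the learned Q values.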