CPU Scheduling Algorithms
Why CPU Scheduling is Necessary
In a multiprogramming system, multiple processes compete for limited CPU resources. The CPU Scheduler is responsible for determining which process to execute next and how long it should run.
A robust scheduling algorithm must balance several competing objectives:
| Metric | Definition | Priority For |
|---|---|---|
| CPU Utilization | The percentage of time the CPU is actively processing. | System Administrators |
| Throughput | The number of processes completed per unit of time. | System Administrators |
| Turnaround Time | Total time from submission to completion. | Batch Processing Users |
| Waiting Time | Total time a process spends waiting in the ready queue. | All Users |
| Response Time | Time from submission until the first output/response. | Interactive Users |
Non-preemptive vs. Preemptive Scheduling
| Type | Description | Pros & Cons |
|---|---|---|
| Non-preemptive | Once a process acquires the CPU, it retains it until termination or a voluntary yield. | Simple to implement with minimal switching overhead, but a long job can monopolize the CPU and delay everything behind it. |
| Preemptive | The scheduler can interrupt the running process at any time to switch to another. | Excellent responsiveness, but context switches cost CPU time and shared data must be protected against races. |
Modern operating systems almost exclusively rely on preemptive scheduling architectures.
Classic Scheduling Algorithms
1. First-Come, First-Served (FCFS)
The most rudimentary algorithm: the process that arrives in the ready queue first is executed first.
Arrival Order: P1(24ms) → P2(3ms) → P3(3ms)
Execution Gantt Chart:
|------------ P1 ------------|-- P2 --|-- P3 --|
0                            24       27       30
Waiting Time: P1=0, P2=24, P3=27
Average Wait: (0+24+27)/3 = 17ms
Architectural Flaw: The Convoy Effect. A long, CPU-bound process at the front of the queue forces every subsequent short process to wait out its entire burst. Observe the difference if the order changes:
Execution Order: P2(3ms) → P3(3ms) → P1(24ms)
Waiting Time: P2=0, P3=3, P1=6
Average Wait: (0+3+6)/3 = 3ms ← Massive reduction!
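The arithmetic above is easy to reproduce. A minimal Python sketch, assuming (as in the example) that all processes arrive at t=0:

```python
def fcfs_wait_times(bursts):
    """Waiting time per process under FCFS; all processes arrive at t=0."""
    waits, elapsed = [], 0
    for burst in bursts:
        waits.append(elapsed)  # each process waits for everything queued ahead of it
        elapsed += burst
    return waits

print(fcfs_wait_times([24, 3, 3]))  # [0, 24, 27] -> average 17ms
print(fcfs_wait_times([3, 3, 24]))  # [0, 3, 6]   -> average 3ms
```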
2. Shortest Job First (SJF)
Prioritizes the process with the shortest estimated CPU burst. When all jobs are available simultaneously and burst lengths are known, SJF provably minimizes the average waiting time.
Processes: P1(6ms) P2(8ms) P3(7ms) P4(3ms)
SJF Order: P4(3) → P1(6) → P3(7) → P2(8)
Waiting: P4=0, P1=3, P3=9, P2=16
Average: (0+3+9+16)/4 = 7ms
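The ordering above can be computed the same way (`sjf_wait_times` is an illustrative helper, not a standard API):

```python
def sjf_wait_times(bursts):
    """Non-preemptive SJF for jobs all available at t=0: run shortest burst first."""
    order = sorted(range(len(bursts)), key=lambda i: bursts[i])
    waits, elapsed = [0] * len(bursts), 0
    for i in order:
        waits[i] = elapsed
        elapsed += bursts[i]
    return waits

# P1=6, P2=8, P3=7, P4=3 -> run order P4, P1, P3, P2
waits = sjf_wait_times([6, 8, 7, 3])
print(waits, sum(waits) / len(waits))  # [3, 16, 9, 0] 7.0
```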
Architectural Flaw:
- Impossible to perfectly predict future CPU burst times.
- Starvation: Long-running processes may never execute if short processes continuously arrive.
3. Round Robin (RR)
Each process is allocated a fixed Time Quantum (Time Slice). If the process exceeds this quantum, it is preempted and relegated to the back of the queue.
Time Quantum = 4ms
Processes: P1(24ms) P2(3ms) P3(3ms)
| P1 | P2 | P3 | P1 | P1 | P1 | P1 | P1 |
0    4    7    10   14   18   22   26   30
P2 and P3 complete within 10ms (no need to wait for P1 to finish its 24ms burst).
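The preemption loop can be modeled with a simple queue; this is a sketch that ignores context-switch cost:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate RR for processes arriving at t=0; returns completion times."""
    remaining = list(bursts)
    finish = [0] * len(bursts)
    queue = deque(range(len(bursts)))
    clock = 0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)   # quantum exhausted: preempt, back of the queue
        else:
            finish[i] = clock
    return finish

print(round_robin([24, 3, 3], quantum=4))  # [30, 7, 10]: P2 done at 7, P3 at 10
```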
Tuning the Time Quantum:
- Too Large → Degrades into FCFS; abysmal response times.
- Too Small → Context switching overhead dominates, plummeting CPU throughput.
- Modern OS defaults typically range between 10ms and 100ms.
4. Priority Scheduling
Assigns a strict priority value to each process. The CPU is always allocated to the highest-priority runnable process.
Solving Starvation: Aging. The system dynamically elevates the priority of a process the longer it waits in the queue, ensuring it eventually executes.
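A toy sketch of aging (the selection rule and `aging_step` here are illustrative, not taken from any real kernel):

```python
def pick_next(processes, aging_step=1):
    """Pick the highest-priority process; every process left waiting ages upward."""
    chosen = max(processes, key=lambda p: p["priority"])
    for p in processes:
        if p is not chosen:
            p["priority"] += aging_step  # waiting raises priority over time
    return chosen["name"]

procs = [{"name": "batch", "priority": 1}, {"name": "urgent", "priority": 10}]
history = [pick_next(procs) for _ in range(12)]
print(history)  # "urgent" dominates at first, but "batch" is eventually scheduled
```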
5. Multi-Level Feedback Queue (MLFQ)
Fuses Round Robin with Priority Scheduling. This is the foundational model behind the schedulers of many real operating systems:
High Priority  ┌────────────────────┐  Quantum = 8ms
   Queue 0     │ P_new ──▶ Execute  │  New processes start at the top
               └─────────┬──────────┘
        Exhausts Quantum │ Demoted
               ┌─────────▼──────────┐  Quantum = 16ms
   Queue 1     │      Execute       │
               └─────────┬──────────┘
        Exhausts Quantum │ Demoted
               ┌─────────▼──────────┐  Quantum = 32ms
   Queue 2     │   Execute (FCFS)   │  Lowest tier, batch execution
Low Priority   └────────────────────┘
Core Rules:
- New processes enter the highest priority queue.
- If a process consumes its entire quantum (CPU-bound), it is demoted to the queue below.
- If a process yields the CPU before its quantum expires (I/O-bound), it remains in its current tier.
- Periodically, the OS boosts all processes back to the top tier to prevent starvation and adapt to changing behavior.
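Under simplifying assumptions (three queues with quanta 8/16/32ms, purely CPU-bound jobs that never yield for I/O, no periodic boost), the demotion rules can be sketched as:

```python
from collections import deque

QUANTA = [8, 16, 32]  # per-tier time slices, matching the diagram above

def mlfq(bursts):
    """Finish times for CPU-bound jobs under a simplified 3-level MLFQ."""
    remaining = list(bursts)
    finish = [0] * len(bursts)
    queues = [deque(range(len(bursts))), deque(), deque()]
    clock = 0
    while any(queues):
        level = next(l for l, q in enumerate(queues) if q)  # highest non-empty tier
        i = queues[level].popleft()
        run = min(QUANTA[level], remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] == 0:
            finish[i] = clock
        else:  # quantum fully consumed (no I/O in this model): demote one tier
            queues[min(level + 1, 2)].append(i)
    return finish

print(mlfq([50, 6]))  # [56, 14]: the short job finishes long before the CPU hog
```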
6. Completely Fair Scheduler (CFS)
CFS is the default process scheduler in the Linux Kernel (since v2.6.23).
Core Philosophy: Discard fixed time slices. Instead, dynamically ensure every process receives a fair proportion of CPU time based on its weight.
            Red-Black Tree
         (ordered by vruntime)

                ┌───┐
                │ 30│
                └───┘
               ╱     ╲
           ┌───┐     ┌───┐
           │ 15│     │ 45│
           └───┘     └───┘
           ╱         ╱
       ┌───┐     ┌───┐
       │ 10│     │ 35│
       └───┘     └───┘
         ↑
   Smallest vruntime
   = the NEXT process to be scheduled
vruntime (Virtual Runtime): Calculated as Actual Run Time × (Default Weight / Process Weight). The process with the smallest vruntime is always picked next because it has been "unfairly" starved of CPU time relative to others.
| Process | Nice Value | Weight | vruntime after 10ms actual execution |
|---|---|---|---|
| P1 | 0 (Default) | 1024 | 10ms |
| P2 | -5 (High Priority) | 3121 | 10 × (1024/3121) ≈ 3.3ms |
| P3 | +5 (Low Priority) | 335 | 10 × (1024/335) ≈ 30.6ms |
Because high-priority processes accrue vruntime much slower, they continuously gravitate toward the leftmost node of the Red-Black Tree, guaranteeing they are scheduled far more frequently.
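The table's numbers fall out of the formula directly (1024 is the kernel's weight for nice 0; 3121 and 335 are its weights for nice -5 and +5):

```python
NICE_0_WEIGHT = 1024  # kernel weight corresponding to nice 0

def vruntime_delta(actual_ms, weight):
    """Virtual runtime accrued for a given slice of actual CPU time."""
    return actual_ms * NICE_0_WEIGHT / weight

for name, weight in [("P1 nice 0", 1024), ("P2 nice -5", 3121), ("P3 nice +5", 335)]:
    print(f"{name}: +{vruntime_delta(10, weight):.1f}ms vruntime per 10ms of CPU")
```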
System Design Audit & Observability
Understanding scheduler internals is crucial for optimizing latency-sensitive distributed systems.
1. The Container CPU Throttling Trap
In Kubernetes or Docker environments, processes are often throttled even when overall CPU utilization looks low. This occurs because the CFS quota mechanism (`cpu.cfs_quota_us`) strictly enforces the limit within each accounting period, regardless of average utilization.
- Audit Command: Check `cat /sys/fs/cgroup/cpu/cpu.stat`. If `nr_throttled` is rapidly increasing, your application is being hard-paused by the kernel scheduler. Fix this by removing CPU limits (relying only on requests) or increasing the quota.
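A small parsing helper for that audit (the path assumes cgroup v1 as in the command above; under cgroup v2 the file is typically `/sys/fs/cgroup/cpu.stat`):

```python
def read_throttle_stats(path="/sys/fs/cgroup/cpu/cpu.stat"):
    """Parse the key/value pairs in a cgroup cpu.stat file into a dict."""
    stats = {}
    with open(path) as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    return stats

# Sample twice a few seconds apart; a growing nr_throttled means the kernel
# is hard-pausing your container:
# before = read_throttle_stats(); time.sleep(10); after = read_throttle_stats()
# print(after["nr_throttled"] - before["nr_throttled"])
```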
2. Priority Inversion Mitigation
When a low-priority task holds a lock required by a high-priority task, the high-priority task blocks. If a medium-priority task then preempts the low-priority holder, the high-priority task is effectively starved despite its elevated priority.
- Audit Protocol: Ensure mutexes support Priority Inheritance (the low-priority task temporarily inherits the priority of the waiting high-priority task until it releases the lock). POSIX threads support this via the `PTHREAD_PRIO_INHERIT` mutex protocol.
3. Tuning the nice Value for Background Workers
Not all processes deserve equal CFS weighting. Heavy data-crunching background workers should not compete with user-facing API threads.
- Audit Command: Use `renice +10 -p <PID>` to voluntarily lower the priority of batch processing scripts. This artificially inflates their `vruntime` growth rate, causing CFS to defer them when API threads need immediate CPU access.
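From inside a batch worker, the same demotion can be done programmatically with `os.nice` (a sketch; note that unprivileged processes may only raise their nice value, never lower it back):

```python
import os

def demote_self(increment=10):
    """Raise this process's nice value, shrinking its CFS weight; returns the new value."""
    return os.nice(increment)

if __name__ == "__main__":
    print("nice is now", demote_self())
    # ... heavy batch workload runs here; CFS now favors interactive threads ...
```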