[SQL/코드카타] Average Time of Process per Machine

문제

Table: Activity

더보기
+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| machine_id     | int     |
| process_id     | int     |
| activity_type  | enum    |
| timestamp      | float   |
+----------------+---------+
The table shows the user activities for a factory website.
(machine_id, process_id, activity_type) is the primary key (combination of columns with unique values) of this table.
machine_id is the ID of a machine.
process_id is the ID of a process running on the machine with ID machine_id.
activity_type is an ENUM (category) of type ('start', 'end').
timestamp is a float representing the current time in seconds.
'start' means the machine starts the process at the given timestamp and 'end' means the machine ends the process at the given timestamp.
The 'start' timestamp will always be before the 'end' timestamp for every (machine_id, process_id) pair.
It is guaranteed that each (machine_id, process_id) pair has a 'start' and 'end' timestamp.


There is a factory website that has several machines each running the same number of processes. Write a solution to find the average time each machine takes to complete a process.

The time to complete a process is the 'end' timestamp minus the 'start' timestamp. The average time is calculated by the total time to complete every process on the machine divided by the number of processes that were run.

The resulting table should have the machine_id along with the average time as processing_time, which should be rounded to 3 decimal places.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Activity table:
+------------+------------+---------------+-----------+
| machine_id | process_id | activity_type | timestamp |
+------------+------------+---------------+-----------+
| 0          | 0          | start         | 0.712     |
| 0          | 0          | end           | 1.520     |
| 0          | 1          | start         | 3.140     |
| 0          | 1          | end           | 4.120     |
| 1          | 0          | start         | 0.550     |
| 1          | 0          | end           | 1.550     |
| 1          | 1          | start         | 0.430     |
| 1          | 1          | end           | 1.420     |
| 2          | 0          | start         | 4.100     |
| 2          | 0          | end           | 4.512     |
| 2          | 1          | start         | 2.500     |
| 2          | 1          | end           | 5.000     |
+------------+------------+---------------+-----------+
Output: 
+------------+-----------------+
| machine_id | processing_time |
+------------+-----------------+
| 0          | 0.894           |
| 1          | 0.995           |
| 2          | 1.456           |
+------------+-----------------+
Explanation: 
There are 3 machines running 2 processes each.
Machine 0's average time is ((1.520 - 0.712) + (4.120 - 3.140)) / 2 = 0.894
Machine 1's average time is ((1.550 - 0.550) + (1.420 - 0.430)) / 2 = 0.995
Machine 2's average time is ((4.512 - 4.100) + (5.000 - 2.500)) / 2 = 1.456

Q. 동일한 machine_id 내에서 기계가 작동하는 평균 시간(processing_time)을 구해라.

구하는 설명은 위 explanation과 같다.

 

정답쿼리 : 서브쿼리, MAX(CASE WHEN)

SELECT 
    machine_id,
    ROUND(AVG(end_time - start_time), 3) as processing_time
FROM
    (SELECT 
        machine_id,
        process_id,
        max(case when activity_type = 'start' then timestamp end) as start_time,
        max(case when activity_type = 'end' then timestamp end) as end_time
    FROM activity
    GROUP BY machine_id, process_id) t
GROUP BY machine_id;

 

 

다른 풀이 : CTE 문을 활용한 풀이, MAX(CASE WHEN)

WITH process_times AS (
    SELECT 
        machine_id,
        process_id,
        MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time,
        MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time
    FROM activity
    GROUP BY machine_id, process_id
)
SELECT
    machine_id,
    ROUND(AVG(end_time - start_time), 3) AS processing_time
FROM process_times
GROUP BY machine_id;

 

 

⭐️ 내 것으로 만들기

⭐️⭐️⭐️ MAX(CASE WHEN -- THEN -- END) ⭐️⭐️⭐️

여러 행(row)에 흩어져 있는 값을 조건에 따라 특정 컬럼 하나로 모아주는 역할!!

  • 조건에 맞는 행에서만 값을 가져오고
  • 조건에 맞지 않으면 null
  • MAX가 그 값 하나를 뽑아서 컬럼으로 만드는 구조

➡️ 세로 데이터를 가로로 펼치는 효과 -> 일종의 수동 PIVOT

 

작동원리

process_id activity_type  timestamp
0 start 0.712
0 end 1.520

위 데이터에서 아래 코드를 실행하면

MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time,
MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time

CASE WHEN 절

  • activity_type이 start인 행에서  timestamp(0.712) 반환
  • activity_type이 end인 행에서는 timestamp가 NULL

-> start_time 후보 값 -> 0.712, NULL

-> end_time 후보 값 -> NULL, 1.520

 

MAX() 절 

max()는 null이 아닌 단 하나의 값을 반환

  • MAX(0.712, null) -> 0.712
  • MAX(null, 1.520) -> 1.520

➡️ 즉, 한 process_id 당 -> start_time, end_time 컬럼이 생김

 

이 예시 테이블을 MAX(CASE WHEN -- THEN -- END)를 사용해서 펼치려면,

SELECT
	id, 
    MAX(CASE WHEN type = 'A' THEN val END)) as A_VAL,
    MAX(CASE WHEN type = 'B' THEN val END)) as B_VAL
FROM table
GROUP BY id

이렇게되고 결과는 아래와 같다.


🤔 마무리
원본테이블을 딱 봤을 때, "아 이거는 옆으로 펼쳐지면 좋을 것 같은데" 싶은 것들이 종종 있긴 했엇다. 그럴 때 MAX(CASE WHEN -- THEN-- END)를 활용해보자! 이렇게 피벗으로 세로 데이터를 가로로 펼칠 땐 주로 계산이 필요할 때 사용했던 것 같다.