[SQL/코드카타] 유저별 confirmed 의 비율 ( 평균이 바로 비율 )

문제

 

더보기

Table: Signups

+----------------+----------+
| Column Name    | Type     |
+----------------+----------+
| user_id        | int      |
| time_stamp     | datetime |
+----------------+----------+
user_id is the column of unique values for this table.
Each row contains information about the signup time for the user with ID user_id.

 

Table: Confirmations

+----------------+----------+
| Column Name    | Type     |
+----------------+----------+
| user_id        | int      |
| time_stamp     | datetime |
| action         | ENUM     |
+----------------+----------+
(user_id, time_stamp) is the primary key (combination of columns with unique values) for this table.
user_id is a foreign key (reference column) to the Signups table.
action is an ENUM (category) of the type ('confirmed', 'timeout')
Each row of this table indicates that the user with ID user_id requested a confirmation message at time_stamp and that confirmation message was either confirmed ('confirmed') or expired without confirming ('timeout').

 

The confirmation rate of a user is the number of 'confirmed' messages divided by the total number of requested confirmation messages. The confirmation rate of a user that did not request any confirmation messages is 0. Round the confirmation rate to two decimal places.

Write a solution to find the confirmation rate of each user.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Signups table:
+---------+---------------------+
| user_id | time_stamp          |
+---------+---------------------+
| 3       | 2020-03-21 10:16:13 |
| 7       | 2020-01-04 13:57:59 |
| 2       | 2020-07-29 23:09:44 |
| 6       | 2020-12-09 10:39:37 |
+---------+---------------------+
Confirmations table:
+---------+---------------------+-----------+
| user_id | time_stamp          | action    |
+---------+---------------------+-----------+
| 3       | 2021-01-06 03:30:46 | timeout   |
| 3       | 2021-07-14 14:00:00 | timeout   |
| 7       | 2021-06-12 11:57:29 | confirmed |
| 7       | 2021-06-13 12:58:28 | confirmed |
| 7       | 2021-06-14 13:59:27 | confirmed |
| 2       | 2021-01-22 00:00:00 | confirmed |
| 2       | 2021-02-28 23:59:59 | timeout   |
+---------+---------------------+-----------+
Output: 
+---------+-------------------+
| user_id | confirmation_rate |
+---------+-------------------+
| 6       | 0.00              |
| 3       | 0.00              |
| 7       | 1.00              |
| 2       | 0.50              |
+---------+-------------------+
Explanation: 
User 6 did not request any confirmation messages. The confirmation rate is 0.
User 3 made 2 requests and both timed out. The confirmation rate is 0.
User 7 made 3 requests and all were confirmed. The confirmation rate is 1.
User 2 made 2 requests where one was confirmed and the other timed out. The confirmation rate is 1 / 2 = 0.5.

 

문제요약

  • 각 유저별 confirmation rate 을 구하라.
  • 단, 어떤 confirmation messages도 요청하지 않은 유저의 응답률은 0이다.
  • 소수 둘째자리 까지 구하라.
  • 순서는 상관 없다.

 

최초 쿼리

WITH total_cnt AS (
    SELECT user_id, COUNT(*) AS cnt
    FROM confirmations
    GROUP BY user_id
), confirmed AS (
    SELECT user_id, COUNT(action) AS cnt_con
    FROM confirmations
    WHERE action = 'confirmed'
    GROUP BY user_id
)
SELECT s.user_id
        , IFNULL(ROUND(cnt_con/cnt, 2), 0) AS confirmation_rate
FROM signups   s
LEFT JOIN total_cnt c ON s.user_id=c.user_id
LEFT JOIN confirmed f ON s.user_id=f.user_id;

정답을 맞추긴 했다.

하지만 조금 더 효율적인 쿼리는 없을까?

 

다른 사람들의 풀이를 보니, 충분히 더 효율적인 쿼리가 있다.

 

다른 풀이

SELECT A.user_id, 
       ROUND(IFNULL(AVG(action = 'confirmed'), 0), 2) AS confirmation_rate
FROM Signups AS A
LEFT JOIN Confirmations AS B ON A.user_id = B.user_id
GROUP BY A.user_id;

 

와 이 생각은 못했네 😇

ROUND(IFNULL(AVG(action = 'confirmed'), 0), 2) AS confirmation_rate

그냥 LEFT JOIN을 한 후에 action = 'confirmed'의 평균을 구하면 그게 바로 비율이 되네...

🔥 범주가 2가지 밖에 없을 때, 비율은 곧 평균이라는 사실을 꼭 인지하고 있어야겠다.