1369. Get the Second Most Recent Activity
Return each user's second most recent activity
Table: UserActivity
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| username | varchar |
| activity | varchar |
| startDate | Date |
| endDate | Date |
+---------------+---------+
This table does not contain a primary key.
This table contains information about the activity performed by each user in a period of time.
A person with username performed an activity from startDate to endDate.
Write an SQL query to show the second most recent activity of each user.
If the user only has one activity, return that one.
A user can’t perform more than one activity at the same time. Return the result table in any order.
The query result format is in the following example:
UserActivity table:
+------------+--------------+-------------+-------------+
| username | activity | startDate | endDate |
+------------+--------------+-------------+-------------+
| Alice | Travel | 2020-02-12 | 2020-02-20 |
| Alice | Dancing | 2020-02-21 | 2020-02-23 |
| Alice | Travel | 2020-02-24 | 2020-02-28 |
| Bob | Travel | 2020-02-11 | 2020-02-18 |
+------------+--------------+-------------+-------------+
Result table:
+------------+--------------+-------------+-------------+
| username | activity | startDate | endDate |
+------------+--------------+-------------+-------------+
| Alice | Dancing | 2020-02-21 | 2020-02-23 |
| Bob | Travel | 2020-02-11 | 2020-02-18 |
+------------+--------------+-------------+-------------+
The most recent activity of Alice is Travel from 2020-02-24 to 2020-02-28, before that she was dancing from 2020-02-21 to 2020-02-23.
Bob has only one record, so we just take that one.
Approach
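The main difficulty of this problem is that when a user has fewer than two activities, we must return their single most recent activity instead. So we need two conditions:
1) Check whether the user's total number of activities reaches two
2) If it does, return the second most recent record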
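The two conditions above can be sketched in plain Python, independent of any SQL engine. This is only an illustration: the function name `second_most_recent` and the tuple layout are assumptions made for the example, and dates are kept as ISO `YYYY-MM-DD` strings because those sort chronologically.

```python
from collections import defaultdict

def second_most_recent(rows):
    """rows: iterable of (username, activity, startDate, endDate) tuples,
    with dates as ISO 'YYYY-MM-DD' strings."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    result = []
    for activities in by_user.values():
        # Order each user's activities by startDate, most recent first
        activities.sort(key=lambda r: r[2], reverse=True)
        # Condition 1: fewer than two activities -> return the only one
        # Condition 2: otherwise -> return the second most recent
        result.append(activities[0] if len(activities) < 2 else activities[1])
    return result

rows = [
    ("Alice", "Travel", "2020-02-12", "2020-02-20"),
    ("Alice", "Dancing", "2020-02-21", "2020-02-23"),
    ("Alice", "Travel", "2020-02-24", "2020-02-28"),
    ("Bob", "Travel", "2020-02-11", "2020-02-18"),
]
for row in second_most_recent(rows):
    print(row)
```

On the sample data this prints Alice's Dancing record and Bob's single Travel record, matching the expected result table.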
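The problem makes a strong assumption: a user cannot perform more than one activity at the same time. If the data contained duplicate activity records, we would first need to pin down what "second most recent" means.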
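To see why the definition matters once that assumption fails, the sketch below computes three common ranking schemes by hand for a hypothetical user whose two latest activities start on the same day. The data and the helper name `rankings` are invented for illustration; the three schemes follow the usual SQL `ROW_NUMBER`, `RANK`, and `DENSE_RANK` semantics.

```python
def rankings(dates):
    """Return (row_number, rank, dense_rank) lists for dates sorted descending."""
    ordered = sorted(dates, reverse=True)
    row_number, rank, dense_rank = [], [], []
    for i, d in enumerate(ordered):
        row_number.append(i + 1)
        if i > 0 and d == ordered[i - 1]:
            # Tie with the previous row: reuse its rank values
            rank.append(rank[-1])
            dense_rank.append(dense_rank[-1])
        else:
            # rank skips positions after ties, dense_rank does not
            rank.append(i + 1)
            dense_rank.append(dense_rank[-1] + 1 if dense_rank else 1)
    return row_number, rank, dense_rank

# Hypothetical data: two activities share the latest start date
dates = ["2020-02-24", "2020-02-24", "2020-02-21"]
row_number, rank, dense_rank = rankings(dates)
print(row_number)  # [1, 2, 3]
print(rank)        # [1, 1, 3]
print(dense_rank)  # [1, 1, 2]
```

With `rank`, no row receives the value 2 here, so a filter on rank 2 would return nothing for this user. `row_number` would pick one of the tied rows arbitrarily, and `dense_rank` would pick the 2020-02-21 activity.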
SQL: window function
# Count each user's activities and rank them by start date, most recent first
with us_rank as
(select
*,
count(*) over (partition by username) as num_activities,
rank() over (partition by username order by startDate desc) as rnk
from UserActivity )
# Keep the only record of users with fewer than two activities, or the second most recent record
select username, activity, startDate, endDate
from us_rank
where num_activities < 2
or rnk = 2
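One way to check the query against the sample data is to run it in an in-memory SQLite database from Python. This is a sketch under assumptions: the original query is presumably written for MySQL, SQLite needs version 3.25 or later for window functions, and the dates are stored as ISO text (which orders chronologically) rather than as a real date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table UserActivity "
    "(username text, activity text, startDate text, endDate text)"
)
conn.executemany(
    "insert into UserActivity values (?, ?, ?, ?)",
    [
        ("Alice", "Travel", "2020-02-12", "2020-02-20"),
        ("Alice", "Dancing", "2020-02-21", "2020-02-23"),
        ("Alice", "Travel", "2020-02-24", "2020-02-28"),
        ("Bob", "Travel", "2020-02-11", "2020-02-18"),
    ],
)

# Same logic as the query above: count per user, rank by startDate descending
query = """
with us_rank as (
    select
        *,
        count(*) over (partition by username) as num_activities,
        rank() over (partition by username order by startDate desc) as rnk
    from UserActivity
)
select username, activity, startDate, endDate
from us_rank
where num_activities < 2
   or rnk = 2
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

The rows come back in no guaranteed order, which the problem statement allows.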
Pandas
import pandas as pd
user_activity = pd.DataFrame(columns=['username', 'activity', 'startDate', 'endDate'],
data = [['Alice', 'Travel', '2020-02-12', '2020-02-20'],
['Alice', 'Dancing', '2020-02-21', '2020-02-23'],
['Alice', 'Travel', '2020-02-24', '2020-02-28'],
['Bob', 'Travel', '2020-02-11', '2020-02-18']])
user_activity
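# Parse the date columns so they compare chronologically
user_activity['startDate'] = pd.to_datetime(user_activity['startDate'])
user_activity['endDate'] = pd.to_datetime(user_activity['endDate'])
# Second-largest startDate per user (or the only one if the user has a single activity)
second_start = user_activity.groupby('username').startDate.apply(lambda date: date.nlargest(2).min()).reset_index()
# Join back on username and startDate to recover the full activity row
user_activity.merge(second_start, on=['username', 'startDate'])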
If you have any questions, feel free to discuss them with me!