聚合并联接由 Amazon Marketing Stream 产生的流量和转化数据
Amazon Marketing Stream 可保证至少一次信息传递,因此部分信息可以多次发送。鉴于此原因,流量和转化数据集使用名为 idempotency_id
的字段。为确保对这些数据集中的增量记录进行准确求和,在聚合之前必须删除重复记录。
操作顺序
要获取最准确的结果,必须按以下顺序完成步骤:
- 使用
idempotency_id
键识别重复数据。 - 聚合并删除流量表中的重复数据。
- 聚合并删除转化表中的重复数据。
- 联接流量表与转化表。
示例
以下代码示例显示了如何对 sp-traffic 和 sp-conversion 数据集执行此顺序的操作。请注意,来自其他广告程序的流量和转化数据集的架构可能略有不同。请参阅数据指南 以确认每个可用数据集的架构。
注意
此示例采用包含窗函数
row_number()
以及idempotency_id
键来识别重复项。进一步了解窗函数。
-- Identify duplicates in both tables
with sp_conv as
(
SELECT *, row_number() OVER (PARTITION BY idempotency_id ORDER BY idempotency_id ASC) AS rn
FROM [INSERT CONVERSION TABLE HERE]
--Input initial WHERE filter criteria here.
),
sp_traff as
(
SELECT *, row_number() OVER (PARTITION BY idempotency_id ORDER BY idempotency_id ASC) AS rn
FROM [INSERT TRAFFIC TABLE HERE]
--Input initial WHERE filter criteria here.
),
-- Aggregate and remove duplicates from sp-conversions
sp_conv_agg as
(
SELECT "advertiser_id"
, "marketplace_id"
, "time_window_start"
, "campaign_id"
, "ad_group_id"
, "ad_id"
, "keyword_id"
, "placement"
, "currency"
, sum("attributed_sales_1d") as "attributed_sales_1d"
, sum("attributed_sales_1d_same_sku") as "attributed_sales_1d_same_sku"
, sum("attributed_sales_7d") as "attributed_sales_7d"
, sum("attributed_sales_7d_same_sku") as "attributed_sales_7d_same_sku"
, sum("attributed_sales_14d") as "attributed_sales_14d"
, sum("attributed_sales_14d_same_sku") as "attributed_sales_14d_same_sku"
, sum("attributed_sales_30d") as "attributed_sales_30d"
, sum("attributed_sales_30d_same_sku") as "attributed_sales_30d_same_sku"
, sum("attributed_conversions_1d") as "attributed_conversions_1d"
, sum("attributed_conversions_1d_same_sku") as "attributed_conversions_1d_same_sku"
, sum("attributed_conversions_7d") as "attributed_conversions_7d"
, sum("attributed_conversions_14d_same_sku") as "attributed_conversions_14d_same_sku"
, sum("attributed_conversions_30d") as "attributed_conversions_30d"
, sum("attributed_conversions_30d_same_sku") as "attributed_conversions_30d_same_sku"
, sum("attributed_units_ordered_1d") as "attributed_units_ordered_1d"
, sum("attributed_units_ordered_1d_same_sku") as "attributed_units_ordered_1d_same_sku"
, sum("attributed_units_ordered_7d") as "attributed_units_ordered_7d"
, sum("attributed_units_ordered_7d_same_sku") as "attributed_units_ordered_7d_same_sku"
, sum("attributed_units_ordered_14d") as "attributed_units_ordered_14d"
, sum("attributed_units_ordered_14d_same_sku") as "attributed_units_ordered_14d_same_sku"
, sum("attributed_units_ordered_30d") as "attributed_units_ordered_30d"
, sum("attributed_units_ordered_30d_same_sku") as "attributed_units_ordered_30d_same_sku"
FROM sp_conv
--Ensures only 1 unique idempotency_id record is used (in the event of duplicates)
WHERE rn = 1
--GROUP BY all the keys identified above
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
),
-- Aggregate and remove duplicates from sp-traffic
sp_traff_agg as
(
SELECT "advertiser_id"
, "marketplace_id"
, "time_window_start"
, "campaign_id"
, "ad_group_id"
, "ad_id"
, "keyword_id"
, "keyword_text"
, "placement"
, "currency"
, sum("cost") as "cost"
, sum("impressions") as "impressions"
, sum("clicks") as "clicks"
FROM sp_traff
--Ensures only 1 unique idempotency_id record is used (in the event of duplicates)
WHERE rn = 1
--GROUP BY all the keys identified above
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
),
-- Join sp-traffic and sp-conversion
SELECT spt."advertiser_id"
, spt."marketplace_id"
, spt."time_window_start"
, spt."campaign_id"
, spt."ad_group_id"
, spt."ad_id"
, spt."keyword_id"
, spt."keyword_text"
, spt."placement"
, spt."currency"
, spt."cost"
, spt."impressions"
, spt."clicks"
, spc."attributed_sales_1d"
, spc."attributed_sales_1d_same_sku"
, spc."attributed_sales_7d"
, spc."attributed_sales_7d_same_sku"
, spc."attributed_sales_14d"
, spc."attributed_sales_14d_same_sku"
, spc."attributed_sales_30d"
, spc."attributed_sales_30d_same_sku"
, spc."attributed_conversions_1d"
, spc."attributed_conversions_1d_same_sku"
, spc."attributed_conversions_7d"
, spc."attributed_conversions_14d_same_sku"
, spc."attributed_conversions_30d"
, spc."attributed_conversions_30d_same_sku"
, spc."attributed_units_ordered_1d"
, spc."attributed_units_ordered_1d_same_sku"
, spc."attributed_units_ordered_7d"
, spc."attributed_units_ordered_7d_same_sku"
, spc."attributed_units_ordered_14d"
, spc."attributed_units_ordered_14d_same_sku"
, spc."attributed_units_ordered_30d"
, spc."attributed_units_ordered_30d_same_sku"
FROM sp_traff_agg spt
LEFT JOIN sp_conv_agg spc
--START OF JOIN KEY LOGIC --
ON spt.advertiser_id = spc.advertiser_id
AND spt.marketplace_id = spc.marketplace_id
AND spt.campaign_id = spc.campaign_id
AND spt.ad_group_id = spc.ad_group_id
AND spt.ad_id = spc.ad_id
AND spt.keyword_id = spc.keyword_id
AND spt.placement = spc.placement
AND spt.time_window_start = spc.time_window_start
AND spt.currency = spc.currency
--END OF JOIN KEY LOGIC --
--Add any additional WHERE filtering criteria here
--Change ORDER clause to desired order
ORDER BY time_window_start
警告
为了获得准确的结果,每个 CTE 级别之间的筛选条件必须匹配。在以上示例中,
sp_conv
筛选条件应与sp_traff
筛选条件匹配。同样,两个聚合的 CTE(sp_traffic_agg
和sp_conv_agg
)也必须使用相同的筛选条件。通常,如果在命令链中进行更高级别的预筛选,则效果将更出色。建议对 CTE 之间共享的所有筛选条件进行参数化,以减少出错的机会。