Data Analysis Using SQL and Excel by Linoff Gordon S

Author: Linoff, Gordon S.
Language: eng
Format: azw3
ISBN: 9781119021445
Publisher: Wiley
Published: 2015-12-02T16:00:00+00:00


Comparison of Hazards by Stops in Year in Excel

The previous chapter showed two ways of comparing changes in survival probabilities over time. The first method was to use starts in a given year, which provides information about acquisition during the year, but not about all the customers who were active during that time. The second approach forces the right censorship date to be earlier, creating a snapshot of survival at the end of each year. Using starts, customers who start in 2006 have relatively lower survival than customers who start in 2004 or 2005. However, the snapshot method shows that 2006 survival looks better than survival at the end of 2004.

This section proposes another method, based on time windows. Using time windows, hazard probabilities are estimated based on customers’ activity during each year. Time windows make it possible to calculate hazard probabilities for all tenures.

The approach is to calculate the number of customers who enter, leave, and stop at a given tenure, taking into account the time window. The following query does the calculation for stops during 2006:

WITH const as (
      SELECT CAST('2006-01-01' as DATE) as WindowStart,
             CAST('2006-12-28' as DATE) as WindowEnd
     )
SELECT (CASE WHEN tenure < 1000 THEN tenure ELSE 1000 END) as tenure,
       SUM(enters) as numenters, SUM(leaves) as numleaves,
       SUM(isstop) as numstops
FROM ((SELECT (CASE WHEN StartDate >= WindowStart THEN 0
                    ELSE DATEDIFF(day, StartDate, WindowStart)
               END) as tenure,
              1 as enters, 0 as leaves, 0.0 as isstop
       FROM const CROSS JOIN Subscribers s
       WHERE Tenure >= 0 AND StartDate <= WindowEnd AND
             (StopDate IS NULL OR StopDate >= WindowStart)
      ) UNION ALL
      (SELECT (CASE WHEN StopDate IS NULL OR StopDate >= WindowEnd
                    THEN DATEDIFF(day, StartDate, WindowEnd)
                    ELSE Tenure
               END) as tenure,
              0 as enters, 1 as leaves,
              (CASE WHEN StopType IS NOT NULL AND StopDate <= WindowEnd
                    THEN 1 ELSE 0
               END) as isstop
       FROM const CROSS JOIN Subscribers s
       WHERE Tenure >= 0 AND StartDate <= WindowEnd AND
             (StopDate IS NULL OR StopDate >= WindowStart)
      )
     ) s
GROUP BY (CASE WHEN Tenure < 1000 THEN Tenure ELSE 1000 END)
ORDER BY tenure
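The per-customer logic inside the two subqueries can be easier to follow outside SQL. The following Python function is a minimal sketch of that logic for a single customer, not part of the original text. The name window_record is hypothetical, and the sketch assumes that the Tenure column of a stopped customer equals the number of days between StartDate and StopDate and that a non-null stop date implies a non-null StopType.

```python
from datetime import date

WINDOW_START = date(2006, 1, 1)
WINDOW_END = date(2006, 12, 28)

def window_record(start, stop, window_start=WINDOW_START, window_end=WINDOW_END):
    """Return (enter_tenure, leave_tenure, is_stop) for one customer,
    or None when the customer is not active during the window.
    `stop` is None for customers who have not stopped."""
    # Mirrors "Tenure >= 0" for stopped customers (assumption: tenure = stop - start).
    if stop is not None and stop < start:
        return None
    # Mirrors the shared WHERE clause: started by the window end,
    # and not stopped before the window start.
    if start > window_end or (stop is not None and stop < window_start):
        return None
    # Tenure on entering the window: zero for starts inside the window.
    enter_tenure = 0 if start >= window_start else (window_start - start).days
    # Tenure on leaving: censored at the window end, or the tenure at the stop.
    if stop is None or stop >= window_end:
        leave_tenure = (window_end - start).days
    else:
        leave_tenure = (stop - start).days
    # A stop counts only when it falls on or before the window end.
    is_stop = 1 if (stop is not None and stop <= window_end) else 0
    return enter_tenure, leave_tenure, is_stop

print(window_record(date(2005, 12, 22), None))
print(window_record(date(2006, 3, 1), date(2006, 3, 31)))
```

Each customer who passes the filter contributes one "enters" row at enter_tenure and one "leaves" row at leave_tenure, which is what the UNION ALL in the query produces before aggregation.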

Notice first that the stop window ends on 2006-12-28 rather than 2006-12-31. The 28th is the cut-off date for the data; the table has no starts or stops beyond that date. If the later date were used, then active customers would have their tenures extended by three days. That is, a customer who started on 2006-12-28 would have a tenure of three rather than zero, and the resulting hazards would differ slightly from the point estimates in the last section.
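The three-day shift is simple date arithmetic. As a small illustration in Python (the variable names are only for this sketch):

```python
from datetime import date

cutoff = date(2006, 12, 28)    # last date with data in the table
year_end = date(2006, 12, 31)  # the calendar end of the window
start = date(2006, 12, 28)     # a customer who starts on the cut-off date

tenure_at_cutoff = (cutoff - start).days
tenure_at_year_end = (year_end - start).days
print(tenure_at_cutoff, tenure_at_year_end)  # prints: 0 3
```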

The variable enters counts the number of customers entering the time window at each tenure. This tenure is zero for customers who start during the window and, for customers who start before the window, the tenure on the first day of the window. The variables leaves and isstop (summed as numleaves and numstops) are calculated based on the tenure on the right censorship date or the tenure when a customer stops.
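One way to turn these counts into hazard probabilities is sketched below in Python. The excerpt ends before describing how the counts are used, so this is an assumption based on the usual treatment of left-truncated data: the population at risk at a tenure is everyone who has entered at that tenure or earlier, minus everyone who left at a strictly earlier tenure. The function name and the sample rows are hypothetical and are not results from the Subscribers table.

```python
def hazards_from_counts(rows):
    """rows: iterable of (tenure, numenters, numleaves, numstops).
    Returns a list of (tenure, population_at_risk, hazard) ordered by tenure."""
    result = []
    at_risk = 0
    for tenure, numenters, numleaves, numstops in sorted(rows):
        # Customers entering at this tenure are at risk at this tenure.
        at_risk += numenters
        hazard = numstops / at_risk if at_risk else None
        result.append((tenure, at_risk, hazard))
        # Customers leaving at this tenure are not at risk at later tenures.
        at_risk -= numleaves
    return result

# Made-up counts in the shape returned by the query.
sample = [(0, 100, 5, 5), (1, 10, 8, 4), (2, 0, 97, 3)]
for tenure, at_risk, hazard in hazards_from_counts(sample):
    print(tenure, at_risk, hazard)
```

Because the query lumps all tenures of 1000 or more into a single row, a hazard computed for that last row describes the whole bucket rather than a single tenure.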

Each subquery has the same WHERE clause in order to select only customers active during the time window.


