Data Analysis Using SQL and Excel by Gordon S. Linoff
Author: Linoff, Gordon S.
Language: eng
Format: azw3
ISBN: 9781119021445
Publisher: Wiley
Published: 2015-12-02T16:00:00+00:00
Comparison of Hazards by Stops in Year in Excel
The previous chapter showed two ways of comparing changes in survival probabilities over time. The first method was to use starts in a given year, which provides information about acquisition during the year, but not about all the customers who were active during that time. The second approach forces the right censorship date to be earlier, creating a snapshot of survival at the end of each year. Using starts, customers who start in 2006 have relatively lower survival than customers who start in 2004 or 2005. However, the snapshot method shows that 2006 survival looks better than survival at the end of 2004.
This section proposes another method, based on time windows. Using time windows, hazard probabilities are estimated based on customers’ activity during each year. Time windows make it possible to calculate hazard probabilities for all tenures.
The approach is to calculate the number of customers who enter, leave, and stop at a given tenure, taking into account the time window. The following query does the calculation for stops during 2006:
WITH const as (
      SELECT CAST('2006-01-01' as DATE) as WindowStart,
             CAST('2006-12-28' as DATE) as WindowEnd
     )
SELECT (CASE WHEN tenure < 1000 THEN tenure ELSE 1000 END) as tenure,
       SUM(enters) as numenters, SUM(leaves) as numleaves,
       SUM(isstop) as numstops
FROM ((SELECT (CASE WHEN StartDate >= WindowStart THEN 0
                    ELSE DATEDIFF(day, StartDate, WindowStart) END) as tenure,
              1 as enters, 0 as leaves, 0.0 as isstop
       FROM const CROSS JOIN Subscribers s
       WHERE Tenure >= 0 AND StartDate <= WindowEnd AND
             (StopDate IS NULL OR StopDate >= WindowStart)
      ) UNION ALL
      (SELECT (CASE WHEN StopDate IS NULL OR StopDate >= WindowEnd
                    THEN DATEDIFF(day, StartDate, WindowEnd)
                    ELSE Tenure END) as tenure,
              0 as enters, 1 as leaves,
              (CASE WHEN StopType IS NOT NULL AND StopDate <= WindowEnd
                    THEN 1 ELSE 0 END) as isstop
       FROM const CROSS JOIN Subscribers s
       WHERE Tenure >= 0 AND StartDate <= WindowEnd AND
             (StopDate IS NULL OR StopDate >= WindowStart)
      )
     ) s
GROUP BY (CASE WHEN Tenure < 1000 THEN Tenure ELSE 1000 END)
ORDER BY tenure
Notice first that the stop window ends on 2006-12-28 rather than 2006-12-31. The 28th is the cut-off date for the data; the table has no starts or stops beyond that date. If the later date were used, then active customers would have their tenures extended by three days. That is, a customer who started on 2006-12-28 would have a tenure of three rather than zero, and the resulting hazards would differ slightly from the point estimates in the last section.
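The three-day difference can be checked with simple date arithmetic. The following is a minimal Python sketch, not part of the original text; the customer start date is the hypothetical one from the paragraph above, and the variable names are illustrative only.

```python
from datetime import date

cutoff = date(2006, 12, 28)    # last date with data, per the text
year_end = date(2006, 12, 31)  # the later date the text warns against
start = date(2006, 12, 28)     # hypothetical customer who starts on the cut-off date

# Tenure of a still-active customer is measured up to the window end.
tenure_cutoff = (cutoff - start).days
tenure_year_end = (year_end - start).days

print(tenure_cutoff, tenure_year_end)
```

Using the cut-off date as the window end gives this customer a tenure of zero, while the end of the calendar year adds three days of tenure during which no stop could have been recorded.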
The variable enters counts the number of customers entering the time window at each tenure. This tenure is zero for customers who start during the window and a larger value for customers who start before the window. The variables leaves and isstop (summed into numleaves and numstops) are calculated based on the tenure on the right censorship date or the tenure when a customer stops.
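The per-customer logic of the query can be sketched in Python to make the roles of enters, leaves, and stops concrete. This is an illustration under stated assumptions, not the book's code: the function names and the sample customers are made up, the stop tenure is assumed to equal the days between start and stop, and the cap at tenure 1000 and the Tenure >= 0 filter are omitted for brevity. The final step, which turns the counts into hazards using a running population at risk, is a common way to use such counts and is not spelled out in the text above.

```python
from collections import defaultdict
from datetime import date

WINDOW_START = date(2006, 1, 1)
WINDOW_END = date(2006, 12, 28)

def window_records(start, stop):
    """Return (enter_tenure, leave_tenure, is_stop), or None if the
    customer was not active at any time during the window."""
    if start > WINDOW_END or (stop is not None and stop < WINDOW_START):
        return None
    # Tenure on entering the window: zero for starts inside the window.
    enter_tenure = 0 if start >= WINDOW_START else (WINDOW_START - start).days
    # Tenure on leaving: censored at the window end, else the stop tenure.
    if stop is None or stop >= WINDOW_END:
        leave_tenure = (WINDOW_END - start).days
    else:
        leave_tenure = (stop - start).days  # assumed equal to the Tenure column
    is_stop = 1 if (stop is not None and stop <= WINDOW_END) else 0
    return enter_tenure, leave_tenure, is_stop

def hazards(customers):
    """Aggregate counts by tenure and estimate hazards (assumed method:
    population at risk grows by enters and shrinks by leaves)."""
    enters, leaves, stops = defaultdict(int), defaultdict(int), defaultdict(int)
    for start, stop in customers:
        rec = window_records(start, stop)
        if rec is None:
            continue
        e, l, s = rec
        enters[e] += 1
        leaves[l] += 1
        stops[l] += s
    result = {}
    if not leaves:
        return result
    at_risk = 0
    for t in range(max(leaves) + 1):
        at_risk += enters[t]          # customers entering at tenure t are at risk at t
        if at_risk > 0:
            result[t] = stops[t] / at_risk
        at_risk -= leaves[t]          # customers leaving at t are gone from t + 1 on
    return result

# Made-up sample data: (start date, stop date or None if still active).
customers = [
    (date(2005, 12, 31), None),
    (date(2006, 1, 1), date(2006, 1, 3)),
    (date(2006, 1, 1), None),
    (date(2005, 1, 1), date(2005, 6, 1)),  # stopped before the window, excluded
]

h = hazards(customers)
print(window_records(*customers[0]), h[2])
```

In this sample, the first customer enters the window at tenure 1 and is censored at the window end, the second stops at tenure 2, and the fourth never contributes because the stop precedes the window.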
Each subquery has the same WHERE clause in order to select only customers active during the
