Evaluating a small number of diverse programs with multilevel clustering over time and within groups is challenging. We use simulated data to emulate the implementation of the Young Adults in the Workplace initiative and examine the performance of various estimation strategies under the ex post dismantling design approach. The study compares convergence, type I and type II error rates, median bias (MB), median absolute deviation (MAD), mean squared error (MSE), AIC, and BIC across various random effects models. Preliminary evidence shows that convergence of the true and simplified models improves with a larger number of randomization groups within each program and worsens with higher intraclass correlation coefficients (ICCs). Type I and type II error rates, MB, MAD, and MSE vary only modestly with ICC. Both AIC and BIC point to the true and preferred simplified models across varying ICCs and random effects specifications.
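As a minimal sketch of the kind of Monte Carlo evaluation described above, the following Python code simulates outcomes with individuals nested in randomization groups nested in programs, estimates a treatment effect with a naive difference in means, and summarizes performance with MB, MAD, and MSE over replications. All parameter values, sample sizes, and the estimator itself are hypothetical illustrations, not the study's actual design or random-effects models.

```python
import random
import statistics

def simulate_once(rng, n_programs=10, n_groups=4, n_per_group=20,
                  delta=0.5, sd_program=0.5, sd_group=0.5, sd_resid=1.0):
    """One replication: outcome = program effect + group effect
    + treatment effect + residual, with treatment assigned at the
    randomization-group level and balanced within each program.
    Returns the difference-in-means estimate of delta."""
    treated, control = [], []
    for _ in range(n_programs):
        u_prog = rng.gauss(0, sd_program)          # program-level random effect
        for g in range(n_groups):
            u_grp = rng.gauss(0, sd_group)         # group-level random effect
            is_treated = g < n_groups // 2         # half the groups per program
            for _ in range(n_per_group):
                y = u_prog + u_grp + delta * is_treated + rng.gauss(0, sd_resid)
                (treated if is_treated else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)

def performance(n_reps=200, delta=0.5, seed=42):
    """Replicate the simulation and compute median bias (MB), median
    absolute deviation (MAD), and mean squared error (MSE) of the
    estimation errors."""
    rng = random.Random(seed)
    errors = [simulate_once(rng, delta=delta) - delta for _ in range(n_reps)]
    mb = statistics.median(errors)
    mad = statistics.median(abs(e - mb) for e in errors)
    mse = statistics.fmean(e * e for e in errors)
    return mb, mad, mse
```

In the study these criteria would instead be computed for each candidate random-effects specification (true and simplified), while varying the number of randomization groups and the ICCs; the skeleton of repeated simulation, estimation, and error summarization is the same.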