Barry, ordering based on frequency in the file may be better, but it won't lead to an optimal ordering either. An optimization algorithm (e.g. simulated annealing or a genetic algorithm) that uses the gzip size as its cost function will, at least probabilistically.
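A rough sketch of the simulated annealing idea in Python. It assumes the things being ordered are byte strings that get joined with newlines before compression; that layout, the helper names `gzip_size` and `anneal_order`, the random-swap move, and the geometric cooling schedule are all my own illustrative choices, not anything from the file in question:

```python
import gzip
import math
import random

def gzip_size(items, sep=b"\n"):
    """Cost function: byte length of the joined items after gzip compression."""
    return len(gzip.compress(sep.join(items), compresslevel=9))

def anneal_order(items, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Hypothetical simulated-annealing search for an ordering with small gzip size."""
    rng = random.Random(seed)
    current = list(items)
    current_cost = gzip_size(current)
    best, best_cost = list(current), current_cost
    if len(current) < 2:
        return best, best_cost  # nothing to reorder
    for step in range(steps):
        # Geometric cooling: temperature falls from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbour move (an assumption): swap two random positions.
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        cost = gzip_size(candidate)
        delta = cost - current_cost
        # Always accept improvements; accept worse orderings with
        # probability exp(-delta / t) so the search can escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
    return best, best_cost
```

Calling `anneal_order(items)` returns the best ordering seen and its gzip size. Because it keeps the best ordering it has visited, the result is never larger than the starting ordering, so you can seed it with the frequency-based order and only gain. There's still no guarantee of reaching the true optimum in a finite number of steps, which is why "probabilistically" is the honest qualifier.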