Register files of microprocessors have often been cited as performance bottlenecks and significant consumers of energy. The robust and modular nature of quasi-delay-insensitive (QDI) design offers a toolchest of techniques for improving the average-case performance and reducing the energy consumption of register files, techniques that cannot be leveraged as easily in synchronous designs. In this paper, we focus on the design of an asynchronous register core, the heart of a register file. We describe the vertical pipelining transformation and the locking mechanism that maintains pipelined mutual exclusion among reads and writes to the same register. The primary contributions of this paper are 1) a detailed evaluation of the width-adaptive datapath (WAD) representation in register files, which yields significant energy reduction with little performance degradation by communicating the more significant bits of integers only conditionally, and 2) `nesting' the register core to create non-uniform banks, which gives faster, lower-energy accesses to more frequently used registers and slower accesses to less frequently used registers without increasing the interconnect requirement or control complexity. We present SPICE-simulated results for a wide variety of register files laid out in TSMC 0.18 µm technology.