Sparse Mixture of Experts at production speeds
Design and training of task-specific SMoE architectures with transparent data attribution and the ability to merge experts from separate training runs. We will quantize and tune the resulting models for the low-latency inference that production applications demand.
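To make the core computation concrete, here is a minimal sketch of a sparse MoE forward pass with top-k routing, written in NumPy. It is illustrative only: the dense linear experts, the router, and all names (`smoe_forward`, `gate_w`, `experts`) are assumptions for this sketch, not part of any specific architecture described above.

```python
import numpy as np

def smoe_forward(x, gate_w, experts, k=2):
    """Sparse MoE forward: route each token to its top-k experts.

    x:       (tokens, d) input activations
    gate_w:  (d, n_experts) router weights
    experts: list of (w, b) pairs, each a dense d->d expert (illustrative)
    """
    logits = x @ gate_w                        # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=1)[:, -k:]  # indices of the top-k experts per token
    # Softmax over only the selected logits so the k weights sum to 1 per token.
    sel = np.take_along_axis(logits, topk, axis=1)
    weights = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Each token's output is the weighted sum of its k selected experts;
    # the other experts are never evaluated, which is the source of the speedup.
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            w, b = experts[topk[t, j]]
            out[t] += weights[t, j] * (x[t] @ w + b)
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_experts))
experts = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(n_experts)]
y = smoe_forward(x, gate_w, experts, k=2)
```

Because each token touches only `k` of the `n_experts` expert weight matrices, compute per token stays roughly constant as experts are added, which is what makes merging experts from separate runs attractive.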