// SYSTEM ONLINE
EUROPEAN APPLIED ML RESEARCH LAB

DROIDCRAFT OY

Family-built, engineer-led lab distilling sparse intelligence into production-grade systems with transparent data attribution - tailored to tight inference SLOs.

LATENCY
<3SEC
real-time, ML-assisted decision making delivered within production SLOs for on-prem deployments
GPUs
400ACTIVE
frontier training campaigns orchestrated on EuroHPC + SF-Compute
01

Radically pragmatic research for privacy-sensitive domains

Droidcraft OY is a family-owned research lab of hands-on engineers, systems operators, and scientific compute specialists. We design, deploy, and operate sparse expert architectures that preserve transparent data attribution and privacy assurances with an on-prem / private-cloud focus, while sustaining the inference SLOs demanded by live operational systems at scale.

Every artifact we ship - from kernels to orchestration - remains type-safe, free from license burdens, and maintainable long after the pilot wraps.

02

What our lab delivers

01

Sparse Mixture of Experts at production speeds

Design and training of task-specific SMoE architectures with transparent data attribution and the ability to merge experts from separate training runs. We quantize and tune for the low-latency inference demanded by each application.
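The dropless routing idea behind this can be sketched in a few lines of NumPy (illustrative toy code with made-up dimensions, not our production kernels): every token keeps all of its top-k expert slots, with no capacity factor that could drop work.

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int = 2):
    """Dropless top-k routing: every token keeps all k of its expert slots.

    logits: (num_tokens, num_experts) router scores.
    Returns (indices, weights): per token, its k expert ids and
    softmax-normalized gate weights over just those k experts.
    """
    # top-k expert ids per token (no capacity limit, so nothing is dropped)
    indices = np.argsort(-logits, axis=-1)[:, :k]
    top = np.take_along_axis(logits, indices, axis=-1)
    # normalize gates over the selected experts only
    exp = np.exp(top - top.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)
    return indices, weights

def moe_forward(x, expert_weights, router_w, k=2):
    """x: (tokens, d); expert_weights: (num_experts, d, d) toy expert matrices."""
    idx, gates = topk_route(x @ router_w, k)
    out = np.zeros_like(x)
    for slot in range(k):
        for e in range(expert_weights.shape[0]):
            mask = idx[:, slot] == e
            if mask.any():
                out[mask] += gates[mask, slot, None] * (x[mask] @ expert_weights[e])
    return out
```

In this toy framing, merging experts from separate runs amounts to concatenating `expert_weights` along the expert axis and widening the router accordingly.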

02

Full-cycle experimentation and evaluation

End-to-end experiment and pilot design, evaluation and training harnesses, and statistical analysis that increases model transparency before models reach regulated workloads.

03

Hardware-aware optimization

Custom kernels, sharding strategies, and throughput tuning validated on H100, H200, MI250X, and federated K8s clusters to push utilization past industry benchmarks.
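The identity that tensor-parallel sharding strategies rely on can be checked in a few lines of NumPy (a sketch with toy sizes, not a deployment recipe): split the first projection by columns and the second by rows, then sum the per-device partial results, and the unsharded computation is reproduced exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))      # activations: (tokens, d_model)
w1 = rng.standard_normal((16, 32))    # up-projection
w2 = rng.standard_normal((32, 16))    # down-projection

reference = (x @ w1) @ w2             # unsharded two-matmul block (no nonlinearity)

# Two-way tensor parallelism: shard w1 by columns, w2 by rows.
shards = 2
w1_cols = np.split(w1, shards, axis=1)
w2_rows = np.split(w2, shards, axis=0)

# Each "device" computes a partial result; an all-reduce (here: sum) combines them.
partials = [(x @ a) @ b for a, b in zip(w1_cols, w2_rows)]
sharded = sum(partials)

assert np.allclose(reference, sharded)
```

With a nonlinearity between the matmuls, the same split still works because the activation is applied per-shard to the column-parallel output, which is why this column-then-row ordering is the standard choice.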

04

Compliance-native software engineering

Formally quality-gated codebases, audit trails, and restriction-free license stacks that satisfy cross-industry governance, privacy, and security reviews.

03

Capable of owning the entire training stack

  • Frontier architecture selection & custom SMoE design
  • Data curation, alignment, and structured attribution pipelines
  • Pre-training, post-training, and task-specific adaptation
  • Agentic tool-calling, LLM-as-judge, structured output scaffolds
  • On-prem GPU inference blueprints for Grace Hopper, H100/H200 and ARM-based servers
  • Production deployment playbooks with observability baked in
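The structured-output scaffolding mentioned above can be illustrated with a minimal validate-and-retry loop (`generate` is a hypothetical prompt-to-text callable; production scaffolds typically use schema-constrained decoding rather than re-prompting):

```python
import json

def structured_call(generate, required_keys, max_retries=3):
    """Ask an LLM (`generate`: prompt -> str, hypothetical) for JSON with the
    required keys, re-prompting on malformed or incomplete replies."""
    prompt = f"Reply with a JSON object containing keys: {sorted(required_keys)}"
    for _ in range(max_retries):
        raw = generate(prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            prompt += "\nThat was not valid JSON. Try again."
            continue
        missing = set(required_keys) - set(obj)
        if not missing:
            return obj
        prompt += f"\nMissing keys: {sorted(missing)}. Try again."
    raise ValueError("model never produced valid structured output")
```

The same wrapper shape also hosts LLM-as-judge calls: the judge's verdict is just another schema-checked object.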

04

Selected lab outcomes

PRJ-001

Nanomoe-2.5B-1A

Sparse mixture-of-experts transformer built to validate our zero-token-drop router, expert merging, 8-bit optimizer workflow, and full training pipeline end to end.

MoE // Marenostrum 5 (H100) // EuroHPC grant
PRJ-002

70B Llama 3.2 Adaptation

Industry-specific post-training of a 70B Llama 3.2 variant with IPO objectives and FP8 quantization for single-superchip deployment.

400 GPUs // Lumi Supercomputer // 2024
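The per-tensor scaling idea behind FP8 quantization for single-superchip deployment can be simulated in NumPy (a crude sketch: we model E4M3's 3 mantissa bits and ±448 range in fp32, ignoring subnormals; real deployments use hardware FP8 types):

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value

def round_to_e4m3(x: np.ndarray) -> np.ndarray:
    """Crude E4M3 simulation: keep 4 significant bits (1 implicit + 3 mantissa)
    and clip to the E4M3 range. Subnormals and NaN handling are ignored."""
    m, e = np.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0    # quantize the mantissa
    return np.clip(np.ldexp(m, e), -E4M3_MAX, E4M3_MAX)

def fp8_roundtrip(w: np.ndarray) -> np.ndarray:
    """Per-tensor scaled FP8-style quantize -> dequantize."""
    scale = np.abs(w).max() / E4M3_MAX   # map the largest weight onto E4M3_MAX
    return round_to_e4m3(w / scale) * scale

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
w_hat = fp8_roundtrip(w)
rel_err = np.abs(w_hat - w) / np.abs(w)
```

With 3 mantissa bits the worst-case relative rounding error is about 1/16 per element, which is the budget post-training FP8 schemes work within.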
PRJ-003

ARM-GH200 Inference Optimization

Latency-tuned inference stack on ARM-based Grace Hopper nodes pairing custom Triton kernels with streaming adapters for hybrid clouds.

GH200 Superchip // Real-time scoring // 2025
PRJ-004

Application-Specific MCP Integration

Design and integration of application-specific MCP servers wired into standard LLM agent pipelines with transparent routing and audit hooks.

Private MCP stack // Tool attribution // 2025
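The MCP stack itself is private, but the audit-hook idea can be sketched with a toy router (plain Python, a hypothetical stand-in rather than the real MCP SDK): every tool call passes through one registry that records the tool name, arguments, and outcome.

```python
import time

class AuditedToolRouter:
    """Toy stand-in for an MCP-style tool server: routes named tool calls
    and appends an audit record for every invocation (hypothetical design)."""

    def __init__(self):
        self._tools = {}
        self.audit_log = []

    def register(self, name, fn):
        self._tools[name] = fn

    def call(self, name, **kwargs):
        record = {"ts": time.time(), "tool": name, "args": kwargs, "ok": False}
        try:
            result = self._tools[name](**kwargs)   # KeyError if tool unknown
            record["ok"] = True
            return result
        finally:
            self.audit_log.append(record)          # failures are logged too
```

Because failed and unknown-tool calls land in the same log, the audit trail stays complete even when the agent misbehaves.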
PRJ-005

vLLM & SGLang Porting

Implementation of private model architectures on top of vLLM and SGLang with observability, structured output tooling, and safe tool calling. We maintain a feature-complete, Anthropic-compatible API for SGLang.

Low-latency deployment // PCI-aligned // 2025
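At its core, an Anthropic-compatible API layer is request translation. A simplified sketch of that mapping (basic fields only; a real adapter also covers tool use, non-text content blocks, and streaming events):

```python
def anthropic_to_openai(req: dict) -> dict:
    """Translate a simplified Anthropic Messages API request into an
    OpenAI-style chat-completions payload."""
    messages = []
    if "system" in req:                      # Anthropic: system prompt is top-level
        messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):        # Anthropic content blocks -> flat text
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": m["role"], "content": content})
    return {
        "model": req["model"],
        "messages": messages,
        "max_tokens": req["max_tokens"],     # required in the Messages API
    }
```

Running the translation in-process next to the engine keeps it off the request critical path, which matters under tight latency SLOs.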

Invest in sovereign compute and transparent AI

What we are looking for:

  • Projects interested in on-prem deployments of open-weights and open-source LLMs for industry-specific use (Operations, Fine-Tuning)
  • Partners interested in deploying their own hardware for GPU / Transformer workloads (Hardware-Software Integration)
  • Research labs looking to optimize the inference speed of their models for production on specific NVIDIA hardware (MLRE)
INITIATE CONTACT