Sparse Mixture of Experts at production speeds
Design and training of task-specific SMoE architectures with transparent data attribution and the ability to merge experts from separate training runs. We will quantize and tune the resulting models for the low-latency inference that production applications demand.
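To make the core computation concrete, here is a minimal sketch of a sparse MoE forward pass with top-k routing, written in NumPy. It is illustrative only: the dense linear experts, the router, and all names (`smoe_forward`, `gate_w`, `experts`) are assumptions for this sketch, not part of any specific architecture described above.

```python
import numpy as np

def smoe_forward(x, gate_w, experts, k=2):
    """Sparse MoE forward: route each token to its top-k experts.

    x:       (tokens, d) input activations
    gate_w:  (d, n_experts) router weights
    experts: list of (w, b) pairs, each a dense d->d expert (illustrative)
    """
    logits = x @ gate_w                        # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=1)[:, -k:]  # indices of the top-k experts per token
    # Softmax over only the selected logits so the k weights sum to 1 per token.
    sel = np.take_along_axis(logits, topk, axis=1)
    weights = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Each token's output is the weighted sum of its k selected experts;
    # the other experts are never evaluated, which is the source of the speedup.
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            w, b = experts[topk[t, j]]
            out[t] += weights[t, j] * (x[t] @ w + b)
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_experts))
experts = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(n_experts)]
y = smoe_forward(x, gate_w, experts, k=2)
```

Because each token touches only `k` of the `n_experts` expert weight matrices, compute per token stays roughly constant as experts are added, which is what makes merging experts from separate runs attractive.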