June 5: A startup has demonstrated that a large AI model can be trained across GPUs scattered around the globe rather than in a single billion-dollar data center. Macrocosmos, building on the Bittensor network, unveiled Orion-100B, a 100-billion-parameter language model trained across geographically distributed Nvidia A100 GPUs. The team’s system, called IOTA, splits the model itself across many machines in 16 pipeline-parallel stages, so no participant has to host the full model.
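To make the idea of pipeline-parallel stages concrete, here is a minimal sketch of how a model's layers might be divided into 16 contiguous stages. It is not IOTA's actual partitioning scheme: the function `partition_layers` and the layer count of 80 are hypothetical, chosen only for illustration, and the even split of parameters across stages is an assumption. Only the stage count and the total parameter count come from the reporting.

```python
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices into contiguous, near-equal pipeline stages."""
    if not 0 < num_stages <= num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for stage_index in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage_index < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


NUM_STAGES = 16        # reported in the article
TOTAL_PARAMS = 100e9   # reported in the article
NUM_LAYERS = 80        # hypothetical: the article gives no layer count

stages = partition_layers(NUM_LAYERS, NUM_STAGES)
# Assumes parameters are spread evenly across stages, which is a simplification.
params_per_stage = TOTAL_PARAMS / NUM_STAGES

print(f"{len(stages)} stages, first stage holds layers {list(stages[0])}")
print(f"about {params_per_stage / 1e9:.2f}B parameters per stage")
```

Under these assumptions each participant holds only a sixteenth of the model, which is why no single machine needs enough memory for all 100 billion parameters.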
The team had to overcome heavy inter-GPU traffic, unstable nodes, and mismatched hardware.
Despite these hurdles, the team reported more than 30 percent model FLOPs utilization and roughly 65 percent of the efficiency of a comparable data-center setup. A compression technique cut traffic per stage from about 150 megabytes to 2.2 megabytes, a reduction of roughly 68-fold.
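The reported figures can be put in rough perspective with a little arithmetic. The sketch below computes the compression ratio implied by the two traffic numbers, and what 30 percent model FLOPs utilization would mean per GPU. The per-GPU figure rests on an assumption: it uses Nvidia's published dense BF16 peak of 312 teraFLOPS for the A100, while the numeric precision actually used in training is not stated. The helper names are hypothetical.

```python
# Assumption: dense BF16 peak for an Nvidia A100; the training precision is not reported.
A100_PEAK_BF16_TFLOPS = 312.0


def compression_ratio(before_mb: float, after_mb: float) -> float:
    """How many times smaller the traffic is after compression."""
    return before_mb / after_mb


def achieved_tflops(mfu: float, peak_tflops: float) -> float:
    """Useful throughput implied by a model FLOPs utilization (MFU) fraction."""
    return mfu * peak_tflops


ratio = compression_ratio(150.0, 2.2)                   # figures from the article
per_gpu = achieved_tflops(0.30, A100_PEAK_BF16_TFLOPS)  # 30 percent is the reported floor

print(f"traffic reduced about {ratio:.0f}x")
print(f"at least about {per_gpu:.1f} TFLOPS of useful work per GPU (under the BF16 assumption)")
```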
The result suggests that large-scale AI training may not always require a single massive cluster, and it points toward markets that reward owners of idle GPUs.
The approach is not yet a replacement for hyperscaler infrastructure, but it is a significant step: it shows that, with the right design, large AI models can be trained without relying on a single, expensive data center.
More people and organizations may therefore be able to take part in training large models, even without access to a massive data center.
That could bring more innovation to the field and new opportunities for those with idle GPUs. For now, Macrocosmos's achievement is a significant one, and the coming weeks and months will show how others build on it.