Inception, a new Palo Alto-based company founded by Stanford professor Stefano Ermon, has developed a novel AI model based on “diffusion” technology, which it calls a diffusion-based large language model, or DLM.
Currently, generative AI falls into two main groups: large language models (LLMs) and diffusion models. LLMs, based on the transformer architecture, specialize in generating text. On the other hand, diffusion models, like those used in Midjourney and OpenAI’s Sora, focus on creating images, video, and audio.
Inception’s model offers the capabilities of traditional LLMs, including code generation and question answering, but with significantly faster performance and lower computing costs, the company claims.
Ermon has long explored how diffusion models could be applied to text generation. He argues that traditional LLMs are comparatively slow: they generate text sequentially, one token at a time, whereas diffusion models refine an entire output in parallel, which can make them much quicker.
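The distinction can be illustrated with a toy sketch (this is not Inception's actual algorithm, just a schematic of the two generation styles): an autoregressive model needs one sequential step per output token, while a diffusion-style model starts from masked "noise" and refines every position at once over a fixed number of passes.

```python
import random

VOCAB = ["the", "model", "writes", "text", "fast"]

def autoregressive_generate(length, rng):
    """Emit one token per step: sequential steps scale with output length."""
    tokens, steps = [], 0
    for _ in range(length):
        tokens.append(rng.choice(VOCAB))
        steps += 1
    return tokens, steps

def diffusion_style_generate(length, rng, refinement_steps=3):
    """Start from all-masked 'noise' and refine every position in parallel."""
    tokens = ["<mask>"] * length
    for _ in range(refinement_steps):  # fixed number of passes, regardless of length
        # In a real DLM each pass would denoise toward coherent text;
        # here we just resample every position to show the parallel shape.
        tokens = [rng.choice(VOCAB) for _ in tokens]
    return tokens, refinement_steps

rng = random.Random(0)
_, ar_steps = autoregressive_generate(20, rng)
_, dlm_steps = diffusion_style_generate(20, rng)
print(ar_steps, dlm_steps)  # 20 sequential steps vs. 3 parallel passes
```

The parallel passes are also easier to batch across GPU cores, which is one intuition behind the speed claims below.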
After years of research, Ermon and a student made significant progress in applying this diffusion approach to text. They documented their findings in a research paper published last year.
Seeing the potential of their work, Ermon founded Inception last summer, partnering with former students Aditya Grover and Volodymyr Kuleshov.
Although Ermon didn’t share details about funding, sources say the Mayfield Fund has invested in the company.
Inception has quickly attracted customers, including unnamed Fortune 100 companies, drawn by the promise of lower AI latency and faster performance, according to Ermon.
“Our models utilize GPUs more efficiently,” said Ermon, highlighting a shift in how language models are built.
Inception provides an API, options for on-premises and edge device deployment, model fine-tuning support, and a variety of ready-to-use DLMs for different applications. They claim their DLMs can operate up to 10 times quicker than standard LLMs while costing significantly less.
A spokesperson mentioned, “Our ‘small’ coding model matches the performance of [OpenAI’s] GPT-4o mini but is over 10 times faster.” Their ‘mini’ model surpasses some open-source models like [Meta’s] Llama 3.1 8B, achieving over 1,000 tokens per second.
“Tokens” is industry parlance for small units of raw text data. A sustained 1,000 tokens per second would be an impressive speed, assuming Inception’s claims hold up.
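A back-of-the-envelope calculation shows why that throughput figure matters for latency (the 100 tokens/s comparison rate is an illustrative assumption, not a figure from the article, and real token counts vary by tokenizer):

```python
def response_time_seconds(num_tokens, tokens_per_second):
    """Time to stream a full response at a given generation rate."""
    return num_tokens / tokens_per_second

# A ~500-token answer at the claimed 1,000 tokens/s:
print(response_time_seconds(500, 1_000))  # 0.5 seconds
# The same answer at an assumed 100 tokens/s for a conventional LLM:
print(response_time_seconds(500, 100))    # 5.0 seconds
```

At that rate, responses that would otherwise take several seconds arrive in a fraction of one, which is the latency advantage the Fortune 100 customers are reportedly after.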