
Grok-1 Architecture Open-Sourced for General Release by xAI


Elon Musk’s xAI has open-sourced the weights and architecture of Grok-1, the model underlying its Grok chatbot, for any developer or entrepreneur to use, including for commercial applications. Musk unveiled Grok in November and announced that the model would be publicly released this month. The chatbot itself is available to X premium subscribers, who can ask the cheeky AI questions and get answers delivered with a snarky attitude inspired by the sci-fi novel “The Hitchhiker’s Guide to the Galaxy.” Grok’s foundation LLM is reportedly trained in part on X posts.

“Grok-1 is a 314 billion parameter ‘Mixture-of-Experts’ model trained from scratch by xAI,” SiliconANGLE reports, explaining that “a Mixture-of-Experts model is a machine learning approach that combines the outputs of multiple specialized sub-models, also known as experts, to make a final prediction, optimizing for diverse tasks or data subsets by leveraging the expertise of each individual model.”
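
As a rough illustration of the mixture-of-experts idea described above, the toy PyTorch layer below routes each token to a small number of expert sub-networks and blends their outputs. The layer sizes, expert count and top-k routing here are invented for clarity and are not Grok-1’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Illustrative mixture-of-experts layer (not Grok-1's real design):
    a router scores the experts for every token, keeps the top-k, and
    returns a weighted blend of those experts' outputs."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)          # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                      # x: (batch, seq, d_model)
        scores = self.router(x)                                # (batch, seq, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)         # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                routed = (idx[..., slot] == e).unsqueeze(-1)   # tokens sent to expert e
                out = out + routed * weights[..., slot:slot + 1] * expert(x)
        return out

x = torch.randn(2, 5, 64)                                      # (batch, sequence, embedding)
print(ToyMoELayer()(x).shape)                                  # torch.Size([2, 5, 64])
```

Production MoE implementations dispatch each token only to its selected experts; the sketch evaluates every expert on every token purely for brevity.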

Grok’s open-source release “does not include the full corpus of its training data,” VentureBeat notes, writing that “this doesn’t really matter for using the model, since it has already been trained, but it does not allow for users to see what it learned from.”

An xAI blog post describes the release as the “base model trained on a large amount of text data, not fine-tuned for any particular task.”

“Grok was open sourced under an Apache License 2.0, which enables commercial use, modifications, and distribution, though it cannot be trademarked and there is no liability or warranty that users receive with it,” according to VentureBeat, which notes that the license requires licensees to “reproduce the original license and copyright notice, and state the changes they’ve made.”

The New York Times writes that open-sourcing the Grok-1 code “is the latest volley between Mr. Musk and ChatGPT’s creator, OpenAI, which the mercurial billionaire sued” on March 1 over what Reuters says he alleged was “abandoning its original mission for a profit.”

Musk was a co-founder of OpenAI who departed when the company decided to sell a large stake to Microsoft. Musk at the time professed the view that “such an important technology should not be controlled solely by tech giants like Google and Microsoft, which is a close partner of OpenAI,” NYT says.

The beta version of Grok released in November was proprietary. Those interested in using Grok’s open-source code can download it from xAI on GitHub, or via Academic Torrents. “Hugging Face also added a fast download instance,” per VentureBeat.
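
For those taking the Hugging Face route, a minimal sketch of fetching the checkpoint with the huggingface_hub client is shown below; the repository ID is an assumption that should be verified against xAI’s GitHub README, and the download runs to hundreds of gigabytes.

```python
# Sketch of downloading the open-sourced Grok-1 weights from Hugging Face.
# "xai-org/grok-1" is an assumed repository ID -- confirm it (and the free
# disk space required) before running; the checkpoint is extremely large.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="xai-org/grok-1",      # assumed repo ID
    local_dir="./grok-1-weights",  # destination directory
)
print("Downloaded to:", local_dir)
```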



New Tech from MIT, Adobe Advances Generative AI Imaging


Researchers from the Massachusetts Institute of Technology and Adobe have unveiled a new AI acceleration tool that makes generative apps like DALL-E 3 and Stable Diffusion up to 30x faster by reducing the process to a single step. The new approach, called distribution matching distillation, or DMD, maintains or enhances image quality while greatly streamlining the process. Theoretically, the technique “marries the principles of generative adversarial networks (GANs) with those of diffusion models,” consolidating “the hundred steps of iterative refinement required by current diffusion models” into one step, MIT PhD student and project lead Tianwei Yin says.
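
Schematically, the speedup comes from replacing an iterative denoising loop with a single forward pass. The sketch below only illustrates that contrast with placeholder model callables; it is not the MIT/Adobe implementation.

```python
import torch

def sample_with_diffusion(denoiser, steps=100, shape=(1, 3, 512, 512)):
    """Conventional diffusion sampling: ~100 sequential model calls."""
    x = torch.randn(shape)              # start from pure noise
    for t in reversed(range(steps)):
        x = denoiser(x, t)              # one refinement step per iteration
    return x

def sample_with_dmd(one_step_generator, shape=(1, 3, 512, 512)):
    """DMD-style distilled generator: a single call maps noise to an image."""
    z = torch.randn(shape)
    return one_step_generator(z)
```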

The technique could be a new generative modeling method that saves time while maintaining or improving quality, Yin explains in MIT News, which writes that the single-step DMD model “could enhance design tools, enabling quicker content creation and potentially supporting advancements in drug discovery and 3D modeling, where promptness and efficacy are key.”

MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Adobe collaborators have shared their findings in a technical paper on arXiv and a GitHub overview that includes copious comparative images.

“The traditional process of generating images using diffusion models has been complex and time-consuming, often requiring multiple iterations for the algorithm to produce satisfactory results,” writes TechTimes, noting the new approach “leverages a teacher-student model, wherein a new computer model is trained to mimic the behavior of more complex, original models that generate images.”

DMD utilizes two key components: “a regression loss and a distribution matching loss,” reports TechTimes, explaining that “the regression loss ensures stable training by anchoring the mapping process, while the distribution matching loss aligns the probability of generating images with their real-world occurrence frequency.”
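
A heavily simplified sketch of how those two terms might be combined during distillation appears below. The pairing of noise with precomputed teacher outputs, the noising step, and the surrogate form of the distribution matching term are stand-ins chosen for readability, not the loss exactly as defined in the paper.

```python
import torch
import torch.nn.functional as F

def dmd_step(generator, real_denoiser, fake_denoiser,
             z, paired_noise, paired_target, lambda_reg=0.25):
    """Illustrative combination of DMD's two losses (simplified).

    - Regression loss: anchors the one-step generator to precomputed
      teacher outputs on a fixed set of (noise, image) pairs.
    - Distribution matching loss: nudges generated images so that the
      denoiser trained on generated data agrees with the denoiser
      trained on real data.
    """
    # Regression term on the paired examples (stabilizes training).
    reg_loss = F.mse_loss(generator(paired_noise), paired_target)

    # Distribution matching term: compare the two denoisers on noised
    # versions of freshly generated images; the difference of their
    # predictions serves as a (detached) update direction.
    x_fake = generator(z)
    t = torch.randint(0, 1000, (x_fake.shape[0],))
    noised = x_fake + torch.randn_like(x_fake)      # crude stand-in for forward noising
    direction = (fake_denoiser(noised, t) - real_denoiser(noised, t)).detach()
    dm_loss = (x_fake * direction).mean()

    return lambda_reg * reg_loss + dm_loss
```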

That dual approach is further assisted by two diffusion models that guide the generator and minimize the divergence between the distributions of real and generated images. The results, TechSpot says, “are comparable to Stable Diffusion, but the speed is out of this world,” noting “the researchers claim their model can generate 20 images per second on modern GPU hardware.”

The researchers have “figured out how to make the most popular AI image generators 30 times faster,” condensing them into smaller models without a compromise in quality, writes Live Science.

The MIT researchers are not alone in applying a single-step approach to generative imaging, which may soon include generative video.

“Stability AI developed a technique known as Adversarial Diffusion Distillation (ADD) to generate 1-megapixel images in real-time,” TechSpot reports, detailing how the company “trained its SDXL Turbo model through ADD, achieving image generation speeds of just 207 ms on a single Nvidia A100 AI GPU accelerator,” using “a similar approach to MIT’s DMD.”


Databricks DBRX Model Offers High Performance at Low Cost


Databricks, a San Francisco-based company focused on cloud data and artificial intelligence, has released a generative AI model called DBRX that it says sets new standards for performance and efficiency in the open-source category. The mixture-of-experts (MoE) architecture contains 132 billion parameters and was pre-trained on 12 trillion tokens of text and code data. Databricks says it gives the open community and enterprises that want to build their own LLMs capabilities previously limited to closed model APIs, and claims DBRX outperforms other open models, including Llama 2 70B and Mixtral, on certain benchmarks.

“While not matching the raw power of OpenAI’s GPT-4, company executives pitched DBRX as a significantly more capable alternative to GPT-3.5 at a small fraction of the cost,” writes VentureBeat.

Likewise, TechCrunch calls DBRX “akin to OpenAI’s GPT series and Google’s Gemini.”

“While foundation models like GPT-4 are great general-purpose tools, Databricks’ business is building custom models for each client that deeply understand their proprietary data. DBRX shows we can deliver on that,” Databricks CEO Ali Ghodsi said at a press event covered by VentureBeat. “We’re excited to share DBRX with the world and drive the industry towards more powerful and efficient open-source AI,” he added.

A distinctive aspect of DBRX is its innovative approach to MoE. Whereas competing models typically use all of their parameters to generate each token, DBRX contains 16 expert sub-models and a router that dynamically selects the four most relevant for each token. The result is high performance with a relatively modest 36 billion parameters active at any one time, making for faster, more cost-effective operation.
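
A back-of-the-envelope check of that active-parameter figure is sketched below. Only the 16-expert/top-4 routing and the 132-billion-total/36-billion-active numbers come from Databricks; the split between shared and per-expert parameters is an assumption made purely for illustration.

```python
# Rough arithmetic on DBRX's routing: 16 experts per MoE layer, 4 chosen
# per token. The shared-parameter fraction is a guess for illustration;
# Databricks reports 132B total and roughly 36B active parameters.
TOTAL_PARAMS = 132e9
NUM_EXPERTS, TOP_K = 16, 4

shared_fraction = 0.03                                # assumed, not from Databricks
shared = TOTAL_PARAMS * shared_fraction               # attention, embeddings, etc.
per_expert = (TOTAL_PARAMS - shared) / NUM_EXPERTS    # even split across experts

active = shared + TOP_K * per_expert
print(f"Active parameters per token: ~{active / 1e9:.0f}B")   # ~36B under these assumptions
```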

“The Mosaic team, a research unit acquired by Databricks last year, developed this approach based on its earlier Mega-MoE work,” VentureBeat reports, quoting Ghodsi as saying the Mosaic team has improved rapidly over the years. “We can build these really good AI models fast — DBRX took about two months and cost around $10 million,” the executive said.

“Training mixture-of-experts models is hard,” Databricks explains in a blog post introducing the new model. “Now that we have done so, we have a one-of-a-kind training stack that allows any enterprise to train world-class MoE foundation models from scratch.”

The weights of the DBRX Base model and the fine-tuned DBRX Instruct are available on Hugging Face under an open license for research and commercial use. DBRX files are also downloadable on GitHub.
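
A minimal sketch of loading DBRX Instruct from Hugging Face with the transformers library follows; the repository ID is an assumption to check against the model card, the license must be accepted there first, and the full model needs far more memory than a typical single GPU provides.

```python
# Sketch of loading DBRX Instruct via transformers. "databricks/dbrx-instruct"
# is the assumed repo ID; older transformers releases may also require
# trust_remote_code=True. Hardware requirements are substantial.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

inputs = tokenizer("What is a mixture-of-experts model?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```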

Databricks customers can access DBRX via APIs and “can pretrain their own DBRX-class models from scratch or continue training on top of one of our checkpoints using the same tools and science we used to build it,” Databricks says.





