Why researchers are now running tiny AIs on their laptops

The website histo.fyi is a database of structures of immune system proteins called major histocompatibility complex (MHC) molecules. It contains images, data tables and amino acid sequences, and is maintained by bioinformatician Chris Thorpe, who uses artificial intelligence (AI) tools called large language models (LLMs) to convert those assets into readable summaries. But he doesn’t use ChatGPT or any other web-based LLM. Instead, Thorpe runs the AI on his laptop.

In recent years, chatbots based on LLMs have won praise for their ability to write poetry or hold conversations. Some LLMs have hundreds of billions of parameters — the more parameters, the greater the complexity — and are accessible only online. But two more recent trends have emerged. First, organizations are creating “open weights” versions of LLMs, in which the weights and biases produced by training a model are made publicly available for users to download and run locally, if they have the computing power. Second, tech companies are creating stripped-down versions that can run on consumer hardware — and that rival the performance of older, larger models.

Researchers can use such tools to save money, protect patient or company confidentiality, or ensure reproducibility. Thorpe, who lives in Oxford, U.K., and works at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, U.K., is just one of many researchers exploring what the tools can do. That trend is likely to grow, Thorpe says: as computers get faster and models become more efficient, people will increasingly have AIs running on their laptops or mobile devices for all but the most intensive tasks. Scientists will finally have AI assistants at their fingertips: the actual algorithms, not just remote access to them.

Big things in small packages

Several large technology companies and research institutes have released small, open-weight models in recent years, including Google DeepMind in London; Meta in Menlo Park, California; and the Allen Institute for Artificial Intelligence in Seattle, Washington (see “Some small open-weight models”). (“Small” is relative: these models can contain around 30 billion parameters, which would have counted as large only a few years ago.)

While the California-based tech company OpenAI hasn’t released open-weight versions of its current GPT models, its partner Microsoft in Redmond, Washington, has been prolific: it released the small language models Phi-1, Phi-1.5, and Phi-2 in 2023, then four versions of Phi-3 and three versions of Phi-3.5 this year. The Phi-3 and Phi-3.5 models have between 3.8 billion and 14 billion active parameters, and two of them (Phi-3-vision and Phi-3.5-vision) also process images1. According to some benchmarks, even the smallest Phi model outperforms OpenAI’s 2023 GPT-3.5 Turbo, which is said to have 20 billion parameters.

Sébastien Bubeck, Microsoft’s vice president of generative AI, attributes Phi-3’s performance to its training dataset. LLMs initially train by predicting the next “token” (a piece of text) in long strings of text. For example, to predict the name of the murderer at the end of a murder mystery, an AI must “understand” everything that came before it, but such reasoning-heavy predictions are rare in most text. To get around this problem, Microsoft used LLMs to write millions of short stories and textbooks in which one thing builds on another. The result of training on this text, Bubeck says, is a model that fits on a mobile phone but has the power of the first version of ChatGPT, released in 2022. “If you’re able to create a dataset that’s very rich in those reasoning tokens, then the signal is going to be much richer,” he says.
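
The next-token objective itself is easy to see in code. The sketch below is purely illustrative — it uses the small, open GPT-2 model from Hugging Face’s transformers library rather than a Phi model or anything from Microsoft’s pipeline — and shows a model assigning probabilities to candidate next tokens:

```python
# Illustrative sketch of next-token prediction, the objective LLMs train on.
# GPT-2 stands in here for any small local model; this is not Phi's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The detective gathered everyone and revealed that the murderer was"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the *next* token, given all previous ones
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.3f}")
```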

Phi-3 can also help with routing: deciding whether to send a query to a larger model. “That’s where Phi-3 is going to shine,” Bubeck says. Small models can also help scientists in remote areas with little cloud connectivity. “Here in the Pacific Northwest, we have great places to hike, and sometimes I just don’t have a network,” he says. “And maybe I want to take a picture of a flower and ask my AI some information about it.”
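
Routing of this kind can be only a few lines of code. The hypothetical sketch below — not Microsoft’s actual router — has a small local model triage each question before a larger one is consulted; it assumes the Ollama Python client, with “phi3” and “llama3.1:70b” already downloaded:

```python
# Hypothetical router: a small model decides whether it can answer a query
# itself or whether to escalate to a bigger local model. The model tags,
# triage prompt and decision rule are all assumptions for illustration.
import ollama

TRIAGE_PROMPT = (
    "Reply with exactly one word, EASY or HARD: could a small language "
    "model answer this question reliably?\n\nQuestion: {q}"
)

def route(question: str) -> str:
    verdict = ollama.generate(model="phi3", prompt=TRIAGE_PROMPT.format(q=question))
    target = "phi3" if "EASY" in verdict["response"].upper() else "llama3.1:70b"
    answer = ollama.generate(model=target, prompt=question)
    return f"[{target}] {answer['response']}"

print(route("What is the capital of France?"))
```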

Researchers can build on these tools to create custom applications. For example, the Chinese e-commerce company Alibaba has built models called Qwen, with 500 million to 72 billion parameters. A biomedical scientist in New Hampshire refined the largest Qwen model using scientific data to create Turbcat-72b, which is available on the model-sharing site Hugging Face. (The researcher goes only by the handle Kal’tsit on the Discord messaging platform, because AI-enabled work in science remains controversial.) Kal’tsit says she created the model to help researchers brainstorm, proofread manuscripts, prototype code and summarize published papers; the model has been downloaded thousands of times.

Maintaining privacy

The ability to refine open models for targeted applications is one draw; another benefit of local models, Kal’tsit says, is privacy. Sending personally identifiable data to a commercial service could violate data protection regulations. “If there was an audit and you showed that you were using ChatGPT, things could get pretty ugly,” she says.

Cyril Zakka, a physician who leads the health team at Hugging Face, uses local models to generate training data for other models (some of which are local as well). In one project, he uses them to extract diagnoses from medical reports so that another model can learn to predict those diagnoses from echocardiograms, which are used to monitor heart disease. In another, he uses the models to generate questions and answers from medical textbooks with which to test other models. “We’re paving the way to fully autonomous surgery,” he explains; a robot trained to answer such questions could communicate better with doctors.
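
Extraction tasks like these are straightforward to prototype against a local model. The sketch below is a simplified, hypothetical version of the idea rather than Zakka’s pipeline; the model tag, prompt and report text are all invented for illustration:

```python
# Hypothetical example: ask a local model to pull diagnoses out of a
# free-text report. Model tag, prompt and report are illustrative only.
import ollama

report = (
    "Echocardiogram shows a left ventricular ejection fraction of 35% "
    "with global hypokinesis. Impression: dilated cardiomyopathy."
)

prompt = (
    "List the diagnoses in this medical report as a comma-separated "
    f"list and output nothing else:\n\n{report}"
)

result = ollama.generate(model="mistral", prompt=prompt)
diagnoses = [d.strip() for d in result["response"].split(",")]
print(diagnoses)  # e.g. ['dilated cardiomyopathy']
```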

Zakka uses local models — he prefers Mistral 7B, released by Paris-based tech firm Mistral AI, or Meta’s Llama-3 70B — because they’re cheaper than subscription services like ChatGPT Plus, and he can fine-tune them. But privacy is also important, because he can’t send patients’ medical records to commercial AI services.

Johnson Thomas, an endocrinologist at Mercy Health System in Springfield, Missouri, is also motivated by patient privacy. Clinicians rarely have time to transcribe and summarize patient interviews, but most commercial services that use AI to do this are either too expensive or not approved to handle private medical data. So Thomas is developing an alternative. Built on Whisper — an open-weight speech recognition model from OpenAI — and Google DeepMind’s Gemma 2, the system will allow doctors to transcribe conversations and turn them into medical notes, as well as summarize data from medical research participants.
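
A minimal version of that transcribe-then-summarize pattern can chain the two models directly, as in the sketch below; the audio file name, model tags and note-writing prompt are placeholders rather than details of Thomas’s system:

```python
# Illustrative two-step pipeline: local speech-to-text with Whisper,
# then note-writing with a local Gemma 2 served by Ollama. The file,
# model sizes and prompt are assumptions, not Thomas's actual setup.
import whisper
import ollama

# 1. Speech to text with a local Whisper model
stt = whisper.load_model("base")
transcript = stt.transcribe("visit_recording.wav")["text"]

# 2. Text to a structured note with a local LLM
prompt = (
    "Turn this doctor-patient conversation into a brief clinical note "
    f"with sections for history, assessment and plan:\n\n{transcript}"
)
note = ollama.generate(model="gemma2", prompt=prompt)["response"]
print(note)
```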

Privacy is also a consideration in industry. CELLama, developed at the South Korean pharmaceutical company Portrai in Seoul, uses local LLMs such as Llama 3.1 to reduce information about a cell’s gene expression and other characteristics to a summary sentence2. A numerical representation of that sentence is then computed and used to cluster cells into types. The developers highlight privacy as a benefit on their GitHub page, noting that CELLama “runs locally, eliminating data leakage.”
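
The general pattern — describe each cell as text, embed the text, cluster the embeddings — can be sketched in a few lines. The toy example below substitutes a small local sentence-embedding model for the Llama-based setup CELLama actually uses, and the gene lists are invented for illustration; see CELLama’s GitHub page for the real pipeline:

```python
# Toy sketch of the describe-embed-cluster idea behind CELLama. The
# embedding model, the sentence template and the "cells" are assumptions.
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

# Each toy "cell" is summarized as a sentence of its top expressed genes
cells = [
    "Top expressed genes: CD3D, CD3E, IL7R",    # T-cell-like
    "Top expressed genes: CD79A, MS4A1, CD74",  # B-cell-like
    "Top expressed genes: CD3G, CD2, LCK",      # T-cell-like
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # runs locally
embeddings = embedder.encode(cells)

labels = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)
print(labels)  # cells with similar descriptions land in the same cluster
```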

Using models well

As the LLM landscape evolves, scientists are faced with a rapidly changing menu of options. “I’m still in the tinkering phase, playing around with using LLMs locally,” says Thorpe. He tried ChatGPT, but found it expensive, and the tone of its output wasn’t right. Now he uses Llama locally, with 8 billion or 70 billion parameters, both of which can run on his Mac laptop.

Another advantage, Thorpe says, is that local models don’t change. Commercial developers, on the other hand, can update their models at any time, leading to different outcomes and forcing Thorpe to change his prompts or templates. “In most science, you want things that are reproducible,” he explains. “And it’s always a concern if you don’t have control over the reproducibility of what you’re generating.”

For another project, Thorpe is writing code that aligns MHC molecules based on their 3D structure. To develop and test his algorithms, he needs many different proteins—more than exist naturally. To design plausible new proteins, he uses ProtGPT2, an open-weight model with 738 million parameters trained on about 50 million sequences3.
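
ProtGPT2 is distributed through Hugging Face, and generating candidate sequences follows the usage shown on its model card; the sampling settings below are illustrative:

```python
# Generating de novo protein sequences with the open-weight ProtGPT2
# model, following its Hugging Face model card; sampling values can vary.
from transformers import pipeline

protgpt2 = pipeline("text-generation", model="nferruz/ProtGPT2")

# "<|endoftext|>" marks the start of a new sequence in ProtGPT2's vocabulary
sequences = protgpt2(
    "<|endoftext|>",
    max_length=100,
    do_sample=True,
    top_k=950,
    repetition_penalty=1.2,
    num_return_sequences=5,
    eos_token_id=0,
)
for s in sequences:
    print(s["generated_text"])
```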

Sometimes, though, a local app just won’t cut it. For coding, Thorpe uses the cloud-based GitHub Copilot as a companion. “It feels a bit like my arm’s been chopped off if I can’t use Copilot for some reason,” he says. Local LLM-based coding tools exist (such as Google DeepMind’s CodeGemma and tools from the California-based developer Continue), but in his experience they can’t compete with Copilot.

Access points

So, how do you run a local LLM? Software called Ollama (available for Mac, Windows and Linux operating systems) lets users download open models, including Llama 3.1, Phi-3, Mistral and Gemma 2, and run them from a command line. Other options include the cross-platform app GPT4All, and Llamafile, which can bundle an LLM into a single file that runs on any of six operating systems, with or without a graphics processing unit.
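
Once a model has been pulled, Ollama also exposes a local REST API (on port 11434 by default), so scripts can query the model without any cloud service. A minimal sketch, assuming “llama3.1” has already been downloaded:

```python
# Query a locally running Ollama server over its documented REST API.
# Assumes "ollama pull llama3.1" has already been run; the prompt is
# illustrative.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Summarize what an MHC molecule does, in two sentences.",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
```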

Sharon Machlis, a former editor at the website InfoWorld who lives in Framingham, Massachusetts, wrote a guide to running LLMs locally, in which she reviewed a dozen options. “The first thing I would recommend,” she says, “is to match the software you choose to the level of tinkering you want to do.” Some people prefer the convenience of apps, while others prefer the flexibility of the command line.

Whatever approach you take, local LLMs should soon be good enough for most applications, says Stephen Hood, who leads open-source AI at the San Francisco-based technology company Mozilla. “The pace of progress in that area has been astonishing in the last year,” he says.

What those applications might be is up to users to decide. “Don’t be afraid to get your hands dirty,” says Zakka. “You’ll be pleasantly surprised by the results.”