Generative AI for Research

As part of the AI Initiative at FSU, this section of our documentation outlines how to use local models for research applications of generative AI.

We provide an introduction to hosting and running your own language models on FSU-owned infrastructure and explain how to abide by FSU research and security guidelines. We also address a few of the most common questions we get from researchers interested in developing and using AI models at FSU.

What is a language model?

A large language model (LLM) is a type of machine learning model trained on vast amounts of text data to understand, generate, and reason about human language. At its core, it predicts the most likely next word (or token, a word or word fragment) given the preceding sequence of text. By repeating this step, it can be used for a wide range of research tasks.
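To make the next-token idea concrete, here is a minimal Python sketch of a single step of greedy decoding. The candidate tokens and their raw scores (logits) are invented for illustration; a real model scores every token in a vocabulary of tens of thousands of entries, and chat tools often sample from the resulting probabilities rather than always taking the top choice.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the largest score for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy_next_token(logits):
    """Return the most probable next token and the full distribution."""
    probs = softmax(logits)
    return max(probs, key=probs.get), probs

# Hypothetical scores after the prompt "The cell membrane is mostly made of"
logits = {" lipids": 4.1, " proteins": 3.2, " water": 1.5, " steel": -2.0}
token, probs = greedy_next_token(logits)
print(repr(token))  # ' lipids'
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.3f}")
```

Generating a longer response repeats this step: the chosen token is appended to the text and the model scores the next position again.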

Models come in varying sizes, and it is important to understand how size affects the type of research you can do. Size is measured by the number of parameters a model contains; parameters are the learned weights that store the model’s knowledge. Models can generally be categorized as follows:

Small models (~1M – 1B parameters)

Simple classification, summarization, prototyping

These models are fast and efficient, and they can often be run on laptops and less powerful GPUs. However, they tend to have smaller context windows (the maximum amount of text, measured in tokens, that a model can take into account at once) and are more likely to lose track of information provided earlier in a long prompt.
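Because context limits are counted in tokens rather than words or pages, it helps to estimate a document's token count before sending it to a small model. The sketch below assumes the rough rule of thumb of about four characters per token for English text; the real ratio varies by tokenizer and language, so use the tokenizer that ships with your model for exact counts. The 512-token reserve for the reply and the example context sizes are illustrative assumptions.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic for English text; actual tokenizers vary

def estimate_tokens(text):
    """Approximate the number of tokens in `text`."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(text, context_window, reserved_for_output=512):
    """Check whether the prompt plus room for the reply fits in the context window."""
    return estimate_tokens(text) + reserved_for_output <= context_window

paper = "word " * 2000  # stand-in for a 10,000-character document
print(estimate_tokens(paper))        # 2500
print(fits_in_context(paper, 2048))  # False
print(fits_in_context(paper, 8192))  # True
```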

Medium models (~1B – 100B parameters)

More complex reasoning or longer context

These models are better at maintaining context but need more GPU memory and compute for inference, and substantially more for fine-tuning or training on large amounts of data.
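A back-of-the-envelope calculation shows why fine-tuning is so much heavier than inference. This sketch assumes a commonly cited rule of thumb for full fine-tuning with the Adam optimizer in mixed precision: about 16 bytes per parameter (16-bit weights and gradients, a 32-bit master copy of the weights, and two 32-bit optimizer states), versus about 2 bytes per parameter to hold 16-bit weights for inference. Both figures exclude activations and other working memory, so real requirements are higher.

```python
def inference_weights_gb(num_params):
    """Memory to hold 16-bit (fp16/bf16) weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * 2 / 1e9

def full_finetune_gb(num_params):
    """Rule-of-thumb memory for full fine-tuning with Adam in mixed precision."""
    bytes_per_param = (
        2    # fp16/bf16 weights
        + 2  # fp16/bf16 gradients
        + 4  # fp32 master copy of the weights
        + 8  # two fp32 Adam states (first and second moments)
    )
    return num_params * bytes_per_param / 1e9

params = 7e9  # a 7B-parameter model, in the medium range
print(f"inference: ~{inference_weights_gb(params):.0f} GB")     # inference: ~14 GB
print(f"full fine-tuning: ~{full_finetune_gb(params):.0f} GB")  # full fine-tuning: ~112 GB
```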

Large models (100B – 1T+ parameters)

Highest capability, but often impractical for local workflows

These are the kinds of models you have most likely interacted with through Gemini, Copilot, or other large online services. They require significant dedicated computational resources for every task, including inference.
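Parameter count translates directly into memory. As a rough sketch, the memory needed just to load a model's weights is the number of parameters multiplied by the bytes stored per parameter, which depends on numeric precision. The precisions and example sizes below are common choices assumed for illustration; running the model also needs extra memory for the context and intermediate results, so treat these numbers as lower bounds.

```python
# Bytes stored per parameter at common numeric precisions
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params, precision="bf16"):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for label, n in [("1B", 1e9), ("7B", 7e9), ("70B", 70e9), ("400B", 400e9)]:
    sizes = ", ".join(f"{p}: {weight_memory_gb(n, p):g} GB" for p in BYTES_PER_PARAM)
    print(f"{label:>5} -> {sizes}")
```

For example, a 70B model at 16-bit precision needs roughly 140 GB for its weights alone, more than most single GPUs hold. Lower-precision (quantized) versions shrink this footprint, which is why they are popular for local use.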

Cloud vs. local AI models: What is the difference?

One of the first questions we get from researchers is whether to use cloud-hosted models or local models. Cloud-hosted models power the well-known "AI" chatbots such as Google Gemini, Anthropic Claude, and Microsoft Copilot. They tend to be easy to use, but they require transmitting data off-site, which creates potential privacy and compliance concerns for researchers, especially those working with personally identifiable information.

Local models, by contrast, run on institutionally or personally owned hardware, such as the HPC cluster or your own workstation. You, the researcher, have complete control over the data, and local models can be trained or fine-tuned to fit your specific research needs. Local models require a fair amount of technical expertise to configure and use, but they allow a high degree of flexibility and customization, and they do not send your data over the internet.

Note

Local models reduce off-site data transmission but do not automatically satisfy all compliance requirements. Contact us if you have any questions or concerns about using local models.

Types of AI tasks

There are two common types of AI tasks that you may perform in your research:

Inference – The task most of us are familiar with: you give a model a prompt, optionally with additional data such as a document, and it returns a response (an "inference"). Examples:
- Summarizing a paper
- Generating code
- Classifying text
Fine-tuning – Customizing a language model for a specific domain or task by training it further on specialized data. Examples:
- Training a model to understand regional dialects of a language
- Recognizing gene names in biology papers or part numbers in manufacturing documentation
- Training on a company's past support tickets so responses reflect its products and tone
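Fine-tuning data is usually a file of example inputs paired with the outputs you want the model to produce. One widely used layout is JSON Lines, with one JSON object per line. The sketch below writes and reads such a file for the gene-name example; the `prompt`/`response` field names, the file name, and the sample sentences are hypothetical, since the exact schema depends on the fine-tuning tool you use.

```python
import json

# Hypothetical training pairs for recognizing gene names in text
examples = [
    {"prompt": "List the gene names: Mutations in BRCA1 raise breast cancer risk.",
     "response": "BRCA1"},
    {"prompt": "List the gene names: TP53 and KRAS were mutated in the tumor samples.",
     "response": "TP53, KRAS"},
]

def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

write_jsonl(examples, "gene_names_train.jsonl")
print(len(read_jsonl("gene_names_train.jsonl")))  # 2
```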

The HPC facilitates both types of tasks. You can use AI models for inference in your research workflows through the Ollama container or the AI Playground. Documentation on how to fine-tune is coming soon.

AI tools on the HPC

If you want a basic chat interface, start with the AI Playground
If you want to integrate a model into Python or R code, start with Ollama
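For Python, a minimal sketch of sending one prompt to an Ollama server uses Ollama's REST endpoint `/api/generate`. This endpoint accepts a JSON body with `model`, `prompt`, and `stream` fields, and with streaming turned off it returns a JSON object whose `response` field holds the generated text. The sketch uses only the standard library. The address assumes Ollama's default port (11434) on the same machine, and the model name `mistral` is an assumption; the host, port, and model names available in the HPC's Ollama container may differ.

```python
import json
import urllib.request

# Ollama's default local address; the HPC container may expose a different host or port
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt, url=OLLAMA_URL):
    """Create a POST request asking for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model, prompt, url=OLLAMA_URL, timeout=300):
    """Send the prompt to the server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With a server running, `print(generate("mistral", "Summarize transfer learning in two sentences."))` prints the model's reply. Turning streaming off trades incremental output for a single reply that is easy to parse.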

AI Playground

The AI Playground is an interactive app in our Open OnDemand portal that provides a chat interface similar to Microsoft Copilot and Google Gemini. It allows you to chat with models hosted on our systems. We currently have a few models to choose from:

Google Gemma 3 - 4B parameters
Google Gemma 3 - 12B parameters
Qwen3 - 0.6B parameters
Mistral - 7B parameters
OpenAI GPT-OSS - 20B parameters

vLLM

vLLM is an open-source, high-throughput Python library designed for fast and memory-efficient LLM inference.

Local models with vLLM
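vLLM is especially useful for batch inference over many inputs at once. The sketch below uses vLLM's offline `LLM` and `SamplingParams` classes to summarize a list of abstracts; it assumes vLLM is installed in your environment and a GPU is available. The prompt wording, temperature, and token limit are illustrative choices, and instruction-tuned models may respond better to prompts formatted with their chat template.

```python
def build_prompts(abstracts):
    """Wrap each abstract in the same (hypothetical) summarization instruction."""
    return [f"Summarize this abstract in one sentence:\n\n{text}" for text in abstracts]

def summarize_batch(abstracts, model_name, max_tokens=128):
    """Generate one summary per abstract with vLLM's offline batch API."""
    # Imported inside the function so this file can be loaded without vLLM installed
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)  # loads the model weights onto the GPU
    params = SamplingParams(temperature=0.2, max_tokens=max_tokens)
    outputs = llm.generate(build_prompts(abstracts), params)
    # Each result holds one or more completions; keep the first
    return [out.outputs[0].text.strip() for out in outputs]
```

Here `model_name` is a Hugging Face model ID or a path to locally stored weights. Passing all prompts in a single `generate` call lets vLLM schedule them together, which is where its throughput advantage comes from.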

Ollama container

The Ollama container is designed for researchers who need to integrate language models into programming workflows. We currently support local models in both R and Python tools through our Open OnDemand interactive apps. For more information on how to configure these, refer to the following pages: