Earlier this year, I started looking at how to run a fully on-prem AI.  In February, I bought a machine to run the inference engine on and set up Tailscale (which works similarly to Hamachi) to reach it remotely.  If you want to use the machine from your other devices, there are a lot of options for native clients.

macOS

My favorite client for macOS is MindMac.  You can buy it for under $30, it works with multiple models, servers, and server types, and it is easy to use.

If you want to look further into it, you can check it out at mindmac.app.

Android

My favorite client for Android is Amallo.  It is $23 and, like MindMac, it works with multiple models, servers, and server types.  My only complaint is that uploading a base64-encoded image to the model doesn’t seem to work well.

If you want to look further into it, you can check it out at doppeltilde.com.

iPadOS

There is a version of Amallo for iPadOS, but I have been liking Enchanted LLM more.  If you like Enchanted, there is a version for macOS as well.  It has the added benefit of being free.

If you want to look further into it, you can check it out at the project’s GitHub page.

Have any questions, comments, etc.?  Please feel free to drop a comment below.

https://jws.news/2024/how-i-use-ai/

#AI #Amallo #Enchanted #LLM #MindMac #Ollama

LLM / ML – JWS.news

Posts about LLM / ML written by steinbring

JWS.news

Yesterday, we played with Llama 3 using the Ollama CLI client (or REPL).  Today, I figured that we would play with it using the Ollama API.  The Ollama API is documented on their GitHub repo.  Ollama has a client that runs when you run ollama run llama3 and a service that can be accessed from something like MindMac, Amallo, or Enchanted.  The service is what starts when you run ollama serve.
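To make that distinction concrete, these are the two commands (assuming Ollama is already installed):

```shell
# The interactive client (REPL) that you type prompts into:
ollama run llama3

# The background service that exposes the HTTP API on port 11434:
ollama serve
```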

In our first Llama 3 post, we asked the model for “a comma-delimited list of cities in Wisconsin with a population over 100,000 people”.  Using Postman and the completion endpoint (POST /api/generate), you can ask the same thing.

You will notice the stream parameter is set to false in the body.  When it is false, the response comes back as a single response object rather than a stream of objects.  If you are using the API with a web application, you will want to ask the model for the answer as JSON, and you will probably want to provide an example of how you want the answer formatted.
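Putting those pieces together, a request body along those lines might look like this (the example formatting in the prompt is just a placeholder for illustration; format set to json is a real Ollama parameter that constrains the output to valid JSON):

```json
{
  "model": "llama3",
  "prompt": "Give me a comma-delimited list of cities in Wisconsin with a population over 100,000 people. Answer as JSON, like {\"cities\": [\"Example City\"]}",
  "format": "json",
  "stream": false
}
```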

You can use Node and node-fetch to do the same thing.

If you run it from the terminal, the model’s answer prints out in one piece once the full response comes back.

Have any questions, comments, etc.?  Please feel free to drop a comment below.

https://jws.news/2024/lets-play-more-with-llama-3/

#AI #Amallo #Enchanted #llama3 #LLM #MindMac #NodeJs #Ollama #Postman

Let’s play with Llama 3

Last week, Meta announced Llama 3.  Thanks to Ollama, you can run it pretty easily.  There are 8b and 70b variants, in both pre-trained and instruction-tuned forms.   …


The browsers of choice installed are #Firefox, #Brave and, of course, #Safari. I am tempted to make the switch to Safari as my daily driver. I have never really used it, as it was quite difficult to customize.

For all things AI, I am pleased with #Ollama and one of the current top 3 models.

When it comes to GUIs for Ollama I have #MindMac, #LMStudio and #DiffusionBee installed.

But to be honest, I am not a heavy AI user. I still have not found a (killer) use case for me.

@dpowr @dsoft Same for me. Why send my data over the wire when I can have all that power locally on my machine? For once, I do not want to be the product.

With regards to interfacing with #Ollama, I use #MindMac, #LMStudio and #DiffusionBee. I installed the #OllamaWebUI too. I have to play with it some more.

Another great feature is that the Ollama API is compatible with OpenAI’s now. That allows Ollama to work with so many more tools and libraries out there.
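For example, assuming Ollama is running locally on its default port, the OpenAI-style endpoint lives under /v1 and can be hit with any OpenAI-compatible tool:

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{ "role": "user", "content": "Say hello" }]
  }'
```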

Back in December, I started exploring how all of this AI stuff works.  Last week’s post was about the basics of how to run your AI.  This week, I wanted to cover some frequently asked questions.

What is a Rule-Based Inference Engine?

A rule-based inference engine is designed to apply predefined rules to a given set of facts or inputs to derive conclusions or make decisions.  It operates by using logical rules, which are typically expressed in an “if-then” format.  You can think of it as basically a very complex version of the spell check in your text editor.
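As a minimal sketch (a toy example of my own, not any real product), an “if-then” engine can be just a loop that keeps firing rules until no new facts can be derived:

```javascript
// Minimal rule-based inference engine: each rule is an "if-then" pair
// that fires when all of its conditions are present in the fact set.
function infer(rules, facts) {
  const known = new Set(facts);
  let changed = true;
  while (changed) { // keep applying rules until nothing new is derived
    changed = false;
    for (const rule of rules) {
      if (rule.if.every((f) => known.has(f)) && !known.has(rule.then)) {
        known.add(rule.then); // the rule fires and adds its conclusion
        changed = true;
      }
    }
  }
  return known;
}

// Classic forward chaining over simple string facts:
const rules = [
  { if: ["has fur", "gives milk"], then: "is a mammal" },
  { if: ["is a mammal", "eats meat"], then: "is a carnivore" },
];
const conclusions = infer(rules, ["has fur", "gives milk", "eats meat"]);
console.log([...conclusions]);
```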

What is an AI Model?

AI models employ learning algorithms that draw conclusions or predictions from past data.  An AI model’s data can come from various sources, such as labeled data for supervised learning, unlabeled data for unsupervised learning, or data generated through interaction with an environment for reinforcement learning.  The algorithm is the step-by-step procedure or set of rules that the model follows to analyze data and make predictions.  Different algorithms have different strengths and weaknesses, and some are better suited for certain types of problems than others.  A model has parameters, which are the aspects of the model that are learned from the training data.  A model’s complexity can be measured by the number of parameters it contains, but complexity also depends on the model’s architecture (how the parameters interact with each other) and the types of parameters used.

What is an AI client?

An AI client is how the user interfaces with the rule-based inference engine.  Since you can use the engine directly, the engine itself could also be the client.  For the most part, you are going to want something web-based or a graphical desktop client, though.  Good examples of graphical desktop clients would be MindMac or Ollamac.  A good example of a web-based client would be Ollama Web UI.  A good example of an application that is both a client and a rule-based inference engine is LM Studio.  Most engines have APIs and language-specific libraries, so if you want to you can even write your own client.

What is the best client to use with a Rule-Based Inference Engine?

I like MindMac.  I would recommend either that or Ollama Web UI.  You can even host both Ollama and Ollama Web UI together using docker compose.
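As a rough sketch, a compose file for that pairing could look something like this (the image names, environment variable, and ports are the projects’ published defaults as of this writing, so double-check them against the Ollama Web UI docs):

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
  webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "3000:8080"
    depends_on:
      - ollama
volumes:
  ollama:
```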

What is the best Rule-Based Inference Engine?

I have tried Ollama, Llama.cpp, and LM Studio.  If you are using Windows, I would recommend LM Studio.  If you are using Linux or a Mac, I would recommend Ollama.

How much RAM does your computer need to run a Rule-Based Inference Engine?

The RAM requirement depends on what model you are using.  If you browse the Ollama library, Hugging Face, or LM Studio’s listing of models, most listings include a RAM requirement based on the number of parameters in the model.  Most 7b models can run on a minimum of 8GB of RAM, while most 70b models will require 64GB of RAM.  My MacBook Pro has 32GB of unified memory and struggles to run Wizard-Vicuna-Uncensored 30b.  My new AI lab currently has 128GB of DDR4 RAM, and I hope that it can run 70b models reliably.
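As a back-of-the-envelope check (my own rule of thumb, not an official formula), RAM needs scale with the parameter count times the bytes per parameter, plus some runtime overhead:

```javascript
// Rough rule of thumb: RAM ~= parameters x bytes per parameter x overhead.
// 4-bit quantized weights take ~0.5 bytes per parameter; fp16 takes 2.
// The 1.2 overhead factor is a guess to cover the runtime and context.
function approxRamGb(paramsBillions, bytesPerParam = 0.5, overhead = 1.2) {
  return paramsBillions * bytesPerParam * overhead;
}

console.log(approxRamGb(7).toFixed(1));  // 7b at 4-bit: ~4 GB, fits in 8GB
console.log(approxRamGb(70).toFixed(1)); // 70b at 4-bit: ~42 GB, wants 64GB
```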

Does your computer need a dedicated GPU to run a Rule-Based Inference Engine?

No, it doesn’t.  You can run on just the CPU, but if you have an Nvidia GPU, it helps a lot.

I use Digital Ocean or Linode for hosting my website. Can I host my AI there, also?

Yeah, you can.  The RAM requirement would make it a bit expensive, though.  A virtual machine with 8GB of RAM is almost $50/mo.

Why wouldn’t you use ChatGPT, Copilot, or Bard?

When you use any of them, your interactions are used to reinforce the training of the model.  That is an issue for anything beyond the most basic prompts.  In addition to that, they cost up to $30/month/user.

Why should you use an open-source LLM?

For the same reasons: your prompts and data stay on your own hardware, and you are not paying a monthly per-user fee.

What opinion does your employer have of this research project?

You would need to direct that question to them.  All of these posts should be considered personal opinions and do not reflect the views or ethics of my employer. All of this research is being done off-hours and on my own dime.

Why are you interested in this technology?

It is a new technology that I didn’t consider wasteful bullshit in the first hour of researching it.

Are you afraid that AI will take your job?

No.

What about image generation?

I used (and liked) Noiselith until it shut down.  DiffusionBee works but I think that Diffusers might be the better solution.  Diffusers lets you use multiple models and it is easier to use than Stable Diffusion Web UI.

You advocate for not using ChatGPT. Do you use it?

I do.  ChatGPT 4 is a 1.74t model, and it can do cool things.  I have an API key, and I use it via MindMac.  Using it that way means that I pay based on how much I use it rather than paying for a Pro account.

Are you going to only write about AI on here, now?

Nope.  I still have other interests.  Expect more Vue.js posts and likely something to do with Unity or Unreal at some point.

Is this going to be the last AI FAQ post?

Nope.  I still haven’t covered training or fine-tuning.

https://jws.news/2024/ai-frequently-asked-questions/

#AI #ChatGPT #Docker #LLM #LMStudio #MindMac #Ollama #Ollamac

#MindMac set to launch at login, with inline enabled, is giving me a glimpse into what system-wide integration of #AI and #LM with #ChatGPT can look like. Working with #ObsidianMD, the possibilities are incredible. Is there a #Linux #distro focused on bringing AI together with #PKM yet? Am also thinking about what this could look like if fully baked into an e-ink tablet like Remarkable. #pkms