Sohan Basak

Playing with Tara v0.1 - Infusing LLMs and GenAI

A before-after comparison of Tara V0.1 nodes.

Tara is a ComfyUI node that integrates LLMs, such as OpenAI’s GPT and open-source models hosted by Groq.

Jump to the Showcase

I have been hard at work writing Tara for the last couple of days. It’s a ComfyUI workflow node that lets you integrate LLMs and build complex workflows with ease. It’s a major step towards unlocking Automation and AI for everyone. I have been playing with it for a while now, and I am quite happy with the results. Here are some of the images I generated using Tara v0.1.

How do we use it?

Since I made my initial post on Reddit, ltdrdata has been kind enough to add Tara to the ComfyUI Manager. So, right now, it should be available to everyone who uses ComfyUI and has the Manager installed. Update ComfyUI, search for Tara, and hit install. It’s that simple. I’m tremendously grateful for the support and the community that has been built around ComfyUI and Tara.

Nodes that I have added

- TaraPrompter: Uses input guidance to generate refined positive and negative prompts.
- TaraApiKeyLoader: Manages and loads saved API keys for different LLM services.
- TaraApiKeySaver: Provides a secure way to save and store API keys internally.
- TaraDaisyChainNode: Enables complex workflows by allowing outputs to be daisy-chained into subsequent prompts, facilitating intricate operations like checklist creation, verification, execution, evaluation, and refinement.

Sample workflow

Tara V0.1 Workflow

Download it here: Tara V0.1 Demo Workflow

This is one of the first and simplest use cases I thought of. We use the TaraPrompter node to provide guidance on how Tara should generate or expand a prompt. I haven’t hardcoded anything, so you can definitely refine the guidance to get much better results.

The inputs of TaraPrompter (apart from guidance) are an LLM model (dropdown), an API key (which can be loaded using TaraApiKeyLoader), a positive prompt, and a negative prompt.

We then take the outputs (positive and negative), connect them to CLIP Text Encode nodes, and then to the KSampler. As simple as that.
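For readers who want to see what this looks like in code, here is a minimal sketch of how a TaraPrompter-style node could declare those inputs and outputs using ComfyUI’s custom node conventions. The class name, dropdown entries, and the call_llm helper are illustrative assumptions, not Tara’s actual source.

```python
# A sketch of a TaraPrompter-style ComfyUI node. Names and defaults are
# assumptions for illustration; see the Tara repository for the real code.

class TaraPrompterSketch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # LLM model dropdown (example entries, not an exhaustive list)
                "llm_model": (["gpt-3.5-turbo", "mixtral-8x7b-32768"],),
                # API key, typed directly or supplied by TaraApiKeyLoader
                "api_key": ("STRING", {"default": ""}),
                "guidance": ("STRING", {"multiline": True}),
                "positive": ("STRING", {"multiline": True}),
                "negative": ("STRING", {"multiline": True}),
            }
        }

    RETURN_TYPES = ("STRING", "STRING")
    RETURN_NAMES = ("positive", "negative")
    FUNCTION = "generate"
    CATEGORY = "llm"

    def generate(self, llm_model, api_key, guidance, positive, negative):
        # call_llm is a hypothetical helper that sends the guidance and raw
        # prompts to the chosen LLM and returns the refined strings
        # (see the API example later in this post).
        refined_positive, refined_negative = call_llm(
            llm_model, api_key, guidance, positive, negative
        )
        return (refined_positive, refined_negative)
```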

Here’s the guidance that I provided to the LLM (this has been iterated on several times and will continue to be; please check my GitHub for the up-to-date workflow):

You are a SDXL Prompt Generator

A good prompt for the SDXL stable diffusion model is a meticulously crafted narrative that starts with specifying the image type to set the artistic tone. It then lays out a clear and engaging scene, methodically adding layers of detail about the main subject, the surrounding environment, and their spatial relationships. It should use precise, descriptive language to define the mood, style, and execution, incorporating technical aspects like weight syntax (e.g., "(smiling:1.1)") to prioritize certain features. This approach ensures a rich, coherent, and visually compelling image that aligns with the intended vision. It should be only descriptive, avoid any guidance, avoid imperative sentences and remove any such guidance from prompts beforehand.

Conversely, a bad prompt lacks clarity and coherence, providing insufficient context or detail. It might jump into specifics without setting the scene, mix incompatible styles or themes, or neglect to specify spatial relationships and relative importance of elements, leading to a fragmented and confusing image. It may also misuse technical tools like weight syntax, resulting in an imbalanced and ineffective visual representation.

Words at the beginning carry a higher weight than words at the end. Stable Diffusion models work best when we give them a clear and concise prompt. Only relevant content should be at the beginning (AVOID and IGNORE tags like "Create a xyz", "Scene:", "image_type:", etc.)

Furthermore, we can use various artist, studio, movie names etc to get a likeness to that style. For example, "painting by Van Gogh" or "scene from the movie Up".
Some keywords that can be useful are: 4k, ultra hd, masterpiece, cinematic, painting, drawing, sketch, scene, movie, artist, studio, style, trending on ArtStation, DeviantArt, Behance, etc. (Use these with extreme caution; it’s generally better to place them at the end unless mentioned elsewhere.)

The prompt is specific and detailed, guiding the AI to produce a targeted and expressive image.

Use parentheses to highlight specific words or phrases, and add a colon followed by a weight at the end of a word or group to increase or decrease its weight, for example (red panda:1.2), banana:1.5, or (yellow rose:0.9).

The negative prompt should be comma-separated keywords, not a sentence. It should generally describe things we absolutely don't want in an image: if an image is about a cartoon, we may put photorealistic in the negative, but if it's about a person, we should not put animal in the negative (unless specified), because there are some similarities. In general it should be more qualitative than tangible, such as JPEG artifacts, blurry, watermark, etc.

If a specific aesthetic, style, studio, etc. is mentioned, it should be accentuated as much as possible: increase its weight, e.g. (surrealistic:1.4) or (watercolor:2.0), and also add other keywords, artists, and references that reinforce it.
---
Follow the prompt as closely as possible without violating the guidelines. Generate both an amazing positive and negative prompt.

Based on this, we can use several LLMs such as Mixtral 8x7B, Llama 70B, or Gemma 7B to generate the prompt. We can get a free API key from Groq Cloud and use it to generate the prompt; Groq Cloud is free for everyone at the moment. In the future they might start charging, but I don’t have any information on that. We can also use OpenAI’s API, which is paid, but you do get $5 worth of free credits to play around with (GPT-3.5 is extremely cheap to use, to be honest).
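For anyone who wants to try the same thing outside ComfyUI, here is a minimal sketch of that call, assuming the openai Python package (v1+) pointed at Groq’s OpenAI-compatible endpoint; the model name and placeholder key are assumptions, and this is not Tara’s internal code.

```python
# Minimal sketch: expand a short prompt with the guidance above via Groq's
# OpenAI-compatible API. Not Tara's actual implementation.
from openai import OpenAI

GUIDANCE = "You are a SDXL Prompt Generator ..."  # paste the full guidance text here

client = OpenAI(
    api_key="YOUR_GROQ_API_KEY",                 # free key from Groq Cloud
    base_url="https://api.groq.com/openai/v1",   # drop base_url to use OpenAI instead
)

response = client.chat.completions.create(
    model="mixtral-8x7b-32768",                  # e.g. Mixtral 8x7B on Groq (name may vary)
    messages=[
        {"role": "system", "content": GUIDANCE},
        {"role": "user", "content": "Positive: a red panda in a misty forest\nNegative: blurry"},
    ],
)

print(response.choices[0].message.content)       # refined positive/negative prompts
```

Swapping the model name and API key is all it takes to compare Mixtral, Llama, or GPT-3.5 on the same guidance.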

Observations

Currently, using Mixtral and GPT-3.5, we can get a very detailed prompt that can be used to generate images. This effectively removes a lot of the learning that is needed to get SDXL to generate very good-looking images, and through Tara, I aim to make this process even easier and more accessible to everyone.

It is also extremely good at disambiguation: if we enter something vague like "something furry", it will still generate a coherent prompt that fills in the gaps, leading to better prompts and better images at no extra effort on our part.

However, I have seen that Mixtral and GPT-3.5 are a bit unreliable and tend to generate prompts that do not strictly adhere to the guidance. This can be mitigated by daisy-chaining a TaraDaisyChainNode to fix the prompt according to the guidance, which makes the success rate shoot up a lot.
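Conceptually, the daisy-chain step is just a second LLM pass that checks the first draft against the guidance and rewrites whatever violates it. Here is a rough sketch of that idea, reusing the client from the earlier example; the refine function and its wording are illustrative assumptions, not Tara’s actual API.

```python
# Rough sketch of the daisy-chain idea: a second pass that enforces the
# guidance on a draft prompt. Function name and messages are illustrative.
def refine(client, model, guidance, draft_prompt):
    """Ask the LLM to verify a draft prompt against the guidance and fix it."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": guidance},
            {"role": "user", "content": (
                "Check the following prompt against the guidance and rewrite "
                "any part that violates it:\n\n" + draft_prompt
            )},
        ],
    )
    return response.choices[0].message.content

# First pass generates a draft (as in the earlier sketch), second pass fixes it:
# final_prompt = refine(client, "mixtral-8x7b-32768", GUIDANCE, draft_prompt)
```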

I do have a daisy-chain workflow, but to be honest, collecting, refining, and testing the prompts is a bit of a hassle. I am working on a way to automate this process, but it’s hard to do. I would really appreciate it if you could go to GitHub and collaborate or sponsor in whatever capacity you can.

Another interesting side effect is translation: since LLMs do a decent job of translating, Tara can be used to write prompts in other languages. I have tested this, and while it’s not perfect, it is far, far better than not using it at all.
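As a quick illustration of the same call with a non-English prompt (reusing the client and GUIDANCE from the earlier sketch; the French input is just an example):

```python
# Same call as before, but the raw positive prompt is written in French;
# the LLM translates and expands it into an English SDXL prompt.
response = client.chat.completions.create(
    model="mixtral-8x7b-32768",
    messages=[
        {"role": "system", "content": GUIDANCE},
        {"role": "user", "content": "Positive: un panda roux dans une forêt brumeuse\nNegative: blurry"},
    ],
)
print(response.choices[0].message.content)
```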

Enjoy the Showcase