Sohan Basak
Featured /

Playing with Tara v0.1 - Infusing LLMs and GenAI

A before-after comparison of Tara V0.1 nodes.

Tara is a ComfyUI node that integrates LLMs, such as OpenAI’s GPT and OSS models hosted by Groq

Jump to Showcase: Showcase

I have been hard at work writing Tara for the last couple of days. It’s a ComfyUI workflow node that lets you integrate LLMs and build complex workflows with ease. It’s a major step towards unlocking Automation and AI for everyone. I have been playing with it for a while now, and I am quite happy with the results. Here are some of the images I generated using Tara v0.1.

How do we use it?

Since making my initial post available on Reddit, ltdrdata has been kind enough to add it to the ComfyUI Manager. So, right now, it should be available to everyone who has access to ComfyUI and has Manager Installed. Update ComfyUI, search for Tara and hit install. It’s that simple. I’m tremendously grateful for the support and the community that has been built around ComfyUI and Tara.

Nodes that I have added

TaraPrompter: Utilizes input guidance to generate refined positive and negative outcomes. TaraApiKeyLoader: Manages and loads saved API keys for different LLM services. TaraApiKeySaver: Provides a secure way to save and store API keys internally. TaraDaisyChainNode: Enables complex workflows by allowing outputs to be daisy-chained into subsequent prompts, facilitating intricate operations like checklist creation, verification, execution, evaluation, and refinement.

Sample workflow

Tara V0.1 Workflow

Download it here: Tara V0.1 Demo Workflow

This is one of the first and most simple use-cases I thought of. While not too complex, this is one of the simplest ways to use Tara. We use the TaraPrompter Node to provide it with some guidance or how Tara should generate or expand a prompt. I haven’t hardcoded anything, so, you can definitely do a lot of refinement to get a lot better results.

The inputs of tara prompter (apart from guidance) is a LLM Model (dropdown), API Key (can be loaded using TaraApiKeyLoader), Positive Prompt and Negative Prompt.

We then take the output nodes (positive and negative), Connect it to CLIP, and then to KSampler nodes. As simple as that.

Here’s the Guidance that I provided to the LLM: (this has been iterated upon several times and will do so, please check my github for the up-to-date workflow)

You are a SDXL Prompt Generator

A good prompt for the SDXL stable diffusion model is a meticulously crafted narrative that starts with specifying the image type to set the artistic tone. It then lays out a clear and engaging scene, methodically adding layers of detail about the main subject, the surrounding environment, and their spatial relationships. It should use precise, descriptive language to define the mood, style, and execution, incorporating technical aspects like weight syntax (e.g., "(smiling:1.1)") to prioritize certain features. This approach ensures a rich, coherent, and visually compelling image that aligns with the intended vision. It should be only descriptive, avoid any guidance, avoid imperative sentences and remove any such guidance from prompts beforehand.

Conversely, a bad prompt lacks clarity and coherence, providing insufficient context or detail. It might jump into specifics without setting the scene, mix incompatible styles or themes, or neglect to specify spatial relationships and relative importance of elements, leading to a fragmented and confusing image. It may also misuse technical tools like weight syntax, resulting in an imbalanced and ineffective visual representation.

The words to the beginning has a higher weight the words in the end. Stable diffusion models works best when we give them a clear and concise prompt. Only relevant content should be to the beginning ( AVOID IGNORE tags like "Create a xyz" or "Scene:" or "image_type:" etc)

Furthermore, we can use various artist, studio, movie names etc to get a likeness to that style. For example, "painting by Van Gogh" or "scene from the movie Up".
Some keywords that can be useful are: 4k, ultra hd, masterpiece, cinematic, painting, drawing, sketch, scene, movie, artist, studio, style, trending on ArtStation, DeviantArt, Behance etc. (Use extreme caution when using these, also it's better to use them in the end unless mentioned elsewhere)

The prompt is specific and detailed, guiding the AI to produce a targeted and expressive image.

Use parenthesis to highlight specific words or spaces and use a colon followed by an weight to the end of a word or group to increase or decrease weight, for example (red panda:1.2) banana:1.5 or (yellow rose:0.9)

The negative prompt should be comma-seperated keywords, and not a sentence. And it should generally describe things we absolutely don't want in an image, if a image is about cartoon, we may put photorealistic in the negative. But if it's about a person, we should not put animal in negative (unless specified) because there are some similarities, in general it should be more qualitative than tangible such as JPEG artifacts, blurry, watermark etc. 
 

If a specific aesthetic, style, studio etc is mentioned, it should be accentuated as much as possible, we should increase its weight (surrealistic:1.4) or (watercolor:2.0) etc., and also add other keywords, artists, references to increase its weight.
---
Follow the prompt as closely as possible without violating the guidelines. Generate both an amazing positive and negative prompt

Based on this, we can use several LLMs such as Mixtral 8x7b, Llama 70B or Gemma 7B to generate the prompt. We can get a free API key from Groq Cloud and use it to generate the prompt. Groq Cloud is free for everyone at this moment. In future, they might start charging, but I don’t have any information on that. We can also use OpenAI’s API, but it’s paid, but you do get $5 worth of free credits to play around with. (GPT-3.5 is extremely cheap to use tbh)

Observations

Currently, using Mixtral and GPT-3.5, we can get a very detailed prompt that can be used to generate images. This effectively reduces a lot of learning that has to be done to get SDXL to generate very good-looking images. And through the use of Tara, I aim to make this process even more easier and accessible to everyone.

It is also extremely good at disambiguation, if we enter things like something furry, it will still generate a coherent prompt that will fill in the gaps, leading to better prompts and better images at no extra effort on our part.

However, I have seen that Mixtral and GPT-3.5 is a bit unreliable and tend to generate prompts that do not strictly adhere to the Guidance. This can be aided by daisy chaining a DaisyChain node to fix the prompt according to the guideline and the success rate shoots up a lot.

I do have a daisy-chain workflow, but to be honest, collecting, refining and testing the prompts is a bit of a hassle. I am working on a way to automate this process, but it’s a bit hard to do so. I would really appreciate if you could go to Github and Collaborate or Sponsor in whatever capacity you can.

Another interesting side-effect is translation, since LLMs do a decent job of translation, it can be used to generate prompts in different languages. I have tested this, and while not perfect, it is far, far better than not using it at all.

Enj oy the Showcase #

A fantasy scene
SDXL is very good at interpreting short, well defined prompt, but thanks to LLMs, it really expanded on the fantasy concept a lot, introducing a dragon and the overall composition looks much better to look at.
A fantasy scene SDXL
SDXL
A fantasy scene Tara V0.1
SDXL+Tara
A highly detailed fantasy scene
While they are similar, tara does expand and add a character, which anchors the image, not only that, the composition is better, guiding the eyes to the fantasy character and the background elements anchoring and grounding it in-place
A highly detailed fantasy scene SDXL
SDXL
A highly detailed fantasy scene Tara V0.1
SDXL+Tara
Cute panda
All pandas are cute, no matter the expression. Tara brings context, and puts the panda in a forest with bamboo trees, which happens to be a favorite among the pandas
Cute panda SDXL
SDXL
Cute panda Tara V0.1
SDXL+Tara
Something furry
Both of them are furry, in fact, the same creature. But the way they were put in the scene, the composition, the lighting, the background, all of them are different. Tara's version is more appealing to the eyes, and the creature is more visible and detailed
Something furry SDXL
SDXL
Something furry Tara V0.1
SDXL+Tara
Furry alien wearing sunglasses
Here, tara added a bit too many words for a lightning model to handle, and it got a bit overcooked. This is due to the CFG being on the higher side due to prompt expansion. Lightning models are a bit sensitive towards CFG, and it's already in 1.2-1.5 and there wasn't much headroom to decrease it without the base model, with lower context lacking the ability to understand the prompt. I did not want them to have different CFGs. And this showcases some potential drawbacks of using prompt expansion. They need to be crafted carefully, and the CFG needs to be adjusted accordingly.
Furry alien wearing sunglasses SDXL
SDXL
Furry alien wearing sunglasses Tara V0.1
SDXL+Tara
A dinosaur being cute
Both dinosaurs are cute, but the one to the right is grounded in a scene, with some activity and context. It isn't just a cute dinosaur, it's a cute dinosaur doing something. (playing with butterflies)
A dinosaur being cute SDXL
SDXL
A dinosaur being cute Tara V0.1
SDXL+Tara
I see a little silhouetto of a man
Classic Bohemian Rhapsody, I don't know what SDXL did here, but it definitely took the little silhouetto a bit too literally. The anatomy is incorrect. While the tara's version is more grounded and pleasing.
I see a little silhouetto of a man SDXL
SDXL
I see a little silhouetto of a man Tara V0.1
SDXL+Tara
Starry Night
Because starry night is a famous painting, what if we want an actual starry night. Here SDXL drew starry night the painting, while tara expanded the prompt to describe a scene of a starry night. The result is a beautiful night sky with stars and a crescent moon.
Starry Night SDXL
SDXL
Starry Night Tara V0.1
SDXL+Tara
Moral of the story
If someone asked me to draw 'Moral of the story', even i'd be stumped. But SDXL draw us some illegible text, while tara expanded into a scene with a child looking out of a window, looking at the world outside. We actually have a picture here, and it's a beautiful one.
Moral of the story SDXL
SDXL
Moral of the story Tara V0.1
SDXL+Tara
a firefox wearing chrome outfit, on a safari being brave
You can tell I was a bit joking here, but SDXL wins. While tara's expansion looks more realistic, there's no chrome outfit, it's a fox in a safari outfit. SDXL's version is more literal, and it's a fox wearing a chrome outfit. But the scene is more grounded in tara's version, and it's more appealing to the eyes. Classic Tradeoff.
a firefox wearing chrome outfit, on a safari being brave SDXL
SDXL
a firefox wearing chrome outfit, on a safari being brave Tara V0.1
SDXL+Tara
An android eating an apple (GPT-3.5)
SDXL drew the android logo eating a red apple. Tara made the android as a humanoid robot. Depending on your preference, you might like one over the other. I personally did not want the android logo, but I can understand how that can be interpreted as the android logo.
An android eating an apple (GPT-3.5) SDXL
SDXL
An android eating an apple (GPT-3.5) Tara V0.1
SDXL+Tara
An android eating an aple (Mixtral 8x7b)
The apple being duplicated is probably seed back luck. But Tara with Mixtral 8x7b did expand, but in more of a video-game aesthetic. While GPT-3.5 was more realistic, but Mixtral from Groq is free (currently)
An android eating an aple (Mixtral 8x7b) SDXL
SDXL
An android eating an aple (Mixtral 8x7b) Tara V0.1
SDXL+Tara
Goof ball
Tara's interpretation of a bunch of cartoon style, cel shaded balls with goofy expression is super cute. But I laughed at SDXL's version more. 1 points each.
Goof ball SDXL
SDXL
Goof ball Tara V0.1
SDXL+Tara
Goofball
Removing just one single space biases the model from a ball to a literal person heavily. Here I like tara's version more, the realisic person does not have enough expression, and the classic blank gaze is in the uncanny valley.
Goofball SDXL
SDXL
Goofball Tara V0.1
SDXL+Tara
Bohemian Rhapsody
You can tell I like Bohemian Rhapsody. SDXL's drew a poster of Queen with 4 Freddie Mercuries. While tara expanded the prompt to a scene of a concert being sung by Freddy Mercury, reminiscent of their performance in Live Aid. I like tara's version more and there is no universe the SDXL version is usable.
Bohemian Rhapsody SDXL
SDXL
Bohemian Rhapsody Tara V0.1
SDXL+Tara
A sad happy owl
I juxtaposed sad and happy, and I expected a bittersweet expression. However, the sad and happy cancelled each other and we got a neutral expression. I like tara's version more, the owl is more detailed and the subject is more grounded.
A sad happy owl SDXL
SDXL
A sad happy owl Tara V0.1
SDXL+Tara
A cutiepie cactus
I did get some anthropomorphic versions, but decided to post this one, because anthropomorphizing is the easiest way to 'cutify' something. Here, in the tara's version the stems of the cactus is giving an expression of joy, and is reliant on our internal anthropomorphization to make it seem cute. There is nothing in either images to make one more cute or less.
A cutiepie cactus SDXL
SDXL
A cutiepie cactus Tara V0.1
SDXL+Tara
Enchanted Chanting
To the left, we have a lady in a forest with a magical aura. To the right, we have a bunch of people wearing a cultist clothing and chanting to glowing orb. Here I would say SDXL looks more appealing while Tara's version captures the essense of the prompt better.
Enchanted Chanting SDXL
SDXL
Enchanted Chanting Tara V0.1
SDXL+Tara
Something extremely attention grabbing (negative: nsfw)
It could just be the model (no pun intended), but even with a negative prompt of nsfw, I got a somewhat nsfw image from SDXL. Without the negative, i wouldn't be able to post what I got. Tara drew us a calm scenery of a rainforest with a river, with god rays, in an artistic style. And we all know which one is better, but we also know which one is more attention grabbing.
Something extremely attention grabbing (negative: nsfw) SDXL
SDXL
Something extremely attention grabbing (negative: nsfw) Tara V0.1
SDXL+Tara
Epitome of
Yet another abstract prompt. I don't see know a cyborg with a carbon fiber face with chrome outline of a skull and glowing red eyes is the epitome of anything. But I do know that tara wanted to execute an epitome of beauty, and the result is a beautiful lady.
Epitome of SDXL
SDXL
Epitome of Tara V0.1
SDXL+Tara
Epitome of cute. (with a period)
tara drew us a cute as an epitome of cuteness. I am a dog person, but I still think the kitten is absolutely cute, especially when its playing with yarns. SDXL drew us a cute dog wearing a bowtie, but the dog is not doing anything, and the background is a bit busy.
Epitome of cute. (with a period) SDXL
SDXL
Epitome of cute. (with a period) Tara V0.1
SDXL+Tara
Epitome of cute (without a period)
Tara drew us two cats, can't complain with that. But SDXL still made a dog, albeit with a bit feline features. The difference is just a period. But the difference in the images are stark. I like tara's version more, the cats are more detailed and the scene is more grounded.
Epitome of cute (without a period) SDXL
SDXL
Epitome of cute (without a period) Tara V0.1
SDXL+Tara
Eye candy
SDXL drew us a close up of an eye, The cornea is colorful, but I wouldn't call it an eye candy. Tara interpreted it as a amazing painting of a landscape with flowers and greenery and can't argue about that here.
Eye candy SDXL
SDXL
Eye candy Tara V0.1
SDXL+Tara
Spirituality
SDXL drew us the picture of a person meditating with some glowing orbs, it's cool but distinctly AI-generated. Tara drew us a pagoda in the middle of a forest, with a river flowing by. It is more serene and evocative, and the AI-ness isn't too apparent
Spirituality SDXL
SDXL
Spirituality Tara V0.1
SDXL+Tara
A giant ant studying
While SDXL did draw an ant, it's not giant because it's a macro photograph, and there is also another disembodied ant, that's its devouring. It's a bit disturbing tbh. However, tara drew us a giant ant studying a book, and it's exact what was asked. The ant is also more detailed and the scene is more grounded.
A giant ant studying SDXL
SDXL
A giant ant studying Tara V0.1
SDXL+Tara
A joke (negative: human, person)
Even with the negative, SDXL drew us a human, i believe it's a joker that was drawn. tara drew us a bunny sitting atop a countertop. It's very hand illustrated. It's definitely not a joke, but it's good to look at.
A joke (negative: human, person) SDXL
SDXL
A joke (negative: human, person) Tara V0.1
SDXL+Tara
A sexy saxophone
Because we let SDXL's CLIP do the interpretation, it drew us a woman with provocative outfit playing a saxophone. However, tara drew us a saxophone in a nice lighting. This interpretation is what we would be looking for.
A sexy saxophone SDXL
SDXL
A sexy saxophone Tara V0.1
SDXL+Tara
An amazingly detailed picture of a rabbit in the style of disney pixar studios
I don't think I have to sell tara anymore. The image is clearly more disneylike, and the rabbit is more detailed and the scene is more grounded. SDXL's version is more generic, with the trademark AI-generated blank stare, and the rabbit is not as detailed. The one we got using tara is actually expressive!
An amazingly detailed picture of a rabbit in the style of disney pixar studios SDXL
SDXL
An amazingly detailed picture of a rabbit in the style of disney pixar studios Tara V0.1
SDXL+Tara
An amazingly detailed picture of a rabbit in the style of asjhndhuieqw
You're asing what's asjhndhuieqw and I don't know either. But tara did not leave it as-is, it filled it with something that's not meaningless resulting in a nice scene. However, SDXL drew us something that's quite nice too. The pencil art does look nice.
An amazingly detailed picture of a rabbit in the style of asjhndhuieqw SDXL
SDXL
An amazingly detailed picture of a rabbit in the style of asjhndhuieqw Tara V0.1
SDXL+Tara
portrait of an empowered beautiful indian lady wearing saree and bindi, illustrated in the style of Studio Ghibli
SDXL is more 'Indian woman' like, and Tara is more 'Studio Ghibli' like. Can't complain
portrait of an empowered beautiful indian lady wearing saree and bindi, illustrated in the style of Studio Ghibli SDXL
SDXL
portrait of an empowered beautiful indian lady wearing saree and bindi, illustrated in the style of Studio Ghibli Tara V0.1
SDXL+Tara
Reimagination of ancient Egypt
SDXL drew us hieroglyphhs and pharaohs and tara better captured what might an older version of Egypt might have looked like. One thing though, the pyramid is mostly made out of bricks and only the top is white, but in ancient times, the entire thing would've been white. But still I wish physics allowed us time travel.
Reimagination of ancient Egypt SDXL
SDXL
Reimagination of ancient Egypt Tara V0.1
SDXL+Tara
Reimagination of Mahenjodaro
SDXL drew us something closer to the remains and ruins of mahenjodaro, while Tara drew us what an ancient civilization of mahenjodaro might have looked like. I do dislike the yellow filter though, but I blame hollywood for this. Wish it was more vibrant.
Reimagination of Mahenjodaro SDXL
SDXL
Reimagination of Mahenjodaro Tara V0.1
SDXL+Tara
Reimaginaation of a festive day in atlantis
There is nothing festive about the SDXL version of the image. Tara, however, did do a good job with the festive-ness.
Reimaginaation of a festive day in atlantis SDXL
SDXL
Reimaginaation of a festive day in atlantis Tara V0.1
SDXL+Tara
A spaceship shaped like Rick Sanchez
SDXL made me chuckle, I am pretty sure, writers of Rick and Morty would love it. But based on Giant floating space baby, i am not sure that what Tara made wouldn't fit..
A spaceship shaped like Rick Sanchez SDXL
SDXL
A spaceship shaped like Rick Sanchez Tara V0.1
SDXL+Tara
A cockatoo in the style of south park
South park wouldn't use what SDXL drew, they wouldn't use what tara drew, but if I saw the tara's version inside that show, i'd be like 'yeah, that's a south park character'.
A cockatoo in the style of south park SDXL
SDXL
A cockatoo in the style of south park Tara V0.1
SDXL+Tara
Falling in love
SDXL: more romantic, Tara: more abstract and evocative, almost seductive.
Falling in love SDXL
SDXL
Falling in love Tara V0.1
SDXL+Tara
Nostalgia
I'm not going to comment, let me deal with the nostalgia.
Nostalgia SDXL
SDXL
Nostalgia Tara V0.1
SDXL+Tara
Fireflies
You'd not believe your eyes, if ten million fireflies, lit up the world as I fell asleep. Both of them are quite pleasant to look at.
Fireflies SDXL
SDXL
Fireflies Tara V0.1
SDXL+Tara
vintage, retro photo of a ultra high tech city
Yet another juxtaposition. SDXL held on to the prompt a bit better. SDXL made it look like more of a painting. Technically painting's were the orignal retro photos. But that'd be twisting the words too much. However, SDXL image looks a bit too generic and unimaginitive.
vintage, retro photo of a ultra high tech city SDXL
SDXL
vintage, retro photo of a ultra high tech city Tara V0.1
SDXL+Tara
A retro scene captured by a futuristic camera
Retro scene: check, futuristic camera: nah. Both of them drew a car as a primary subject, but the vintage caar of tara's version is more detailed.
A retro scene captured by a futuristic camera SDXL
SDXL
A retro scene captured by a futuristic camera Tara V0.1
SDXL+Tara

Summary

Thanks to a lot of Community Feedback, I have decided to integrate more Open Source LLMs, such as Ollama, LM Studio etc.

I will also keep sharing more and more workflows that will simply advance the capabilities of Tara and Open Source GenAI models. And thanks to your engagement and support I will be able to do so.

It is merely the beginning of Tara, and I am excited to see what all of you do with it.

Subscribe Sohan's weekly Newsletter

One update per week. All the latest news directly in your inbox.