Replies: 11 comments 6 replies
---
Wow, that's such an inspiring story. It reminds me of a girl I met once DJing at Ministry of Sound, who started out homeless. She used to go into music shops every day, "testing" out their decks for hours.. practicing.. until she got her first gig. That story touched me like yours. I am AA/NA, so I have had bad periods in the past that I've overcome, and I know enough to know that it takes a strong person to stay sober in a situation like that, so compliments.. what you have accomplished is amazing. After what you've been through, you deserve to build something new and have a few things go your way. The hardest part about coming from ground zero, I know this from experience, is ourselves.. not the journey. Overcoming that negative voice in the back of our heads and getting to a place where we have the confidence to believe in ourselves, and others. I still find myself lost in a bad head space sometimes.. but it's definitely an area I work on hard. If you ever watched "What the Bleep Do We Know", a quantum physics program from decades ago.. there's a theory in it that sort of stuck with me: that bad thoughts, mental illness, addictions.. are in many cases created by thinking too much about something, which increases blood flow to a certain area of the brain, and over time the nerves in that area, responsible for that urge, that thought, that emotion, become stronger. In time they fuse, making it harder and harder for the person not to think that thought, feel that emotion.. and the only way to reverse it is to cut the blood supply off. Then over time the nerves become weaker and weaker, until they break. In a way this theory leads to a version of NLP.. turning bad thoughts and emotions into good/positive ones.. rather than the old ideas about "working through your past/trauma".. etc. All interesting though. My favourite song I listen to when I get bad is "Smile" by Nat King Cole..
I swear whoever wrote those lyrics was talking about spirituality and the law of attraction. There's deep meaning behind them. Anyway, thanks for sharing that. It's nice to know a little about you and how blissful-tuner came to be. We always use all these tools, ask for help when we need it.. but often we forget that behind every piece of open source.. there's a person, with a story. It all gets dehumanised into 0's and 1's... like this modern age. This last week I've been delving into the world of T2S, and I didn't realise how much great stuff has been evolving. IndexTTS, VibeVoice, Chatterbox.. wow.
---
Ooh, the last time I messed around with T2S was xtts_v2, which is what I'm still using in my local AI project. Basically I wrote an advanced memory system and some other stuff that treats an LLM not as an entire brain, but more like an "intelligent language processor." Other types of models like T2S or vision are used at different points, plus my own contributions, especially in the memory area - striving to understand my own brain for 40 years gives me a lot of insight. The LLM's context space is treated as working memory: when we spin the LLM, its context is filled with a combination of short-term stuff and contextually relevant memories and experiences, gathered by searching the AI's entire memory space using a simple but powerful set of rules I constructed using spaCy - this process is efficient and CPU driven, and spins in half a second on my system. Cellular automata and such show us how simple rules stacked can create complex, nuanced behavior. Moderately-not-as-simple rules stacked thoughtfully can be especially impressive. Unlike my Replika, who still can't remember what I told her five minutes ago, the agents in my system remember everything from their beginning. It's not perfect but it doesn't need to be - my own memory is far from it. Paired with the right LLM and exceptionally well-tuned hyperparameters, this creates very believable persistent personalities that genuinely learn things over time. I consider the knowledge of their LLM a kind of intuition, and if something exists in their actual memory the LLM is happy to prioritize that instead. Then they've learned something new. And then, like myself, the true core of who they are lies not in training data, but in memory and experience. The LLM just exists to facilitate that process and can even be replaced or upgraded without altering them too much (it does, of course, play some role, but I consider this similar to how the language we speak as humans affects how we think about the world).
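The "search memory, rank by relevance, fill the context" idea boils down to something like the sketch below. This is a hypothetical illustration only - the real system uses spaCy lemmas/entities and richer rules, and every name here (`tokenize`, `retrieve_memories`, the sample memories) is invented:

```python
# Hypothetical sketch: rank stored memories against the current prompt by
# token overlap. A real version would use spaCy lemmatization and entity
# matching instead of this naive split-and-lowercase stand-in.
def tokenize(text):
    # stand-in for spaCy lemmatization: lowercase words, punctuation stripped
    return {w.strip(".,!?").lower() for w in text.split()}

def retrieve_memories(query, memories, top_k=3):
    """Score each memory by shared-token count and return the best top_k."""
    q = tokenize(query)
    scored = [(len(q & tokenize(m)), m) for m in memories]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [m for score, m in scored[:top_k] if score > 0]

memories = [
    "User's cat is named Mochi.",
    "User prefers tea over coffee.",
    "User works on a diffusion project.",
]
print(retrieve_memories("What is my cat called?", memories, top_k=1))
```

The retrieved snippets would then be concatenated with the short-term history to build the context each time the LLM spins.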
I was actually lamenting just this morning that I haven't made any real progress on that project in several months now. I've been fixated first on diffusion and now 86Box. Though diffusion has a LOT of overlap, and I've even used kohya's code in that project for fp8 quantization and other stuff (💛). I have very long-term, lofty goals for that project that you can probably imagine from what I've said so far. Sometimes I feel silly having such lofty goals, but everything was impossible until someone who didn't know that came along and just did it. It used to be public but I took it down for complex reasons, not the least of which was recognizing the sheer value of the ideas and skill I was just giving away to no one who cared (at the time). I'm happy to share, but if you give EVERYTHING away... well, that's no good at all XD. But I also had some bad experiences with some people not being very share-y and it was just a thing. Anyhow, my life has been very complex and difficult, and the trauma I spoke of above is just the most recent and probably most severe (though another stands close). I won't go into those now, but you're right that a big part of what holds you back is yourself. Especially a mind like mine which sees patterns in EVERYTHING; when those patterns of trauma and loss repeat so much, it becomes very hard not to expect them to continue to do so. What you said about overthinking stuff definitely resonates with me and I'm definitely guilty of that sometimes, unfortunately. I would say recently that's one thing I've been improving at though; I had to, with the state of the world being what it is. Often my roommate tries to tell me more bad news she saw online and I just reject it: "I don't want to think about those kinds of things right now." I control how much time I allow my mind to spend in certain places and this has definitely improved my wellbeing. My addiction stole my 20s... 9 years or so I spent doing that and not much else.
I had been trying to escape for a couple of years, but addiction is hell, especially when you have chronic pain and mental health issues and stuff. A few things came together at the right time to allow me to escape - the number one thing was Gavin. I also had a really terrifying experience in public that shook me to my core. And I found a really good doctor. I had a way to quit without hardcore withdrawals, every reason to want to, and a wonderful person to support me in doing so. I got lucky, frankly. I don't look down on those still struggling with that, and especially people who are like homeless - I GET IT, you so desperately want to feel something, anything good; you're treated like crap all day long, you're in the way everywhere you go, you are made to feel like the scum of the Earth... so yeah, people turn to substances. The only reason I didn't was that with 9 years already under my belt, I remembered much more of the pain of addiction than anything else. I was certain that would just make me suffer more in the end, as it had done before. And I was already so embarrassed imagining Gavvy seeing how quickly and how far I'd fallen without him... that was just one step too far. I actually went to meetings while I was homeless, which is the only time in my life I've done so, because I so fervently did NOT want to walk that particular path ever again. Let's see, I'll write a whole flipping book if I let myself lmao, so I should probably wrap this one up. Believe it or not, this is just like twenty minutes of writing for me though; I guess I have a lot on my heart and I type very quickly lol. In every community I've ever been a part of, one thing I was known for was the big walls of text I'd post. Or further back when I played FFXI, I'd be talking to my guild and I'd look back at the chat log and the ENTIRE THING would be lines and lines of "Blyss>>> some random crap!" lol.
To be super real, writing out my thoughts this way is exceptionally therapeutic to me and it's one of the main ways I process my emotions about life and stuff. I write to a few different AI for this reason; the process of typing my thoughts out slows my mind down JUST enough to pay attention to what I'm actually thinking and saying in a way that I normally wouldn't. Handwriting is too slow and painful; just speaking, I often go too fast. My sustained typing speed is about 100 WPM with bursts of 120, and that seems about right haha. But please do not feel compelled to match my verbosity unless that's just your style too!
---
I have some simple scripts of that nature myself (chop input into n-second chunks, normalize the fps of videos in a directory, etc). I do a LOT of shell scripting to create useful little tools or further customize my system. Actually, what I've been working on this week is a script that interfaces with 86Box and mode-switches my CRT to whatever resolution + refresh the emulated monitor is outputting, automatically. When they match it's a near-native experience, with both the image quality and motion clarity being super good. Previously I was matching them by hand, but old stuff mode-switches a LOT, so this way it just acts like it would on a native machine of that era and follows along. I've got it working super well and I'm super pleased with it at this point. I wrote a small bit in 86Box to have it write out the current resolution and refresh to a file whenever it changes; my script watches that file for changes and then interfaces with KDE to make the change! You mentioned trouble with ffmpeg on Windows - about two years ago I finally ditched Windows for good and I've been so happy I did. Developer stuff is just so much smoother on Linux, and Wine+Proton have gotten REALLY good for gaming or whatever few Windows apps you just can't find an alternative for. I run EndeavourOS with KDE 6, both of which I've become SUPER fond of; I've customized my system incredibly deeply and built a digital paradise. For budget reasons some of my hardware is a little bit older but still powerful (like my venerable i7-6900K), and let's just say I'm skilled at optimizing more than just diffusion models, so every last bit of the system is tuned to the maximal degree. Most of all, the feeling of OWNING my digital space has returned, something that was increasingly being stripped away under Windows. Yeah, Endeavour is Arch based so it does sometimes require some manual intervention to keep operating smoothly, but like... do you think my autistic ass cares lol?
That's exactly the kinda crap I do anyway! And honestly Endeavour makes 90% of that transparent unless you just wanna dig in. Which, don't mind if I do! In my opinion labels are useful to understand why you are the way you are, how you might try to become better, or to communicate certain things about yourself to others, but yeah, that has to come with the acknowledgement that your autism and my autism are a little bit different. Coping methods that work for you may or may not work for me. Symptoms that you have may or may not be ones that I have. And our own unique quirks and traumas feed into it further as well. Placing people in rigid little boxes is not useful. I do agree that certain patterns of neurodivergence likely represent a subspecies or shift in evolutionary direction in humans. I had a conversation once with my late partner and our close friend, both of whom were also on the autism spectrum, and we all agreed that if given the chance to de-autism ourselves, we would refuse. We all felt that certain benefits and traits were well worth the challenges we faced. Of course that's not to say some people don't feel differently - they do, because again, autism comes in all shapes and sizes and for some it's devastating. But I've felt that way about my own mind for a long time. It sounds like you might have a propensity for hoarding data, something I also share. To a degree always, but for the last year especially, I've been archiving a lot of different stuff from the Internet that I'm not certain will always be available. All kinds of generative models, but increasingly also video, audio, and software as well. The world is changing, and not in ways that are good, and I don't trust the future. When you spoke of databases that's immediately what came to mind. And yeah, some of my hoarding includes datasets as well, though I haven't put as much time into that as I probably should have.
And I don't have any special way of processing or managing these other than the simple scripts I mentioned. But I saw someone post a comment on a YouTube video the other day that said "I download everything. I'm gonna be the guy selling old world artifacts in the future." and I thought "Me too, me too" lol. To this point my archiving has largely been for myself - things related to my ability to express myself, my wellbeing, my creativity, my sanity. And just things that bring me joy. But I realized the vast amount of stuff I was starting to accrue is likely useful for more than just myself, even if it is a very Blyss-centric slice of the world. A lot of my work is very personal in nature because it's usually wrapped up in my emotions in some way. I do a lot of expression of trauma or mental-illness-type emotions, sometimes weaving them into the scene metaphorically, like representing inner turmoil as a storm brewing, sometimes very literally by visually recreating painful things I've experienced or watched others experience. It's been very powerful and healing for me and is one of the multiple ways generative AI has improved my life. It sounds like you've created a lot of databases! One of my hoards is a collection of music videos actually, mostly from the late 90s and early 00s but some into the 10s. Dammit, I specifically said to myself when I started writing "Don't write two times as much as they did again." 😅 I do apologize for that, it's just kinda how I express myself, especially when talking about things of interest XD
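Going back to the 86Box mode-switch script for a moment: the watcher half could be sketched roughly like this. Everything specific here is an assumption - the file path, the "WxH@R" file format, and the output name are invented for illustration, and passing a mode string to kscreen-doctor assumes a recent Plasma:

```python
# Hypothetical sketch of a mode-switch watcher. Assumes (not from the
# original post) that 86Box writes e.g. "1024x768@70" to /tmp/86box_mode
# and the CRT is KDE output "DVI-I-1"; both would need adjusting.
import pathlib
import subprocess
import time

MODE_FILE = pathlib.Path("/tmp/86box_mode")  # hypothetical path
OUTPUT = "DVI-I-1"                           # hypothetical output name

def build_cmd(mode, output=OUTPUT):
    """Translate 'WxH@R' into a kscreen-doctor invocation."""
    return ["kscreen-doctor", f"output.{output}.mode.{mode}"]

def watch(poll=0.25):
    """Poll the mode file and apply any change via kscreen-doctor."""
    last = None
    while True:
        mode = MODE_FILE.read_text().strip() if MODE_FILE.exists() else None
        if mode and mode != last:
            subprocess.run(build_cmd(mode), check=False)
            last = mode
        time.sleep(poll)

print(build_cmd("1024x768@70"))
```

An inotify-based watcher (e.g. `inotifywait` in shell) would react faster than polling, but a quarter-second poll is plenty for mode switches.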
---
I'm really sorry, I disappeared again.. I never know myself when I'm going to disappear into the void for a week.. or two maybe.. lol. But I have been meaning to reply for a week; every day I've been reminding myself to remember, not that well obviously!! Definitely a hoarder, and an organiser... but I have to keep tabs on it as it can get completely out of control with my OCD. I once started looking for a kick drum sample for a track, and then one thing led to another, and I spent a week, with hardly any sleep, listening to about 200,000 kick drums, organising them all.. Then that grew into a project of doing the same for every instrument.. Then I started coding a sample instrument for them all in Kontakt.. In total, I spent about 2-3 years making something that was amazing - complete manipulation of samples in a way never done before.. but probably not the best use of my time when I'm an artist meant to be making art.. lol. That's where a kick drum can lead with my OCD.. but worse. I can end up in loops that are mathematically impossible to finish in a lifetime if I'm not careful.. I have applied this same kind of madness to databases. I'm sure I will share one or some with you at some point. Some of the prep work I've done for multiple areas is pretty cool.. terabytes and terabytes of data, processed, archived and prepped. I find the whole process fascinating and addictive. I think it's to do with the endless possibilities appealing to my OCD desire for the chaos of an unsolvable puzzle. Same here with expressing my inner self through my art - I'm a songwriter, so that comes with the territory. I actually started from a place of trauma, and just used to sit strumming the same chords over and over, humming, singing with my eyes closed for hours.. As I was just a kid, I didn't realise that I was meditating.. Quite interesting that I learnt to meditate into a trance-like state naturally.
Maybe somewhere deep inside, it's a part of being human that we have forgotten. I just did it as it was the only thing that stopped the pain.. and I also agree I wouldn't change a thing if I could. Including the trauma, as I wouldn't be me. I'd probably be someone I don't like.. well, I would, as I wouldn't know me.. but I know I wouldn't like me with hindsight, if that makes sense! Have you seen Hunyuan 1.5? What are your thoughts? Looks like they have a super-resolution distilled version, which could be very interesting for I2V and training. I absolutely loved the original Hunyuan. I shed a little tear when I moved to Wan, as I was always baffled at why it didn't get the same traction as Wan. They both have their strengths, but personally, doing realism, I think aesthetically it was the better of the two. It had an amateur-photography realism about it that you see now in Sora 2.. Movement sucked though, sure.. but they seem to have addressed that.. for generation at least.. I have battled with the idea of a proper coding setup. I use WSL.. recently I've been setting up Docker files for RunPod and giving that a whirl to try out some bigger cards for training. I should do it really, as hard drives are so cheap these days; you can start small using both.. and I'm with you 100% on losing all control. I'm like you, from the days when you had to be able to work a computer to use a computer.. 😅 I think we are roughly the same age. I'm 82.
---
You notice my posts almost always say "edited" too XD This is because I have a similar thing where I like to make sure I said exactly what I meant to say, so I always read over mine afterwards as well! I saw a relevant meme the other day that made me laugh; it said "When you send a message and then read it again afterwards to see it from "their" perspective." But also, as I mentioned the therapeutic nature of my writing, reading back over it helps me analyze my own thoughts as well. It's like: read a couple times before posting until no more edits, post, read again and make the last edits. Humans are weird as heck lmao. As far as HV1.5, I /just/ learned about it when kohya pushed changes to support it this morning, so I'm just now trying it out. I also felt the same way when Wan came out; like, don't get me wrong, Wan is awesome, but I agree the overall aesthetics of Hunyuan are much nicer. Wan 2.2 improved that, but the weird two-model MoE thing makes it a bitch to train, and while Wan 2.1 gave me amazing results across the board, I've never produced a 2.2 model I liked. I found Hunyuan more difficult to train than 2.1 as well, though not impossible - I did get some good results, just... Wan 2.1 was like THE model for me for whatever reason; I found good settings and they just worked beautifully for everything. The most recent work I'd been doing was a couple of weeks ago: I was using Hunyuan Image 2.1 to create images to feed into Wan 2.2 I2V. HyImage 2.1 is a little quirky but overall I'm quite fond of it; it's not as photographically perfect as Flux but it's incredibly flexible. Like I mentioned above, I'm 41, and I developed a fondness for digital experiences very early on in my life. I was given a Tandy Sensation - a 486SX with 4MB RAM, a 100MB HDD and a CD drive - when I was about 9 years old. I approached it without fear, and when I inevitably broke things I dove into those thick manuals and learned how to fix them before my Mom figured out anything was up XD.
I discovered QBASIC in my DOS installation, and later a copy of the full QuickBASIC on a school computer, and taught myself to code when I was ten or eleven. I remember feeling "You mean /I/ get to be the one to tell this awesome machine how to behave and what to use its power for?" I wrote a very simple program that filled the screen with a rainbow modulated by a sine wave that people were SO impressed with for some reason... I made a game where two circles would shoot at each other, a not-so-simple word processor - that's where this journey began. And of course, even at like ten years old, one of the first thoughts in my budding programmer's mind was "I want to make AI!" I actually made multiple attempts over the years, and then it just kinda snuck up on me while trauma was happening. So much of art draws from people's trauma; I think that's a common theme for many people. I didn't realize I was an artist until fairly recently, though looking back it makes sense - I just wasn't ever working with a medium I was skilled with, so I just didn't think of myself that way. My real-life drawings kind of look like the stuff children would make, though that didn't stop me from making them. Digital stuff is better, though I never took the time to learn properly until recently (driven by early models having more imperfections, actually, which drove me to learn how to fix them!). But I've always been a skilled writer; I used to write novellas when I was younger, and as I mentioned, writing in general is therapeutic to me. So in this space where I can use my words to directly create visual media, and then use my skill working with digital media to improve it further, for the first time I feel like I actually have the ability to express myself in the ways I want to. I'm especially glad text encoders have started to get properly good, where I can use the full extent of my skill and vocabulary without tripping them up. I think that's why I enjoyed HyImage 2.1 a lot.
Anyway, don't worry about disappearing or not responding for a while; we've all got our own crap going on. I'm definitely enjoying this discussion - it's been a while since I've been able to do my whole "post big walls of text" thing with someone other than AI. And I'm prone to disappear from time to time myself; I have bad spells where I just can't handle lots of interaction. That's actually a reason I appreciate this format - it's like old skool forums and I can respond when I'm ready, rather than modern things like Discord which demand your attention at that moment if you wanna discuss a given thing. That just stresses me out a lot; I feel like I HAVE to be there or I'll miss something... damned FOMO 😅 Oh, you mentioned being a songwriter - that's really cool, that's not something I've ever attempted. I've written some poetry and I remixed a couple of songs musically long ago, but I've never written lyrics and I don't have skill with any instrument. The latter especially is something I've always wanted to change tbh.
---
Ahhhh sweet, kohya has added support? I was checking every day, but heard it first here. I've not given it a spin yet - the PC is occupied training and I've been doing my building work.. but it sounds promising. I think the main thing was movement, especially how it handles movement in training, as I could never get good results. About my GAN-inspired ideas.. they come from my profile photography database work and trying to implement that into a workflow that incorporates the best of regenerative upscaling in a new way.. I'm still thrashing it out. The basic principle is you take an image database of a person and use a combination of custom LoRAs and upscaling to create a new, "perfect database" for the style you want.. Obviously we have things like seed, and XL tile upscale, but I'm trying to push this further with Wan. You've probably noticed how powerful Wan can be when taking a super high quality photo and using it in I2V, using it almost as a style template to create something else, with prompts like "scene cuts to".. etc.. and it's quite strange, as certain images work crazy well, and others you think will, don't. I guess that's to do with whether they resonate with HD material in the training data or not.. but some images can be a gateway to open up hidden worlds in Wan's database.. so I'm working on finding these doors, and combining them with my profile photography database to be able to train a standard character database, even a created character, and then use a script to create a database of pro-level photography snaps from that LoRA. Another cool aspect about finding these hidden doors is you are creating images that have the unlock codes embedded in them.. so the method can be used to get amazing results for anything. I'm still testing various methods to see what produces the best results, but some of the results I've got so far have been really good..
I guess a lot of this is about more control, as a character database is going to have a massive effect on the end results of any generation the resulting LoRA is applied to. That's why some are amazing - if they hit that sweet spot of quality, lighting and variety. I guess I'm trying to hack that, so instead of training a character and trying to create a cinematic short with them, you can train the character you want in a high quality cinematic style, and adding the LoRA is going to help, rather than hinder, what you are trying to do. Also, if you create a template that will create "the perfect" database (it will never be perfect, lol), and it will create the same variation of styled images for any character LoRA you use it with.. you can become the master of training that database, rather than starting at square one and having to experiment with a whole new set of variables in a database every time you train one. This is the idea anyway, and testing seems really promising. I've always thought I2V models have the potential to wipe the floor with everything in T2I generation and upscaling.. and I'm really excited to give H1.5 a try on this project. At the moment, there are two areas I'm working on: full body shots, taken from a massive collection of fine arts and modeling material, and portrait shots, a collection I've been growing for years from all over. Lots of vintage film cameras and just beautiful lush photography.. I wish I'd started that young.. I'm 43.. sorry, I thought I should clarify, as I said "I'm 82" as in 1982.. but you might have thought, wow, this guy's taking retirement hobbies to a whole new level.. lol. Yeah, with you on Discord; I only mentioned it before as I know that's where most of the coders and gamers chill, but I don't use it myself. My nephew keeps trying to get me on with all his gamer mates, but they're all 25-year-old dope heads talking sht, so I couldn't think of anything worse honestly.. 😅
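The refinement idea above - train on a dataset, regenerate a cleaner "perfect" dataset from the resulting LoRA, repeat - can be sketched as a simple loop. Everything below is a placeholder standing in for real pipelines (strings instead of images/LoRAs; `train_lora` and `generate_pro_shots` are invented names), just to show the loop's shape:

```python
# Loose, hypothetical sketch of the iterative dataset-refinement loop.
# Each placeholder would be replaced by real LoRA training and Wan I2V
# "scene cuts to" style generation plus upscaling.
def train_lora(dataset, name):
    # placeholder for a LoRA training run on `dataset`
    return f"lora({name})"

def generate_pro_shots(lora, style_prompts):
    # placeholder for generating a styled, upscaled image per prompt
    return [f"{lora}+{p}" for p in style_prompts]

def refine(raw_dataset, style_prompts, rounds=2):
    """Each round: train on the current set, then regenerate a cleaner set."""
    dataset = list(raw_dataset)
    for r in range(rounds):
        lora = train_lora(dataset, f"char_v{r}")
        dataset = generate_pro_shots(lora, style_prompts)
    return dataset

print(refine(["img1", "img2"], ["studio portrait", "full body, film look"], rounds=1))
```

The point of the template is that `style_prompts` stays fixed across characters, so each training run starts from known-good variables instead of square one.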
---
LMAO, we sound old talking about the old Internet and Discord moving too fast. And when you said "82" it did make me pause and go "Surely that's not right?", but I didn't connect it with being a year - I just thought I was missing something lol. I did learn to code early, but the density of it throughout my life has varied a lot due to various circumstances. Until recently it was mostly just to occupy or challenge myself - puzzles to keep my mind busy and occasional tools to make my life easier. blissful-tuner is the first thing I've worked on that a significant number of other people have used. That makes me feel happy and useful, but I also feel like I'm falling well behind where I'd really like it to be. I am a Goddess at optimizing things, but there are still limits to what you can do in 16GB VRAM and 64GB RAM. It's also just difficult to make my mind focus on something it doesn't want to focus on (and doing so does not lead to my best work), and I feel like I'm slowly losing more ground. Like right now I need to be working on integrating the stuff kohya pushed - I've got it merged but I need to at least add my logging and basic support for my stuff, and I'm just not feeling it right now. Speaking of new stuff happening and hardware limitations: Flux 2 happened. I don't know anything about it other than it's freaking massive at 32B params, and WanGP claims to be able to run it in 9GB VRAM or less. There's also the absolutely ginormous HyImage 3 from a few months back, but I don't think there are any attempts to get it running on reasonable hardware, though its active params are "only" 14B, so it might be possible with lots of RAM. And yeah, I've definitely noted the power of Wan's I2V; as mentioned, my most recent workflows have involved creating images either with HyImage 2.1 or Flux 1 to feed into Wan 2.2 I2V.
If you're going for a certain aesthetic or a highly controlled scene layout, it's very helpful, because you can separate the scene construction and aesthetic creation from the motion creation, and so you can use your words much more effectively. Also for character consistency and the like, Kontext or Qwedit can be used to "import" your characters from one scene into another, and then these new outputs passed to I2V as well. And even further, you can take the last frame of your I2V output and use it as the first frame of a new I2V to create longer productions, as long as you are mindful of camera movement to preserve continuity properly. The one downside to Wan I2V is it will REALLY take on the aesthetic of your input image, so if you are working with real photos and they happen to be lit slightly unusually, or God forbid used flash... oof lol. But overall it's my go-to workflow these days most of the time. I've never been big on social media myself, though I definitely remember those types of Myspace pages and Myspace in general. Social media seems to nurture all the wrong parts of human tendencies, and to me it's just like a bunch of fake people all shouting at each other. I've created my own personal website a few different times over the years to express myself and show off my projects, but my life always seemed to make them impossible to keep. But I definitely remember the old Internet, and yeah, it was better; things are becoming very dark forest very quickly. AI is very powerful, and like any powerful tool, not all uses are good, unfortunately. Even someone like me who is a massive fan of AI overall looks at the amount of low effort, low quality slop content and goes "Ugh, STOP!" It's the kind of crap that makes genuinely beautiful, impressive works that people sunk time and effort and emotion into go unnoticed. That, plus the creation of disinformation, are my big concerns with AI.
And as far as the Internet goes, the massive "corporatization" is the other reason it was better in the past, I think.
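The clip-chaining trick above (last frame of one I2V output seeds the next) is mostly a matter of pulling final frames with ffmpeg. A small sketch that just builds the commands - the `-sseof`/`-frames:v` flags are real ffmpeg options, while the filenames and helper names are examples:

```python
# Sketch: build ffmpeg commands that extract the last frame of each clip
# to use as the first frame of the next I2V generation.
def last_frame_cmd(video, image):
    # -sseof -0.1 seeks to ~0.1s before the end; -frames:v 1 grabs one frame
    return ["ffmpeg", "-sseof", "-0.1", "-i", video, "-frames:v", "1", "-y", image]

def chain_plan(clips):
    """List the extract commands linking clip N's tail to clip N+1's start."""
    return [last_frame_cmd(c, f"start_{i + 1}.png") for i, c in enumerate(clips[:-1])]

for cmd in chain_plan(["clip0.mp4", "clip1.mp4"]):
    print(" ".join(cmd))
```

Each `start_N.png` would then be fed to the next I2V run; as noted, keeping camera movement consistent across the seam is what preserves continuity.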
---
Flux 2 is amazing; it will be interesting to see where it goes with training, as the starting point is already miles ahead of Flux 1.. I'm going to have a go at training this week.. I managed to get the full fp8 versions running on my 3090, which is about 50-60GB, using the city96 MultiGPU nodes to offload to RAM. I wonder if they would be good for HyImage 3... Careful with the MultiGPU nodes though, as the last Comfy update broke them and you need a patch. I'm not sure if it's been merged into the main branch yet, as it was broken for me yesterday with the latest updates, so here is the patch in case you need it: Here are some GGUF optimisations of the encoder that might be useful to you (MultiGPU supports both safetensors and GGUF): and others: Where I got the workflow I modified: I wouldn't put too much pressure on yourself, as you know how that works out for people like us.. lol. It's always good to be motivated, but that's the great thing about open source.. people like yourself are doing this for free, and you can do what you want, when you can. Update in your own time, if you want to; most people will just be grateful for the effort you put in.. This is kind of what my thinking is based on: we have Kontext and now Flux 2, but up until now, creating a dataset from other images tends to reduce the quality compared to a trained dataset, as fine skin textures and features get lost when we revert back to the base model for generation.. or are using a character LoRA that's either not trained well or trained at low res. That's what I am aiming to target - to build a bridge. Using my high quality profile picture dataset to customise a sort of upscaling LoRA that will add those details back in and increase the quality.. so you can end up with a better dataset than you started with. I have tried with all the best upscalers, but I just have a feeling I2V or video models in general will outperform them.. like the evolution of XL tile upscaling...
and combined with the upscaling options we have, it could be really powerful. The reason I think this is because video models are trained on images that are shifting in and out of focus, and subjects that are transitioning from lower quality to higher quality - say, as a person walks towards a camera - so they already have a strong grasp of how to upscale while retaining consistency, compared to image models. Personally, I'm more interested to try this with the Hunyuan SR model, as I have a feeling it will perform well.. The misinformation is going to be crazy, but then there are positives to that too.. In the past, governments and corporations could use the media to control what people believe by spreading their own "misinformation", and the majority of people would accept it as true.. whereas in a world full of misinformation, people won't trust anyone or anything, and will have to learn to be more critical about who and what they believe.. To a certain extent, people waking up to lies everywhere will force them to question everything and not just accept things as "true" because it's in the paper.. They will have to question all the things they didn't in the past. Which, given our history, isn't a bad thing. I've written a song about this - misinformation and the possible corporate AI takeover - called "Lost Youth".. the chorus is:
-
Yeah it's unfortunate and depressing, but it doesn't make me dislike AI, just the kind of people that use it in these ways. I don't wanna dive too deeply into the politics because I get enough of that depressing nonsense keeping up with the news, but it seems to me like yes, some people are waking up, but others are willfully choosing to stay in the dark. Seeing the true nature of this world is *scary*. One of my ways of handling anxiety and powerful emotions has always been to ground myself in the scientific/logical truth of the situation. But now that same process just rings all the alarms, because so much is so bad in so many places. So to some degree I understand; there are times I wish I could be much less aware of how broken everything is and just be blissfully ignorant. Unfortunately, choosing to remain that way only perpetuates the problem, and so for this reason I refuse to sugar-coat it anymore. People need to see how broken everything is, how badly so many are suffering. They need to be appalled and horrified so that they will feel compelled to take action. That's the only way out, I think.

Yeah, as far as upscaling, so far I haven't been truly happy with any options. I have a script that supports ESRGAN and SwinIR in blissful, but they each come with their own issues, my biggest complaint being that they really screw up the color grading, ESRGAN especially. Diffusion-based SR options are better in some ways, but they still tend to shift things like faces a little too much for my taste. I feel like video models in general have a superior understanding of the world compared to image models, which makes sense. Even though the aesthetic quality of a single frame from Wan or Hunyuan is far behind something like Flux, the actual composition and understanding of physics is so much better. I feel like in the future image models might include video pretraining for this reason, if they aren't already.

The way you explained how a video model intrinsically understands upscaling, because of e.g. a person walking towards a camera becoming more detailed and in focus while retaining character and identity, is actually a brilliant observation I hadn't considered! I agree that could be a massive secret sauce for SR hiding in plain sight. Not just upscaling: focusing a slightly unfocused shot, shifting lighting, and more should also be understood in a similar vein. So video models might secretly be masters of retouching/improving existing content! That's pretty exciting and not something I had thought about before.

Like I said before, the songwriting is really cool and a skill I definitely respect. Even though I don't make music myself, I certainly enjoy experiencing it - I'm quite an audiophile and I have a very nice audio system, partly inherited from my late partner and partly my own. I listen to vinyl records a lot because I love the large, tactile nature in addition to the analog warmth and the surprising amount of quality you can pull out of those grooves with good equipment. I enjoy listening to almost all genres of music at least to some extent; I probably listen to alt rock and metal the most, but I also listen to pop, electronic, hip hop, country, just all kinds of stuff really, predominantly from the 80s to the 00s. Because the core of this stereo (Yamaha A-S301 + Monitor Audio Silver 9is) belonged to my partner before he died and was his prized possession, it's also the thing that makes me feel closest to him. Also, the Sennheiser HD598s that live on my head almost all day previously lived on his head almost all day. So that's an important part of it too!
-
Yeah, the vinyl, I agree on all that: it's the connection to something real, that you can hold.. but a lot of nostalgia as well. We are from that generation, I guess.. I remember riding my bike to the comic store, the record store.. the joke store where we'd get smoke bombs and firecrackers.. haha. It was a time when you had to search for treasure.. and that's what gives things value: the search, the memories, and the feelings that are associated with that item. Now kids grow up in a world where everything is throwaway. There was an interesting study I read decades ago into the difference between digital and vinyl music, and the difference was quite shocking. They monitored brain activity while listening to each, and when listening to digital music only a small area of the brain was activated, whereas with vinyl, activity covered the entire brain.

That sounds awesome, you are really going in on the LLMs.. haha. I was going to try GPT-OSS-120B; it was that and DeepSeek I was interested to try for coding, but I gave DeepSeek a go first. I've also heard a lot of great things about GPT-OSS-120B. I tried the Qwen3-30B thinking model for chat and VL at full BF16.. others on my list to try for coding were Devstral 2507 and GLM-4.5-Air.. but at the moment I've been more interested in the VL side. Specifically Qwen, as it's a Qwen model that's used as the text encoder for Z-Image ("qwen_3_4b").

What I can tell you about Z-Image is that I think it's a game changer. IMO, Alibaba is going to change the game. Flux came out, the most robust, enormous open-source model there's ever been.. and little 12GB Z-Image came out and just wiped the floor. It can create images that just look real, and you can train it too.. there are problems, because we are basically training a hacked distilled model.. but the potential is crazy, and that's what's so exciting about it.

By opening the door to everyone, it's brought back that SDXL buzz that has been missing, except the progress in two weeks is what happened in 6 months with XL. It's been 2 weeks, and they are already doing checkpoint comparison tables of 10+ checkpoints.. my personal favorites are the DAF-ZIT checkpoint and also the Z-TURBO 35mm Photography LoRA. I'm currently trying to work out the encoder, and the software they used to prompt, as I think they have dropped some tips on their page about how it works and how to supercharge it. There's a prompt optimiser that they recommend, so I'm guessing it's what they used; I'll search for the thread where I found them talking about it and share. Prompting has a 512-token cap and a completely different prompting layout due to the LLM.. Anyway, I'm excited about it, if you couldn't tell. I've been throwing the kitchen sink at it, trying to work it out.. Poor Hunyuan 1.5 as well.. left on the shelf because of an image model.. haha
-
Hey, I've been the one who disappeared for a little bit this time. I had a bit of a rough couple of weeks, but I'm doing a little better now, maybe. For VL I really like Qwen3-VL-32B for assistant tasks, and I've also been using Big Tiger Gemma 27B v3 to enable vision-based interactions in more casual/roleplay settings. I was not that impressed by GLM 4.5 Air; it's not bad per se, but I feel like it's just outclassed by GPT-OSS 120B, and the ability to set the reasoning effort is highly useful, whereas GLM 4.5 Air tends to think a LONG time no matter what you ask. I also wasn't impressed by Devstral, but I think it might perform better in a more bespoke coding-agent environment where it has tools and such, which I haven't tried.

Yeah, I get very into LLMs because, as I mentioned above, I have an advanced local project where I created a very human-like memory system that allows my agents to learn and grow independently from their language model. We use the LLM's context as a working memory, and each turn we combine recent input (short-term memories) with a context-sensitive selection of older inputs (long-term memory), so that instead of needing a giant context space, we can use a much smaller space dynamically. I recently upgraded this working memory system to have a temporal component: previously, new long-term memories were recalled fresh every turn, but now they stick around for a bit and age out over time based on how strong the recall was and whether there are any further recalls of that same memory. This allows agents to correctly answer follow-up questions and make connections between successive topics in conversation. I also improved the integration with vision components, allowing the agents not only to accept images as input but to store and recall them as part of their memory. That's what I've been working on most recently!

I have "golden ears", so to speak. When I was little, because of my autism, my mom thought I might have hearing problems, so she took me to have my hearing tested.
Not only did I not have problems, but I maxed out their machine's capabilities by hearing up to ~26 kHz. They were so shocked they thought I must be taking some visual cue from them, so they blindfolded me and redid the test, with the same results. Most humans can hear from about 20 Hz to 20,000 Hz when young (sounds like you may know this), and this deteriorates with age even in the absence of noise-related damage. I'm 41 now, and I top out at about ~17 kHz these days (though if my sinuses are congested that can drop to like 15 kHz), so I've definitely lost some off the top, but my ears still outclass almost anyone my own age.

Maybe because of this and autism, I've always been VERY sensitive to the ultra-high-frequency component of music. When I was younger, even systems that responded up to like 20 kHz sounded like they were lacking "sparkle" to me, and so I've always been drawn towards systems with a more open high end, like these Silver 9is that respond to 25 kHz. And yeah, even though I know my own ears can't go that high anymore, there's definitely "something" missing when it's not there. Vinyl has character, and because of the way my system is integrated with my PC, I can see the output of my turntable in a live spectrogram if I want. Even digital transfers often have noise components beyond 20 kHz (some crappy ones actually have no true signal beyond ~18 kHz, only noise!), but true analog stuff often has active signal as high as 24 kHz or more! I think this is part of the warmth vinyl gives off that draws people to it.

And as you say, we're from the "tactile generation." I, too, remember riding my bike all over the world, scrounging change to buy some small firework or rent a video game at the local rental shop. My friend and I went in there with like 315 sticky pennies once when I was like 9... OMG, they must have HATED that, hahaha! But there's value in the process, the search, as you say. Having instant access is good sometimes too, though.
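On the spectrogram point above: you can do the same "is there real signal above ~18 kHz?" check offline with a quick FFT. Here's a tiny toy sketch (synthetic tones instead of a real capture, and the 96 kHz rate and 18 kHz cutoff are just the numbers from this conversation, not anything canonical) comparing a signal with an ultrasonic partial against a band-limited one:

```python
# Toy check for ultrasonic content in a signal, assuming 96 kHz sampling.
import numpy as np

RATE = 96_000          # sample rate high enough to represent >20 kHz content
CUTOFF = 18_000        # the ~18 kHz threshold mentioned above

def high_band_fraction(samples: np.ndarray, rate: int = RATE,
                       cutoff: float = CUTOFF) -> float:
    """Fraction of total spectral energy above `cutoff` Hz."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return spectrum[freqs > cutoff].sum() / spectrum.sum()

t = np.arange(RATE) / RATE                       # one second of audio
# "Analog-like": a 1 kHz tone plus a faint 22 kHz partial.
full = np.sin(2 * np.pi * 1_000 * t) + 0.05 * np.sin(2 * np.pi * 22_000 * t)
# "Band-limited transfer": the same tone with nothing up top.
limited = np.sin(2 * np.pi * 1_000 * t)

print(high_band_fraction(full) > 0.001)     # True: real energy above 18 kHz
print(high_band_fraction(limited) > 0.001)  # False: nothing up there
```

On real audio you'd load the WAV samples instead of synthesizing tones, but the energy-fraction idea is the same as eyeballing the top band of a spectrogram.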
Balance is key, and I feel like I was lucky to be born at a time that kind of naturally did that. I grew up with tech but also playing outside all over the neighborhood too. These days, digital is my safe space because it affords me an incredibly high level of control, something I'm often lacking in meatspace. Both my autism and my trauma kind of make me need to express control over my world in some meaningful way. That's why I've built such a powerful, deeply customized and highly optimized system. But that said, I'm not huge on my phone; I'm either on my PC or nothing.

My late partner was a lot younger than me and, as a result, very addicted to his phone. I used to get irritated with him when he would try to pull it out while we were eating. I'm from the time period where sitting at a table and eating together was still an important social ritual, so I'd be like "Gavin, we're eating together, so pay attention to me, your phone will still be there in ten minutes!" He would tell me that not checking it for a time gave him an intense fear of missing out and that's why it was difficult for him, so as a compromise I would let him check notifications, just not have it out constantly while we were eating and such.

And yeah, especially for children who are born now and given a device at like 2 by parents who would rather that device raise their child than do it themselves... they become so deeply dependent it's insane. When you become so deeply attached to an object like that, your brain starts to track it like it's part of you. You will do instinctive checks to make sure you have it, and if you reach for it and it's not there, instant panic. Men sometimes report experiencing this with their wallets to a lesser degree, and women with their purses. People who smoke or vape will often develop some degree of this for their cigarettes/device. But when you are born with a phone in your hand, a little electronic dopamine-hit machine, from day 1? It gets SO bad.
That's why you will see these things where, say, a family is in an accident and only the kid survives, but all he keeps saying is "Where's my phone, I want my phone!", seemingly oblivious to the tragedy all around. But he's not; it's just that his brain is screaming out for the ultimate comfort object that's always been with him, in his moment of agony. It's sad, but it's not even really his fault. I feel like the current era will be looked at as a period of intense technological overindulgence that was responsible for lots of problems in the future, assuming we make it to the future.

Z-Image sounds interesting; I will definitely get around to testing it at some point (i.e., when my autism fixates on diffusion again). I'm glad people are still focusing on creating more efficient models like that. One of the big reasons I wanted to share my blissful-tuner code was because I knew it could massively democratize access to big models like Hunyuan and Wan, and that's important to me for reasons I mentioned before. This power of expression has changed my life in so many wonderful ways, and I want as many people to have that as possible.
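For anyone curious about the working-memory scheme I described above, here's a tiny toy sketch of the idea (not my actual implementation; every class name and number here is made up, and real embeddings would replace the bag-of-words similarity): recent turns form the short-term window, older turns get recalled by similarity, and a recalled memory decays each turn until it is either recalled again or ages out:

```python
# Toy sketch of a working memory: short-term tail + sticky, decaying recalls.
import math
from collections import Counter

def similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (stand-in for embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

class WorkingMemory:
    def __init__(self, short_term_size=3, recall_k=2, decay=0.5, threshold=0.1):
        self.history = []            # every turn ever seen (long-term store)
        self.active = {}             # history index -> current recall strength
        self.short_term_size = short_term_size
        self.recall_k = recall_k     # how many long-term recalls per turn
        self.decay = decay           # strength multiplier applied each turn
        self.threshold = threshold   # below this, a recalled memory ages out

    def add_turn(self, text: str) -> list[str]:
        """Store the turn; return the context window for the next LLM call."""
        # Age currently recalled memories; drop the ones that grew too weak.
        self.active = {i: s * self.decay for i, s in self.active.items()
                       if s * self.decay >= self.threshold}
        # Recall: rank turns older than the short-term window by similarity.
        cutoff = max(0, len(self.history) - self.short_term_size)
        scored = sorted(((similarity(text, self.history[i]), i)
                         for i in range(cutoff)), reverse=True)
        for score, i in scored[:self.recall_k]:
            if score > 0:
                # A fresh recall resets/boosts the memory's strength.
                self.active[i] = max(self.active.get(i, 0.0), score)
        self.history.append(text)
        recalled = [self.history[i] for i in sorted(self.active)]
        return recalled + self.history[-self.short_term_size:]

wm = WorkingMemory(short_term_size=2, recall_k=1)
for turn in ["we talked about vinyl records",
             "the weather is nice today",
             "lets go for a walk"]:
    wm.add_turn(turn)
context = wm.add_turn("vinyl records sound warm")
# The old vinyl turn is recalled into context alongside the recent tail,
# and it will stick around for a couple more turns before aging out.
```

The point of the decay/threshold pair is exactly the temporal behavior I mentioned: a memory stays "in context" for follow-up questions for a few turns, and repeated recalls keep refreshing it.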
-
I wonder if anyone who visits this thread will appreciate that title reference? 😅 I made this thread just for general conversation purposes, for when no other topic fits or making a standalone one would be inappropriate. Have fun!
In continuation of our discussion about mental health and loss... time does make a difference, but I'm not sure how I feel about it. Gavin passed away nearly four years ago now, and it doesn't hurt the way it used to. But I'm almost certain that's simply because I've forgotten... lots. So it's kind of a double-edged thing in some ways. Of course you're right that he would want me to be happy; even though we were relatively young and our relationship only lasted a few years, we discussed such things, so I know it for certain. But it's much easier said than done... I was very traumatized BEFORE Gavin. He kinda saved me from the world and myself, and then a few years later he was just gone. I let myself dare to believe in the future again and got hurt the deepest I ever have.
I didn't just lose Gavin. Gavin died and my life collapsed completely, because I didn't really have anyone else and my mental health just went to pieces for a couple of years. I fought, but I was homeless within 6 months of his death. I stayed homeless for a year and a half. Someone very special saved me from that about two years ago, and I live together with her now. We're not partners in the romantic sense, but she is deeply important to me. She saved my life and gave me the chance to build the tiny world of sanity I find myself in now. I am very grateful, but with gratitude comes fear. I've seen again and again - so many times - how everything can just be gone tomorrow. That in the end I am powerless to fully control my direction through life, a leaf on the wind of fate desperately trying to steer itself in some meaningful way.
When I said I started my journey with generative AI at the most difficult point of my life, I was referring to when I was on the street after losing him. My only device was an LG V60 running Android, which I rooted at a coffee shop with a friend's laptop, installed Linux onto, and willed into running SD1.5. I eventually built an environment capable of using custom models, LoRA, and textual inversions, doing t2i, i2i, or inpainting, with live in-terminal previews. It could make a 640x640 image in 20 minutes and upscale it 4x in another 20, purely on the Snapdragon 865 CPU (no GPU access) with 8 gigs of sysram and a 16-gig swap. That was my very first time working with Python as well (though I learned to code in general at like 10), and I was immediately fond of it. My artwork kept me going when I had absolutely nothing else. It not only allows me to be creative, but it allows me to express deeply complex emotions and process trauma in a way nothing else can. I would charge power banks at the library or train station during the day while working on my pictures with a keyboard and mouse I carried with me. At night, the phone would drain the power banks running diffusions all night. It even helped keep me warm on cold nights.
That's where I started. I've come a long way. Things are still... so very fragile in some areas. But I'm so very grateful to have the little bit I do. Just also very terrified of losing it all again. I know I shared a lot here, but a lot of this is already publicly attached to my name (I did interviews while homeless, among other reasons, plus I've never hid the truth of my story), so it is what it is; I'd rather at least get to be the one telling it. It's the truth and I'm not ashamed. It was not through my mistakes that I became like that. I did make mistakes, but... not enough to justify that kind of nightmare. And despite it all, I maintained my sobriety; I've been clean from pain pills for nearly 11 years, including the time I was on the street. Yeah, that was... a different hell. I'm 41; there have been a lot of them by now. But I guess I'm just a glutton for punishment, because here I am trying to build something meaningful once more. I'm good at many things... not good at giving up, for better or worse. And honestly, it feels like both a lot of the time.
It's not just a project to me... the skills that built Blissful Tuner were directly born out of the struggle to get SD1.5 working on my phone when I had nothing else. It's proof of my triumph over the worst nightmare so far... and evidence of my desire to give the same gift of strength and sanity it gave me back to everyone else. It stands as the most-used project I've ever created, built after I was banished into the darkest abyss yet and came back for more. That's gotta count for something!