It’s not possible to remove bias from training datasets at all. You can maybe try to measure it and attempt to influence it with your own chosen set of biases, but that’s as good as it can get for the foreseeable future. And even that requires a world of (possibly immediately unprofitable) work to implement.
Even if your dataset is “the entirety of the internet and written history”, there will always be biases towards the people privileged enough to be able to go online or publish books and talk vast quantities of shit over the past 30 years.
Having said that, this is also true for every other form of human information transfer in history. “History is written by the victors” is an age-old problem when it comes to truth and reality.
In some ways I’m glad that LLMs are highlighting this problem.
Part of this is a symptom of support demands from users. There has been an expectation in software development historically, dating back to when software was always hideously expensive and limited to companies as users, that errors would be fixed by someone on demand, ASAP. We’re all familiar with the IT guy’s “file a ticket first” signs in offices, or the idiot executive’s demands for a new computer because they filled theirs with malware somehow.
But now a lot of what software did is web-based and frequently free or freemium, yet the customer’s expectation of having their issue fixed ASAP remains, despite the internet being far from a standardised system of completely intercompatible components. So updates and fixes need to be deployed continually.
And that’s great for most people, until that expectation extends to the creation of new features, from management and end users alike. Then things start getting pumped out half-finished-at-best because you can just fix the MVP later, right?
We’re going to get to the backlog sometime… right? We don’t need to keep launching new features every quarter… right?
No problem, you got lucky that my brain is mostly working today and that this topic is in my very niche wheelhouse.
It also really grinds my gears that there is no documentation I have found for the actual full list of commands in VA, and that I’ve had to discover some of them by trial and error. As though user interfaces weren’t hard enough for people who use alternate input methods already. None of this should be this hard for anyone.
I hear you. DSP does not pay anywhere near enough, especially with how much medical stuff costs these days and close to no bulk-billing doctors left. And physical stuff like even basic switches is ridiculously expensive, even more so with the NDIS rorting that companies are doing. The physical and socio-economic accessibility of assistive tech is something I’ve been angrily thinking about a lot lately.
Ok yeah, I’m beginning to understand your problem.
Firstly, I haven’t opened Voice Access in a while, but this is definitely worse than I remember. Some of it seems to be Android’s fault, some of it seems to be Samsung’s, but there is definitely some bad behaviour going on, especially with magnification. It looks like Samsung’s menu and overlay implementations are not working properly with Voice Access’ magnifier, and the “show numbers” and “show labels” commands are all over the place. When using Voice Access’ screen magnifier, they have also allowed swiping to occur off-screen when zoomed in… so sometimes it swipes in the wrong place because it’s trying to do it from the centre of the edge, and sometimes you see nothing happen because that edge of the screen is out of view… Shonky work.
The good news is there’s probably some workarounds for this. I’m constantly using grid mode (“show grid” / “hide grid” / “tap <square_number>” / “more squares” / “fewer squares” / “swipe <direction> <square_number>” / “scroll <direction> from <square_number>”, etc.), which reliably limits gestures to the parts of the screen you can see and overrides most menu, website and other interface items.
It also looks like turning on Phone Settings > Accessibility > Interaction and dexterity > Voice Access > Settings > More Options > Show Borders might work a bit better for the “show numbers” and “show labels” problems with screen magnification. It looks like I sometimes have to hide and then show them again after zooming in or out to get them to recalculate their positions, but it’s better than trying to navigate with borders off.
duplicate commands
I did just have that happen. It looks like it might have to do with CPU or memory consumption; things seem to slow down while VA is going. You might have better luck if the apps you’re not actively using are fully closed, and you don’t have 5000 Chrome tabs open like I do. This will definitely present issues for screen recording in addition to VA, along with other heavier tasks.
I also get lots of strange site-specific bugs, like on Mastodon I have to say my post in a single take, because saying a second sentence will clear what it already wrote. There’s been way too many site-specific bugs to list, but I run into them often.
Site-specific bugs are usually because people are shit at accessibility. If you can send me an example link where I don’t have to have an account or log in, I’ll see if I can debug this one for you and get an issue opened if there’s something they’ve done wrong. Site-specific bugs can sometimes be worked around with a different browser too, though; browsers are also shit.
Also, it understands me very poorly. I’m a native English speaker with an Australian accent, for context, and it really struggles with understanding me.
You’re not fucking wrong, it keeps thinking I’m saying “shore” instead of “show” and “top” instead of “tap”. This was not this difficult previously, even in a noisier place. I am not sure what is going wrong here, but I can see that the settings for the various language interface things are all over the place; it might be something buried deep in a menu somewhere. If you’re able to speak really slowly and precisely, that will help, but I have no speech impediments and it’s fucking driving me nuts. Make sure there’s nothing rubbing against the mic or touching it, too.
This is truly a painful UX experience, I’m sorry. Let me know if there’s anything else I can maybe find an alternate action for that’s more reliable, this is ludicrous.
Oh, you’re also Australian. Yeah, that explains part of the potential accent-recognition problem then. I never tried with an external mic, but even in a very quiet room it wasn’t always 100%. I found that the “Use verbs” setting was helpful to address some of that, because it limits the potential dictionary-matching results. If you know you have shonky WiFi too, that can play a part.
Also, make sure in your phone settings under “General Management” then “Keyboard list and default” that you have set your Google Voice Typing language to Australian. I still have to be real slow and deliberate, but it’s a bit better.
No worries on the expensive part, I hear you. It’s shameful that the support for assistive tech on Android cuts off right around the price point that the people who need it can actually afford.
I’m not a full-time voice assistive tech user, but I have some experience in this area. Can you tell me which gestures you’re missing, which functions you’re unable to activate, and what unexpected behaviour you’re seeing? Is it specific apps or websites, or all of them?
Voice Access does have a few settings which have helped me make it a little more reliable (it does not like my accent sometimes, especially if there’s background noise), but without knowing more about the specifics it will be hard to tell whether there are easy fixes or workarounds for the issues you’re having.
it utilizes the power of attention mechanisms to weigh the relevance of input data
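For anyone playing along at home, “attention mechanisms to weigh the relevance of input data” is just describing bog-standard scaled dot-product attention, same as basically every other transformer. A toy sketch of the idea, purely my own illustration and nothing to do with Meta’s actual code:

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # Score every query against every key, turn the scores into weights
    # with a softmax, then return a weighted mix of the values.
    # That is all "weighing the relevance of input data" means here.
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ values

# 3 made-up "tokens" with 4-dimensional embeddings
x = np.random.rand(3, 4)
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)
```

Every input gets scored against every other input, and the output is a weighted mix. That’s the “power”.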
By applying a technique called supervised fine-tuning across modalities, Meta was able to significantly boost CM3leon’s performance at image captioning, visual QA, and text-based editing. Despite being trained on just 3 billion text tokens, CM3leon matches or exceeds the results of other models trained on up to 100 billion tokens.
That’s a very fancy way to say they deliberately focussed it on a small set of information they chose, and that they also heavily configured the implementation. Isn’t it?
This sounds like hard-wiring in bias by accident, and I look forward to seeing comparisons with other models on that…
From the Meta paper:
“The ethical implications of image data sourcing in the domain of text-to-image generation have been a topic of considerable debate. In this study, we use only licensed images from Shutterstock. As a result, we can avoid concerns related to images ownership and attribution, without sacrificing performance.”
Oh no. That… that was the only ethical concern they considered? They didn’t even do a language accuracy comparison? Data ethics got a whole 3 sentences?
For all the self-praise in the paper about state-of-the-art accuracy from so little training data, and insisting I pronounce “CM3Leon” as Chameleon (no), it would have been interesting to see how well it describes people, not streetlights and pretzels. And how it evaluates text/images generated from outside the cultural context of its pre-defined dataset.
At least she lasted 1 whole minute before that embarrassing contradiction. I can’t do another 38 minutes of watching that mess though, too painful right from the start.