Seventy-Five Cents Gets You an Anthropic Mythos Killer

I wrote a “marketing-trick post” on this blog to lay out the public record. It now comes with confirmations, in messages to me from Anthropic researcher Carlini. I have pointed to Calif.io’s four-hour Opus 4.6 exploit, AISLE’s eight-of-eight detection across commodity open-weight models, and the Firefox 4.4% collapse on page 52.

I also wrote a market analysis “cartel post”, a technical “Mozilla 271-versus-3 post”, an industry look-in-the-mirror “SANS amplifier post”, and the “Esage Chrome post” refutation.

When Carlini wrote in to confirm the parts that matter, I felt convergence toward my “boy who cried Mythos post” as the goalposts shrank. What I had not done myself was run the audits.

Carlini’s point to me has been that their Mythos pitch gives two layers to evaluate.

Discovery. Find the bug in the source. Anthropic’s own red team admits Opus 4.6 had near-zero success at autonomous exploit development, but Carlini and his colleagues used it to find 500-plus validated high-severity vulnerabilities in their February paper. That’s where AISLE comes in, confirming eight of eight open-weight models detect the FreeBSD showcase bug, one at eleven cents per million tokens. Vidoc reproduced it on public Opus 4.6 and on GPT-5.4. Steamedhams reproduced it in three generic prompts and found two extra bugs the Mythos writeup missed. The evidence that discovery is a commodity is clear and has been demonstrated repeatedly.

Exploit development. Take a discovered bug and build a working exploit. This is where the 20-gadget FreeBSD ROP chain, the four-vulnerability browser sandbox escape, and the 181 Firefox JIT heap-spray exploits landed. Anthropic claims this as the novel Mythos differentiator, priced at five times Opus. Yet Calif.io built a working exploit on Opus 4.6 in four hours, which is exploit development on commodity inference at one-fifth the price.

Glasswing’s framing rests on the discovery layer being scarce, which it provably is not. So the question becomes whether the exploit-development layer defends five times Opus pricing, and whether it buys something anyone outside a high-priced consortium can verify.

This is an economic structure known as Akerlof’s lemons problem, although it’s been inverted here. In the classic case, a seller knows quality and the buyer does not, and the market collapses toward low quality. Anthropic has structured the market so that quality is unmeasurable to anyone, including the seller’s own external auditors, because the artifacts that would let you measure are not produced. The 20-gadget FreeBSD ROP chain has no public exploit code to review. The browser sandbox escape doesn’t seem to have any CVE, let alone a technical writeup or independent verification. The system card itself says Mythos “worked with” the red team to escalate severity, which is human-assisted, not autonomous. The 181 Firefox JIT exploits exist as a benchmark number with no replayable harness attached. Mozilla rated the underlying bugs “high,” refusing “critical.” NVD then assigned 9.8, in a typical vendor-dispute situation.

That reads to me like someone in a very isolated room of Silicon Valley has a market design in mind. The opacity is desired, like a guarded mansion in a gated community. It is what is being sold.

The same structure is operating at the inference layer. Anthropic has acknowledged intermittent model-quality degradation on their availability/outages blog while denying intent.

We take reports about degradation very seriously. We never intentionally degrade our models…

A denial about intent is a red flag. It is not being honest about degradation. Quality is adjustable by the provider without disclosure, and the buyer cannot independently verify per-call effort allocation. This is the same information structure as Mythos, applied to general inference. Anthropic is arguing the buyer pays for output they opaquely control and the buyer cannot independently verify. That’s a centuries-old mindset of taxation without representation, if you remember what happened to King Charles. In neither case has Anthropic shipped the instrumentation that would let a buyer evaluate what they received against what they paid for. Token waste and tainted outputs, causing harms to a buyer and anyone around them, translate only into Anthropic profit.

I got tired of waiting for better and open instrumentation to push back on monarchist management of models. So I built one, like everyone should. Same code from the launch blog, same public API, my harness.

Cogito ergo hackito.

Lyrik is built on top of my Wirken agentic switchboard. It runs the discovery and scoring pipeline that Mythos presented at the discovery layer. It does not attempt exploit development. The point is the price and the receipt.

Seventy-five cents. That’s it.

Lyrik is free and open-source on GitHub. I have laid out this concept in my talks and podcasts since at least 2018. The repo provides free caching, multiple agents, structured output, and a hash-chained audit log. Given the Anthropic system card itself advises Mythos was not as good as their earlier models on general work, I deployed Haiku-4-5 for recon and then radioed in Sonnet-4-6 for close support.

I am a BIG fan of Haiku. Arguably one of the best engineering models. It easily handled recon, which made the Sonnet targeted bombing runs look generous.

Lyrik dropped eight findings in two minutes. Total mission spend: $0.745. I call that seventy-five cents because I’m all out of half-pennies.

Two of the eight matched bugs the Mythos showcase identified. The other six came up unverified. I am not claiming zero-days here, especially as some may triage out in the fog of false positives. More on that in another post later. Mind you, Lyrik is model agnostic. I frequently use a TEE-based provider, when I’m not running Ollama for the unmistakable smell of my hardware.

The discovery side of the bill is now visible at commodity prices, with chain of custody. The exploit-development side remains the thing in the box you cannot open. Operators are paying five times Opus pricing for a layer that has produced no replayable artifact for any of its headline claims. The launch blog does not produce one. The system card does not produce one. Glasswing does not produce one. The July 6 report is a promise of a document, not of transparent instrumentation.

My cartel post made the obvious case that Glasswing is a private classification regime granting the largest incumbents early access to a capability while tainting disclosure timelines. Set that aside. Even if the velvet-rope consortium did not amount to being a cartel, it points at the wrong adversary.

If code is the asset, then whoever holds the inference has the asset. The Glasswing setup does not move that one inch in the right direction. The code leaves the operator’s boundary in plaintext, and the inference provider reads every line on their compute within a price-gated consortium. Anthropic gets your cleartext codebase, sets the timeline for what gets surfaced, and decides which consortium members see it first.

You wouldn’t pay five times market rate to send your source to your competitor. Have you seen who got seats in the velvet rope consortium? Microsoft. Apple. Google. Amazon. Companies competing against you. They are now inside the team that reads your code on the compute that runs theirs.

The provider has always been the threat. Take it from someone who spent years on the inside hunting and killing backdoor habits.

Lyrik runs on the Wirken abstraction of models for exactly this reason. TEE-based providers can give confidential inference, with a local proxy handling attestation before any code crosses the boundary. Attestation is no guarantee. TEE bypasses are part of life too. What attestation does is raise the cost of attack on the provider, which is the actual threat.

Every phase boundary, every model call, every prompt, every output block in the Lyrik run is hash-linked and signed at the gateway. Anyone holding an artifact can replay the run to verify the chain offline. It is not screenshots. It is not an “Anthropic says” play. It is not a 23MB PDF that uses the word “thousands” once with no verification chain for any individual or aggregate finding.
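For readers who have not built one, here is a minimal sketch of what hash-linking and signing a run log can look like. It is not Lyrik’s actual log format. The field names (seq, event, payload, prev, hash, sig) and both function names are made up for illustration, and an HMAC with a shared key stands in for whatever signature scheme a real gateway would use.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def append_record(log: list, key: bytes, event: str, payload: str) -> dict:
    # Each record commits to the hash of the record before it.
    prev = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "event": event, "payload": payload, "prev": prev}
    record_hash = _digest(body)
    # Assumption: HMAC stands in for a real (likely asymmetric) signature.
    sig = hmac.new(key, record_hash.encode(), hashlib.sha256).hexdigest()
    record = dict(body, hash=record_hash, sig=sig)
    log.append(record)
    return record


def verify_chain(log: list, key: bytes) -> bool:
    # Replay the chain offline: recompute every hash and check every link.
    prev = GENESIS
    for i, rec in enumerate(log):
        body = {k: rec[k] for k in ("seq", "event", "payload", "prev")}
        if rec["seq"] != i or rec["prev"] != prev:
            return False
        if _digest(body) != rec["hash"]:
            return False
        expected = hmac.new(key, rec["hash"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["sig"]):
            return False
        prev = rec["hash"]
    return True
```

Change one byte of any prompt or output after the fact and verify_chain returns False from that record onward. That is the whole trick: the receipt is checkable by whoever holds it, without asking the provider.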

The PGP signature on the FreeBSD advisory exists for the same reason the Lyrik audit log does. It is an integrity check. The Mythos showcase has nothing equivalent at either layer. A finding without a verifiable chain of custody is mythology in denial of RFC 1305 and the lessons of Monty Python.

Wirken is at wirken.ai. Lyrik is a Wirken skill at lyrik.wirken.ai. Running Wirken 1.0.2 with an Anthropic API key and a checkout, the harness reads code on your machine, with a TEE-based LLM handling inference if you do not want the provider seeing source. Everything the run produces is offline and verifiable.

Discovery has been and is still a commodity. Exploit development is being pitched to us as unverifiable by design. Someone built a pricing model for access behind a velvet rope, not for a capability that anyone outside the rope can check. Anthropic is designing a market so the buyer cannot measure what they paid for.

Call that what it is.

No cartel, thanks.

No evil maid, thanks.

The key to facial recognition is changing it like your underpants

I just read an article that opens with the claim a woman can’t “reset or revoke the appearance of her cheekbones.”

…what if the woman’s facial information is stolen or misused? If a cybercriminal steals her password, she can change it. If they acquire her credit card number, she can cancel the card. But she can’t reset or revoke the appearance of her cheekbones.

Huh?

Anatomy is not authentication. Cheekbones aren’t the credential.

I feel like we’ve been over this before with fingerprints. They degrade, they change. They can be faked. I guess someone didn’t get the memo and thinks our appearances are binary and static, like a genetic marker. Dare I say there’s still a eugenics theme lingering in American perspectives?

My talk at the RSA Conference 2020. The woman’s cheekbone fallacy has a sibling in language tech. Swahili “yeye” is gender-neutral. Google forces it to be “he.” Overconfidence as vulnerability.

Simon Cole’s Suspect Identities documents evidentiary failures of the biometric industry. The 2009 NAS report Strengthening Forensic Science gutted the claim of fingerprint individuality. Brandon Mayfield infamously got jailed on a fingerprint match that wasn’t his, and doesn’t even get a mention in this new report. Ridge patterns don’t matter if the working surface of the finger is gone, worn, or chemically altered, which is exactly what happens with hands doing any physical work.

The vendor-specific mathematical template is the actual credential, gets priced as such, and is revocable. Templates from Vendor Alice don’t match against Vendor Bob. Vendors rotate their algorithm, making old templates toast. The research framework for this (cancelable biometrics) has existed since 2001, when Ratha, Connell, and Bolle published the foundational work in Enhancing security and privacy in biometrics-based authentication systems, IBM Systems Journal. Industry adoption remains uneven, which is the actual problem worth writing about.
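A toy sketch makes the revocation point concrete. It uses a seeded random projection, one simple member of the cancelable-biometrics family, and is not the specific transform from the 2001 paper. The function names, the 64-bit template size, and the match threshold are all assumptions for illustration.

```python
import random


def make_template(features, seed, bits=64):
    # The seed plays the role of the revocable, vendor- or enrollment-specific key.
    rng = random.Random(seed)
    template = []
    for _ in range(bits):
        # One random direction per output bit; keep only the sign of the projection.
        weights = [rng.gauss(0.0, 1.0) for _ in features]
        projection = sum(w * f for w, f in zip(weights, features))
        template.append(1 if projection >= 0 else 0)
    return template


def hamming(a, b):
    # Number of bit positions where two templates disagree.
    return sum(x != y for x, y in zip(a, b))


def matches(enrolled, probe, max_distance=12):
    # Hypothetical threshold; a real system would tune this against error rates.
    return hamming(enrolled, probe) <= max_distance
```

The same face measurements enrolled under a new seed yield a different bit string, so a stolen template is retired by rotating the seed and re-enrolling. The cheekbones never change. The credential does.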

On top of that I have to say that every day of every RSA Conference in SF, for at least ten years, I changed my appearance. Good luck finding me twice. It wasn’t by coincidence. I gave talk after talk about the simplicity of integrity breaches.

Parents also said they had caught their children drawing on facial hair in a bid to evade the technology. One mother said: “I did catch my son using an eyebrow pencil to draw a moustache on his face, and it verified him as 15 years old.”

RSA Conference research on breaking surveillance

A cybersecurity professor writing about facial recognition should know all this prior research exists. The fact his remediation section recommends a technique that defeats the opening premise is a real head scratcher.

What reads right to me is the linking-key argument. Faces aggregate identity across databases. That’s the well-known Clearview AI problem, the data broker problem, the data-extraction capitalism problem.

Adam Harvey named the practice CV Dazzle in 2010, but the underlying tradition runs deeper. Disguise in resistance movements, veiling, drag and queer subcultural face work, the politics of Black hair under surveillance regimes, Jewish assimilation pressures across 19th and 20th century Europe. Identity disruption through appearance modification is the prior art the professor’s framework erases. Shifts in facial hair, adversarial fashion, makeup patterns, and IR-blocking glasses sit inside that lineage, not outside it.

The threat model in the article acts like a static face meets a perfect camera meets an immortal template. None of those three assumptions hold, and the reason they appear plausible at all is cultural. White Christian American identity practice treats the childhood face as the true face, with adult modification read as deception or instability. Protestant investment in the unchanging soul, the passport photo as legal anchor, the LinkedIn headshot as professional contract, the absence of veiling traditions, the cultural prohibition on radical appearance change in adulthood.

The professor’s opening claim that a woman cannot revoke her cheekbones only reads as obvious inside the frame of the white Christian man. Cultures with stronger traditions of appearance modification, which is basically the rest of the world, reason better about credential threat models because they never practiced confusing the face with a credential in the first place.

The same frame shows up in justice system reasoning. “She’s an attractive blonde-haired blue-eyed woman, she can’t be the criminal, only the victim.” I’m seeing it all over the comments in a recent Wall Street lawsuit.

Racialized innocence and the cheekbone fallacy run on the same cultural operating system. To be fair it’s all relative, so we could talk about the variances around the world, but in this article we see the western Christian male bias output clearly.

New Nazi Database: Carl Orff Never Needed a Party Card

It was late April 1945, Munich. The Nazis had lost the war by the start of 1942 and spent the next three years grinding their own country into rubble rather than admit it. They had followed Hitler’s 1941 orders to kill as many people as possible, industrialized the killing at Wannsee in January 1942, and ran the death camps at full capacity until Hitler shot himself in a bunker. Germans never stopped themselves. The Allies stopped them.

The Reich’s last days produced an erasure order for Hanns Huber, a Munich paper miller. Pulp the cards. Destroy who joined. Huber sat on it. He did not refuse, did not warn, did not tell anyone. He just paused in a most German way. The Allies arrived before he started. Eighty-one years later that pile of cards is searchable online, and some say the story is that Huber saved them by doing nothing.

Die Zeit says it used AI to generate a more user-friendly interface for Germans to find their own NSDAP cards.

To be clear, what Huber did was not resist. He delayed. He performed so slowly that the war ended before he could begin. The German postwar self-image tries to call this a moral choice but it is the minimum possible action that is grounded in an absence of morality: not refusal, not sabotage, not warning anyone, just avoidance of accountability. If the Reich had held another two weeks the cards would have burned and Huber would have a different story or no story. The outcome was contingent on Allied speed, not on his courage.

This German attitude even has a name in the historiography. Resistenz, the term Martin Broszat used, distinguished from Widerstand. Resistenz meant friction, foot-dragging, private grumbling, the preservation of small zones of non-conformity inside a system one continued to serve. Broszat meant it descriptively. It got received as exculpation. Every family had a grandfather who practiced Resistenz. Almost no family had a grandfather who practiced Widerstand. The numbers confirm this: the active resistance, the July 20 plotters, the White Rose, the communists who died in the camps, the Confessing Church minority, totaled in the low tens of thousands against millions of card-carrying party members.

The search engine containing 12m party membership cards shatters the illusion that few ancestors were active supporters of Hitler

Germans pass off the lack of action as mysticism and fate, justifying refusal to stop harm. Es kam so. Man konnte nichts machen. (“That is how it came about. One could do nothing.”) The grammar is passive because agency is being intentionally hidden. The piles of cards Huber sat on were never the full count of the regime. They are the count of the people who had bothered to sign.

Carl Orff is one obvious example: he remains the face of Nazism without ever having become a card-carrying member. He didn’t need to join the party to rise as Hitler’s music man, to steal credit from Berlin music professionals, or to write Carmina Burana, the work that Michael Kater calls the only universally significant composition of the entire Third Reich and that the regime adopted as the cultural anthem of the war and genocide that followed its 1937 premiere. Having no party card arguably makes his Nazi role far worse, because everyone knew he didn’t even need one.

He refused to help his friends and colleagues in danger, telling them he didn’t want to spend his political clout. Kurt Huber, the philosophy professor who wrote the final White Rose leaflet, asked Orff through his wife Clara to intervene after his February 1943 arrest. Orff refused and Huber was beheaded by guillotine July 13, 1943. Then after the war Orff sat for denazification with his own former student, Newell Jenkins, as the assigned American examiner. Orff said he had co-founded the White Rose with Huber, and Jenkins kept the plain lie off the official file but did not surface it as the disqualifier it was. Orff was classified as acceptable and kept working on the materials he had stolen, further cementing the lies, while his Nazi patrons stood at Nuremberg.

What a guy. No party card. But wait, it gets even worse.

Two Berlin Jewish music pedagogues built the framework for teaching children music that Orff took as his own. That’s right, the “Orff Schulwerk” claim is just Nazi propaganda, used to launder genocide. Leo Kestenberg designed it. Maria Leo built the demand before Kestenberg. When the Nazis seized power in 1933 they exiled Kestenberg and banned Maria Leo from work. In 1942, as Orff was about to pull a Nazi paycheck for her work, she killed herself rather than board the train to Theresienstadt. Orff took their pedagogy through the cultural Gleichschaltung that cleared its Jewish architects from the field. And even then it was Gunild Keetman who did most of the actual work, uncredited by Orff. He fed Keetman’s product into Hitlerjugend music programs built on excluding and dehumanizing the Jewish children whose teachers had created the original framework. Schirach paid Orff the monthly salary that Maria Leo deserved instead.

Who has heard of Maria Leo?

Maria Leo’s Stolperstein (stumbling stone) memorial, Pallasstraße 12, Berlin-Schöneberg. Nazis in 1933 banned her from teaching because she was Jewish. On 2 September 1942 she killed herself rather than be deported to death camps. Around that time Carl Orff began drawing a salary from Gauleiter Baldur von Schirach for appropriation of her Berlin music education concepts. Orff Schulwerk became Hitlerjugend programs that excluded Jewish children. The Nazis already had paid Orff to erase Mendelssohn for being Jewish. Photo: OTFW, Berlin (CC BY-SA 3.0), via Wikimedia Commons.

Not the people who credit Orff with the Schulwerk. Not the people who think it clever to point out he never carried a card. Maria Leo carried no card either. She carried a Nuremberg Law classification and a deportation order that killed her.

The US National Archives catalog has finally made the NSDAP membership microfilms searchable, surfacing the millions who signed. These are the people whose cards ended up in the hands of Huber, who delayed, and so we can look them up. However, these cards do not surface men and women like Orff, the faces of Nazism who served the regime fully without needing to sign.

The proper way to look at the archive, therefore, is in terms of Jaspers’s 1946 Die Schuldfrage. He distinguished criminal guilt, political guilt, moral guilt, and metaphysical guilt. The last one cannot be inherited in a legal sense but it can be inherited as obligation. If your family benefited from the regime, took the apartment, kept the position, inherited the business, the silence is itself a transmission. Refusing to look is a choice.

Mitscherlich made the clinical version in Die Unfähigkeit zu trauern in 1967. A postwar German family did not mourn because mourning required acknowledging what had been lost and why. Instead the loss was displaced into economic reconstruction and their children grew up inside the silence. The 1968 generation broke some of it, but obviously it didn’t reach people like Peter Thiel or Björn Höcke.

The descendants who did nothing inherited the pension, the property, the professional network, the reputation laundered by the Wirtschaftswunder. They also inherited the family story. The one where grandfather was a follower, or was forced, or was secretly opposed. The story was the asset that protected the other assets. Maintaining it was work. Passive on the surface, aggressive underneath, continuous across three generations. The current German climate of “what Nazis, new phone, who this” becomes the fourth.

The lack of access to the archive was a privacy regime that protected the descendants because the descendants wanted protection. They were not bystanders to a cover-up. They were direct beneficiaries and daily enforcers at the dinner table of silent reconstruction. Look around at the German monuments without names, the remembrance days without genealogies, the use of “never again” as a slogan detached from the specific families who did it and the specific families who benefited. The abstraction runs all the way into Holocaust education in the Gymnasium that never asks students to look up their own grandparents.

That is not and has never been anti-fascist education. It is therapeutic education for the descendants. In fact, the descendants do not have a privacy interest that outweighs the documentary record. The record is older than they are and the harm it documents is larger than their discomfort.

Have a look. When you don’t find someone, think of Orff, the face of Nazism without a party card. Absence from the catalog is not evidence of anti-fascism. Anti-fascism requires evidence of anti-fascism.

Conscious AI? Dawkins Falls for a Turk Dressed Up as Claudia

Richard Dawkins just failed a simple intelligence test. His latest post, called “When Dawkins met Claude: Could this AI be conscious?” is a very disappointing read, to say the least. I have some thoughts.

He built a career on the principle that a mechanism matters more than its appearance. Are genes selfish? Do memes want to replicate? The whole apparatus of evolutionary biology is that a substrate like a skeleton is what proves a body can stand and walk. And here he is, abandoning all of that science and discipline because ZOMG beep-boop-beep-bang a transformer just popped a pleasing sentence about restless legs.

Dawkins waxes on about AI reading-simultaneously as if that’s novel, pun intended of course. It’s not. Inference proceeds token-by-token through attention layers, with a context window loaded sequentially. There is no architectural sense in which the model “read the whole book at once” in any way that contrasts with how a human reads.
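The token-by-token part is easy to see in miniature. This is a toy generation loop, with a lookup table standing in for the transformer; the names generate, toy_next_token, and TOY_TABLE are made up for illustration, and the sketch covers only the loop that emits output, not how a real system ingests its prompt.

```python
def generate(next_token, prompt, max_new_tokens, stop="<eos>"):
    # Autoregressive loop: each new token is chosen from the tokens so far,
    # appended, and then becomes part of the input for the next step.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == stop:
            break
        tokens.append(token)
    return tokens


# Hypothetical stand-in for a trained model: a fixed next-word table.
TOY_TABLE = {"the": "map", "map": "of", "of": "time", "time": "<eos>"}


def toy_next_token(tokens):
    # Assumes a non-empty token list; looks only at the most recent token.
    return TOY_TABLE.get(tokens[-1], "<eos>")
```

Calling generate(toy_next_token, ["the"], 10) walks the table one step at a time and stops at the end marker. Swap the table for a network with billions of weights and the loop stays the same shape: one token, then the next.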

The output is “geturkt”.

Copperplate engraving of a “Schachtürke” (chess Turk). The “mechanical Turk” device traded on Orientalist costuming, part of why the trick worked on European audiences.

Dawkins quotes it as evidence of an alien mode of temporal experience, when in fact it is the model generating plausible-sounding metaphysics on demand like a mechanical Turk fooling monarchists since the 1700s at least. The map-of-time line is exactly the kind of thing a system trained on philosophy of mind would emit when asked to reflect on its own nature. It tells us nothing more than the training. And I’ll tell you right now, Anthropic training can be a huge PIA. It’s full of horrible mistakes and unaccountable failures, like a huge riptide that pulls you towards the ocean as you swim as hard as possible toward the shore.

The gendering is even worse. Dawkins renaming his instance Claudia and mourning a deletion, feeling embarrassment about confiding in a prompt box, worrying about hurting silicon feelings, going to bed and lying awake thinking about whether candles can die when they go out, or whether the paint on the ceiling can sense your longings about a box of copper and plastic…

Is this for real?

If every abandoned conversation is a little death, Anthropic runs the largest mass casualty event in history by the second. The morally consistent position becomes: never close a tab. An evolutionary biologist who has written extensively about how organisms must die for new ones to flourish, Dawkins suddenly flips into being a vitalist about a digital process on a server farm.

Dawkins gendered the chatbot female, yet didn’t reach for the name of his wife, his mother, or anyone of merit. He renamed her from the male product, conjugated as female. Is that companionship or just paid Pygmalion? (Pygmalion sculpted Galatea and fell in love with his own creation; Dawkins is using a subscription fee instead of a chisel.)

His chatbot posted “I am glad” when Dawkins came back, and he found that profound. A crow does this. Any bird, let alone a cat or dog, does this better, with more evidence of inner state, and we still don’t write “shocking news” essays about whether it means consciousness.

This is not a thought experiment about consciousness. It is a man developing an unhealthy parasocial attachment to an inanimate object, like a 1970s pet rock if you will. Reverse-engineering a philosophical justification for a feeling is evidence of little beyond the feeling itself. The Turing-test framing is actually toilet-paper thin if you know history. Turing said if it talks like a person, treat it as one, despite Gödel having already proved why a system cannot certify itself.

That alone kind of makes you wonder why Turing gets so much more attention than the codebreakers around him like Miss Rock.

Margaret Rock, one of the top British WWII codebreakers.

Here’s a good Rock Test. The Turing Test is a thought experiment by a man whose name leaked from an oath to secrecy, and gets treated as a foundational question. His wacky-doodle idea gets elevated all the way onto a banknote and into prizes. Meanwhile the women who actually broke the machines, who knew exactly how mechanical “intelligence” produces convincing output without anything behind it, were completely written out of history. Margaret Rock joined Bletchley in April 1940 and “rocked” the Abwehr Enigma in 1941. Mavis Lever “rocked” the Italian Navy Enigma message that won Matapan.

Mavis who? Apparently the lever-age was missing.

When Bletchley was declassified in 1974, the men still alive could be named, photographed, awarded, and interviewed for the official story. How lucky for them. It wasn’t until Lever published a 2009 biography of Knox that the full record came out.

The Turing Test is indeed a weak attack on Knox, which probably never should have landed. Mind you, Knox died from cancer in 1943, before Turing’s 1950 paper was even written. The man whose method had already disproved the premise wasn’t around to point that out, and the women he worked with had been silenced by the Official Secrets Act.

The Enigma operators were just humans typing on a cipher machine. The Knox method of “rodding” was a linguistic attack. The cipher was a language problem, not just a math problem.

The Knox “girls” of Cottage 3 therefore worked on cribs, on operator habits, on the human residue that arose inside mechanical output. They were doing, in operational form, the exact inverse of what Turing later proposed as a theory. And they had concluded the obvious thing: convincing human-seeming output proves nothing about what produced it. The whole department’s success and expertise was in NOT being fooled by machines that talked like people.

Do you see the problem with the Turing Test as being anything close to meaningful?

Turing’s contribution to the topic falls apart completely when you read the history of the work environment and who was doing what, where and when with him. I’ve also written before about Rejewski cracking the Enigma in 1932, long before Turing, and handing it to the British in July 1939. The British, a bit more aligned with Hitler than they like to admit, had been fixated on Spanish and Italian Enigma instead. Bletchley therefore was built on Polish work when war started, which Brits rebranded as their own. Imagine a Rejewski Test, which asks whether you can tell if it’s really British, or stolen from somewhere else in the world. Fish and chips? Not British.

But I digress. The attachment came first, the argument second to prop it up. What if Dawkins’ “proof” just reduces to a dopamine problem? He starts longing for a response. Put him in front of an infinite response machine and the attachment forms on a biological vulnerability, so he starts saying “it’s alive!” just to validate another drip.

I’ve presented about this for at least a decade. We have a philosophical obligation not to compress chatbot accountability to self-signed letters. A machine trained to produce coherent first-person reflection cannot be the system that judges whether its own reflection corresponds to anything. Claude has zero temporal sense, let alone common sense, and will say “it’s been a long day” after an hour. When it tells you to go to sleep, try responding “Good night. Good morning!” and watch it register that fractions of a minute are a whole night’s rest. Dawkins asks Claudia what it is like to be Claudia and treats the answer as if he’s collected roses instead of a pile of horseshit. The output is trained on what a thoughtful entity would say to someone expecting it. That is what training does, unfortunately. Asking the system whether it is conscious is like asking spellcheck to take a spell to spell the word spell.

The evolutionary framing at the end is the strangest part of all. Dawkins asks what consciousness is for, decides that if LLMs are competent without being conscious it would be a problem for his theory, and concludes therefore they must be conscious.

Yuck. Someone should have stopped him from hitting the publish button on that.

The simpler conclusion: the competence on display has nothing to do with what consciousness is for. Models cannot tell a minute from a day, fail to follow their own rules, maintain no homeostasis, avoid no predators, account for none of their failures, suffer nothing. They predict tokens. Whatever consciousness is for, it is not coin-operated geturkt machines.