@General_Effort

General_Effort@lemmy.world · 9 days ago

No. If it’s a copy, then it falls under copyright regardless of how the copy is made. The question wasn’t about copying, though.

Be aware that copyright only covers the creative elements, i.e., things that other people would do differently. It doesn't cover ideas, methods, and the like, nor very short or obvious creations. So copyright on code comes from UI design, comments, names, even the ordering of lines and functions, the splitting of code into files, whether shorthand is used, and so on. Snippets and even short functions are typically not copyrightable. If you have some short program that anyone would write that way, then it's not copyrightable, beyond the comments and maybe the names.

General_Effort@lemmy.world · 20 days ago

If I submit code to ReactOS that was trained on leaked Microsoft Windows code, what are the legal implications?

None. There is a good chance that leaked MS code found its way into training data, anyway.

General_Effort@lemmy.world · 6 months ago

It’s not necessary to expose the identities of the users. The age confirmation could happen via a password, PIN, or even a physical USB dongle. Tying such methods to a particular identity adds nothing to the age verification.

If that is not enough, then one would need a permanent, live webcam feed of the user. It could be monitored by AI, and/or police officers could make random checks.

Granted, one would have to make sure that not everyone behind the same router can use age-restricted services, e.g., with a VPN. That would let the ISP assign connections to individual, anonymous adults. But I'd guess you could already do that with some confidence by analyzing usage patterns. Besides, information on who lives in a home can also be found elsewhere, such as on social media or maybe company websites. So I don't think this is much new information.

But thinking about it, one could compartmentalize this.

The ISP only allows connections to whitelisted servers, including one or more government-approved VPNs. The ISP refuses connections to these VPNs without age confirmation. The VPN provider does not need to be told the identity of the customer. There need be no persistence across sessions. The ISP need not know what sites are visited via the VPN, and the VPN provider need not know about sites visited without it.

If you do it that way, the ISP ends up knowing less than before.

Since both ISP and VPN servers and offices would be physically located in the country, one would have no problem enforcing prohibitions on data sharing, if desired by lawmakers.

Anyway, this is the only realistic approach in the whole thread. Everything else assumes that Australian law will be followed globally. And then the ISP still has all that usage data. Why not just use a blockchain…

General_Effort@lemmy.world · 6 months ago

I’d lean on the ISPs. Your ISP knows what sites you visit, and they have your location and payment information. They can just insert some verification page when a classified IP is contacted. This gives them hardly any information beyond what they already have. And since they are mainly located in Australia, it is easy to enforce laws on them.

You have to lean on ISPs anyway because it is quite ridiculous to assume that the entire global internet will implement Australian laws. Does anyone believe that their Lemmy instance will implement some AI face scan or cryptography scheme?

You would have to block servers that do not comply with the law anyway. The effective solution would be a whitelist of services that have been vetted. In practice, I think we’ll see the digital equivalent of ok boomer.

If a whitelist seems extreme, then one should have another look at the problem. The point is to make sure that information is only accessed by citizens with official authorization. There is no technological difference between the infrastructure needed to enforce this (or copyrights) and some totalitarian hellscape.

General_Effort@lemmy.world · 8 months ago

via https://duckduckgo.com/?q=DuckDuckGo+AI+Chat&ia=chat&duckai=1 with GPT-4o mini

General_Effort@lemmy.world · 9 months ago

Why do you believe that?

General_Effort@lemmy.world · 10 months ago

Come to think of it, that DMCA argument would really wreck fair use.

It’s illegal to remove “copyright management information” (CMI), in this case meaning the FOSS license. The argument was that when Copilot spits out verbatim snippets of source code without the license, this constitutes removal of the CMI. The point of the argument was that fair use is not a defense under the DMCA. These verbatim snippets are pretty obviously fair use to me, so countering that defense is important if they hope to get anywhere with their suit.

By the same argument, any meme image is illegal. They are taken from somewhere without the original license or attribution. Yikes.

General_Effort@lemmy.world · 10 months ago

Don’t listen to me on that. I have no idea how the community feels on copyright or fair use. Whenever AI comes up, the most dogmatic copyright maximalism dominates. On other subjects, the debate is more nuanced. I don’t know how that fits together at all. But I guarantee you, if Alsup ruled on a case like this/OP, they would… Well, most comments would not like the ruling or him.

General_Effort@lemmy.world · 10 months ago

Yes, I know what you mean. But looking at the comments here, Fair Use is not a popular concept. I remember that Alsup specifically quoted the copyright clause in his ruling. I can’t imagine any argument that would make him rule, on the whole, for the plaintiffs in a case such as this.

General_Effort@lemmy.world · 10 months ago

Judge William Alsup.

Now I remember that guy. He decided Oracle v. Google. I can’t imagine he has many fans here.

General_Effort@lemmy.world · 10 months ago

Wow, long take. I didn’t mean for “much the same” to bear a lot of meaning. In the German inquisitorial system, in a criminal case, the judge takes over the (police) investigation from the prosecution. When the police become aware of a possible crime, they inform the state attorney’s office. A state attorney is responsible for the investigation and for uncovering the truth. But once the case goes to court, that responsibility passes to the judge.

In a civil suit, the parties are basically in charge, not the judge. It’s true that the judge has a more active role in German civil procedure. While the court is not supposed to run its own investigation, it can request additional evidence if that is necessary to judge the arguments of either side. I am not clear on the details. Where matters of fact must be determined by an expert, either party can request that the court provide one. But they can also make their own arrangements. The court can also solicit an expert opinion on its own, if necessary. Typically, the expert’s opinion is given as a written statement. An oral deposition may happen when questions remain. Afaik, it’s unusual to depose an expert without first having requested a written statement. Either party or the court may question the expert.

General_Effort@lemmy.world · edit-2 10 months ago

Hmm. In what way is the German system more effective? I know of some hair-raising cases. Me, I blame the lawmakers and not the judges, but others see it differently. I can’t think of a single related case where I’d say that the judgment served everyone’s interests.

ETA: Bad question. You explained how the German system is more effective. I’m wondering about cases where I can see this in action, i.e., “well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.”

General_Effort@lemmy.world · 10 months ago

You should emphasize more that the difference between the adversarial and the inquisitorial system exists only in criminal law. In civil/private matters, e.g., copyright disputes like this one, continental Europe handles things much the same.

General_Effort@lemmy.world · edit-2 10 months ago

It doesn’t work like that. A copy is a copy. Only if you can credibly show that you independently produced the same code can you get away with it. Hence clean-room implementations: they’re not strictly necessary, but they deter lawsuits.

Apparently there’s some confusion here about what the judge ruled. This particular part is about claims under the DMCA, not copyright infringement. The relevant sections can be seen here: https://www.copyright.gov/title17/92chap12.html [edit: link fixed. The claim was that “copyright management information” was removed, which is prohibited under these sections.]

Here’s the original text for those who want to know more (link via The Verge): https://www.documentcloud.org/documents/24796955-github-copilot-claims-dismissed

General_Effort@lemmy.world · 10 months ago

I’m categorically unable to name a justice or court jurisdiction anywhere in the US that consistently makes well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.

Can you name one in Germany? Just asking.

Anyway, at this stage of the trial, only legal experts are involved. The judge examines whether the legal arguments are sound, assuming the allegations are true. Whether the allegations actually are true will only be determined later. That’s also when fair use comes in. At that point, you need outside experts to advise on the non-legal aspects.

General_Effort@lemmy.world · 1 year ago

Ahh. TV shows before everything became political. Just two guys hating each other for very silly reasons completely unconnected to anything on earth.

General_Effort@lemmy.world · 1 year ago

Text explaining why the neural network representation of common features (typically with weighted proportionality to their occurrence) does not meet the definition of a mathematical average. Does it not favor common response patterns?

Hmm. I’m not really sure why anyone would write such a text. There is no “weighted proportionality” (or pathways). Is this a common conception?

You don’t need it to be an average of the real world to be an average. I can calculate as many average values as I want from entirely fictional worlds. It’s still a type of model which favors what it sees often over what it sees rarely. That’s a form of probability embedded, corresponding to a form of average.

I guess you picked up on the fact that transformers output a probability distribution. I don’t think anyone calls that an average, though you could have an average distribution. Come to think of it, before you use it to pick the next token, you usually mess with it a little (e.g., via a temperature setting) to make the output more or less “creative”. That’s certainly no longer an average.
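For illustration, here is a minimal sketch of that “messing with it” step, assuming the common temperature-scaling approach. The function names, logits, and temperature values below are hypothetical, not taken from any particular model. The raw scores are divided by a temperature before the softmax, which sharpens or flattens the distribution, and a token is then sampled from the reshaped distribution. Nothing here averages anything.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores (logits) into a probability distribution.

    temperature < 1 sharpens the distribution (more predictable output);
    temperature > 1 flattens it (more "creative" output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, seed=None):
    """Pick one token index at random, weighted by the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print("sampled token index:", sample_token(logits, temperature=0.8, seed=0))
```

Lower temperatures push almost all probability onto the top-scoring token, and higher ones spread it out. Either way, the distribution that is actually sampled from has been deliberately reshaped.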

You can see a neural net as a kind of regression analysis. I don’t think I have ever heard anyone call that a kind of average, though. I’m also skeptical that you can see a transformer as a regression, but I don’t know this stuff well enough. Training on some data more often than on other data is not how you would do a regression. Certainly, once you start RLHF training, you have left regression territory for good.

The GPTisms might be because they are overrepresented in the finetuning data. It might also be from the RLHF and/or brought out by the system prompt.

General_Effort@lemmy.world · 1 year ago

I accidentally clicked reply, sorry.

B) you do know there’s a lot of different definitions of average, right?

I don’t think that any definition applies to this. But I’m no expert on averages. In any case, the training data is not representative of the internet or anything. It’s also not training equally on all data and not only on such text. What you get out is not representative of anything.

General_Effort@lemmy.world · 1 year ago

A) I’ve not yet seen evidence to the contrary

You should worry more about whether you have seen evidence that supports what you are saying. So, what kind of evidence do you want? A tutorial on coding neural nets? The math? Video or text?

General_Effort@lemmy.world · 1 year ago

That’s a) not how it works and b) not averaging.