False accusation of plagiarism is a real threat

It's a real systemic risk in OMSCS/OMSA. I know it has happened to quite a few students.

OSI False Accusation Survivor with Advice
https://www.reddit.com/r/OMSCS/comments/1h21nsz/osi_false_accusation_survivor_with_advice/
100% Win Rate: How We Fought and Won Against False Plagiarism Allegations in CS6515
https://www.reddit.com/r/OMSCS/comments/1h1g3xj/100_win_rate_how_we_fought_and_won_against_false/

Ever since I saw these horror stories, I've been so worried about getting falsely flagged. Even if you are innocent, if they say your code matched another student's code by 80%, you may not have the evidence to prove your innocence.

Some students commit code to GitHub frequently to keep a granular log of their progress, while others use an IDE that snapshots the code every 10 minutes. That's a good idea: it doesn't reduce the risk of being falsely accused, but it does increase your chance of winning the case once a false accusation happens.
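For anyone whose editor doesn't take snapshots, here is a minimal sketch of the idea using only the Python standard library. The function name `snapshot`, the snapshot folder, and the `log.jsonl` file are all made up for illustration. Local copies like these are probably weaker evidence than commits pushed to a remote repository, since local timestamps are easy to change.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path


def snapshot(source: Path, snapshot_root: Path) -> Path:
    """Copy `source` into a new timestamped folder and log its SHA-256 hash.

    Hypothetical helper: the names and folder layout are illustrative only.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target_dir = snapshot_root / stamp
    # Add a numeric suffix if two snapshots land in the same second.
    suffix = 1
    while target_dir.exists():
        target_dir = snapshot_root / f"{stamp}-{suffix}"
        suffix += 1
    target_dir.mkdir(parents=True)

    # copy2 also preserves the file's modification time where possible.
    copy_path = target_dir / source.name
    shutil.copy2(source, copy_path)

    # Append one JSON line per snapshot so the history is easy to read back.
    digest = hashlib.sha256(copy_path.read_bytes()).hexdigest()
    entry = {"time": stamp, "file": source.name, "sha256": digest}
    with (snapshot_root / "log.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return copy_path
```

Calling something like `snapshot(Path("hw1.py"), Path(".snapshots"))` from a loop or a scheduled task every 10 minutes would build up a trail of intermediate versions.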

Proactive risk-mitigating measure

As a proactive risk-mitigating measure, for every coding assignment, I spend substantial time making my code ugly. Am I paranoid? No, it's a legitimate concern, because a typical class has 500~1000 students. With homework questions like "load this csv, fit a linear regression model, and plot the residual distribution", if I implement it in a straightforward way in 13 lines of Python using scikit-learn library functions, I worry my code will almost identically match that of at least a few other students. It's just a probabilistic inevitability.
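The "probabilistic inevitability" can be put into rough numbers with a birthday-problem style estimate. This is only a back-of-the-envelope sketch under a strong assumption: every student independently lands on one of `k_solutions` equally likely "natural" implementations. Both that model and the values of `k_solutions` below are hypothetical, not measured.

```python
import math


def p_i_match_someone(n_students: int, k_solutions: int) -> float:
    """Chance that my solution coincides with at least one classmate's."""
    return 1.0 - (1.0 - 1.0 / k_solutions) ** (n_students - 1)


def p_any_pair_matches(n_students: int, k_solutions: int) -> float:
    """Chance that at least two students in the class coincide (birthday problem)."""
    if n_students > k_solutions:
        return 1.0  # pigeonhole: more students than distinct solutions
    # log P(all distinct) = sum of log(1 - i/k) for i = 0 .. n-1
    log_all_distinct = sum(
        math.log1p(-i / k_solutions) for i in range(n_students)
    )
    return 1.0 - math.exp(log_all_distinct)


if __name__ == "__main__":
    # Hypothetical counts of distinct "natural" solutions for one assignment.
    for k in (100, 10_000, 1_000_000):
        print(k, p_i_match_someone(500, k), p_any_pair_matches(500, k))
```

Under this toy model, the fewer natural ways there are to write a short assignment, the closer both probabilities get to 1 for a class of 500.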

I also worry that my code may match AI/LLM output. So after I finish my implementation, I ask ChatGPT for a solution and compare its result with mine. If they look similar, I change my code to protect myself from a false accusation. I also comment every line of my code with extremely verbose descriptions and references to the library/API documentation, even for something super trivial like using numpy.sum().

It's silly and I wish I didn't have to do it. But unless they tell me how else to better protect myself from false accusation, I will continue to do this.

Meet MOSS

I hear from classmates that MOSS is a commonly used tool for measuring software similarity. https://theory.stanford.edu/~aiken/moss/

Apparently, making variable names unique and adding lots of descriptive comments does not protect you from being falsely flagged by MOSS, because MOSS only looks at the code logic. But hey, I think variable names and comments still matter. When TAs review code that has been flagged for a "high similarity score with other students", if they see comments explaining every step of the way and a distinct variable naming convention, they may be less inclined to believe the code was plagiarized.
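A rough sketch of why renaming variables and adding comments would not change what a structure-based checker sees. This is not MOSS's actual algorithm, just an illustration built on Python's standard `tokenize` module; the function name `normalized_tokens` and the placeholder strings `ID`, `NUM`, and `STR` are made up for this example.

```python
import io
import keyword
import tokenize


def normalized_tokens(source: str) -> list[str]:
    """Return the token stream with comments, layout, and identifier names erased."""
    skip = {
        tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
    }
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Every non-keyword name collapses to one placeholder. This also
            # erases library function names, which a real tool might keep.
            out.append("ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # keywords and operators stay as they are
    return out


mine = '''
# Sum the squares, see the NumPy docs for numpy.sum
total_of_squares = 0
for value in measurements:
    total_of_squares += value * value
'''

theirs = '''
s = 0
for x in data:
    s += x * x
'''

print(normalized_tokens(mine) == normalized_tokens(theirs))
```

The two snippets look quite different to a human reader, but once names and comments are stripped they reduce to the same token sequence, which is the kind of similarity an automated checker would flag and a human reviewer would then have to interpret.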

Cheating is terrible, but false accusation of cheating is even worse.