Guttenberg – Finding plagiarism

Guttenberg is a bot that searches for plagiarism or duplicated answers on Stack Overflow. It’s currently running in SOBotics under the user Guttenberg. The code is available on GitHub.

Implementation

Every 60 seconds, Guttenberg fetches the most recent answers (the “targets”) on Stack Overflow. For each target, possibly related posts (for example, answers to related questions) are collected. Each of those posts is then compared with the target on several characteristics (such as the Jaro-Winkler distance between the two posts). If at least one of the characteristics meets the requirements, a message like this is posted in chat:

[Image: sample chat message]
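
To make the comparison step more concrete, here is a minimal sketch of a Jaro-Winkler check in Python. It is not Guttenberg's actual code: the function names, the 0.9 threshold, and the idea of comparing whole post bodies as plain strings are assumptions made for illustration. The functions return a similarity between 0 and 1; the corresponding distance is 1 minus that value.

```python
from typing import List, Tuple


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical, 0.0 = nothing in common)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity, boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def find_similar(target: str, candidates: List[str],
                 threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Return the candidates whose similarity to the target reaches the threshold.

    The threshold of 0.9 is a hypothetical value, not the one the bot uses.
    """
    hits = []
    for candidate in candidates:
        score = jaro_winkler(target, candidate)
        if score >= threshold:
            hits.append((candidate, score))
    return hits
```

For example, `jaro_winkler("MARTHA", "MARHTA")` is about 0.961, so with the assumed threshold this pair would be reported, while two unrelated strings such as `"abc"` and `"xyz"` score 0.0.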

General commands

The following commands are available. You can get this list by running the commands command.
(Some of the commands may be used by room owners only.)
Command   Description
alive     Returns a test reply to confirm that the bot is alive
check     Checks a post for plagiarism. Usage: check <answer url>
clear     Tells the bot that everything is okay after it warned that some executions failed
quota     Returns the remaining API quota
say       Echoes the user input
status    Returns statistics about the current status
update    Checks whether an update is available
reboot    Reboots Guttenberg. Usage: reboot <soft|hard>
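
As an illustration of how commands such as check <answer url> and reboot <soft|hard> could be parsed, here is a small Python sketch. It is an assumption-laden example, not the bot's implementation: the function names are hypothetical, the input is assumed to be the command text with any chat mention already removed, and the rule that the remaining commands take no arguments is a guess based on the descriptions above.

```python
import re
from typing import List, Optional, Tuple

# Answer links on Stack Overflow look like /a/<answer id> or
# /questions/<question id>/<title>/<answer id>.
ANSWER_URL = re.compile(r"stackoverflow\.com/(?:a|questions/\d+/[^/]+)/(\d+)")

# Assumption: these commands are used without arguments.
NO_ARG_COMMANDS = {"alive", "clear", "commands", "quota", "status", "update"}


def parse_command(message: str) -> Tuple[str, List[str]]:
    """Split a chat message into a lower-cased command name and its arguments."""
    parts = message.strip().split()
    if not parts:
        raise ValueError("empty command")
    return parts[0].lower(), parts[1:]


def answer_id_from_url(url: str) -> Optional[int]:
    """Extract the numeric answer id from an answer URL, or None if it is not one."""
    match = ANSWER_URL.search(url)
    return int(match.group(1)) if match else None


def is_valid(name: str, args: List[str]) -> bool:
    """Check that the arguments fit the usage shown in the command list."""
    if name == "check":
        return len(args) == 1 and answer_id_from_url(args[0]) is not None
    if name == "reboot":
        return args in (["soft"], ["hard"])
    if name == "say":
        return len(args) > 0
    return name in NO_ARG_COMMANDS and not args
```

With this sketch, `parse_command("reboot soft")` yields `("reboot", ["soft"])`, which `is_valid` accepts, while `reboot now` or a `check` followed by a question link (one without an answer id) is rejected.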

Special feature for mods

In one of the latest releases, Petter implemented the checkuser command, which is available to moderators and room owners. It checks all posts of a given user for plagiarism. In addition to the usual check via linked and related posts, this command uses Google to find other sources from which the user might have copied code or text. (We have a limited quota of 100 requests/day.)
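
Because the Google lookups are capped at 100 requests per day, a command like this needs some way to stop searching once the budget is spent. The following Python sketch shows one simple way to track such a budget. The class name, the reset at the local calendar day, and the injectable clock are assumptions for illustration; how Guttenberg actually tracks the quota is not described here.

```python
from datetime import date
from typing import Callable


class DailyQuota:
    """Counts requests against a fixed daily limit and resets when the date changes."""

    def __init__(self, limit: int = 100, today: Callable[[], date] = date.today):
        self.limit = limit
        self._today = today      # injectable clock, which makes the class easy to test
        self._day = today()
        self._used = 0

    def _roll_over(self) -> None:
        # Start a fresh budget when a new calendar day has begun (assumed reset rule).
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = 0

    def remaining(self) -> int:
        self._roll_over()
        return self.limit - self._used

    def try_consume(self, count: int = 1) -> bool:
        """Reserve `count` requests; return False (and reserve nothing) if over the limit."""
        self._roll_over()
        if self._used + count > self.limit:
            return False
        self._used += count
        return True
```

A caller would ask `try_consume()` before each search and fall back to the linked and related posts check alone when it returns False.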

