The quantity of posts (answers, questions), comments, reviews and other actions on Stack Overflow is huge. We are right at the beginning of our implementation and need support from other community members, both to refine current programs and to address issues that have not been tackled yet.
These are some examples, but you’re stepping into an area where not only your programming skills are needed; your ideas and passion are welcome too.
What follows is a list of our projects, along with some particular areas where they need help. If you need additional information, don’t hesitate to join us in the SOBotics chat room; members from all over the world are active almost 24/7.
With SOBotics/Guttenberg#74, it will be easier to add reasons that trigger a report. It would be helpful if someone came up with new reasons that could improve our results. Implementing LINQ for this could be tested (SOBotics/Guttenberg#73).
At the moment, Guttenberg can only compare posts to linked or related questions and some results from Google (limited to 100 requests/day). Using other sources or different search engines could improve the results.
Over a period of one week, we collected a dataset of all the new answers to old questions. Coupled with SEDE, this could let us develop a machine learning approach to detecting questions posted as answers, instead of the present heuristics-based approach.
The blacklisted words all have the same value at the moment. Using data from Sentinel, we can find out how many false positives there are and find a way to autoscale the blacklist filter values.
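One possible way to move away from a single shared value is to weight each blacklisted word by how often its past reports were confirmed. The sketch below assumes feedback is available as (word, was_true_positive) pairs; that shape, the function name `blacklist_weights` and the smoothing constant are illustrative assumptions, not Sentinel's actual data format.

```python
from collections import defaultdict


def blacklist_weights(feedback, prior=1.0):
    """Return a weight between 0 and 1 per word: the smoothed share of true positives.

    feedback: iterable of (word, was_true_positive) pairs (hypothetical format).
    prior: additive smoothing, so words with little feedback stay near 0.5.
    """
    true_positives = defaultdict(int)
    total = defaultdict(int)
    for word, was_true_positive in feedback:
        total[word] += 1
        if was_true_positive:
            true_positives[word] += 1
    return {
        word: (true_positives[word] + prior) / (total[word] + 2 * prior)
        for word in total
    }
```

Words that mostly produce false positives drift towards 0 and reliable ones towards 1, so the filter could multiply its current fixed value by this weight.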
Are you good at machine learning? Do you have ideas on how to improve our algorithms and/or feed?
The NLP system needs improvement, both in integrating new data and in tuning the parameters of the machine learning algorithm.
Perspective has given us an API key to test, and we are currently reporting hits that score above 0.5. However, we have a fairly high number of “false positives”, mostly due to sentences such as “I’m an idiot”. To solve this problem we need to implement the feedback system and probably come up with some ideas on how to take our feed into account as well.
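As one idea for the self-directed false positives, a report could be suppressed when the flagged text is aimed at the author. This is only a sketch: it takes the score as an argument and makes no call to Perspective, and the `SELF_DIRECTED` pattern and `should_report` function are hypothetical names invented for illustration.

```python
import re

# Hypothetical pattern for self-directed remarks such as "I'm an idiot".
SELF_DIRECTED = re.compile(
    r"\b(i\s*am|i['’]m|i\s+was|i\s+feel\s+like)\s+(such\s+)?(an?|so)\b",
    re.IGNORECASE,
)


def should_report(text, score, threshold=0.5):
    """Report only hits above the threshold that do not look self-directed."""
    if score <= threshold:
        return False
    return SELF_DIRECTED.search(text) is None
```

A regular expression like this will miss plenty of phrasings, so it would complement the feedback system rather than replace it.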
We’re currently migrating away from JSON to SQLite for data storage. Here’s what needs to be updated to use a database, roughly ordered by when I plan to do them:
userWithID functions should query the database, and
lookupUserInformation should save the information to the database. Remember that users are not synchronized across rooms, so the table will also
privilegeID should probably be a raw enum value and not a key in another table.
details. Storage of messages will need another table containing
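The user part of this list could look roughly like the following, sketched in Python with the standard `sqlite3` module. Everything here is an assumption made for illustration: the table and column names, the `Privilege` levels, and in particular the `room_id` column, which is one reading of the note that users are not synchronized across rooms. `save_user` and `user_with_id` are hypothetical stand-ins for the functions named above, not the project's code.

```python
import sqlite3
from enum import IntEnum


class Privilege(IntEnum):
    # Hypothetical privilege levels, stored as the raw integer value.
    NONE = 0
    FILTER = 1
    OWNER = 2


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS users (
               user_id   INTEGER NOT NULL,
               room_id   INTEGER NOT NULL,  -- assumption: users are tracked per room
               name      TEXT,
               privilege INTEGER NOT NULL DEFAULT 0,  -- raw enum value, no lookup table
               PRIMARY KEY (user_id, room_id)
           )"""
    )
    return db


def save_user(db, user_id, room_id, name, privilege=Privilege.NONE):
    # Insert the user, or overwrite the existing row for this room.
    db.execute(
        "INSERT OR REPLACE INTO users (user_id, room_id, name, privilege) "
        "VALUES (?, ?, ?, ?)",
        (user_id, room_id, name, int(privilege)),
    )
    db.commit()


def user_with_id(db, user_id, room_id):
    # Return the stored user for this room, or None if unknown.
    row = db.execute(
        "SELECT name, privilege FROM users WHERE user_id = ? AND room_id = ?",
        (user_id, room_id),
    ).fetchone()
    if row is None:
        return None
    return {"name": row[0], "privilege": Privilege(row[1])}
```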
As far as I know, none of us in SOBotics are machine learning experts. If you’re good with machine learning and you want to improve FireAlarm’s filter, we’d be grateful! Right now we’re just using Naive Bayes; nothing too fancy or particularly precise.
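For anyone unfamiliar with the technique, here is a minimal multinomial Naive Bayes text classifier with add-one smoothing. It is a generic textbook sketch in Python, not FireAlarm's filter; the class name, labels and whitespace tokenisation are all assumptions.

```python
import math
from collections import Counter


class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Better tokenisation, per-feature weighting, or a different model altogether are the kinds of improvement that would help here.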
The Tumbleweed badge API returns only the user details. Using the data present on the user, we can find out which of their questions received the Tumbleweed badge and then show those instead. Adding batches of Tumbleweed posts is the next idea.
APICache is a project that aims to provide a centralised server for getting cached responses from the Stack Exchange API. This saves on bots’ quotas, as responses are cached and made available to everyone, no matter who requested them originally.
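The caching idea can be sketched as a small time-based cache keyed by route and parameters. This is an illustration only, not APICache's implementation: `ResponseCache`, the `fetch` callable and the time-to-live value are hypothetical, and no real Stack Exchange API call is made.

```python
import time


class ResponseCache:
    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch    # callable(route, params) that performs the real request
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested
        self._entries = {}     # key -> (expires_at, response)

    def get(self, route, params=None):
        params = params or {}
        # Parameter values must be hashable for this key to work.
        key = (route, tuple(sorted(params.items())))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]    # still fresh: no quota spent
        response = self._fetch(route, params)
        self._entries[key] = (now + self._ttl, response)
        return response
```

Because the key ignores who is asking, a second bot requesting the same route and parameters within the time-to-live is served from the cache.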
APICache only exposes a limited subset of routes at the moment. More development is needed to expand this to a larger subset of the routes available in the SE API.
Citation Detector is a new project that aims to report posts on Hinduism Stack Exchange that have comments asking users to cite appropriate sources. (It is based on a meta post asking for the same.)
The Low Quality Posts queue is infested with robo-reviewers who have deleted many posts which turn up as bad audits. We have started to develop a bot in Java, but time is running short. Are you ready to continue this development?