/t/ - Meta

Site Discussion




Ever heard of Zipf's Law and Lorenz Curve?
https://en.wikipedia.org/wiki/Zipf%27s_law https://en.wikipedia.org/wiki/Lorenz_curve
(Also https://en.wikipedia.org/wiki/Pareto_principle but hey 80/20 is understandable)

Here is my idea: every month we tally up the population count and activity of every 8chan-style board,
and based on that we decide which top boards get the highest thread/page limits.
The worst-performing boards would be automatically deleted (depending on the expected board count).
The 1st board would get a thread limit of N, the 2nd board N/(2^f), the 3rd N/(3^f), and so on.
Here f is the competition factor: f = 1 is standard, 0 < f < 1 is less harsh, f > 1 is more harsh.
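
A minimal sketch of the N/(rank^f) rule, just to show how f changes the curve. The function names, the rounding down and the floor of 1 thread are assumptions for illustration, not part of the proposal:

```javascript
// Hypothetical helper: thread limit for the board at 1-based position `rank`.
// n = limit of the top board, f = competition factor from the post above.
function threadLimit(rank, n, f = 1) {
    if (!Number.isInteger(rank) || rank < 1) {
        throw new RangeError('rank must be a positive integer');
    }
    // N / (rank^f), rounded down; never below 1 thread (assumption)
    return Math.max(1, Math.floor(n / Math.pow(rank, f)));
}

// Limits for the first `count` ranks.
function allocateLimits(count, n, f = 1) {
    return Array.from({ length: count }, (_, i) => threadLimit(i + 1, n, f));
}

console.log(allocateLimits(5, 300, 1)); // [ 300, 150, 100, 75, 60 ]
console.log(allocateLimits(5, 300, 2)); // [ 300, 75, 33, 18, 12 ]
```

With f = 2 the 5th board already drops to 12 threads, so anything much above 1 gets harsh quickly.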

Challenge 1: Determining popularity of a board
Challenge 2: Determining expected number of boards and threads (resources)
Challenge 3: Setting a fair f factor that allows things to go smoothly
I will consider this (and simpler stuff like just sorting popular boards on the homepage) once i actually track unique users per day.
Right now I have a mongodb aggregate like:
await Posts.db.aggregate([
            {
                '$match': {
                    '_id': {
                        '$gt': pastHourObjectId
                    }
                }
            },
            {
                '$group': {
                    '_id': '$board',
                    'pph': { '$sum': 1 },
                }
            },
        ]).toArray()

So it's not actually using a number stored in the board table or anything. The advantage of this is that it's indexed, and it's a rolling window without needing to do anything really. I could also use $addToSet on the IPs (I think?) to get an array of unique IPs and then just get the length. $addToSet is probably O(n). Not sure how fast it would be, but I don't think I can get the count of unique IPs straight from mongodb otherwise, or without multiple group stages.
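
A rough sketch of the $addToSet idea, written as a function that only builds the pipeline so it can be read without a database. The field name `ip` and the helper name are assumptions; $addToSet in $group and $size in $project are standard aggregation operators:

```javascript
// Hypothetical helper: builds the stages for posts-per-hour plus unique
// posters per board. `ip` as the field name on a post is an assumption.
function buildBoardActivityPipeline(pastHourObjectId) {
    return [
        // same rolling window as above: _id is indexed and time-ordered
        { $match: { _id: { $gt: pastHourObjectId } } },
        // one group per board: count posts and collect distinct IPs
        {
            $group: {
                _id: '$board',
                pph: { $sum: 1 },
                ips: { $addToSet: '$ip' },
            },
        },
        // swap the IP array for its length so raw IPs are not returned
        { $project: { pph: 1, users: { $size: '$ips' } } },
        { $sort: { users: -1, pph: -1 } },
    ];
}

// usage, assuming the same Posts.db collection as above:
// const stats = await Posts.db.aggregate(buildBoardActivityPipeline(pastHourObjectId)).toArray();
```

This keeps it to a single $group stage; whether it is fast enough on a busy board is untested.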

/rant. So yeah definitely some thinking to do.
also sry i fucked up the indenting
>>50
Don't worry, it is all just one small idea about how to create a more competing environment for an 8chan-like community model.
Say, what do you think about Zipf's law / Lorenz Curve / Pareto? Is it worth trying?
What about flooding/spamming? Assuming there are anti-flood/spam mechanisms at play, what other issues would come up?
What about dead boards that are being archived, how to deal with that? Just allow people to IPFS seed/pin them until we unseed/unpin?
What about board name placement? Do we "lease" them out immediately or do we have a one week "grace" period to be fairer?
What about the board owner, does his account get deleted, or does he just lose control over the board and can still moderate other boards?
What about a sudden reduction of threads in a board, do we slide 2 threads for every thread created? Or maybe even delete more threads?
What about deduction coupled with sliding/flooding/spamming, how would that be dealt with? Can threads be un-slided back to catalog?
Replies: >>52
>>51
To be clear this was never meant to be some kind of 8chan replacement, so i hope people don't have too high expectations yet.
As for some questions:
>Say, what do you think about Zipf's law / Lorenz Curve / Pareto? Is it worth trying?
I don't really think adding or deleting boards, limiting thread amounts, etc. based on popularity is a good feature to build into the software. There are less active boards that people enjoy just as much. 8chan has many boards that don't get deleted ever afaik.

>What about flooding/spamming? Assuming there are anti-flood/spam mechanisms at play, what other issues would come up?
Currently there is a ratelimit on:
- how many posts from an IP in small time
- how many posts from IP with any matching file or text
- how many posts from any IP with any matching file or text
Also there is a limit on PPH before captcha is automatically enabled.
That is the extent of anti spam currently.

There are 2 other main issues because pages are generated as static files:
1. toggling captcha means all pages need to be regenerated to add captcha to the post field, else people cannot make posts. Currently I just delete all the pages and let them build-on-load, but that is susceptible to "stampeding".
2. new threads (because all lower threads will be pushed down) rebuilds all existing pages. I could easily change the strategy to just build the first X pages (since earlier pages are more popular) and leave the higher page numbers to build-on-load. I think that is a decent build strategy to reduce server choke for build-on-load for popular pages, and not generate too many pages every new thread. New posts (non-thread), deleting, sticky, bans, file removing, etc. are not an issue, they only build what is necessary.
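
One possible way to handle both points, as a sketch only: share a single in-flight build per page so that concurrent requests for a missing page do not each start their own build (the "stampede"), and after a new thread build just the first X pages. `buildPage`, the path layout and the function names are all hypothetical:

```javascript
// In-flight builds keyed by page path, so concurrent requests for the same
// missing page share one build instead of each starting their own.
const inFlight = new Map();

// `buildPage` is a hypothetical async function that renders one page and
// writes the static file.
function buildOnce(path, buildPage) {
    if (inFlight.has(path)) {
        return inFlight.get(path);
    }
    const job = Promise.resolve()
        .then(() => buildPage(path))
        // forget the job once it settles so a later change can rebuild
        .finally(() => inFlight.delete(path));
    inFlight.set(path, job);
    return job;
}

// After a new thread: build the first `eagerPages` pages right away and
// leave the higher page numbers to build-on-load.
async function rebuildAfterNewThread(board, totalPages, eagerPages, buildPage) {
    const jobs = [];
    for (let page = 1; page <= Math.min(eagerPages, totalPages); page++) {
        jobs.push(buildOnce(`/${board}/${page}.html`, buildPage));
    }
    await Promise.all(jobs);
}
```

This only deduplicates builds inside one process; with several worker processes it would need a shared lock, which is not shown here.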

>What about dead boards that are being archived, how to deal with that? Just allow people to IPFS seed/pin them until we unseed/unpin?
I haven't thought about archiving, dead boards, etc. since there aren't even user-owned boards yet. And I can't comment on IPFS as I have no clue how that works, nor did I have any plans on implementing archiving in the first place.

>What about board name placement? Do we "lease" them out immediately or do we have a one week "grace" period to be fairer?
Im not sure if you mean for people who pick board names when a board is released, or about board claims. As for new boards, any name. Perhaps the typical pol, b, int, v, a, etc. could be assigned to somebody by the administration at first to prevent some random retard owning them. That's up to staff. When it comes to claims, again, there is no system because there are not user owned boards yet. I think 8chan just lets people email administration after the board owners last login is older than 2 weeks? Seems reasonable to me.

>What about the board owner, does his account get deleted, or does he just lose control over the board and can still moderate other boards?
Accounts are currently separate from boards, i.e. you have one account that can be an owner or moderator of multiple boards when the user board creation is implemented. The users who are owner and volunteers are stored within the board document in the db, not the other way around. So deleting boards can be done without having to update a bunch of users in the db.

>What about a sudden reduction of threads in a board, do we slide 2 threads for every thread created? Or maybe even delete more threads?
When board settings are changed, e.g. reducing max number of threads, all deletion of threads that exceed the new limits is handled.
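
Roughly what that looks like, as a sketch. The post does not say which threads go first, so deleting the least recently bumped ones is an assumption here, as are the `bumped` field and the function name:

```javascript
// Hypothetical helper: given a board's threads and the new max, return the
// threads that fall outside the limit. Assumes `bumped` is a timestamp and
// that the least recently bumped threads are the ones removed.
function threadsOverLimit(threads, maxThreads) {
    const newestFirst = [...threads].sort((a, b) => b.bumped - a.bumped);
    return newestFirst.slice(maxThreads);
}

const sample = [
    { id: 1, bumped: 100 },
    { id: 2, bumped: 400 },
    { id: 3, bumped: 300 },
    { id: 4, bumped: 200 },
];
console.log(threadsOverLimit(sample, 2).map((t) => t.id)); // [ 4, 1 ]
```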

>What about deduction coupled with sliding/flooding/spamming, how would that be dealt with? Can threads be un-slided back to catalog?
Once threads are deleted they are gone. There could be some solution by marking threads as "inactive" or something, but handling when to set that true/false could be tricky.
Replies: >>53
>>52
> I don't really think adding or deleting boards, limiting thread amounts, etc. based on popularity is a good feature to build into the software. There are less active boards that people enjoy just as much. 8chan has many boards that don't get deleted ever afaik.
There are times when people make a board for meme's sake (note: NOT for posting memes) and the board stays dead for the rest of its life.
Also, slower boards need less thread/page space by nature; for them 5 pages or 50 threads are enough, but as boards get larger it gets competitive.
Also 8chan itself awards the top 5 boards at any one time with extra space for accommodation (at least somewhat), so we know that is possible.

> adding captchas and IP detection
The next move would be to have "ban words" and "filter words", or better yet Natural Language Processing methods

> I haven't thought about archiving, dead boards, etc. since there aren't even user-owned boards yet. And I can't comment on IPFS as I have no clue how that works, nor did I have any plans on implementing archiving in the first place.
Keep an eye on archiving, everyone from /v/ to /pol/ to /tech/ use it on a day to day basis on 8chan.
Also IPFS should be easy to adopt (see https://github.com/smugdev/smugboard and https://github.com/yatima1460/Kamina)

> Im not sure if you mean for people who pick board names when a board is released, or about board claims
Claiming and post-deletion name allowance (assuming those are different things)
A rule of thumb is that any name with 3 characters or less gets prioritized as "premium" and the rest can be claimed after 1/2 week + a lucky draw process

> When board settings are changed, e.g. reducing max number of threads, all deletion of threads that exceed the new limits is handled.
I would want this to be softer due to my affinity to the zipf-lorenz-pareto concept that I thought up.

The rest that I didn't quote are the ones that I would agree with.
Replies: >>54 >>55 >>56
>>53 found another error: having parens in URLs would not make sense
Replies: >>55 >>56
frog.gif
(179.7KB, 226x224)
>>53 >>54 
board deletion/popularity
You are correct about some boards staying dead forever and a removal process is necessary, not just claims.
I wasn't aware that 8chan has higher limits for the top 5 boards, very interesting. Ideally there would be a global limit and the server+software should be able to handle that limit for all boards if they so choose. I don't want other boards to have a lower limit simply because they are less popular. Besides, it is configurable in the board settings, so if a less popular board wants their discussion to be more "condensed" (not sure of a better way to describe it) they can lower their own limit.

antispam
Yes, ban words/filters are on the todo list. Also I was thinking of rejecting posts if they have too high a ratio of quotes or links versus amount of text (suggested after lynx spammed quotes on futatsu... lol).
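
A quick sketch of what the ratio check could look like. The 0.5 threshold, the regexes and the function name are assumptions, and the threshold would need tuning against real posts:

```javascript
// Hypothetical check: true when quotes (>>123) and links take up more than
// `maxRatio` of the non-whitespace characters in a message.
function hasTooManyQuotesOrLinks(message, maxRatio = 0.5) {
    const quotes = message.match(/>>\d+/g) || [];
    const links = message.match(/https?:\/\/\S+/g) || [];
    const markup = [...quotes, ...links];
    if (markup.length === 0) {
        return false;
    }
    // characters used by quotes/links versus all non-whitespace characters
    const markupChars = markup.reduce((sum, m) => sum + m.length, 0);
    const totalChars = message.replace(/\s+/g, '').length;
    return markupChars / totalChars > maxRatio;
}

console.log(hasTooManyQuotesOrLinks('>>1 >>2 >>3 >>4 lol')); // true
console.log(hasTooManyQuotesOrLinks('I agree with >>53, limits should be per board')); // false
```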

archive
I know that there are a lot of archiving efforts/websites out there. As for maintaining one myself, initially I would just not delete the HTML for old threads and let nginx keep serving them. IPFS can come later. I'm not saying no, just that it's not ready yet.

board names
I agree on the board names. Short, premium, common board names should definitely be reserved and treated with care for claims process.

zipf-lorenz-pareto + post deletion
I need some more clarification on this. I was referring to when changing the max number of threads (in board settings) that threads may be deleted if the max number is reduced. I thought this zipf-lorenz-pareto is referring to multiple boards popularity and how to handle deleting less popular boards.
Still, I don't agree with deleting some less popular boards (in a user-created board scenario). This competition would be GOOD in a website without user-created boards in my opinion, because there is generally a set number of boards, they can compete seasonally, and the number of boards can remain around the same. But with user-created boards, there will be far too many boards and inevitably some would be less popular and up for deletion.

OK sleep time for me. Probably will be doing only minor contributions until next friday. A grandmaster quest on runescape is being released and i need to grind
Replies: >>58 >>63
>>54
oops forgot to address this properly.

>parens
not sure what you meant. parenthesis? If something is invalid characters for a URL I can add it to the regex later. 
nvm, just saw the "amina)" in the url in >>53
That is actually a valid character for a URL so it's not easy to fix.

ok this time im out for real
>>55
> Ideally there would be a global limit and the server+software should be able to handle that limit for all boards if they so choose.
> I don't want other boards to have a lower limit simply because they are less popular.
If I am allowed to be the boss of an imageboard, I would enforce the rule that the top 50 boards get extra large limits
(reasoning being that 4chan only has ~50 boards on its website, and we don't want extremely fragmented boards ala 8chan)
and that the rest only get 50/75/100 threads per board as a limit (let's just call it the 50-50 rule).
The top board gets the largest amount, then the next gets slightly less, then the 3rd~5th, 6th~10th, 11th~20th, 21st~50th get less and less.
Essentially you want the top boards "earning" more board space, and that niche boards should only use what they need.
This accomplishes two things:
1. Wastes less server space (as the parameters are tweaked to be more competitive, there will be fewer threads and boards)
2. Allows for fair competition (if board A splits into boards B and C, the board count of B + C should approximate A rather than 2*A)
Replies: >>63
>>58
Okay, I just checked the actual numbers, 4chan has 63 boards.
So it can also work like this, thread = max(a * (b ** (-ceil(log2(ranking)))), 50)
Instead of the original proposal thread = a * (b ** (-log(ranking)))
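
To see what the tiered version does, here is a small sketch of the first formula exactly as written, thread = max(a * (b ** (-ceil(log2(ranking)))), 50). The function name, the rounding down and the sample values a = 400, b = 2 are assumptions:

```javascript
// a = limit for the top board, b = shrink factor per tier (b > 1),
// minLimit = the floor of 50 threads from the formula.
function tieredThreadLimit(ranking, a, b, minLimit = 50) {
    if (!Number.isInteger(ranking) || ranking < 1) {
        throw new RangeError('ranking must be a positive integer');
    }
    // tier = ceil(log2(ranking)), computed with integers to avoid float
    // rounding: rank 1 -> 0, 2 -> 1, 3-4 -> 2, 5-8 -> 3, ...
    let tier = 0;
    while (2 ** tier < ranking) {
        tier++;
    }
    return Math.max(Math.floor(a * Math.pow(b, -tier)), minLimit);
}

console.log([1, 2, 3, 4, 5, 8, 9].map((r) => tieredThreadLimit(r, 400, 2)));
// [ 400, 200, 100, 100, 50, 50, 50 ]
```

So the limit halves each time the rank crosses a power of two, and everything from the 5th board down sits at the 50-thread floor with these sample values.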
Replies: >>63 >>66
>>58 >>60
I understand what you're proposing, I just don't agree with the idea. Like you said
>If I am allowed to be the boss
The way I see it, if there is a higher thread limit for the top boards, it will feel restrictive for less popular boards. I don't like that.

While the competition generated by ranking boards will promote more posts (something chans desperately need), it would just lead to lower quality posts since the rank has some basis in pph. If you browse a live-posting chan like https://meguca.org or https://chen2.org you will see what I mean.

It's also not really true that it will save server space.
>the top 50 boards gets extra large limits and that the rest only gets 50/75/100
There will be many less popular boards that will take up just as much space even with a lower thread limit than a top board with a higher limit. That is why in >>55 I said it is better suited to sites without user created boards (ala 4chan).
Replies: >>64
>>63
What about unique user IDs per day with PPH as a supplement? Would that solve the issue of the "drop in post quality", since it would rely more on actual user count?
But then we would need to deal with "one drop posting"; in that case users with fewer posts should weigh less than users with more posts.
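
One way that weighting could work, purely as a sketch: each unique user counts for a fraction of a user until they reach some number of posts. The cutoff of 4 posts, the linear ramp and the function name are all assumptions:

```javascript
// postCounts: number of posts made by each unique user on one board today.
// A user with `fullWeightAt` or more posts counts as 1; fewer posts count
// proportionally less, so one-post drive-bys barely move the score.
function weightedUserScore(postCounts, fullWeightAt = 4) {
    return postCounts.reduce(
        (score, posts) => score + Math.min(posts, fullWeightAt) / fullWeightAt,
        0,
    );
}

console.log(weightedUserScore([1, 1, 1, 1])); // 1
console.log(weightedUserScore([2, 4, 8]));    // 2.5
```

Four one-post users count the same as a single regular here, and posting more than the cutoff gains nothing, so spamming posts from one ID does not inflate the score.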
Replies: >>108
>>60
incorrect formula, it should be 
thread = max(a * ((ceil(log2(ranking)))** b), 50)
OR
thread = a * (ranking ** b)
I already said I don't agree with the idea of changing thread/page limit based on a board's popularity. I think for board ranking (since i will add that), i will just use the naive solution of sorting by [users, pph].
just one more reply to say: maybe you think im stubborn or rude, but an important part of open source stuff is to know when to say "no" to a feature. thats why im making the decision clear now. i would be open for this as an optional feature, however. i just wont add it right away.

once i get some free time next week ill rework the permissions system to allow for user board creation :)
Replies: >>84
6391026d2a03116c501021f9ff656673a932d5e5da0bd6b7177aa74c671373fe.gif
(464.3KB, 758x614)
>>78 i reworked the permission system a bit so there is different perms for site admin, global staff, board owner and board mods/volunteers.
>>64 It is now sorted by users, pph, total posts and limited to 20 boards on homepage. ill next add a separate paginated list of all boards in case there are more than 20.
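
For reference, that ordering amounts to something like the sketch below. The field names `users`, `pph` and `totalPosts` and the function name are assumptions, not the actual schema:

```javascript
// Sort by unique users, then posts per hour, then total posts (all
// descending), and keep the first `limit` boards for the homepage.
function topBoards(boards, limit = 20) {
    return [...boards]
        .sort((a, b) =>
            (b.users - a.users) || (b.pph - a.pph) || (b.totalPosts - a.totalPosts))
        .slice(0, limit);
}

const boards = [
    { name: 'a', users: 5, pph: 10, totalPosts: 900 },
    { name: 'b', users: 9, pph: 2, totalPosts: 100 },
    { name: 'c', users: 5, pph: 10, totalPosts: 1500 },
    { name: 'd', users: 5, pph: 12, totalPosts: 50 },
];
console.log(topBoards(boards).map((b) => b.name)); // [ 'b', 'd', 'c', 'a' ]
```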
Is NSFW stuff allowed here? What about non-English boards?
Replies: >>117
>>116
Sure, why not. But this site is mostly for testing and demo of the board software.
