Guest logins
(1) By Richard Hipp (drh) on 2020-04-01 12:32:31 [link] [source]
Over on the SQLite Forum there is a big fuss being raised about anonymous posts and how they should be disallowed. I myself do not understand what all the fuss is about nor why some folks are so deeply offended by other people posting anonymously. Nevertheless, I have spent some time thinking about how we might add a "guest login" feature to the Fossil forum.
The idea is that instead of allowing anonymous posts, the forum allows users to login as a guest. They are given a login name like "guest-1", "guest-2", "guest-3", and so forth. Each guest has a unique name. The guest is assigned a password that is displayed to them with an admonition to write it down as there is no opportunity to recover it later, and a persistent (no expiration date) cookie is set on their browser, thus allowing the guest to edit their own posts or to post follow-ups. But no other information (such as an email address) is collected.
Random notes on this idea:
- The "guest-" prefix should be configurable using a setting.
- Guests are numbered sequentially. Doing this efficiently provides a use-case for the NATSORT collating sequence that I have been experimenting with lately.
- Guest logins get stored in the USER table just like any other login. But because guests do not have an associated email address, they cannot become subscribers and cannot receive email notification of changes.
- Guest accounts should always require moderation. It should not be possible to set the "4" capability on guest logins.
- Self-registered accounts should be prohibited from picking a login name that begins with the guest-name prefix (which defaults to "guest-").
- Sequentially numbering guest accounts means that there must be a single central naming authority for the forum. If you do set up multiple Fossil servers running the same forum, then to avoid name conflicts you need to ensure that each one uses a different guest login prefix. But then all the servers need to know each other's naming conventions in order to prevent self-registered accounts from using a guest prefix from a different server (as described in the previous bullet).
- Further to the previous bullet, there are already other problems associated with allowing self-registration on a forum that runs on two or more servers. The two servers do not share the same USER table, and hence it is possible for a self-registrant on one server to pick a login that is already being used on a different server.
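As a rough illustration of the numbering and prefix-reservation notes above, here is a minimal sketch. It assumes a simplified stand-in `user` table with only a `login` column (not Fossil's real schema), and the helper names `next_guest_login` and `is_reserved_login` are hypothetical. It takes `max()` over the numeric suffix rather than relying on the NATSORT collation mentioned above.

```python
import sqlite3

GUEST_PREFIX = "guest-"  # assumed default; the notes propose a configurable setting


def next_guest_login(db, prefix=GUEST_PREFIX):
    """Return the next sequential guest login, e.g. 'guest-11'."""
    # substr() is 1-based, so the numeric suffix starts at len(prefix) + 1.
    # The GLOB pattern assumes the prefix contains no GLOB metacharacters.
    row = db.execute(
        "SELECT max(CAST(substr(login, ?) AS INTEGER)) FROM user"
        " WHERE login GLOB ?",
        (len(prefix) + 1, prefix + "[0-9]*"),
    ).fetchone()
    return f"{prefix}{(row[0] or 0) + 1}"


def is_reserved_login(login, prefix=GUEST_PREFIX):
    """True if a self-registered name would fall inside the guest namespace."""
    return login.startswith(prefix)


# Stand-in table for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user(login TEXT PRIMARY KEY)")
db.executemany(
    "INSERT INTO user VALUES (?)",
    [("alice",), ("guest-1",), ("guest-2",), ("guest-10",)],
)
print(next_guest_login(db))
```

With a single server this is enough; with several servers each would need its own prefix, and `is_reserved_login` would have to check all of them.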
The only benefit I see to this is that it allows a deliberately unidentified poster to edit their own posts and to post follow-ups. It seems like a lot of work and extra complication for only marginal benefit.
(2.1) By Stephan Beal (stephan) on 2020-04-01 12:58:18 edited from 2.0 in reply to 1 [link] [source]
The only benefit I see to this is that it allows a deliberately unidentified poster to edit their own posts and to post follow-ups. It seems like a lot of work and extra complication for only marginal benefit.
Seeing as we have yet to hear any technical arguments for outright disallowing anonymous posts, only expressions of personal preferences, it would seem a shame to either disable anonymous posting or invest much effort on an alternative, given the caveats which would come along with it.
In my very humble opinion, given the lack of technical arguments, the ones moderating the anonymous posts are the only ones who should have any say-so in the matter (edit: or, obviously, the project lead), as they're the only ones who are "effectively impacted" (for lack of a better phrase) by it.
Apropos caveats: 0.02€
(4) By Richard Hipp (drh) on 2020-04-01 13:20:10 in reply to 2.1 [link] [source]
Technical arguments in favor of guest logins:
- Anonymous users can edit their own posts for typos or clarification.
- Anonymous users can post follow-ups to their own posts and we have reasonable assurance that the follow-up came from the same individual.
- The anonymous user has to pass a captcha prior to their initial post. We haven't had a problem with robot posts yet, but robots seem to get smarter every day.
Technical arguments against guest logins:
- There is an extra row in the USER table for each guest.
(8) By ravbc on 2020-04-01 13:59:37 in reply to 4 [link] [source]
- Anonymous users can edit their own posts for typos or clarification
Wouldn't it be possible to set a cookie to remember the user on this one device and allow them to edit their entries (those made from this device)?
- Anonymous users can post follow-ups to their own posts and we have reasonable assurance that the follow-up came from the same individual.
We could track users through the cookie described above and at least partly allow them to post follow-ups (for as long as the cookie exists). Then probably also posts from the same user could be marked without "polluting" the USER table (e.g. by using "one-time" identifiers only for the purpose of displaying one forum thread).
(12) By Richard Hipp (drh) on 2020-04-01 14:14:33 in reply to 8 [link] [source]
Wouldn't it be possible to set a cookie to remember the [anonymous] user
Yes. But that would require a separate table in the database to remember the cookies for all anonymous posts. There would still be one row per anonymous poster, so it does not mitigate the disadvantage of just having guest logins.
posts from the same user could be marked
Yes, tracking cookies would allow us to show "anonymous (same as ...)" for some identifying mark "..." on follow-up posts, so that readers would know that the same anonymous individual was replying. However, that would require an enhancement to the forum post record format to add the additional notation that the anonymous poster was the same one who created some prior post. It would not be as simple as including the cookie value as part of the username (the "U" card) because that would expose the cookie value to anybody who clones the repository, and thus make it trivial to forge a reply.
(19) By ravbc on 2020-04-01 14:55:29 in reply to 12 [link] [source]
Wouldn't it be possible to set a cookie to remember the [anonymous] user
But that would require a separate table in the database to remember the cookies for all anonymous posts.
Isn't there any possibility to not register the cookie value? Couldn't it be used as some form of authorization as an author of earlier posts, without storing it locally in plain-text?
There would still be one row per anonymous poster, so it does not mitigate the disadvantage of just having guest logins.
But it's easier to use ;-)
It would not be as simple as including the cookie value as part of the username (the "U" card) because that would expose the cookie value to anybody who clones the repository, and thus make it trivial to forge a reply.
We're talking about anonymous posting. It is trivial to forge it already. Do we really want to totally remove any form of anonymous posting?
(33) By Dingyuan Wang (gumblex) on 2020-04-03 16:53:43 in reply to 12 [link] [source]
You can use the cookie to store the guest id and sign the cookie using some server-side secret, which eliminates the need for remembering each cookie.
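A minimal sketch of that idea, assuming an HMAC-SHA256 signature and a hypothetical per-server `SERVER_SECRET`; the function names and the `id.signature` cookie layout are illustrative, not anything Fossil implements. The server can verify a returning guest without storing one row per cookie.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; it must never be synced or cloned.
SERVER_SECRET = secrets.token_bytes(32)


def make_cookie(guest_id, secret=SERVER_SECRET):
    """Return 'guest_id.signature', where the signature is an HMAC of the id."""
    sig = hmac.new(secret, guest_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{guest_id}.{sig}"


def verify_cookie(cookie, secret=SERVER_SECRET):
    """Return the guest id if the signature checks out, otherwise None."""
    guest_id, sep, sig = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(secret, guest_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison; compare bytes so odd input cannot raise.
    if hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return guest_id
    return None


cookie = make_cookie("guest-42")
print(verify_cookie(cookie))
```

Only the guest id would need to appear in the post record; the signature stays in the browser, so cloning the repository does not reveal anything that lets someone forge the cookie.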
(34) By anonymous on 2020-04-03 17:12:36 in reply to 33 [link] [source]
Sort of like a JWT (JSON Web Token)?
(3) By Warren Young (wyoung) on 2020-04-01 13:06:28 in reply to 1 [link] [source]
I myself do not understand what all the fuss is about
I suspect it would take a psychologist or political scientist to root out the deep motivations here. Until then, we will have to wait for them to express their own motivations.
For myself, I believe the option of anonymous communication is foundational to a free society. (And what is a "forum" but a self-selecting society?) We can talk about ways to make pseudonymity easier, but we should not remove the anonymity option.
They are given a login name like "guest-1", "guest-2", "guest-3", and so forth
Can we summarize the basic mechanism of the feature as "automatically generate a unique user name and password"? If so, it seems like a lot of work on someone's part (likely yours, drh) to save some people a tiny bit of work each. One then has to work out how many people are likely to use the feature to evaluate which direction the imbalance of costs swings.
Perhaps consider this posting a design doc for someone else to implement on a branch for evaluation, someone with the motivation to have the feature.
Doing this efficiently provides a use-case for the NATSORT collating sequence
I suspect you could get it to be just as efficient without by sorting on a new ctime field instead, by analogy with stat(2). Something like
SELECT substr(login, ?pfxlen? + 1) FROM user
 WHERE login LIKE '?prefix?%'
 ORDER BY ctime DESC
 LIMIT 1
That should get you the last-used ID.
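For anyone who wants to try the query, here is a small runnable sketch. It assumes a stand-in `user` table with a hypothetical `ctime` column (no such column exists today, per the post), passes the string to `substr()` first as SQLite expects, and binds the prefix as a parameter instead of splicing it into the SQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Stand-in table; "ctime" is the hypothetical new creation-time column.
db.execute("CREATE TABLE user(login TEXT PRIMARY KEY, ctime INTEGER)")
db.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [("alice", 100), ("guest-9", 200), ("guest-10", 300), ("bob", 400)],
)


def last_guest_id(db, prefix="guest-"):
    """Return the suffix of the most recently created guest login, or None."""
    # Assumes the prefix contains no LIKE wildcards ('%' or '_').
    row = db.execute(
        "SELECT substr(login, ?) FROM user"
        " WHERE login LIKE ? ORDER BY ctime DESC LIMIT 1",
        (len(prefix) + 1, prefix + "%"),
    ).fetchone()
    return row[0] if row else None


print(last_guest_id(db))
```

The result comes back as text, so the caller would convert it to an integer before adding one.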
However, neither solution is necessary if you accept my next point.
Sequentially numbering guest accounts
Why not take the same solution Fossil does to all such problems: use an anonymous hash, perhaps reusing the code behind $nonce?
it is possible for a self-registrant on one server to pick a login that is already being used on a different server.
Not if the hash is strong and you prevent use of guest- prefixes on manual registration, per an earlier bullet.
it allows a deliberately unidentified poster to edit their own posts and to post follow-ups
It's a substantial benefit. The question is of its magnitude.
(5) By Richard Hipp (drh) on 2020-04-01 13:27:48 in reply to 3 [link] [source]
Can we summarize the basic mechanism of the feature as "automatically generate a unique user name and password"?
I think so, yes.
I wonder: Should we change the existing self-registration system to automatically
generate the password, rather than requiring prospective users to think up
their own and enter it twice? A reasonable password can be generated using:
SELECT lower(hex(randomblob(8)));
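To see what that expression produces, here is a quick sketch that runs it through Python's `sqlite3` module; the wrapper name `generate_password` is just for illustration. Eight random bytes give 64 bits of entropy, rendered as 16 lowercase hex digits.

```python
import sqlite3


def generate_password(db):
    """Run the SQL expression from the post and return the generated password."""
    return db.execute("SELECT lower(hex(randomblob(8)))").fetchone()[0]


db = sqlite3.connect(":memory:")
pw = generate_password(db)
print(len(pw), pw)
```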
Why not ... use an anonymous hash [to name guest logins].
I thought about that. But hash names are longer and harder to read than sequentially numbered names. "guest-42" is easier to see and remember than "guest-a06ab54fccaa60dab1edbb6ed2ed85c9".
(6) By Stephan Beal (stephan) on 2020-04-01 13:34:58 in reply to 5 [link] [source]
What if the guest suffix was the time, without punctuation, at millisecond precision? Certainly that would provide enough collision guarantees for this purpose?
guest-20200401153217123
It's arguably slightly more legible than a hash, but it also unfortunately tells us something arguably private about the user (the precise time they signed up).
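A minimal sketch of that naming scheme, with a hypothetical helper `timestamp_guest_name`; the `guest-` prefix is the assumed default from earlier in the thread.

```python
from datetime import datetime


def timestamp_guest_name(when, prefix="guest-"):
    """Format a datetime as prefix + YYYYMMDDHHMMSS + milliseconds."""
    millis = when.microsecond // 1000
    return f"{prefix}{when:%Y%m%d%H%M%S}{millis:03d}"


print(timestamp_guest_name(datetime(2020, 4, 1, 15, 32, 17, 123000)))
# prints: guest-20200401153217123
```

Two guests registering in the same millisecond would still collide, so a real implementation would need a retry or a uniqueness check.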
(9) By Richard Hipp (drh) on 2020-04-01 14:01:32 in reply to 6 [link] [source]
What if the guest suffix was the time
I gave that approach some thought too. And I looked at variations such as "guest-$DAY-$HASH" where $DAY is the number of days since some epoch (say 2020-01-01) and $HASH is a much shorter hash since we would only need to avoid collisions for guests that register the same day. But, in the end, nothing was quite as convenient (for the reader) as having sequentially numbered guest names. If we do guest logins (and that is a big "if"), it seems like we could probably work around the name collision problem.
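For comparison, the "guest-$DAY-$HASH" variation might look roughly like this. The sketch assumes the 2020-01-01 epoch mentioned above, uses a few random hex digits as a stand-in for $HASH, and the helper name and the four-digit default are arbitrary choices.

```python
import secrets
from datetime import date

EPOCH = date(2020, 1, 1)  # epoch suggested in the post


def day_hash_guest_name(today, prefix="guest-", hash_hex_digits=4):
    """Build prefix + days-since-epoch + '-' + a short random hex suffix."""
    day = (today - EPOCH).days
    # Random digits stand in for $HASH; they only need to be unique per day.
    suffix = secrets.token_hex(hash_hex_digits // 2)
    return f"{prefix}{day}-{suffix}"


print(day_hash_guest_name(date(2020, 4, 1)))
```

For 2020-04-01 the day number is 91, so the result has the form `guest-91-xxxx`.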
(32) By anonymous on 2020-04-01 21:54:16 in reply to 9 [link] [source]
What if we separate the notion of POSTER from the notion of USER?
This way all anon posters would be under the same username 'anonymous', but with different posternames. The postername is picked/assigned on first use. The postername hash then becomes part of the post content and is used to display the post's attribution. If more than one such postername already exists, it's postfixed with some part of the initial hash (that probably included the date). Thus this can be synced.
The generated poster's id/hash is also sent back in cookie on first use. Thus unlocking own post edits. No need for password... but once cookie is gone, so is the access to that poster's id and associated posts.
Registered users follow the current route, and have no poster id/name.
If for some reason the POSTER table were empty, then old anon posts would still show the recorded attribution, yet the current anon users wouldn't have access to the respective posts for edits and would need to re-ID.
There're still incentives to register, all anon posts are moderated.
(7) By MG (mgr) on 2020-04-01 13:53:08 in reply to 5 [link] [source]
If the goal is to be able to follow one specific "anonymous" during at least one thread (I did not understand or follow all the arguments over there):
- what about just having a cookie (random hash) generated after the first click on [remain anonymous], maybe per thread or "overall" and then display (parts of) that hash near "anonymous" on each post
- or - completely tech-free - anyone wanting to be a "specific anonymous" in one thread just signs each post with the same "name" ...
(11) By sean (jungleboogie) on 2020-04-01 14:09:39 in reply to 7 [link] [source]
or - completely tech-free - anyone wanting to be a "specific anonymous" in one thread just signs each post with the same "name" ...
Interesting...if the poster starts with the 'remain anonymous', then maybe the text field can be filled in with a simple message stating/recommending to sign their name or use some kind of unique identifier.
That wouldn't prevent mgr from signing my messages, though, if we're both 'anonymous'.
(13) By Warren Young (wyoung) on 2020-04-01 14:16:40 in reply to 5 [link] [source]
automatically generate the password, rather than requiring prospective users to think up their own and enter it twice?
I'd want the option to override the default, just on general principles.
In fact, my latest app accepting a password does so using a mechanism that means even the server doesn't get a copy of the secret, thus has nothing it can disclose. (It's some deeply cool math!)
hash names are longer and harder to read
Sure, but we don't need the full hash. 8 hex digits gives about 4.3 billion possible names, so even with a billion guests already registered, a newly generated name has less than a 25% chance of colliding with an existing one.
Anyway, if a person wishes to be easily identified, there's always the Bill the Badger route.
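The 8-hex-digit estimate above can be checked with a little arithmetic. The sketch below computes two different quantities, and which one matters depends on how collisions are handled: the chance that one newly generated name hits an existing one, and the (much higher) birthday-style chance that any two names in the whole population collide. The second uses the standard approximation 1 - exp(-n(n-1)/2N); both function names are made up for this example.

```python
import math


def next_name_collision(existing, hex_digits=8):
    """Chance that one freshly drawn name matches one of `existing` names."""
    return existing / 16 ** hex_digits


def any_collision(guests, hex_digits=8):
    """Approximate chance that at least two of `guests` random names match."""
    space = 16 ** hex_digits
    return 1.0 - math.exp(-guests * (guests - 1) / (2 * space))


print(next_name_collision(10 ** 9))  # a bit under 0.25
print(any_collision(50_000))         # roughly a quarter
```

So if the server simply retries on a collision, 8 hex digits go a long way; if names are assigned without any check, the birthday figure is the one to watch, and it reaches about 25% at around fifty thousand guests.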
(16) By ravbc on 2020-04-01 14:43:43 in reply to 13 [link] [source]
hash names are longer and harder to read
Sure, but we don't need the full hash.
I would say we don't have to show the hash at all. IMHO it'd be enough to distinguish anonymous users only within a single page / thread (assigning them a "random, but readable" name just for this one thread). This of course would not work in mail notifications, but that's IMHO irrelevant.
(10) By sean (jungleboogie) on 2020-04-01 14:03:53 in reply to 3 [link] [source]
I suspect it would take a psychologist or political scientist to root out the deep motivations here.
I think the motivation is to tell different anon posters apart. Is that inherently wrong or bad?
Another anonymous technology is irc. If I join irc without a username, a unique one is automatically assigned to me, so others can recognize I'm different than another guest/anon account.
(Yes, there might be a way to configure irc differently and allow the same username to be used across multiple different machines and people, but I'm talking about the default, well-known config).
As for the technical implementation, it does seem complicated. We'd probably also want to continue moderating these guest accounts, since they're not real accounts.
(14) By Warren Young (wyoung) on 2020-04-01 14:24:32 in reply to 10 [link] [source]
I suspect it would take a psychologist or political scientist to root out the deep motivations here.
I think the motivation is to tell different anon posters apart.
That is a motivation, if that's as far as it goes, but I suspect there are some advocates of this that have another reason behind that one. In other words, "I want to tell anon posters apart," leads to, "Okay, but why? What would you do if you could uniquely identify the anonymous poster?" And that leads to potentially dark actions. Thus the occasional need — yes, need — for the option to be anonymous.
If you want a specific example, someone working with SQLite might find a security hole and be unwilling to identify themselves for fear that identification would get back to their employer. Email is traceable unless you go to uncommon lengths, so that user might feel that an anonymous posting on the forum might be a better option.
(It's an entirely separate question whether drh thinks that is the best option for reporting a security flaw. As in this whole discussion, we cannot only consider one person's motivations and desires.)
Is that inherently wrong or bad?
Ask any whistleblower or political dissident.
(15) By sean (jungleboogie) on 2020-04-01 14:40:11 in reply to 14 [link] [source]
That is a motivation, if that's as far as it goes, but I suspect there are some advocates of this that have another reason behind that one.
Conjecture. It's probably easy enough to ask the why, but it sounds like you may not trust the reason/motivation provided. Therefore, whatever is explained may not be sufficient to change your mind. That's understandable and I'm not faulting you for that.
(It's an entirely separate question whether drh thinks that is the best option for reporting a security flaw. As in this whole discussion, we cannot only consider one person's motivations and desires.)
There doesn't seem to have been a shortage of security flaws reported in sqlite3 when the mailing list was active, but you're right, maybe anonymous posters will feel freer to come forward and express their grievances/security flaws they've found as an anonymous poster.
I'm not in favor of this one way or another and look forward to what's decided on.
(22) By Warren Young (wyoung) on 2020-04-01 15:52:36 in reply to 15 [link] [source]
I suspect there are some advocates of this that have another reason behind that one.
Conjecture.
It is indeed conjecture to ascribe motives to specific people who have not expressed them. I did not intend to do that, though I see that by speaking collectively of the advocates in the other thread, I could be read as doing that.
I'm making a broader point, beyond this specific discussion: it is a simple distillation of history that there are people who wish to pierce veils of anonymity.
I do not wish to make such people's task easier against those who truly wish to remain anonymous. We keep truly problematic posts out with the moderation layer, so for any such post that hits the blockchain, let it remain anonymous.
All of this inherently verges on a free speech debate, which is always political. To keep this on topic, then, given that code is law in the software world, what laws will Fossil encode?
I'd prefer that Fossil offer a toolkit approach, where different users can make it enforce different "laws" to meet local tastes.
(27.2) By ddevienne on 2020-04-01 18:03:05 edited from 27.1 in reply to 22 [link] [source]
All of this inherently verges on a free speech debate
Uh? Free speech? Really? I know I shouldn't argue with you Warren,
you'll out-write me, in much better style than I can ever hope to achieve,
but I have to reply to such a claim that disallowing anonymous posting is
somehow violating someone's free speech, when they can say
whatever they want, under whatever handle, throw away or not.
And the arguments and motivations have already been made in the SQLite
thread. As Sean said, the fact you don't agree or recognize them is a
different issue. Don't start looking for dark motives, or any other
such nonsense please...
I'd say the moderators have much more free speech violating powers
than I do requesting being able to associate posts to people, whoever
they are.
(28) By Warren Young (wyoung) on 2020-04-01 18:43:19 in reply to 27.2 [link] [source]
All of this inherently verges on a free speech debate
Uh? Free speech? Really?
Yes, really.
they can say whatever they want, under whatever handle, throw away or not.
If we could bring back Thomas Paine, would he agree with you?
Don't start looking for dark motives, or any other such nonsense please...
I don't have to go looking for them. It was last month's news.
(29) By sean (jungleboogie) on 2020-04-01 19:04:53 in reply to 28 [link] [source]
Hi Warren,
This brings me to another Fossil feature we can discuss - thread locking.
It's very common for threads to get long-winded and deviate so much from the original intent of the opening post that they become nothing but nonsense, because this is the internet after all.
Does the group think thread locking is something worthwhile to have 'in the back pocket' as a way to moderate things? The Fossil forum isn't reddit, so we don't have millions and millions of people making vulgar or rude comments that demand thread locking immediately, but it might be something to plan ahead about.
If a moderator of their own forum locks a thread, is that restricting free speech to that person, even if the comments are not vulgar or offensive?
(30) By Warren Young (wyoung) on 2020-04-01 19:21:51 in reply to 29 [link] [source]
another Fossil feature we can discuss - thread locking.
Your prior attempt to raise this subject didn't go anywhere, but if someone wants to talk about it, the other thread would be a better place for it.
Does the group think thread locking is something worthwhile to have 'in the back pocket' as a way to moderate things?
Only if it's subthread locking, else we couldn't continue to discuss guest logins on this thread.
is that restricting free speech
One of the reasons I held off getting political until pressed above is that there's an unfortunate conflation of "free speech" as in the US First Amendment vs the supposed right to say whatever you want online, on any forum. The first exists, the second does not.
This is a privately-operated forum. drh's company runs it on equipment his company pays for using company time. To the extent that he works on it on personal time as well, that is also his private resource to do with as he wishes. He is not required to carry any speech on his forum that he doesn't want. In that sense, there is no "Free Speech" here on the forum.
My point about anonymity and its connection to the right to free speech in the context of this thread is that if a given message gets past the moderators, and it was posted anonymously, that is currently allowed and protected here, and I am advocating that it stay that way.
There have been at least two highly valuable anonymous contributions to these forums just in the past day. It is quite possible that neither would have been posted if the user had been required to identify themselves, even if pseudonymously.
Anonymous posting is a feature, not a bug.
(31) By sean (jungleboogie) on 2020-04-01 19:53:15 in reply to 30 [link] [source]
One of the reasons I held off getting political until pressed above is that there's an unfortunate conflation of "free speech" as in the US First Amendment vs the supposed right to say whatever you want online, on any forum. The first exists, the second does not.
No conflation on my part. Seemed like you were going the way of saying free speech applies to the Fossil forums with your posts to the eff.
My point about anonymity and its connection to the right to free speech in the context of this thread is that if a given message gets past the moderators, and it was posted anonymously, that is currently allowed and protected here, and I am advocating that it stay that way.
I agree with this in that I wouldn't want drh, or any other competent developer, to need to burn cycles on something with a fairly low payoff.
(17) By Joel Dueck (joeld) on 2020-04-01 14:43:46 in reply to 1 [link] [source]
Rather than extending the login/logout paradigm to cover “anonymous” usage, what if you added only a password field to the forum posting form for anonymous users? The user could enter a password of their own choosing (possibly within minimum-complexity restraints) at the time of composing their message. Obviously only a secure hash of the password would be stored. This hash would serve to identify the author and possibly allow edits. (Another possibility is simply disallowing edits for anonymous users.)
This provides a way to distinguish anonymous authors with a fair amount of reliability. But you also eliminate anonymous users from taking up space in the USER table, you eliminate issues around the guest- prefix, and you eliminate the multiple-server issue.
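A rough sketch of the password-derived identifier, assuming PBKDF2-HMAC-SHA256 as the "secure hash" and a hypothetical per-forum salt; the post does not name an algorithm, and `poster_tag`, `FORUM_SALT`, and the 8-digit display length are all illustrative choices.

```python
import hashlib

# Hypothetical per-forum salt so the same password yields different tags elsewhere.
FORUM_SALT = b"example-forum-salt"


def poster_tag(password, salt=FORUM_SALT, iterations=100_000, hex_digits=8):
    """Derive a short, stable identifier from the poster's chosen password."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return digest.hex()[:hex_digits]


print(poster_tag("correct horse battery staple"))
```

The same password always yields the same tag, which is what lets readers link posts. A short displayed tag is easier to collide with than the full hash, so if edits are allowed the check should compare the full digest, not the truncated tag.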
(18) By anonymous on 2020-04-01 14:50:32 in reply to 17 [link] [source]
(21) By Warren Young (wyoung) on 2020-04-01 15:20:08 in reply to 18 [link] [source]
Tripcodes are fairly ugly, on their face.
However! What if we were to use the first 24 bits of random data as a key into a few tables of random human names? Say, the first 9 bits into a table of 512 uncommon first names, the next 5 into an initial, and the final 10 into a table of 1024 uncommon surnames?
Thus, it could generate "Divilbus Q. Snerdly" and such, which is both pseudonymous and easier for humans to deal with than guest-ABC123.
The second step has overflow if we use the English alphabet, but we can soak up the 6 extra cases thus:
- 0: Put initial first instead of in the middle
- 1: Use a middle name instead of an initial, using 9 extra bits from the initial random blob
- 2: Hyphenate the surname by taking 10 extra bits from the blob
- 3: Add "Jr."
- 4: Add Roman numeral Ⅱ
- 5: Add Roman numeral Ⅲ
Thus we could also get "Connor X. McNair-Renfield Ⅲ". Delicious!
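The 9/5/10 bit split can be sketched as follows. The name tables here are numbered placeholders, since no real data source has been chosen, and only the suffix cases (3 to 5) of the overflow list are implemented; cases 0 to 2 need extra bits or ordering decisions, so this sketch falls back to a plain "first last" for them. Treating the first 9 bits as the most significant is also an assumption.

```python
# Placeholder tables; real ones would hold 512 first names and 1024 surnames.
FIRST_NAMES = [f"First{i}" for i in range(512)]
SURNAMES = [f"Surname{i}" for i in range(1024)]
SUFFIXES = {3: "Jr.", 4: "Ⅱ", 5: "Ⅲ"}  # overflow cases 3-5 from the list


def split_bits(bits24):
    """Split 24 bits into (first-name index, middle code, surname index)."""
    first_idx = (bits24 >> 15) & 0x1FF  # top 9 bits
    middle = (bits24 >> 10) & 0x1F      # next 5 bits
    surname_idx = bits24 & 0x3FF        # low 10 bits
    return first_idx, middle, surname_idx


def guest_name(bits24):
    first_idx, middle, surname_idx = split_bits(bits24)
    first, last = FIRST_NAMES[first_idx], SURNAMES[surname_idx]
    if middle < 26:  # 0-25 map to the initials A-Z
        return f"{first} {chr(ord('A') + middle)}. {last}"
    special = middle - 26  # one of the 6 overflow cases
    if special in SUFFIXES:
        return f"{first} {last} {SUFFIXES[special]}"
    return f"{first} {last}"  # cases 0-2 are not implemented in this sketch


print(guest_name((3 << 15) | (16 << 10) | 7))
# prints: First3 Q. Surname7
```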
The main difficulty I see is in finding a free data source we can use for the tables. Everything I've been able to find is locked up in a DB behind a web app. The public data is all geared towards most-popular names.
(23) By anonymous on 2020-04-01 16:05:40 in reply to 21 [link] [source]
it could generate "Divilbus Q. Snerdly" <...> [or] "Connor X. McNair-Renfield Ⅲ"
Sounds nice! And much less likely to backfire than other approaches.
The main difficulty I see is in finding a free data source we can use for the tables.
What is the license on US census data? There's a file with names occurring ≥ 100 times and a JSON API. (Careful, 28K of JSON!)
(24) By Offray (offray) on 2020-04-01 16:22:49 in reply to 21 [link] [source]
Seems like a good approach to keep pseudonymous presence in the Forum.
We could add color and diversity to the less common names by providing an international list instead of just an English one... I wonder which combinations could come from that :-).
(25) By Joel Dueck (joeld) on 2020-04-01 16:22:52 in reply to 21 [link] [source]
Tripcodes are fairly ugly, on their face.
Fossil uses ugly hash prefixes basically everywhere. I think we could handle it ;-)
But if you wanted it to be more human-friendly, rather than consult a giant table it could do something like what pwgen -0A does: deterministically generate a pronounceable string of letters from a seed (the password or a hash of the password).
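One simple way to get a pronounceable, deterministic string from a seed is to hash it and alternate consonants and vowels. This is not pwgen's actual algorithm, just a sketch of the general idea; the letter sets and the name `pronounceable` are arbitrary choices.

```python
import hashlib

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def pronounceable(seed, syllables=4):
    """Map a seed to a fixed consonant-vowel string such as 'kamuzo...'."""
    digest = hashlib.sha256(seed.encode("utf-8")).digest()
    letters = []
    for i in range(syllables):
        # Two digest bytes per syllable; the modulo introduces a slight bias.
        letters.append(CONSONANTS[digest[2 * i] % len(CONSONANTS)])
        letters.append(VOWELS[digest[2 * i + 1] % len(VOWELS)])
    return "".join(letters)


print(pronounceable("some password hash"))
```

Four syllables from these sets give only 70^4 (about 24 million) possible names, so this suits display labels better than anything security-relevant.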
(26) By Warren Young (wyoung) on 2020-04-01 17:14:18 in reply to 21 [source]
"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious." — Frederick P. Brooks, Jr. (what an unlikely name!)
In that spirit, I offer this refinement on the prior idea:
typedef unsigned long long u64;

/* Design sketch only: C leaves bit-field layout to the implementation, and
   the nested union is not itself a bit-field, so these members will not
   necessarily overlay random_blob bit-for-bit. */
union guest_name {
    u64 random_blob;
    struct {
        unsigned first:9;
        unsigned m_name:1;          /* 0: use middle.initial; 1: middle.name */
        union {
            unsigned name:9;
            struct {
                unsigned initial:5; /* 0-25: A-Z; 26-31: no initial */
                unsigned extra:4;   /* not used directly; part of middle.name */
            };
        } middle;
        unsigned swap_first:1;      /* middle becomes first and vice versa */
        unsigned m_surname:10;      /* guest's male-line surname */
        unsigned hyphenate:1;       /* append following with hyphen */
        unsigned f_surname:10;      /* guest's female-line surname */
        unsigned gen_suffix:5;      /* see enum gen_suffix */
        unsigned unused:18;         /* currently unassigned */
    } elements;
};

enum gen_suffix {
    jr,
    sr,
    second,
    third,
    fourth
};
Most of the special cases in my bullet list are orthogonal, so I've just collapsed them into the "no initial" case above, splitting the special cases out into separate data bits. Without this change, you cannot have things like an initial and a hyphenated name and a generational suffix.
The size of gen_suffix is not a mistake. It is purposely biased to make such suffixes rare; it should be adjusted to approximate the frequency in actual use. Only cases 0-4 get a suffix; 5-31 get none.
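Because C bit-field layout varies between compilers, an implementation might prefer explicit shifts and masks. The sketch below decodes the same field widths from a 64-bit integer; taking fields from the low bits upward is an assumption, as are the suffix spellings and the helper names.

```python
# (name, width) pairs in the order they appear in the union above.
FIELDS = [
    ("first", 9), ("m_name", 1), ("middle", 9), ("swap_first", 1),
    ("m_surname", 10), ("hyphenate", 1), ("f_surname", 10),
    ("gen_suffix", 5), ("unused", 18),
]
GEN_SUFFIXES = ["Jr.", "Sr.", "II", "III", "IV"]  # enum values 0-4


def decode(blob):
    """Split a 64-bit integer into named fields, lowest bits first."""
    out = {}
    for name, width in FIELDS:
        out[name] = blob & ((1 << width) - 1)
        blob >>= width
    return out


def suffix_for(code):
    """Only codes 0-4 carry a suffix; 5-31 mean none, keeping suffixes rare."""
    return GEN_SUFFIXES[code] if code < len(GEN_SUFFIXES) else ""


fields = decode(3 | (1 << 9) | (7 << 10) | (2 << 41))
print(fields["first"], fields["m_name"], fields["middle"],
      suffix_for(fields["gen_suffix"]))
```

With 5 of 32 codes assigned, about 16% of random blobs would get a suffix under this reading.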
(20) By anonymous on 2020-04-01 14:55:44 in reply to 1 [link] [source]
StackOverflow and its siblings have a similar guest concept.
Anyone willing to contribute still has to go through basic registration, specifying an email and picking a moniker, but there is no hard requirement to create an account and password, which would go the over-email confirmation route.
These half-reg users functionally are full users, can edit their posts, and also collect points to eventually allow voting etc.
My understanding is that this is handled with cookies, also a user record is being created serverside. The cookie can be lost, taking away the reputation points. There's some route to reclaim it (some hoops to jump). Thus the incentive to register fully.
In Fossil's forum the incentive seems to be the ability to edit one's own posts and, well, to get alerts too.
Having a user moniker does help keep a conversation clearer.