Report of Meeting 2024-07-25
Present: Ed Gottsman, Raul Miller, Devon McCormick and Bob Therriault
Full transcripts of this meeting are now available below on this wiki page.
1) Bob relayed the email he received from Chris, which supported using the forums to amplify talk page discussions and to post the wiki's Recent Changes, as long as the changes were curated to avoid noisy minor edits that would not be of interest to readers.
2) Bob had also asked about getting the wiki's access logs. Chris felt the results would not be accurate because of robot crawlers and the inaccuracy of figures generated this way. Ed agreed that this is an issue but would be interested in looking at the logs to see whether it could be overcome. Raul pointed out that visitors are different from hits: a hit is counted every time a page is visited, while a visitor is counted only once no matter how many times they visit the same page. Ed felt that browsers would be identified in the log entries, whereas legitimate crawlers will not be associated with a browser. Bob pointed out that this is a general issue on the web, with agents hiding their real origins to crawl websites.
3) Bob mentioned that he will be travelling to Italy in the fall for six weeks, so the J wiki meetings will go on hiatus during that time, although he will be checking email.
4) Ed wondered whether the wiki is the only thing that should be promoted on the forums; examples include J solutions on code golf sites such as https://code.golf and on Rosetta Code https://rosettacode.org/wiki/Category:J_User. Bob wondered whether it would be up to the code's creator to post to the forum. Ed wondered whether there might be a curator who tracks those sites for opportunities to show how J is used.
For access to previous meeting reports, see https://code.jsoftware.com/wiki/Wiki_Development. If you would like to participate in the development of the J wiki, please contact us on the J forum and we will get you an invitation to the next J wiki meeting, held on Thursdays at 23:00 (UTC).
Transcript
And in our last meeting,
Ed and I were talking about a number of things
and I got a response from Chris, which was great.
Turned it around nice and quick for me.
I broke it into three things.
I said, the first thing,
in order to increase the profile of talk pages
in the Wiki, I was wondering if it would be reasonable
to encourage people to post their synopsis
of the suggestion on the JForum
with a link to the talk page for more details.
Talk pages are very low profile
and the fact that there's little response
may be discouraging to people
who would otherwise be contributors.
And Chris replied to that one, good, go ahead.
So that sounds like something I will be doing
and letting people know that's an approach.
The second one was, would it be reasonable
to post the most recent changes
of the last week on the forum?
You've been great about allowing me
to post the Wiki meeting reports,
but I wonder if a list of recent changes
might draw more attention to the Wiki,
maybe once a week on Tuesdays for the meeting reports
and on Fridays for the most recent changes.
His reply to that was agreed,
but it needs to be filtered manually
to include only non-trivial changes
of interest to the forum.
And I think that's a good point.
You get a lot of little, you know,
ceremony changes, adjustments and stuff
that probably don't need to be repeated,
but the ones that are probably more significant,
I think would be a good idea.
Any responses to those first two
before I go to the third one?
- Sounds right.
- Automatically?
How much work is it to filter out the ditzy stuff?
That's a technical term, ditzy.
- Honestly, I'd copy and paste,
and just pull stuff off.
And I don't think that's too much to do.
I mean, once a week I can do that.
And I kind of agree with him.
If we had every little adjustment to typos or stuff,
people wouldn't read it.
- Yeah, that's probably true.
- Kind of what I was thinking.
So I think that's what I will end up doing.
So I'll sort of start working on it.
- That sounds great.
- Yeah, no, it's good.
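(A rough sketch of what that weekly pull could look like, assuming the J wiki exposes the standard MediaWiki API; the endpoint path below is a guess, not confirmed. It filters out minor and bot edits up front, leaving the manual curation on top.)

```python
# Sketch: pull one week of Recent Changes for manual curation before
# posting to the forum. Assumes a standard MediaWiki API is available.
from datetime import datetime, timedelta, timezone
import requests

API = "https://code.jsoftware.com/mediawiki/api.php"  # assumed endpoint

week_ago = (datetime.now(timezone.utc) - timedelta(days=7)).strftime("%Y-%m-%dT%H:%M:%SZ")
resp = requests.get(API, params={
    "action": "query",
    "list": "recentchanges",
    "rcend": week_ago,           # walk back from now until one week ago
    "rcshow": "!minor|!bot",     # drop minor edits and bot edits up front
    "rcprop": "title|user|timestamp|comment",
    "rclimit": "500",
    "format": "json",
})

# One line per distinct page; a human still prunes the "ditzy" entries.
seen = set()
for rc in resp.json()["query"]["recentchanges"]:
    if rc["title"] not in seen:
        seen.add(rc["title"])
        print(rc["timestamp"][:10], rc["title"], "-", rc.get("comment", ""))
```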
The third thing though,
was talking about scraping the Wiki
to find out most visited pages.
And so I said, we were wondering if there was a way
to get data on which pages get visits on a weekly basis.
This does not seem to be available through MediaWiki,
but would be very useful to ensure
that work on the Wiki was directed
as efficiently as possible.
It might also indicate areas
where valuable information is going unnoticed.
And his reply was,
this has been discussed in the past on other Wikis,
but there are problems in getting valid results.
So Wikis typically do not do this.
He said, for one thing,
most page visits come from search engine bots,
and for another, pages might be cached.
So our page hits may not be correct.
Also, it's easy to fake results.
So I guess his feeling is it could be done,
but it's probably not as valid as we might like it to be.
Oh, Devon's on.
Thoughts on that, Ed?
- I guess I accept absolutely everything
that he says in that regard,
but I would maybe wonder a little bit.
The thing about robot hits is that they're gonna be
the same number for all the pages.
So in a given week-
- Or at least across weeks, they might differ.
From week to week, you know,
a bot might not scan the whole site in one week.
- Oh, that's fair.
So I guess, if we had the data,
we could push on that a little bit.
So for example, if you say we're gonna look at the top 10,
that should abstract out the robots,
which presumably are hitting lots and lots of pages
a couple of times.
- And then also, another thing I'm thinking about:
in weblog analysis, there's hits and then there's visitors.
There are different ways of looking at the data,
and that can, you know,
help make some informed decisions
or give insight into different kinds of relevance, I guess,
is maybe the best way of saying it.
- How do you distinguish between hits and visitors?
What's the difference for you?
- A visitor is like a user ID.
So if a person is editing a page
or refreshing it or, you know,
going back and forth in history,
from the top page and back,
and doing that sort of stuff,
that's still only one visitor for each of those URLs,
even though they might hit the page a number of times.
- Right.
I see.
So every instance of being on the page is a hit,
and multiple times during a day would still be one visitor.
- Right.
- Yeah.
- But we don't necessarily, okay.
That works when the person is logged in.
- Right.
And that's, and like I said,
that would be a different perspective,
because, you know, visitors only make sense
for people that have Wiki IDs,
unless we get into handing out cookies
and analyzing that stuff.
And the beginner population is different, you know,
they don't necessarily have logins.
So it's two different perspectives.
And it might be worth looking at both.
But you know, if we execute the second item,
which is simply showing edits,
you know, here are the pages that have been edited,
we'll catch visitors in the sense that you mean,
or at least many of them.
So anybody whose visitation consisted of editing a page,
we'll catch that in the page edit report.
- Right.
I guess there are four levels of abstraction here.
There's a hit,
which is like a line in a weblog.
There's the sort of visitor that gets used in ad delivery,
where a repeat instance of the same UID cookie
dedupes at a certain level.
And then there are, you know,
hits with an actual Wiki ID.
And then there are edits,
which are a smaller fraction of those.
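(The hit/visitor split is easy to see on a raw access log. A minimal sketch, assuming the common Apache/nginx "combined" log format; client IP plus user agent stands in, crudely, for a visitor ID, and the filename is hypothetical.)

```python
# Sketch of hits versus visitors over one access log.
# A hit is a matching log line; a "visitor" here is a distinct
# (client IP, user agent) pair per page, a crude stand-in for a
# real user or cookie ID.
import re
from collections import Counter, defaultdict

LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

hits = Counter()             # every matching GET line counts
visitors = defaultdict(set)  # deduped per page

with open("access.log") as f:    # hypothetical log file
    for line in f:
        m = LINE.match(line)
        if m and m["method"] == "GET":
            hits[m["path"]] += 1
            visitors[m["path"]].add((m["ip"], m["agent"]))

for path, n in hits.most_common(10):
    print(path, n, "hits,", len(visitors[path]), "visitors")
```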
- I haven't looked at a weblog in years,
well, not a weblog, a log.
But I know that when you're doing curl
or any REST request,
assuming you're not trying to pull a fast one on anybody,
you include your browser type and version.
And my question is, do the legitimate crawlers do that?
I think of myself as an illegitimate crawler
because I lie and say that I am a browser
as a way of evading any blocks
that might otherwise apply.
- There are browser user agents.
- Right.
But is that what Google does
or does Google play it straight?
- It's been years since I've done this myself,
but way back when, Google definitely had Googlebot
or something like that in the user agent.
- Yeah.
I mean, it seems to me they could get into a lot of trouble
for not doing that.
- Something on their scale, yeah.
- Yeah, exactly.
- But that doesn't mean that they don't have,
you know, individuals within them
doing harvesting stuff and shoving you into their...
- Yeah, yeah.
- The other thing is, in these recent days of LLMs,
a lot of people have been talking about the bots
and the fact that there's right now,
what would you call it,
a cold war between the bots and the sites.
Sites will ban certain bots,
and then some bots, they're seeing,
are changing their names to get around that.
- Oh yeah, oh yeah.
And there are whole crawling services
that have agents, for want of a better word,
all across the globe, and they hit you
and they pretend to be Firefox.
And you just can't tell, you can't tell.
- And back when I was working on that stuff,
I remember some of my diagnostic tools were banned
and so I couldn't get them to work
without spoofing the user agent.
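(Filtering out the self-identified crawlers is the easy half; the dishonest ones that claim to be Firefox are, as noted, indistinguishable. A sketch, with an illustrative token list that would need tuning against a real log.)

```python
# Sketch: drop self-identified crawlers by user-agent substring.
# Honest bots (Googlebot, Bingbot, ...) advertise themselves;
# spoofed ones slip through. The token list is illustrative only.
BOT_TOKENS = ("bot", "crawler", "spider", "slurp", "curl", "python-requests")

def looks_like_bot(agent: str) -> bool:
    agent = agent.lower()
    return agent == "-" or any(tok in agent for tok in BOT_TOKENS)

# e.g., applied to the parsed lines from the earlier sketch:
# human_hits = [m for m in parsed if not looks_like_bot(m["agent"])]
```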
- Right.
All right, so maybe this needs some more thought.
Yeah.
From his response, he didn't say no.
- No.
- So it's more just, as you say, some more thought
and at some point it might be worth pulling a log
just to see what it looks like.
- Well, that's what I'm thinking.
- Yeah.
- When I said I'd like to see the log
so we could push on it a little bit.
- Yeah, and just to get a sense of what's on there
and whether we can essentially clean the data,
'cause that's really what we're trying to do at that point.
- Right, and it wouldn't need to be perfect, you know?
I mean, if we do report a page as popular one week
that actually was only accidentally hit
by 11 different search engine crawlers,
that's not a tragedy.
- Yeah.
Hello, Devon.
- Hi.
- Hey, Devon.
- Hi.
- We've been finishing up this room here.
So my computer got moved in
and I had to go find my camera.
It was in a bag somewhere.
(laughing)
- Well, welcome.
- So what we've been talking about:
we'd sent out requests to Chris last week.
The first thing was whether we could use the forum
to post talk items that might come up on the wiki,
so a person posting on the talk page
would be allowed to redirect to the forum
and send a link back to the talk page for more information,
to raise the profile.
Chris thought that was a good idea.
The second thing we asked about was whether we should put
recent changes up once a week.
He said that was probably a good idea,
except we should filter it for significant changes,
which I think is actually a really good idea.
And then the third thing is what we were just talking about.
We were wondering whether crawling,
or getting the logs of the wiki
and finding out which pages were getting the most action,
would be useful.
And he said that in the past, wikis
had found that to be problematic
because of crawlers and bots
spoofing the site.
Yeah, yeah.
And now we're thinking about whether
we should look at the logs and then make adjustments
and see if we can actually extract
some useful information out of that.
Yeah, my feeling is that if the crawlers are being honest,
saying "I'm a crawler,"
then we should be able to filter out a lot of that stuff.
But that's an empirical test we'd have to make.
Well, and I think the more likely way
that they won't have an effect would be statistical.
So even though Raul was saying they might only
crawl part of the wiki over time,
I wouldn't think you'd see them crawling the same part,
or if they did, that might be significant.
But you wouldn't base all your action on one week.
You'd look for-
Well, no, you do wanna report the week's transactions.
You're not looking at some sort of
weighted moving average across weeks.
I mean, you need numbers for the week of July 3rd,
for example.
And prior weeks' numbers aren't gonna have any impact
on those, I wouldn't think.
But I think if you saw something change on the site
with one week, that's probably not gonna be actionable.
But if you saw it consistently over a couple of weeks,
you might do something about it.
And you said change, what did you mean by that?
Well, something, suddenly there's a big burst
in activity on that page.
People are looking at that page like crazy.
Yeah, if you have a particular page
that has very high numbers compared to all the other pages,
that is probably significant.
It's probably also gonna be rare.
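(One way to encode "consistently over a couple of weeks": flag a page only when its weekly count stays well above its own trailing baseline. The thresholds here are made-up starting points, not recommendations.)

```python
# Sketch: confirm a burst across consecutive weeks before acting on it.
from collections import deque

def flag_spikes(weekly_counts, factor=3.0, confirm_weeks=2):
    """weekly_counts: per-week hit counts for one page, oldest first."""
    history = deque(maxlen=4)   # trailing baseline of recent quiet weeks
    streak = 0
    for week, count in enumerate(weekly_counts):
        baseline = sum(history) / len(history) if history else None
        if baseline is not None and count > factor * baseline:
            streak += 1              # burst week; keep it out of the baseline
            if streak >= confirm_weeks:
                yield week           # burst confirmed, maybe worth acting on
        else:
            streak = 0
            history.append(count)    # only quiet weeks feed the baseline

# A page idling around 10 hits a week, then bursting:
print(list(flag_spikes([10, 12, 11, 9, 50, 60, 55])))   # -> [5, 6]
```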
Yeah.
It would be interesting to see the difference.
Again, get the information.
It might be interesting to take a look at.
That's what I keep coming back to.
Yeah, no, you're right.
Yeah, yeah, yeah, yeah.
We're circling around there.
The other topic that I had put on was the fall,
plans for the fall.
And my plans for the fall are that I'm gonna be spending
six weeks in Italy.
Great.
Yeah, no, it should be fun.
Never been to Italy before.
That's great.
Are you renting a place?
Traveling around.
So, a number of places.
Wow, six weeks in hotels.
Bed and breakfasts, really.
Yeah.
What dates?
I think we're flying out September the 23rd.
And then we're gonna be back on November the 4th.
Wow, well, that sounds like me.
My wife is going to a conference in Brussels in September.
And we're planning an apartment swap
with someone in Madrid for the month of October.
Oh, yeah, you were saying, yeah.
So, since she's gonna be in Brussels anyway,
I'll probably just go join her there.
Probably around the last week of September.
And then we'll proceed to Madrid.
I think we've got a week in Tuscany,
but other than that, we're into Milano
and then over to Venice,
back down to Rome, Naples, the south.
No, that's at the end of the trip.
We end up going over to Monaco.
That's one trip we do, to go see Monaco.
And then at the end, we fly out of Rome
through Germany and back to Canada.
We were actually talking about that last week in September,
after the conference is over, we might go to Venice.
'Cause I wanna see it before it sinks under the waves.
Yeah.
Yeah, you know, it's become very expensive.
I believe they're now charging five euro per visitor
in an effort to keep the riffraff out.
Just something to be aware of.
That's okay.
I always just pretend euros and dollars are the same.
It's pretty safe, yeah.
I think we've actually got a bed and breakfast
for a couple of nights on the island, so.
Oh, wow, great.
I guess we pay it once and we don't leave.
(laughing)
And I think we pick up the car in Bologna.
So after that, we're touring around,
but before that, it's trains and stuff.
That's great.
But in terms of this wiki meeting,
it means for about six weeks there,
I'm not gonna be doing these.
Nothing's happening, yeah.
Yeah.
And leading up to it, I've got a lot of work to do
to try and edit some ArrayCast episodes in advance.
So I'm not going on hiatus with that.
So going into September,
I think I'm gonna be probably stepping back for this
until November and then picking it up again.
That doesn't mean anything stops.
It just means that I won't be doing the Zoom calls
and stuff like that.
Right.
But I'll still be checking email from time to time.
So, and if anybody else wants to pick up and do Zoom calls,
you're welcome to it.
I don't feel like I'm the only one that can do that.
You know, I feel like
pushing on this idea of using the mailing list,
which pretty much everybody in the J community
is hooked into.
And actually that's a tautology.
The J community is the mailing list.
The ones we're aware of, yeah.
Yeah.
As a way of increasing awareness of community activity,
of J activity, is something interesting to push on.
And I just wonder what else.
So for example, the various CodeGolf sites,
I can't think of any offhand right now,
but they're certainly out there.
When a new J solution gets posted somewhere,
it would be nifty if that automatically popped up
in the mailing list, just for example.
And I don't know what the other opportunities are,
if there are any,
but somehow improving awareness of activity
seems like a good thing to push on.
Yeah.
Yeah.
Rosetta Code.
Rosetta Code, yeah.
Yeah, Raul's done a ton of stuff with Rosetta Code.
And I think-
Not recently.
Well, no, but in the past,
I know you've posted things that you've put up there.
Yeah.
I think the CodeGolf Stack Exchange
is probably the big one.
Is that code.golf?
No, it's codegolf.stackexchange.com.
Okay, yeah.
Yeah, we were talking to Jay,
and I'm trying to think of what his last name was.
He's one of the moderators on it.
And yeah, it was a Stack Exchange one.
So we had a detailed discussion before the episode
about being a moderator on Stack Exchange.
So political.
But yeah, no, that's a good idea.
I mean, the obvious way is to get people
who have posted solutions to then post them to the forum.
And you can welcome that.
Yeah, it's work.
Yeah, it's work.
But it's work for awareness, right?
You get to show what you've done.
Yeah.
Are you trying to think of something
where somebody else would crawl it
and post J solutions?
Yeah.
For example, I mean, something either semi-automatic
or something that a single central person
could take responsibility for
and take care of on a weekly basis,
sort of like you editing the Deltas page
off the Wiki and posting it to the forum.
I don't, I don't know.
I don't know.
Reaching-
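(For the semi-automatic version of that idea, Rosetta Code is the easy case, since it is itself a MediaWiki. A sketch: list the task pages most recently added to Category:J, which a task joins roughly when its first J solution appears. The endpoint path is an assumption.)

```python
# Sketch: a weekly "new J solutions on Rosetta Code" candidate list.
import requests

API = "https://rosettacode.org/w/api.php"   # assumed MediaWiki endpoint

resp = requests.get(API, params={
    "action": "query",
    "list": "categorymembers",
    "cmtitle": "Category:J",
    "cmsort": "timestamp",   # order by when the page entered the category
    "cmdir": "desc",         # newest first
    "cmlimit": "20",
    "format": "json",
})
for page in resp.json()["query"]["categorymembers"]:
    print(page["title"])     # candidates for a forum post, curated by hand
```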
The LLMs are gonna make it all go away anyway.
The whole idea of being a programmer
is sort of an awkward transitional phase
in the history of computer science, I think.
Yeah, we'll all be prompters instead.
A prompter, that's right.
God, I'm a teleprompter, yeah.
I think it's more like engravers:
they haven't gone away,
they're just more of a niche thing nowadays.
Yeah, and maybe the people who, go ahead, sorry.
There are still fundamental reasons to think
that a large language model is not a coding model.
They are different things.
They're different things in our brains.
They use different parts of the brain.
Yeah, all of that is true.
But a lot of the web design stuff,
which is where a lot of the popularity went,
is probably gonna go the LLM route,
because there wasn't really much of a business model,
you know, a long-term need,
that it was addressing in the first place.
So it was more of a fad than a...
Well, and for people who are working in JavaScript and HTML
and stuff, there's a much larger corpus
of existing code around, I mean, the web.
Yeah, that's true.
The LLMs-
You wanna have a laugh, ask it to code something in J.
It's just pathetic.
Well, that's what I was gonna say:
the niche languages maybe take a bit longer,
but I don't know that they're immune,
because I think in some senses
there's a real opportunity there,
but boy, you'd have to do some training
to get it to actually be intelligent.
Yeah, but what's the point?
I mean, you don't care what language
it ultimately implements in.
If your interface to the exercise is a prompt of some sort,
you don't care if it's coding J or JavaScript.
Do you not care?
Because if they don't care,
then that would mean that LLM to assembly language
or machine language would be the route to go.
I think that's probably accurate.
Yeah, I think for high speed,
if you could get an LLM to produce machine code-
But the thing that happens with machine code
is you're locked to a specific hardware platform,
whereas a lot of effort goes into portability
for higher-level languages.
You're contradicting yourself, Raul.
I mean, what that says is-
- Well, no, it was a question.
Is that the right view of how things are gonna go?
And I was thinking that there's a lot of...
One of the things that matters a lot
for any programming environment
is not just machine speed,
but also what that environment gets you access to,
what the interfaces are.
Like if you have a coding language
that's only used for a certain part of the construction
of a car, then that's an environmental niche,
in essence, for that platform.
- All right, terrific.
See you all next week.
- Good to see you guys.
- Yeah, good to see you.
Okay, bye-bye.
- Bye.