Will AI Chatbots Boost Efforts to Make Scholarly Articles Free?

When it comes to getting access to the latest scholarly articles, there’s a stark digital divide. Students and professors affiliated with most colleges have unlimited access to large collections of scholarship such as JSTOR and HeinOnline, because their institutions subscribe to site licenses. To everyone else, though, those and many other scholarly publications are locked, or can only be read by paying hefty per-article fees.

Peter Baldwin, a professor of history at the University of California at Los Angeles, calls it a “grotesque disparity,” one that many professors don’t even realize exists. After all, they’re spoiled by their easy access to scholarship, and they forget that as soon as their students graduate and leave campus, “you’re sort of expelled from the digital paradise of the university world into that bleak, non-accessible world.”

There is a longstanding call to make scholarship free to all, known as the open access movement. Baldwin argues that the current moment, when AI and ChatGPT are reshaping information, could be a turning point that speeds up the move to open up scholarship.

Baldwin’s latest book, “Athena Unbound: Why and How Scholarly Knowledge Should Be Free for All,” looks at the history and future of the open access movement. And fittingly, his publisher made a version of the book available free online.

This professor is not arguing that all information should be free. He’s focused on freeing up scholarship made by those who have full-time jobs at colleges, and who are thus not expecting payment from their writing to make a living. In fact, he argues that the whole idea of academic research hinges on work being shared freely so that other scholars can build on someone else’s idea or see from another scholar’s work that they might be going down a dead-end path.

The typical open access model makes scholarly articles free to the public by charging authors a processing fee to have their work published in the journal. And in some cases that has caused new kinds of challenges, since those fees are often paid by college libraries, and not every scholar in every discipline has equal access to support.

The number of open access journals has grown over the years. But the majority of scholarly journals still follow the traditional subscription model, according to recent estimates.

EdSurge recently connected with Baldwin to talk about where he sees the movement going.

Listen to the episode on Apple Podcasts, Overcast, Spotify, Stitcher or wherever you get your podcasts. Or read a partial transcript below, lightly edited for clarity.

EdSurge: How would you describe the state of the open access publishing movement?

Peter Baldwin: It's clear that we are heading in the right direction, but we're also heading there at very different speeds depending on what kinds of content we're talking about. So for the sciences, like physics, mathematics, computer science, they basically function online. They basically [post and comment on free pre-prints]. They've sort of solved the problem effectively for themselves. That's not to say the journals don't still exist. Mathematics journals, for example, I was just told by a prominent mathematician the other day. He says, yeah, no, of course nobody reads the journals, but they're still there.

They're there because they basically are used to validate hiring decisions so that when, you know, a mathematical career is made by getting your article into whatever the most prestigious mathematics journals are, and that sort of validates your application on the job market, but nobody actually reads the printed version [because they saw the pre-print].

If the universities just decoupled their own promotion, tenure and hiring decisions from the prestige hierarchy of the journals, they could put the journals completely out of business insofar as they're signaling prestige.

So this is happening in some disciplines but not others. How does that change so that even the humanities are doing more open access?

One big thing that would move us in this direction would be reform of copyright law. I don't think that's about to happen anytime soon because the interests are so confused and mixed and conflicting that it would be almost impossible to put together sort of a coalition in favor of major copyright reform. But what would be needed is a reduction of the term [that a work is covered by copyright], at least for scientific research and its output.

Right now, copyright law has been extended so far. In the beginning — in the late 18th and early 19th centuries when copyright laws were first written — the term was like 14 years, and then sometimes you could renew it. So after 14 years, bang, it went into the public domain. Now it's life of the author plus 70 years. So, easily well over a century. And that's what makes it something to fight about. And that's why the publishers won't give it up because they have this sort of boondoggle that allows them to have property rights in intellectual property effectively much more than we have property rights in our houses or anything else that we own. It's practically eternal possessive rights that they have.

The reality, of course, is that the vast bulk of all books are totally commercially worthless six months after publication, and yet they remain locked up by copyright law for a century. It just makes no sense. It would be much better to say, let's give them two or three years of commercial value. Two or three years later, most books are not being bought anymore. And the few ones that are being bought, of course, they should stay in copyright and let the publishers and the authors make money off of them. That's fine. But the vast bulk of it is simply no longer commercially valuable in any form. And that should be made free. There's actually no reason not to set it free and allow people to read it at no expense.

How would we do that? Have a system where if a book doesn’t make X amount of money after two years, then it goes into the public domain?

Something like that. Then let's say it suddenly started getting downloaded like mad, it went viral, then it should be the right of the publisher and the author to pull it back out of the public domain and to issue a new edition or whatever. I mean, I'm all for letting people who have something that's commercially valuable to make money off of it. I just think that the stuff that sits there locked up and unusable should be freed because it's good to have it freed. And there's no downside to this because nobody's losing anything. Nobody's losing readership or income or royalties or anything like that.

Right now there’s lots of talk about ChatGPT and other AI systems. How do you see that impacting this movement for open access scholarship?

I have two points that I want to make about ChatGPT. The first is that American copyright law apparently doesn't allow you to copyright anything that's not written by a human. If that's true, and that means that nothing that ChatGPT churns out is actually copyrightable, then this may just blow the bottom out of the copyright system. Because if 80 percent of our content is not copyrightable anymore, what's the point of copyrighting? Then the little bits that are copyrighted, people will just ignore it because ChatGPT can do a better job anyway or certainly do an equally good job of circumventing the copyright issue. So it may be that it totally shakes up the whole copyright system.

The second point is that ChatGPT as I understand it at the moment scrapes and feeds off of the crappy end of the web. It's whatever it can get into — it doesn't feed off the good stuff in the web. I don't think it's able to get past the paywalls and into the scholarly databases and into the journals, as far as I know. So insofar as that's true, then all we're getting is a garbage-in, garbage-out product from ChatGPT, and insofar as we want ChatGPT to actually be of use to us and help us, we desperately need it to be allowed access to [scholarship].

Therefore, in a sense, open access is the key to making ChatGPT work. Because good ChatGPT should be based on the stuff that right now the paywalls keep us out of.

What's the point of having an incredibly powerful tool that is fed only garbage when you could have an incredibly powerful tool that really knows the information that's out there? Presumably anybody interested in ChatGPT will also be an open access advocate because they will want ChatGPT to feed off the good parts of the web as well.

It seems like people will want to create custom products that feed AI tools like ChatGPT, so that maybe each discipline will have its own research chatbot or something?

Yeah, Wikipedia, for example, is toying with the idea of doing a chat wiki that basically feeds only off of Wikipedia, where at least the information has gone through a vetted process and is not just bilge.

I have to ask about piracy, because there are still large collections that offer free versions of scholarly articles in violation of copyright. How is this impacting attempts at legal open access efforts?

Pirates are the open access movement's best friend, but of course we can't say that in polite company. We have to register a sort of harrumph of disapproval even while saying that they certainly keep the publishers' feet to the fire.

You could look back 20 years ago to the sort of cowboy days of the web. Back then we had sites like Megaupload and Pirate Bay and places that took commercial content — basically pop music and popular films — [and offered illegal copies for download]. That was all clamped down on with international regulation and countries working together. Basically they were shut down and what do we have now? We have Spotify and Apple Music and Netflix. It's obviously not open access, but it is a reasonably open form of access at a reasonable price. To pay 13 bucks a month for Amazon Prime, you get I think something like 15,000 movies and TV shows, you know, as a lending library, that's not a bad model. And clearly most members of the public have decided that they're willing to pay a reasonable price for reasonable access to a ton of good stuff.

So in the academic world, for scholarly knowledge, there are these sites where people go. In some cases they're there because the Russians fund them in order to allow them to sort of stick their nose up the publishing industry of the West, just sort of to be annoying. In other cases they're funded by contributions and voluntary donations and that sort of thing. They're there because the publishing industry has simply been unable to get its act together and deliver content at a reasonable price.