I just finished my MA last year at UMass Boston. My first year, I had a normal Teaching Assistantship. It was fine. But the next year, I had to hunt down an alternate source of funding. I ended up getting an Assistantship through the library, working at the Reference desk. It was, I hate to admit it, a FAR more educational experience than keeping attendance and marking papers. Despite my lack of an MLS, I was basically working as a part-time reference librarian, although there was almost always a real reference librarian on call if I got in over my head. Nevertheless, I got well acquainted with databases and systems I never would have touched otherwise: I can now find chemical abstracts over the Internet. Why I would ever want to again, I can’t tell you, but I can do it. I also know how to deal with business databases, and other systems I’ll never use again.
But I liked it. Actually, I even toyed with the idea of getting an MLS. But there was something about the culture of libraries that I never got comfortable with. One librarian, in the midst of a long conversation one slow Friday afternoon, helped me put my finger on it, very precisely.
I forget how we got on the topic, but we started talking about the role of libraries: what should be kept, and what should be thrown out as detritus. Coming from an American Studies department that valued Cultural Studies and Social History, I was adamant that too much was being lost to the selections of archivists and librarians. Too many voices are lost to the authority of the archives. Who can judge what is going to be important 15, 20, or 100 years from now? As I saw it then, the goal of libraries and archives, ideally at least, should be one of collection: collecting for both depth and breadth. Collecting indiscriminately. I was tired of finding that so many of the topics I wanted to work on were things that no one had cared enough to keep and preserve. In that model, the job of librarians, and especially reference librarians, was to make these huge quantities of information navigable for patrons.
In contrast to my collection/preservation model, the librarian I was speaking to offered a completely different model– one that is taken from communications theorists of a generation ago, and which they borrowed by means of metaphor from electrical engineering. He talked to me about signal-to-noise ratio. He told me that the goal of a librarian was to be a custodian of information, constantly overhauling the collection in order to increase the amount of signal (usable information) and to try to eliminate noise (detritus, misinformation, things that lack scholarly value.)
I was aghast, and wondered how anyone would ever presume to know so much that they could understand exactly what would be of value to future scholarship, and what wouldn’t. Any Historian who’s spent weeks trying to find information about someone who seemingly only exists in a single fascinating document will understand where I was coming from. Today’s trash can be tomorrow’s treasure.
However, after reading this article, I’m starting to wonder if I wasn’t being a bit naive, or at least thinking in impossibly idealized terms. Frankly, I’m starting to wonder if there’s not just too much information, and if that "custodian" model isn’t as outdated as I thought. Looking at the sheer volume of information being produced in a single year, it seems an impossible task to keep it all. (Even when you discard the report’s somewhat misleading figures on non-recorded data, such as telephone conversations, the number is staggering.)
I mean sure, we can spider and scrape systems, we can come up with clever ideas to get people to provide tags for free, but ultimately, the methods we have of automating such things are always going to provide as many blind spots as moments of insight… or at least I’m afraid that’s the case.
I think H-Bot, for example, is a brilliant idea, but it sort of points toward the stupidity of computer intelligence… Here’s a little trick: ask H-Bot "When was Teddy Roosevelt president?" And then ask it "When was Theodore Roosevelt president?" You’ll find that Teddy was president in 1908, and Theodore was president in 1903. Both answers are correct, in a fashion, but don’t give the whole story. It’s all because of the very method of data-collection that H-Bot uses.*
Don’t get me wrong, I know the thing’s just in beta testing, and it’s not complete, and honestly, I still think it’s a cool project, and I’ve been playing with the thing since I found it last year. But any mechanized data-collection, data-pooling, or data-mining software will always have these sorts of problems: these programs are simple little machines, and cannot comprehend complexity. Maybe, at least for the time being, we should keep on encouraging the librarians to throw some stuff out.
* What is very interesting about this result is that it points to the different use of language at different points in that President’s career. More web sites describe him as Theodore when focusing on the events of 1903 than any other year, whereas 1908 is the big year for Teddy. It would be interesting to look at variations on other presidential nicknames, and see what kinds of correlations you could find: do people use nicknames during good times, showing familiarity and comfort, or during bad times, showing derision and lack of respect? It could be a fun thing to look into. (Although checking into that a bit, I discovered that H-Bot can’t find James Carter or William Jefferson Clinton, so the question might not be as easy to answer yet.)
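For the curious, here is roughly how I picture that kind of answer coming about. This is only a toy sketch in Python, not H-Bot's actual code: the function name `most_common_year` and the snippets are made up for illustration, and I am assuming the simplest possible approach, which is to count the years that appear alongside a name and report the most frequent one.

```python
import re
from collections import Counter

# Matches four-digit years from 1000 to 2099.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def most_common_year(name, snippets):
    """Return the year that co-occurs most often with `name`, or None.

    Hypothetical helper: it only counts co-occurrences and has no idea
    what a presidency (or a nickname) actually is.
    """
    counts = Counter()
    for text in snippets:
        if name.lower() in text.lower():
            counts.update(YEAR.findall(text))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up placeholder snippets standing in for web pages.
SNIPPETS = [
    "A page about Theodore Roosevelt in 1903.",
    "Another page about Theodore Roosevelt and 1903.",
    "A page about Theodore Roosevelt in 1901.",
    "A page about Teddy Roosevelt in 1908.",
    "Another page about Teddy Roosevelt and 1908.",
    "A page about Teddy Roosevelt in 1905.",
]

print(most_common_year("Theodore Roosevelt", SNIPPETS))
print(most_common_year("Teddy Roosevelt", SNIPPETS))
print(most_common_year("James Carter", SNIPPETS))
```

With these placeholder snippets, the two forms of the name produce two different years, and a name that never appears produces nothing at all. That is the blind spot in miniature: the counter reports whichever year turns up most often next to the exact string it was given, and nothing more.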