A strategy of the aggregate (or, oppression via data)

Continuing the discussion from overestimating the world:

This is the vectoralist class (also spelled vectorialist?): monopolists who operate by controlling information. HR departments have always been a part of this, asymmetrically knowing more about your coworkers’ salaries and the market than you do when you apply for a job. (To every manager or HR dept. who tells me not to share my salary with coworkers: 🖕🙁🖕.) Vectoralists are the investors and the owners who control and instruct the data engines that produce “reporting for execs” through Hadoop pipelines, and the companies that harvest, package or sell advertising audiences or hiring insights. Competitive edge as a service, often sold privately and quietly by “more benign” companies to legitimately horrible organizations**.

They’re the companies that may or may not replace their privilege-biased human recruiters with privilege-biased statistical algorithms, and the countless orgs that trade your SSN correlated with your logged behavioral activity*. They trade in information flows, and the market for this is vast. The LinkedIns and Salesforces of the world can monetize these flows; the Mercers and Bannons and state-sponsored trolls of the world can weaponize them. A lot of other folks can just build shitty trading and sales strategies out of manipulated noise.

From this thread:

To me, this all eventually raises the following question: What does a healthier data ecosystem look like?
The technology to find buried correlations, approximate probability distributions and adapt to changing coefficients is all out there, so the power of data isn’t going away. So how can you use information processing “for good”? Do you need to avoid asymmetry of information, get better at assessing concentration of data, or fight the info monopoly of others?

  • *(i’m just salty because a former employer, a health tracker w/ quit-smoking-type behavioral change programs that is owned by a major health insurance company, can somehow sell this sort of data to other companies, including its sibling corps, completely legally under HIPAA?)
    **(currently i got a digimon in this fight too. sigh).
  • if you’re interested in further readings on vectoralists in the anthropocene, McKenzie Wark is a good place to start. I’d love to see a refresher on Moral Mazes for the modern digital economy.
  • I haven’t read Shoshana Zuboff’s “The Age of Surveillance Capitalism” yet, but I hear it’s a great dive into the data practices of today’s Big Tech.

Is that true, though? It’s often true that one can’t put the genie back in the bottle, so to speak. However, when it comes to manipulating present-day humans in real time, that data in some ways needs updating to maintain its accuracy. It seems that replacing surveillance networks with a new system that doesn’t allow for data collusion could erode these data silos fairly quickly. Techniques for mining aggregated data will always exist, but collecting that data requires a privileged position and a collusion of players that doesn’t have to exist.

Is “avoid asymmetry of information” meant to imply the arming of both sides here or the disarmament of the monopolists?


Those are great metaquestions. I agree 100% that dismantling surveillance networks is hard, but letting the maintenance of data systems fall by the wayside is easy; it’s the techniques I’m mostly referring to. All the anarcho-primitivists in the world (read: on #cavetwitter) aren’t going to be able to do away with the field of inferential statistics.

It’s really a philosophical/open-ended question, a pseudo-game-theory problem. What does “good” mean? To many, it means power in the hands of those who deserve it and have committed to using it for good, etc. To others, it means a drive to keep shifting and dismantling power. Avoiding asymmetry could mean any of that, or it could mean “radical transparency.”*

*and maybe not just a couple startups publishing their growth numbers, although I’m intrigued by the ones that post salary data.

I believe privacy is a human right, so some extreme versions of radical transparency honestly feel a little sci-fi dystopian to me. I also think we currently live in a time when a lot of marginalized groups still need privacy to survive.

My immediate take is that this is a move toward a mutually assured information destruction model, which sounds fairer in principle but isn’t really a good end goal either.


Sorry, this got really long, and the back half is about 2020 national electoral politics, so feel free to ignore as much as pleases y’all

This might sound naive, but I think we can, to some extent, walk away from things as they are now. I know it’s not that simple. Collectively walking away probably requires a more genuinely democratic social organization than we have now, so that we can make group decisions, stick to them, and not suffer under power disparities that let some people disregard those decisions.

The “domination and authority” chapter in Anarchy Works has some examples of people walking away from society entirely. Some of these examples involve people fleeing to the hills to set up a more distributed, harder-to-control society or societies. I guess part of the problem with powerful data-gathering systems is that they can index and control you whether you want them to or not, but maybe not actively participating limits what they can do to you?

These might be really basic incarnations of what you’re thinking about, but recently I’ve been thinking about two cases: 1) worker pay opacity, especially for freelancers and 2) Bernie Sanders’s organizing app for his supporters.

Number 1 is a problem even for traditional employees, but at least if you work in an office and are willing to break the taboo, you can turn to the person next to you and say, “I make $20K a year. How about you?” I think it’s harder when you’re freelance (a writer, for example): you’re very isolated, and half the time, when you’re being offered flat fees for stories, you have no idea whether there’s even a standard pay scale. One of the really promising possibilities of emerging freelancers’ unions is aggregating pay information from members and making it accessible to all. That’s the first step toward agitating for better pay, I think.

In 1), what are the risks of aggregating this information ourselves? It’s information that the “vectorialists” already have, that we suffer for not having. This seems solely beneficial to me, but maybe I’m not thinking of something. In any case, if we want to move through/past capitalism, radical transparency about how it affects those of us at the end of the chain seems really vital.

Number 2 is more of a case study I want to think through.

I might not have all my facts straight, so please take this provisionally and correct me if you know better. Bernie’s new app lets you punch in people you know and contacts you make. The app verifies each contact’s identity (using generally but not universally available voter data; it may even come from Democratic Party databases, but I’m not positive about this). You can record some data about your contacts, like their intended vote.

The app announcement got a huge amount of blowback from (what I would generalize as) centrist or center-left voters, who claimed it was a privacy invasion and a harassment tool in the making.

All the information in the app’s database is from voter information readily available to political campaigns; AFAIK Democrats and Republicans have built their own aggregated databases of voters, positions, contact instances, etc. Anyone who’s actually done this, please correct me, but I’m pretty sure that when campaigns have you canvass, they are updating the campaign’s database based on the canvassing contacts you make while pounding pavement.

There was an accusation on Twitter, which I’m unable to verify, that for a few hours you could use browser dev tools to see the voter ID number of anybody you entered as a contact. Is this an important breach? What can someone do with somebody else’s voter ID number? I’m not really sure.

So here’s what I’m thinking about when it comes to this: is this kind of widely (but not universally!) available voter information bad or good? The major parties have built very detailed databases of this stuff; on the Republican side, this is what Cambridge Analytica and other efforts were supposed to benefit. On the Democratic side, Obama’s 2008 and 2012 apps both had similar features; the 2012 app linked with Facebook and had field canvassing features.

Campaign organizers would probably tell you collecting and storing information like this is vital to be able to track who is an important voter, to be able to follow up with them, encourage them and make sure they are able to get out to vote, and to know how they’re doing when it comes to disseminating the campaign’s message.

Maybe it’s good that this information isn’t universally available. I’m pretty sure but not 100 percent positive that it includes voter registration addresses, which is obviously easily abused if it were in a public database that anybody could look up (the Sanders app just tells you if someone is registered, not their address or party affiliation).

But the two major parties take basic state voter roll data and build their own much more detailed databases on top of that; this data isn’t available to non-major-party candidates or even major party candidates that the party establishments don’t like or have decided to ignore. The Sanders campaign, for example, had a fight with the Democratic Party over equitable voter data sharing in 2016.

The data aggregation efforts of the major parties probably don’t even approach, say, health insurance companies. (Although it’s entirely possible health insurance companies could start selling their wearable device data to political actors?)

So is this level of voter information aggregation good or bad? Is it bad that the two wealthy major parties have intensely developed databases of this information? Would it be improved if it were more publicly available? My understanding is that it’s essentially publicly available to state governments, the federal government, and campaign staff above a certain level, so it’s essentially public but not universally distributed.

BTW, just because I’ve become increasingly aware of these things over the last year or so: this article on racist libertarian Charles Murray (of The Bell Curve fame, famous racial intelligence proponent, friend to eugenicists) talks about how the US and its contractors, Murray among them, used data gathering to help prosecute the war in Vietnam. Murray was hired, supposedly as a social scientist, working under contract for USAID (ostensibly an agency of the US State Department but a partner in US military/imperial efforts), on work in Thailand that was integral to the war effort.

Robert McNamara, UC Berkeley and Harvard grad, is maybe the most famous proponent of using data and modeling for war purposes, especially in Vietnam; if you haven’t seen the Errol Morris documentary The Fog of War, it comes up again and again there. In WW2 he was using data to analyze bomber efficiency. He was a business school graduate and went from running Ford Motor Company to bombing Vietnam.

This data/computer driven approach to the Vietnam War also shows up in at least one Adam Curtis documentary, but since I mainlined so many of them on one sleepless night I can’t immediately recall which one…

Yasha Levine’s Surveillance Valley talks about how this tendency goes back to the very formation of the internet; the blog on his site for the book has lots of interviews with him where he talks about his research.

I mention the above because I was surprised at how early in the history of computing state power was putting data collection and analysis to use for murderous purposes. Probably shouldn’t have been surprised!


NGP VAN stores massive amounts of data from voter records, it’s continually updated, and really the primary thing people do during canvassing is data collection. The privacy questions are interesting because it seems hella sketchy for anyone to be able to look up voter registration status or add information to NGP VAN, but all somebody needs to do is volunteer on a campaign to see tons of that info, which includes addresses as well as phone numbers, emails, sometimes voting records and/or predicted votes, contact history, and any other info people can get. I think volunteers only see info for the people they are assigned to, but campaign staff can see a lot. I never really questioned where that information came from, or whether it includes all voter registration records in the jurisdiction or just people whose data the candidate was able to mine somehow, but there’s a lot even for poorly funded candidates.

But increasingly, this sort of data is far less relevant than social media data and access to people through screens rather than by phone or at the door. Phone and door outreach is very time- and labor-intensive and has a very low success rate, so I’m not really sure this is the type of data to worry about anymore.

I didn’t know about McNamara in the Vietnam War, but my fun-at-parties anecdote isn’t that IBM’s first-ever customer (and probably the first wide-scale commercial deployment of an automated information system), back in the late 1800s, was the US Census. It’s that their second was the Third Reich, to whom they delivered information processing systems that indexed concentration camp prisoners by ability and other metrics, such as extermination method, and that did absolutely bonkers, horrible processing and inference on genealogy to catalogue “aryan ancestry.” IBM’s systems raised record processing capacity from 60k records manually to millions by machine.