Open and public archive of political social media talk

@tim and I were discussing how it would be useful to follow a political figure’s social media, without actually connecting to them on said network. It is important to know what they say, without signaling agreement or having to use a proprietary system.

In that vein, we are brainstorming methods to extract Twitter feeds into a usable format.

Here is an initial scan of possible tools to extract user feeds:

  • Archive My Tweets - Archive your tweets to easily browse and search them - all on your own website and in your control.
  • Codebird - Easy access to the Twitter REST API, Collections API, Streaming API, TON (Object Nest) API and Twitter Ads API — all from one PHP library.
  • Ozh’ Tweet Archiver - Import and archive your tweets with WordPress
  • Phirehose - PHP interface to Twitter Streaming API
  • Python Twitter - A Python wrapper around the Twitter API.
  • Tweepy - An easy-to-use Python library for accessing the Twitter API.
  • tweet_dumper.py - A script to download all of a user’s tweets into a CSV

Ideally we could use the Streaming API’s user streams to collect the messages and save them for publishing in a different system like WordPress, where we can leverage search and other tools for discovery.

The hosting for WordPress doesn’t really fit with the needs of a streaming capture service, and if we use WordPress to poll we run the risk of hitting a rate limit fairly quickly.

My initial thought on a pipeline for this is to set up something that saves the user stream into storage, then periodically run a process that turns that data into a feed we can slurp up from WordPress.
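To make that concrete, here is a rough sketch of the capture stage using Phirehose from the list above. Treat it as a sketch only: the credentials, the user ID, and the tweets.jsonl path are all placeholder assumptions, and it just appends raw statuses to a file for the periodic job to pick up later.

<?php

// Sketch of the capture stage, assuming Phirehose is installed and we have
// Twitter OAuth credentials. Each raw status is appended to a JSON-lines file;
// a separate periodic job turns that file into a feed WordPress can consume.
require_once 'Phirehose.php';
require_once 'OauthPhirehose.php';

// Phirehose reads the app credentials from these constants (placeholders).
define('TWITTER_CONSUMER_KEY', 'your-consumer-key');
define('TWITTER_CONSUMER_SECRET', 'your-consumer-secret');

class StreamArchiver extends OauthPhirehose
{
    // Phirehose calls this once per status received from the stream.
    public function enqueueStatus($status)
    {
        // $status is the raw JSON for one tweet; store it untouched.
        file_put_contents('/var/data/tweets.jsonl', $status . "\n", FILE_APPEND);
    }
}

$consumer = new StreamArchiver('your-oauth-token', 'your-oauth-secret', Phirehose::METHOD_FILTER);
$consumer->setFollow(array(12345)); // numeric IDs of the accounts to archive
$consumer->consume();               // blocks and streams until stopped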

Adding to this (because I’ve used this several times before):

I strongly agree. I could build everything we need for the Twitter API connection + fetching + storage in a couple hours. Then it’s just a matter of what format we want the output of that to be.

I’m thinking it’ll be a really simple service that looks at stored queries like:

  • Type: user, val: @timotheus, Fetch Interval: 30 min
  • Type: tag, val: #bluebeanie, Fetch Interval: 60 min

See if any are due to be fetched and, if so, fetch ’em! Paginate to grab all results of each type back to the last ID we have recorded for that type (to get all tweets since the last fetch), then store them; a sketch of that loop follows below.
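Roughly what I have in mind, with fetch_tweets() and store_tweet() as hypothetical stand-ins for whichever Twitter library and storage backend we settle on:

<?php

// Sketch of the fetch scheduler. fetch_tweets() and store_tweet() are
// hypothetical helpers wrapping whatever library/storage we pick; intervals
// are in seconds, IDs are Twitter status IDs.
$queries = array(
    array('type' => 'user', 'val' => '@timotheus',  'interval' => 30 * 60, 'last_id' => 0, 'last_run' => 0),
    array('type' => 'tag',  'val' => '#bluebeanie', 'interval' => 60 * 60, 'last_id' => 0, 'last_run' => 0),
);

foreach ($queries as &$q) {
    // Skip queries that aren't due yet.
    if (time() - $q['last_run'] < $q['interval']) {
        continue;
    }

    // Page backwards with max_id until we reach the last ID we recorded,
    // so we collect every tweet since the previous fetch.
    $since_id = $q['last_id'];
    $max_id   = null;
    do {
        $batch = fetch_tweets($q['type'], $q['val'], $since_id, $max_id);
        foreach ($batch as $tweet) {
            store_tweet($q, $tweet);
            $q['last_id'] = max($q['last_id'], $tweet->id); // remember newest seen
        }
        // Next page: everything strictly older than this batch's oldest tweet.
        if (!empty($batch)) {
            $ids    = array_map(function ($t) { return $t->id; }, $batch);
            $max_id = min($ids) - 1;
        }
    } while (!empty($batch));

    $q['last_run'] = time();
}
unset($q);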

Then have an API interface where you pass GET parameters to get the stored tweet data out.
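That read side could be as small as a single endpoint, e.g. GET /tweets.php?type=user&val=@timotheus&since_id=0. A sketch, assuming the stored tweets landed in a SQLite table named tweets with type, val, id, and json columns (all placeholder names):

<?php

// Sketch of the read API. The `tweets` table and its columns are assumptions;
// each row's `json` column holds the raw tweet JSON as fetched.
$db = new PDO('sqlite:/var/data/tweets.sqlite');

$stmt = $db->prepare(
    'SELECT json FROM tweets
     WHERE type = :type AND val = :val AND id > :since_id
     ORDER BY id DESC LIMIT 200'
);
$stmt->execute(array(
    ':type'     => isset($_GET['type']) ? $_GET['type'] : 'user',
    ':val'      => isset($_GET['val']) ? $_GET['val'] : '',
    ':since_id' => isset($_GET['since_id']) ? (int) $_GET['since_id'] : 0,
));

// Rows already contain raw tweet JSON, so emit them directly as a JSON array.
header('Content-Type: application/json');
echo '[' . implode(',', $stmt->fetchAll(PDO::FETCH_COLUMN)) . ']';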

A simple bit of PHP cURL code I just wrote to get the final URL after redirects. It works!

<?php

// Resolve a t.co short link to the final destination URL by following redirects.
$d['initial_url'] = 'https://t.co/NOKpf0iHpR';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $d['initial_url']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skip cert verification (quick test only)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);     // return the body rather than printing it
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);        // set the Referer header on redirects
curl_setopt($ch, CURLOPT_HEADER, 0);             // leave headers out of the output
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);     // actually follow the redirect chain
curl_exec($ch);
$d['final_url'] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
print_r($d);

This outputs:

Array
(
    [initial_url] => https://t.co/NOKpf0iHpR
    [final_url] => https://tim.hithlonde.com/2016/announcing-js-space/
)