I wrote a better guide here: https://whyhahm.blogspot.com/p/the-better-howto-on-everything.html.
i've been doing sns translations and subbing as a way to learn korean, and in order to do so, i've written a few tools to help me out. i thought i might share them, in case anyone's interested ^^
the guide is mainly focused on doing the sns translations, but this is kind of a dump of everything i've gathered for kpop-related things, so it has other things too ^^ plus if you want, the rss stuff can be used just for collecting posts on your computer, without using it for translating haha
important note about livestreams
this system allows you to basically keep track of most sns services people may use, in most ways that they might use them, automatically. it also allows you to hook into it, and write your own software using what's already downloaded.
however, if what you're interested in is only recording livestreams, i would strongly recommend considering a system other than this one.
although this will allow you to record livestreams, it's quite difficult to set up, and can require a decent amount of maintenance as well. unless you know what you're doing, setting this up will likely be more of a weekend project (or longer) than an afternoon one.
here are some other projects i have found that will allow you to record livestreams. i haven't personally used them, but people i know do, and say they work.
- https://github.com/taengstagram/instagram-livestream-downloader
- https://github.com/notcammy/PyInstaLive -- supports downloading from multiple instagram accounts (via the -df option)
let me reiterate this to be absolutely clear: if you only want to record instagram livestreams, (and the rest of the functionality this offers seems useless to you) use another tool. it will save you a lot of time and frustration haha
however, if you are interested in the extra functionality offered by this system, read on ahead ^^
overview
the tools:
- webrssview - browser-based rss reader
- rssit - sns to rss
- dlscripts - downloads sns posts and livestreams
- khelpers - creates sns posts and uploads livestreams
the idea for the sns translations is this:
every sns account for groups is added via rssit to webrssview in this way:
- group's korean name [folder]
- 前 [folder]
- ex-member's korean name (nickname)
- member's korean name (nickname)
for example (screenshot):
- 밤비노
- 前
- 성하담
- 고명선 (민희)
- 박은솔
- 변다희
- 서아
since webrssview is an rss reader, it'll refresh the feeds at the interval you specify, making a request to rssit for each feed.
normally the folder/file structure doesn't matter, but in this case it does, due to the khelpers scripts reading webrssview's database.
rssit is then configured to run dlscripts' download.py whenever the "social" hook fires. that way, all the photos/videos/(optionally) livestreams are downloaded.
to do the actual sns posts, just run sns.js, followed by the group name, and then optionally the start/end timestamps. it'll open up a file with reddit-like markdown in your favourite text editor, and now all you have to do is to translate it all ^^ once you're done, just run it again to mirror the images/videos to imgur/streamable, and then use the reddit/twitter/blogger options to upload it to your desired location, and you're done!
actually doing it
before i start, you're expected to be familiar with using development tools on your system. it isn't really user-friendly because i wrote it mainly for myself, but it does what it's supposed to do with minimal fuss haha.
if you're using windows, you'll need to use something like cygwin since it was designed for linux. while it is possible to make it work under cygwin (it's been done before), it is rather tricky (remember that cygwin programs use a different filesystem root). redis and mongo can be installed normally (as they are interacted with by network ports, not files), the windows version of node.js seems to work as well, but the rest (python, ffmpeg, and depending on how you install it, youtube-dl) will need to be either installed or compiled manually within cygwin.
first, clone all the repos, then get the prerequisites. make sure you have python 3+pip, node.js+npm, redis, mongo, ffmpeg, and youtube-dl installed and set up properly beforehand.
cd webrssview
npm install
cd ../khelpers
npm install
cd ../rssit
pip3 install -r requirements.txt
# install redis for python as well. optional here, but not optional for dlscripts
# for example: sudo pip3 install redis
cd ../dlscripts
pip3 install -r requirements.txt
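if you want to quickly check whether everything's on your PATH before going further, here's a small (purely illustrative) python snippet; the tool names are just the prerequisites listed above:

```python
import shutil

def missing_prereqs(tools=("python3", "node", "npm", "redis-server",
                           "mongod", "ffmpeg", "youtube-dl")):
    """return the subset of required tools that can't be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

print(missing_prereqs())  # an empty list means you're good to go
```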
to setup rssit, create ~/.config/rssit/config.ini:
[default]
social_hooks = python3 /path/to/dlscripts/download.py
[instagram]
httpheader_Cookie = # instagram cookie header, you can find this via inspecting the network tab in chrome/firefox
no_livedl = true # set to false if you want to download livestreams
[instagram/home]
max_graphql_count = 12 # i wouldn't recommend changing this
# it's the max amount of items a single graphql call will yield
# different calls have different thresholds, this one seems to have 12 as the max
count = 50 # could be smaller too, minimum amount of items to return
# optional, if you want twitter support. fill in the keys with the app you made (apps.twitter.com)
[twitter]
consumer_key =
consumer_secret =
access_token =
access_secret =
count = 200 # minimum amount of items to return (i think 200 is the max twitter allows)
no_videodl = false # whether videos should be sent to download.py or not
# was running out of disk space when i implemented this haha
# if you want to translate, you'll probably want this set to false
# (i.e. download videos)
with_retweets = false # while rssit works with it being set to true,
# i've never tested translating with it being on
# probably going to be buggy
# optional, if you want weibo support
[weibo]
httpheader_Cookie = # weibo cookie header, same procedure as for instagram
with_reshares = false # haven't tested this being on with translations either
to setup dlscripts, create /path/to/dlscripts/tokens.json (don't add lines with # in them):
{
# this is where all the photos/videos/livestreams are stored
"prefix": "~/Pictures/social/",
# if you're on an ntfs/fat filesystem (e.g. if you're using windows), set this to true
# otherwise, you don't have to include this line
"windows": true,
# amount of processes that can be run at once (to not overload your io)
"thresh_processes": 10,
"thresh_sleep_times": 600,
# if livestream parts are to be deleted
"no_live_cleanup": false,
}
and finally for khelpers, create /path/to/khelpers/feeds.toml:
[general]
ignoregroups = [] # groups to ignore (in hangul)
nogroups = [] # folders that aren't groups (e.g. "other")
ignorefolders = [] # ignore everything within these folders
ignore_ex = [] # groups to ignore every ex member of
all_sns_group = [] # groups for which every sns status gets posted to twitter
snssavedir = "~/Documents/autotrans" # path to save the sns markdown files to
editor = ["gedit"] # text editor
dldir = "~/Pictures/social" # should be the same as "prefix" in dlscripts/tokens.json
encrypt_key = "..." # some random text here, doesn't matter, as long as you don't change it
imgur_id = "..." # imgur client id
imgur_secret = "..." # imgur client secret
imgur_user = "..." # imgur username
imgur_pass = "..." # imgur password, encrypted via: node sns.js encrypt password123
# optional
dailymotion_id = "..." # dailymotion client id
dailymotion_secret = "..." # dailymotion client secret
dailymotion_user = "..." # dailymotion username
dailymotion_pass = "..." # encrypted dailymotion password
dailymotion_channel = "people" # category
# optional
twitter_username = "..."
twitter_key = "..." # twitter client key
twitter_secret = "..." # twitter client secret
twitter_access = "..." # twitter access key
twitter_access_secret = "..." # twitter access secret
# optional
reddit_client_id = "..."
reddit_client_secret = "..."
reddit_username = "..."
reddit_password = "..." # encrypted reddit password
# optional
blogger_blogid = "..." # blog id
[instagram]
path = ["sns", "instagram"] # path to the instagram folder in webrssview
categories = false # if true, then groups are within one extra folder below path
ignore_sns = [] # list of usernames to ignore
[twitter]
path = ["sns", "twitter"] # same as above for twitter
categories = false
ignore_sns = []
[weibo]
path = ["sns", "weibo"]
categories = false
ignore_sns = []
[reddit]
#"group name in hangul" = "subreddit"
# e.g.
"밤비노" = "bambino"
["group name in hangul"]
#key_to_override = overridden_value
# e.g. if you have different twitter accounts for different groups
twitter_username = "new twitter username"
# etc.
# parse_feeds.js will try to romanize it automatically for you
# but it will fail at times, especially for group names
# e.g. 소녀시대 = Sonyeoshidae
# this category is for overriding that engine
[roman]
#"name in hangul" = "romanized"
# e.g.
"밤비노" = "Bambino"
# some people/groups have multiple romanizations:
"소녀시대" = ["SNSD", "Girls Generation", "Girls' Generation"]
"써니" = "Sunny"
# etc.
# there is another caveat of the romanization engine:
# it thinks the first character is always the last name.
# works for most cases, but quite obviously not for groups.
# in this case, we need to disable the "nickname creation" process:
[nicks]
"밤비노" = false
# for other groups, if you used an abbreviated version, you can specify the nickname directly here:
"브아걸" = "브라운아이드걸스"
# if the person's name only has 2 characters, the engine will assume it's a nickname
# if their name is indeed only 2 characters long (as is the case with some people),
# you'll want to specify their name here:
"김율" = "율"
# now with some names, different people will have different romanizations
# for example, girls' day/aoa's minah/mina
# in this case, we'll use their account names directly:
["instagram/@kvwowv"]
"민아" = "Mina"
["instagram/@bbang_93"]
"민아" = "Minah"
# you can also specify names/nicknames for usernames directly here as well:
["instagram/@areum0ju"]
names = ["한아름", "이아름"]
nicks = ["아름"]
# you can do the same with twitter/weibo as well (e.g. "twitter/@...", "weibo/@..."),
# but i haven't tested it yet
alright, we're done setting things up, now time to start ^^
cd rssit
python3 rssit.py
# wait until it's initialized (Trying localhost:8123), then in a separate console
cd webrssview
node webrssview
go to http://localhost:8765, right click on "root", and create a new folder called "sns".
once it's made, create another beneath that one, called "instagram".
if you want to use another folder structure, just remember to update path in feeds.toml accordingly.
right click on "instagram", and set "thread" to "instagram". it doesn't matter what the thread name is, as long as it's unique. threads just determine which operations can run in parallel, so if every feed had a different thread, every feed could be reloaded at the same time. it's very important that only one instagram feed gets reloaded at a time, otherwise you'll get rate-limited.
next, create a folder for the group you wish to add in hangul, below the last folder. the folder structure should now look something like this:
root
- sns
- instagram
- 밤비노
for each of the members' instagrams, go to http://localhost:8123, and paste them in. for example, pasting https://www.instagram.com/heeyong0104/ should return http://localhost:8123/f/instagram/u/heeyong0104 .
add a new feed with the given url, and use the following format for names:
native name/korean name/other name/etc. (native nick/korean nick/other nick/etc.)
since the hangul engine automatically generates nicknames, if their nickname is their first name, you just have to enter their korean name, e.g.:
변다희
for foreign people (japanese, thai, american, etc.), copy their native names first, then their korean transliterated names, for example, for h.u.b's rui:
渡辺るい/와타나베 루이 (루이)
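if you're curious how a name line like that splits into names and nicknames, here's an illustrative python sketch (a guess at the format based on the examples above, not khelpers' actual parser):

```python
import re

def parse_feed_name(s):
    """split "name1/name2 (nick1/nick2)" into a names list and a nicks list."""
    m = re.match(r"^(?P<names>[^(]+?)(?:\s*\((?P<nicks>[^)]*)\))?$", s)
    names = [n.strip() for n in m.group("names").split("/")]
    nicks = [n.strip() for n in m.group("nicks").split("/")] if m.group("nicks") else []
    return names, nicks

parse_feed_name("渡辺るい/와타나베 루이 (루이)")
# → (["渡辺るい", "와타나베 루이"], ["루이"])
```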
for ex members, create another folder beneath named 前, and insert them there.
the folder structure should now look something like this:
root
- sns
- instagram
- 밤비노
- 前
- 성하담
- 고명선 (민희)
- 박은솔
- 변다희
- 서아
hopefully by now, all the feeds have been reloaded. let's go ahead and start translating ^^
in another console, run:
cd khelpers
node sns.js 밤비노 180405 # replace the timestamp to any time you wish to start with,
# as long as it's cached in webrssview.
# it will return all posts starting from that date,
# and ending with yesterday, kst
# if you wish to end earlier than yesterday, specify an end timestamp like this:
#node sns.js 밤비노 180405 -180406
# if you've already translated the group before,
# and wish to translate starting where you left off, just run:
#node sns.js 밤비노
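the timestamps are just YYMMDD dates in kst. if you want to double-check what a given argument means, here's a small illustrative python snippet (parse_stamp and default_end are hypothetical names, not part of khelpers):

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # korea standard time, UTC+9

def parse_stamp(stamp):
    """interpret a YYMMDD argument like 180405 as a kst date."""
    return datetime.strptime(str(stamp), "%y%m%d").replace(tzinfo=KST)

def default_end():
    """with no end timestamp given, posts run up to yesterday, kst."""
    return (datetime.now(KST) - timedelta(days=1)).date()

parse_stamp(180405)  # → midnight on 2018-04-05, kst
```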
it'll take a bit of time, as it will be going through and finding important comments,
but once it's done, it'll open up the editor you specified with a file named like 밤비노_180405_180406.txt,
where the second timestamp is the ending timestamp.
once you're done translating, go back to the console and run:
node sns.js 밤비노 180405 180406
this will upload all the images/videos to imgur/streamable,
and create a new file named 밤비노_180405_180406.txt_mod.
now you can either open that file and copy the contents to a reddit post, or, if you're lazy like me, you can just run:
node sns.js 밤비노 180405 180406 re
for reddit though, you will need to have set a post title (by adding - title after the end of the url)
for twitter:
node sns.js 밤비노 180405 180406 tw
and blogger: (you'll need to login and give it a token, it'll guide you through)
node sns.js 밤비노 180405 180406 bl
misc
livestreams
add a feed somewhere in webrssview with this url: http://localhost:8123/f/instagram/reels_tray . make sure its refresh interval is less than 10 seconds (don't use the instagram thread, make a new thread).
home feed
if you want to get posts quicker, you can also use the home feed. just add http://localhost:8123/f/instagram/home to webrssview.
other resources
downloading someone's instagram
rssit is completely usable from the console as well, so you can download someone's entire instagram:
cd rssit
python3 rssit.py '/f/instagram/u/username' nohooks=true output=social count=-1 max_graphql_count=50 | python3 ../dlscripts/download.py async
nohooks=true makes sure download.py isn't run automatically,
output=social gives the "social" json format (what download.py reads),
count=-1 means every post,
and max_graphql_count=50 increases the limit for graphql to 50
(instagram could limit this later though, so if you have problems, try 24 or 12).
the async option to download.py lets multiple (by default, 10) posts download at once (much faster).
if you want to change the amount, just add a number after, e.g. python3 download.py async 5
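conceptually, the async mode is just a bounded worker pool. here's a minimal python sketch of the idea (not download.py's actual code; download_post is a hypothetical stand-in for the real download logic):

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(posts, download_post, processes=10):
    """run download_post over posts with at most `processes` going at once,
    like download.py's async mode (10 workers by default)."""
    with ThreadPoolExecutor(max_workers=processes) as pool:
        # pool.map keeps results in input order while downloads overlap
        return list(pool.map(download_post, posts))

# usage: download_all(posts, download_post, processes=5)  # like `download.py async 5`
```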
getting social media links
there are many resources to find links, but https://www.reddit.com/r/kpopfap/wiki/socialmedia is by far the best list i've found so far (thanks to /u/glitch_my_anus), although it sadly only has girl groups.
if you want boy groups, both https://kpopinfo114.wordpress.com/ and https://www.nautiljon.com/ are amazing as well.
if you want to find links yourself (for example, for a new group), it's a little hard (especially for new groups), but here's some things i've found helpful:
use naver and namu, and search for their names in hangul (group and member names). this is mainly only useful for larger groups though
check on instagram for posts mentioning the group, then click on the photos, they'll sometimes tag the members' igs there.
try to take note of their real names wherever you can, it'll help you later
if they have an official ig page, check who it's following.
if they have an official fb page, scroll down lots, then search for "with". they might be tagged there. if they are, check their profiles and look for instagram links.
also look for who likes the posts/comments
if they have an official fancafe, register if needed, and they might contain social media links or their real names.
check who the other members follow on instagram, and intersect them:
use this userscript, go to each member's instagram, open the developer tools (f12), go to the console, then run:
idt_following()
copy(JSON.stringify(idt_output))
paste each one to a text editor or something, then go to any instagram page, then:
a = // paste the first account here (replacing //)
b = // second one here
c = // etc.
idt_intersect(a, b, c)
you can send as many accounts as you want to idt_intersect. it'll then return an object like this:
{
1: [...],
2: [...],
3: [...]
}
each number represents how many accounts in common follow the ones listed after.
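the grouping itself is simple; here's the same idea sketched in python, in case you'd rather intersect the lists offline (idt_intersect itself runs in the browser, and this is just an illustration):

```python
from collections import Counter, defaultdict

def intersect(*follow_lists):
    """group followed accounts by how many of the input accounts follow them."""
    # count, for each followed account, how many follow-lists contain it
    counts = Counter(acc for follows in follow_lists for acc in set(follows))
    grouped = defaultdict(list)
    for acc, n in counts.items():
        grouped[n].append(acc)
    return {n: sorted(accs) for n, accs in grouped.items()}

intersect(["a", "b", "c"], ["b", "c"], ["c"])
# returns {1: ["a"], 2: ["b"], 3: ["c"]} (key order may vary)
```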
as for how to know which accounts are by the group members, i'd say look for their real or stage names, and look through their pictures to see if they took photos with other group members, or if they used the name of the group in their posts. this part is really up to you though.
if you have an old ig link that doesn't work anymore, first try to find posts by them; if you can, their new username will be shown. if not, try googling their username, and look for links like "thepicta.com/user/ig_username_you_want_to_find/uid_numbers/post_numbers": the first set of numbers (uid_numbers) will be the user id. if those links (thepicta.com etc.) don't work ("user not found", "posts by @", etc.), then use the userscript linked above, and run this in any instagram page:
idt_userbyid(uid_numbers)
if it returns 404, the user is indeed private/deleted, but if not, you should receive an object containing their username and other info.
if you have an old ig link that's private (or has no posts) and has very few followers, it's probably a fan account. check to see if other members or k-pop stars follow that account.
comments
"Hi. For example, how do I set it up to automatically download when they turn on livestreams at https://www.instagram.com/longlivesmdc/ ? I don't have the expertise to follow. I'd really appreciate your help."
- for sure ^^ if you have a reddit, twitter, or youtube, send me a message (my account names are all "whyhahm") and then i'll walk you through :)
"Thank you for your kindness. I'll be in touch with you soon on Twitter. One question is, should I always keep my computer on to record?"
- yeah, that's probably one of the biggest downsides haha.