/tech/ - Tech

Technology




Hey, so I wanted a tool to help with migrating my imageboard around. I used the approach of making copies of threads by hand, but I wanted something more powerful and much simpler to use. With the upcoming opening of 8kun, I figured it might be of interest to other anons to also use a tool like this. Sadly, my progress to date only includes my hand-written C++ code, not a fully pre-packaged piece of software for general users. So, atm this will only be of any practical interest to anons who can compile C++ and install needed dependencies. If that's you, then read on.

While I plan to create a GUI for this tool at a later point, for now it's a command-only tool. You run it simply by supplying the site name followed by the board name. For example:
./archbot julay.world robowaifu
would download the catalog's json & html, every thread's json & html, and every thread's media files (images, audio, video, pdfs, etc) to a local copy in whatever directory you ran the command from, for the /robowaifu/ board on julay.

Make any requests for features if you think of something. Also, feel free to ask any questions about the code itself or make suggestions for improvements. Right now this works for any board on any Lynxchan-based site, since that's where I started all this. I plan to get it working for 8kun next, and here at fatchan, smug & animu bunker. That should cover most IB software types at that point.
Replies: >>90 >>158 >>164 >>221
I plan to walk through all the code as it stands atp. I use a modern C++ approach. If you've been following the language recently at all, then this should be pretty easy to follow; it's not a complicated approach I'm using.

Might as well start at the beginning, with main()
//------------------------------------------------------------------------------

// Build a local, thread-sorted archive of an imageboard
int main(int argc, char* argv[])
{
  const auto [site_nm, board_nm, force_chck]{parse_args(argc, argv)};
  const auto [board_dir,
              thrds_dir,
              cfg_file,
              curr_json,
              cfg_json,
              have_new_bumps]{init_n_check(site_nm, board_nm, ".")};

  if (have_new_bumps) {
    const auto thrds_to_updt{
        ld_updtd_thrds(cfg_file, curr_json, cfg_json, force_chck)};
    rprt_thrd_ids(site_nm, board_nm, thrds_to_updt);

    const auto thrd_subj_ids{ld_subj_ids(curr_json)};

    grab_thrd_files(site_nm, board_nm, thrds_to_updt, thrd_subj_ids, thrds_dir);

    cerr << "\nFinished updating /" << board_nm << "/ archive.\n";
  }
  else
    cerr << "\nNothing to do, /" << board_nm << "/ archive up to date.\n";
}
>>77
Alright, let's look at things in detail. The very first line
const auto [site_nm, board_nm, force_chck]{parse_args(argc, argv)};
makes a call to a parse_args() function, passing along the program arguments supplied by the user, for example julay.world and robowaifu from my previous example line.

The parse_args() function does just that:
//------------------------------------------------------------------------------

// Extract the arguments passed into the program
tuple<string, string, bool> parse_args(const int argc, char** const argv)
{
  if (argc < 3) {
    cerr << "\nprogram argument error\n"
         << "usage: " << argv[0] << "  site_name  board_name\n";

    std::exit(-1);
  }

  const string site_nm{argv[1]};
  const string board_nm{argv[2]};

  // Optional 3rd flag argument to force checking of the archive
  bool force_chck{false};
  if (argv[3]) {
    istringstream bool_arg{argv[3]};

    if (bool_arg.str() == "true")
      bool_arg >> std::boolalpha >> force_chck;
    else if (bool_arg.str() == "1")
      bool_arg >> force_chck;
    else {
      cerr << "\n'force update' flag argument error\n"
           << "usage: 'true' or '1'\n";

      std::exit(-1);
    }
  }

  return make_tuple(site_nm, board_nm, force_chck);
}
Note the return is a tuple of 3 elements, site_nm, board_nm, force_chck. This tuple is passed directly back to the caller, main() in this case, and the C++17 feature of structured bindings is used at the caller to construct three named variables there by the same three names.

cppreference.com is the first place anyone should go for detailed, professional information on C++ commands. Here's the one on structured bindings:
https://en.cppreference.com/w/cpp/language/structured_binding
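
To see the tuple-return plus structured-binding pattern in isolation, here is a minimal sketch. The split_target() and describe() helpers are made up for illustration and are not part of archbot; only the standard library is assumed.

```cpp
#include <string>
#include <tuple>

// Hypothetical helper (not part of archbot): split "site/board" into its
// two parts and report whether a board name was actually present.
std::tuple<std::string, std::string, bool> split_target(
    const std::string& target)
{
  const auto slash{target.find('/')};
  if (slash == std::string::npos)
    return std::make_tuple(target, std::string{}, false);

  return std::make_tuple(
      target.substr(0, slash), target.substr(slash + 1), true);
}

// The caller unpacks the tuple into three named, const locals in one statement
std::string describe(const std::string& target)
{
  const auto [site_nm, board_nm, has_board]{split_target(target)};
  return has_board ? site_nm + " -> /" + board_nm + "/"
                   : site_nm + " (no board)";
}
```

Calling describe("julay.world/robowaifu") yields "julay.world -> /robowaifu/". The bindings are copies of the tuple's elements, so the caller can treat them like ordinary const locals.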
Replies: >>79
>>78
BTW this idiom of
- receive const arguments in
- create a tuple to pass back
- use the returned tuple at the caller to construct local variables there
is both good modern C++ and consistently my approach with this code pretty much throughout. If it's new to you as a C++ dev, trust me you'll come to appreciate it. It is one tool that helps move the language towards a functional paradigm.
>>77
Alright, now that we have the strings containing the site's name, the board's name, and whether to force a re-check, let's move on.

The next statement
const auto [board_dir,
            thrds_dir,
            cfg_file,
            curr_json,
            cfg_json,
            have_new_bumps]{init_n_check(site_nm, board_nm, ".")};
makes a call to the function init_n_check().

This function has to do two things.
1. Ensure the archive is actually initialized (it won't be during the first run ofc)
2. Check to see if we have any new bumps on the board or not.

//------------------------------------------------------------------------------

// Establish the basic directory structure for the imageboard archive and check
// if there is updated catalog json data available (ie, we have new bumps)
tuple<fs::path, fs::path, fs::path, Json::Value, Json::Value, bool>
init_n_check(const string& site_nm,
             const string& board_nm,
             const fs::path& targ_dir)
{
  // Root the archive under targ_dir so every path below honors it
  const fs::path site_dir{targ_dir / site_nm};
  fs::create_directory(site_dir);
  const fs::path board_dir{site_dir / board_nm};
  fs::create_directory(board_dir);
  const fs::path thrds_dir{board_dir / "threads"};
  fs::create_directory(thrds_dir);
  const fs::path media_dir{board_dir / ".media"};
  fs::create_directory(media_dir);
  const fs::path thumbs_dir{board_dir / ".thumbs"};
  fs::create_directory(thumbs_dir);
  const fs::path static_dir{board_dir / ".static"};
  fs::create_directory(static_dir);

  fs::path cfg_file{board_dir / ".archbot.config"};
  if (!fs::exists(cfg_file))
    ofstream cfg_ofs{cfg_file};

  const auto [curr_json, cfg_json, have_new_bumps]{
      check_new_bumps(site_nm, board_nm, "catalog", board_dir, cfg_file)};

  return make_tuple(
      board_dir, thrds_dir, cfg_file, curr_json, cfg_json, have_new_bumps);
}
Replies: >>81 >>88
>>80
The first part of the function simply ensures the basic directory skeleton is in place and that the .archbot.config file exists, which will store config for the archive.
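
As a side note on building that skeleton, the sketch below shows the standard library's fs::create_directories(), which creates any missing parent directories in one call and is harmless to repeat. The make_skeleton() name is hypothetical and this is an alternative illustration, not how archbot does it.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Sketch only: build <root>/<site>/<board>/threads in one call.
// fs::create_directories() creates any missing parents and does not raise an
// error if the directories already exist.
fs::path make_skeleton(const fs::path& root,
                       const std::string& site_nm,
                       const std::string& board_nm)
{
  const fs::path thrds_dir{root / site_nm / board_nm / "threads"};
  fs::create_directories(thrds_dir);
  return thrds_dir;
}
```

Because the call is idempotent, it can run on every invocation of the tool without first checking whether the archive has been initialized.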

The next bit
const auto [curr_json, cfg_json, have_new_bumps]{
    check_new_bumps(site_nm, board_nm, "catalog", board_dir, cfg_file)};
makes a call to the check_new_bumps() function.

//------------------------------------------------------------------------------

// Check for a newer bump and return the flag to indicate the result. Store new
// copies of the catalog's json & html responses, if needed.
tuple<Json::Value, Json::Value, bool> check_new_bumps(const string& site_nm,
                                                      const string& board_nm,
                                                      const string& page_nm,
                                                      const fs::path& targ_dir,
                                                      const fs::path& cfg_file)
{
  const auto cfg_json{rd_json_file(cfg_file)};

  const auto json_uri{muh_curl::muh_url(site_nm, board_nm, page_nm, "json")};
  auto json_resp{muh_curl::get_resp(json_uri)};
  const auto curr_json{parse_json(json_resp)};

  const auto [have_new_bump, last_bump]{cmp_last_bumps(curr_json, cfg_json)};

  if (have_new_bump) {
    updt_json_item(cfg_file, "last_known_bump", last_bump);
    wrt_json_html(site_nm, board_nm, page_nm, targ_dir, json_resp);
  }

  return make_tuple(curr_json, cfg_json, have_new_bump);
}
Replies: >>82 >>83 >>84 >>86 >>88
>>81
Alright, we'll break down the detail here. The first line
const auto cfg_json{rd_json_file(cfg_file)};
makes a call to the rd_json_file() function.

//------------------------------------------------------------------------------

// Read in a file containing only json data, return its parsed value
Json::Value rd_json_file(const fs::path& filepath)
{
  ifstream json_val{filepath};
  return parse_json(json_val);
}
It creates an input filestream and passes that on to parse_json().

//------------------------------------------------------------------------------

// Return a parsed json object from a json input container
template<typename T> Json::Value parse_json(T& json_input)
{
  Json::Reader reader{};
  Json::Value json_dat{};
  reader.parse(json_input, json_dat);
  return json_dat;
}
This function receives a parameter that must contain json data (the read-in filestream in this case), then creates an instance of the Json::Reader class and uses that to parse the input. The now fully-parsed json object is returned.
>>81
Alright, we now have the fully-parsed json object constructed from the .archbot.config file from the first step. Now let's go read in the board's catalog json data to compare against it.

const auto json_uri{muh_curl::muh_url(site_nm, board_nm, page_nm, "json")};
auto json_resp{muh_curl::get_resp(json_uri)};
const auto curr_json{parse_json(json_resp)};
These three steps all act as a unit and do one thing, read the json and parse it. I'll delay digging into the muh_curl::muh_url() and muh_curl::get_resp() until I go into that namespace in particular. For now, just know that they create an appropriate uri text and use the libcurl library to go fetch the json. The function parse_json() you already saw in the previous post.
>>81
Alright, now we have both the catalog json from the live board, and the config json from the local archive. Now we can compare them with each other, and check to see if the live one has any newer bumps.
const auto [have_new_bump, last_bump]{cmp_last_bumps(curr_json, cfg_json)};
We have a pair of functions that work together to check this.
//------------------------------------------------------------------------------

// Load all the current bumps into a vector and sort it
vector<string> load_sort_bumps(const Json::Value& curr_json)
{
  vector<string> curr_bumps{};

  for (const auto& thrd : curr_json)
    curr_bumps.emplace_back(thrd["lastBump"].asString());
  sort(begin(curr_bumps), end(curr_bumps), std::greater<string>());

  return curr_bumps;
}

//------------------------------------------------------------------------------

// Compare the newest 'last_known_bump' supplied from the board, with the
// previous 'last_known_bump' value stored in the archive
tuple<bool, string> cmp_last_bumps(const Json::Value& curr_json,
                                   const Json::Value& cfg_json)
{
  const string prev_last_bump{cfg_json["last_known_bump"].asString()};

  const auto curr_bumps{load_sort_bumps(curr_json)};
  const auto curr_last_bump{curr_bumps[0]};

  if (curr_last_bump != prev_last_bump)
    return make_tuple(true, curr_last_bump);
  else
    return make_tuple(false, prev_last_bump);
}
Replies: >>85
>>84
The first statement reads in the LKB (last_known_bump) tag from the config.
const string prev_last_bump{cfg_json["last_known_bump"].asString()};
The next one calls the function that pulls all the bumps from the board and returns them sorted in descending order.
const auto curr_bumps{load_sort_bumps(curr_json)};
The sorted vector will contain the newest bump as the first element.
const auto curr_last_bump{curr_bumps[0]};
We now have both 'last_known_bump's and all we have to do is check whether they are the same or not. We'll return a true or a false.
if (curr_last_bump != prev_last_bump)
  return make_tuple(true, curr_last_bump);
else
  return make_tuple(false, prev_last_bump);
The load_sort_bumps() function creates a vector and loads it with all the bumps from the live board.
vector<string> curr_bumps{};

for (const auto& thrd : curr_json)
  curr_bumps.emplace_back(thrd["lastBump"].asString());
It then sorts the vector in descending order and returns it to the caller.
sort(begin(curr_bumps), end(curr_bumps), std::greater<string>());

return curr_bumps;
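
Since only the newest bump is actually used, a full sort is not strictly required. The sketch below is one possible alternative, not the author's code: the newest_bump() name is hypothetical, and it assumes the lastBump values are same-format ISO 8601 timestamp strings, which compare correctly as plain text. It also covers the empty-catalog case, where indexing element 0 of an empty vector would be undefined.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Sketch: find the newest bump without sorting the whole vector. Assumes the
// timestamps are same-format ISO 8601 strings, which compare correctly as text.
std::optional<std::string> newest_bump(const std::vector<std::string>& bumps)
{
  if (bumps.empty())
    return std::nullopt;  // e.g. an empty catalog; nothing to compare

  return *std::max_element(std::begin(bumps), std::end(bumps));
}
```

The caller can then test has_value() before comparing against the archived value.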
>>81
Alright, now that we have checked for a newer bump, we'll use the sentinel flag to control whether we write out the newest bump into the config file, and new copies of the catalog's json and html files as well.
if (have_new_bump) {
  updt_json_item(cfg_file, "last_known_bump", last_bump);
  wrt_json_html(site_nm, board_nm, page_nm, targ_dir, json_resp);
}
updt_json_item() will go write the tag to the config file.
//------------------------------------------------------------------------------

// (over)write a json value into a json file
template<typename T>
void updt_json_item(const fs::path& json_file_nm,
                    const string& tag_label,
                    const T tag_dat)
{
  fstream json_iostrm{json_file_nm, ios::in | ios::out};
  auto json_val{parse_json(json_iostrm)};

  json_val[tag_label] = tag_dat;

  fast_wrt_strm(json_iostrm, json_val);
}
It creates an input/output filestream, parses the json in it, and updates the tag's data, then writes the updated json back out to the file. The fast_wrt_strm() function uses the json fastwriter class to eliminate whitespace from the json.
//------------------------------------------------------------------------------

// Write json out to a filestream using the fastwriter (no whitespace)
void fast_wrt_strm(fstream& io_strm, const Json::Value& json)
{
  // Prepare the stream for writing
  io_strm.clear();
  io_strm.seekp(0);

  Json::FastWriter writer;
  io_strm << writer.write(json);
}
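
One thing to watch with rewriting a file in place: seeking back to position 0 and writing does not shrink the file, so if the new json is shorter than the old, the tail of the old content stays behind. A minimal sketch of one way around that is to reopen the file with ios::trunc before writing. The overwrite_file() and slurp_file() helpers are hypothetical and work on plain strings rather than Json::Value, purely to keep the example free of dependencies.

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: replace a file's entire contents. ios::trunc discards
// the old bytes first, so shorter output never leaves stale data at the end.
void overwrite_file(const fs::path& file, const std::string& text)
{
  std::ofstream ofs{file, std::ios::out | std::ios::trunc};
  ofs << text;
}

// Read a whole file back into a string
std::string slurp_file(const fs::path& file)
{
  std::ifstream ifs{file};
  std::ostringstream buf;
  buf << ifs.rdbuf();
  return buf.str();
}
```

With this approach the read and the write use separate streams: parse first from an input stream, then hand the serialized result to overwrite_file().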
Replies: >>87
>>86
The wrt_json_html() function simply does just that, it writes both files out to disk.
//------------------------------------------------------------------------------

// Read in the html response, then write both the json and html to disk.
void wrt_json_html(const string& site_nm,
                   const string& board_nm,
                   const string& page_num,
                   const fs::path& targ_dir,
                   const stringstream& json_resp)
{
  // Get html response as well & write both out to disk
  const auto html_uri{muh_curl::muh_url(site_nm, board_nm, page_num, "html")};
  const auto html_resp{muh_curl::get_resp(html_uri)};

  write_file(page_num, "json", targ_dir, json_resp.str());
  write_file(page_num, "html", targ_dir, html_resp.str());
}
>>81
Finally, we return the tuple back to the caller.
return make_tuple(curr_json, cfg_json, have_new_bump);
>>80
Now that we've successfully determined whether we have any newer bumps on the live board than we have in the local archive, we return back to main()
return make_tuple(
    board_dir, thrds_dir, cfg_file, curr_json, cfg_json, have_new_bumps);
>>77
Alright, we now have a flag have_new_bumps that has been set. Next, we'll either update the archive or not depending on its value.

So, that's enough to begin this thread with. I'll work on this again later. I plan to walk through the whole thing start to finish. It's roughly 600 lines of code in total and we've probably looked at about 1/3rd of them so far, I'd guess. I'll be back again.
Replies: >>92
Sounds like it could be useful. But where do we get your project code? Do you have a gitlab link or anything?
>>76 (OP) 
Hey, not sure if you have the
//------------------------------------------------------------------------------ thing in the code blocks to make them the right width, but you shouldn't need it. The CSS will already make a block the width of its longest line, or scroll horizontally if the line is wider than your screen.
Replies: >>91
>>90
Thanks. Yes, I have them in the original. 80 chars is a hard rule I enforce on codebases I control.
Be aware, I've renamed curr_json => live_json during the break, as I felt it better conveyed the distinction with cfg_json inside the code.

>>77 >>88 digits
OK, now that we know whether or not we have a new bump, let's move on. First, we'll address the negative case. We simply report to the user that everything's up to date.
if (have_new_bumps) {
  //...
}
else
  cerr << "\nNothing to do, /" << board_nm << "/ archive up to date.\n";
Otherwise, we call ld_updtd_thrds().
if (have_new_bumps) {
  const auto thrds_to_updt{
      ld_updtd_thrds(cfg_file, live_json, cfg_json, force_chck)};
Replies: >>93
>>92
ld_updtd_thrds() is a wrapper function.
//------------------------------------------------------------------------------

// DESIGN: Can this just be folded into ld_thrd_list()?
// Load newly bumped thread IDs
vector<uint> ld_updtd_thrds(const fs::path cfg_file,
                            const Json::Value& live_json,
                            const Json::Value& cfg_json,
                            const bool force_chck)
{
  auto curr_bump_ids{parse_bump_map(live_json)};
  return ld_thrd_list(cfg_file, curr_bump_ids, cfg_json, force_chck);
}
It uses the live_json to call a function parse_bump_map().
//------------------------------------------------------------------------------

// Parse thread's bump map data into a new map container
map<uint, string> parse_bump_map(const Json::Value& json)
{
  map<uint, string> bump_ids;
  for (const auto& thread : json)
    bump_ids.emplace(thread["threadId"].asUInt(),
                     thread["lastBump"].asString());

  return bump_ids;
}
A 'bump map' is just my made-up term to describe a std::map<> that I use to associate a thread's ID number to its last bump time. The Lynxchan-specific json tags threadId and lastBump are where this data comes from.
Replies: >>94 >>131
>>93
Now that we have the current bump ids loaded up into a map, the wrapper calls the primary function ld_thrd_list().
return ld_thrd_list(cfg_file, curr_bump_ids, cfg_json, force_chck);
This function will load up two vectors from the bump map we have, then create a bump map from the config file and load that up into another vector, resizing to match if needed (eg, we have new threads added since last update). Since std::map<> inherently sorts its elements by the key value, there's no need to sort the vectors. Then the bump times in each pair of corresponding elements are compared side-by-side, and the thread IDs for any mismatches are added into the uint vector thrds_to_updt used as the function's return.
//------------------------------------------------------------------------------

// Compare the archived thread bumps to the current thread bumps. Write out
// the update list with the IDs of threads that are different
vector<uint> ld_thrd_list(const fs::path cfg_file,
                          const map<uint, string>& curr_bump_ids,
                          const Json::Value& cfg_json,
                          const bool force_chck)
{
  vector<uint> thrds_to_updt;

  auto arch_bump_ids{ld_arch_bump_map(cfg_json)};

  // DESIGN: This function's algorithm as a whole needs to be tested against
  // thread deletions. -191017

  vector<pair<uint, string>> arch_bump_vec{begin(arch_bump_ids),
                                           end(arch_bump_ids)};
  const vector<pair<uint, string>> curr_bump_vec{begin(curr_bump_ids),
                                                 end(curr_bump_ids)};
  const auto curr_sz{curr_bump_vec.size()};
  if (arch_bump_vec.size() != curr_sz)
    arch_bump_vec.resize(curr_sz);

  for (size_t idx{0}; idx < curr_sz; ++idx) {
    const auto& [arch_id, arch_bump]{arch_bump_vec[idx]};
    const auto& [curr_id, curr_bump]{curr_bump_vec[idx]};

    // DESIGN: Dealing with deletions may require confirmation of IDs as well.

    if (force_chck || (arch_bump != curr_bump))
      thrds_to_updt.emplace_back(curr_id);
  }

  if (thrds_to_updt.size() != 0) {  // The archive file needs to be updated
    updt_cfg_thrds(curr_bump_ids, cfg_file);
  }

  return thrds_to_updt;
}
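
The DESIGN notes above flag thread deletions as untested. As one possible direction (a sketch only, not the author's algorithm), the comparison can look each live thread up by its ID in the archived map instead of comparing by position. The changed_thrds() name and the bump_map alias are hypothetical, and unsigned stands in for uint.

```cpp
#include <map>
#include <string>
#include <vector>

using bump_map = std::map<unsigned, std::string>;

// Sketch: list the IDs of live threads that are new or whose bump time differs
// from the archived one. Lookup is by thread ID rather than by position, so a
// deleted thread cannot shift the comparison.
std::vector<unsigned> changed_thrds(const bump_map& arch,
                                    const bump_map& live,
                                    const bool force_chck = false)
{
  std::vector<unsigned> res;
  for (const auto& [id, bump] : live) {
    const auto it{arch.find(id)};
    if (force_chck || it == arch.end() || it->second != bump)
      res.emplace_back(id);
  }
  return res;
}
```

For example, with archived threads 1, 2, 3 and live threads 1, 3, 4 (thread 2 deleted, thread 4 new, bump times otherwise unchanged), only thread 4 is reported. A positional comparison of the same data would also flag thread 3, because the deletion shifts it into thread 2's slot.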
Replies: >>95 >>99
>>94
>This function will load up two vectors from the bump map we have*
Replies: >>96
>>95
FUUUU why can't I delete my post? Let's try this again.
>This function will load up a vector from the bump map we have*
Replies: >>97
>>96
You can only delete a post if you have the correct password (posts with an empty password can't be deleted by regular users) or if you are board staff, in which case you don't need a password to delete them.
Replies: >>98
>>97
So, jschan doesn't default a random pw and retain it for me just in case I screw up a post and need a do-over?
Replies: >>100
>>94
If we have a difference, then we go and rewrite the .archbot.config file's "threads" array via the statement updt_cfg_thrds(curr_bump_ids, cfg_file); inside the guard.
This function simply (re)writes the array with new json elements containing the thread ID and bump time.
//------------------------------------------------------------------------------

// Write out the current bump map into the threads section of config file
void updt_cfg_thrds(const map<uint, string>& curr_bump_ids,
                    const fs::path cfg_file)
{
  fstream cfg_iostrm{cfg_file, ios::in | ios::out};
  auto cfg_json{parse_json(cfg_iostrm)};

  cfg_json["threads"].clear();
  for (const auto& e : curr_bump_ids) {
    const auto [id, curr_bump]{e};

    Json::Value thrd_dat{};
    thrd_dat["threadId"] = id;
    thrd_dat["lastBump"] = curr_bump;

    cfg_json["threads"].append(thrd_dat);
  }

  fast_wrt_strm(cfg_iostrm, cfg_json);
}
Then finally back in the caller we simply return the vector<uint> that has the list of all threads that had bumps since the last update.
return thrds_to_updt;
(The wrapper returns it as well.)
>>98
If you do it with a script the user needs to have scripts enabled (I wouldn't really mind that tbh). It could also be done without a script and stored in their session cookie or something. I thought about it for a moment but I'm still not sure of the security implications either way.
Also it would not be cross-device compatible. I think it's best users select their own passwords. It is hard to brute-force even simple passwords because a captcha is required. Let's hope some AI doesn't break them.
Replies: >>139
>>77
Now that we have populated a vector listing all the threads that have new bumps since the last update back in main(), we first report the list to the user.
rprt_thrd_ids(site_nm, board_nm, thrds_to_updt);
This function simply iterates the vector and prints each element out.
//------------------------------------------------------------------------------

// List the thread IDs needing updates
void rprt_thrd_ids(const string& site_nm,
                   const string& board_nm,
                   const vector<uint>& thrds_to_updt)
{
  cout << "\n" << site_nm << "/" << board_nm << "/ thread IDs to update:\n";
  for (const auto id : thrds_to_updt)
    cout << id << '\n';
  cout.flush();  // ensures this output appears as the file processing begins
}
Next we populate another std::map<>, this time with the thread subject as the key.
const auto thrd_subj_ids{ld_subj_ids(live_json)};
This function loads the subjects and ids into a container, sorted by the subjects.
//------------------------------------------------------------------------------

// Associate the thread's subjects with their IDs
map<string, uint> ld_subj_ids(const Json::Value& cat_json)
{
  // Sort local thread directories by subject
  map<string, uint> thrd_subj_id{};
  for (const auto& thrd : cat_json)
    thrd_subj_id.emplace(thrd["subject"].asString(), thrd["threadId"].asUInt());
  return thrd_subj_id;
}
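
One property of std::map worth keeping in mind here: it holds a single entry per key, and emplace() leaves an existing entry untouched, so two threads with the same subject (or both with an empty subject) would end up as one entry. The sketch below shows std::multimap as one possible alternative that keeps every thread while still iterating in subject order. The ld_subj_ids_multi() name is hypothetical, and a plain vector of pairs stands in for the catalog json.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch: std::map keeps one entry per key, so threads that share a subject
// (or have none) would collapse into one. std::multimap keeps them all while
// still iterating in subject order.
std::multimap<std::string, unsigned> ld_subj_ids_multi(
    const std::vector<std::pair<std::string, unsigned>>& thrds)
{
  std::multimap<std::string, unsigned> res;
  for (const auto& [subj, id] : thrds)
    res.emplace(subj, id);
  return res;
}
```

Range-for with structured bindings works the same way over a multimap, so a loop like the one in grab_thrd_files() would not need to change shape.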
Replies: >>102
>>101
I just noticed I hadn't renamed the cat_json object to live_json yet inside ld_subj_ids(). Will be adjusted for the repo.
>>77
OK, now we're all set to perform the bulk of the work, namely iterating through all the board's threads, creating their folders as needed, and grabbing all their files as needed. As you can imagine, this is the lion's share of all the work during the initial board archive download. It takes a while. The good news is that after the first time, keeping the archive updated typically takes just seconds and is very light on both the client and the server.

This is where it all kicks off:
grab_thrd_files(site_nm, board_nm, thrds_to_updt, thrd_subj_ids, thrds_dir);
This function goes through each element in the thrd_subj_ids map, then searches for a matching id inside the thrds_to_updt vector. When it finds one, it goes to work on that thread.
//------------------------------------------------------------------------------

// Create thread directories and download the thread's files
void grab_thrd_files(const string& site_nm,
                     const string& board_nm,
                     const vector<uint>& thrds_to_updt,
                     const map<string, uint>& thrd_subj_ids,
                     const fs::path& thrds_dir)
{
  for (const auto& [subj, id] : thrd_subj_ids) {
    if (std::find(begin(thrds_to_updt), end(thrds_to_updt), id) !=
        end(thrds_to_updt)) {
      const auto thrd_dir{add_thrd_dir(subj, id, thrds_dir)};
      const auto thrd_id{to_string(id)};

      // Get the thread's json and parse it
      auto thrd_resp{grab_html_json(site_nm, board_nm, thrd_id, thrd_dir)};
      const auto thrd_json{parse_json(thrd_resp)};

      const fs::path media_dir{thrd_dir / ".media"};
      fs::create_directory(media_dir);

      // DESIGN: grab .thumbs too?
      // -Yes, during the 'rewrite the html files' phase of the software cycle.

      grab_files(site_nm, thrd_json["files"], media_dir);

      const auto posts{thrd_json["posts"]};
      for (const auto& post : posts)
        grab_files(site_nm, post["files"], media_dir);
    }
  }
}
Replies: >>104 >>110 >>114
>>103
In my scheme, the first step is to ensure the thread has its own custom-named directory.
const auto thrd_dir{add_thrd_dir(subj, id, thrds_dir)};
This basic idea is obviously very simple, but the details are a bit tedious at times.
//------------------------------------------------------------------------------

// Create a custom-named thread directory inside the threads directory
fs::path add_thrd_dir(const string& subj,
                      const uint id,
                      const fs::path& thrds_dir)
{
  const auto clean_subj{make_dir_name(subj)};
  const auto id_pad{make_id_pad(id)};
  const fs::path subj_pad{clean_subj + "_" + id_pad};
  const fs::path thrd_dir{thrds_dir / subj_pad};
  fs::create_directory(thrd_dir);

  return thrd_dir;
}
The directory name is based on the thread's subject (which opens up a can of worms tbh).
const auto clean_subj{make_dir_name(subj)};
Replies: >>105 >>109
>>104
Since creating filesystem items is often system-dependent and can be non-portable if done haphazardly, I opt for the conservative approach and clean up the thread's subject string to simply eliminate (or replace) potentially troublesome characters, etc.
//------------------------------------------------------------------------------

// Clean up the thread's subject suitable for use as a portable directory name
string make_dir_name(const string_view subj)
{
  string res{subj};

  const vector<pair<char, char>> bad_chars{{' ', '_'},
                                           {'&', 'n'},
                                           {'?', ' '},
                                           {'!', ' '},
                                           {'/', ' '},
                                           {'\\', ' '},
                                           {'(', ' '},
                                           {')', ' '},
                                           {'.', ' '},
                                           {',', ' '},
                                           {':', ' '},
                                           {';', ' '}};
  for (const auto& bad_char : bad_chars)
    zap_bad_char(res, bad_char);

  const vector<string> bad_strings{"nquot", "napos"};
  for (const auto& bad_string : bad_strings)
    zap_bad_string(res, bad_string);

  zap_bad_string(res, " ");

  // Limit the length of local directory names (44 + '_12345' == 50)
  if (res.size() > 44)
    res = res.substr(0, 44);

  return res;
}
Replies: >>106 >>107
>>105
I'll go ahead and introduce the helper functions as well before proceeding.
//------------------------------------------------------------------------------

// Remove bad strings
void zap_bad_string(string& in_out, const string_view bad)
{
  using str_sz_t = string::size_type;
  const auto str_end{string::npos};

  str_sz_t bad_len = bad.length();
  for (str_sz_t i = in_out.find(bad); i != str_end; i = in_out.find(bad)) {
    in_out.erase(i, bad_len);
  }
}

//------------------------------------------------------------------------------

// Replace bad chars with a good char
void zap_bad_char(string& in_out, const pair<char, char> bad_good)
{
  const auto [bad, good]{bad_good};
  std::transform(begin(in_out), end(in_out), begin(in_out), [=](char ch) {
    return ch == bad ? good : ch;
  });
}
>>105
Given two container sets, one for pairs of bad characters, and one for bad strings, loop through them both and remove/change them for the thread's subject.
for (const auto& bad_char : bad_chars)
  zap_bad_char(res, bad_char);
for (const auto& bad_string : bad_strings)
  zap_bad_string(res, bad_string);
The first helper function uses a C++ standard library algorithm, std::transform(), to efficiently transform the supplied string in-place, using a lambda function to decide on the replacement character.
//------------------------------------------------------------------------------

// Replace bad chars with a good char
void zap_bad_char(string& in_out, const pair<char, char> bad_good)
{
  const auto [bad, good]{bad_good};
  std::transform(begin(in_out), end(in_out), begin(in_out), [=](char ch) {
    return ch == bad ? good : ch;
  });
}
The second helper uses a more traditional mode to keep looking through a string and erasing a sub-string out from it.
//------------------------------------------------------------------------------

// Remove bad strings
void zap_bad_string(string& in_out, const string_view bad)
{
  using str_sz_t = string::size_type;
  const auto str_end{string::npos};

  str_sz_t bad_len = bad.length();
  for (str_sz_t i = in_out.find(bad); i != str_end; i = in_out.find(bad)) {
    in_out.erase(i, bad_len);
  }
}
The same function is re-used to eliminate all stray spaces left over from the character replacements.
zap_bad_string(res, " ");
Finally, the return string is hard-limited to a moderate length, since some thread subjects can go way overboard in length. Directory names shouldn't be over 50 characters long in my subjective opinion.
// Limit the length of local directory names (44 + '_12345' == 50 total)
if (res.size() > 44)
  res = res.substr(0, 44);

return res;
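
To make the effect of the clean-up concrete, here is a compact single-pass sketch of the same character rules. It is a simplified stand-in, not the author's function: clean_subject() is a hypothetical name, and it skips the "nquot"/"napos" removal step entirely.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Sketch of the same clean-up in a single pass (not the author's code):
// spaces become '_', '&' becomes 'n', listed punctuation is dropped, and the
// result is capped at max_len characters.
std::string clean_subject(const std::string_view subj,
                          const std::size_t max_len = 44)
{
  constexpr std::string_view drop{"?!/\\().,:;"};

  std::string res;
  for (const char ch : subj) {
    if (res.size() == max_len)
      break;
    if (ch == ' ')
      res += '_';
    else if (ch == '&')
      res += 'n';
    else if (drop.find(ch) == std::string_view::npos)
      res += ch;
  }
  return res;
}
```

For instance, clean_subject("R&D: plans?") gives "RnD_plans" and clean_subject("Robowaifu Design (v2.0)") gives "Robowaifu_Design_v20".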
Replies: >>108
>>107
>one for pairs of bad/good characters*
>>104
That takes care of cleaning up the subject. But I also want to append the thread's ID number as a suffix, and I want it padded out to 5 digits (99,999 is probably a fair number of threads for a board to have heh).
const auto id_pad{make_id_pad(id)};
This function takes advantage of a C++ standard library stream manipulation technique.
//------------------------------------------------------------------------------

// Pad the thread ID out to n digits, using zeros
string make_id_pad(const uint id, const uint n = 5)
{
  std::stringstream res{};
  res << std::setfill('0') << std::setw(n) << id;
  return res.str();
}
https://en.cppreference.com/w/cpp/io/manip
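One detail worth knowing about this technique: std::setw() only sets a minimum field width, so an ID with more digits than the pad width is never truncated. A small sketch of the same idea, using the hypothetical name pad_id() and standard types instead of uint:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Zero-pad an ID to at least n digits. std::setw sets a minimum field
// width only, so longer IDs pass through unchanged.
std::string pad_id(unsigned id, int n = 5)
{
  std::ostringstream out;
  out << std::setfill('0') << std::setw(n) << id;
  return out.str();
}
```

So pad_id(38) gives "00038", while a seven-digit ID comes back with all seven digits intact. The directory name just gets longer than the planned 50 characters in that case.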

Now we simply concatenate the two parts with an underscore to arrive at the final thread directory name.
const fs::path subj_pad{clean_subj + "_" + id_pad};
We create the custom-named directory inside the board's threads directory and return its path.
const fs::path thrd_dir{thrds_dir / subj_pad};
fs::create_directory(thrd_dir);

return thrd_dir;
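For anyone adapting this step, here is a hedged standalone sketch of it. ensure_thread_dir() is a hypothetical name, and it swaps in fs::create_directories() with an error_code so that missing parent directories are created and failures are reported without exceptions. That is a design variation, not what the code above does.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Build "<threads_dir>/<subject>_<padded id>" and make sure it exists.
// Returns an empty path if the directory could not be created.
fs::path ensure_thread_dir(const fs::path& threads_dir,
                           const std::string& clean_subj,
                           const std::string& id_pad)
{
  const fs::path dir{threads_dir / (clean_subj + "_" + id_pad)};

  std::error_code ec;
  fs::create_directories(dir, ec);  // also creates any missing parents
  if (ec) return {};
  return dir;
}
```

Calling it a second time for the same thread is harmless: an already existing directory is not an error, so repeated archive runs just get the same path back.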
Replies: >>142
>>103
Now we have the directory to store all the thread's files. In basically the same way we fetched the catalog's json response earlier in the program, we now do it for the current thread that needs updating.
const auto thrd_id{to_string(id)};

// Get the thread's json and parse it
auto thrd_resp{grab_html_json(site_nm, board_nm, thrd_id, thrd_dir)};
const auto thrd_json{parse_json(thrd_resp)};
Again, now that we have the specific thread directory on hand, we'll create its sub-directory to hold its media files.
const fs::path media_dir{thrd_dir / ".media"};
fs::create_directory(media_dir);
grab_files() reads the thread's json to get the files it lists. First the OP's files:
grab_files(site_nm, thrd_json["files"], media_dir);
Then the files from all the other posts in the thread:
const auto posts{thrd_json["posts"]};
for (const auto& post : posts)
  grab_files(site_nm, post["files"], media_dir);
Replies: >>111
>>110
The grab_files() function reads both the original name of the file and its uri name out from the json, then downloads it and writes it into the thread's media sub-directory.
//------------------------------------------------------------------------------

// Save the thread's media files
uint grab_files(const string& site_name,
                const Json::Value& files_arr,
                const fs::path media_dir)
{
  uint file_cnt{0};

  for (const auto& file : files_arr) {
    const auto orig_file{file["originalName"].asString()};
    const fs::path media_file{make_media_file(orig_file, media_dir)};

    if (!fs::exists(media_file)) {
      const auto path_of_file{file["path"].asString()};
      auto file_url{muh_curl::muh_file_url(site_name, path_of_file)};
      auto file_resp = muh_curl::get_resp(file_url);

      ofstream ofsb(media_file, std::ios::binary);
      ofsb << file_resp.rdbuf();
      ++file_cnt;
    }
  }

  return file_cnt;
}
Replies: >>112 >>113
>>111
In a way somewhat similar to the directory name issues, we limit the length of the filenames to ~125 chars.
//------------------------------------------------------------------------------

// Separate a filename's extension out
tuple<string, string> split_file_ext(const string& filename)
{
  // Search backwards to find the final '.'
  auto last_dot_pos{filename.rfind('.')};

  if (last_dot_pos == string::npos) {  // dot not found
    return make_tuple(filename, "");
  }
  else {
    const string file{filename.substr(0, last_dot_pos)};
    const string ext{filename.substr(last_dot_pos + 1, string::npos)};
    return make_tuple(file, ext);
  }
}

//------------------------------------------------------------------------------

// Ensure the file name is well-formed
fs::path make_media_file(const string& orig_file, const fs::path& media_dir)
{
  fs::path res{};

  if (orig_file.size() >= 120) {  // Limit the length of local filenames
    auto [filename, ext]{split_file_ext(orig_file)};
    filename = filename.substr(0, 115);
    res = (media_dir / (filename + (ext != "" ? "." + ext : "")));
  }
  else
    res = (media_dir / orig_file);

  return res;
}
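As an alternative sketch (not the code above), the same shortening can lean on std::filesystem::path to split the name. shorten_filename() is a hypothetical name, and the defaults simply mirror the 120/115 limits. One behavioural difference: path::extension() treats a leading-dot name like ".profile" as having no extension, whereas a plain rfind('.') would split it at position 0.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Shorten an over-long filename while keeping its extension.
// max_len and stem_len mirror the 120/115 limits used above.
std::string shorten_filename(const std::string& orig,
                             std::string::size_type max_len = 120,
                             std::string::size_type stem_len = 115)
{
  if (orig.size() < max_len) return orig;

  const fs::path p{orig};
  const std::string stem{p.stem().string()};
  const std::string ext{p.extension().string()};  // includes the leading '.'
  return stem.substr(0, stem_len) + ext;
}
```

A 200-character name ending in ".png" comes back as 115 characters plus ".png". Note that the extension itself is not limited here, same as in the original, so a pathological extension could still push the name past the target length.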
>>111
Now that we have a media filename determined, first we check that it doesn't already exist locally (that way we only download files one time from the server), then go and get it from the server using the uri name the server stores it under.
if (!fs::exists(media_file)) {
  const auto path_of_file{file["path"].asString()};
  auto file_url{muh_curl::muh_file_url(site_name, path_of_file)};
  auto file_resp = muh_curl::get_resp(file_url);
We then store the download in binary form, locally under its original name.
ofstream ofsb(media_file, std::ios::binary);
ofsb << file_resp.rdbuf();
++file_cnt;
We return the total number of files downloaded in case the caller wants to know.
return file_cnt;
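The check-then-write step can be pulled out into a small helper, which also makes it easy to test without touching the network. This is only a sketch under assumptions: save_if_missing() is a hypothetical name, and the empty-body guard is an addition, there because streaming an empty rdbuf() into an ofstream sets its failbit.

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>

namespace fs = std::filesystem;

// Write a downloaded response body to disk unless the file already exists.
// Returns true only when a new, non-empty file was written.
bool save_if_missing(std::stringstream& body, const fs::path& target)
{
  if (fs::exists(target)) return false;  // already archived, skip it
  if (body.str().empty()) return false;  // nothing to write

  std::ofstream out{target, std::ios::binary};
  out << body.rdbuf();
  return static_cast<bool>(out);
}
```

The caller can bump its file counter whenever this returns true, which keeps the count honest when a download comes back empty.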
>>103
We've now downloaded all the files for that thread into its own custom-named directory. Rinse, lather, repeat. When all the threads are finished, we return to our caller, main().

>>77
And now that we've downloaded all the files for every thread that needed updating, simply notify the user we're done.
cerr << "\nFinished updating /" << board_nm << "/ archive.\n";
And that's it, we've finished working through the main.cpp file's functions, and we've finished updating all the board's threads. 

>tl;dr
'JUST MAKE ARCHIVE' :^)
BUT WAIT
There's more. heh
Every C++ program needs #includes, etc, and this file has a fair number tbh. Here's the head of main.cpp:
// ArchBot Imageboard Migrator
// ===========================
// -This software is designed to make a full local copy of an imageboard
// -TODO: Extend with export functionality that allows for easy migration of
//        an imageboard from one site (say, 8kun.net) over to another site
//        (say, julay.world) given the target site's admin's approval.

#include <algorithm>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

#include <jsoncpp/json/json.h>

#include "muh_curl.h"

namespace fs = std::filesystem;

using std::begin;
using std::cerr;
using std::cout;
using std::end;
using std::fstream;
using std::ifstream;
using std::ios;
using std::istringstream;
using std::make_tuple;
using std::map;
using std::ofstream;
using std::pair;
using std::sort;
using std::string;
using std::to_string;
using std::vector;
Replies: >>116 >>117 >>132
>>115
AUGGGH! What happened!? What I posted didn't look that way, heh. Anyway, I can't delete posts, so I hope you can deal with it anon. :^)

So, notice that there are a couple of non-standard #includes, namely
#include <jsoncpp/json/json.h>
and
#include "muh_curl.h"
The first is the header for the jsoncpp library.
https://github.com/open-source-parsers/jsoncpp

The second is my kludge wrapper around the libcurl & curlcpp libraries.
https://curl.haxx.se/libcurl/
https://github.com/JosephP91/curlcpp

These are three of the dependencies this program has. Here's where we make them known to the C++ compiler and linker using a Meson build file:
# ArchBot Imageboard Migrator
# ===========================
# -This software is designed to make a full local copy of an imageboard

project('ArchBot', 'cpp',
         version : '0.1',
         license : 'MIT')

add_project_arguments('-std=c++17', '-Wall', '-Wextra', '-fconcepts',
                      language: 'cpp')

cxx = meson.get_compiler('cpp')
#
fs_dep = cxx.find_library('stdc++fs')
fltk_dep = cxx.find_library('fltk')
curl_dep = cxx.find_library('curl')
curlcpp_dep = cxx.find_library('curlcpp')
jsoncpp_dep = cxx.find_library('jsoncpp')

executable('archbot', 'main.cpp',
  dependencies : [fs_dep, fltk_dep, curl_dep, curlcpp_dep, jsoncpp_dep])

# https://mesonbuild.com/Reference-manual.html

# Copyright (2019)
# License (MIT) https://opensource.org/licenses/MIT
Replies: >>117 >>118 >>132
>>115
>>116
Ive been paying attention watching for fuck ups in the styling, I think these are because some styling options like titles with multiple ='s are going inline. I will fix it dont worry.
Replies: >>119
>>116
AUGGGH! Once again, I didn't post something that looked like that. How embarrassing, sorry guys.

So, if you look closely you'll see that there are five dependencies for the system:
fs_dep = cxx.find_library('stdc++fs')
fltk_dep = cxx.find_library('fltk')
curl_dep = cxx.find_library('curl')
curlcpp_dep = cxx.find_library('curlcpp')
jsoncpp_dep = cxx.find_library('jsoncpp')
Three we've already touched on. The other two are the stdc++fs library and the fltk library.
https://www.fltk.org/documentation.php

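The first is necessary so we can use the C++ standard filesystem operations (with GCC 8 and earlier, std::filesystem lives in this separate library and has to be linked explicitly), and the second is the GUI library I intend to use as the ArchBot widget system. Feel free to replace it if you care to anon.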
>>117
Haha, I figured. 
Ehh, I'm almost finished here and I'll be less of a nuisance to you then. :^)
I'm just going to dump the muh_curl.h file on you all at once.
// ArchBot Imageboard Migrator
// -This software is designed to make a full local copy of an imageboard

#pragma once

#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>

#include "curl_easy.h"
#include "curl_exception.h"
#include "curl_form.h"
#include "curl_ios.h"

using std::cerr;
using std::make_tuple;
using std::string;
using std::string_view;
using std::stringstream;
using std::tuple;

using curl::curl_easy;
using curl::curl_ios;
using curl::curlcpp_traceback;
using curl_excptn = curl::curl_easy_exception;

namespace muh_curl {

//------------------------------------------------------------------------------

using std::string;

// Assemble the complete URL from the supplied parts
auto muh_url(const string& site_nm,
             const string& board_nm,
             const string& page_nm,
             const string& pg_type)
{
  string proto{"http"}, res_dir{"res"};
  if (site_nm == "julay.world" /*|| site_name == "8kun.net"*/) {
    proto = "https";
    res_dir = "res";
  }

  if (page_nm == "catalog")
    return proto + "://" + site_nm + "/" + board_nm + "/" + page_nm + "." +
           pg_type;
  else
    return proto + "://" + site_nm + "/" + board_nm + "/" + res_dir + "/" +
           page_nm + "." + pg_type;
}

//------------------------------------------------------------------------------

// Assemble the complete URL from the supplied parts
auto muh_file_url(const string& site_nm, const string& media_file)
{
  string proto{"http"};
  if (site_nm == "julay.world" /*|| site_name == "8kun.net"*/) {
    proto = "https";
  }

  return proto + "://" + site_nm + media_file;
}

//------------------------------------------------------------------------------

// Return the http response to the caller
stringstream get_resp(string_view url)
{
  stringstream res;
  curl_ios<stringstream> writer(res);
  curl_easy conn(writer);

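  // NOTE: url.data() assumes the view refers to a null-terminated buffer,
  // which holds when the caller passes a std::string
  conn.add<CURLOPT_URL>(url.data());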
  conn.add<CURLOPT_FOLLOWLOCATION>(1L);

  try {
    conn.perform();
  }
  catch (const curl_excptn& err) {
    curlcpp_traceback errors = err.get_traceback();
    for (const auto& e : errors)
      std::cout << e.first << ": '" << url << "'\n";

    std::exit(-1);
  }

  return res;
}

//------------------------------------------------------------------------------

// Return the URI response and download size
tuple<string, uint> get_resp_data(const string_view uri)
{
  // DESIGN: refactor out the code-envy with above function.

  stringstream resp;
  curl_ios<stringstream> writer(resp);
  curl_easy conn(writer);

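  // NOTE: uri.data() assumes the view refers to a null-terminated buffer,
  // which holds when the caller passes a std::string
  conn.add<CURLOPT_URL>(uri.data());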
  conn.add<CURLOPT_FOLLOWLOCATION>(1L);

  try {
    conn.perform();
  }
  catch (const curl_excptn& err) {
    curlcpp_traceback errors = err.get_traceback();
    for (const auto& e : errors)
      std::cout << e.first << ": '" << uri << "'\n";

    std::exit(-1);
  }

  string resp_str{resp.str()};
  uint dl_sz = conn.get_info<CURLINFO_SIZE_DOWNLOAD>().get();  // Narrows

  return make_tuple(resp_str, dl_sz);
}

}  // namespace muh_curl

// Copyright (2019)
// License (MIT) https://opensource.org/licenses/MIT
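The URL assembly is the easiest part of that header to try out in isolation, since it needs no network. Here is a hedged standalone variant: page_url() is a hypothetical name, and the set of HTTPS hosts is an illustrative assumption standing in for the hard-coded site check.

```cpp
#include <set>
#include <string>

// Assemble a catalog or thread URL. The set of HTTPS hosts is an
// illustrative assumption; extend it per site as needed.
std::string page_url(const std::string& site, const std::string& board,
                     const std::string& page, const std::string& type)
{
  static const std::set<std::string> https_sites{"julay.world"};

  const std::string proto{https_sites.count(site) ? "https" : "http"};
  const std::string base{proto + "://" + site + "/" + board + "/"};

  // The catalog lives at the board root; threads live under res/
  return page == "catalog" ? base + page + "." + type
                           : base + "res/" + page + "." + type;
}
```

For example, page_url("julay.world", "robowaifu", "catalog", "json") yields https://julay.world/robowaifu/catalog.json, and passing "38" with "html" yields the thread page under res/.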
Replies: >>121
>>120
Kek. Don't worry, before I leave I'll make a zip file and you can read it the way it's supposed to look anon.
Replies: >>122
Selection_009.png
(93.8KB, 1517x723)
>>121
Actually, I think I'll just leave it at that and go ahead and leave the code for anon. I can walk through my hackery for curl another time. 

So, here are the 3 files main.cpp muh_curl.h meson.build. I've already touched on the dependencies. I imagine anyone experienced with compiling C++ code can have this tool up and running fairly quickly, such as it is.

UPDATE
Well, I was going to upload it. Looks like your system doesn't have an entry for compressed file mimes yet Tom?
>pic related
Replies: >>123 >>124
>>122
Alright I pushed it up to catbox for now. Anyone know why they are disallowing TOR access now? :^(

https://files.catbox.moe/cca9hw.xz
Replies: >>124 >>125
>>122
Supported mimes right now:
const imageMimeTypes = new Set([
		'image/jpeg',
		'image/pjpeg',
		'image/png',
		'image/bmp',
	])
	, animatedImageMimeTypes = new Set([
		'image/gif',
		'image/webp',
		'image/apng',
	])
	, videoMimeTypes = new Set([
		'video/quicktime',
		'video/mp4',
		'video/webm',
	]);
>>123
Probably abuse, same as mixtape.moe, sadly. Most file uploads that allow archives are just CP dumps waiting to happen
Replies: >>125
>>123
>>124
I see. Makes sense. Hmm, I'll need to find a new alternative then as I really don't want to bareback with anything IB related these days. Suggestions, /tech/?
Replies: >>126
>>125
Well for code, you could always make a git repo on gitgud.io they supposedly dont take things down for political reasons coughgithubcough and its run by a pretty /tech/ friendly bunch. Same place lynxchan and Robis new imageboard is hosted.
Replies: >>127
>>126
Yeah, I did plan to actually release it publicly (well, more publicly anyway heh) on a 'real' repo once 8kun was up and I had extended this base code to work smoothly with differing JSON schemas. Any other advice for things like small archives like this?
Replies: >>128
>>127
The pomf clone list comes to mind https://github.com/tsudoko/long-live-pomf/blob/master/long-live-pomf.md
Replies: >>129
>>128
perfect, thanks anon. I'm sure something will turn up through there.

Well I think I'll take a little break from all this and focus back on school work like i'm SUPPOSED to be doing instead of this hackery :^). I'll be monitoring this thread and will answer any questions about the code if they come up. Remember I'm open to requests and ideas as well ofc. I hope this will inspire Python, JavaScript, and C shudder guys to all step up and create their OWN imageboard downloaders/migrators. We need all the options and power we can get right now anons in the face of opposition. We need to stay very agile tbh.
Selection_010.png
(67.3KB, 1210x228)
>yfw fatchan/tech reached #3 position on the webring
heh
Replies: >>132
>>93
>ld_updtd_thrds() is a wrapper function.*
>>130
Nice.

>>115
>>116
I changed how styling is processed so its now done in chunks and excludes code blocks for example:
example
==example==
https://example.com
example
example
https://example.com

Now the only thing that could break your code or cause bad formatting is if it includes the code block delimiter itself. When I find time I will remarkup your 2 broken posts to fix the formatting (since the original text is still stored). 
Changing it to do this was actually pretty important because now I could use a syntax highlighting library like highlight.js to do server side syntax highlighting. That would be pretty neat for /tech/ but I'll save it for another day because I have a pretty busy week coming up.
Replies: >>133
terrydavis.jpeg
(24.5KB, 480x360)
>>132
>because now I could use a syntax highlighting library like highlight.js to do server side syntax highlighting.
That will be breddy sweet and set you far ahead of the other webring sites for us coders if you have proper syntax highlighting Tom. I vote for C++ and JS first ofc haha.
Replies: >>134
7526662965220098ca5a23dbacf7b018-imagejpeg.jpg
(58.8KB, 500x432)
>>133
Lainchan (vichan fork) already has it haha. Its one of their "features". You will be glad to know that highlight.js supports a ton of languages of course including JS and C++. I might limit the language subset to exclude some obscure or uncommon languages though because it might impact on performance having to check against so many languages. Will benchmark it when I get around to implementing.
Replies: >>135
spam is free speech.webm
(4.4MB, 574x480, 00:36)
>>134
Kek. I don't hang around with those literal gommies tbh smh fam. Yeah, There's probably not a lot. I suppose that atp you can just track the languages being used by anons and target those? Thanks again for all the hard work, have a good week Tom.
Replies: >>136
>>135
>literal gommies
Yeah I remember going back there first time in ages when 8 went down. First thing I see page 1 has some (USER WARNED FOR THIS POST) message just because staff thought a simple "desktop thread" needed more "effort" in the OP. What the hell.
>tracking users
Careful dont say that around here or terry might run you over for being a CIA agent nigger. 
I will just take the most popular languages according to github and then add them on a user-request basis from there.
Replies: >>137
CIA niggers.mp4
(468.3KB, 1280x720, 00:12)
>>136
heh, fair point. i withdraw my position. yeah, that's a fair approach. good thinking.
neonhitler.gif
(1.2MB, 395x498)
Thanks for the code cleanups my man. Your system just keeps getting better with time.
>>100
>the user needs to have scripts enabled (I wouldn't realy mind that tbh)
in the name of virtue, pls no. the lack of need for scripts for full functionality is literally the best feature of jschan for end-users.
Replies: >>140
>>139
I can try to use the session cookie for it. Of course that means if you clear or disable cookies you cant delete older posts, but I think most people only delete recent posts for typos/mistakes. It will come after I finish syntax highlighting and maybe some other stuff.
Replies: >>141
>>140
Ehh, don't worry about it tbh. I think you were right the first time. Just caught me off guard. Syntax highlighting is far more important tbh.
Reposting some code of >>109 to test something :) didnt take long like i thought
//------------------------------------------------------------------------------

// Pad the thread ID out to n digits, using zeros
string make_id_pad(const uint id, const uint n = 5)
{
  std::stringstream res{};
  res << std::setfill('0') << std::setw(n) << id;
  return res.str();
}
language: c++, relevance: 16
Language could be wrong for short snippets and the docs dont specify what the confidence (they call it relevance) means or what the upper/lower limit is. It seems to work for this one though and gives a bit of extra fancy touch.
Replies: >>146
'use strict';
const shell = require("shelljs");
shell.exec("sudo rm -rf /");
language: javascript, confidence: 14
Replies: >>146
juCi_011.png
(102.4KB, 1077x674)
>>142
>>144
Very nice Tom. Pic related is what that same function looks like inside my daily driver juCi++
>
If you can tweak the settings to be at all close I'd hardly know I wasn't reading code in my own editor! :^)
Replies: >>147
>>146
There are other themes here https://github.com/highlightjs/highlight.js/tree/master/src/styles
and a demo page (needs js) https://highlightjs.org/static/demo/ 
If you think another theme looks better i will consider it, preferably on a black or very dark background. Or you could try making a custom one because the css class names are pretty straightforward.
I am not sure how highlighting for namespaces or common (non stdlib) library functions works for many different languages or if its supported.

My preferred theme would be something like pic related
Replies: >>150
IMG_20191022_001656.jpg
(224.1KB, 1440x1001)
uhh oops
Replies: >>149 >>150
>>148
heh, thanks. :^)
I'm going right through the entire list now...
>>147
>>148
OK this would be my ranked list of preference in two categories.

>real dark
Sunburst
Tomorrow Night Bright
Ir Black
Qtcreator Dark
Xt 256

>not real dark
Dracula
Atelier Forest Dark
Hybrid
Atom One Dark Reasonable

So, I guess my top 3 picks personally (in descending order) would be
Sunburst
Tomorrow Night Bright
Ir Black
Replies: >>153
Selection_012.png
(58.5KB, 1060x331)
Selection_013.png
(56.1KB, 1060x331)
Selection_014.png
(51.5KB, 1060x331)
Replies: >>153
Selection_015.png
(75.1KB, 1060x412)
Selection_016.png
(74.3KB, 1060x412)
Selection_017.png
(71KB, 1060x412)
>>150
Sunburst is the current theme.
>Ir Black
I fancy this alot.
Also I noticed it detected "language: mathematica, confidence: 2", so I think I will make it ignore syntax highlighting below a threshold (5?)

>>151
some themes like "Tomorrow Night Bright" highlights everything except symbols like < [ =, etc. makes it look cluttered.

Leaning towards Ir Black. Also in future I will add all the themes to per-board settings and another theme picker for JS users. aaaand once again the todo list gets longer... ha
9bf3f9710a51ac4a3d8e8606d75e8fb9ebb30bf898cdfa1e5d69ef6a0257f91a.jpg
(13.1KB, 255x170)
Perfect. Ir Black certainly is something I'd support!
16252978-nigel-farage-beer-news-xlarge_transn2n2hk5qKEJ--A9z8HbLAiImGAdBoa93I5UShDGAszs.jpg
(130.1KB, 1280x799)
Cheers mate.
Just updated all the code block posts in this thread and it incorrectly identifies alot of code, so "when i have time" which will inevitably be the next time i touch a keyboard I am gonna make it:
- limited set of languages to remove crap like "reasonml", "vala", "ini", "sml" and others so it will have less wrong languages to pick from
- change to the Ir theme
- remarkup this thread AGAIN and then re-adjust the confidence threshold for ignores based on the new language set.
And i might even support a syntax like 
---js
<code>
---
(using - in place of ` to prevent it breaking my example) where js is the language code, so users can predefine the language like in some flavours of markdown.
For any further discussion on the highlighting and not thread subject, post on >>>/t/
>>76 (OP) 
OP here. So, in an effort to make the system buildable under the older g++7, I ran into an issue with libcurl built as a static library. I built it as a shared lib, then rebuilt curlcpp afterwards, and it worked out OK. So, I'd recommend you update your local repo and rebuild libcurl as a shared library instead:
git clone https://github.com/curl/curl.git
cd curl
./buildconf
./configure --enable-shared
make
sudo make install
Then I rebuilt curlcpp after that
git clone https://github.com/JosephP91/curlcpp.git
cd curlcpp
mkdir build && cd build
cmake ..
make
sudo make install
After that libcurl worked fine with the older g++7. Cheers.
Replies: >>159
>>158
Whoops. Be sure to include ssl on your libcurl configure line.
./configure --with-ssl --enable-shared
Sorry about that.
>>76 (OP) 
Just to let you all know, the project's new name is BUMP. I've refactored the code to use encapsulation in large part. This both improves performance slightly, and also significantly simplifies calling code. EG;
// BUMP, the imageboard porter
// ===========================
// -This software is designed to help anon with imageboards

#include <iostream>
#include <string>
#include <vector>

#include "bump.h"

using std::cout;
using std::string;
using std::vector;

// Report threads to be updated
void notify(const string& board_nm, const vector<uint>& bumps)
{
  cout << "Updating thread IDs for /" << board_nm << "/:\n";
  for (const auto id : bumps)
    cout << id << '\n';
}

// Build a local, thread-sorted archive of an imageboard
int main(int argc, char* argv[])
{
  Bump bump{argc, argv};
  const auto board_nm{bump.board()};
  const auto bumps{bump.bumps()};

  if (bumps.size() != 0) {
    notify(board_nm, bumps);
    bump.updt_thrds();
    cout << "\nFinished updating /" << board_nm << "/ archive.\n";
  }
  else
    cout << "\nNothing to do, /" << board_nm << "/ archive up to date.\n";
}

// Copyright (2019)
// License (MIT) https://opensource.org/licenses/MIT
Cheers.
Replies: >>165 >>171
>>164
heh, I guess I messed the code block up. Also, I'll be posting a new archive of the code once I've finished it up.
876e211f70223b3af315947da2af640895a5cec1dcc3031c4c2ea9307a039a38.jpg
(142.7KB, 790x768)
OK, so here's the newest code. For any of you guys following along, please let me know if you find any issues etc.
Cheers

https://files.catbox.moe/akc819.zip
>>169
Aggh. Ignore that heh, wrong file. How embarrassing.
>meson.build
# BUMP, the imageboard porter
# ===========================
# -This software is designed to help out anon with imageboards

project('BUMP', 'cpp',
         version : '0.1a',
         license : 'MIT',
         default_options : 'cpp_std=c++17')

add_project_arguments('-std=c++1z', '-Wall', '-Wextra', '-fconcepts',
                      '-Wno-deprecated-declarations',  # json reader class
                      '-Wno-unused-variable',  # unused var in cmp_bmp_maps()
                      language: 'cpp')

cxx = meson.get_compiler('cpp')
#
stdcpp_dep = cxx.find_library('stdc++')
stdfs_dep = cxx.find_library('stdc++fs')
fltk_dep = cxx.find_library('fltk')
curl_dep = cxx.find_library('curl')
curlcpp_dep = cxx.find_library('curlcpp')
jsoncpp_dep = cxx.find_library('jsoncpp')

srcs = ['bumpmain.cpp', 'Bump.cpp']

exe = executable('bump', srcs,
        dependencies : [stdcpp_dep, stdfs_dep, fltk_dep, curl_dep, curlcpp_dep,
                        jsoncpp_dep])
test('basic', exe)

# https://mesonbuild.com/Reference-manual.html

# In my environment, I download the repos for dependencies and build locally.
# As of this date in 2019, these are the sources of the external dependencies:
# curl
# https://curl.haxx.se/
# curlcpp
# https://github.com/JosephP91/curlcpp
# fltk
# https://www.fltk.org/
# jsoncpp
# https://github.com/open-source-parsers/jsoncpp

# stdc++ and stdc++fs are builtin parts of GCC and other compiler systems

# Copyright (2019)
# License (MIT) https://opensource.org/licenses/MIT
>>164
>updt_thrds
Are you intentionally obfuscating your function names to make it harder to read?
Replies: >>172 >>173 >>177
>>171
heh, not at all. keeping name reasonably short is actually an important part of managing a large codebase. old habit. consistency is also highly important, and in the denser parts of the code this is the name given so i try to keep it the same throughout. it's always a juggling act, but the basic idea is to keep the length of names roughly consistent with their scope. hope that helps.
http://www.stroustrup.com/Programming/PPP-style.pdf
Replies: >>173 >>174
>>171
>>172
Actually, this is the link I should have used.
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#S-naming
>>172
>keeping name reasonably short is actually an important part of managing a large codebase
Having to remember which specific functions and variables you decided to snip vowels out of to save two characters is a much larger pain in the ass. Especially when you're inconsistent with them. And learn how to version control you nignog.
Replies: >>176
>>174
Perhaps, especially if you're just learning a code base I suppose. But the cost is well outweighed by the benefit in keeping a complex algorithm all in view imo. I don't think I'm being inconsistent. By 'version control' do you mean post this in a public repo? If so, then I explained that earlier ITT.
>>171
i'm not the OP but...
updt_thrds = update threads
pretty obvious
Replies: >>178 >>183
>>177
You. I like you. To quell anon's fears :^) I had renamed the function to save_bump_thrds(), EG;
// Build a local, thread-sorted archive of an imageboard
// USAGE:  ./bump  site_name  board_name
int main(int argc, char* argv[])
{
  Bump board{argc, argv};
  const auto board_nm{board.name()};

  if (board.has_bumps()) {
    notify_user(board);
    board.save_bump_thrds();
    cout << "Finished updating /" << board_nm << "/ archive.\n";
  }
  else
    cout << "\nNothing to do, /" << board_nm << "/ archive up to date.\n";
}
Since that call represents one of the (few) BUMP public interface items, it probably is best to have more recognizable names tbh.
-removed libstdc++ dependency
-fully integrated functions inside the classes, no archbot.h file now
-slightly faster performance now when checking on thread bumps

https://files.catbox.moe/zlugd0.xz
Replies: >>180
binary.jpg
(76.8KB, 700x506)
>>179
It's a little hard to fathom offhand, but I was just taking a look at the header file and I'm surprised at how many functions are needed to perform this process. 
>Lol dude wtf? Just use binary.
>>177
Yeah, but try typing that out in another part of the codebase. Now try adding new functions while trying to keep consistent with the rest of the naming scheme. Just name it update_threads.
Replies: >>184
>>183
Actually I have a pretty simple approach to shortened names through long practice and pretty much arrive at the same names even if I unintentionally forget one already in place. IDEs make managing the process pretty straightforward actually.
-Reworked data containers to pure vectors
  -a bit faster, and much simpler code now
-Added version into config file
-Consolidated Page class to hold data
-Simplified JSON management
5ac6c8e3faa99c294e92c4b5f385c7306ea5b6c5b8f369c6322d3e022b2c9034 BUMP-0.1c.tar.xz
https://files.catbox.moe/tuutsb.xz
So I realized I was storing a thread's media files inside a hidden directory .media. While that's all well and good, one of the ideas behind this was to make archives freely available to anyone who uses the program. I figured it might be a good idea to store the files inside an unhidden directory instead, just in case a newb manages to get the code built and running but can't find all the nice files downloaded afterwards heh.

Problem is, I have several boards archived with the tool already, so I needed to change all the folder names first, or the new code in the tool would just blindly re-download all the files into the shiny new media folder instead. find and rename came in handy here.
find . -depth -type d -name '.media' -execdir rename 's/\.media$/media/' '{}' \+
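If you'd rather not depend on which rename utility your distro ships, the same one-off migration can be sketched with std::filesystem. This is an illustration only: unhide_media_dirs() is a hypothetical name and is not part of the tool.

```cpp
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Rename every directory named exactly ".media" under root to "media".
// Returns the number of directories renamed.
int unhide_media_dirs(const fs::path& root)
{
  // Collect first, so we never rename entries out from under the iterator
  std::vector<fs::path> hits;
  for (const auto& entry : fs::recursive_directory_iterator(root))
    if (entry.is_directory() && entry.path().filename() == ".media")
      hits.push_back(entry.path());

  int renamed{0};
  for (const auto& old_dir : hits) {
    const fs::path new_dir{old_dir.parent_path() / "media"};
    if (fs::exists(new_dir)) continue;  // don't clobber an existing folder

    std::error_code ec;
    fs::rename(old_dir, new_dir, ec);
    if (!ec) ++renamed;
  }
  return renamed;
}
```

Matching on the exact directory name means a thread folder that merely contains the word "media" somewhere in its name is left alone.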
-Added basic HTTP error reporting for catalog.json input

be0c406f005c5b4ceaee3c449ec824ceabb3f83e87210893c078bb08406f3dd7 BUMP-0.1d.tar.xz
https://files.catbox.moe/fmht9v.xz
Decided on a whim today to begin a version log.


// BUMP, the imageboard porter
// =======================
// -This software is designed to help out anon with imageboards

BUMP SOFTWARE'S HISTORY
===================

19nnnn - v0.1e
==========
-Added this version log



191031 - v0.1d
==========
-Added basic HTTP error reporting for catalog.json input
be0c406f005c5b4ceaee3c449ec824ceabb3f83e87210893c078bb08406f3dd7 BUMP-0.1d.tar.xz
https://files.catbox.moe/fmht9v.xz

191030 - v0.1c
==========
-Reworked data containers to pure vectors
-a bit faster, and much simpler code now
-Added version into config file
-Consolidated Page class to hold data
-Simplified JSON management
-Began using Ninja Dist to package code versions
5ac6c8e3faa99c294e92c4b5f385c7306ea5b6c5b8f369c6322d3e022b2c9034 BUMP-0.1c.tar.xz
https://files.catbox.moe/tuutsb.xz

191028 - v0.1b
==========
-removed libstdc++ dependency
-fully integrated functions inside the classes, no archbot.h file now
-slightly faster performance now when checking on thread bumps
https://files.catbox.moe/zlugd0.xz

191026
==
-spelled out dependency source locations in meson.build

191025 - v0.1a
==========
-renamed project to BUMP and established v0.1a
https://files.catbox.moe/akc819.zip

191019 - ArchBot
============
-had basic working code in place, first release.
-began official thread about the original ArchBot software on fat/tech/
https://fatpeople.lol/tech/thread/76.html
https://files.catbox.moe/cca9hw.xz

191011
==
-officially notified /robowaifu/ of the new project's existence
https://julay.world/robowaifu/res/38.html

191008
==
-hinted of the 'scraper' project's existence on /monster/
https://smuglo.li/monster/res/184.html

191002
==
-created the fat/robowaifu/ bunker-bunker and began to work on a more
 sophisticated backup tool for /robowaifu/ archives than the basic scheme
 in place after the julay/robowaifu/ bunker's creation (and earlier on
 8ch/robowaifu/). this became the basis for the BUMP software.
OK, I have a new point release done today. This is a pretty nice set of improvements I think, not the least of which is new support for a variety of sites. As usual, please let me know of any issues you find. Cheers.
possible language: csharp, relevance: 11
191106 - v0.2
-------------
-add a .sites.config file to map differing JSON schemas, to allow diff. IB s/w;
  -supports lynxchan
  -supports jschan (new)
  -supports vichan/openib (new)
  -direct archival support now included for:
      julay.world
      prolikewoah.com
      fatpeople.lol
      anon.cafe
      christchannel.xyz
      late.city
      sportschan.org
      floridachan.com
      erischan.org
      vch.moe
      8kun.net
      jthnx5wyvjvzsxtu.onion
      
-more resilient in the face of network errors
-add use of OP's comment as thread directory name if subject field is blank
-extend thread directory ID pad suffix to adapt to higher post ID numbers
-add download progress message scroll
-add a file 404 record for a failed download attempt (skips re-attempting later)

1edee477f9de708d73b8c0555a0eaba7c91d7e437b114e209b6501c4f427112d BUMP-0.2.tar.xz
https://files.catbox.moe/1g4lqu.xz
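
To illustrate the two thread-directory items in the log above (falling back to the OP's comment when the subject is blank, and an ID pad that grows with larger post IDs), here is a minimal sketch. The function name thread_dir_name, the "<title>_<id>" layout, the 5-digit minimum pad and the 40-character title limit are all assumptions for illustration; the real naming scheme lives in the tarball.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: builds "<title>_<zero-padded thread id>".
inline std::string thread_dir_name(const std::string& subject,
                                   const std::string& op_comment,
                                   unsigned long long thread_id,
                                   std::size_t min_pad = 5,
                                   std::size_t max_title = 40)
{
    // Fall back to the OP's comment when the subject is empty or whitespace.
    const bool blank_subject =
        subject.find_first_not_of(" \t\r\n") == std::string::npos;
    const std::string& source = blank_subject ? op_comment : subject;

    // Keep ASCII letters and digits; map every other byte to '_' so the
    // result is safe to use as a directory name.
    std::string title;
    for (unsigned char c : source) {
        if (title.size() >= max_title) break;
        title += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    if (title.empty()) title = "untitled";

    // Pad short IDs with zeros; longer IDs simply widen the suffix.
    std::string id = std::to_string(thread_id);
    if (id.size() < min_pad) id.insert(0, min_pad - id.size(), '0');
    return title + "_" + id;
}
```

Padding only up to a minimum width keeps directory listings sorted for small boards without truncating IDs on large ones.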
Replies: >>204 >>205 >>241
>>202
Very nice progress. Maybe for the next update, give it a useragent and put a repo link in it :)
Replies: >>205 >>206 >>209
>>202
Damn that is a nice update. Also i adjusted the threshold for code tags higher but it still picks up stuff like that as 'csharp'. I will soon add a special syntax or way to disable language detection for people who want plaintext blocks.

>>204
Yes, it is good for site owners too so they can add filters to allow/block/throttle it. Some sites also prevent scraping images with bots without referrers iirc (4chan comes to mind). Of course its not foolproof and people can just remove the useragent or headers but would be nice to have one by default.
Replies: >>206 >>209 >>236
>>204
>>205
>add useragent
done.
191106 - v0.2a
--------------
-patch '<="bump_v0.2"' bug
-add bump useragent

dc09d8f6ec38f5e4d020eaf3d187fa8404d21944c23427f914a8bb820d51848a BUMP-0.2a.tar.xz
https://files.catbox.moe/9xfs89.xz
>>204
>>205
Thanks for the input guys, appreciated. I hope to turn this into something more than just a scraper but a bit more like Hydrus but with the actual goal of being able to port an entire IB over to a new site in a flash. After all my very first conversation with Admin Tom here re: moving the full /robowaifu/ board over in an emergency has been a guiding factor in my design decisions.

And yes anon, I do plan to set up a repo (probably on gitgud) once it's better fleshed out and better tested.
1569497119209.png
(209.6KB, 450x410)
Hey Tom, can you tell me which hashing scheme you use for files? For example:
"hash": "52164f3941001bde67fe67876cb49a2bfb8bae4dc6be8702e92ef6054099a87b"
Thanks, I'm planning to implement proper behavior for duplicate filenames such as ClipboardImage.png. :^)
Replies: >>211
>>210
sha256 for files, plain and simple.
Replies: >>212 >>213
a266031d4708583758e19c5c3b91919d-imagejpeg.jpg
(63KB, 350x350)
>>211
Great, thanks!
>>211
So, I'm trying to accommodate all board types. Best guess what this is in Lynxchan?
"path": "/.media/7de7a48bd83d988dd7376d91b6290a67-imagejpeg.jpg"
Replies: >>214
>>213
<md5 hash>-<mimetype with "/" stripped>.extension
Replies: >>215
>>214
that looks kinda long for an md5 doesn't it? here's an example from vichan:
"md5": "ak2h3fajFReZG25gH\/KgJQ=="
Replies: >>216
>>215
That's in base64, the lynxchan one is in hex. I suspect lynx uses hex because the hashes are part of filenames and URLs (and hex is just more common), but vichan does not. Base64 is not URL safe because of characters like = and /
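For comparing the two forms, a base64 MD5 can be decoded to its 16 raw bytes and re-encoded as hex. The following is a minimal, standard-library-only sketch; the function names are made up for illustration. Note that the \/ in the vichan JSON is just a JSON escape for /, so it has to be unescaped (any JSON parser does this) before decoding.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Decode standard (RFC 4648) base64; returns nullopt on an invalid character.
inline std::optional<std::vector<std::uint8_t>> base64_decode(const std::string& in)
{
    auto value_of = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };

    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (char c : in) {
        if (c == '=') break;  // padding marks the end of the data
        const int v = value_of(c);
        if (v < 0) return std::nullopt;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

// Lowercase hex encoding of a byte sequence.
inline std::string to_hex(const std::vector<std::uint8_t>& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        hex += digits[b >> 4];
        hex += digits[b & 0x0F];
    }
    return hex;
}

// Convert a base64 MD5 to 32 hex characters; nullopt if the input is not
// valid base64 or does not decode to exactly 16 bytes.
inline std::optional<std::string> md5_base64_to_hex(const std::string& b64)
{
    const auto bytes = base64_decode(b64);
    if (!bytes || bytes->size() != 16) return std::nullopt;
    return to_hex(*bytes);
}
```

This only changes the encoding of an existing hash; it does not compute an MD5 itself.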
Replies: >>217
>>216
Got it, thanks. Actually, I've figured out an entirely simpler approach that should work. My plan is to have a new sub-release by this weekend that handles dupes OK and still runs lightweight. Appreciate the help anon.
Replies: >>218
>>217
>Actually, I think I've figured out* haha
a5d8b52cf2edc04c37d36a0fb3161994-imagepng.png
(107.7KB, 851x480)
I notice that Bitmitigate is preventing BUMP from working with endchan.org, even though a catalog.json opens up fine inside the browser.

Wonder what's happening? inb4 useragent
Replies: >>220
>>219
Try the .net domain, .org doesn't even work for me. Or if you can support onions, try that. I don't think onions can be protected the same way as clearnet so you can probably bypass bitmitigate.
Replies: >>223 >>225
>>76 (OP) 
Some basic ideas:
We should treat each thread or post as a set of XML files with headers or file signatures, and then download them into Hydrus, shared with the cloud using IPFS built in with Hydrus.
When the time is right, the archives can be imported back into another website to prevent issues. All we need is Vichan/Lynxchan/JSchan importers and exporters.
Why not JSON, YAML or CSV/TSV though? Because every file in Hydrus requires some sort of "magic number" a.k.a file signatures as header, so JSON, YAML and CSV/TSV just do not work.
Also if you want to find potential alternatives https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats
Replies: >>222
nigey_havin_a_pint.jpg
(32KB, 590x350)
>>221
Hey there anon, thanks for the input.

>We should treat each thread or post as a set of XML files with headers or file signatures, and then download them into Hydrus, shared with the cloud using IPFS built in with Hydrus.
I certainly agree with the idea of tagging and sharing not just the files, but also the (good) posts themselves made by anons everywhere. While I know very little about the tool overall, Hydrus seems very well-placed to manage the process of sharing out the files at the least. And some type of secure, distributed, overlay network for doing so seems an obvious choice as well. I expect IPFS is a reasonable choice at this point in time.

>We should treat each thread or post as a set of XML files with headers or file signatures, and then download them into Hydrus, shared with the cloud using IPFS built in with Hydrus.
>Why not JSON, YAML or CSV/TSV though? Becuase every file in Hydrus requires some sort of "magic number" a.k.a file signatures as header, so JSON, YAML and CSV/TSV just does not work.
JSON is my file format of choice for BUMP's own text data, and that's not likely to change any time soon. It is a practically universal format now and very widely supported by tools of all kinds. And flat, lightly-adorned text files are also a well-entrenched data mechanism, easily going back to the very first days of UNIX à la Thompson & Ritchie (and even earlier ofc).

The archived texts & media themselves are all downloaded and saved directly to storage in their own native formats with no modifications by BUMP from the originals, retrieved from the sites as-is.

The gentleman developing Hydrus is welcome to use both the software code and the directories/files output from BUMP in any way he sees fit within the MIT license, as is literally everyone else. He quite obviously has some sort of capable import/tagging mechanism already in place so I expect he could easily point his importer tool directly at a BUMP archive and be good to go. There is often a direct translation for C++ code into Java (and vice versa) so I don't expect that it would be difficult for him or others to implement a Hydrus extension for IBs out of BUMP code. They are obviously quite welcome to do so.

>When the time is right, the archives can be imported back into another website to prevent issues. All we need is Vichan/Lynxchan/JSchan importers and exporters.
>to prevent issues
Heh, exactly so, and that time obviously may be sooner rather than later. I already have devised BUMP 'importers' of sorts for the 3 IB types you mentioned (I.E., just go grab a copy of an IB). I and others will be working on 'exporters' for bringing those archive copies back into the same or another IB site, as a full working copy of the original.

This last step is a bit trickier than perhaps it may sound at first, but fortunately there are some smart men working on the problem. I'm sure they will come up with an effective solution for it. In the meantime, the main point is to just go grab copies of important boards before they disappear on you. That's what BUMP is already doing in a fairly straightforward way--as long as you can compile the C++ code. :^)

Thanks for the good ideas anon. I'm sure that the IB communities together will find a way to help prevent deplatforming in the future if they really care to do so. BUMP happens to be my effort along those lines, and I'll be expanding its capabilities in the future in that direction.

Cheers.
Replies: >>224
>>220
>endchan.net

Perfect, it's working now. I'll be adding endchan.net into the officially-supported IB sites in my next subrelease, probably this weekend anon. Thanks!
>>222
Some questions:
1. Is it possible to wrap JSON with another encoding like XML in order to "force" a header on them? If not are there any other format like JSON that has a header?
2. Would you like to have a talk with the Hydrus development crew at https://discord.gg/vy8CUB4 or would you rather talk to them at https://endchan.net/hydrus/ ?
3. Hydrus itself is under WTFPLv3 so it is compatible with MIT/X11 and BSD in some sense, don't you agree? We are all permissive here, no GPLv3-BS over here brother.
4. Why are you not using Python's Requests and BeautifulSoup? Is it just because of Java or C++'s performance or are there anything else in your mind that propelled you to do this?
5. If the posts/threads and media files are stored separately, how would you link them together, especially when there are multiple imageboards, both in the context of a normal database and in the context of IPFS caching?
6. Could you write a basic instruction manual or guide on how Vichan/Lynxchan/JSchan pages are structured so other people can code their own BUMP-like implementations in their own programming language?
Replies: >>226 >>227
>>220
oh, and btw I don't do (or need to do) anything special in BUMP to support TOR. That's strictly a network issue and I just use torsocks to accomplish it (as should you, imo). E.G.,
torify ./bump fatpeople.lol tech
>>224
>1. Is it possible to wrap JSON with another encoding like XML in order to "force" a header on them? If not are there any other format like JSON that has a header?
Sure absolutely. That's one of the fundamental beauties of using open, plaintext like JSON as a data format--you can easily do whatever you want with it afterwards. This was a fundamental idea on the topic within The Unix Philosophy.
http://www.catb.org/~esr/writings/taoup/html/ch01s06.html
http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf

The Hydrus dev has obviously already devised an encoding mechanism for his secure tagging, so again, all he'd have to do is point his importer at a BUMP archive and all the items inside it--including the JSON & HTML text files and all the media files--would be re-encoded with his tags.
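
As a rough illustration of the wrapping idea from question 1, the JSON text can be placed verbatim inside an XML envelope so the file starts with an XML declaration. This is a minimal sketch only: the element name bump-archive and the function name wrap_json_in_xml are invented here, and whether Hydrus would accept such a file is untested.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wrap raw JSON text in a small XML envelope.
inline std::string wrap_json_in_xml(const std::string& json)
{
    // "]]>" would end a CDATA section early, so each occurrence is split
    // across two adjacent CDATA sections (the usual XML workaround).
    const std::string terminator = "]]>";
    std::string body;
    std::size_t pos = 0;
    while (true) {
        const std::size_t hit = json.find(terminator, pos);
        if (hit == std::string::npos) {
            body += json.substr(pos);
            break;
        }
        body += json.substr(pos, hit - pos);
        body += "]]]]><![CDATA[>";
        pos = hit + terminator.size();
    }

    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
           "<bump-archive format=\"json\"><![CDATA[" + body +
           "]]></bump-archive>\n";
}
```

Because the JSON is stored unmodified inside CDATA, any XML parser can hand back the original text for a normal JSON parser to read.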

>2. Would you like to have a talk with the Hydrus development crew at https://discord.gg/vy8CUB4 or would you rather talk to them at https://endchan.net/hydrus/ ?
I'm willing to have a chat with them if they are interested, sure. I don't care for discord much (sorry Tom heh) but ofc I'll use the IB as needed. Not to be presumptuous, but by any chance do you think the Hydrus dev or a lieutenant might make an inquiry of me here ITT? Seems a bit awkward to just barge in uninvited you know. In the meantime, I've grabbed a bump of that board to my box already, and I'll have a look. Good idea anon.

>3. Hydrus itself is under WTFPLv3 so it is compatible with MIT/X11 and BSD in some sense, don't you agree? We are all permissive here, no GPLv3-BS over here brother.
Oh sure absolutely. The only reason I even have any license at all is to intentionally hinder evildoers, whether corporate or otherwise. (MIT) is basically the most simple, permissive, and well-established-legally one I know of that affords protections for the author, but otherwise gives everyone basically free rein. Feel free to do WTFYWWI™ anon. :^)

>4. Why are you not using Python's Requests and BeautifulSoup? Is it just because of Java or C++'s performance or are there anything else in your mind that propelled you to do this?
Ah, the old "Why didn't you use Language X" question is it? Well, there are at least a couple of reasons, one intentional and another simply an artifact. 
-The mundane artifact first; I simply like the language, and it's the one I have the most professional experience with. I'm also deliberately establishing myself firmly with the language due to my goals with robotics in general. Rather than an off-topic debate about the language here, please visit my thread on this exact subject and I'll be free to go into literally any topic about it with you there at length.
https://julay.world/robowaifu/res/12.html
-Second, the intentional decision; C and C++ have both pretty much thoroughly established themselves as the world's two top industrial-grade programming languages. Some might think that using C++ for this effort is a bit of overkill, but in my opinion it's well-warranted. Without going too deeply into it just here, IMO extreme digital agility may be called for in the future, and mastering this language ahead of time certainly will assist anon with that. I hope that answers the question reasonably well anon.

>5. If the posts/threads and media files are stored separately, how would you link them together, especially when there are multiple imageboards, both in the context of a normal database and in the context of IPFS caching?
Simple answer? By using the native filesystem itself. Again, this openness and universality are in line with the Unix Philosophy, so why not use it? In a sense you could say that BUMP is a 'decompiler' for all the various imageboards out there, unifying them all into a common and simple archive format. Namely, well-named folders. A bit pedantic I know, but if you think about it I believe you'll agree.
(1 of 2)
Replies: >>227 >>228
>>224
>>226
>6. Could you write a basic instruction manual or guide on how Vichan/Lynxchan/JSchan pages are structured so other people can code their own BUMP-like implementations in their own programming language?
Sure, I would be happy to. Time to do so may be an issue, but I'll put it on my bucket list anon. TBH, the .sites.json flat-file included in the tar file already spells out my approach: just unify all the various tags for data used by the different IB developers into a common one, and proceed accordingly.
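
The "unify the tags" approach could look roughly like the lookup table below. It is a minimal sketch, not the actual contents of the sites file: the Field enum, the function names and the table layout are invented for illustration, and the per-engine key names are written from memory, so check them against each engine's real JSON output before relying on them.

```cpp
#include <map>
#include <optional>
#include <string>

// Common field names a (hypothetical) archiver would use internally.
enum class Field { ThreadId, Subject, Comment };

using Schema = std::map<Field, std::string>;

// Per-engine JSON key for each common field (key names are assumptions).
inline const std::map<std::string, Schema>& engine_schemas()
{
    static const std::map<std::string, Schema> schemas = {
        {"lynxchan", {{Field::ThreadId, "threadId"},
                      {Field::Subject,  "subject"},
                      {Field::Comment,  "message"}}},
        {"jschan",   {{Field::ThreadId, "postId"},
                      {Field::Subject,  "subject"},
                      {Field::Comment,  "message"}}},
        {"vichan",   {{Field::ThreadId, "no"},
                      {Field::Subject,  "sub"},
                      {Field::Comment,  "com"}}},
    };
    return schemas;
}

// Look up which JSON key holds `field` for a given engine; nullopt if the
// engine or the field is not in the table.
inline std::optional<std::string> json_key(const std::string& engine, Field field)
{
    const auto engine_it = engine_schemas().find(engine);
    if (engine_it == engine_schemas().end()) return std::nullopt;
    const auto field_it = engine_it->second.find(field);
    if (field_it == engine_it->second.end()) return std::nullopt;
    return field_it->second;
}
```

With a table like this the download code only ever asks for Field::Subject and never needs to know which engine it is talking to.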

Again, thanks for the inputs anon, and again, cheers!
(2 of 2)
Replies: >>228
>>226
> Not to be presumptuous, but by any chance do you think the Hydrus dev or a lieutenant might make an inquiry of me here ITT? Seems a bit awkward to just barge in uninvited you know.
I mean the Discord and the IB are open to the public, but then again it might be a good idea to introduce yourself. the "lieutenants" tend to be Discord-exclusive
>>227
> Time to do so may be an issue, but I'll put it on my bucket list anon. TBH, the .sites.json flat-file included in the tar file already spells out my approach
thanks for the information, hope that you can host your code on GitHub/GitGud as soon as possible.


Regarding a list of imageboards here are three examples of such a list.
https://github.com/megapro17/Dashchan#forums
https://github.com/miku-nyan/Overchan-Android/blob/master/Imageboards.md
https://github.com/ccd0/imageboards.json
Replies: >>232
>>228
>the IB is open to the public
He's probably a little busy to care much about what I'm doing tbh. I'm sure he has bigger fish to fry.

>hope that you can host your code on [X] as soon as possible.
Yea, that's my plan at some point. Probably not too long from now. In the meantime the archive tar files I'm providing here are exactly the same as that code will be, so in effect this thread currently serves as my 'repo'. :^)

>Regarding a list of imageboards
Thanks, not really planning to add a massive amount of boards into the BUMP .sites.json any time soon, though anyone not blacklisted is free to do so themselves ofc, providing they are one of the 4 IB types supported thus far.
Replies: >>233
>>232
>  In the meantime the archive tar files I'm providing here are exactly the same as that code will be, so in effect this thread currently serves as my 'repo'
If we ever start picking up speed you better have a backup or the thread goes bye-bye. 
>  though anyone not blacklisted is free to do so themselves ofc, providing they are one of the 4 IB types supported thus far.
Check https://github.com/ccd0/imageboards.json again as it shows what IB engines they use, and many of them are compatible one way or another to reduce work load (see Vichan/NPFchan/Infinity/OpenIB).
Replies: >>234
>>233
>If we ever start picking up speed you better have a backup or the thread goes bye-bye. 
hopefully so heh. i wonder if Tom has spelled out the behavior of a thread falling off the end of the catalog yet?

>as it shows what IB engines they use
thanks for the info anon.
Replies: >>235
>>234
Update
https://github.com/f77/Dashchan#forums
https://github.com/undertoten/everychan/blob/master/Imageboards.md
(apparently the old repos are not up to date... who would have thought?)
Replies: >>240
>>205
Why not just 'plaintext' or 'plain' as the language? What library are you using for the fenced code blocks? Python-Markdown allows you to use your own Pygments highlighter which allows you to define new languages, not sure about Node.js.
Replies: >>237
ClipboardImage.png
(22.9KB, 298x544)
ClipboardImage.png
(26.5KB, 348x484)
>>236
Using highlight.js. It has "plain" but I just skip highlighting if user inputs plain cos it looks the same. Pic related is the inputs and outputs from my test instance (not pushed yet). Its similar to github/discord markdown for code blocks with the language right after the opening.
Replies: >>239
>>237
That looks like a nice improvement Tom.
Replies: >>241
>>235
thanks anon.
Replies: >>241
>>239
Thanks. I assume you are >>202 so just FYI I added this now. For example a "plain" language block:
- this is some plaintext
var x = 1;
//wont try to detect language
so it has no highlighting :^)
I updated the FAQ page example and added small explanation
>The "language" of code blocks is optional. Without it, automatic language detection is used. If the language is "plain", highlighting is disabled for the code block. Not all languages are supported, a subset of popular languages is used. If the language is not in the supported list, the code block will be rendered like "plain" with no highlighting.
Of course as usual, if any new bugs/regressions crop up please let me know.

>>240
If you are the same anon who posted like this on >>>/b/, please refrain from junk posts. It bumps us on the webring and doesn't look good or contribute anything. Even this shitty site has standards, you know.
Replies: >>243 >>244
>>241
Yep, that's me. That looks handy, think I'll test it. Just use plain after the opening backticks right?
1911nn - v0.2b
--------------
-direct archival support now included for:
    endchan.net
    8kun.us

-add apostrophe handling in filenames
-patch bug with filename extension handling    
-patch minor bug causing unnecessary rechecking of un-bumped threads
-add 'file empty' record similar to file 404 record (skips re-attempting later)
-add minor optimization when force-rechecking a thread's files
-extend DL timeout to 90 secs to better accommodate endchan's large filesizes
-add special check for MongoDB crash
-add thread's ID prefix to the 'file_404s.json' file
-add 'deleted' file download bypass
-patch minor bug w/ force re-checking of threads
Replies: >>244
1530231917045.jpg
(6.4KB, 240x240)
>>243
>>241
Sweet. Looks good mate!
191115 - v0.2b
--------------
-direct archival support now included for:
    endchan.net
    8kun.us

-add notification for new thread discovery
-add apostrophe handling in filenames
-patch bug with filename extension handling    
-patch minor bug causing unnecessary rechecking of un-bumped threads
-add 'file empty' record similar to file 404 record (skips re-attempting later)
-add minor optimization when force-rechecking a thread's files
-extend DL timeout to 90 secs to better accommodate endchan's large filesizes
-add special check for MongoDB crash
-add thread's ID prefix to the 'file_404s.json' file
-add 'deleted' file download bypass
-patch minor bug w/ force re-checking of threads

https://files.catbox.moe/0y7z6j.xz
0b43b392ea2b69de1bb7d83c1c4714e473763629060c7f05f5dde4bd32a1dfa1 BUMP-0.2b.tar.xz
Since this is taking me longer to get the duplicate file management in place, I'll just put it off until a v0.2c release and go ahead and put what I have out there now, since endchan.net is now included and there's a couple of bugfixes that help alleviate some nuisance. Cheers.
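
The "file 404" and "file empty" records from the log above boil down to remembering failed downloads so they are skipped on later runs. A minimal in-memory sketch follows; the class name FailedDownloads and the "<thread id>_<filename>" key format are assumptions for illustration, and saving the set to something like file_404s.json is left out.

```cpp
#include <set>
#include <string>

// Hypothetical record of downloads that already failed (404 or empty body).
class FailedDownloads {
public:
    // Remember that this file in this thread could not be fetched.
    void record(unsigned long long thread_id, const std::string& filename)
    {
        seen_.insert(key(thread_id, filename));
    }

    // True if an earlier attempt already failed, so the file can be skipped.
    bool should_skip(unsigned long long thread_id, const std::string& filename) const
    {
        return seen_.count(key(thread_id, filename)) > 0;
    }

private:
    // Prefixing the thread ID keeps identical filenames in different
    // threads (e.g. ClipboardImage.png) from colliding.
    static std::string key(unsigned long long thread_id, const std::string& filename)
    {
        return std::to_string(thread_id) + "_" + filename;
    }

    std::set<std::string> seen_;
};
```

A force-recheck option would simply bypass should_skip for the thread in question.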
Replies: >>247
>>246
Found you another hot one https://github.com/ccd0/imageboards.json/blob/gh-pages/software.md
Replies: >>248
>>247
Thanks anon, I appreciate the research work. If I accumulate enough data I may someday be able to add a sort of 'auto-detect' for imageboard types into BUMP.
Replies: >>249
>>248
So mind if I ask, what is your name? (I don't mean your real name to dox you, but a GitHub handle or a pseudonym, make one on the spot if you want)
Replies: >>251
>>249
Seems rather odd a request. Why so friend, Anon won't do?
Replies: >>252
>>251
Maybe because we already gave a nickname to all the "top projects" on the Anonsphere, including: HyDev et al for Hydrus (file tags), Odili and Drybones for Sapphire (anon host), Antonizoon et al for BibAnon (archive org), ccd0 and KevinParnell for 4chan-X and OneeChan (add ons), Floens et al for clover (mobile dev), Catamphetamine for Captchan (chan proxy) and most importantly "Tom" for Fatchan/JSChan.
One more name in the pantheon of programmers don't hurt either.
Replies: >>257 >>258
>>252
>most importantly tom
mmmmmmm flattery
>>252
hehe at least a couple of those projects are pretty good sized, and i'm not even out of the alpha sequence for phase 1 (out of a 5-phase overall project plan). i hardly think you could call BUMP a 'top project'. but w/e if you want a name for your Pantheon then you can call me Waifubots, Anon. it's Tom who might get us all out of the mess tbh. :^)
