• twelve@sh.itjust.works · 19 upvotes · 1 year ago

    I still find it astonishing that TechCrunch buys the ML model training argument.

    No one in their right mind would use the API (which has always been rate limited) to fetch data for text generation. People would use plain HTTP scraping or, even better, archives of Reddit.

    Why? Because the rate limit is looser or nonexistent, there is no need to write anything (only to read), and it will stay free 🙂 Also, super fresh data is not dramatically useful (except in very specific corner cases where something in the news changes the way we talk).

    • Hotzilla@sopuli.xyz · 9 upvotes · edited · 1 year ago

      Web crawling has always worked through raw HTTP requests and HTML parsing. Why build site-specific API calls that require authentication and are throttled?

      This excuse is pure bullshit.

    • AstralJaeger@lemmy.ml · 5 upvotes · 1 year ago

      Considering the Reddit API has a hilariously low rate limit, I fully understand why the AI bros would use a scraping approach instead. I’ve built small Discord bots that had a difficult time keeping up through the API because so few requests were available! I was in the process of building an event-driven system that used multiple API tokens just to keep up with multiple feeds. It’s just terrible.
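
For readers unfamiliar with the multi-token workaround mentioned above, here is a minimal sketch of the general idea, not the commenter's actual system. It assumes each token has its own sliding-window request budget; the `TokenPool` class, the token names, and the limit and window values are all hypothetical and are not Reddit's real limits.

```python
from collections import deque


class TokenPool:
    """Spread requests across several API tokens, each with its own budget.

    All numbers used with this class are illustrative assumptions.
    """

    def __init__(self, tokens, limit, window):
        self.limit = limit      # max requests per token per window (assumed)
        self.window = window    # window length in seconds (assumed)
        # One queue of recent request timestamps per token.
        self.history = {token: deque() for token in tokens}

    def acquire(self, now):
        """Return a token that still has budget at time `now`, else None."""
        for token, stamps in self.history.items():
            # Forget requests that have left the sliding window.
            while stamps and now - stamps[0] >= self.window:
                stamps.popleft()
            if len(stamps) < self.limit:
                stamps.append(now)
                return token
        return None  # every token is exhausted; the caller has to wait


# Two hypothetical tokens, two requests each per 60-second window.
pool = TokenPool(["token-a", "token-b"], limit=2, window=60)
granted = [pool.acquire(now=0) for _ in range(5)]
print(granted)
```

Even in this toy version, the fifth request in the same window gets no token at all, which illustrates why keeping up with several feeds under a tight per-token limit takes this much bookkeeping.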