• faltryka@lemmy.world
    8 days ago

    So what do we train GPT on when Stack Overflow degrades?

    Will library docs be enough? Maybe.

    • PriorityMotif@lemmy.world
      8 days ago

      SO is already degraded because they didn’t allow new answers, even though the old answers are based on old, deprecated versions and are no longer relevant.

    • jacksilver@lemmy.world
      8 days ago

      This has been a concern of mine for a long time. People act like docs and code bases are enough, but it’s obvious when looking up something niche that it isn’t. These models need a lot of input data, and we’re effectively killing the source(s) of new data.

      • faltryka@lemmy.world
        8 days ago

        It feels like less Stack Overflow content is a narrowing, and that’s kind of where my question comes from. The remaining content for training is the actual authoritative library documentation. I’m not sure that’s necessarily bad: it’s certainly less volume, but it’s probably also higher quality.

        I don’t know the answer here, but I think the situation is a lot more nuanced than all of the black-and-white hot takes suggest.

    • UnderpantsWeevil@lemmy.world
      8 days ago

      There’s a serious argument that Stack Overflow was, itself, a patch job in a technical environment that lacked good documentation and debug support.

      I’d argue the mistake was training on Stack Exchange to begin with, rather than on an actual stack of manuals on proper coding written by professionals.

      The problem was never having the correct answer but sifting it out of the overall pool of information. When ChatGPT isn’t hallucinating, it does that much better than Stack Exchange.