GitHub Reverses Course: Will Train AI Models on User Data Starting April 24
GitHub will begin using customer interaction data (inputs, outputs, code snippets, and surrounding context) to train its AI models starting April 24, a dramatic reversal of its previous privacy commitments.
The Policy Change
Who's Affected
- Copilot Free, Pro, and Pro+ users: Data used for training by default
- Copilot Business and Enterprise: Exempt (contract terms)
- Students and teachers: Exempt
- Opt-out: available at /settings/copilot/features (disable "Allow GitHub to use my data for AI model training")
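The rules in this list can be read as a small decision procedure. The Python sketch below is only an illustration of that reading, not GitHub's implementation: the CopilotUser class, its fields, and the used_for_training function are hypothetical names, and the plan strings are simply lowercased versions of the tiers named above.

```python
from dataclasses import dataclass

# Plan tiers as named in the list above; the identifiers are hypothetical.
TRAINING_DEFAULT_PLANS = {"free", "pro", "pro+"}
EXEMPT_PLANS = {"business", "enterprise"}


@dataclass
class CopilotUser:
    plan: str                            # e.g. "free", "pro", "pro+", "business", "enterprise"
    is_student_or_teacher: bool = False  # students and teachers are exempt
    opted_out: bool = False              # True if the training setting is disabled


def used_for_training(user: CopilotUser) -> bool:
    """Return True if, under the rules summarized above, the user's
    interaction data would be used for model training."""
    plan = user.plan.lower()
    if plan in EXEMPT_PLANS or user.is_student_or_teacher:
        return False
    if plan not in TRAINING_DEFAULT_PLANS:
        raise ValueError(f"unknown plan: {user.plan!r}")
    # Free, Pro, and Pro+ are opted in by default; opting out disables it.
    return not user.opted_out
```

For example, used_for_training(CopilotUser("pro")) evaluates to True, while the same user with opted_out=True, or any user on a Business or Enterprise plan, evaluates to False.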
What Data GitHub Collects
- Accepted or modified model outputs
- Code snippets shown as inputs
- Code context near cursor position
- Comments and documentation
- File names and repository structure
- Copilot chat interactions
- Thumbs up/down feedback
The Privacy Implications
Private Repositories Are No Longer Truly Private
The policy FAQ explicitly states: "If a Copilot user has their settings set to enable model training on their interaction data, code snippets from private repositories can be collected and used for model training while the user is actively engaged with Copilot while working in that repository."
In practice, this makes private repos "GitHub private*", with the asterisk denoting that GitHub's definition of "private" has limits.
Community Reaction
- 59 thumbs-down votes versus just 3 positive reactions on the announcement
- 39 community comments, overwhelmingly negative
- Only GitHub's VP of developer relations expressed support
Context
GitHub cites similar policies at Anthropic, JetBrains, and its parent company Microsoft as justification. Chief Product Officer Mario Rodriguez claims that using Microsoft employee data led to "meaningful improvements," including higher suggestion acceptance rates.
Source: The Register