Your Files Are a Mess – And It’s About to Confuse Microsoft 365 Copilot

Data hygiene has always been important. But now it matters even more.
Your organisation has thousands of files across SharePoint, OneDrive, and Exchange. Some haven’t been touched in years. Others are outdated, duplicated, or belong to employees who’ve long since left.

Now you’re preparing to roll out Microsoft 365 Copilot. You’re expecting it to be a game-changer: surfacing the right documents, summarising information, answering questions. But there’s a problem.

Copilot doesn’t know what’s relevant. It works with whatever content users have access to. And if your tenant is cluttered with stale, outdated content, that’s exactly what Copilot might pull into its answers.

In this guide, you’ll learn how to clean up your Microsoft 365 tenant so Copilot can perform as intended. Specifically:

  • Why outdated content creates real issues for Copilot
  • How to reduce that noise using Microsoft Purview retention policies
  • How to control what Copilot can access in the short term with Restricted SharePoint Search (RSS)

This isn’t just about Copilot. It’s about setting a foundation for better content governance and compliance across your organisation.

Want to read up on retention first? Check out this blog post: How to create retention policies in SharePoint Online.


1. Why Outdated Files Are a Problem for Copilot

The core challenge is straightforward: Copilot draws from the same content sources as your users: SharePoint sites, OneDrive folders, and Exchange mailboxes. It doesn’t distinguish between old and new, accurate and outdated.

If your tenant contains years-old proposals, deprecated product specs, or outdated policy documents, those files are still fair game for Copilot to reference.

The impact?

  • Users receive outdated or incorrect answers
  • Trust in Copilot drops
  • Your content governance problems become even harder to manage

Cleaning up this stale content is essential if you want Copilot to deliver relevant, accurate, and timely responses.


2. Use Retention Policies to Clean Old Content

Microsoft Purview’s Data Lifecycle Management tools allow you to define how long content should be kept, when it should be deleted, and which files should be retained for compliance reasons.

Here’s how you can use them:

  • Create Retention Policies
    Target files in SharePoint and OneDrive based on activity or age.
    For example, you might configure a policy to delete any document on SharePoint Online that hasn’t been modified in 7 years.
  • Use Retention Labels
    Labels apply specific rules to certain types of files (instead of containers) independent of where they’re stored. If you move the file, the label stays active.
    A good use case: automatically deleting CVs 30 days after last modification, regardless of the folder they’re in.
  • Use Records Management for High-Value Content
    For critical files, add a disposition review to ensure someone manually approves deletion.

This keeps your tenant clean without relying on users to manually delete old files. The less digital noise, the better Copilot performs.
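
Before you configure a policy like the 7-year example above, it can help to estimate what it would catch. The snippet below is a minimal Python sketch of that age check, not Purview code. The `Document` type, the paths, and the dates are all hypothetical; in practice you would feed it an inventory exported from your own tenant.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical inventory record: a path plus its last-modified date.
    path: str
    last_modified: date


def years_before(today, years):
    """Return the date `years` years before `today` (29 Feb falls back to 28 Feb)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def stale_documents(docs, today, max_age_years=7):
    """Return documents last modified before the cutoff date."""
    cutoff = years_before(today, max_age_years)
    return [d for d in docs if d.last_modified < cutoff]


# Made-up sample inventory, for illustration only.
inventory = [
    Document("/sites/sales/proposal-2015.docx", date(2015, 3, 1)),
    Document("/sites/hr/policy-current.docx", date(2024, 11, 20)),
    Document("/sites/product/spec-v1.docx", date(2017, 6, 30)),
]

stale = stale_documents(inventory, today=date(2025, 1, 1))
print([d.path for d in stale])
```

With a reference date of 1 January 2025, the cutoff is 1 January 2018, so the 2015 proposal and the 2017 spec are flagged and the current policy document is not.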

Key Retention Logic to Remember:

  • Retention wins over deletion: if one rule says delete and another says retain, the item stays.
  • Longest retention wins: when multiple policies say to retain, the longest duration applies.
  • Shortest deletion wins: when multiple policies say to delete, the shortest period applies.
  • Explicit wins over implicit: a file-level label overrides a broader site-level policy.
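
The four rules above are easier to reason about when you see them applied together. Here is a minimal Python sketch of how I read them. It is a simplified model, not how Purview is implemented: the `Rule` type and `resolve` function are hypothetical, durations are plain day counts, and I apply "explicit wins over implicit" when picking the deletion period, which is an assumption on my part.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical model of one retention policy or label.
    name: str
    retain_days: Optional[int] = None  # keep for at least this long
    delete_days: Optional[int] = None  # delete after this long
    explicit: bool = False             # True for a label on the item itself


def resolve(rules):
    """Return (keep_for_days, delete_after_days or None) for one item."""
    # Longest retention wins.
    retain = [r.retain_days for r in rules if r.retain_days is not None]
    keep_for = max(retain, default=0)

    # Explicit wins over implicit: prefer deletion periods set by labels.
    explicit = [r.delete_days for r in rules
                if r.explicit and r.delete_days is not None]
    every = [r.delete_days for r in rules if r.delete_days is not None]
    candidates = explicit or every
    if not candidates:
        return keep_for, None

    # Shortest deletion wins.
    delete_after = min(candidates)

    # Retention wins over deletion: never delete before retention ends.
    return keep_for, max(delete_after, keep_for)


site_policy = Rule("site policy", retain_days=7 * 365)
cleanup_5y = Rule("cleanup 5y", delete_days=5 * 365)
cleanup_3y = Rule("cleanup 3y", delete_days=3 * 365)
cv_label = Rule("CV label", delete_days=30, explicit=True)

print(resolve([site_policy, cv_label]))    # retention holds the item
print(resolve([cleanup_5y, cv_label]))     # the label's 30 days applies
print(resolve([cleanup_5y, cleanup_3y]))   # the shorter deletion applies
```

In the first case the 7-year retention keeps the CV in place even though its label says 30 days, which is exactly the kind of surprise worth checking for before you rely on a label to clean things up.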

3. Use Restricted SharePoint Search to Control Copilot Scope

Cleaning up your tenant with retention policies is essential, but it takes time. If your organisation is rolling out Copilot now, you need a short-term way to avoid surfacing irrelevant or overshared content. Restricted SharePoint Search (RSS) lets you limit Copilot to your most active sites. The trade-off is that every other SharePoint site disappears from organisation-wide search results, and users can only find that content by searching inside the site itself.

I do not recommend RSS as a long-term solution. Sooner or later you will still have to clean up your tenant’s files. For a full step-by-step guide on RSS, see my other post: How to set up SharePoint Restricted Search Step-by-Step.

What Is RSS?

Restricted SharePoint Search allows you to specify which SharePoint sites should appear in global search results and Copilot answers. It works on an “allow list” model: only sites you’ve approved will be included in organisation-wide search and Copilot queries.

Everything not on that list is excluded from global results, even if users still technically have access. Keep this in mind when you decide to use RSS.

This gives you a practical way to limit Copilot’s reach while you’re still cleaning up your data. RSS is designed to support a Copilot rollout and is part of SharePoint Advanced Management. You can only enable it if your tenant has an active Copilot license or the SharePoint Advanced Management add-on (now included with M365 Copilot).

How It Works

RSS is off by default. Once you turn it on (via SharePoint Online PowerShell), only SharePoint sites in your “allowed list” will appear in search results and Copilot answers.

You can:

  • Add up to 100 SharePoint sites to the allowed list
  • Include hub sites (their associated sites don’t count against the 100)
  • Adjust the list over time as you complete permission reviews

Changes take effect within about an hour.

It’s important to note that RSS doesn’t change actual permissions. Users can still access content they have direct rights to. It simply reduces the surface area Copilot can draw from, helping you contain potential oversharing until you’re ready to expand.


A Few Things to Keep in Mind

  • RSS doesn’t act as a deny list. It’s an “allow list only” model.
  • Sites outside the allowed list do not show up in SharePoint-wide or Teams-wide search.
  • Site-specific searches still work. If a user navigates to a site they have access to, they can still search within it.
  • It only affects SharePoint-based results. Exchange, Planner, Loop, etc., aren’t included.
  • Hub sites count as one. Associated sites don’t use up your 100-site limit.
  • Recently accessed and shared content is still available (up to 2000 items).
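
To make the allow-list behaviour concrete, here is a small Python sketch of the points above. It is a toy model only: RSS itself is configured through SharePoint Online PowerShell, and none of the names below (`Hit`, `global_search`, `site_search`, the site URLs) come from a real API. I also assume that sites associated with an allowed hub are included in results, which is how I read the hub rule.

```python
from dataclasses import dataclass

MAX_ALLOWED_SITES = 100  # limit on allowed-list entries, as stated above


@dataclass(frozen=True)
class Hit:
    # Hypothetical search hit: the site it lives on and a title.
    site: str
    title: str


def expand_allowed(allowed, hub_members):
    """Return allowed sites plus sites associated with allowed hubs."""
    if len(allowed) > MAX_ALLOWED_SITES:
        raise ValueError("allowed list is limited to 100 entries")
    sites = set(allowed)
    for hub in allowed:
        # Associated sites ride along and do not count against the limit.
        sites.update(hub_members.get(hub, ()))
    return sites


def global_search(hits, allowed, hub_members, recent=frozenset()):
    """Organisation-wide search: allowed sites plus the user's recent items."""
    visible = expand_allowed(allowed, hub_members)
    return [h for h in hits if h.site in visible or h in recent]


def site_search(hits, site):
    """Searching inside a site is unaffected by the allowed list."""
    return [h for h in hits if h.site == site]


hits = [
    Hit("/sites/intranet", "Holiday policy"),
    Hit("/sites/hr-benefits", "Pension FAQ"),
    Hit("/sites/old-projects", "2016 proposal"),
    Hit("/sites/old-projects", "Budget draft"),
]
allowed = ["/sites/intranet", "/sites/hr-hub"]
hub_members = {"/sites/hr-hub": ["/sites/hr-benefits"]}
recent = frozenset({Hit("/sites/old-projects", "Budget draft")})

print([h.title for h in global_search(hits, allowed, hub_members, recent)])
print([h.title for h in site_search(hits, "/sites/old-projects")])
```

The old 2016 proposal drops out of the organisation-wide results because its site is not on the list, but it is still found by a search inside `/sites/old-projects`. Nothing about the user's permissions has changed.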

Next Steps

Once RSS is in place:

  • Regularly review SharePoint usage reports and permissions
  • Use Data Access Governance reports in SharePoint Advanced Management to detect high-risk sites
  • Gradually expand your allowed list as you clean up oversharing

And remember, RSS isn’t forever. It’s a launchpad that allows a limited-group Copilot rollout while you improve governance across your tenant.


4. Start with This Cleanup Checklist

Here’s a practical set of next steps to improve content hygiene today:

✅ Set retention policies for SharePoint, OneDrive, Exchange
✅ Use retention labels for exceptions and regulatory data
✅ Identify inactive SharePoint sites and folders
✅ Educate users on storing current content in the right places
✅ Schedule regular reviews of stale content
As a last resort: enable Restricted SharePoint Search to limit Copilot’s reach


Wrap-Up: Better Data Means A Better Copilot

If you want Microsoft 365 Copilot to give your teams smart, reliable answers, you need to give it clean, trustworthy content. That starts with tackling the digital clutter: old files, outdated documents, and forgotten folders that no one’s touched in years.

Retention policies help you build a long-term foundation for content hygiene and compliance. Restricted SharePoint Search gives you a short-term safety net while you get there. But ultimately, there’s no shortcut: better data hygiene means better Copilot results.

Treat this as more than a Copilot prep task. It’s your chance to raise the bar on content governance across the board, so Copilot isn’t just accurate, it’s genuinely useful.

Start small. Stay consistent. And remember: clean data in, useful answers out.


Edine

Once jokingly nicknamed a sloth. It became my inspirational animal. Writes about Microsoft 365 technologies.
