[Concept,9/9] pickman: Add already-applied commit detection / comparison

Message ID 20251224213045.3010514-10-sjg@u-boot.org
State New
Headers
Series pickman: Provide better ways to check cherry-picks |

Commit Message

Simon Glass Dec. 24, 2025, 9:30 p.m. UTC
  From: Simon Glass <simon.glass@canonical.com>

When a cherry pick has already been completed this can confuse pickman
and make it hard for a reviewer to figure out what is going on.

Attempt to detect already-applied commits (by commit subject) as a way
to reduce duplicate cherry-picks and improve the reliability of the
automation.

The agent now compares actual patch content using git show and diff,
only skipping commits that are similar with minor differences like
line numbers or conflict resolutions. This should reduces false
positives while maintaining robust duplicate detection.

Co-developed-by: Claude <noreply@anthropic.com>
Signed-off-by: Simon Glass <simon.glass@canonical.com>
---

 tools/pickman/README.rst | 29 ++++++++----
 tools/pickman/agent.py   | 62 +++++++++++++++++++-------
 tools/pickman/control.py | 73 +++++++++++++++++++++++++++---
 tools/pickman/ftest.py   | 96 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 230 insertions(+), 30 deletions(-)
  

Patch

diff --git a/tools/pickman/README.rst b/tools/pickman/README.rst
index cc1ce5f81d5..857edb09f85 100644
--- a/tools/pickman/README.rst
+++ b/tools/pickman/README.rst
@@ -129,18 +129,29 @@  Already-Applied Detection
 
 Sometimes commits have already been applied to the target branch through a
 different path (e.g., directly merged or cherry-picked with different hashes).
-Pickman's Claude agent detects this situation automatically.
+Pickman detects this situation automatically in two ways.
 
-**How it works**
+**Pre-Cherry-Pick Detection**
 
-When the first cherry-pick fails with conflicts, the agent checks if the
-commits are already present in the target branch by searching for matching
-commit subjects::
+Before starting cherry-picks, pickman checks for potentially already-applied
+commits by searching for matching commit subjects in the target branch::
 
     git log --oneline ci/master --grep="<subject>" -1
 
-If all commits in the set are found (same subjects, different hashes), the
-agent:
+Commits that match are marked as "maybe already applied" and passed to the
+Claude agent with the hash of the potentially matching commit. The agent then:
+
+1. Compares the actual patch content between the original and found commits
+2. Uses ``git show`` and ``diff`` to analyze the changes
+3. Skips commits that are similar with only minor differences (line numbers,
+   context, conflict resolutions)
+4. Proceeds with commits that differ significantly in actual changes
+
+**Fallback Detection**
+
+If pre-detection missed something and the first cherry-pick fails with
+conflicts, the agent performs the same subject search and patch comparison
+process. If all commits in the set are verified as already applied, the agent:
 
 1. Aborts the cherry-pick
 2. Writes a signal file (``.pickman-signal``) with status ``already_applied``
@@ -148,7 +159,8 @@  agent:
 
 **What pickman does**
 
-When pickman detects the ``already_applied`` signal:
+When pickman detects the ``already_applied`` signal or when the agent reports
+pre-detected applied commits:
 
 1. Marks all commits as 'skipped' in the database
 2. Updates the source position to advance past these commits
@@ -161,6 +173,7 @@  This ensures:
 - The source position advances so the next ``poll`` iteration processes new
   commits
 - No manual intervention is required to continue
+- False positives are minimized by comparing actual patch content
 
 CI Pipelines
 ------------
diff --git a/tools/pickman/agent.py b/tools/pickman/agent.py
index 1d5a3df2442..b081c36b504 100644
--- a/tools/pickman/agent.py
+++ b/tools/pickman/agent.py
@@ -45,12 +45,12 @@  def check_available():
     return True
 
 
-async def run(commits, source, branch_name, repo_path=None):
+async def run(commits, source, branch_name, repo_path=None):  # pylint: disable=too-many-locals
     """Run the Claude agent to cherry-pick commits
 
     Args:
         commits (list): list of AgentCommit namedtuples with fields:
-                       hash, chash, subject
+                       hash, chash, subject, applied_as
         source (str): source branch name
         branch_name (str): name for the new branch to create
         repo_path (str): path to repository (defaults to current directory)
@@ -69,18 +69,42 @@  async def run(commits, source, branch_name, repo_path=None):
     if os.path.exists(signal_path):
         os.remove(signal_path)
 
-    # Build commit list for the prompt
-    commit_list = '\n'.join(
-        f'  - {commit.chash}: {commit.subject}'
-        for commit in commits
-    )
+    # Build commit list for the prompt, marking potentially applied commits
+    commit_entries = []
+    has_applied = False
+    for commit in commits:
+        entry = f'  - {commit.chash}: {commit.subject}'
+        if commit.applied_as:
+            entry += f' (maybe already applied as {commit.applied_as})'
+            has_applied = True
+        commit_entries.append(entry)
+
+    commit_list = '\n'.join(commit_entries)
+
+    # Add note about checking for already applied commits
+    applied_note = ''
+    if has_applied:
+        applied_note = '''
+
+IMPORTANT: Some commits may already be applied. Before cherry-picking commits 
+marked as "maybe already applied as <hash>", verify they are truly the same commit:
+1. Compare the actual changes between the original and found commits:
+   - Use: git show --no-ext-diff <original-hash> > /tmp/orig.patch
+   - Use: git show --no-ext-diff <found-hash> > /tmp/found.patch
+   - Compare: diff /tmp/orig.patch /tmp/found.patch
+2. If the patches are similar with only minor differences (like line numbers,
+   context, or conflict resolutions), SKIP the cherry-pick and report that
+   it was already applied.
+3. If the patches differ significantly in the actual changes being made,
+   proceed with the cherry-pick as they are different commits.
+'''
 
     # Get full hash of last commit for signal file
     last_commit_hash = commits[-1].hash
 
     prompt = f"""Cherry-pick the following commits from {source} branch:
 
-{commit_list}
+{commit_list}{applied_note}
 
 Steps to follow:
 1. First run 'git status' to check the repository state is clean
@@ -113,10 +137,11 @@  Steps to follow:
    - For complex conflicts, describe what needs manual resolution and abort
    - When fix-ups are needed, amend the commit to add a one-line note at the
      end of the commit message describing the changes made
-6. After ALL cherry-picks complete, verify with 
+6. After ALL cherry-picks complete, verify with
    'git log --oneline -n {len(commits) + 2}'
    Ensure all {len(commits)} commits are present.
-7. Run final build: 'buildman -L --board sandbox -w -o /tmp/pickman' to verify everything still works together
+7. Run final build: 'buildman -L --board sandbox -w -o /tmp/pickman' to verify
+   everything still works together
 8. Report the final status including:
    - Build result (ok or list of warnings/errors)
    - Any fix-ups that were made
@@ -132,11 +157,16 @@  Important:
 
 CRITICAL - Detecting Already-Applied Commits:
 If the FIRST cherry-pick fails with conflicts, BEFORE aborting, check if the commits
-are already present in ci/master with different hashes. Do this by searching for
-commit subjects in ci/master:
-   git log --oneline ci/master --grep="<subject>" -1
-If ALL commits are already in ci/master (same subjects, different hashes), this means
-the series was already applied via a different path. In this case:
+are already present in ci/master with different hashes. For each commit:
+1. Search for the commit subject: git log --oneline ci/master --grep="<subject>" -1
+2. If found, verify it's actually the same commit by comparing patches:
+   - git show --no-ext-diff <original-hash> > /tmp/orig.patch
+   - git show --no-ext-diff <found-hash> > /tmp/found.patch
+   - diff /tmp/orig.patch /tmp/found.patch
+3. Only consider it "already applied" if the patches are similar with minor
+   differences (like line numbers, context, or conflict resolutions)
+If ALL commits are verified as already applied (same content, different hashes),
+this means the series was already applied via a different path. In this case:
 1. Abort the cherry-pick: git cherry-pick --abort
 2. Delete the branch: git branch -D {branch_name}
 3. Write a signal file to indicate this status:
@@ -206,7 +236,7 @@  def cherry_pick_commits(commits, source, branch_name, repo_path=None):
 
     Args:
         commits (list): list of AgentCommit namedtuples with fields:
-                       hash, chash, subject
+                       hash, chash, subject, applied_as
         source (str): source branch name
         branch_name (str): name for the new branch to create
         repo_path (str): path to repository (defaults to current directory)
diff --git a/tools/pickman/control.py b/tools/pickman/control.py
index 4a993c72b88..3ae085d3507 100644
--- a/tools/pickman/control.py
+++ b/tools/pickman/control.py
@@ -73,7 +73,7 @@  CheckResult = namedtuple('CheckResult', [
 
 # Named tuple for commit with author
 # hash: Full SHA-1 commit hash (40 characters)
-# chash: Abbreviated commit hash (typically 7-8 characters) 
+# chash: Abbreviated commit hash (typically 7-8 characters)
 # subject: First line of commit message (commit subject)
 # author: Commit author name and email in format "Name <email>"
 CommitInfo = namedtuple('CommitInfo',
@@ -83,7 +83,9 @@  CommitInfo = namedtuple('CommitInfo',
 # hash: Full SHA-1 commit hash (40 characters)
 # chash: Abbreviated commit hash (typically 7-8 characters)
 # subject: First line of commit message (commit subject)
-AgentCommit = namedtuple('AgentCommit', ['hash', 'chash', 'subject'])
+# applied_as: Short hash if potentially already applied, None otherwise
+AgentCommit = namedtuple('AgentCommit',
+                         ['hash', 'chash', 'subject', 'applied_as'])
 
 # Named tuple for prepare_apply result
 ApplyInfo = namedtuple('ApplyInfo',
@@ -475,6 +477,43 @@  def get_branch_commits():
     return current_branch, commits
 
 
+def check_already_applied(commits, target_branch='ci/master'):
+    """Check which commits are already applied to the target branch
+
+    Args:
+        commits (list): List of CommitInfo tuples to check
+        target_branch (str): Branch to check against (default: ci/master)
+
+    Returns:
+        tuple: (new_commits, applied) where:
+            new_commits: list of CommitInfo for commits not yet applied
+            applied: list of CommitInfo for commits already applied
+    """
+    new_commits = []
+    applied = []
+
+    for commit in commits:
+        # Check if a commit with the same subject exists in target branch
+        try:
+            # Use git log with --grep to search for the subject
+            # Escape any special characters in the subject for grep
+            escaped_subject = commit.subject.replace('"', '\\"')
+            result = run_git(['log', '--oneline', target_branch,
+                             f'--grep={escaped_subject}', '-1'])
+            if result.strip():
+                # Found a commit with the same subject
+                applied.append(commit)
+                tout.info(f'Skipping {commit.chash} (already applied): '
+                         f'{commit.subject}')
+            else:
+                new_commits.append(commit)
+        except Exception:  # pylint: disable=broad-except
+            # If grep fails, assume the commit is not applied
+            new_commits.append(commit)
+
+    return new_commits, applied
+
+
 def show_commit_diff(res, no_colour=False):
     """Show the difference between original and cherry-picked commit patches
 
@@ -522,7 +561,7 @@  def show_commit_diff(res, no_colour=False):
 
 def show_check_summary(bad, verbose, threshold, show_diff, no_colour):
     """Show summary of check results
-    
+
     Args:
         bad (list): List of CheckResult objects with problems
         verbose (bool): Whether to show verbose output
@@ -1216,7 +1255,23 @@  def execute_apply(dbs, source, commits, branch_name, args):  # pylint: disable=t
         tuple: (ret, success, conv_log) where ret is 0 on success,
             1 on failure
     """
-    # Add commits to database with 'pending' status
+    # Check for already applied commits before proceeding
+    _, applied = check_already_applied(commits)
+
+    # Build mapping of applied commits by hash
+    applied_map = {}
+    if applied:
+        for c in applied:
+            # Get the hash of the applied commit in target branch
+            escaped_subject = c.subject.replace('"', '\\"')
+            result = run_git(['log', '--oneline', 'ci/master',
+                             f'--grep={escaped_subject}', '-1'])
+            if result.strip():
+                applied_hash = result.split()[0]
+                applied_map[c.hash] = applied_hash
+        tout.info(f'Found {len(applied)} potentially already applied commit(s)')
+
+    # Add all commits to database with 'pending' status (agent updates later)
     source_id = dbs.source_get_id(source)
     for commit in commits:
         dbs.commit_add(commit.hash, source_id, commit.subject, commit.author,
@@ -1224,9 +1279,10 @@  def execute_apply(dbs, source, commits, branch_name, args):  # pylint: disable=t
     dbs.commit()
 
     # Convert CommitInfo to AgentCommit format expected by agent
-    agent_commits = [AgentCommit(c.hash, c.chash, c.subject) for c in commits]
+    agent_commits = [AgentCommit(c.hash, c.chash, c.subject,
+                                 applied_map.get(c.hash)) for c in commits]
     success, conv_log = agent.cherry_pick_commits(agent_commits, source,
-                                                          branch_name)
+                                                  branch_name)
 
     # Check for signal file from agent
     signal_status, signal_commit = agent.read_signal_file()
@@ -1273,6 +1329,11 @@  def execute_apply(dbs, source, commits, branch_name, args):  # pylint: disable=t
             tout.info(f"Use 'pickman commit-source {source} "
                       f"{commits[-1].chash}' to update the database")
 
+    # Update database with the last processed commit if successful
+    if success:
+        dbs.source_set(source, commits[-1].hash)
+        dbs.commit()
+
     return ret, success, conv_log
 
 
diff --git a/tools/pickman/ftest.py b/tools/pickman/ftest.py
index 5d88e65c7cb..d6f9e537a04 100644
--- a/tools/pickman/ftest.py
+++ b/tools/pickman/ftest.py
@@ -3532,5 +3532,101 @@  This is a normal commit without cherry-pick info.
         self.assertEqual(kwargs.get('raise_on_error'), False)
 
 
+class TestCheckAlreadyApplied(unittest.TestCase):
+    """Tests for the check_already_applied function."""
+
+    def setUp(self):
+        """Set up test data."""
+        self.commits = [
+            control.CommitInfo('abc123def456', 'abc123d', 'Add new feature',
+                               'Author <email>'),
+            control.CommitInfo('def456abc123', 'def456a', 'Fix bug',
+                               'Author <email>')
+        ]
+        self.quoted_commit = [
+            control.CommitInfo('abc123def456', 'abc123d',
+                               'Add "quoted" feature', 'Author <email>')
+        ]
+        self.single_commit = [
+            control.CommitInfo('abc123def456', 'abc123d', 'Add new feature',
+                               'Author <email>')
+        ]
+
+    @mock.patch('pickman.control.run_git')
+    @mock.patch('pickman.control.tout')
+    def test_check_already_applied_none_applied(self, mock_tout, mock_run_git):
+        """Test check_already_applied when no commits are already applied."""
+        # Mock git log returning empty (no matches)
+        mock_run_git.return_value = ''
+
+        new_commits, applied = control.check_already_applied(self.commits)
+
+        self.assertEqual(len(new_commits), 2)
+        self.assertEqual(len(applied), 0)
+        self.assertEqual(new_commits, self.commits)
+        mock_tout.info.assert_not_called()
+
+    @mock.patch('pickman.control.run_git')
+    @mock.patch('pickman.control.tout')
+    def test_check_already_applied_some_applied(self, mock_tout, mock_run_git):
+        """Test check_already_applied when some commits are already applied."""
+        # First commit returns a match, second doesn't
+        mock_run_git.side_effect = ['xyz789 Add new feature', '']
+
+        new_commits, applied = control.check_already_applied(self.commits)
+
+        self.assertEqual(len(new_commits), 1)
+        self.assertEqual(len(applied), 1)
+        self.assertEqual(new_commits[0].hash, 'def456abc123')
+        self.assertEqual(applied[0].hash, 'abc123def456')
+        mock_tout.info.assert_called_once()
+
+    @mock.patch('pickman.control.run_git')
+    @mock.patch('pickman.control.tout')
+    def test_check_already_applied_all_applied(self, mock_tout, mock_run_git):
+        """Test check_already_applied when all commits are already applied."""
+        # Both commits return matches
+        mock_run_git.side_effect = ['xyz789 Add new feature', 'uvw123 Fix bug']
+
+        new_commits, applied = control.check_already_applied(self.commits)
+
+        self.assertEqual(len(new_commits), 0)
+        self.assertEqual(len(applied), 2)
+        self.assertEqual(applied, self.commits)
+        self.assertEqual(mock_tout.info.call_count, 2)
+
+    @mock.patch('pickman.control.run_git')
+    @mock.patch('pickman.control.tout')
+    def test_check_already_applied_with_quotes_in_subject(
+            self, unused_mock_tout, mock_run_git):
+        """Test check_already_applied handles quotes in commit subjects."""
+        mock_run_git.return_value = ''
+
+        new_commits, applied = control.check_already_applied(self.quoted_commit)
+
+        # Verify git was called with escaped quotes
+        mock_run_git.assert_called_once_with([
+            'log', '--oneline', 'ci/master',
+            '--grep=Add \\"quoted\\" feature', '-1'
+        ])
+        self.assertEqual(len(new_commits), 1)
+        self.assertEqual(len(applied), 0)
+
+    @mock.patch('pickman.control.run_git')
+    @mock.patch('pickman.control.tout')
+    def test_check_already_applied_git_error(self, unused_mock_tout,
+                                             mock_run_git):
+        """Test check_already_applied handles git errors gracefully."""
+        # Mock git command raising an exception
+        mock_run_git.side_effect = Exception('Git error')
+
+        new_commits, applied = control.check_already_applied(self.single_commit)
+
+        # Should treat as not applied when git fails
+        self.assertEqual(len(new_commits), 1)
+        self.assertEqual(len(applied), 0)
+        self.assertEqual(new_commits, self.single_commit)
+
+
 if __name__ == '__main__':
     unittest.main()