The lines are as follows
How to find a set of unique strings and break it into non-intersecting groups by the following criterion: if two lines have coincidences of non-empty values in one or more columns, they belong to the same group. For example, lines
1,2,3 4,5,6 1,5,7
belong to one group. Initially I thought to make through a three of HashSet (for each column) to quickly see if the string is included in the list of unique values, then adding either to the list of already grouped rows or to the list of unique rows. But the algorithm in this case has a performance bottleneck: if you want to merge groups, you must go through each group in the list. Algorithm on a large amount of data (> 1 million records), with a large number of mergers, works slowly. If the mergers are small (about thousands), it works quickly. I caught the stuck in this place and do not know how to optimize this bottleneck or whether it is necessary to use other data structures and algorithms. Can someone tell me which direction to dig. I will be grateful for any thoughts on this matter.