diff --git a/README.md b/README.md
index 218fcf7e6aff28123ba9c9ac10dfe8cbc8654dde..aa3d9292c611e35a7dc3f25a24d2d10bd4b69107 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,10 @@
-# OrthoFinder — Accurate inference of orthologous gene groups made easy!
+# OrthoFinder — Accurate inference of orthologous gene groups, orthologues, gene trees and rooted species tree made easy!
 What does OrthoFinder do?
 ==========
 OrthoFinder is a program for finding orthogroups from one or more species. An orthogroup is the set of genes that are descended from a single gene in the last common ancestor of the species being clustered. OrthoFinder accounts for gene length biases that are inherent in BLAST scores, normalises for differences in species divergence times, and accounts for orthogroup specific differences in gene evolultion rates. For more details see the OrthoFinder paper below.
 
+NEW!!! OrthoFinder now also automatically infers the gene tree for each orthogroup, the rooted species tree, all orthologues between all species and calculates summary statistics. This is all performed automatically with the same simple command and the same input as before!
+
 **Emms, D.M. and Kelly, S. (2015) OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy, Genome Biology 16:157**
 
 https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0721-2
@@ -13,6 +15,9 @@ https://github.com/davidemms/OrthoFinder
 
 What's New
 ==========
+**Sep. 2016**: OrthoFinder now infers the gene tree for each orthogroup, the rooted species tree, all orthologues between all species and calculates summary statis
+    tics.
+
 **Jul. 2016**: OrthoFinder now outputs **summary statistics** for the orthogroups produced. Statistics are in the files **Statistics_Overall.csv, Statistics_PerSpecies.csv** and **OrthologousGroups_SpeciesOverlaps.csv**.
 
 **Jul. 2016**: Provided **standalone binaries** for those without access to python (download the package from OrthoFinder's GitHub **releases tab**).
@@ -25,7 +30,7 @@ What's New
 
 Usage
 =====
-OrthoFinder runs as a single command that takes as input a directory of fasta files, one per species, and outputs a file containing the orthologous groups of genes from these species. 
+OrthoFinder runs as a single command that takes as input a directory of fasta files of proteomes (amino acid sequences), one per species, and outputs a file containing the orthogroups of genes from these species, a gene tree for each orthogroups, the rooted species tree and all orthologues between all the species: 
 
 **python orthofinder.py -f fasta_directory -t number_of_processes**
 
@@ -43,13 +48,14 @@ See below for details on:
 
 ###Standalone Binaries
 
-If you do not have access to a python 2.X version you can use the standalone binaries in the bin directory instead e.g.:
+If you do not have access to a python 2.X version or haven't installed the python ete2 library you can use the standalone binaries in the bin directory instead e.g.:
 
 **bin/orthofinder -f ExampleDataset -t 16**
 
 Output File Format
 ==================
-OrthoFinder generates three output files 
+###Orthogroups
+OrthoFinder generates three output files for orthogroups: 
 
 **1) OrthologousGroups.csv** is a tab separated text file. Each row comprises a single orthogroup and contains all the genes that belong to that orthogroup. The genes are organized into separate columns where each column corresponds to a single species.
 
@@ -71,17 +77,28 @@ Most of the terms in the files **Statistics_Overall.csv** and **Statistics_PerSp
 - Single-copy orthogroup: An orthogroup with exactly one gene (and no more) from each species. These orthogroups are ideal for inferring a species tree. Note that trees for all orthogroups can be generated using the trees_for_orthogroups.py script.
 - Unassigned gene: A gene that has not been put into an orthogroup with any other genes.
 
+###Orthologues, Gene Trees & Rooted Species Tree
+The orthologues, gene trees and rooted species tree are in a sub-directory called Orthologues_\<date\>
+
 Installing Dependencies
 =======================
 OrthoFinder is written to run on linux and requires the following to be installed and in the system path:
 
-1. Python 2.7 (version 3 isn't currently supported) together with the scipy libraries stack 
+1. Python 2.7 together with the scipy libraries stack (If you don't have python 2.7 you can still use the standalone binaries) 
 
 2. BLAST+ 
 
 3. The MCL graph clustering algorithm 
 
-To use the trees_for_orthogroups.py utility there are two additional dependencies which should be installed and in the system path:
+After the orthogroups have been calculated OrthoFinder will infer gene trees, the rooted species tree and orthologues if the the following are available in the system path:
+
+1. FastME
+
+2. DLCpar-search
+
+3. The python ete2 library (If you don't have this you can still use the standalone binaries instead)
+
+To use the trees_for_orthogroups.py utility, which infers trees using multiple sequence alignments, there are two additional dependencies which should be installed and in the system path:
 
 1. MAFFT
 
@@ -104,6 +121,13 @@ MCL
 ---
 mcl is available in the repositories for some linux distributions and so can be installed in the same way as any other package. E.g. on Ubuntu "sudo apt-get install mcl". Alternatively it can be built from source which will likely require the build-essential or equivalent package on the Linux distribution being used. Instructions are provided on the MCL webpage.  
 
+FastME
+------
+This is a single executable file that can be obtained from: http://www.atgc-montpellier.fr/fastme/binaries.php 
+
+DLCpar
+------
+DLCpar can be obtained from: http://compbio.mit.edu/dlcpar/
 
 Setting up and running OrthoFinder
 ==================================
diff --git a/orthofinder.py b/orthofinder.py
index 432898cae0d83ecb113ab0acc0c667031fd4072b..5fd08b46088c886b6088b1021637efac759d1e55 100755
--- a/orthofinder.py
+++ b/orthofinder.py
@@ -25,6 +25,9 @@
 # For any enquiries send an email to David Emms
 # david_emms@hotmail.com
 
+nThreadsDefault = 16
+nAlgDefault = 1
+
 import sys                                      # Y
 import subprocess                               # Y
 import os                                       # Y
@@ -48,7 +51,7 @@ import Queue                                    # Y
 import warnings                                 # Y
 import time                                     # Y
 
-version = "0.7.1"
+version = "1.0.0"
 fastaExtensions = {"fa", "faa", "fasta", "fas"}
 picProtocol = 1
 if sys.platform.startswith("linux"):
@@ -190,6 +193,8 @@ class FullAccession(IDExtractor):
             for line in idsFile:
                 if line.startswith("#"): continue
                 id, accession = line.rstrip().split(": ", 1)
+                # Replace problematic characters
+                accession = accession.replace(":", "_").replace(",", "_").replace("(", "_").replace(")", "_")
                 if id in self.idToNameDict:
                     raise RuntimeError("ERROR: A duplicate id was found in the fasta files: % s" % id)
                 self.idToNameDict[id] = accession                
@@ -476,10 +481,9 @@ class MCL:
                     for iSpecies in xrange(nSpecies):
                         row.append(", ".join(sorted(ogDict[iSpecies])))
                 thisOutputWriter.writerow(row)
-        print("Output directory:")
-        print("   " + os.path.split(outputFilename)[0] + "/")
-        print("\nOrthogroups:")
-        print("   " + "   ".join([os.path.split(fn)[1] for fn in [outputFilename, singleGeneFilename]]) + "\n   (and " + os.path.split(outputFilename[:-3] + "txt")[1] + " in OrthoMCL format)")
+        resultsFilesString = "Orthologous groups have been written to tab-delimited files:\n   %s\n   %s\n" % (outputFilename, singleGeneFilename)
+        resultsFilesString += "And in OrthoMCL format:\n   %s" % (outputFilename[:-3] + "txt")
+        return resultsFilesString
 
 """
 scnorm
@@ -707,7 +711,7 @@ class BlastFileProcessor(object):
                         sys.stderr.write("but found a query/hit in the Blast%d_%d.txt for sequence %d_%d (i.e. %s sequence in species %d).\n" %  (iSpecies, jSpecies, kSpecies, sequencekID, ord(sequencekID+1), kSpecies))
                         sys.exit()
         except Exception:
-            sys.stderr.write("Malformatted line in %sBlast%d_%d.txt\nOffending line was:\n" % (fileInfo.inputDir, iSpecies, jSpecies))
+            sys.stderr.write("Malformatted line in %sBlast%d_%d.txt\nOffending line was:\n" % (fileInfo.inputDir, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies]))
             sys.stderr.write("\t".join(row) + "\n")
             sys.exit()
         return B       
@@ -1058,21 +1062,19 @@ def Stats(ogs, speciesNamesDict, iSpecies, resultsDir, iResultsVersion):
         # Sizes
         Stats_SizeTable(writer_sum, writer_sp, properOGs, allGenesCounter, iSpecies, speciesPresence)
         Stats_SpeciesOverlaps(filename_overlap, speciesNamesDict, iSpecies, speciesPresence)
-        
-    print("\nOrthogroup statistics:")
-    print("   " + "   ".join([os.path.split(fn)[1] for fn in [filename_sp, filename_sum, filename_overlap]]))
+
+    statsFiles = "Orthogroup statistics:\n"
+    statsFiles += "   " + "   ".join([os.path.split(fn)[1] for fn in [filename_sp, filename_sum, filename_overlap]]) + "\n"
     summaryText = """OrthoFinder assigned %d genes (%0.1f%% of total) to %d orthogroups. Fifty percent of all genes were in orthogroups 
 with %d or more genes (G50 was %d) and were contained in the largest %d orthogroups (O50 was %d). There were %d 
 orthogroups with all species present and %d of these consisted entirely of single-copy genes.""" % (nAssigned, pAssigned, nOgs, G50, G50, O50, O50, nCompleteOGs, nSingleCopy)
-    return summaryText
+    return summaryText, statsFiles
           
 
 """
 OrthoFinder
 -------------------------------------------------------------------------------
 """   
-nBlastDefault = 16
-nAlgDefault = 1
 mclInflation = 1.5
 
 def CanRunCommand(command, qAllowStderr = False):
@@ -1146,7 +1148,7 @@ def PrintHelp():
     print("""-t number_of_blast_threads, --threads number_of_blast_threads
     The number of BLAST processes to be run simultaneously. This should be increased by the user to at least 
     the number of cores on the computer so as to minimise the time taken to perform the BLAST all-versus-all 
-    queries. [Default is %d]\n""" % nBlastDefault)
+    queries. [Default is %d]\n""" % nThreadsDefault)
     
     print("""-a number_of_orthofinder_threads, --algthreads number_of_orthofinder_threads
     The number of threads to use for the OrthoFinder algorithm and MCL after BLAST searches have been completed. 
@@ -1198,6 +1200,7 @@ def AssignIDsToSequences(fastaDirectory, outputDirectory):
     originalFastaFilenames = sorted([f for f in os.listdir(fastaDirectory) if os.path.isfile(os.path.join(fastaDirectory,f))])
     originalFastaFilenames = [f for f in originalFastaFilenames if len(f.rsplit(".", 1)) == 2 and f.rsplit(".", 1)[1].lower() in fastaExtensions]
     returnFilenames = []
+    previousSpeciesIDs = range(iSpecies)
     newSpeciesIDs = []
     with open(idsFilename, 'ab') as idsFile, open(speciesFilename, 'ab') as speciesFile:
         for fastaFilename in originalFastaFilenames:
@@ -1222,9 +1225,12 @@ def AssignIDsToSequences(fastaDirectory, outputDirectory):
             iSeq = 0
             outputFasta.close()
     if len(originalFastaFilenames) > 0: outputFasta.close()
-    return returnFilenames, originalFastaFilenames, idsFilename, speciesFilename, newSpeciesIDs
+    return returnFilenames, originalFastaFilenames, idsFilename, speciesFilename, newSpeciesIDs, previousSpeciesIDs
 
 if __name__ == "__main__":
+    sys.path.append(os.path.split(os.path.abspath(__file__))[0] + "/scripts")
+    import get_orthologues
+    
     print("\nOrthoFinder version %s Copyright (C) 2014 David Emms\n" % version)
     print("""    This program comes with ABSOLUTELY NO WARRANTY.
     This is free software, and you are welcome to redistribute it under certain conditions.
@@ -1234,12 +1240,13 @@ if __name__ == "__main__":
         sys.exit()
              
     # Control
-    nBlast = nBlastDefault
+    nBlast = nThreadsDefault
     nProcessAlg = nAlgDefault
     qUsePrecalculatedBlast = False  # remove, just store BLAST to do
     qUseFastaFiles = False  # local to argument checking
     qXML = False
     qOnlyPrepare = False
+    qOrthologues = True
 #    qUseSubset = False # forget once arguments parsed and have decided what the BLAST files are (Effectively part of BLASTFileProcessor)
                      
     # Files         
@@ -1314,6 +1321,7 @@ if __name__ == "__main__":
 #            workingDir_previous = GetDirectoryArgument(arg, args)
         elif arg == "-p" or arg == "--prepare":
             qOnlyPrepare = True
+            qOrthologues = False
         elif arg == "-h" or arg == "--help":
             PrintHelp()
             sys.exit()
@@ -1383,6 +1391,10 @@ if __name__ == "__main__":
         if resultsDir == None: resultsDir = util.CreateNewWorkingDirectory(fastaDir + "Results_")
         workingDir = resultsDir + "WorkingDirectory" + os.sep
         os.mkdir(workingDir)
+    if qUsePrecalculatedBlast:
+        print("%d thread(s) for BLAST searches" % nBlast)
+    if not qOnlyPrepare:
+        print("%d thread(s) for OrthoFinder algorithm" % nProcessAlg)
      
     # check for BLAST+ and MCL - else instruct how to install and add to path
     print("\n1. Checking required programs are installed")
@@ -1398,7 +1410,7 @@ if __name__ == "__main__":
     if not qUseFastaFiles:
         print("Skipping")
     else:
-        newFastaFiles, userFastaFilenames, idsFilename, speciesIdsFilename, newSpeciesIDs = AssignIDsToSequences(fastaDir, workingDir)
+        newFastaFiles, userFastaFilenames, idsFilename, speciesIdsFilename, newSpeciesIDs, previousSpeciesIDs = AssignIDsToSequences(fastaDir, workingDir)
         speciesToUse = speciesToUse + newSpeciesIDs
         print("Done!")
     seqsInfo = GetSeqsInfo(workingDir_previous if qUsePrecalculatedBlast else workingDir, speciesToUse)
@@ -1491,8 +1503,6 @@ if __name__ == "__main__":
             while proc.is_alive():
                 proc.join()
         
-        print("Done!")  
-        
         # remove BLAST databases
         for f in glob.glob(workingDir + "BlastDBSpecies*"):
             os.remove(f)
@@ -1541,20 +1551,28 @@ if __name__ == "__main__":
     
     print("\n6. Creating files for Orthologous Groups")
     print(  "----------------------------------------")
+    if not qOrthologues: PrintCitation()
     ogs = MCL.GetPredictedOGs(clustersFilename_pairs)
     resultsBaseFilename = util.GetUnusedFilename(resultsDir + "OrthologousGroups", ".csv")[:-4]         # remove .csv from base filename
     resultsBaseFilename = resultsDir + "OrthologousGroups" + ("" if iResultsVersion == 0 else "_%d" % iResultsVersion)
     idsDict = MCL.WriteOrthogroupFiles(ogs, [idsFilename], resultsBaseFilename, clustersFilename_pairs)
     speciesNamesDict = SpeciesNameDict(speciesIdsFilename)
-    MCL.CreateOrthogroupTable(ogs, idsDict, speciesNamesDict, speciesToUse, resultsBaseFilename)
-    summaryText = Stats(ogs, speciesNamesDict, speciesToUse, resultsDir, iResultsVersion)
-    print("\n7. Summary")
-    print(  "----------")
-    print(summaryText)
+    orthogroupsResultsFilesString = MCL.CreateOrthogroupTable(ogs, idsDict, speciesNamesDict, speciesToUse, resultsBaseFilename)
+    print(orthogroupsResultsFilesString)
+    summaryText, statsFile = Stats(ogs, speciesNamesDict, speciesToUse, resultsDir, iResultsVersion)
     if qXML:
         numbersOfSequences = list(np.diff(seqsInfo.seqStartingIndices))
         numbersOfSequences.append(seqsInfo.nSeqs - seqsInfo.seqStartingIndices[-1])
         orthoxmlFilename = resultsBaseFilename + ".orthoxml"
         MCL.WriteOrthoXML(speciesInfo, ogs, numbersOfSequences, idsDict, orthoxmlFilename, speciesToUse)
-    PrintCitation()
+    
+    if qOrthologues:
+        print("\nRunning Orthologue Prediction")
+        print(  "=============================")
+        orthologuesResultsFilesString = get_orthologues.GetOrthologues(workingDir, resultsDir, clustersFilename_pairs, nBlast)
+        print(orthogroupsResultsFilesString)
+        print(orthologuesResultsFilesString.rstrip())
+    print(statsFile)
     print("")
+    print(summaryText)
+    PrintCitation()
diff --git a/scripts/get_orthologues.py b/scripts/get_orthologues.py
new file mode 100755
index 0000000000000000000000000000000000000000..f22758c59f832ac2bd83bb8f644856f6fb04da06
--- /dev/null
+++ b/scripts/get_orthologues.py
@@ -0,0 +1,609 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+"""
+Created on Thu Jun  9 16:00:33 2016
+
+@author: david
+"""
+
+import sys
+import os
+import ete2
+import subprocess
+import numpy as np
+from collections import Counter, defaultdict
+import cPickle as pic
+import itertools
+import multiprocessing as mp
+import Queue
+
+sys.path.append(os.path.split(os.path.abspath(__file__))[0] + "/..")
+import trees_for_orthogroups as tfo
+import orthofinder
+import root_from_duplications as rfd
+import process_trees as pt
+
+#print(orthofinder.__file__)
+
+nThreads = orthofinder.nThreadsDefault
+
+class Seq(object):
+    def __init__(self, seqInput):
+        """ Constructor takes sequence in any format and returns generators the 
+        Seq object accordingly. If performance is really important then can write 
+        individual an @classmethod to do that without the checks"""
+        if type(seqInput) is str:
+            self.iSp, self.iSeq = map(int, seqInput.split("_"))
+        elif len(seqInput) == 2:
+            if seqInput[0] is str:
+                self.iSp, self.iSeq = map(int, seqInput)
+            else:
+                self.iSp= seqInput[0]
+                self.iSeq = seqInput[1]
+        else:
+            raise NotImplemented
+    
+    def __eq__(self, other):
+        return (isinstance(other, self.__class__)
+            and self.__dict__ == other.__dict__)
+
+    def __ne__(self, other):
+        return not self.__eq__(other)         
+        
+    def __repr__(self):
+        return self.ToString()
+#  
+#    def __hash__(self):
+#        return 100000 * self.iSp + self.iSeq
+    
+    def ToString(self):
+        return "%d_%d" % (self.iSp, self.iSeq)
+
+# ==============================================================================================================================
+        
+class OrthoGroupsSet(object):
+    def __init__(self, orthofinderWorkingDir, clustersFilename_pairs, idExtractor = orthofinder.FirstWordExtractor):
+        self.workingDirOF = orthofinderWorkingDir
+        self.seqIDsFN = orthofinderWorkingDir + "SequenceIDs.txt"
+        self.speciesIDsFN = orthofinderWorkingDir + "SpeciesIDs.txt"
+        self.speciesIDsEx = orthofinder.FullAccession(self.speciesIDsFN)
+        self._Spec_SeqIDs = None
+        self._extractor = idExtractor
+        self.clustersFN = clustersFilename_pairs
+        self.seqIDsEx = None
+        self.ogs = self.OGs()
+        self.speciesToUse = orthofinder.GetSpeciesToUse(self.speciesIDsFN)
+        self.seqsInfo = orthofinder.GetSeqsInfo(orthofinderWorkingDir, self.speciesToUse)
+        self.fileInfo = orthofinder.FileInfo(inputDir=orthofinderWorkingDir, outputDir = orthofinderWorkingDir, graphFilename="")
+#        nSeqs, nSpecies, speciesStartingIndices = orthofinder.BlastFileProcessor.GetNumberOfSequencesInFileFromDir(workingDir, self.speciesToUse)
+#        self.bfp = orthofinder.BlastFileProcessor(workingDir, self.speciesToUse, nSeqs, nSpecies, speciesStartingIndices)
+
+    def SequenceDict(self):
+        if self.seqIDsEx == None:
+            self.seqIDsEx = self._extractor(self.seqIDsFN)
+        return self.seqIDsEx.GetIDToNameDict()
+        
+    def SpeciesDict(self):
+        d = self.speciesIDsEx.GetIDToNameDict()
+        return {k:v.rsplit(".",1)[0] for k,v in d.items()}
+        
+    def Spec_SeqDict(self, qSplitBar=True):
+        if self._Spec_SeqIDs != None:
+            return self._Spec_SeqIDs
+#        try:
+        seqs = self.SequenceDict()
+        specs = self.SpeciesDict()
+        self._Spec_SeqIDs = dict()
+        for seq in seqs:
+            iSpecies = seq.split("_")[0]
+            if iSpecies not in specs: continue
+            if qSplitBar:
+                self._Spec_SeqIDs[seq] = specs[iSpecies].replace(". ", "_").replace(" ", "_") + "_" + seqs[seq].split("|")[0]
+            else:
+                self._Spec_SeqIDs[seq] = specs[iSpecies].replace(". ", "_").replace(" ", "_") + "_" + seqs[seq]
+        return self._Spec_SeqIDs
+    
+    def OGs(self):
+        ogs = orthofinder.MCL.GetPredictedOGs(self.clustersFN)     
+        ogs = [[Seq(g) for g in og] for og in ogs if len(og) >= 4]   
+        return ogs
+        
+#    def BitScores(self, iSp, jSp):
+##        return self.bfp.GetBLAST6Scores(self.speciesToUse.index(iSp), self.speciesToUse.index(jSp), False)
+#        return orthofinder.BlastFileProcessor.GetBLAST6Scores(self.seqsInfo, self.fileInfo, iSp, jSp, False)
+##        with open(self.workingDir + "B%d_%d.pic" % (iSp, jSp), 'rb') as infile:
+##            return pic.load(infile)
+
+# ==============================================================================================================================
+
+def lil_min(M):
+    n = M.shape[0]
+    mins = np.ones((n, 1), dtype = np.float64) * 9e99
+    for kRow in xrange(n):
+        values=M.getrowview(kRow)
+        if values.nnz == 0:
+            continue
+        mins[kRow] = min(values.data[0])
+    return mins 
+
+def lil_max(M):
+    n = M.shape[0]
+    maxes = np.zeros((n, 1), dtype = np.float64)
+    for kRow in xrange(n):
+        values=M.getrowview(kRow)
+        if values.nnz == 0:
+            continue
+        maxes[kRow] = max(values.data[0])
+    return maxes
+     
+# ==============================================================================================================================      
+# ASTRAL
+
+def GetOGsToUse(ogSet):
+    return range(100, min(10000, len(ogSet.ogs)))
+
+def CreateTaxaMapFile(ogSet, i_ogs_to_use, outputFN):
+    """Get max number of sequences per species"""
+    sp_max = defaultdict(int)
+    for iog in i_ogs_to_use:
+        thisCount = defaultdict(int)
+        for seq in ogSet.ogs[iog]:
+            thisCount[seq.iSp] += 1
+        for iSp in thisCount:
+            sp_max[iSp] = max(sp_max[iSp], thisCount[iSp])
+    with open(outputFN, 'wb') as outfile:
+        for iSp, m in sp_max.items():
+            outfile.write(("%d:" % iSp) + ",".join(["%d_%d" % (iSp, j) for j in xrange(m)]) + "\n")
+        
+def ConvertTree(treeString):
+    """for trees with sequence names iSp_jSeq replaces the jSeq with 0, 1,..."""
+    tree = ete2.Tree(treeString)
+    sp_counts = defaultdict(int)
+    for seq in tree:
+        iSp, jSeq = seq.name.split("_")
+        kSeq = sp_counts[iSp]
+        sp_counts[iSp] += 1
+        seq.name = "%s_%d" % (iSp, kSeq)
+    return (tree.write() + "\n")
+
+def ConcatenateTrees(i_ogs_to_use, treesPat, outputFN):
+    with open(outputFN, 'wb') as outfile:
+        for iog in i_ogs_to_use:
+            with open(treesPat % iog, 'rb') as infile:
+                for line in infile: 
+                  if ";" in line: outfile.write(ConvertTree(line))
+    
+def RunAstral(ogSet, treesPat, workingDir):
+    dir_astral = workingDir + "ASTRAL/"
+    os.mkdir(dir_astral)
+    i_ogs_to_use = GetOGsToUse(ogSet)
+    tmFN = dir_astral + "Taxa_map.txt"
+    CreateTaxaMapFile(ogSet, i_ogs_to_use, tmFN)
+    treesFN = dir_astral + "TreesFile.txt"
+    ConcatenateTrees(i_ogs_to_use, treesPat, treesFN)
+    speciesTreeFN = workingDir + "SpeciesTree_astral.txt"
+    subprocess.call(" ".join(["java", "-Xmx6000M", "-jar", "~/software/ASTRAL-multiind/Astral/astral.4.8.0.jar", "-a", tmFN, "-i", treesFN, "-o", speciesTreeFN]), shell=True)
+    return speciesTreeFN
+
+# ==============================================================================================================================      
+# DendroBlast   
+
+def Worker_BlastScores(cmd_queue, seqsInfo, fileInfo, nProcesses, nToDo):
+    while True:
+        try:
+            i, args = cmd_queue.get(True, 1)
+            nDone = i - nProcesses + 1
+            if nDone >= 0 and divmod(nDone, 10 if nToDo <= 200 else 100 if nToDo <= 2000 else 1000)[1] == 0:
+                orthofinder.util.PrintTime("Done %d of %d" % (nDone, nToDo))
+            B = orthofinder.BlastFileProcessor.GetBLAST6Scores(seqsInfo, fileInfo, *args, qExcludeSelfHits = False)
+            with open(fileInfo.outputDir + "Bit%d_%d.pic" % args, 'wb') as outfile:
+                pic.dump(B, outfile, protocol = orthofinder.picProtocol)
+        except Queue.Empty:
+            return 
+                
+class DendroBLASTTrees(object):
+    def __init__(self, ogSet, outD, nProcesses):
+        self.outD = outD
+        self.ogSet = ogSet
+        self.nProcesses = nProcesses
+        self.species = sorted(map(int, self.ogSet.SpeciesDict().keys()))
+        treesDir = outD + "Trees/"
+        self.workingDir = outD + "WorkingDirectory/"
+        treesIDsDir = self.workingDir + "Trees_ids/"
+        distancesDir = self.workingDir + "Distances/"
+        dirs = [self.workingDir, treesDir, distancesDir, treesIDsDir]
+        for d in dirs:
+            if not os.path.exists(d):
+                os.mkdir(d)
+        self.treesPatIDs = treesIDsDir + "OG%07d_tree_id.txt"
+        self.treesPat = treesDir + "OG%07d_tree.txt"
+        self.distPat = distancesDir + "OG%07d.phy"
+        # Check files exist
+        
+    def ReadAndPickle(self): 
+        cmd_queue = mp.Queue()
+        i = 0
+        for iSp in xrange(len(self.ogSet.seqsInfo.speciesToUse)):
+            for jSp in xrange(len(self.ogSet.seqsInfo.speciesToUse)):
+                cmd_queue.put((i, (iSp, jSp)))           
+                i+=1
+        runningProcesses = [mp.Process(target=Worker_BlastScores, args=(cmd_queue, self.ogSet.seqsInfo, self.ogSet.fileInfo, nThreads, i)) for i_ in xrange(nThreads)]
+        for proc in runningProcesses:
+            proc.start()
+        for proc in runningProcesses:
+            while proc.is_alive():
+                proc.join() 
+                
+    def NumberOfSequences(self, species):
+        ids = self.ogSet.SequenceDict()
+        counts = Counter([g.split("_")[0] for g in ids.keys()])
+        counts = {int(k):v for k,v in counts.items() if int(k) in species}
+        return counts
+           
+    def GetOGMatrices(self):
+        """
+        ogMatrices contains matrix M for each OG where:
+            Mij = 0.5*max(Bij, Bmin_i)/Bmax_i
+        """
+        ogs = self.ogSet.OGs()
+        ogsPerSpecies = [[[(g, i) for i, g in enumerate(og) if g.iSp == iSp] for iSp in self.species] for og in ogs]
+        nGenes = [len(og) for og in ogs]
+        nSeqs = self.NumberOfSequences(self.species)
+        ogMatrices = [np.zeros((n, n)) for n in nGenes]
+        for iiSp, sp1 in enumerate(self.species):
+            orthofinder.util.PrintTime("Processing species %d" % sp1)
+            Bs = [orthofinder.LoadMatrix("Bit", self.ogSet.fileInfo, iiSp, jjSp) for jjSp in xrange(len(self.species))]
+            mins = np.ones((nSeqs[sp1], 1), dtype=np.float64)*9e99 
+            maxes = np.zeros((nSeqs[sp1], 1), dtype=np.float64)
+            for B, sp2 in zip(Bs, self.species):
+                mins = np.minimum(mins, lil_min(B))
+                maxes = np.maximum(maxes, lil_max(B))
+            for jjSp, B  in enumerate(Bs):
+                for og, m in zip(ogsPerSpecies, ogMatrices):
+                    for gi, i in og[iiSp]:
+                        for gj, j in og[jjSp]:
+                                m[i, j] = 0.5*max(B[gi.iSeq, gj.iSeq], mins[gi.iSeq]) /  maxes[gi.iSeq]
+        return ogs, ogMatrices
+    
+    def WriteOGMatrices(self, ogs, ogMatrices):
+        newMatrices = []
+        for iog, (og, m) in enumerate(zip(ogs, ogMatrices)):
+            # dendroblast scores
+            n = m.shape[0]
+            m2 = np.zeros(m.shape)
+            for i in xrange(n):
+                for j in xrange(i):
+                    m2[i, j] = -np.log(m[i,j] + m[j, i])
+                    m2[j, i] = m2[i, j]
+            self.WritePhylipMatrix(m2, [g.ToString() for g in og], self.distPat % iog)
+            newMatrices.append(m2)
+        return newMatrices
+    
+    def WritePhylipMatrix(self, m, names, outFN):
+        with open(outFN, 'wb') as outfile:
+            n = m.shape[0]
+            outfile.write("%d\n" % n)
+            for i in xrange(n):
+                outfile.write(names[i] + " ")
+                values = " ".join(["%.6g" % (0. + m[i,j]) for j in range(n)])   # hack to avoid printing out "-0"
+                outfile.write(values + "\n")
+    
+    def SpeciesTreeDistances(self, ogs, ogMatrices, method = 0):
+        spPairs = list(itertools.combinations(self.species, 2))
+        D = [[] for _ in spPairs]
+        if method == 0:
+            """ closest distance for each species pair in each orthogroup"""
+            for og, m in zip(ogs, ogMatrices):
+                spDict = defaultdict(list)
+                for i, g in enumerate(og):
+                    spDict[g.iSp].append(i)
+                for (sp1, sp2), d_list in zip(spPairs, D):
+                    distances = [m[i,j] for i in spDict[sp1] for j in spDict[sp2]]
+                    if len(distances) > 0: d_list.append(min(distances))
+#                    d_list.append(min(distances) if len(distances) > 0 else None)
+        return D, spPairs
+    
+    def PrepareSpeciesTreeCommand(self, D, spPairs):
+        n = len(self.species)
+        M = np.zeros((n, n))
+        for (sp1, sp2), d in zip(spPairs, D):
+            sp1 = self.species.index(sp1)
+            sp2 = self.species.index(sp2)
+            x = np.median(d)
+            M[sp1, sp2] = x
+            M[sp2, sp1] = x
+        speciesMatrixFN = os.path.split(self.distPat)[0] + "/SpeciesMatrix.phy"
+        with open(speciesMatrixFN, 'wb') as outfile:
+            outfile.write("%d\n" % n)
+#            speciesDict = self.ogSet.SpeciesDict()
+            for i in xrange(n):
+#                outfile.write(speciesDict[str(self.species[i])] + " ")
+                outfile.write(str(self.species[i]) + " ")
+                values = " ".join(["%.6g" % (0. + M[i,j]) for j in range(n)])   # hack to avoid printing out "-0"
+                outfile.write(values + "\n")       
+        treeFN = os.path.split(self.treesPatIDs)[0] + "/SpeciesTree_ids.txt"
+        cmd = " ".join(["fastme", "-i", speciesMatrixFN, "-o", treeFN, "-w", "O"] + (["-s"] if n < 1000 else []))
+        return cmd, treeFN
+                
+    def PrepareGeneTreeCommand(self):
+        cmds = []
+        for iog in xrange(len(self.ogSet.ogs)):
+            nTaxa = len(self.ogSet.ogs[iog])
+            cmds.append([" ".join(["fastme", "-i", self.distPat % iog, "-o", self.treesPatIDs % iog, "-w", "O"] + (["-s"] if nTaxa < 1000 else []))])
+        return cmds
+
+    def RenameTreeTaxa(self, treeFN, newTreeFilename, idsMap, qFixNegatives=False):     
+#        with open(treeFN, "rb") as inputTree: treeString = inputTree.next()
+        try:
+            tree = ete2.Tree(treeFN)
+            for node in tree.get_leaves():
+                node.name = idsMap[node.name]
+            if qFixNegatives:
+                for n in tree.traverse():
+                    if n.dist < 0.0: n.dist = 0.0
+            tree.write(outfile = newTreeFilename, format=4)  
+        except:
+            pass
+    
+    def RunAnalysis(self):
+        ogs, ogMatrices_partial = self.GetOGMatrices()
+        ogMatrices = self.WriteOGMatrices(ogs, ogMatrices_partial)
+        D, spPairs = self.SpeciesTreeDistances(ogs, ogMatrices)
+        cmd_spTree, spTreeFN_ids = self.PrepareSpeciesTreeCommand(D, spPairs)
+        cmds_geneTrees = self.PrepareGeneTreeCommand()
+        print("\n3. Inferring gene and species trees")
+        print(  "-----------------------------------")
+        tfo.RunParallelCommandSets(self.nProcesses, [[cmd_spTree]] + cmds_geneTrees, qHideStdout = True)
+        seqDict = self.ogSet.Spec_SeqDict()
+        for iog in xrange(len(self.ogSet.ogs)):
+            self.RenameTreeTaxa(self.treesPatIDs % iog, self.treesPat % iog, seqDict, qFixNegatives=True)
+        self.RenameTreeTaxa(spTreeFN_ids, self.workingDir + "SpeciesTree_unrooted.txt", self.ogSet.SpeciesDict(), qFixNegatives=True)       
+        #spTreeFN_ids = RunAstral(self.ogSet, self.treesPatIDs, self.workingDir)   
+        #self.RenameTreeTaxa(spTreeFN_ids, self.workingDir + "SpeciesTree_astral_unrooted.txt", self.ogSet.SpeciesDict(), qFixNegatives=True)     
+        return len(ogs), D, spPairs, spTreeFN_ids
+
+# ==============================================================================================================================      
+# DLCPar
+
+def GetTotalLength(tree):
+    return sum([node.dist for node in tree])
+  
+def AllEqualBranchLengths(tree):
+    lengths = [node.dist for node in tree]
+    return (len(lengths) > 1 and len(set(lengths)) == 1)
+
+def RootGeneTreesArbitrarily(treesPat, nOGs, outputDir):
+    filenames = [treesPat % i for i in xrange(nOGs)]
+    outFilenames = [outputDir + os.path.split(treesPat % i)[1] for i in xrange(nOGs)]
+    treeFilenames = [fn for fn in filenames if fn.endswith(".txt")]
+    nErrors = 0
+    with open(outputDir + 'root_errors.txt', 'wb') as errorfile:
+        for treeFN, outFN in zip(treeFilenames, outFilenames):
+            try:    
+                t = ete2.Tree(treeFN)
+                if len(t.get_children()) != 2:
+                    R = t.get_midpoint_outgroup()
+                    # if it's a tree with 3 genes all with zero length branches then root arbitrarily (it's possible this could happen with more than 3 nodes)
+                    if GetTotalLength(t) == 0.0:
+                      for leaf in t:
+                        R = leaf
+                        break
+                    elif AllEqualBranchLengths(t):
+                      # more generally, for any branch length all branches could have that same length
+                      for leaf in t:
+                        R = leaf
+                        break
+                    t.set_outgroup(R)
+                t.resolve_polytomy()
+                t.write(outfile = outFN)
+            except Exception as err:
+                try:
+                    t = ete2.Tree(treeFN)
+                    for leaf in t:
+                       R = leaf
+                       break
+                    t.set_outgroup(R)
+                    t.resolve_polytomy()
+                    t.write(outfile = outFN)
+                except:
+                    errorfile.write(treeFN + ": " + str(err) + '\n')
+                    nErrors += 1    
+    if nErrors != 0:
+      print("WARNING: Some trees could not be rooted")
+      print("Usually this is because the tree contains genes from a single species.")    
+
+def WriteGeneSpeciesMap(d, ogSet):
+    fn = d + "GeneMap.smap"
+    iSpecies = ogSet.SpeciesDict().keys()
+    with open(fn, 'wb') as outfile:
+        for iSp in iSpecies:
+            outfile.write("%s_*\t%s\n" % (iSp, iSp))
+    return fn
+
+def RunDlcpar(treesPat, ogSet, nOGs, speciesTreeFN, workingDir):
+    """
+    
+    Implementation:
+    - (skip: label species tree)
+    - sort out trees (midpoint root, resolve plytomies etc)
+    - run
+    
+    """
+    rootedTreeDir = workingDir + "Trees_ids_arbitraryRoot/"
+    if not os.path.exists(rootedTreeDir): os.mkdir(rootedTreeDir)
+    RootGeneTreesArbitrarily(treesPat, nOGs, rootedTreeDir)
+    geneMapFN = WriteGeneSpeciesMap(rootedTreeDir, ogSet)
+
+    dlcparResultsDir = workingDir + 'dlcpar/'
+    if not os.path.exists(dlcparResultsDir): os.mkdir(dlcparResultsDir)
+    filenames = [rootedTreeDir + os.path.split(treesPat % i)[1] for i in xrange(nOGs)]
+    
+    dlcCommands = ['dlcpar_search -s %s -S %s -D 1 -C 0.125 %s -O %s' % (speciesTreeFN, geneMapFN, fn, dlcparResultsDir + os.path.splitext(os.path.split(fn)[1])[0]) for fn in filenames]
+#    print(dlcCommands[0])
+    # use this to run in parallel
+    tfo.RunParallelCommandSets(nThreads, [[c] for c in dlcCommands], qHideStdout = True)
+    return dlcparResultsDir
+
+# ==============================================================================================================================      
+# Main
+
+def WriteTestDistancesFile(workingDir):
+    testFN = workingDir + "SimpleTest.phy"
+    with open(testFN, 'wb') as outfile:
+        outfile.write("4\n1_1 0 0 0.2 0.25\n0_2 0 0 0.21 0.28\n3_1 0.21 0.21 0 0\n4_1 0.25 0.28 0 0")
+    return testFN
+
+def CanRunDependencies(workingDir):
+    # FastME
+    testFN = WriteTestDistancesFile(workingDir)
+    outFN = workingDir + "SimpleTest.tre"
+    if os.path.exists(outFN): os.remove(outFN)        
+    if not orthofinder.CanRunCommand("fastme -i %s -o %s" % (testFN, outFN), qAllowStderr=False):
+        print("ERROR: Cannot run fastme")
+        print("Please check FastME is installed and that the executables are in the system path\n")
+        return False
+    os.remove(testFN)
+    os.remove(outFN)
+    # DLCPar
+    if not orthofinder.CanRunCommand("dlcpar_search --version", qAllowStderr=False):
+        print("ERROR: Cannot run dlcpar_search")
+        print("Please check DLCpar is installed and that the executables are in the system path\n")
+        return False
+    return True    
+        
+def PrintHelp():
+    print("Usage")    
+    print("-----")
+    print("get_orthologues.py orthofinder_results_directory [-t max_number_of_threads]")
+    print("get_orthologues.py -h")
+    print("\n")
+    
+    print("Arguments")
+    print("---------")
+    print("""orthofinder_results_directory
+    Generate gene trees for the orthogroups, generated rooted species tree and infer ortholgues.\n""")
+    
+    print("""-t max_number_of_threads, --threads max_number_of_threads
+    The maximum number of processes to be run simultaneously. The deafult is %d but this 
+    should be increased by the user to the maximum number of cores available.\n""" % orthofinder.nThreadsDefault)
+        
+    print("""-h, --help
+   Print this help text""")
+    orthofinder.PrintCitation()   
+
+def GetResultsFilesString(rootedSpeciesTreeFN):
+    st = ""
+    baseResultsDir = os.path.abspath(os.path.split(rootedSpeciesTreeFN[0])[0] + "./../Trees/")
+    st += "Gene trees:\n   %s\n" % baseResultsDir
+    if len(rootedSpeciesTreeFN) == 1:
+        resultsDir = os.path.split(rootedSpeciesTreeFN[0])[0]
+        st += "Rooted species tree:\n   %s\n" % rootedSpeciesTreeFN[0]
+        st += "Species-by-species orthologues:\n   %s\n" % resultsDir
+    else:
+        st += "\nMultiple potential outgroups were identified for the species tree. Each case has been analysed separately.\n" 
+        st+=  "Please review the rooted species trees and use the results corresponding to the correct one.\n\n"        
+        for tFN in rootedSpeciesTreeFN:
+            resultsDir = os.path.split(tFN)[0] + "/"
+            st += "Rooted species tree:\n   %s\n" % tFN
+            st += "Species-by-species orthologues directory:\n   %s\n\n" % resultsDir
+    return st
+            
+
+def GetOrthologues(orthofinderWorkingDir, orthofinderResultsDir, clustersFilename_pairs, nProcesses):
+    ogSet = OrthoGroupsSet(orthofinderWorkingDir, clustersFilename_pairs, idExtractor = orthofinder.FirstWordExtractor)
+    if len(ogSet.speciesToUse) < 4: 
+        print("ERROR: Not enough species to infer species tree")
+        orthofinder.Fail()
+
+    print("\n1. Checking required programs are installed")
+    print(  "-------------------------------------------")
+    if not CanRunDependencies(orthofinderWorkingDir): orthofinder.Fail()   
+    
+    print("\n2. Reading sequence similarity scores")
+    print(  "-------------------------------------")
+    resultsDir = orthofinder.util.CreateNewWorkingDirectory(orthofinderResultsDir + "Orthologues_")
+    
+    db = DendroBLASTTrees(ogSet, resultsDir, nProcesses)
+    db.ReadAndPickle()
+    nOGs, D, spPairs, spTreeFN_ids = db.RunAnalysis()
+     
+    print("\n4. Best outgroup(s) for species tree")
+    print(  "------------------------------------")   
+    spDict = ogSet.SpeciesDict()
+    roots, clusters, rootedSpeciesTreeFN, nSupport = rfd.GetRoot(spTreeFN_ids, os.path.split(db.treesPatIDs)[0] + "/", rfd.GeneToSpecies_dash, nProcesses, treeFmt = 1)
+    if len(roots) > 1:
+        print("Observed %d duplications. %d support the best roots and %d contradict them." % (len(clusters), nSupport, len(clusters) - nSupport))
+        print("Best outgroups for species tree:")  
+    else:
+        print("Observed %d duplications. %d support the best root and %d contradict it." % (len(clusters), nSupport, len(clusters) - nSupport))
+        print("Best outgroup for species tree:")  
+    for r in roots: print("  " + (", ".join([spDict[s] for s in r]))  )
+    
+    qMultiple = len(roots) > 1
+    if qMultiple: print("\nAnalysing each of the potential species tree roots.")
+    resultsSpeciesTrees = []
+    for i, (r, speciesTree_fn) in enumerate(zip(roots, rootedSpeciesTreeFN)):
+        if qMultiple: 
+            resultsDir_new = resultsDir + "Orthologues_for_potential_outgroup_%d/" % i
+        else:
+            resultsDir_new = resultsDir + "Orthologues/"
+        os.mkdir(resultsDir_new)
+        resultsSpeciesTrees.append(resultsDir_new + "SpeciesTree_rooted.txt")
+        db.RenameTreeTaxa(speciesTree_fn, resultsSpeciesTrees[-1], db.ogSet.SpeciesDict(), qFixNegatives=True)
+
+        print("\n5%s. Reconciling gene and species trees" % ("-%d"%i if qMultiple else "")) 
+        print(  "-------------------------------------" + ("--" if qMultiple else ""))   
+        print("Root: " + (", ".join([spDict[s] for s in r])))
+        dlcparResultsDir = RunDlcpar(db.treesPatIDs, ogSet, nOGs, speciesTree_fn, db.workingDir)
+
+        # Orthologue lists
+        print("\n6%s. Inferring orthologues from gene trees" % ("-%d"%i if qMultiple else ""))
+        print(  "----------------------------------------" + ("--" if qMultiple else ""))      
+        pt.get_orthologue_lists(ogSet, resultsDir_new, dlcparResultsDir, db.workingDir)     
+     
+    print("\n7. Writing results files")
+    print(  "------------------------")   
+    
+    return GetResultsFilesString(resultsSpeciesTrees)
+     
+if __name__ == "__main__":
+    if len(sys.argv) < 2 or sys.argv[1] == "--help" or sys.argv[1] == "-h":
+        PrintHelp()
+        sys.exit()
+        
+    # get arguments
+    userDir = None
+    nProcesses = None
+    
+    args = sys.argv[1:]    
+    while len(args) != 0:
+        arg = args.pop(0)
+        if arg == "-t" or arg == "--threads":
+            if len(args) == 0:
+                print("Missing option for command line argument -t")
+                orthofinder.Fail()
+            arg = args.pop(0)
+            try:
+                nProcesses = int(arg)
+            except:
+                print("Incorrect argument for number of threads: %s" % arg)
+                orthofinder.Fail()   
+        else:
+            userDir = arg
+        
+    # Check arguments
+    print("0. Getting Orthologues")
+    print("----------------------")
+    if nProcesses == None:
+        print("""\nNumber of parallel processes has not been specified, will use the default value.  
+Number of parallel processes can be specified using the -t option.""")
+        nProcesses = orthofinder.nThreadsDefault
+    print("Using %d threads for alignments and trees" % nProcesses)
+    
+    orthofinderWorkingDir, orthofinderResultsDir, clustersFilename_pairs = tfo.GetOGsFile(userDir)
+    resultsString = GetOrthologues(orthofinderWorkingDir, orthofinderResultsDir, clustersFilename_pairs, nProcesses)
+    print(resultsString)
+    orthofinder.PrintCitation()
+    
+    
+    
diff --git a/scripts/process_trees.py b/scripts/process_trees.py
new file mode 100644
index 0000000000000000000000000000000000000000..94d696b40f1e84360b09d71011861eae4c072be0
--- /dev/null
+++ b/scripts/process_trees.py
@@ -0,0 +1,158 @@
+import re
+import os
+import sys
+import csv
+import glob
+import itertools
+import numpy as np
+import cPickle as pic
+from ete2 import Tree
+from scipy import sparse
+from collections import defaultdict
+
+sys.path.append(os.path.split(__file__)[0] + "/..")
+import orthofinder 
+
+def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
+    return [int(text) if text.isdigit() else text.lower() for text in re.split(_nsre, s)]
+
+#make_dicts checks every leaf against every other leaf to find the ancestor node and checks this node against speclist and duplist to see which dictionary the gene-pair should be placed in.
+def make_dicts(dlcparResultsDir, outputDir):
+    if not os.path.exists(outputDir): os.mkdir(outputDir) 
+    trees = glob.glob(dlcparResultsDir + '/*.locus.tree')
+    recons = glob.glob(dlcparResultsDir + '/*.locus.recon')
+    trees.sort(key = natural_sort_key)
+    recons.sort(key = natural_sort_key)
+    Orthologs = defaultdict(list)
+    for tree, recon in zip(trees, recons):
+        t = Tree(tree, format = 8)
+        reconlist = []
+        with open(recon, 'r') as rec:
+            for line in rec:
+                reconlist.append(line.split())
+        speclist = [x[0] for x in reconlist if 'spec' == x[2]]
+        speclist = [t&n for n in speclist]
+        for node in speclist:
+            c1, c2 = node.get_children()
+            for leaf1 in c1.get_leaf_names():
+                for leaf2 in c2.get_leaf_names(): 
+                  Orthologs[leaf1].append(leaf2)
+                  Orthologs[leaf2].append(leaf1)     
+    picFN = outputDir + 'totalorthodict_fast.pic'
+    with open(picFN, 'wb') as F:
+        pic.dump(Orthologs, F)
+    return Orthologs, picFN
+    
+def GetSpeciesGenesInfo(ogSet):
+    speciesLabels = orthofinder.GetSpeciesToUse(ogSet.speciesIDsFN) 
+    seqsInfo = orthofinder.GetSeqsInfo(ogSet.workingDirOF, speciesLabels)
+    genenumbers = list(np.diff(seqsInfo.seqStartingIndices))
+    genenumbers.append(seqsInfo.nSeqs - seqsInfo.seqStartingIndices[-1])
+    return speciesLabels, genenumbers
+        
+def one_to_one_efficient(orthodict, genenumbers, speciesLabels, iSpecies, outputDir):
+    """ speciesLabels is an ordered list of the speciesIDs
+        try to mostly deal with iSpecies which is the ordinal number not the label it is given
+    """
+    #Creates all matrices and appends them to matrixlist.
+    orthofinder.util.PrintTime("Processing orthologues for species %d" % iSpecies)
+    matrixlist = []
+    numspecies = len(speciesLabels)
+    speciesLabelsReverse = {label:i for i, label in enumerate(speciesLabels)}
+    for j in range(numspecies):
+        if iSpecies > j:
+            matrixlist.append(sparse.lil_matrix((genenumbers[iSpecies], genenumbers[j]), dtype=np.dtype(np.int8)))
+        else:
+            matrixlist.append(None)
+    #Fill matrices with orthodata
+    iSpecieslist = [x for x in orthodict if x.startswith('%d_' % speciesLabels[iSpecies])]
+    for count, queryGene in enumerate(iSpecieslist):
+        _,iGene = map(int, queryGene.split('_'))
+        for Gene in orthodict[queryGene]:
+            jSpLabel,jGene = map(int,Gene.split('_'))
+            jSp = speciesLabelsReverse[jSpLabel]
+            if iSpecies > jSp:
+                matrixlist[jSp][iGene, jGene]  = 1
+    for j, m in enumerate(matrixlist):    
+        with open(outputDir + 'ortholog_%d_%d_matrix.pic' % (iSpecies, j), 'wb') as file:
+            pic.dump(m, file)
+    return matrixlist   
+    
+def multiply(specmin, specmax, matrixDir):
+    with open(matrixDir + 'ortholog_%d_%d_matrix.pic' % (specmax, specmin), 'rb') as F:
+        M = pic.load(F)    
+    M = M.tocsr()
+    product = M.dot(M.transpose())
+    return product, M
+
+def WriteOrthologues(resultsDir, spec1, spec2, orthologues, speciesDict, sequenceDict):
+    d1 = resultsDir + "Orthologues_" + speciesDict[str(spec1)] + "/"
+    d2 = resultsDir + "Orthologues_" + speciesDict[str(spec2)] + "/"
+    with open(d1 + '%s__v__%s.csv' % (speciesDict[str(spec1)], speciesDict[str(spec2)]), 'wb') as outfile1, open(d2 + '%s__v__%s.csv' % (speciesDict[str(spec2)], speciesDict[str(spec1)]), 'wb') as outfile2:
+        writer1 = csv.writer(outfile1)
+        writer2 = csv.writer(outfile2)
+        writer1.writerow((speciesDict[str(spec1)], speciesDict[str(spec2)]))
+        writer2.writerow((speciesDict[str(spec2)], speciesDict[str(spec1)]))
+        for genes1, genes2 in orthologues:
+            writer1.writerow((", ".join([sequenceDict["%d_%d" % (spec1, o)] for o in genes1]), ", ".join([sequenceDict["%d_%d" % (spec2, o)] for o in genes2])))
+            writer2.writerow((", ".join([sequenceDict["%d_%d" % (spec2, o)] for o in genes2]), ", ".join([sequenceDict["%d_%d" % (spec1, o)] for o in genes1])))
+
+def GetOrthologues(orig_matrix, orig_matrix_csc, index):
+    orthologues = orig_matrix.getrowview(index).nonzero()[1]
+    index = orthologues[0]
+    originalSpeciesGenes = orig_matrix_csc.getcol(index).nonzero()[0]
+    return (originalSpeciesGenes, orthologues)
+                
+#Takes in the output of multiply, finds all of the orthology relationships which it writes to textfiles and returns the number of each type of orthology relationship.
+def find_all(matrix, orig_matrix):
+    orig_matrix_csc = orig_matrix.tocsc()   # most efficient for column access
+    orig_matrix = orig_matrix.tolil()       # most efficient for rowaccess
+    orthologues = []
+    done = set()
+    for iIndex in xrange(matrix.shape[0]):
+        if matrix[iIndex, iIndex] == 0 or iIndex in done: continue
+        orthologuesSp1, orthologuesSp2 = GetOrthologues(orig_matrix, orig_matrix_csc, iIndex)
+        done.update(orthologuesSp1)
+        orthologues.append((orthologuesSp1, orthologuesSp2))
+    return orthologues
+        
+def species_find_all(speciesDict, sequenceDict, matrixDir, resultsDir):
+    # Calls multiply and find_all on each species pair, and appends the numbers from find_all's output to the relevant csv lists.
+    speciesIDs = sorted(map(int, speciesDict.keys()))
+    nspecies = len(speciesIDs)           
+    for index1 in xrange(nspecies):
+        d = resultsDir + "Orthologues_" + speciesDict[str(speciesIDs[index1])]
+        if not os.path.exists(d): os.mkdir(d)
+    for index1, index2 in itertools.product(xrange(nspecies), xrange(nspecies)):      
+        if index1 >= index2: continue
+        product, M = multiply(index1, index2, matrixDir)
+        orthologues = find_all(product, M)
+        WriteOrthologues(resultsDir, speciesIDs[index2], speciesIDs[index1], orthologues, speciesDict, sequenceDict)    
+    
+def get_orthologue_lists(ogSet, resultsDir, dlcparResultsDir, workingDir):
+    # -> Matrices
+    matrixDir = workingDir + "matrices_orthologues/"
+    orthodict, orthodictFN = make_dicts(dlcparResultsDir, matrixDir)
+    
+    # -> dictionary
+    speciesLabels, genenumbers = GetSpeciesGenesInfo(ogSet)
+    for iSpecies in xrange(len(speciesLabels)):
+        one_to_one_efficient(orthodict, genenumbers, speciesLabels, iSpecies, matrixDir)
+        
+    # -> csv files
+    species_find_all(ogSet.SpeciesDict(), ogSet.SequenceDict(), matrixDir, resultsDir)
+    
+    
+if __name__ == "__main__":
+    import get_orthologues as go
+    d = "/home/david/projects/Orthology/OrthoFinder/Development/Orthologues/ExampleDataset_Results/"
+    orthofinderWorkingDir = d + "WorkingDirectory/"
+    clustersFilename_pairs = orthofinderWorkingDir + "clusters_OrthoFinder_v0.7.0_I1.5.txt_id_pairs.txt"
+    resultsDir = d + "Orthologues_Aug18_test2/"
+    dlcparResultsDir = resultsDir + "/WorkingDirectory/dlcpar/"
+    workingDir = resultsDir + "/WorkingDirectory/"
+    
+    ogSet = go.OrthoGroupsSet(orthofinderWorkingDir, clustersFilename_pairs, idExtractor = orthofinder.FirstWordExtractor)
+    
+    get_orthologue_lists(ogSet, resultsDir, dlcparResultsDir, workingDir)
+        
\ No newline at end of file
diff --git a/scripts/root_from_duplications.py b/scripts/root_from_duplications.py
new file mode 100644
index 0000000000000000000000000000000000000000..da905e8d635e7e2676d30fc1a51dc0e6de02b146
--- /dev/null
+++ b/scripts/root_from_duplications.py
@@ -0,0 +1,497 @@
+# -*- coding: utf-8 -*-
+"""
+Created on Tue Jul 26 14:34:46 2016
+
+@author: david
+"""
+
+"""
+Performance improvements:
+- Use an unrooted tree representation so that the trees don't need to be rerooted at all
+- convert all the gene names to species names once per tree
+
+"""
+import os
+import glob
+#import cProfile as profile
+#import pstats
+import ete2
+import argparse
+import itertools
+import multiprocessing as mp
+from collections import Counter
+
+def cmp_equal(exp, act):
+    """exp - expected set of species
+    act - actual set of species
+    """
+    return exp == act
+    
+def cmp_noExtra(exp, act):
+    """exp - expected set of species
+    act - actual set of species
+    """
+    return act.issubset(exp)
+
+# criteria with which to compare sets of species
+#compare = cmp_equal
+compare = cmp_noExtra
+
+# which clades must satisfy criteria
+criteria = "crit_all"
+#criteria = "crit_one"
+
+spTreeFormat = 3
+qSerial = False
+nProcs = 64
+#nProcs = 16
+
+class Node(object):
+    """ The class allows the user to get the 'child' nodes in any of the three directions
+    rather than just for the two 'child' nodes in ete2's model
+    """
+    def __init__(self, node):
+        """
+        node - the ete2 node
+        """
+        self.node = node
+        
+    def get_child_species_clades(self, c):
+        nodes = [c] if c.is_leaf() else c.get_children() 
+        return [ch.sp_down for ch in nodes]
+        
+    def get_grandchild_species_clades(self, c):
+        return [self.get_child_species_clades(ch) for ch in c.get_children()]
+
+    def get_up_grand_species_clades(self):
+        """ 
+        Exceptions:
+        - if tree is non-binary
+        """
+        if self.node.is_root(): 
+            children = self.node.get_children()
+            if len(children) != 3: raise Exception("Expected binary tree")
+            return [self.get_grandchild_species_clades(c) for c in children]            
+        ancestors = self.node.get_ancestors()
+        c = ancestors[0]
+        # must be at least one ancestor
+        ch_temp = c.get_children()
+        if len(ch_temp) == 3: 
+            # both easy
+            c1, c2 = ch_temp[:2] if ch_temp[2] != self.node else ch_temp[1:3] if ch_temp[0] != self.node else (ch_temp[0], ch_temp[2])
+            c1clades = self.get_child_species_clades(c1) 
+            c2clades = self.get_child_species_clades(c2) 
+            return [c1clades, c2clades]
+        elif len(ch_temp) == 2:
+            # easy one
+            c1 = ch_temp[0] if ch_temp[0] != self.node else ch_temp[1]
+            c1clades = self.get_child_species_clades(c1)
+            # difficult one
+            if len(ancestors) < 2: raise Exception("Expected binary tree")
+            c2 = ancestors[1]
+            ch_temp = c2.get_children()  
+            if len(ch_temp) == 3:
+                # easy - both are children
+                c21, c22 = ch_temp[:2] if ch_temp[2] != c else ch_temp[1:3] if ch_temp[0] != c else (ch_temp[0], ch_temp[2])
+                c21clade = c21.sp_down
+                c22clade = c22.sp_down
+            else:
+                c21 = ch_temp[0] if ch_temp[0] != c else ch_temp[1]
+                c21clade = c21.sp_down
+                c22clade = c2.sp_up     # c21clade = everything not in the others, work out at the end
+            return [c1clades, [c21clade, c22clade]]
+        else:
+            raise Exception("Expected binary tree")
+      
+    def get_grandrelative_clades_stored(self, allTaxa):
+        """
+        returns the hierarchically stored sets of species 
+        """
+        if self.node.is_root():
+            children = self.node.get_children()
+            if len(children) != 3: return None
+            return [self.get_grandchild_species_clades(c) for c in children]
+        else:
+            children = self.node.get_children()
+            if len(children) != 2: return None
+            down_clades = [self.get_grandchild_species_clades(c) for c in children]
+            return down_clades + [self.get_up_grand_species_clades()]
+            
+    def GetSpeciesSets(self, allTaxa, GeneMap):
+        if self.node.is_root():
+            na, nb, nc = self.node.get_children()
+            a = na.get_leaf_names()
+            b = nb.get_leaf_names()
+            c = nc.get_leaf_names()
+        else:
+            ch = self.node.get_children()
+            if len(ch) != 2: return None
+            na, nb = ch
+            a = set(na.get_leaf_names())
+            b = set(nb.get_leaf_names())
+            # now get node with set of species, c, below it
+            c = allTaxa.difference(a).difference(b)
+        return [set(map(GeneMap, x)) for x in (a,b,c)]
+
+    @staticmethod
+    def ToSpecies(clades_of_clades, GeneMap):
+        return [[set(map(GeneMap, [gene for gene in grandchild])) for grandchild in child] for child in clades_of_clades]
+        
+    @staticmethod            
+    def FlatenSpeciesSet(clades_of_clades):
+        return set([sp for child in clades_of_clades for grandchild in child for sp in grandchild])
+      
+   
+def StoreSpeciesSets(tree, GeneMap, allTaxa):
+    for node in tree.traverse('postorder'):
+        if node.is_leaf():
+            node.add_feature('sp_down', {GeneMap(node.name)})
+        elif node.is_root():
+            continue
+        else:
+            node.add_feature('sp_down', set.union(*[ch.sp_down for ch in node.get_children()]))
+    for node in tree.traverse('preorder'):
+        if node.is_root():
+            node.add_feature('sp_up', set())
+        else:
+            parent = node.up
+            if parent.is_root():
+                others = [ch for ch in parent.get_children() if ch != node]
+                node.add_feature('sp_up', set.union(*[other.sp_down for other in others]))
+            else:
+                others = [ch for ch in parent.get_children() if ch != node]
+                sp_downs = set.union(*[other.sp_down for other in others])
+                node.add_feature('sp_up', parent.sp_up.union(sp_downs))
+
+def GetStoredSpeciesSets(node):
+    children = node.get_children()
+    if node.is_root():
+        if len(children) != 3: return None
+        return [ch.sp_down for ch in children]
+    else:
+        if len(children) != 2: return None
+        return [ch.sp_down for ch in children] + [node.sp_up]
+
+def GeneToSpecies_dash(g):
+  return g.split("_", 1)[0]
+  
+def GeneToSpecies_secondDash(g):
+  return "_".join(g.split("_", 2)[:2])
+  
+def GeneToSpecies_3rdDash(g):
+  return "_".join(g.split("_", 3)[:3])
+  
+def GeneToSpecies_dot(g):
+  return g.split(".", 1)[0]
+                   
+def LocalCheck_clades(clade1, clade2, expClades, GeneToSpecies):
+    """Expected clades are now in tree structure going down two levels: [[A,B], [C,D]]
+    Observed clades should match up with expected clades in terms of containing no unexpected species
+    """   
+    X = set.union(*expClades[0])
+    Y = set.union(*expClades[1])
+    x0, x1 = expClades[0] if len(expClades[0]) == 2 else (None, None) if len(expClades[0]) == 1 else (Exception, Exception)
+    y0, y1 = expClades[1] if len(expClades[1]) == 2 else (None, None) if len(expClades[1]) == 1 else (Exception, Exception)
+    if x0 == Exception or y0 == Exception: raise Exception
+    matches = 0
+    for actClades in [clade1, clade2]:
+        if criteria == "crit_one":
+            for clade in actClades:
+                # left is (A u B) or (C u D) if length 1 else (A, B) or (AuB, AuB) or (C, D) or (CuD, CuD) if length 2 
+                if len(clade) == 1:
+                    c = clade[0]
+                    if compare(X, c):
+                        matches += 1
+                        break
+                    elif compare(Y, c):
+                        matches += 1
+                        break
+                else:
+                    c0 = clade[0] 
+                    c1 = clade[1]
+                    if x0 != None and ((compare(x0, c0) or compare(x1, c1)) or (compare(x1, c0) or compare(x0, c1))):
+                        matches += 1
+                        break
+                    elif compare(X, c0) or compare(X, c1):
+                        matches += 1
+                        break
+                    elif y0 != None and ((compare(y0, c0) or compare(y1, c1)) or (compare(y1, c0) or compare(y0, c1))):
+                        matches += 1
+                        break
+                    elif compare(Y, c0) or compare(Y, c1):
+                        matches += 1
+                        break
+            return (matches == 2)
+        elif criteria == "crit_all":
+            iUsed = None
+            for clade in actClades:
+                # left is (A u B) or (C u D) if length 1 else (A, B) or (AuB, AuB) or (C, D) or (CuD, CuD) if length 2 
+                if len(clade) == 1:
+                    c = clade[0]
+                    if compare(X, c) and iUsed != 0:
+                        iUsed = 0
+                        continue
+                    elif compare(Y, c) and iUsed != 1:
+                        iUsed = 1
+                        continue
+                    else:
+                        return False
+                else:
+                    c0 = clade[0] 
+                    c1 = clade[1]
+                    if (iUsed != 0 and x0 != None) and ((compare(x0, c0) and compare(x1, c1)) or (compare(x1, c0) and compare(x0, c1))):
+                        iUsed = 0
+                        continue
+                    elif iUsed != 0 and (compare(X, c0) and compare(X, c1)):
+                        iUsed = 0
+                        continue
+                    elif (iUsed != 1 and y0 != None) and ((compare(y0, c0) and compare(y1, c1)) or (compare(y1, c0) and compare(y0, c1))):
+                        iUsed = 1
+                        continue
+                    elif iUsed != 1 and (compare(Y, c0) and compare(Y, c1)):
+                        iUsed = 1
+                        continue
+                    else: 
+                        return False
+            return True
+        else:
+            raise NotImplemented  
+     
+def SupportedHierachies(t, G, S, GeneToSpecies, species, dict_clades, clade_names, treeName):
+    """
+    Only get the species sets in the first instance as work out the clades as and when
+    """
+    supported = []
+    # Pre-calcualte species sets on tree: traverse tree from leaves inwards
+    StoreSpeciesSets(t, GeneToSpecies, G)
+    for counter, n in enumerate(t.traverse()):
+        if n.is_leaf(): continue
+        # just get the list of putative descendant species at this point without worrying about grandchild clades
+        spSets = GetStoredSpeciesSets(n)
+        if spSets == None: continue # non-binary
+        clades = None
+        # check each of three directions to the root
+        for i, j in itertools.combinations(range(3), 2):
+            s1 = spSets[i]
+            s2 = spSets[j]
+            if min(len(s1), len(s2)) < 2: continue
+            k1 = frozenset(species)
+            k2 = frozenset(species)
+            for kk in dict_clades: 
+                if  s1.issubset(kk) and len(kk) < len(k1): k1 = kk
+                if  s2.issubset(kk) and len(kk) < len(k2): k2 = kk
+            if k1 != k2: continue
+            if len(k1) == 1:
+                # nothing to learn from a terminal species
+                continue
+            elif k1 == species:
+                # putative ancient duplication
+                continue
+            elif all([not clade.isdisjoint(s1) for clade0 in dict_clades[k1] for clade in clade0]) and all([not clade.isdisjoint(s2) for clade0 in dict_clades[k1] for clade in clade0]):
+                # Passed the check that the required species are present but at this point don't know where in the tree
+                # Get grandchild clades as required (could still avoid the more costly up clade if it's not required)
+                if clades == None:
+                    N = Node(n)
+                    clades = N.get_grandrelative_clades_stored(G)
+                    if clades == None: break   # locally non-binary in vicinity of node, skip to next node
+                if not LocalCheck_clades(clades[i], clades[j], dict_clades[k1], GeneToSpecies): continue
+                supported.append(frozenset(k1))
+    return supported   
+    
+"""
+Parallelisation wrappers
+================================================================================================================================
+"""
+
+def SupportedHierachies_wrapper(treeName, GeneToSpecies, species, dict_clades, clade_names):
+    if not os.path.exists(treeName): return []
+    t = ete2.Tree(treeName, format=1)
+    G = set(t.get_leaf_names())
+    S = list(set(map(GeneToSpecies, G)))
+    if len(S) < 4:
+        return []
+    result = SupportedHierachies(t, G, S, GeneToSpecies, species, dict_clades, clade_names, treeName)    
+#    print(treeName)
+    return result
+    
+def SupportedHierachies_wrapper2(args):
+    return SupportedHierachies_wrapper(*args)
+    
+"""
+End of Parallelisation wrappers
+================================================================================================================================
+"""
+
+def get_partitions(tree):
+    """ 
+    .. versionadded: 2.1
+    
+    It returns the set of all possible partitions under a
+    node. Note that current implementation is quite inefficient
+    when used in very large trees.
+
+    t = Tree("((a, b), e);")
+    partitions = t.get_partitions()
+    """
+    all_leaves = frozenset(tree.get_leaf_names())
+    all_partitions = set([all_leaves])
+    for n in tree.iter_descendants():
+        p1 = frozenset(n.get_leaf_names())
+        p2 = frozenset(all_leaves - p1)
+        all_partitions.add(p1)
+        all_partitions.add(p2)
+    return all_partitions
+    
+def AnalyseSpeciesTree(speciesTree):
+    species = frozenset(speciesTree.get_leaf_names())
+    parts = list(get_partitions(speciesTree))
+    nSpecies = len(species)
+    dict_clades = dict() # dictionary of clades we require evidence of duplicates from for each partition
+    clade_names = dict()
+    for p in parts:
+        if len(p) == nSpecies: continue
+        speciesTree.set_outgroup(list(species.difference(p))[0])
+        if len(p) == 1:
+            # no use for identifying clades
+            continue
+        n = speciesTree.get_common_ancestor(p)
+        clade_names[p] = n.name + "_" + str(hash("".join(p)))[-4:]
+        # go down two steps
+        ch = n.get_children()
+        ch0 = [ch[0]] if ch[0].is_leaf() else ch[0].get_children() 
+        ch1 = [ch[1]] if ch[1].is_leaf() else ch[1].get_children() 
+        dict_clades[p] = [[set(c.get_leaf_names()) for c in ch0], [set(c.get_leaf_names()) for c in ch1]]
+    return species, dict_clades, clade_names
+
+def RootAtClade(tree, accs_in_clade):
+    if len(accs_in_clade) == 1:
+        tree.set_outgroup(list(accs_in_clade)[0])
+        return tree
+    accs = set(tree.get_leaf_names())
+    dummy = list(accs.difference(accs_in_clade))[0]
+    tree.set_outgroup(dummy)
+    node = tree.get_common_ancestor(accs_in_clade)
+    tree.set_outgroup(node)
+    return tree
+    
+def ParsimonyRoot(allSpecies, clades, supported_clusters):
+    c = Counter(supported_clusters)
+    contradictions = dict()
+    for clade in clades:
+        clade_p = allSpecies.difference(clade)
+        against = 0
+        for observed, n in c.items():
+            if (not observed.issubset(clade)) and (not observed.issubset(clade_p)):
+                against += n
+        contradictions[clade] = against
+    m = min(contradictions.values())
+    n = len(supported_clusters)
+    nSupport = n-m
+    roots = []
+    for clade, score in contradictions.items():
+        if score == m:
+            if len(clade) > len(allSpecies)/2:
+                roots.append(allSpecies.difference(clade))
+            elif len(clade) == len(allSpecies)/ 2:
+                if allSpecies.difference(clade) not in roots: roots.append(clade) 
+            else:
+                roots.append(clade)
+    return roots, nSupport
+
+def PlotTree(speciesTree, treesDir, supported_clusters, qSimplePrint=True):
+    c = Counter(supported_clusters)
+    print("Observed duplications")
+    if qSimplePrint:
+        for x in c.most_common():
+            print(x)
+        return
+    S = set(speciesTree.get_leaf_names())
+    tree_for_figure = speciesTree.copy()
+    for cluster, count in c.items():
+        complement = None
+        if len(cluster) == 1: continue
+        n = tree_for_figure.get_common_ancestor(cluster)
+        # cluster could straddle root
+        if n == tree_for_figure:
+            complement = S.difference(cluster) # complement
+#            print(complement)
+            if len(complement) == 1:
+#                continue
+                n = tree_for_figure & list(complement)[0]
+            else:
+                n = tree_for_figure.get_common_ancestor(complement)
+#        print(('r' if complement != None else 'g', count, cluster))
+        face = ete2.faces.TextFace(" " + str(count), fgcolor=('red' if complement else 'green'))
+        n.add_face(face, 1) 
+    tree_for_figure.render(treesDir + "Duplications.pdf")
+
+def GetRoot(speciesTreeFN, treesDir, GeneToSpeciesMap, nProcessors, treeFmt=None):
+    if treeFmt == None: treeFmt = spTreeFormat
+    speciesTree = ete2.Tree(speciesTreeFN, format=treeFmt)
+    species, dict_clades, clade_names = AnalyseSpeciesTree(speciesTree)
+    pool = mp.Pool(nProcessors, maxtasksperchild=1)       
+    list_of_lists = pool.map(SupportedHierachies_wrapper2, [(fn, GeneToSpeciesMap, species, dict_clades, clade_names) for fn in glob.glob(treesDir + "/*")])
+    clusters = []
+    for l in list_of_lists:
+        clusters.extend(l)
+    roots, nSupport = ParsimonyRoot(species, dict_clades.keys(), clusters)
+    roots = list(set(roots))
+    speciesTrees_rootedFNs =[]
+    for i, r in enumerate(roots):
+        speciesTree = RootAtClade(speciesTree, r) 
+        speciesTree_rootedFN = os.path.splitext(speciesTreeFN)[0] + "_%d_rooted.txt" % i 
+#    speciesTree = LabelNodes()
+        speciesTree.write(outfile=speciesTree_rootedFN, format=4)
+        speciesTrees_rootedFNs.append(speciesTree_rootedFN)
+    return roots, clusters, speciesTrees_rootedFNs, nSupport
+      
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("input_tree")
+    parser.add_argument("-s", "--separator", choices=("dot", "dash", "second_dash", "3rd_dash"))
+    parser.add_argument("-S", "--Species_tree")
+    parser.add_argument("-d", "--directory", action="store_true")
+    parser.add_argument("-v", "--verbose", action="store_true")
+    args = parser.parse_args()
+      
+    GeneToSpecies = GeneToSpecies_dash
+    if args.separator and args.separator == "dot":
+        GeneToSpecies = GeneToSpecies_dot  
+    elif args.separator and args.separator == "second_dash":
+        GeneToSpecies = GeneToSpecies_secondDash  
+    elif args.separator and args.separator == "3rd_dash":
+        GeneToSpecies = GeneToSpecies_3rdDash  
+    
+    if not args.directory:
+        speciesTree = ete2.Tree(args.Species_tree, format=spTreeFormat)
+        species, dict_clades, clade_names = AnalyseSpeciesTree(speciesTree)
+        t = ete2.Tree(args.input_tree, format=1)
+        G = set(t.get_leaf_names())
+        S = list(set(map(GeneToSpecies, G)))
+        filename = 'profile_stats.stats'
+#        profile.run('c = SupportedHierachies_wrapper(args.input_tree, GeneToSpecies, species, dict_clades, clade_names) ', filename)
+        c = SupportedHierachies_wrapper(args.input_tree, GeneToSpecies, species, dict_clades, clade_names)      
+        for cc in c: print(cc)
+#        stats = pstats.Stats('profile_stats.stats')
+#        stats.sort_stats('cumulative')
+#        stats.print_stats(0.3)
+    elif qSerial:
+        speciesTree = ete2.Tree(args.Species_tree, format=spTreeFormat)
+        species, dict_clades, clade_names = AnalyseSpeciesTree(speciesTree)
+        clusters = []
+        for fn in glob.glob(args.input_tree + "/*"):
+            c = SupportedHierachies_wrapper(fn, GeneToSpecies, species, dict_clades, clade_names)
+            clusters.extend(c)
+        roots, nSupport = ParsimonyRoot(species, dict_clades.keys(), clusters)
+        if len(roots) > 1: print("Observed %d duplications. %d support the best roots and %d contradict them." % (len(clusters), nSupport, len(clusters) - nSupport))
+        else: print("Observed %d duplications. %d support the best root and %d contradict it." % (len(clusters), nSupport, len(clusters) - nSupport))
+        print("Best outgroup(s) for species tree:")
+        for r in roots: 
+            print(r)
+        if args.verbose: PlotTree(speciesTree, args.input_tree, clusters)
+    else:
+        root, clusters, _, nSupport = GetRoot(args.Species_tree, args.input_tree, GeneToSpecies, nProcs, treeFmt = 1)
+        for r in root: print(r)
+        speciesTree = ete2.Tree(args.Species_tree, format=1)
+        if args.verbose: PlotTree(speciesTree, args.input_tree, clusters, qSimplePrint=False)
+ 
diff --git a/trees_for_orthogroups.py b/trees_for_orthogroups.py
index 9f0004b5e8be279c806543ccd8cbfcabdaf60cd5..7dbd61de3da03f1e85c80ab1cd3b34a2202a6cd4 100755
--- a/trees_for_orthogroups.py
+++ b/trees_for_orthogroups.py
@@ -40,15 +40,17 @@ import glob
 
 import orthofinder    
 
-version = "0.6.1"
-nProcessesDefault = 16
-    
+version = "1.0.0"   
 
-def RunCommandSet(commandSet):
-    orthofinder.util.PrintTime("Runing command: %s" % commandSet[-1])
-    for cmd in commandSet:
-        subprocess.call(cmd, shell=True)
-    orthofinder.util.PrintTime("Finshed command: %s" % commandSet[-1])
+def RunCommandSet(commandSet, qHideStdout):
+#    orthofinder.util.PrintTime("Runing command: %s" % commandSet[-1])
+    if qHideStdout:
+        for cmd in commandSet:
+            subprocess.call(cmd, shell=True, stdout=subprocess.PIPE)
+    else:
+        for cmd in commandSet:
+            subprocess.call(cmd, shell=True)
+#    orthofinder.util.PrintTime("Finshed command: %s" % commandSet[-1])
     
 class FastaWriter(object):
     def __init__(self, fastaFileDir):
@@ -90,7 +92,7 @@ class FastaWriter(object):
     def SortSeqs(self, seqs):
         return sorted(seqs, key=lambda x: map(int, x.split("_")))
                     
-def Worker_RunCommand(cmd_queue):
+def Worker_RunCommand(cmd_queue, nProcesses, nToDo, qHideStdout):
     """ repeatedly takes items to process from the queue until it is empty at which point it returns. Does not take a new task
         if it can't acquire queueLock as this indicates the queue is being rearranged.
         
@@ -98,25 +100,31 @@ def Worker_RunCommand(cmd_queue):
     """
     while True:
         try:
-            commandSet = cmd_queue.get(True, 10)
-            RunCommandSet(commandSet)
+            i, commandSet = cmd_queue.get(True, 1)
+            nDone = i - nProcesses + 1
+            if nDone >= 0 and divmod(nDone, 10 if nToDo <= 200 else 100 if nToDo <= 2000 else 1000)[1] == 0:
+                orthofinder.util.PrintTime("Done %d of %d" % (nDone, nToDo))
+            RunCommandSet(commandSet, qHideStdout)
         except Queue.Empty:
             return   
     
-def RunParallelCommandSets(nProcesses, commands):
-    
+def RunParallelCommandSets(nProcesses, commands, qHideStdout = False):
+    """nProcesss - the number of processes to run in parallel
+    commands - list of lists of commands where the commands in the inner list are completed in order (the i_th won't run until
+    the i-1_th has finished).
+    """
     # Setup the workers and run
     cmd_queue = mp.Queue()
-    for cmd in commands:
-        cmd_queue.put(cmd)
-    runningProcesses = [mp.Process(target=Worker_RunCommand, args=(cmd_queue,)) for i_ in xrange(nProcesses)]
+    for i, cmd in enumerate(commands):
+        cmd_queue.put((i, cmd))
+    runningProcesses = [mp.Process(target=Worker_RunCommand, args=(cmd_queue, nProcesses, i+1, qHideStdout)) for i_ in xrange(nProcesses)]
     for proc in runningProcesses:
         proc.start()
     
     for proc in runningProcesses:
         while proc.is_alive():
-            proc.join(60.)
-            time.sleep(1)    
+            proc.join(10.)
+            time.sleep(2)    
 
 def WriteTestFile(workingDir):
     testFN = workingDir + "SimpleTest.fa"
@@ -254,7 +262,7 @@ def PrintHelp():
     
     print("""-t max_number_of_threads, --threads max_number_of_threads
     The maximum number of processes to be run simultaneously. The deafult is %d but this 
-    should be increased by the user to the maximum number of cores available.\n""" % nProcessesDefault)
+    should be increased by the user to the maximum number of cores available.\n""" % orthofinder.nThreadsDefault)
         
     print("""-h, --help
    Print this help text""")
@@ -396,7 +404,7 @@ if __name__ == "__main__":
     if nProcesses == None:
         print("""Number of parallel processes has not been specified, will use the default value.  
    Number of parallel processes can be specified using the -t option\n""")
-        nProcesses = nProcessesDefault
+        nProcesses = orthofinder.nThreadsDefault
     print("Using %d threads for alignments and trees\n" % nProcesses)
     
     ogs = orthofinder.MCL.GetPredictedOGs(clustersFilename_pairs)