2.8 years ago

mohammadhassanj

I have a table with 10,000 rows and 10 columns, with values ranging from 0 to 80. I want to compare the means of these rows. However, a single large value in a row can compensate for the smaller values in the same row, so I cannot make a fair comparison (the value in each column is also important to me).

I know that I can use the standard deviation. But I want a method that combines the standard deviation and the mean of each row into a single number, so that I can sort the rows by that number.

If I understand correctly, you want to sort the rows based on some summary statistic, but you're worried about different rows having different distributions. First, you should look at the distribution of the data. If the data is not unimodal, measures of central tendency (mean, median, mode) may not be useful. Second, if the problem is with outliers, you could decide whether or not to include them. As already suggested, the median is a more robust measure than the mean, but depending on your data you may also want to consider the mode (i.e. the most frequent value). Similarly, if you are interested in a measure of dispersion, the interquartile range is more robust than the standard deviation.
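To make the median and interquartile range suggestion concrete, here is a minimal sketch using only the Python standard library. The row names and values are made up for illustration, and sorting by median first and IQR second is just one possible choice, not something prescribed above.

```python
import statistics

def robust_summary(row):
    """Return (median, interquartile range) for one row of values."""
    q1, _, q3 = statistics.quantiles(row, n=4, method="inclusive")
    return statistics.median(row), q3 - q1

# Hypothetical rows: 10 values each, within the 0-80 range from the question.
table = {
    "row_a": [5, 6, 5, 7, 6, 5, 6, 7, 5, 80],        # one large outlier
    "row_b": [12, 14, 13, 15, 12, 13, 14, 15, 13, 14],
}

summaries = {name: robust_summary(row) for name, row in table.items()}

# Sort by median (descending), breaking ties with the smaller IQR.
ranked = sorted(table, key=lambda name: (-summaries[name][0], summaries[name][1]))

for name in ranked:
    med, iqr = summaries[name]
    print(name, "mean:", statistics.mean(table[name]), "median:", med, "IQR:", iqr)
```

In this toy data the two means are close (13.2 and 13.5) because the single 80 pulls row_a up, while the medians (6 and 13.5) separate the rows clearly.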

Consider using the median instead of the mean if you are worried that an outlier in a row could bias the mean. Or check whether a log transformation makes your data closer to normal. What kind of data are we talking about, btw?

Rows are microRNAs and the columns are genes (a gene set) that are targeted by each microRNA. The numbers in each row are the number of databases that report the targeting of a particular gene by that miRNA. My goal is to find the microRNA that targets the whole gene set, so it should have the highest average and the lowest standard deviation. Can the rows be compared by normalizing each row individually?
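One way to fold "highest average, lowest standard deviation" into a single sortable number is to penalize the mean by the spread. The sketch below is only an illustration under assumptions: the score `mean - k * sd`, the weight `k`, and the miRNA names and counts are all hypothetical, and nobody in this thread proposed this exact formula.

```python
import statistics

def penalized_mean(row, k=1.0):
    """Mean minus k standard deviations: high only when values are high AND even."""
    return statistics.mean(row) - k * statistics.pstdev(row)

# Hypothetical database counts for two miRNAs over a 10-gene set.
rows = {
    "miR-A": [8, 8, 7, 8, 8, 7, 8, 8, 7, 8],     # consistently supported
    "miR-B": [0, 0, 0, 0, 0, 0, 0, 0, 0, 80],    # one gene dominates
}

scores = {name: penalized_mean(values) for name, values in rows.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 2))
```

Here miR-B has the slightly higher plain mean (8 versus 7.7), yet it ranks last once the spread is subtracted. The choice of `k` controls how strongly uneven rows are punished, so the ranking should be checked for sensitivity to it.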

In that case you can try testing for enrichment with a hypergeometric test (Fisher's exact test).

Could you please explain more about this? How can I do that?

I am not sure; I have never seen an approach like yours. But I was thinking that maybe you can test, for each miRNA, whether a certain gene set is enriched (and do this for all 10 gene sets). For such an enrichment analysis you need four numbers. First, for each miRNA, the number of its gene targets in the gene set (you say you already have this number). Second, the total number of gene targets in that gene set (so the total number of its genes targeted by any miRNA). Third, the total number of targets of the specific miRNA across the whole genome (so all genes targeted by this miRNA). Finally, the total number of targets of all miRNAs. You can run Fisher's exact test with these 4 numbers to test for enrichment. Of course, adjust the p-values for multiple testing, for example with an FDR adjustment.
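The four numbers described above can be plugged into a one-sided Fisher's exact test, which is the upper tail of the hypergeometric distribution. What follows is a rough sketch using only the standard library, not a definitive implementation. The function names, the miRNA names, and all counts are hypothetical. It also assumes that each gene has already been called a target or a non-target (for example by requiring a minimum number of supporting databases), and that the union of targets of all miRNAs serves as the background.

```python
from math import comb

def enrichment_pvalue(overlap, set_size, mirna_targets, universe):
    """One-sided Fisher's exact test: P(X >= overlap), X hypergeometric.

    overlap       -- targets of this miRNA inside the gene set
    set_size      -- genes of the set targeted by any miRNA
    mirna_targets -- all targets of this miRNA
    universe      -- all genes targeted by any miRNA (assumed background)
    """
    upper = min(set_size, mirna_targets)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, mirna_targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, mirna_targets)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: (overlap, set_size, mirna_targets, universe)
counts = {
    "miR-A": (8, 10, 200, 15000),
    "miR-B": (1, 10, 1500, 15000),
    "miR-C": (3, 10, 400, 15000),
}
names = list(counts)
pvals = [enrichment_pvalue(*counts[n]) for n in names]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(name, f"p={p:.3g}", f"FDR={q:.3g}")
```

With the full table of 10,000 miRNAs, the same loop would run once per miRNA, and the adjustment would be applied to all resulting p-values together.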