Snakecy's NOTE



Perl Demo

Posted on 2016-02-11 | In open-source

install tutorial

Perl install

sudo apt-get update

sudo apt-get upgrade

sudo apt-get install -y perl

perl -version

  • my $secstr = join("\t\t", @items2[1..$#items2]);
  • perl example:
#!/usr/bin/perl
use strict;
use warnings;

# Merge two count files (ID <whitespace> count) into one tab-separated table.
die "Usage: perl $0 infile1 infile2 > outfile.xls\n" unless @ARGV == 2;
my ($infile1, $infile2) = @ARGV;
print "ID\tP1_counts\tP2_counts\n";

# Load the first file: ID => count.
my %data;
open my $in1, '<', $infile1 or die "Cannot open $infile1: $!";
while (<$in1>) {
    chomp;
    my @items = split /\s+/;
    $data{$items[0]} = $items[1];
}
close $in1;

my $missing = 0;    # count reported when an ID is absent from one file

# Walk the second file and print every ID it contains.
open my $in2, '<', $infile2 or die "Cannot open $infile2: $!";
while (<$in2>) {
    chomp;
    my @items2 = split /\s+/;
    my $p1 = exists $data{$items2[0]} ? $data{$items2[0]} : $missing;
    print "$items2[0]\t$p1\t$items2[1]\n";
    delete $data{$items2[0]};
}
close $in2;

# IDs still left in %data appear only in the first file.
foreach my $id (keys %data) {
    print "$id\t$data{$id}\t$missing\n";
}

# To write to a file instead of STDOUT, open an output handle first:
# open my $out, '>', 'tmp.xls' or die "Unable to create combine file: $!";
# then print to $out in the loops above and close $out at the end.
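
For comparison, the merge logic of the script above can be sketched in Python. This is only an illustrative sketch: the helper names parse_counts and merge_counts and the sample IDs (geneA, geneB, geneC) are made up for the example, and counts are assumed to be integers.

```python
def parse_counts(lines):
    """Parse 'ID count' lines (whitespace separated) into a dict."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[0]] = int(fields[1])
    return counts


def merge_counts(first, second, missing=0):
    """Outer-join two ID -> count mappings into (id, p1, p2) rows."""
    rows = []
    remaining = dict(first)              # copy so the caller's dict is untouched
    for key, p2 in second.items():       # every ID seen in the second file
        rows.append((key, remaining.pop(key, missing), p2))
    for key, p1 in remaining.items():    # IDs that occur only in the first file
        rows.append((key, p1, missing))
    return rows


# Hypothetical sample data standing in for infile1 and infile2.
p1 = parse_counts(["geneA 5", "geneB 3"])
p2 = parse_counts(["geneB 7", "geneC 2"])
print("ID\tP1_counts\tP2_counts")
for row in merge_counts(p1, p2):
    print("\t".join(str(v) for v in row))
```

As in the Perl version, IDs from the second file are printed first, and IDs found only in the first file are appended at the end with a count of 0.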

How to split a string into an array in Perl

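my $datetime = $items2[5];    # assuming a value such as "2016-02-11 10:30:00"
# Alternative: split on every separator at once
# my @parts = split /[-\s:]+/, $datetime;
my @time = split(/[[:space:]]+/, $datetime);    # (date, time)
my @day  = split(/-/, $time[0]);                # (year, month, day)
my @hr   = split(/:/, $time[1]);                # (hour, minute, second)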
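
The same two-step split can be sketched in Python. The timestamp format "YYYY-MM-DD HH:MM:SS" is an assumption (the post does not show the actual field), and the function names are hypothetical.

```python
import re


def split_datetime(datetime_str):
    """Split 'YYYY-MM-DD HH:MM:SS' into (day_parts, time_parts)."""
    date_part, time_part = datetime_str.split()   # split on whitespace
    day = date_part.split("-")                    # [year, month, day]
    hr = time_part.split(":")                     # [hour, minute, second]
    return day, hr


def split_all(datetime_str):
    """One-pass variant: split on '-', whitespace and ':' together."""
    return re.split(r"[-\s:]+", datetime_str)


print(split_datetime("2016-02-11 10:30:00"))
print(split_all("2016-02-11 10:30:00"))
```

The one-pass variant mirrors the commented-out split on the character class [-\s:]+.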

Question: Suppose a data file is 10 GB and contains many duplicated rows that need to be merged so that each distinct row appears on a single line. There are two ways to do this. Ref

  • cat data | sort | uniq > new_data    # takes too much time

  • A small tool written in Perl.

#!/usr/bin/perl
use warnings;
use strict;

# Build a hash keyed by line content; each value counts how often that line occurs.
my %hash;
my $script = $0;    # script name, used in the usage message

sub usage {
    print "Usage:\n";
    print "perl $script <source_file> <dest_file>\n";
}

# Exit if fewer than two arguments are given.
if (@ARGV < 2) {
    usage();
    exit 1;
}

my $source_file = $ARGV[0];    # file containing duplicate rows
my $dest_file   = $ARGV[1];    # file that receives the de-duplicated rows

open(my $in,  '<', $source_file) or die "Cannot open $source_file: $!\n";
open(my $out, '>', $dest_file)   or die "Cannot open $dest_file: $!\n";

while (defined(my $line = <$in>)) {
    chomp($line);
    $hash{$line} += 1;
}

# Write each distinct line once, followed by a comma and its count.
foreach my $k (keys %hash) {
    print $out "$k,$hash{$k}\n";
}

close($in);
close($out);
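
The hash-counting idea is not specific to Perl. A minimal Python sketch follows, using an in-memory sample in place of the real 10 GB file; count_lines and write_counts are hypothetical names. Note that this approach keeps every distinct line in memory, so memory use grows with the number of distinct lines rather than with the file size.

```python
import io
from collections import Counter


def count_lines(stream):
    """Count occurrences of each distinct line (trailing newline stripped)."""
    counts = Counter()
    for line in stream:                 # reads one line at a time
        counts[line.rstrip("\n")] += 1
    return counts


def write_counts(counts, out):
    """Write each distinct line once as 'line,count'."""
    for line, n in counts.items():
        out.write(f"{line},{n}\n")


# Small in-memory sample standing in for the source file.
sample = io.StringIO("a\nb\na\nc\nb\na\n")
result = count_lines(sample)
out = io.StringIO()
write_counts(result, out)
print(out.getvalue(), end="")
```

With real files, the two StringIO objects would be replaced by open(source_file) and open(dest_file, "w").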

Regular Expression

str="this is a string"
# Check whether the substring "this" occurs in str with the following statements (bash)

[[ $str =~ "this" ]] && echo "\$str contains this"
[[ $str =~ "that" ]] || echo "\$str does NOT contain that"

Test construct: “[[”
Match operator: “=~”
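
For comparison, the same containment checks can be sketched in Python with the standard re module. The helper name contains is hypothetical; re.search looks for the pattern anywhere in the text, which is what the unanchored checks above do.

```python
import re


def contains(text, pattern):
    """Return True if the regular expression matches anywhere in text."""
    return re.search(pattern, text) is not None


s = "this is a string"
print(contains(s, "this"))          # plain substring used as a pattern
print(contains(s, "that"))
print(contains(s, r"^this\s+is"))   # anchored pattern with a whitespace class
```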

Libsvm Package

Posted on 2016-02-10 | In open-source

Notes on using the libsvm tools on different platforms.

libsvm tutorial

libsvm can be used on several platforms, including Java, MATLAB (64-bit), Python, and svm-toy.

liblinear-java-1.95.jar

training:   java -cp liblinear-java-1.95.jar de.bwaldvogel.liblinear.Train -s 0 data_file
prediction: java -cp liblinear-java-1.95.jar de.bwaldvogel.liblinear.Predict -b 1 test_file data_file.model output_file
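
The data_file and test_file passed to these commands use the sparse LIBSVM text format: one sample per line, written as a label followed by index:value pairs with 1-based feature indices in ascending order, where zero-valued features may be omitted. A small sketch of producing such lines is shown below; the helper name to_libsvm_line and the sample values are made up for illustration.

```python
def to_libsvm_line(label, features):
    """Format one sample as '<label> <index>:<value> ...' (1-based, sparse)."""
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:                          # zero-valued features are omitted
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# Hypothetical samples: (label, dense feature vector).
samples = [(1, [0.5, 0.0, 2.0]), (-1, [0.0, 1.0, 0.0])]
for label, feats in samples:
    print(to_libsvm_line(label, feats))
```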

Spark liblinear

  • Limited by the JDK version (before 8u)

Resolve the debug problem of WARN NativeCodeLoader

Posted on 2016-02-10 | In cloud-tech

Debug

When Spark starts up, a WARN NativeCodeLoader message is printed. The following environment settings address it.

export HADOOP_HOME=/home/admin/hadoop
#export PATH=$HADOOP_HOME/bin:$PATH
#export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
#export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native

Related BLAS warnings:

WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS

Solutions

Three ways to solve the problem:

install libgfortran3
install libatlas3-base libopenblas-base
OpenBLAS

  • in sbt file: libraryDependencies += "com.github.fommil.netlib" % "all" % "1.1.2"

    • -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS
    • https://github.com/mridulm/netlib-java
  • or on the command line

    $ sudo apt-get install libgfortran3
    # check that libgfortran3 is installed
    $ dpkg -l libgfortran3
    $ sudo apt-get install gfortran
  • resolve the problem (important)

    • refer to https://github.com/fommil/netlib-java#machine-optimised-system-libraries

sudo apt-get install libatlas3-base libopenblas-base
sudo update-alternatives --config libblas.so.3
sudo update-alternatives --config liblapack.so.3
# edit /etc/ld.so.conf and add /usr/lib/libblas.so.3 and /usr/lib/liblapack.so.3
sudo ldconfig

Mac License

Posted on 2016-02-10 | In License

Agreeing to the Xcode/iOS license requires admin privileges, please re-run as root via sudo.

  • Open the terminal and type “sudo xcodebuild -license”.

  • Press Enter.

  • Press Space to read more, or “q” to quit.

  • Finally, type “agree” and press Enter.

Machine Learning notes (01)

Posted on 2016-01-16 | In cloud-tech

Machine learning package for Python (1)

Basic Requirement

Introduction

Scikit-learn
  • A module for machine learning in Python, built on NumPy, SciPy, and matplotlib.

    • Scikit-learn requires:

      • Python (>=2.6 or >=3.3)

      • NumPy (>=1.6.1)

      • SciPy (>=0.9)

      For OS X installation:

      pip install -U numpy scipy scikit-learn
  • Before installing the required tools above, install Homebrew, which is needed to complete the installation.

    • In the Mac terminal, run:
      /usr/bin/ruby \
      -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
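
After the pip install step, one quick way to confirm that the packages are visible to Python is to ask the import system whether each module can be found. This is only a sketch: check_packages is a hypothetical helper, and note that scikit-learn is imported under the module name sklearn.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(["numpy", "scipy", "sklearn"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

The check only locates the modules and does not import them, so it runs quickly even when the packages are large.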
© 2016 SZhou