2014-07-17

Unit testing with JUnit in Eclipse

Install JUnit

Download junit-4.10.jar from http://sourceforge.net/projects/junit/files/junit/4.10/ to workspace/your-project/lib/
start Eclipse
select the your-project top node
right click
select [Refresh]
select the your-project top node
right click
select [Properties]
select [Java Build Path]
select the [Libraries] tab
click [Add JARs...]
select lib/junit-4.10.jar
click [OK]

Ref

http://www.okapiproject.com/java/java_tools/eclipse/vol_2/eclipse_junitfuncion.htm

generate unit-test

select the class you want to test
right click
[New] -> [JUnit Test Case]
[Finish]

run unit-test

right click the generated (and edited) unit-test class
[Run As] -> [JUnit Test]

unit-test example

The class under test:

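public class ChangeNum {
  // Usage: java ChangeNum <integer>
  public static void main(String[] args) {
    if (args.length < 1) {
      System.err.println("usage: java ChangeNum <integer>");
      return;
    }
    ChangeNum cn1 = new ChangeNum();
    cn1.changeNum(args[0]);
  }

  // Parses a decimal string such as "100" into an int.
  // Throws NumberFormatException if s1 is not a valid integer.
  int changeNum(String s1) {
    int n1 = Integer.parseInt(s1);
    return n1;
  }
}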

select this class
right click
[New] -> [JUnit Test Case]
[Finish]
edit the generated file as follows:

import junit.framework.TestCase;

// JUnit 3 style test; junit-4.10.jar still ships junit.framework.TestCase.
public class ChangeNumTest extends TestCase {
  public ChangeNumTest(String name) {
    super(name);
  }

  // The string "100" should be parsed to the int 100.
  public void testChangeNum() {
    String ts1 = "100";
    int tn1 = 100;
    ChangeNum tcn1 = new ChangeNum();
    int test1 = tcn1.changeNum(ts1);
    assertEquals("changeNum(\"100\") should return 100", tn1, test1);
  }
}

right click the ChangeNumTest class
[Run As] -> [JUnit Test]

Ref

http://www.javaroad.jp/opensource/js_eclipse9.htm

run all unit-tests

select your package
right click
[New] -> [Other] -> [Java] -> [JUnit] -> [JUnit Test Suite] -> [Next]
click [Select All]
click [Finish]

2014-07-17

Use git in eclipse

create git repository

Select your project top node
Right Click
[Team] -> [Share Project]
select [Git]
check [Use or create repository in parent folder of project]
select your project folder
Click [Create Repository]
Click [Finish]

add file to git

select file
Right Click
[Team] -> [Add to index]

Ref

http://gomafuace.seesaa.net/article/301694478.html

browse git repository

[Window] -> [Open Perspective] -> [Other]
select [Git]

Ref

http://gomafuace.seesaa.net/article/301694478.html

2014-07-17

Running an Eclipse workspace created on Windows on Linux

select the project top node
right click
[Properties] -> [Resource]
[Text file encoding] -> [Other] -> [MS932]
[New text file line delimiter] -> [Other] -> [Windows]
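
Why the encoding setting matters can be illustrated outside Eclipse. The following is a minimal Python sketch, not part of the Eclipse procedure: it assumes a source file saved on Windows as MS932 (Python's codec name is "cp932") with CRLF line endings, and the sample text is made up for illustration.

```python
# Text as a Windows workspace would typically save it:
# MS932 (codec "cp932") with CRLF line endings. Sample content is hypothetical.
source = "// コメント\r\nint n = 1;\r\n"
raw = source.encode("cp932")

# Reading the same bytes as UTF-8 (a common Linux default) fails ...
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ... while decoding with the matching codec restores the text.
decoded = raw.decode("cp932")

print(utf8_ok)            # False
print(decoded == source)  # True
```

Setting the project encoding to MS932 tells Eclipse to do the equivalent of the second decode.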

Ref

http://pentan.info/program/tools/eclipse_sjis.html

2013-12-17

LDA

Assorted links on LDA

Introduction

Introduction to LDA http://www.slideshare.net/tsubosaka/tokyotextmining

Software

Preparation and package installation for using LDA with Python Gensim http://hivecolor.com/id/54
Introduction to natural language processing with Gensim, which makes it easy to try LSI and LDA http://yuku-tech.hatenablog.com/entry/20110623/1308810518
Topic analysis in R (LDA: Latent Dirichlet Allocation) http://noahs--ark.blogspot.jp/2013/03/r.html

Applications

http://en.wikipedia.org/wiki/Dynamic_topic_model
Applications of topic models: relational data and network data http://www.ism.ac.jp/~daichi/lectures/ISM-2012-TopicModels_day2_3_timeseries.pdf
Classifying news articles with Latent Dirichlet Allocation (LDA) http://developer.smartnews.be/blog/2013/08/19/lda-based-channel-categorization-in-smartnews/
Introductions to papers on NMF and LDA http://nlpyutori.g.hatena.ne.jp/yaruki_nil/20100910/1284089305

Extensions

Supervised latent Dirichlet allocation for classification http://www.cs.cmu.edu/~chongw/slda/

2013-12-17

CRF++

A summary of how to use CRF++ and related notes

Principles of CRF

CRF explained in 30 minutes, for people who get a stomachache because they don't understand CRF http://d.hatena.ne.jp/echizen_tm/20111206/1323180144
Notes on conditional random fields (CRF) http://d.hatena.ne.jp/jetbead/20110929/1317253922
Inference and learning for conditional random fields http://www.slideshare.net/rezoolab/seminar-19715143
- Slides that also cover topics such as computer vision
How Conditional Random Fields and Logistic Regression could be the same? http://stats.stackexchange.com/questions/63826/how-conditional-random-fields-and-logistic-regression-could-be-the-same

Guides to using CRF++

http://nlp.kimura-s.otaru-uc.ac.jp/index.php?CRF%2B%2B
http://yongsun.me/2008/03/a-beginners-note-of-crf/
- How to inspect the contents of a trained model file
CRF and feature templates http://www.slideshare.net/uchumik/crf-8416551

CRF applications

Web main-text extraction using CRF http://www.slideshare.net/shuyo/crf-web
Introductions to various papers http://d.hatena.ne.jp/n_shuyo/20100716/nlp

CRF implementations other than CRF++

CRFsuite http://www.chokkan.org/software/#id492076
- Implemented in C++, like CRF++. Unlike CRF++, optimization is done with liblbfgs http://www.chokkan.org/software/liblbfgs/ (in CRF++ the optimization part is not split out). L1-norm regularization should require L-BFGS-B rather than L-BFGS, so it seems only the L2 norm is supported? CRF++ can also handle the L1 norm.
Variable-order linear-chain CRF http://vocrf.net/index_ja.html
- Apparently a further modification of CRFsuite. GitHub: https://github.com/hiroshi-manabe/crfsuite-variableorder
conditional random fields package for R http://stackoverflow.com/questions/6524232/conditional-random-fields-package-for-r
http://crf.sourceforge.net/
FlexCRFs http://flexcrfs.sourceforge.net/ C++ implementation. Supports parallel computation.

Practice using CRF++

Run CRF++ for real to practice using it.

For simplicity, use the following template file, which contains only a single entry.

template

U02:%x[0,0]

Train on the following training data with this template.

train.data

A +
A +
A +
B -
B +
B +
C +
X -
X -
Y -
Z -

Train with the following command.

$ crf_learn -t template train.data model

Because of the -t option, the trained model is also written as a text file like the following.

model.txt

version: 100
cost-factor: 1
maxid: 12
xsize: 1

+
-

U02:%x[0,0]

0 U02:A
2 U02:B
4 U02:C
6 U02:X
8 U02:Y
10 U02:Z

0.6462678961244115
-0.6462678961244128
0.2016113127964356
-0.2016113127964371
0.3374158284496933
-0.3374158284496931
-0.5213066287709371
0.5213066287709373
-0.3374158284496930
0.3374158284496934
-0.3374158284496929
0.3374158284496936
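
The text model above appears to consist of blank-line-separated sections: header, labels, templates, feature index, and weights. A minimal Python sketch, assuming that layout and unigram-only (U) features, pairs each feature with one weight per label in label order. The function name parse_text_model is hypothetical and not part of CRF++.

```python
MODEL_TXT = """\
version: 100
cost-factor: 1
maxid: 12
xsize: 1

+
-

U02:%x[0,0]

0 U02:A
2 U02:B
4 U02:C
6 U02:X
8 U02:Y
10 U02:Z

0.6462678961244115
-0.6462678961244128
0.2016113127964356
-0.2016113127964371
0.3374158284496933
-0.3374158284496931
-0.5213066287709371
0.5213066287709373
-0.3374158284496930
0.3374158284496934
-0.3374158284496929
0.3374158284496936
"""


def parse_text_model(text):
    """Map each unigram feature to {label: weight}.

    Assumes the five sections are separated by single blank lines,
    as in the model.txt shown above.
    """
    _header, labels, _templates, features, weights = [
        block.split("\n") for block in text.strip().split("\n\n")
    ]
    weights = [float(w) for w in weights]
    table = {}
    for line in features:
        start, name = line.split(" ", 1)
        start = int(start)
        # A unigram feature owns len(labels) consecutive weights.
        table[name] = dict(zip(labels, weights[start:start + len(labels)]))
    return table


table = parse_text_model(MODEL_TXT)
print(table["U02:A"])
print(table["U02:X"])
```

Under this reading, feature id 0 (U02:A) owns the first two weights, one for + and one for -, and id 6 (U02:X) owns the seventh and eighth.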

Apply the trained model to the following test file.

test.data

A +
X -
B +
C +

$ crf_test -v2 -m model test.data

This gives the following output.

# 0.230404
A	+	+/0.784576	+/0.784576	-/0.215424
X	-	-/0.739354	+/0.260646	-/0.739354
B	+	+/0.599462	+/0.599462	-/0.400538
C	+	+/0.662584	+/0.662584	-/0.337416

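The probabilities of A being labeled + or -,

+/0.784576	-/0.215424

are determined by the first two values in the trained model:

0.6462678961244115
-0.6462678961244128

Computing this by hand:

p = exp(0.6462678961244115)
n = exp(-0.6462678961244128)
p/(p+n) = 0.7845760890555549
n/(p+n) = 0.215423910944445

This matches the crf_test result.
The probabilities for B follow in the same way from the model values

0.2016113127964356
-0.2016113127964371

which give +/0.599462 and -/0.400538.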
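
The hand calculation is a softmax over the per-label weights of the single active feature. Since the template has one unigram entry and no bigram entry, each token is scored on its own, which is presumably why this reproduces the crf_test marginals. A minimal Python sketch; the function name label_probs is made up for illustration.

```python
import math


def label_probs(weights):
    """Softmax over {label: weight} for a single token."""
    exps = {label: math.exp(w) for label, w in weights.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Weights copied from model.txt for U02:A and U02:B.
a = label_probs({"+": 0.6462678961244115, "-": -0.6462678961244128})
b = label_probs({"+": 0.2016113127964356, "-": -0.2016113127964371})

print(f"A  +/{a['+']:.6f}  -/{a['-']:.6f}")  # A  +/0.784576  -/0.215424
print(f"B  +/{b['+']:.6f}  -/{b['-']:.6f}")  # B  +/0.599462  -/0.400538
```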

CRF++ practice, part 2

Do the same with training data in which the symbols are replaced as follows:
A:2 B:3 C:4 X:7 Y:8 Z:9

train.data

2 +
2 +
2 +
3 -
3 +
3 +
4 +
7 -
7 -
8 -
9 -

The trained model is the same, with only the names swapped.
Now check what happens for the inputs 0 and 5, which do not appear in the training data.

test.data

2 +
9 -
0 +
5 -

The result is as follows: 0 and 5 each get probability 0.5, so neither label is preferred.

# 0.129962
2	+	+/0.784576	+/0.784576	-/0.215424
9	-	-/0.662584	+/0.337416	-/0.662584
0	+	+/0.500000	+/0.500000	-/0.500000
5	-	+/0.500000	+/0.500000	-/0.500000

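This confirms that CRF++ cannot interpolate numeric data: with this template each value is just a symbol, so unseen values get no information from their numeric neighbors.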
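
One way to read this result: a feature string that never occurred in training has no weights, so every label scores 0 and the softmax is uniform. With this unigram-only template, the number on the # line also appears to equal the product of the per-token probabilities of the predicted labels. A Python sketch under those assumptions; WEIGHTS reuses the part-1 weights of A and Z under their new names 2 and 9, and the helper name is hypothetical.

```python
import math

# Part-1 weights for A and Z, renamed to 2 and 9
# (the note above says only the names change).
WEIGHTS = {
    "2": {"+": 0.6462678961244115, "-": -0.6462678961244128},
    "9": {"+": -0.3374158284496929, "-": 0.3374158284496936},
}
LABELS = ("+", "-")


def label_probs(token):
    """Softmax over labels; an unseen token adds weight 0 to every label."""
    scores = WEIGHTS.get(token, {})
    exps = {label: math.exp(scores.get(label, 0.0)) for label in LABELS}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


sequence_prob = 1.0
for token in ["2", "9", "0", "5"]:
    probs = label_probs(token)
    # max() keeps the first label on a tie, which gives "+" for 0 and 5.
    best = max(LABELS, key=probs.get)
    sequence_prob *= probs[best]
    print(token, best, f"{probs[best]:.6f}")

print(f"# {sequence_prob:.6f}")  # 0.129962
```

The tie-breaking toward + is a property of this sketch; it happens to agree with the + that crf_test reports for 5.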

2013-12-17

nltk study links

Tethering

iPhone, Linux

No jailbreak needed! How to use iRinger to tether with an iPhone and an app http://netbuffalo.doorblog.jp/archives/4371453.html

I found a slightly simpler method than the above for use from Linux (GNOME 2).

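Set up the iPhone as in the article
Right click the GNOME wireless LAN icon and go through the wireless LAN management GUI menu as follows
- Edit Connections -> Wireless -> Add
  - Mode: Ad-hoc
  - In the IPv4 tab, set an address offset by one from the IP shown in the iPhone tethering app
    - For example, if the tethering app shows the iPhone address as 10.0.0.10, set the PC's IPv4 address to 10.0.0.11 with netmask 255.255.255.0 and leave the rest blank
- Give the new connection a suitable name and save
Left click the wireless LAN icon -> Connect to Hidden Wireless Network -> select the network name you configured, and the ad-hoc network starts
Connect the iPhone to the configured network
Set the proxy in Firefox on the PC as in the article above

With that, the connection worked. Confirmed on Gentoo Linux. Tethering can probably be set up with the GUI alone in KDE 4 and similar environments as well.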
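
The "offset by one" rule for the PC address can be checked with Python's standard ipaddress module. This is only an illustrative sketch using the example values from the steps above; the variable names are made up.

```python
import ipaddress

# Address shown by the tethering app, with the netmask from the example.
phone = ipaddress.ip_interface("10.0.0.10/255.255.255.0")

# The PC takes the next address, which must stay inside the same network.
pc_address = phone.ip + 1
same_network = pc_address in phone.network

print(pc_address, phone.network.netmask)  # 10.0.0.11 255.255.255.0
print(same_network)                       # True
```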

proxychains

You can also connect as follows, without configuring Firefox.

/etc/proxychains.conf

localnet  10.0.0.0/255.255.255.0
socks5 10.0.0.10 8888

usage

proxychains ssh hoge@example.com

tsocks

/etc/socks/tsocks.conf

local = 10.0.0.0/255.255.255.0
server = 10.0.0.10
server_type = 5
server_port = 8888
#tordns_enable = false

usage

# nano /etc/hosts

ipaddress example.com

$ tsocks ssh hoge@example.com

To resolve DNS automatically, rebuild from source

http://debiancdn.wordpress.com/2009/07/08/macports%E3%81%AEportfile%E3%82%92%E3%81%84%E3%81%98%E3%81%A3%E3%81%A6socks%E5%95%8F%E9%A1%8C%E8%A7%A3%E6%B1%BA/
