R and Python Tables
Create table
    data.frame: data.frame(a=c(0.3, 1.5, 7), b=c(NA, 6, 2), row.names=c("x", "y", "z"))
    data.table: data.table(i=c("x", "y", "z"), a=c(0.3, 1.5, 7), b=c(NA, 6, 2), key="i")
    pandas:     pd.DataFrame(data={"a": [0.3, 1.5, 7], "b": [None, 6, 2]}, index=["x", "y", "z"])
Get column "a"
    data.frame: T$a; T[["a"]]; T["a"]; T[, "a"]
    data.table: T$a; T[["a"]]; T[, a]; T[, "a", with=FALSE]
    pandas:     T.a; T["a"]; T.loc[:, "a"]
Add column
    data.frame: T$d <- new_column  # or any of the above forms
    data.table: T[, d := new_column]; set(T, NULL, "d", new_column)  # see help(`:=`)
    pandas:     T["d"] = new_column; T.insert(0, "d", new_column)
Remove column
    data.frame: T[["d"]] <- NULL; T <- T[names(T) != "d"]  # or assign NULL via any of the above forms
    data.table: T[, d := NULL]; set(T, NULL, "d", NULL)
    pandas:     del T["d"]; T.drop("d", axis=1, inplace=True)
Get subset (example)
    data.frame: T[1:2, c("a", "c")]
    data.table: T[1:2, .(a, c)]  # .(a, c) == list(a, c)
    pandas:     T.iloc[0:2][["a", "c"]]  # 0-based positions; the older T.ix (iloc + loc) was removed in pandas 1.0
Apply func to rows/cols*
    data.frame: lapply(T, func); apply(T, 2, func)
    data.table: T[, lapply(.SD, func), .SDcols=-1]
    pandas:     T.apply(func)  # axis=0 (default): per column; axis=1: per row
Mean column*
    data.frame: T$row_mean <- rowMeans(T)
    data.table: T[, row_mean := rowMeans(.SD), .SDcols=-1]  # -1 skips the character key column
    pandas:     T["row_mean"] = T.mean(axis=1)
Add two columns*
    data.frame: T$b + T$c
    data.table: T$b + T$c; T[, b + c]
    pandas:     T.b + T.c
* Untested.
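
The pandas entries in the table can be run end to end as a quick check. The following is a minimal sketch, not a definitive reference: the values for column "c" and for the scratch column "d" are hypothetical, chosen only for illustration, and the subset step uses iloc on the assumption of a pandas version in which .ix is no longer available.

```python
import pandas as pd

# Create table: same data as the "Create table" row, indexed by "x", "y", "z".
T = pd.DataFrame(data={"a": [0.3, 1.5, 7], "b": [None, 6, 2]},
                 index=["x", "y", "z"])

# Get column "a": three equivalent forms, each returning a Series.
col_attr = T.a
col_item = T["a"]
col_loc = T.loc[:, "a"]

# Add column "c" (hypothetical values, for illustration only).
T["c"] = [10, 20, 30]

# Add and then remove a scratch column "d" (hypothetical values).
T.insert(0, "d", [1, 2, 3])   # insert as the first column
T = T.drop(columns="d")       # returns a new frame without "d"

# Get subset: first two rows by position, columns by label.
subset = T.iloc[0:2][["a", "c"]]

# Apply a function to each column (axis=0 is the default).
col_max = T.apply(lambda col: col.max())

# Add two columns element-wise; the missing value in "b" propagates.
b_plus_c = T["b"] + T["c"]

# Row means stored as a new column; missing values are skipped by default.
T["row_mean"] = T.mean(axis=1)
```

One difference worth keeping in mind when translating: pandas skips missing values in mean() by default, whereas rowMeans() in R returns NA for a row containing NA unless na.rm=TRUE is passed.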
More about:
● R data.table differences from data.frame (data.table FAQ) - http://datatable.r-forge.r-project.org/datatable-faq.pdf
● pandas DataFrame comparison with data.frame - http://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html
● http://graphlab.com/learn/translator/
● https://sites.google.com/site/gappy3000/home/pandas_r
● https://drive.google.com/folderview?id=0ByIrJAE4KMTtaGhRcXkxNHhmY2M&usp=sharing
● http://mathesaurus.sourceforge.net/matlab-python-xref.pdf