CSNIPPEX

Valerio Terragni
vterragni[-AT-]cse.ust.hk
Yepang Liu Shing-Chi Cheung
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology

Outline

What is CSNIPPEX?

CSNIPPEX ECLIPSE PLUG-IN

CSNIPPEX DATA-SET


What is CSNIPPEX?

Many high-quality code snippets on stackoverflow.com do not compile because type information is missing (for example, missing import declarations or missing JAR files). CSNIPPEX is an Eclipse plug-in that automatically converts code snippets from stackoverflow.com into compilable Java source files by resolving external dependencies, generating import declarations, and fixing syntactic errors. On this website you can download CSNIPPEX and a data-set of compilable Java files extracted from 93,092 stackoverflow.com posts.

For more detailed information, please refer to our publication:
Valerio Terragni, Yepang Liu and Shing-Chi Cheung.
CSNIPPEX: Automated Synthesis of Compilable Code Snippets from Q&A Sites.
To appear in ISSTA 2016: The 25th International Symposium on Software Testing and Analysis, Saarbrücken, Germany, July 2016.

CSNIPPEX ECLIPSE PLUG-IN

Requirements

Java Development Kit (JDK) 1.7 or higher

Make sure Eclipse is running with a JDK, not only a JRE (see here for help)

ECLIPSE IDE 3.x download

ECLIPSE PLUGIN UPDATE SITE

http://sccpu2.cse.ust.hk/csnippex/plugin

WATCH INSTALLATION GUIDE AND DEMO

CSNIPPEX DATA-SET

We release the data-set of compilable Java source files synthesized from 93,092 stackoverflow.com posts. We believe this data-set can facilitate future research that analyzes crowd-generated big data with static and dynamic code analysis techniques. We also provide a framework for browsing the data-set selectively, so that you can select only the Java files related to a specific library.
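As a rough illustration of selecting files by library, the sketch below checks whether a set of import declarations mentions a given package. It is not part of the released framework: the class `ImportFilter` and the method `usesLibrary` are hypothetical names, and the exact string format of the stored imports is an assumption (the sketch accepts both `com.google.gson.Gson` and `import com.google.gson.Gson;`).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the CSNIPPEX framework.
final class ImportFilter {

    // Returns true if at least one import declaration refers to the given
    // package, or to a class or sub-package inside it.
    static boolean usesLibrary(Set<String> imports, String packagePrefix) {
        for (String imp : imports) {
            // Normalize: drop an optional "import", "static" and trailing ";"
            String name = imp.trim();
            if (name.startsWith("import ")) {
                name = name.substring("import ".length()).trim();
            }
            if (name.startsWith("static ")) {
                name = name.substring("static ".length()).trim();
            }
            if (name.endsWith(";")) {
                name = name.substring(0, name.length() - 1).trim();
            }
            // Match whole package segments only, so "com.google.gso"
            // does not match "com.google.gson.Gson"
            if (name.equals(packagePrefix) || name.startsWith(packagePrefix + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the import set stored with one Java file of the data-set
        Set<String> imports = new HashSet<String>(
                Arrays.asList("java.util.List", "com.google.gson.Gson"));
        System.out.println(usesLibrary(imports, "com.google.gson"));    // prints true
        System.out.println(usesLibrary(imports, "org.apache.commons")); // prints false
    }
}
```

Matching on whole package segments avoids false positives between libraries whose package names share a textual prefix.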

If you use this data-set in your research, please cite our ISSTA paper.

Note that these code snippets require JDK 1.7 to compile. Not all of these Java files are executable; some may throw runtime exceptions when run. Note also that all source code in stackoverflow.com is under the MIT license.

DOWNLOAD CSNIPPEX DATA-SET (16 GB)



Archive content

./src source code of the analysis framework

./compilable_code_snippets data-set of 93,092 compilable code snippets from stackoverflow.com (stored in JSON file format)

./lib contains the library (gson-2.3.1.jar) required for reading the data-set

./libs contains around 3000 libraries referenced by the code snippets

HOW TO READ THE DATA-SET (see Main.java)

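// imports required by this fragment; CGroup and CUnit are the data-set
// classes used in Main.java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

// path of the folder containing the synthesized Java classes
final File folder = new File("./compilable_code_snippets");
// list that will contain the code snippets
ArrayList<CGroup> data = new ArrayList<CGroup>();
// com.google.gson.Gson library
final Gson gson = new Gson();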
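// scan all code snippet JSON files
System.out.println("Loading compilation groups from file.....");
// listFiles() returns null if the folder does not exist or cannot be read
final File[] files = folder.listFiles();
if (files == null)
	throw new IllegalStateException("Cannot read folder " + folder);
for (final File f : files) {
	if (!f.isFile() || !f.getName().endsWith(".json"))
		continue;
	// try-with-resources closes the reader even if parsing fails
	try (BufferedReader br = new BufferedReader(new FileReader(f))) {
		ArrayList<CGroup> newList = gson.fromJson(br,
				new TypeToken<ArrayList<CGroup>>() {
		}.getType());
		// you can either process each group individually or load all in memory
		if (newList != null)
			data.addAll(newList);
	} catch (final IOException e) {
		e.printStackTrace();
	}
}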

// now all the compilable java files are loaded in the list "data"

System.out.println("Completed!   "+ data.size() + " c-groups are loaded in memory");
// scan the result
for (CGroup cg : data) {
	// A c-group is a collection of compilation units from a single
	// stackoverflow post.
	// Each c-group has the unique ID of the stackoverflow post.
	Integer id = cg.answerId;
	// It also has a classpath of the referenced external jars.
	// Note that we excluded the jars of JDK 1.7, as we assume they
	// are always in the classpath.
	// Note that we used the Windows path separator ";".
	String classPath = cg.classPath;
	// you can scan each c-unit of the current c-group
	for (CUnit cu : cg.units) {
		// if you want to generate the file, the className field
		// specifies the file name
		String className = cu.className;
		String fileName = className + ".java";
		// you can use the import declarations to select those Java files
		// that use a particular library
		HashSet<String> importDeclarations = cu.imports;
		// IMPORTANT: to generate the Java file, use this method invocation
		String fileContent = cu.getStringCode();
		// you can write this content to a file
		// note that the default package is "test"
	}

}
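
The last comments in the loop above leave two steps to the reader: writing `fileContent` to disk and coping with the Windows-style classpath. A minimal sketch of both follows, under stated assumptions. The class `SnippetExport` and its two methods are hypothetical and not part of the released framework. The sketch assumes that the package "test" maps to a `test` sub-folder of the output directory, and that the classpath entries are paths that are also valid on the current platform (for example, relative paths). The literals in `main` stand in for `cg.classPath`, `cu.className` and `cu.getStringCode()`.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the CSNIPPEX framework.
final class SnippetExport {

    // Converts the ";"-separated classpath stored in the data-set to the
    // path separator of the current platform (":" on Unix, ";" on Windows).
    static String toPlatformClassPath(String storedClassPath) {
        return storedClassPath.replace(";", File.pathSeparator);
    }

    // Writes one compilation unit to <outputDir>/test/<className>.java,
    // assuming the generated code declares "package test;".
    // Returns the path of the written file.
    static Path writeUnit(Path outputDir, String className, String fileContent) {
        Path target = outputDir.resolve("test").resolve(className + ".java");
        try {
            Files.createDirectories(target.getParent());
            Files.write(target, fileContent.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Stand-ins for cg.classPath, cu.className and cu.getStringCode()
        String classPath = "libs/a.jar;libs/b.jar";
        String className = "Example";
        String fileContent = "package test;\n\npublic class Example {}\n";

        Path written = writeUnit(Paths.get("out"), className, fileContent);
        System.out.println("Wrote " + written);
        // A javac command line that could compile the file on this platform
        System.out.println("javac -cp " + toPlatformClassPath(classPath) + " " + written);
    }
}
```

Inside the loop, the same calls would take `cu.className`, `cu.getStringCode()` and `cg.classPath` as arguments instead of the literals used here.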