
CSNIPPEX
Valerio Terragni
vterragni[-AT-]cse.ust.hk
Yepang Liu
Shing-Chi Cheung
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
What is CSNIPPEX?
Many high-quality code snippets on stackoverflow.com do not compile because of missing type information (missing import declarations, missing JARs, etc.). CSNIPPEX is an Eclipse plug-in that automatically converts code snippets from stackoverflow.com into compilable Java source files by resolving external dependencies, generating import declarations, and fixing syntactic errors. On this website you can download CSNIPPEX and a data-set of compilable Java files extracted from 93,092 stackoverflow.com posts.
For more detailed information, please refer to our publication:
Valerio Terragni, Yepang Liu and Shing-Chi Cheung.
CSNIPPEX: Automated Synthesis of Compilable Code Snippets from Q&A Sites.
To appear in ISSTA 2016: The 25th International Symposium on Software Testing and Analysis, Saarbrücken, Germany, July 2016.
Requirements
Java JDK 1.7 or higher
Make sure Eclipse is running with a JDK (see here for help)
Eclipse IDE 3.x (download)
http://sccpu2.cse.ust.hk/csnippex/plugin
CSNIPPEX DATA-SET
We release the data-set of compilable Java source files extracted from 93,092 stackoverflow.com posts. We believe that this data-set can facilitate future research on analyzing crowd-generated big data with various static and dynamic code analysis techniques. We also provide a framework for browsing the data-set selectively, so that you can select the Java files related to a specific library.
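As a rough illustration of selecting files by library, the sketch below keeps only the units whose import declarations fall under a given package prefix. SimpleUnit and LibraryFilter are hypothetical names introduced here; SimpleUnit stands in for the framework's CUnit class, of which only the className and imports fields (both used in Main.java) are mirrored. The sketch also assumes that imports are stored as fully qualified names such as com.google.gson.Gson, which may differ from the actual format in the data-set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-in for the framework's CUnit (only two fields mirrored).
class SimpleUnit {
    final String className;
    final HashSet<String> imports;

    SimpleUnit(String className, String... imports) {
        this.className = className;
        this.imports = new HashSet<String>(Arrays.asList(imports));
    }
}

class LibraryFilter {
    // Returns the units that import the given package or anything below it.
    // Assumption: each import is a fully qualified name without the
    // "import" keyword or a trailing semicolon.
    static List<SimpleUnit> usingLibrary(List<SimpleUnit> units, String packagePrefix) {
        List<SimpleUnit> selected = new ArrayList<SimpleUnit>();
        for (SimpleUnit unit : units) {
            for (String imp : unit.imports) {
                if (imp.equals(packagePrefix) || imp.startsWith(packagePrefix + ".")) {
                    selected.add(unit);
                    break; // one matching import is enough
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<SimpleUnit> units = new ArrayList<SimpleUnit>();
        units.add(new SimpleUnit("A", "com.google.gson.Gson", "java.util.List"));
        units.add(new SimpleUnit("B", "java.io.File"));
        for (SimpleUnit u : usingLibrary(units, "com.google.gson")) {
            System.out.println(u.className); // prints A
        }
    }
}
```

With the real data-set, the same loop could run over cu.imports for each CUnit after loading the JSON files.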
If you want to use this data-set for your research, please cite our ISSTA paper.
These code snippets require Java JDK 1.7 to compile. Not all of these Java files are executable; some may throw runtime exceptions when run. Note that all source code in stackoverflow.com is under the MIT license.
DOWNLOAD CSNIPPEX DATA-SET (16 GB)
Archive content
./src
source code of the analysis framework
./compilable_code_snippets
data-set of 93,092 compilable code snippets from stackoverflow.com (stored in JSON file format)
./lib
contains the library (gson-2.3.1.jar) required for reading the data-set
./libs
contains around 3000 libraries referenced by the code snippets
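The reading example in Main.java notes that each c-group's class path is stored with the Windows path separator ";". On other platforms the string has to be converted before it is passed to a compiler. The following is a minimal sketch of that conversion; ClassPathConverter and the sample jar paths are hypothetical, and the entries are assumed to be usable as stored (whether they are relative to ./libs or absolute is not stated here).

```java
import java.io.File;

class ClassPathConverter {
    // Splits a class path stored with ';' and rejoins it with the given
    // separator. Empty entries are skipped.
    static String convert(String storedClassPath, String separator) {
        StringBuilder sb = new StringBuilder();
        for (String entry : storedClassPath.split(";")) {
            if (entry.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(entry);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stored value; real values come from cg.classPath.
        String stored = "libs/first.jar;libs/second.jar";
        // File.pathSeparator is ':' on Unix-like systems and ';' on Windows.
        System.out.println(convert(stored, File.pathSeparator));
    }
}
```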
HOW TO READ THE DATA-SET (see Main.java)
// imports needed by this fragment; CGroup and CUnit come from the framework in the src folder,
// and the statements below belong inside a method such as main
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

// path of the folder containing the synthesized Java classes
final File folder = new File("./compilable_code_snippets");
// list that will contain the code snippets
ArrayList<CGroup> data = new ArrayList<CGroup>();
// com.google.gson.Gson library
final Gson gson = new Gson();

// scan all code-snippet JSON files
System.out.println("Loading compilation groups from file.....");
final File[] files = folder.listFiles(); // null if the folder does not exist
for (final File f : files == null ? new File[0] : files) {
    if (!f.isFile() || !f.getName().endsWith(".json"))
        continue;
    // try-with-resources closes the reader after each file
    try (BufferedReader br = new BufferedReader(new FileReader(f))) {
        ArrayList<CGroup> newList = gson.fromJson(br,
                new TypeToken<ArrayList<CGroup>>() { }.getType());
        // you can either process each list individually or load everything in memory
        data.addAll(newList);
    } catch (final IOException e) {
        e.printStackTrace();
    }
}

// now all the compilable Java files are loaded in the list "data"
System.out.println("Completed! " + data.size() + " c-groups are loaded in memory");

// scan the result
for (CGroup cg : data) {
    // a c-group is a collection of compilation units from a single
    // stackoverflow post;
    // each c-group has the unique ID of the stackoverflow post
    Integer id = cg.answerId;
    // it also has a classpath of the referenced external jars.
    // Note that we excluded the jars of JDK 1.7, as we assume they
    // are always in the classpath.
    // Note that we used the Windows path separator ";"
    String classPath = cg.classPath;

    // you can scan each c-unit of the current c-group
    for (CUnit cu : cg.units) {
        // if you want to generate the file, the className field specifies
        // the file name
        String className = cu.className;
        String fileName = className + ".java";
        // you can use the import declarations to select those Java files
        // that use some particular library
        HashSet<String> importDeclarations = cu.imports;
        // IMPORTANT: to generate the Java file, please use this method invocation
        String fileContent = cu.getStringCode();
        // you can write this to a file
        // note that the default package is "test"
    }
}
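The last comments in the reading example mention writing each unit to a file and that the default package is "test". A minimal sketch of that step follows, assuming that "default package" means the generated source declares package test, so the file belongs in a test subdirectory. SnippetWriter, its write method, and the sample content are hypothetical and not part of the released framework.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SnippetWriter {
    // Writes <outputRoot>/test/<className>.java and returns the path written.
    // Assumption: the source text declares "package test;".
    static Path write(Path outputRoot, String className, String fileContent)
            throws IOException {
        Path packageDir = outputRoot.resolve("test");
        Files.createDirectories(packageDir);
        Path target = packageDir.resolve(className + ".java");
        Files.write(target, fileContent.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for a real output folder.
        Path root = Files.createTempDirectory("csnippex-demo");
        // In real use, the arguments would be cu.className and cu.getStringCode().
        Path written = write(root, "Hello", "package test;\n\npublic class Hello { }\n");
        System.out.println("Wrote " + written);
    }
}
```

Because class names may repeat across posts, using one output root per c-group (for example a folder named after cg.answerId) is one way to avoid overwriting files.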